Filtering in Finance

Alireza Javaheri, RBC Capital Markets Delphine Lautier, Université Paris IX, Ecole Nationale Supérieure des Mines de Paris Alain Galli, Ecole Nationale Supérieure des Mines de Paris

Abstract (EKF) as well as the Unscented Kalman Filter (UKF) similar to Kushner’s Nonlinear In this article we present an introduction to various Filtering algorithms and some of Filter. We also tackle the subject of Non-Gaussian filters and describe the Particle their applications to the world of Quantitative Finance. We shall first mention the Filtering (PF) algorithm. Lastly, we will apply the filters to the term structure model of fundamental case of Gaussian noises where we obtain the well-known Kalman Filter. commodity prices and the stochastic model. Because of common nonlinearities, we will be discussing the Extended Kalman Filter 1 Filtering Further, we shall provide a mean to estimate the model parameters via the maximization of the likelihood function. The concept of filtering has long been used in Control Engineering and . Filtering is an iterative process that enables us to esti- 1.1 The Simple and Extended Kalman Filters mate a model’s parameters when the latter relies upon a large quantity 1.1.1 Background and Notations of observable and unobservable data. The Kalman Filter is fast and easy In this section we describe both the traditional Kalman Filter used for lin- to implement, despite the length and noisiness of the input data. ear systems and its extension to nonlinear systems known as the Extended We suppose we have a temporal time-series of observable data z k (e.g. Kalman Filter (EKF). The latter is based upon a first order linearization of stock prices as in Javaheri (2002), Wells (1996), interest rates as in Babbs the transition and measurement equations and therefore would coincide and Nowman (1999), Pennacchi (1991), futures prices as in Lautier (2000), with the traditional filter when the equations are linear. For a detailed Lautier and Galli (2000)) and a model using some unobservable time- introduction, see Harvey (1989) or Welch and Bishop (2002).

series x k (e.g. volatility, correlation, convenience yield) where the index k Given a dynamic process x k following a transition equation corresponds to the time-step. This will allow us to construct an algo- = rithm containing a transition equation linking two consecutive unob- x k f(x k−1, wk) (1) servable states, and a measurement equation relating the observed data we suppose we have a measurement z such that to this hidden state. k

The idea is to proceed in two steps: first we estimate the hidden state z k = h(x k, u k) (2) a priori by using all the information prior to that time-step. Then using

this predicted value together with the new observation, we obtain a con- where wk and uk are two mutually-uncorrelated sequences of temporally- ditional a posteriori estimation of the state. uncorrelated Normal random-variables with zero means and covariance 4 In what follows we shall first tackle linear and nonlinear equations with matrices Q k, R k respectively . Moreover, wk is uncorrelated with x k−1 and Gaussian noises. We then will extend this idea to the Non-Gaussian case. u k uncorrelated with x k.

2 Wilmott magazine TECHNICAL ARTICLE 1

We denote the dimension of x k as nx, the dimension of w k as nw and Needless to say for a linear system, the function matrices are equal to so on. these Jacobians. This is the case for the simple Kalman Filter. We define the a priori process estimate as 1.1.2 The Algorithm The actual algorithm could be implemented as follows: ˆ − = x k E [x k] (3) (1) Initialization of x0 and P0 which is the estimation at time step k − 1 prior to the step k measurement. For k in 1...N Similarly, we define the a posteriori estimate (2) Time Update (Prediction) equations

ˆ = | ˆ − = ˆ x k E [x k z k] (4) x k f(xk−1, 0) (8) which is the estimation at time step k after the measurement. and − − = − ˆ − t t We also have the corresponding estimation errors e k x k x k and = + P k A kPk−1A k W kQ k−1W k (9) ek = x k − xˆ k and the estimate error covariances (3-a) Innovation: We define − − −t P = E e e k k k ˆ− = ˆ − z k h(x k , 0) = t P k E e ke k (5) and = − ˆ− where the superscript t corresponds to the transpose operator. ν k z k z k In order to evaluate the above means and covariances we will need as the innovation process. the conditional densities p(x |z − ) and p(x |z ), which are determined k k 1 k k (3-b) Measurement Update (Filtering) equations iteratively via the Time Update and Measurement Update equations. The basic idea is to find the probability density function corresponding to a ˆ = ˆ − + x k x k K kν k (10) hidden state x k at time step k given all the observations z1:k up to that time. and = − − The Time Update step is based upon the Chapman-Kolmogorov equation P k (I K kH k)P k (11) with − − − p(x k|z1:k−1) = p(x k|xk−1, z1:k−1)p(xk−1|z1:k−1)dxk−1 = t t + t 1 K k P k H k(H kP k H k U kR kU k) (12) (6)

= p(x k|xk−1)p(xk−1|z1:k−1)dxk−1 and I the Identity matrix. The above Kalman gain K k corresponds to the mean of the condition- via the . al distribution of x k upon the observation z k or equivalently, the matrix The Measurement Update step is based upon the Bayes rule that would minimize the mean square error P k within the class of linear estimators.

p(zk|x k)p(x k|z1:k−1) This interpretation is based upon the following observation. Having x p(x k|z1:k) = (7) p(z k|z1:k−1) a Normally distributed random-variable with a mean mx and variance Pxx , with and z another Normally distributed random-variable with a mean mz and variance P , and having P = P the covariance between x and z, the con- p(z k|z1:k−1) = p(z k|x k)p(x k|z1:k−1)dx k zz zx xz ditional distribution of x|z is also Normal with A proof of the above is given in the appendix. mx|z = mx + K(z − mz) EKF is based on the linearization of the transition and measurement equations, and uses the conservation of Normal property within the class with = −1 of linear functions. We therefore define the Jacobian matrices of f with K Pxz Pzz respect to the state variable and the system noise as A k and W k respec- which corresponds to our Kalman gain. tively; and similarly for h, as H k and U k respectively. More accurately, for every row i and column j we have 1.1.3 Parameter Estimation For a parameter-set in the model, the calibration could be carried out = ˆ = ˆ Aij ∂fi/∂x j(xk−1, 0), Wij ∂fi/∂wj(xk−1, 0), via a Maximum Likelihood Estimator (MLE) or in case of conditionally ^ = ˆ − = ˆ − Gaussian state variables, a Quasi-Maximum Likelihood (QML) algorithm. Hij ∂hi/∂x j(x k , 0), Uij ∂hi/∂u j(x k , 0)

Wilmott magazine 3 N | = 2 + − To do this we need to maximize k=1 p(z k z1:k−1), and given the where the scaling parameters α, β, κ and λ α (na κ) na will be cho- Normal form of this probability density function, taking the logarithm, sen for tuning. changing the signs and ignoring constant terms, this would be equiva- For k in 1...N lent to minimizing over the parameter-set (1-b) State Space Augmentation: As mentioned earlier, we concatenate the state vector with the system noise and the observation noise, and cre- N z − m L = ln(P ) + k z k (13) ate an augmented state vector for each time-step 1:N z k z k P   = z k z k k 1 xk−1 a   x − = wk−1 For the KF or EKF we have k 1 uk−1 − m = zˆ z k k and therefore   xˆ k−1 and a a   − xˆ − = E [x − |z k] = 0 P = H P Ht + U R Ut k 1 k 1 z k z k k k k k k k 0 and   1.2 The Unscented Kalman Filter and Kushner’s Pk−1 Pxw (k − 1|k − 1) 0 a =  − | − − | −  Nonlinear Filter Pk−1 Pxw (k 1 k 1) Pww (k 1 k 1) 0 1.2.1 Background and Notations 00Puu (k − 1|k − 1) Recently Julier and Uhlmann (1997) proposed a new extension of the (1-c) The Unscented Transformation: Following this, in order to use the Kalman Filter to Nonlinear systems, different from the EKF. The new Normal approximation, we need to construct the corresponding Sigma method called the Unscented Kalman Filter (UKF) will calculate the Points through the Unscented Transformation: mean to a higher order of accuracy than the EKF, and the covariance to a = ˆ a the same order of accuracy. χk−1(0) xk−1 Unlike the EKF, this method does not require any Jacobian calculation = ... since it does not approximate the nonlinear functions of the process and For i 1 na a = ˆ a + + a the observation. Indeed it uses the true nonlinear models but approxi- χk−1(i) xk−1 ( (na λ)Pk−1)i

mates the distribution of the state random variable x k (as well as the and for i = na + 1...2na observation z k ) with a Normal distribution by applying an Unscented a a a χ ( ) = ˆ − ( ( + λ) ) − Transformation to it. k−1 i xk−1 na Pk−1 i na (15) In order to be able to apply this Gaussian approximation (unless we − th − th have x k = f(xk−1) + w k and z k = h(x k) + u k , i.e. unless the equations are where the above subscripts i and i na correspond to the i and i na linear in noise) we will need to augment the state space by concatenating columns of the square-root matrix5. the noises to it. This augmented state will have a dimension (2) Time Update equations are n = n + n + n = x w a x w u . χk|k−1(i) f(χk−1(i), χk−1(i)) (16) 1.2.2 The Algorithm for i = 0...2na + 1 and The UKF algorithm could be written in the following way: 2na Similarly to the EKF, we start with an initial choice − (m) (1-a) Initialization: xˆ = W χk|k−1(i) (17) ˆ k i for the state vector x0 = E [x0] and its covariance matrix i=0 t P0 = E [(x0 − xˆ 0)(x0 − xˆ 0) ]. and 2na (m) (c) We also define the weights W and W as − = (c) − ˆ − − ˆ − t i i P k Wi (χk|k−1(i) x k )(χk|k−1(i) x k ) (18) i=0 λ W (m) = The superscripts x and w correspond to the state and system-noise por- 0 + λ na tions of the augmented state respectively. and (3-a) Innovation: We define λ W (c) = + (1 − α2 + β) = u 0 zk|k−1(i) h(χk|k−1(i), χk−1(i)) (19) na + λ = and for i 1...2na and 2na (m) (c) 1 ˆ− (m) W = W = (14) z = W Z k|k−1(i) (20) i i ( + λ) k i 2 na i=0

4 Wilmott magazine ^ 5 1 t N ) k M − (24) and ˆ x N − ... ) 1 N y i d ] = ) j ) , ..., m 1 N i i ( − 1 ) -dimensional (pos- -dimensional y − N ) ζ) ( i k . N | , ..., N we can write we 1 P k i 1 () i − 2 1 -dimensional Normal √ ( q k P 1 t , ..., )(χ N ) − 1 − + k , ..., − w i dx k technique: 1 ˆ ) the corresponding mean x m i as the Kalman or Kushner M k m ζ( ( and an and 2 ) ( 1: 1 1 P − ), χ − k z G − − M | N ) a z k k y N i k | | i N ( k P k i x w χ ( x and − N ( [ p i ... , ..., ) 1 distribution distribution i 1 k + w , ..., m sigma points, while NLF needs i 1 exp x ( w 1 i ... ) ( of degree 1 1 1 ( − 1 f i y a k − 1 = ( M x G k ˆ x w N + − i TECHNICAL ARTICLE TECHNICAL G k χ 1 | with k N N = = M ... R proposal ) f N = 2 i ) 1 (χ P ] Importance Sampling N = N ) M , i i = 1 2 1 ... k is the vector of the Gauss-Hermite roots of i | w ) 1 x m P ) N ( = ( M | = i ... 1 f N i , ..., 2 N 1 i ] [ i 1 N 1 ) E i w = are the corresponding weights. ( X π) 1 = , ..., ( 1 N k 2 − 1 i = − G M ( i a ˆ ... ζ [ k N X x ( i w E χ 1 = ... − ... 1 1 k ] i i | 1 ) k w = M (ζ X χ 1 i ( G = [ = t and E as k ζ − M P M As the Kushner paper indicates, having an Note that even if both Kushner’s NLF and UKF use Gaussian The idea is based on the value an expected can calculate We More accurately, for a Quadrature-order Quadrature-order a for accurately, More ... 1 = j and similarly for the measurement update equations. the for measurement update and similarly Quadratures, Quadratures, UKF only uses Filters Filters do. A detailed description is given in Doucet, De Freitas (2001). Gordon and by using a known and simple using a known and simple by where this square-root corresponds to the correspondswhere this Cholesky factorization. to square-root equations the Time Update have we the UKF, to Similarly but now and Case: The Particle The Non-Gaussian Filter 1.3 It is possible to generalize the algorithm for the fundamental Gaussian distribution. any one applicable to case to Background and Notations 1.3.1 In this approach, we use Markov-Chain Monte-Carlo simulations instead of using a Gaussian for approximation which is equal to which is equal where of the the integrals. points for computation order order sibly sibly augmented) variable, the sigma-points are defined for and covariance, for a polynomial polynomial a for and covariance, random-variable random-variable i . ) k x (21) (23) (23) ( h − k z . 7 t t ) ) k k − − ˆ ˆ z z − − ) ) i i ( ( 1 1 − − k k | | k k k t Z Z K k k )( )( k z ν z k k 1 k k − − − k k z , after calculating these condi- ˆ − z ˆ ˆ x z z contains the exponent the contains ) P K P k k ) k − − − k z z K + | . k is Gaussian. ) k ) x k x i i | β k z 1 − ( ( k − P x ˆ 1 1 x − ( z k k − ( − − = p = k k 1: p | | P = k and z k k k | k ν Z ) = K α ˆ ( (χ k x . They then. They use the fact that distri- a Normal ) ) and k c c x ) ( ( , the numeric integration in the UKF will cor- i i is strongly nonlinear, the Gauss Hermite inte- ( k P ) 2 h z 1 ) W W | , − u k k a k a 0 = 0 , n n x x z = = x 2 2 | i i ( λ ( k p h x = = ( k k p and z z k k and defined above. as z x 1 ) k P P z 1 k magazine = − z k x P z n | k x (NLF) based on an approximation of the conditional distribution and ( p 6 k − ˆ z When Note Note that when They They finally therewrite calculation equations (3), (4) and (5) Wilmott respond to a Gauss-Hermite Quadrature of order 3. in However the UKF we can tune the filter and reduce the higher term errors via the previ- mentioned parameters ously The iterative methods based on the idea of importance sampling proposed sampling importance of idea the on based methods iterative The in Kushner (1967), Kushner and Budhiraja (2000) correct this problem at and Ito in suggested As time. computation in increase strong a of price the addi- the make to be would integration this avoid to way one (2000), Xiong thattional hypothesis as explained in Kushner (1967), Kushner and Budhiraja (2000). In thistheden- to approximation Normal a using theauthorssuggest approach, sities gration is not efficient for evaluating the moments of the measurement term the since equation, update 1.2.4 Analogy with Kushner’s Nonlinear Filter Analogy with Nonlinear Kushner’s 1.2.4 Nonlinear Kushner’s to algorithmthis compare to interesting be would It Filter and we have as previously have and we and (3-b) Measurement Update and as before bution is entirely determined via its first two moments, which reduces the calculations considerably. using the above tional densities via the time and measurement update equations (6) and via Gaussian Quadratures could be evaluated (7). All integrals which gives us the Kalmanwhich gives Gain as well as as well Equations. the Measurement Update which completes Estimation Parameter 1.2.3 The maximization of the likelihood could be done exactly as in (13) tak- ing More precisely, it is possible to write However this choice of the proposal distribution does not take into

account our most recent observation z k at all and therefore could p(x |z ) E[f (x )] = f (x ) k 1:k q(x |z )dx become inefficient. k k | k 1:k k q(x k z1:k) Hence the idea of using a Gaussian Approximation for the proposal, which could be also written as and in particular an approximation based on the Kalman Filter, in order to incorporate the observations. w k(x k) We therefore will have E[f (x k)] = f (x k) q(x k|z1:k)dx k (25) p(z1:k) | = N ˆ with q(x k xk−1, z1:k) (x k, P k) (29) p(z1:k|x k)p(x k) w k(x k) = q(x k|z1:k) using the same notations as in the section on the Kalman Filter. Such fil- defined as the filtering non-normalized weight as step k. ters are sometimes referred to as the Extended (EPF) and We therefore have the Unscented Particle Filter (UPF). See Haykin (2001) for a detailed description of these algorithms. E [w (x )f (x )] = q k k k = ˜ E[f (x k)] Eq[w k(x k)f (x k)] (26) 1.3.2 The Algorithm E [w (x )] q k k Given the above framework, the algorithm for an Extended or Unscented with Particle Filter could be implemented in the following way: w (x ) ˜ = k k w k(x k) (1) For time step k = 0 choose x and P > 0. Eq[w k(x k)] 0 0 ≤ ≤ defined as the filtering normalized weight as step k. For i such that 1 i Nsims take | Using Monte-Carlo sampling from the distribution q(x k z1:k) we can (i) = + (i) x0 x0 P0Z write in the discrete framework: where Z(i) is a standard Gaussian simulated number. N sims (i) (i) (i) (i) = = / E[f (x )] ≈ w˜ (x )f (x ) (27) Also take P0 P0 and w0 1 Nsims k k k k ≤ ≤ i=1 While 1 k N with again (2) For each simulation-index i w (x(i)) ˜ ( (i)) = k k w k x k ( ) (i) (i) Nsims j xˆ = (x ) j=1 w k(x k ) k KF k−1

(i) Now supposing that our proposal distribution q() satisfies the Markov with P k the associated a posteriori error covariance matrix.

property, it can be shown that w k verifies the recursive identity (KF could be either the EKF or the UKF)

(3) For each i between 1 and Nsims p(z |x(i))p(x(i)|x(i) ) (i) = (i) k k k k−1 w k wk−1 (i) (i) (28) | (i) (i) (i) q(x k xk−1, z1:k) ˜ = ˆ + (i) x k x k P k Z which completes the Sequential Importance Sampling algorithm. It is impor- where again Z(i) is a standard Gaussian simulated number. tant to note that this means that the state x k cannot depend on future observations, i.e. we are dealing with Filtering and not Smoothing8. (4) Calculate the associated weights for each i One major issue with this algorithm is that the variance of the |˜(i) ˜(i)| (i) (i) (i) p(z k x k )p(x k xk−1) weights increases randomly over time. In order to solve this problem, we w = w − k k 1 q(x˜(i)|x(i) , z ) could use a Resampling algorithm which would map our unequally k k−1 1:k

weighted x k’s to a new set of equally weighted sample points. Different with q() the Normal density with mean xˆ(i) and variance P(i). methods have been suggested for this9. k k Needless to say, the choice of the proposal distribution is crucial. (5) Normalize the weights Many suggest using (i) (i) w k | = | w˜ = q(x k xk−1, z1:k) p(x k xk−1) k Nsims (i) i=1 w k since it will give us a simple weight identity ˜(i) (i) (i) = ˜ (i) = (6) Resample the points x k and get x k and reset w k w k 1/Nsims . (i) = (i) | (i) w k wk−1p(z k x k ) (7) Increment k, Go back to step (2) and Stop at the end of the While loop.

6 Wilmott magazine ^ 7 1 , ,κ, S α . C (σ = κτ 2 ψ − 3 e , κ S (τ ) − B (τ ) 1 NT + B , , × C + κτ 2 S C − ,... 4 e σ κτ 1 κ dz κτ − S − − e σ = 2 κ e C + 1 κ t − + − , dz 0 t 1 × dt τ C 1 dt ) ) ρ σ t > Tw t ( × ( 2 C S C = + C + × σ ] − ,σ 1 2 C dt ρ , we also have: also have: , we − S , and the state variables. This solu- ] ) C 1 1 ) T 2 dz C S − κ σ )) − − κ C ( t , t S t σ C κ, σ ( × C G σ exp ln 10 − S S − ( − TECHNICAL ARTICLE TECHNICAL × − = dz (α ρ ln [ ) µ × C k 2 t G E 2 [ C ( F σ κ = S σ S 2 = = σ + )) = c + T + ) dC dG , t T = α , , t ακ C

for for a delivery in t − , (λ/κ) t , C C G r F S , − ( , C S F α ( ( + is the maturity of the futures contract. F t ln = = and − S α

T (τ ) is the increment of the associated withis the associated motion increment of the Brownian is the increment of the Brownian motion associated withis the S, motion associated increment of the Brownian B is the convenience yield’s volatility, yield’s is the convenience is the spot price’s volatility, is the immediate expected return for the return for spot price expected is the immediate S C is the correlation between the two Brownian motions associated is the long run yield mean of the convenience = represents the convergence of the convenience yield towards yield towards of the convenience represents the convergence is the risk premium associated withis the yield, the risk premium associated convenience is the risk-free interest rate interest is the risk-free S C r λ τ µ σ dz α κ σ dz with ρ ,ρ,λ) C — — — The model risk-neutral parameter-set is therefore: — — — — — — — — The model’s solution expresses the relationship between an observ- tion is: Then, to apply the Kalman thefilter, model must be expressed under its is: equation The transition form. state-spaced with: where: the Schwartz filter to theApplying simple model 2.1.2 a under expressed be must model Schwartzthe filter, simple the apply To linear form: Considering the relationship with: where: able futures price α, σ , () we q (30) total

k k dx ) during the resam- k dx ) 1: 1 z − , sims k 1 N 1: − C / z k S | 1 x k | dz k x C ) Sdz ( k x σ S l p ( ( σ ) ’s above and therefore’s above the log- q + k k ) ) i ln l k ( x + k | ) 1 dt 1: w k ] 1 N ’s are distributed according to = z ) z ) , can explain the behavior of the k − 1 i , Sdt ( k C k ( sims C = ) 1 p i ˜ x 1: N = C − z − k | with ) k x = to to a constant − k | N x l k k ) k 1: i ( = l k ˜ κ(α ( x L [ p (µ ( ) ( w 1 q = ln = ) − k k 1: x dC dS | z | k k z z ( ( p p = k = . The convenience yield can brieflycan yield convenience definedThe be . theas com- l F k l magazine and the convenience yield could be written as could be written S k . l Given the likelihood at step at step the likelihood Given Wilmott considering the resetting of futures prices futures 2.1 The Schwartz model The Schwartz model supposes that two state variables, namely the spot price 2 Term Structure Models of Commodity 2 Term Structure Models Prices In this section, the Kalman filter is applied to a well known term struc- ture model of commodity prices developed by Schwartz (1997). We first present the Schwartz model, and show how it can be transformed into a state-spaced model for a simple filter as well as an extended filter. then We explain how the iteration process can be initiated, and how it can withthetheobtained performancesim- compare we Lastly, stabilized. be a sensitivity analysis. filters, and make ple and extended which will provide us with an interpretation of the likelihood as theas thelikelihood of interpretation withan us provide will which weight and given that by construction the the total likelihood is thethe likelihood total product of the 1.3.3 Parameter Estimation Parameter 1.3.3 As in the previous section, in order to estimate the parameter-set likelihood to be maximized is to likelihood can use an ML Estimator. However since the particle filter does not neces- thenot particlefilter since does However Estimator. ML an use can sarily assume Gaussian noise, the likelihood function to be maximized than form the sections. has a more general used in previous one Now fort fort associated with the possession of physical stocks. There are usually of the variables, because most these time there data for two no empirical is yield theconvenience and price, thespot for series time reliable no are asset. not a traded Presentation of the model 2.1.1 variables are theThe dynamics of these following: state pling step, we can approximate can approximate we pling step, where: where: th (τ ) — The i component of the vector zt is F i 1 2 th µ − σ t — h(St, Ct) is a vector whose i component is: = 2 S × — c is a (2 1) vector, and t is the period sepa- × − + καt [St exp( ziCt/t−1 B(τi))] − rating 2 observation dates 1 − e κτi with: Z = i κ − 1 t — F = is a (2 × 2) matrix, − 01− κt σ 2 σ σ ρ σ 2 1 − e 2κτi B(τ ) = r − α + C − S C × τ + C × i 2κ 2 κ i 4 κ 3 — T is an identity matrix, (2 × 2), − σ 2 1 − e κτi — wt is a sequence of uncorrelated errors, with: + ακ + σ σ ρ − C × S C κ κ 2 σ 2t ρσ σ t = = = S S C E[wt] 0, and Q Var[wt] 2 ρσSσC t σC t α = α − λ/κ The measurement equation is directly written from the solution of — ut is a vector, (N × 1), with no serial correlation: the model: Gt zt = d + H × + ut, t = 1,...NT E[ut] = 0 and R = Var[ut]. R is (N × N) Ct where: Lastly A and H, the derivatives of the functions f and h with respect to the th — The i component of the vector zt of size N is ln(F(τi)). N is the state variables, are the following: number of maturities that were used for the estimation. — The ith component of the vector d of size N is B(τ ) i × − — A is a (2 2) matrix: 1 − e κτi — H is the (N × 2) matrix whose ith line is 1, − 1 + µt − Ct−1t −St−1t κ A(S − , C − ) = t 1 t 1 0 (1 − κt) — ut is a white noise vector, of size N, with no serial correlation: — H is a (N × 2) matrix whose ith line is: = = × E[ut] 0 and R Var[ut]. R is (N N) (−Zi Ct−1 +B(τi )) (−zi Ct−1 +B(τi )) e − Zi × e 2.1.3 Applying the extended filter to the Schwartz model − The transition equation is directly obtained from the dynamics of the σ 2 σ σ ρ σ 2 1 − e 2κτi B(τ ) = r − α + C − S C × τ + C × state variables: i 2κ 2 κ i 4 κ 3 − St 2 − κτi = f (St−1, Ct−1) + T(St−1, Ct−1)wt σC 1 e C + ακ + σSσC ρ − × t κ κ 2 where: = − α α λ/κ S — t is the state vector, (2 × 1), Ct 2.2 Practical difficulties associated with the empirical study — f (St−1, Ct−1) is a (2 × 1) vector: To perform the empirical study, some difficulties must be overcome. + − St−1(1 µt Ct−1t) First, there are choices to make when the iterative process is started. f (St−1, Ct−1) = κα + − ( − κ ) t Ct 1 1 t Second, if the model has been expressed under the logarithmic form for St−1 0 the simple Kalman filter, some precautions must be taken when the per- —T(S − , C − ) is a (2 × 2) matrix: T(S − , C − ) = t 1 t 1 t 1 t 1 01 formance is being considered. Third, the stability of the iteration process σ 2 ρσ σ and the model’s performance are extremely sensitive to the covariance × = S S C — Q is a (2 2) matrix: Var(wt) 2 matrix R. ρσSσC σC

The measurement equation becomes: 2.2.1 Starting the iterative process To start the iterative process, there is a need for the initial values of the = + zt h(St, Ct) ut non-observable variables and for their covariance matrix. In the case of

8 Wilmott magazine ^ 9 1 were 11 is actual- 1 τ : 2 τ ,τ)) n ,τ)) ( n F ( F Keeping Keeping the first observa- of October 2001. They have They 2001. October of is the estimated futures price futures estimated the is − − th ... , ,τ) is the observed futures price. is the observed ,τ) 2 n n ,τ) τ ( ( n ˆ ˆ F F ( ,τ) ( ˆ F n ( 1 ( N 1 = F n N = n TECHNICAL ARTICLE TECHNICAL 1 N 1 N = = 12 MPE . These measures decrease also withalways maturity, RMSE 14 of May 1998, and the15 and 1998, May of at time-step n, and at time-step th τ is the number of observations, of number the is N Retaining theRetaining error same notations, the (RMSE), is root mean squared The mean pricing errors (MPE) are defined way: in the following been arranged such thatbeen arranged the first futures price’s maturity defined in the following way, for one given maturity one given for defined way, the in following The differences between the optimal obtained parameters with the two filters show that Schwartzthose as size order same the have parameters the Nevertheless, the linearization has on the on different crude periods. obtained in 1997 oil market, a significant influence. 2.3.3.2 The model’s performance The filtersimple is more always precise than the extended one. This is true all thefor maturities where results 2.3.3 The empirical compared. are filters two thewith obtained parameters optimal the First, Then the model’s ability represent to the price and curve its dynamics is the the sensitivity of the errors results to covariance Finally, appreciated. matrix is presented. 2.3.3.1 Optimal parameters linearization on the results. We first present the empirical data. Then the performance criteria are exposed. the Finally, results are delivered upon. and commented data 2.3.1 The empirical The data used for the empirical study corresponds to daily crude oil prices for the settlement of the Nymex’s the18 between WTI futures contracts, tion of each group of five, these daily data points were transformed data. theFor and theestimation, for weekly parameters measure into of the model’s performance, four series of futures prices for a maturity a maturity for retained, corresponding to the one, the three, the six thethree-month and T-bill to set are rates theinterest The months maturities. nine themodel, in constant be to supposed are rates interest Because rates. and 2002. use the 1995 mean of all thewe observations between 2.3.2 The performance criteria To measure the model’s performance, two criteria were retained: the mean pricing errors errors. and the root mean squared ly ly the one month maturity, and the second futures price’s corre- sponds to the two months maturity is a − t ˆ z is calcu- C and taking V is a standard- σ e V × − t )) ˆ z 2 e T , = t t , z S e ( F 2 ( 2 σ 2 . The exponential of e T ln V − t is the one immediately after- ˆ z σ × − − 2 ] T 1 − t + )) ˆ z T 1 e [ − t T ˆ z E , t , and the convenience yield , = = S S ] t ( z t z e F [ ( E ln − r = c magazine : 2 is the nearest delivery, and T is crucial for the because iteration’s stability, it is added to the is the standard deviation of the innovations and of theis the deviation innovations standard , and if the terms of this matrix are too high, the iteration can 1 R R T σ Most of the time, the easiest way to estimate this matrix is to calculate to is matrix this estimate to way easiest the time, the of Most The covariance matrix associated with the state variables must also be also must variables state thewith associated matrix covariance The and in 1 Wilmott innovations covariance matrix, during the innovation phase. Therefore, the updating of the non-observable variable is strongly affected by the matrix 2.3 Comparison between the two filters the two between 2.3 Comparison meas- model Schwartzthe of performance the between comparison The ured with the two filters allows us to appreciate the influence of the 2.2.3 Stabilizing the iteration process 2.2.3 Stabilizing the iteration Another important choice must be made before initiating the Kalman iteration process, concerning the estimation of the covariance matrix This equation. measurement the in introduced errors thewith associated matrix We therefore have to correct it by the correct it by factor. exponential to therefore have We where where T term term structure models of commodity prices, the nearest futures price is generally used as the spot price lated lated as the solution of the Brennan and Schwartz (1985) model. solution This requires the use of two observed futures prices, for delivery in ized Gaussian residual independent to biased estimator of the future prices, because find:the expectation we become unstable. the variances and the covariances of the estimation method was chosen measure to the model’s database. performance and is present- This theempirical much how know to important is it But 2.3. paragraph in ed a few this question, answer this to by choice. In order results are affected simulations will be run and analyzed. 2.2.2 Measuring the performance thetransformed taking theequations have As logarithms,we in order to use the Kalmancare introduce not be has a taken to to bias simple filter, thatthein case, theinnovations Indeed back results. transforming when with theare calculated logarithms of the futures prices. wards. wards. We choose the first value of the estimation period for observable variables. the non- initialized. We choose a diagonal matrix, and calculate the variances with the first 30 points of the period. estimation TABLE 1. OPTIMAL PARAMETERS, 1998–200113 TABLE 3. THE COMPARISON BETWEEN THE MODEL’S PERFORMANCE ASSOCI- Simple filter Extended filter ATED WITH THE SIMPLE FILTER, WITH Parameters Gradients Parameters Gradients AND WITHOUT CORRECTIONS FOR THE Pull back force: κ 1.59171 −0.003631 1.258133 0.000628 LOGARITHM, 1998–2001 Trend: µ 0.379926 0.000497 0.352014 −0.001178

Spot price’s volatility: σs 0.263525 −0.000448 0.320235 −0.000338 Simple filter Simple filter corrected Long run mean: α 0.252260 −0.012867 0.232547 0.004723 Maturity MPE RMSE MPE RMSE

Convenience yield’s volatility: σc 0.237071 −0.000602 0.288427 −0.001070 1 month −0.060423 2.319730 0.065644 2.314178 Correlation coefficient: ρ 0.938487 −0.001533 0.969985 0.000008 3 months −0.107783 1.989428 0.006419 1.981453 Risk premium: λ 0.177159 0.009272 0.181955 −0.002426 6 months −0.054536 1.715223 0.026010 1.709931 9 months −0.007316 1.567467 0.061301 1.564854 Average −0.057514 1.897962 0.036637 1.892604 TABLE 2. THE MODEL’S PERFORMANCE Unit: USD/b. WITH THE SIMPLE AND THE EXTENDED FILTERS, 1998–2001

Simple filter Extended filter 7 6 Maturity MPE RMSE MPE RMSE 5 1 month −0.060423 2.319730 0.09793 2.294503 4 3 months −0.107783 1.989428 0.057327 2.120727 3 2 6 months −0.054536 1.715223 0.109584 1.877654 1 − 0

9 months 0.007316 1.567467 0.141204 1.695222 ($/b) − − 1 Average 0.057514 1.897962 0.101511 1.997027 −2 Unit: USD/b. −3 −4 −5 −6 which is consistent with Schwartz’s results on other periods. Nevertheless, −7 Schwartz has worked with longer maturities, and shown that the root 8/05/98 mean squared error increases again for deliveries after 15 months. 1 18/07/9818/09/9818/11/9818/01/9918/03/9918/05/9918/07/9918/09/9918/11/9918/01/0018/03/0018/05/0018/07/0018/09/0018/11/0018/01/0118/03/0118/05/0118/07/0118/09/01

To be rigorous, the model’s performance associated with the simple Simple filter Extended filter Kalman filter should be corrected when, as is the case here, the loga- rithms of the estimations are used to obtain the estimations themselves. FIGURE 1: Innovations, 1998–2001 The correction slightly improves the performance, as shown in table 3: the root mean squared errors and the mean pricing errors diminish a bit for almost all the maturities. Finally, the innovations range diminishes with the futures con- Nevertheless with an extended filter, the model’s ability to represent the tracts maturity. Figure 1 illustrates the innovations behavior for the price curve is still quite good for this case. one-month maturity case. It shows that they tend to return to zero for Another way to assess a model’s performance is to see if it is able to the two filters. Nevertheless as the figure illustrates it, even if the reproduce the price dynamics, which can be shown graphically. mean pricing errors are low for the two filters, the pricing errors can Considering this, the first important conclusion is that the model is be rather important at certain specific dates. They reach USD 6 in able to reproduce the price dynamics quite precisely, even if as in 1998— absolute value for the extended filter, which represents 25% of the 2001, there are very large fluctuations in the futures prices. Figure 2 mean futures price for the one-month maturity case. It is a bit lower shows the results obtained for the one-month maturity case. During for the simple filter: USD 5. that period, the crude oil price goes from USD 11 per barrel to USD 37 Consequently we can say that there is clearly an impact of the lin- per barrel! Even if the Kalman filters are often suspected to be unstable, earization introduced in the extended filter: it can be observed from the these results show that they can be used even with extremely volatile optimal parameters, from the performance, and from the innovations. data. The graph also shows that the two filters attenuate the range of

10 Wilmott magazine ^ 11

1

18/09/01

18/07/01

by by (1/2), (1/16), 18/05/01

R 18/03/01

18/01/01

18/11/00

18/09/00

18/07/00

18/05/00

18/03/00 18/01/00

TECHNICAL ARTICLE TECHNICAL

18/11/99

18/09/99

18/07/99

18/05/99

18/03/99

18/01/99 18/11/98

One month futures prices observed/estimated, 1998–2001 18/09/98

Observed futures prices for a one month's maturity Estimated futures prices for a one month's maturity, with a matrix R corresponding to the observations 4) Estimated futures prices for a one month's maturity (simulation matrix lowered and an artificially 18/07/98 Most of the of Most time, the of this terms the correspond matrix variances to Simulations 1 to 4 correspond respectively to the model’s perform-

35,00 30,00 25,00 20,00 15,00 10,00

18/05/98 performance strongly improves: from the initial simulation to the fourththe to simulation initial the from improves: strongly performance one, the root-mean-squared-error is almost divided by two. The compari- fact the illustrates also simulations fourththe and thirdthe between son that there is a limit to the performance amelioration. Figure 3 portrays ($/b) and the covariances of the estimation in database, thenamely, case stud- ied here, the variances and covariances between futures prices for differ- ent maturities. But one must know that the results obtained with Kalman filterthe can be more precise if these terms are (artificially) lowered, as in shown table 4. This table exposes the different results obtained dur- ing with 1998—2001 the extended Kalman filter. This period is especially interesting because the data strongly fluctuates. The performance is achieved, first with the matrix based on the observations, then with the artificially one. lowered ance obtained by multiplying the system’s matrix (1/160), (1/160), and (1/1600). As the matrix elements are lowered, the model’s

FIGURE 3: 2.4 Conclusion The main conclusions of this section are the following: approximation Firstly, introduced the in the extended Kalman filter has clearly an influence on the model’s performance; the extended filter generally leads to less precise estimations than the simple one. Nevertheless, the difference between the two filters is quite low and the extended filter is still acceptable. theSecondly, estimation results are sensitive to the sys- tem’s matrix containing the errors of the measurement equation, and this matrix can be used obtain to more precise results on the estimation filter Kalman extended the in approximationmade the Thirdly, database. is not a real problem until the model starts to become highly nonlinear. the situation. illustrates synthetic example The following the main results of these simulations. 18/09/01

0.0105 1.9970 0.4739 1.8210 0.0244 1.1037 1.0227 0.0172 1.0202 0.1015 18/07/01 18/05/01

Extended filter 18/03/01

18/01/01

18/11/00

18/09/00 18/07/00

0.0383 0.0005 18/05/00

Simple filter 18/03/00 18/01/00

0003 18/11/99 . 0

18/09/99 R

18/07/99

18/05/99 18/03/99

1 month 3 months 6 months 9 months Average 18/01/99 Futures one month

magazine 18/11/98 Estimated and observed futures prices for the one month maturity, Estimated and observed futures prices for . This matrix represents the errors in the measurement equa-

R 18/09/98

MPE 0.0013MPE 0.0935 0.0073MPE 0.1501 0.0152 0.0035 1.6506 MPE 0.0612 0.0131 0.0137 0.0067 0.0415 0.0075 MPE 0.0979 0.0573 0.1096 0.1412

RMSE 1.3812RMSE 1.0950 1.3602 0.8647 1.0919 0.7499 0.8697 0.7591 RMSE 1.8356RMSE 1.5405 1.4759 1.2478 1.1686 2.6602 0.9386 0.8317 RMSE 2.2945 2.1207 1.8777 1.6952 18/07/98

35 30 25 20 15 10

($/b) 18/05/98 Simulation 1 Simulation 2 Simulation 3 Simulation 4 Observations TABLE 4. SIMULATIONS WITH DIFFERENT TABLE 4. SIMULATIONS WITH SYSTEM’S MATRIX tion and the way it is has estimated a strong influence on the empirical results. price fluctuations. This phenomenon can actually be observed for every maturity. 2.3.3.3 Simulations The last results presented in this sesction are simulations. They show how the model’s performance is affected by the choice of the system’s matrix FIGURE 2: 1998–2001 Wilmott Let x follow an Ornstein-Uhlenbeck or mean-reverting diffusion, like initial series and the estimations we obtain with the two versions of the convenience yield in the Schwartz model, and z be an exponential: the Kalman filter when a = 0.8.

z = exp(ax) + 0.4N(0, 1) 3 Models In this section, we apply the different filters to a few stochastic volatil- The advantage of the example chosen is that we can easily simulate ity models including the Heston, the GARCH and the 3/2 models. To x and it is therefore easy to assess the performance of each version of test the performance of each filter, we use five years of S&P500 time- the Kalman filter. It also allows us to make the nonlinearity vary with a. series. We first illustrate the effect of an increasing nonlinearity by making The idea of applying the Kalman Filter to Stochastic Volatility models the parameter a vary from 0.2 to 1. The for the expectation and goes back to Harvey, Ruiz & Shephard (1994), where the authors attempt the variance of the errors have been obtained on a series of 2000 points. to determine the system parameters via a QML Estimator. This approach Table 5 shows clearly that when a is small, the bias is small as well. has the obvious advantage of simplicity, however it does not account for The latter increases with a. The estimation of the variance is quite high the nonlinearities and non-Gaussianities of the system. when the values of a are smaller than or equal to 0.4, because the rela- More recently, Pitt & Shephard (Doucet, De Freitas and Gordon (2001)) tive weight of the noise is large. These estimations of the variance also suggested the use of Auxiliary Particle Filters to overcome some of these increase with the nonlinearity. difficulties. An alternative method based upon the Fourier transform has To illustrate the performance of the filters and for a better under- been presented in Bates (2002). standing of the bias, we show on Figure 4 the 200 first points of the 3.1 The State Space Model Let us first present the state-space form of the stochastic volatility models: TABLE 5. PERFORMANCE OF THE 3.1.1 The Let us study the Euler-discretized Heston (1993) Stochastic Volatility TWO FILTERS model in a risk-neutral framework15

EKF Kushner’s NLF √ √ 1 = − + ( − − − ) + −  − a E(X-X*) Var(X-X*) E(X-X*) Var(X-X*) lnSk lnSk 1 rk 1 2 v k 1 t v k 1 tBk 1 (31) 0.2 0.01 0.94 0.02 0.92 √ √ − − 0.4 0.1 0.65 0.06 0.60 v k = vk−1 + (ω − θvk−1)t + ξ vk−1 tZk−1 (32) 0.6 −0.23 0.47 −0.11 0.41 0.8 0.4 0.62 0.14 0.31 where S k is the stock price at time-step k, t the time interval, r k the 1 0.68 1.67 −0.12 0.26 risk-free rate of interest (possibly netted by a dividend-yield), v k the stock variance and B k , Z k two sequences of temporally-uncorrelated Gaussian random-variables with a mutual correlation ρ. The model risk- neutral parameter-set is therefore = (ω,θ,ξ,ρ).

Considering v k as the hidden state and lnSk+1 as the observation, we EKF X k can subtract from both sides of the transition equation

x k = f (xk−1, wk−1), a multiple of the quantity h(xk−1, uk−1) − zk−1 which is equal to zero. This would allow us to eliminate the correlation between the system and the measurement noises. Indeed, if the system equation is √ √ v k = vk−1 + (ω − θvk−1)t + ξ vk−1 tZk−1 − ρξ[lnSk−1 Kushner's NLF √ √ 1 + − − + − − − (r k 2 vk 1)t vk 1 tBk 1 lnSk]

posing for every k

˜ 1 Z k = (Z k − ρB k) FIGURE 4: Real process versus its estimates: Top EKF, bottom Kushner’s NLF 1 − ρ2

12 Wilmott magazine ^ 13 1 t t t    1 2 1 √ 1 t ) ) − − i , we will have will have , we i − ( k p k ( ) k s i  x ˜ ( v k x √ P p 1 2 ) ) 1 2 − 2 1 = p ) ) − i ) = s + k ( 1 2 ) m x , s t ) p − s i t 1 ( , k ( ) 2 2 ) )  − i − x i k ( k ( k ρ ) x x √ 1: ) ˆ x | ( i z 1 ρξ ) k ( − t i ρξ( k , − ( ˜ x 1 2 − = p k 1 2 1 ˜ t 1 x  ( 1 2 v ) ( − i 2  k ( − p √ m − x ) 1 2 k ρ − ξ exp , | ) θ θ i ) v ) k k i s ( i k 1 k ( ( − r = ˜ x k + √ ˜ π x ˜ | ( − x 1 k 1 s ( k S =− 2 3 2 − S 1 q , z = + and standard deviation deviation and standard k ) − 1 2 ( x √ − n p 2 k 1 ξ k p − TECHNICAL ARTICLE TECHNICAL H v − m − p m 1 U k = ln k = ) , ) = z − i z 3 2 ) 1 ) ( k i ) 1 k k ) ( − k s 2 1 i − − − ˜ w ( k x = p , k 1: x W 1 z v ( − = m − , k m n k p 1 ) , r , ρξ i z ) k i ( x − k ( ( = k ( z 1 2 w k ρξ x ) n | r 1 2 − 1 ) p i − ) ( k − ) i n ( k ρξ − ˜ 1 x ω x ) ( − i | p = ( k q ) i k x ) ( − ) + ˜ x i ( k ( 1 + 1 ˜ x p ) ρξ( − | i ( k = k x + z k ( A p = x m and as before we have we and as before need for thewhich provides us with filter what we implementation. and used be could equations update measurement and update time same The with NLF. the UKF or Kushner’s 3.2.2 Particle Filters We could also apply the Particle Filtering algorithm to our problem. and calling Using the same notations as in section 1.3.2 the Normal density with mean as as well and with 3.2.1 Gaussian Filters have will theFor we EKF and as as well 0 2 v / 3 (35) (34) (36) (33) to the to k t 2 1  / − k 3 k RW ˜ Z 1 t 1 t − − = + 1 tB  k k ] k  − ˜ v Z p 1 k  t √ − tB k √ tz 1 1 2 k 1  v  −  v − − k √ p k v √ √ 1 v √ k 1 − √ and v ρξ p k − 2 + ρξ k v p k 1 2 √ ρ is made. t v 2 B 1 2 k ξ ρ + − − v ) − t k + θ − 1 v θ t = 1 with 1 2 ) 16 k k ξ − ) should not affect the results enor- x v − − 1 ξ k 1 2 + 0 k − r 1 2 1 v k r + R v − − − ( p k 1 ρξ θ k v k − + r k 1 k S − − ( k r k − S the filter. k S ω S + uncorrelated [ (ω ρξ lnS k k ln + ˜ tune + − Z = ln 1 , and therefore we do not have a direct control on control direct a have not do we therefore and , 1 lnS ρξ 1 k ω 1 2 − 1 − k + k − − = k v tv + v p k v 1 +  = lnS + = 1 k √ k ρξ k − k v = would would naturally correspond to the Heston (Square-Root) v v + lnS k = 2 to the GARCH diffusion-limitand model, theGARCH to z magazine = / = k 1 we should have a robust system. a robust should have we k 1 k x v z = = i.e. p p The new state transition equation would therefore become would equation transition state The new As we saw in the previous section, the system stability greatly Wilmott and the measurement equation would be would and the measurement equation 3.1.3 Robustness and Stability Robustness 3.1.3 In this state-space formulation, we only need to choose a value for this issue. Nevertheless we could add an independent Normal measure- variance ment noise with a given where 3.1.2 Other Stochastic Volatility Models Volatility Other Stochastic 3.1.2 It is easy to generalize the above state space model to other could replace (32) with approaches. Indeed we volatility stochastic 3.2 The Filters 3.2 The Filters our problem: the apply to Gaussian and the can now Particle Filters We which would allow us to us to allow which would which could be set to the historic-volatility over a period preceding our time-series. Ideally, the choice of model. These models have all been described and analyzed in Lewis (2000). we will have as expected as expected will have we mously, mously, model, where the same choice of state space where the choice of state same depends on the measurement noise. However in this case the system precisely is noise = − = − 3.3 Parameter Estimation and Back-Testing MPEEKF −Heston 3.58207e 05 RMSEEKF −Heston 1.83223e 05

For the Gaussian MLE we will need to minimize MPEEKF −GARCH = 2.78438e − 05 RMSEEKF −GARCH = 1.42428e − 05 = − = − MPEEKF − 3 2.63227e 05 RMSEEKF − 3 1.74760e 05 K 2 2 2 ν k φ(ω,θ,ξ,ρ) = ln(F k) + F k k=1 EKF errors for different SV Models = − ˆ− = − t + t 7e-05 with ν k z k z k and F k H kP k H k U kU k for the EKF. For the Particle MLE, as previously mentioned, we need to maximize Heston GARCH 6e-05 3/2 N Nsims (i) ln w k 5e-05 k=1 i=1

4e-05 By maximizing the above likelihood functions, we will find the optimal ˆ = (ω,ˆ θ,ˆ ξ,ˆ ρ)ˆ parameter-set . This calibration procedure could then be errors used for pricing of derivatives instruments or forecasting volatility. 3e-05 We could perform a back-testing process in the following way. Choosing an original parameter-set 2e-05

∗ = (0.02, 0.5, 0.05, −0.5) 1e-05

and using a Monte-Carlo simulation, we generate an artificial time-series. 0 We take S0 = 1000 USD, v0 = 0.04, r k = 0.027 and t = 1. Note that we 0 200 400 600 800 1000 1200 are taking a large t in order to have meaningful errors. The time-series Days is generated via the above transition equation (32) and the usual log-nor- FIGURE 5: Comparison of EKF Filtering errors for Heston, GARCH and 3/2 mal measurement equation. Models We find the following optimal17 parameter-sets:

ˆ EKF = (0.036335, 0.928632, 0.036008, −1.000000) UKF errors for different SV Models ˆ 0.00012 UKF = (0.033104, 0.848984, 0.033263, −0.983985) ˆ Heston = (0.019357, 0.500021, 0.033354, −0.581794) GARCH EPF 0.0001 ˆ 3/2 UPF = (0.019480, 0.489375, 0.047030, −0.229242)

which show the better performance of the Particle Filters. However, it 8e-05 should be reminded that the Particle Filters are also more computation- intensive than the Gaussian ones. 6e-05 errors

3.4 Application to the S&P500 Index 4e-05 The above filters were applied to five years of S&P500 time-series (1996 to 2001) in Javaheri (2002) and the filtering errors were considered for the Heston model, the GARCH model and the 3/2 model. Daily index close- 2e-05 prices were used for this purpose, and the time interval was set to t = 1/252. The appropriate risk-free rate was applied and was adjusted 0 by the index dividend yield at the time of the measurement. 0 200 400 600 800 1000 1200 Days As in the previous section, the performance could be measured via the MPE and the RMSE. We could also refer to figures 5 to 11 for a visual FIGURE 6: Comparison of UKF Filtering errors for Heston, GARCH and 3/2 interpretation of the performance measurements. Models

14 Wilmott magazine ^ 15 1 1200 1200 EKF EPF UPF UKF EPF EKF UKF UPF 1000 1000 800 800 Days 600 600 Days TECHNICAL ARTICLE TECHNICAL 400 400 Heston model errors for different Filters Heston model errors for GARCH model errors for different Filters Comparison of Filtering errors for the GARCH Model 200 200 Comparison of Filtering errors for the Heston Model Comparison of Filtering errors for the Heston 0 0

8e-05 7e-05 6e-05 5e-05 4e-05 3e-05 2e-05 1e-05

7e-05 6e-05 5e-05 4e-05 3e-05 2e-05 1e-05 errors errors FIGURE 9: FIGURE 10: 1200 1200 3/2 3/2 Heston 1000 Heston GARCH 1000 GARCH 800 800 600 600 Days Days 400 400 EPF errors for different SV Models EPF errors for different UPF errors for different SV Models 200 200 magazine Comparison of EPF Filtering errors for Heston, GARCH and 3/2 Comparison of EPF Filtering errors for Heston, Comparison of UPF Filtering errors for Heston, GARCH and 3/2 0 0 1e-05 5e-06 4e-05 3e-05 2e-05 5e-05 2e-05

1.5e-05 3.5e-05 2.5e-05 5.5e-05 4.5e-05

1.6e-05 1.4e-05 1.2e-05 2.2e-05 1.8e-05 2.8e-05 2.6e-05 2.4e-05

error FIGURE 7: Models errors FIGURE 8: Models Wilmott 3/2 model errors for different Filters As expected, we observe an improvement when Particle Filters are 7e-05 used. What is more, given the tests carried out on the S&P500 data it EKF UKF seems that, despite its vast popularity, the Heston model does not per- 6e-05 EPF form as well as the 3/2 representation. UPF This suggests further research on other existing models such as Jump

5e-05 Diffusion (Merton (1976)), Variance Gamma (Madan, Carr and Chang (1998)) or CGMY (Carr, Geman, Madan and Yor (2002)). Clearly, because of the non-Gaussianity of these models, the Particle Filtering technique 4e-05 will need to be applied to them18. errors Finally it would be instructive to compare the risk-neutral parameter- 3e-05 set obtained from the above time-series based approaches, to the parame- ter-set resulting from a cross-sectional approach using options market 2e-05 prices at a given point in time19. Inconsistent parameters between the two approaches would signal either arbitrage opportunities in the mar- 1e-05 ket, or a misspecification in the tested model20.

0 200 400 600 800 1000 1200 Days 4 Summary In this article, we present an introduction to various filtering algo- FIGURE 11: Comparison of Filtering errors for the 3/2 Model rithms: the Kalman filter, the Extended Filter (EKF), as well as the Unscented Kalman Filter (UKF) similar to Kushner’s Nonlinear filter. We also tackle the subject of Non-Gaussian filters and describe the Particle

MPEUKF −Heston = 3.00000e − 05 RMSEUKF −Heston = 1.91280e − 05 Filtering (PF) algorithm. We then apply the filters to a term structure model of commodity MPEUKF −GARCH = 2.99275e − 05 RMSEUKF −GARCH = 2.58131e − 05 = − = − prices. Our main results are the following: Firstly, the approximation MPEUKF − 3 2.82279e 05 RMSEUKF − 3 1.55777e 05 2 2 introduced in the Extended filter has an influence on the model per- formances. Secondly, the estimation results are sensitive to the system

MPEEPF −Heston = 2.70104e − 05 RMSEEPF −Heston = 1.34534e − 05 matrix containing the errors of the measurement equation. Thirdly, the approximation made in the extended filter is not a real issue until the MPEEPF −GARCH = 2.48733e − 05 RMSEEPF −GARCH = 4.99337e − 06 = − = − model becomes highly nonlinear. In that case, other nonlinear filters MPEEPF − 3 2.26462e 05 RMSEEPF − 3 2.58645e 06 2 2 such as those described in section 1.2 may be used. Lastly, the application of the filters to stochastic volatility models

MPEUPF −Heston = 2.04000e − 05 RMSEUPF −Heston = 2.74818e − 06 shows that the Particle Filters perform better than the Gaussian ones, however they are also more expensive. What is more, given the tests car- MPEUPF −GARCH = 2.63036e − 05 RMSEUPF −GARCH = 8.44030e − 07 = − = − ried out on the S&P500 data it seems that, despite its vast popularity, the MPEUPF − 3 1.73857e 05 RMSEUPF − 3 4.09918e 06 2 2 Heston model does not perform as well as the 3/2 representation. This suggests further research on other existing models such as Jump Two immediate observations can be made: On the one hand the Particle Diffusion, Variance Gamma or CGMY. Clearly, because of the non- Filters have a better performance than the Gaussian ones, which recon- Gaussianity of these models, the Particle Filtering technique will need to firms what one would anticipate. On the other hand for most of the be applied to them. Filters, the 3/2 model seems to outperform the Heston model, which is in line with the findings of Engle & Ishida (2002). Appendix 3.5 Conclusion The Measurement Update equation is Using the Gaussian or Particle Filtering techniques, it is possible to esti- p(z |x )p(x |z − ) mate the stochastic volatility parameters from the underlying asset time- p(x |z ) = k k k 1:k 1 k 1:k | series, in the risk-neutral or real-world context. p(z k z1:k−1)

16 Wilmott magazine ^ 17 1 k 1 k+ The , and = t , ln S* one , 101 k  IEEE such as f(ln , Vol. 6, No. 2 proxy Wiley Inter-Science , Vol. 34, No. 1 Journal of the Royal Englewood Cliffs, Prentice )|) = E[ln(|f(lnS* applicable to the tested +1 k is IEEE Transactions on Signal S but with a volatility of Journal of Econometrics t , ln k S (2001) “Sequential Monte-Carlo Ph.D. Thesis in progress, Ecole des Mines de Review of Financial Studies = ln(|f(ln TECHNICAL ARTICLE TECHNICAL (Editors) k z , Vol. 57, No. 3 , Vol. 45, No. 5 vy processes has recently been done in Barndorff—Nielsen vy processes has recently é and ignoring the stock-drift term, given that and ignoring the stock-drift * the same process as S , Vol. 61, No. 2 k t ln S Springer-Verlag − with S (2001) “Kalman Filtering and Neural Networks” k , Vol. 58, No. 2 , Vol. 75, No. 2 Journal of Financial and Quantitative Analysis +1 Journal of Finance k , Series B, Vol. 64 + k (Editor) , Vol. 50, No. 2 ) = ln S ln v The University of Oxford, The Robotics Research Group Working Paper, New York University and University of California San Diego +1 1 2 k ). We would therefore write ). We would therefore write t  An alternative approach would consist in choosing a volatility An alternative approach + This idea is explored in Aït-Sahalia, Wang and Yared (2001) in a Non-parametric This comparison supposes that the In this section, all optimizations were made via the Direction-Set algorithm as were made via the Direction-Set algorithm In this section, all optimizations L A study on Filtering and Babbs S. H., Nowman K. B. (1999) “Kalman Filtering of Generalized Vasicek Term Babbs S. H., Nowman K. B. (1999) “Kalman Filtering Julier S. J., Uhlmann J.K. (1997) “A New Extension of the Kalman Filter to Nonlinear Arulampalam S., Maskell S., Gordon N., Clapp T. (2002) “A Tutorial on Particle Filters Arulampalam S., Maskell S., Gordon N., Clapp T. (2002) Heston S. (1993) “A Closed-Form Solution for Options with Stochastic Volatility with Ito K., Xiong K. (2000) “Gaussian Filters for Nonlinear Filtering Problems” Javaheri A. (2002) “The Volatility Process” Aït-Sahalia Y., Wang Y., Yared F. (2001) “Do Markets Correctly Price the Barndorff-Nielsen O. E., Shephard N. (2002) “Econometric Analysis of Realized Barndorff-Nielsen O. E., Shephard N. (2002) “Econometric of Latent Affine Processes” Bates D. S. (2002) “Maximum Likelihood Estimation Resource Investments” Brennan M.J., Schwartz E.S. (1985) “Evaluating Natural Anderson B. D. O., Moore J. B. (1979) “Optimal Filtering” Anderson B. D. O., Moore J. B. (1979) “Optimal Filtering” Carr P., Geman H., Madan D., Yor M. (2002) “The Fine Structure of Asset Returns” Doucet A., De Freitas N., Gordon N. Engle R. F., Ishida I. (2002) “The Square-Root, the Affine, and the CEV GARCH Harvey A. C. (1989) “Forecasting, Structural Models and the Kalman Filter” Harvey A. C., Ruiz E., Shephard N. (1994) “Multivariate Stochastic Variance Models” Haykin S. Alizadeh S., Brandt M.W., Diebold F.X. (2002) “Range-Based Estimation of Stochastic Alizadeh S., Brandt M.W., Diebold F.X. (2002) “Range-Based √ , ln S k Processing Structure Models” Paris Systems” Hall for On-line Nonlinear/Non-Gaussian Bayesian Tracking” Applications to Bond and Currency Options” Transactions on Automatic Control and Shephard N. (2002). 16. fashion. 17. model. Probabilities of Movement of the Underlying Asset?” 13. )|)] (2002) noise. See Alizadeh, Brandt and Diebold corresponding to the measurement for details. 14. The precision was set to 1.0e-6. described in Press et al. (1997). 15. o( Models” Volatility and its use in Estimating Stochastic Volatility Statistical Society University of Iowa & NBER Journal of Business Journal of Business Methods in Practice” Models” Cambridge University Press Review of Economic Studies Volatility Models” S = ρ . k = 0.4; ) C k σ x ( p ) k 1 = 0.1; ) − k k ) dx α )) k ) x 1: k 1 ( x z x ( ( p − ( k ) p p h ) ) 1 1: = 0.3; ) k 1 z − 1 S | k − x − k − | σ k 1: k k 1 x z 1: z 1: ( ( − z ( z k ) | p p ) 1 ( k k ) 1: ) 1 k k p − = 0.1; x z x 1 − ) ( ( ( x R k − 1 | t ) µ p p p k ) k 1: 1 − ) ) ) ) 1 1: )) z k z k k k could be written as could be written k − | k z − ( k 1: | k x x x k x ) ). Needless to say, the two notations are x | p z k 1: 1 ( , , x 1 | ( 1: 1 = 0.5; z z ) ( 1 1 k p z k- h − ( k − , | ) k p κ z − − k in order to be able to apply the Girsanov theorem. w k k p k ) k k 1: ( 1: − , 1: t k z z z x 1: 1: p z = z ( ( ( | | 1 k z z x  k k , | | | p p p ) z k- k k k k 1: z ( 1 x ( z z z z z ( 2 1 − ( ( ( ( ( f p k p p p p p = − 1: z k | x = = = = = k z ) exp ( k is proportional to p 1: ) z k | k z | x k magazine ( x p ( p = 0.1. λ In the whole empirical study, optimizations have been made with a precision of 1e-5 For the two filters, the parameters values retained to initiate the optimization are the The MPE and the RMSE presented here can not be directly compared with those pro- The same exact methodology could be used in a non risk-neutral setting. We are sup- In that model, interest rates are supposed to be constant. This means that N = 4 in the case we study. Indeed using the Bayes rule and the Markov property, we have have we rule property, and theIndeed using the Markov Bayes . The square-root can be computed via a Cholesky factorization combined with a . This analogy between Kushner’s NLF and the UKF has been studied in Ito and Xiong . A description of the Gaussian Quadrature could be found in Press et al. (1997). . See Harvey (1989) for an explanation on Smoothing. . See for instance Arulampalam et al. (2002) or Van der Merwe, et al. (2000) for details. . Some prefer to write 0.5; posed by Schwartz in 1997 because his computations were done using the logarithm of the futures prices. 12. equivalent. 2 Singular Value Decomposition. See Press et al. (1997) for details. 3 (2000). 4 5 6 7. posing we have a small time-step 1 8. 9. 10. on the gradients. 11. same. These values are the following: FOOTNOTES & REFERENCES Wilmott under the hypothesis of additive measurement noises. measurement under the of additive hypothesis Note that Note and corresponds to the Likelihood Function for the time-step the for the Function time-step and corresponds Likelihood to where the denominator where the denominator TECHNICAL ARTICLE 1

Kushner H. J. (1967) “Approximations to Optimal Nonlinear Filters” IEEE Transactions on Automatic Control, Vol. 12 Kushner H. J., Budhiraja A. S. (2000) “A Nonlinear Filtering Algorithm based on an Approximation of the Conditional Distribution” IEEE Transactions on Automatic Control, Vol. 45, No. 3 Lautier D. (2000) “La Structure par Terme des Prix des Commodités : Analyse Théorique et Applications au Marché Pétrolier” Thése de Doctorat, Université Paris IX Lautier D., Galli A. (2000) “Un Mode`le de Structure par Terme des Prix des Matie`res Premie`res avec Comportement Asymétrique du Rendement d’Opportunité” Finéco, Vol. 10 Lewis A. (2000) “Option Valuation under Stochastic Volatility” Finance Press Madan D., Carr P., Chang E.C. (1998) “The Variance and Option Pricing” European Finance Review, Vol. 2, No. 1 Merton R. C. (1976) “Option Pricing when the Underlying Stock Returns are Discontinuous” Journal of , No. 3 Pennacchi G. G. (1991) “Identifying the Dynamics of Real Interest Rates and Inflation : Evidence Using Survey Data” The Review of Financial Studies, Vol. 4, No. 1 Press W. H., Teukolsky S.A., Vetterling W.T., Flannery B. P. (1997) “Numerical Recipes in C: The Art of Scientific Computing” Cambridge University Press, 2nd Edition Schwartz E. S. (1997) “The Stochastic Behavior of Commodity Prices : Implications for Valuation and Hedging” The Journal of Finance, Vol. 52, No. 3 Van der Merwe R., Doucet A., de Freitas N., Wan E. (2000) “The Unscented Particle Filter” Oregon Graduate Institute, Cambridge University and UC Berkeley Welch G., Bishop G. (2002) “An Introduction to the Kalman Filter” Department of Computer Science, University of North Carolina at Chapel Hill Wells C. (1996) “The Kalman Filter in Finance” Advanced Studies in Theoretical and Applied Econometrics, Kluwer Academic Publishers, Vol. 32 ACKNOWLEDGEMENT

The second part of the study was realized with the support of the French Institute for Energy (IFE). The data were provided by TotalFinaElf.

W

18 Wilmott magazine