Econometric analysis of multivariate realised QML: estimation of the covariation of equity prices under asynchronous trading

Neil Shephard, Nuffield College, New Road, Oxford OX1 1NF, UK; Department of Economics, University of Oxford; [email protected]. Dacheng Xiu, Booth School of Business, University of Chicago, 5807 S. Woodlawn Ave, Chicago, IL 60637, USA; [email protected].

First full draft: February 2012 This version: 22nd October 2012

Abstract

Estimating the covariance between assets using high frequency data is challenging due to market microstructure effects and asynchronous trading. In this paper we develop a multivariate realised quasi-likelihood (QML) approach, carrying out inference as if the observations arise from an asynchronously observed vector scaled Brownian model observed with error. Under stochastic volatility the resulting realised QML estimator is positive semi-definite, uses all available data, is consistent and asymptotically mixed normal. The quasi-likelihood is computed using a Kalman filter and optimised using a relatively simple EM algorithm which scales well with the number of assets. We derive the theoretical properties of the estimator and prove that it achieves the efficient rate of convergence. We show how to make it obtain the non-parametric efficiency bound for this problem. The estimator is also analysed using Monte Carlo methods and applied to equity data with varying levels of liquidity.

Keywords: EM algorithm; Kalman filter; market microstructure noise; non-synchronous data; portfolio optimisation; quadratic variation; quasi-likelihood; semimartingale; volatility. JEL codes: C01; C14; C58; D53; D81

1 Introduction

1.1 Core message

The strength and stability of the dependence between asset returns is crucial in many areas of financial economics. Here we propose an innovative, theoretically sound, efficient and convenient method for estimating this dependence using high frequency financial data. We explore the prop- erties of the methods theoretically, in simulation experiments and empirically. Our realised quasi maximum likelihood (QML) estimator of the covariance matrix of asset prices is positive semidefinite and deals with both market microstructure effects such as bid/ask

bounce and, crucially, non-synchronous recording of data (the so-called Epps (1979) effect). Positive semidefiniteness allows us to define a coherent estimator of correlations and betas, objects of importance in financial economics. We derive the theoretical properties of our estimator and prove that it achieves the efficient rate of convergence. We show how to make it achieve the non-parametric efficiency bound for this problem and demonstrate theoretically the effect of asynchronous trading. The estimator is also analysed using Monte Carlo methods and applied to equity data in a high dimensional case. Our results show our methods deliver particularly strong gains over existing methods for unbalanced data: that is, where some assets trade slowly while others are more frequently available.

1.2 Quasi-likelihood context

Our approach can be thought of as the natural integration of three influential econometric estimators, completing a line of research and opening up many more areas of development and application. The first is the realised variance estimator, which is the QML estimator of the quadratic variation of a univariate semimartingale and was econometrically formalised by Andersen, Bollerslev, Diebold, and Labys (2001) and Barndorff-Nielsen and Shephard (2002). There the quasi-likelihood is generated by assuming the log-price is Brownian motion. Multivariate versions of these estimators were developed and applied in Andersen, Bollerslev, Diebold, and Labys (2003) and Barndorff-Nielsen and Shephard (2004). These estimators are called realised covariances and have been widely applied. The second is the Hayashi and Yoshida (2005) estimator, which is the QML estimator for the corresponding multivariate problem where there is irregularly spaced non-synchronous data, though sampling intervals are not incorporated into their estimator. Again the underlying log-price is modelled as a rotated vector Brownian motion. Neither of the above estimators deals with noise. Xiu (2010) studied the univariate QML estimator where the Brownian motion is observed with Gaussian noise. He called this the "realised QML estimator" and showed this was an effective estimator for semimartingales cloaked in non-Gaussian noise. Moreover, the QML estimator is asymptotically equivalent to the realised kernel of Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008) with a suboptimal bandwidth. Note also the related Zhou (1996), Zhou (1998), Andersen, Bollerslev, Diebold, and Ebens (2001) and Hansen, Large, and Lunde (2008). Our paper moves beyond this work to produce a distinctive and empirically important result. It proposes and analyses in detail the multivariate realised QML estimator which deals with irregularly spaced non-synchronous noisy multivariate data.
We develop methods to allow it to be easily implemented and develop the corresponding asymptotic theory under realistic assumptions. We show this estimator has a number of optimal properties.

1.3 Alternative approaches

A number of authors have approached this sophisticated multivariate problem using a variety of techniques. Here we discuss them to place our work in a better context. As we said above, the first generation of multivariate estimators, realised covariances, were based upon moderately high frequency data. Introduced by Andersen, Bollerslev, Diebold, and Labys (2003) and Barndorff-Nielsen and Shephard (2004), these realised covariances use synchronised data sampled sufficiently sparsely that they could roughly ignore the effect of noise and non-synchronous trading. Related is Hayashi and Yoshida (2005), who tried to overcome non-synchronous trading but did not deal with any aspects of noise (see also Voev and Lunde (2007)). More recently there has been an attempt to use the finest grain of data, where noise and non-synchronous trading become important issues. There are five existing methods which have been proposed. Two deliver positive semi-definite estimators, so allowing correlations and betas to be coherently computed. They are the multivariate realised kernel of Barndorff-Nielsen, Hansen, Lunde, and Shephard (2011) and the non-bias-corrected preaveraging estimator of Christensen, Kinnebrock, and Podolskij (2010). Both use a synchronisation device called refresh time sampling. Neither converges at the optimal rate. Two other estimators have been suggested which rely on polarisation of quadratic variation. Each has the disadvantage that they are not guaranteed to be positive semi-definite, so ruling out their direct use for correlations and betas. The papers are Aït-Sahalia, Fan, and Xiu (2010) and Zhang (2011). The bias-corrected Christensen, Kinnebrock, and Podolskij (2010) is also not necessarily positive semi-definite. Further, none of them achieve the non-parametric efficiency bound.
In a paper written concurrently with this one, Park and Linton (2012a) develop a Fourier based estimator of covariances, which extends the multivariate work of Mancino and Sanfelici (2009) and Sanfelici and Mancino (2008). Finally, we note that related univariate work on ameliorating the effect of noise includes Zhou (1996), Zhou (1998), Hansen and Lunde (2006), Zhang, Mykland, and Aït-Sahalia (2005), Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008), Jacod, Li, Mykland, Podolskij, and Vetter (2009), Andersen, Bollerslev, Diebold, and Labys (2000), Bandi and Russell (2008), Kalnina and Linton (2008), Li and Mykland (2007), Gloter and Jacod (2001a), Gloter and Jacod (2001b), Kunitomo and Sato (2009), Reiss (2011), Large (2011), Malliavin and Mancino (2002), Mancino and Sanfelici (2008), Malliavin and Mancino (2009), Aït-Sahalia, Jacod, and Li (2012) and Hansen and Horel (2009). Surveys include, for example, McAleer and Medeiros (2008), Aït-Sahalia and Mykland (2009), Park and Linton (2012b) and Aït-Sahalia and Xiu (2012).

1.4 More details on our paper

Here we use a QML estimator in the multivariate case where we model efficient prices as correlated Brownian motion observed at irregularly spaced and asynchronously recorded datapoints. Each observation is cloaked in noise. We provide an asymptotic theory which shows how this approach deals with general continuous semimartingales observed with noise irregularly sampled in time. The above approach can be implemented computationally efficiently using Kalman filtering. The optimisation of the likelihood is most easily carried out using an EM algorithm, which is implemented using a smoothing algorithm. The resulting estimator of the integrated covariance is positive semidefinite. In practice it can be computed rapidly, even in significant dimensions.

1.5 Some particularly noteworthy papers

There are a group of papers which are closest to our approach. Aït-Sahalia, Fan, and Xiu (2010) apply the univariate estimator of Xiu (2010) to the multivariate case using polarisation. That is, they estimate the covariance between $x_1$ and $x_2$ by applying univariate methods to estimate $\operatorname{Var}(x_1 + x_2)$ and $\operatorname{Var}(x_1 - x_2)$ and then looking at a scaled difference of these two estimates. The implied covariance matrix is not guaranteed to be positive semi-definite. During our work on this paper we were sent a copy of Corsi, Peluso, and Audrino (2012) in January 2012, which was carried out independently and concurrently with our work. This paper is distinct in a number of ways, most notably we have a fully developed econometric theory for the method under general conditions and our computations are somewhat different. However, the overarching theme is the same: dealing with the multivariate case using a missing value approach (see the related Elerian, Chib, and Shephard (2001), Roberts and Stramer (2001) and Papaspiliopoulos and Roberts (2012) for discretely observed non-linear diffusions) based on Brownian motion observed with error. A variant of this paper, also dated January 2012, is Peluso, Corsi, and Mira (2012), who carry out a related exercise to Corsi, Peluso, and Audrino (2012) but this time using Bayesian techniques. They assume the efficient price is Markovian and have no limiting theory for their quasi-likelihood based approach. Related to these papers is the earlier more informal univariate analysis of Owens and Steigerwald (2006) and the multivariate analysis of Cartea and Karyampas (2011). In late April 2012 we also learnt of Liu and Tang (2012). They study a realised QML estimator of a multivariate exactly synchronised dataset. They propose using Refresh Time type devices to

achieve exact synchronicity. Under exact synchronicity their theoretical development is significant and independently generates the results in one of the theorems in this paper. We will spell out the precise theoretical overlap with our paper later. To be explicit, they do not deal with asynchronous trading, which is the major contribution of our paper.

1.6 Structure of the paper

The structure of our paper is as follows. In Section 2 we define the model which generates the quasi- likelihood and more generally establish our notation. We also define our multivariate estimator. In Section 3 we derive the asymptotic theory of our estimator under some rather general conditions. In Section 4 we extend the core results in various important directions. In Section 5 we report on some Monte Carlo experiments we have conducted to assess the finite sample performance of our approach. In Section 6 we provide results from empirical studies, where the performance of the estimator is evaluated with a variety of equity prices. In Section 7 we draw our conclusions. The paper finishes with a lengthy appendix which contains the proofs of various theorems given in the paper, and a detailed online appendix with more empirical analysis.

2 Models

2.1 Notation

We consider a $d$-dimensional log-price process $x = (x_1, \ldots, x_d)'$. These prices are observed irregularly and non-synchronously over the interval $[0, T]$, where $T$ is fixed and often thought of as a single day. These observations could be trades or quote updates. Throughout we will refer to them as trades. We write the union of all times of trades as

$t_i$, $i = 1, 2, \ldots, n$, where we have ordered the times so that $0 \le t_1 < \ldots < t_i < \ldots < t_n \le T$. Note that the times $t_i$ must be distinct. Price updates can occur exactly simultaneously, a feature dealt with next.

Associated with each $t_i$ is an asset selection matrix $Z_i$. Let the number of assets which trade at time $t_i$ be $d_i$, so $1 \le d_i \le d$. Then $Z_i$ is $d_i \times d$, full of zeros and ones, where each row sums exactly to one. A unit element in column $k$ of $Z_i$ shows that the $k$-th asset traded at time $t_i$.
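As a concrete illustration of this bookkeeping (a sketch of our own, not code from the paper; the function name is hypothetical), $Z_i$ can be built directly from the list of assets trading at $t_i$:

```python
import numpy as np

def selection_matrix(traded, d):
    """Build the d_i x d selection matrix Z_i from the (0-indexed)
    list of the assets that trade at time t_i."""
    Z = np.zeros((len(traded), d))
    for row, asset in enumerate(traded):
        Z[row, asset] = 1.0   # each row sums to exactly one
    return Z

# of d = 3 assets, only assets 0 and 2 trade at this time stamp
Z = selection_matrix([0, 2], d=3)
y = np.array([10.0, 20.0, 30.0])   # latent log-price vector
x = Z @ y                          # the two observed prices
```

Premultiplying by $Z_i$ simply picks out the traded prices, which is all the quasi-likelihood below requires.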

2.2 Efficient price

$x$ is assumed to be driven by $y$, the efficient log-price, abstracting from market microstructure effects. The efficient price is modelled as a Brownian semimartingale defined on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P)$,

$$y(t) = \int_0^t \mu(u)\,\mathrm{d}u + \int_0^t \sigma(u)\,\mathrm{d}W(u), \qquad (1)$$

where $\mu$ is a vector of elements which are predictable locally bounded drifts, $\sigma$ is a càdlàg volatility matrix process and $W$ is a vector of independent Brownian motions. For reviews of the econometrics of this type of process see, for example, Ghysels, Harvey, and Renault (1996). Then the ex-post covariation is

$$[y, y]_T = \int_0^T \Sigma(u)\,\mathrm{d}u, \quad \text{where } \Sigma = \sigma \sigma',$$

where

$$[y, y]_T = \operatorname*{plim}_{n \to \infty} \sum_{j=1}^{n} \left\{ y(\tau_j) - y(\tau_{j-1}) \right\} \left\{ y(\tau_j) - y(\tau_{j-1}) \right\}'$$

(e.g. Protter (2004, p. 66–77) and Jacod and Shiryaev (2003, p. 51)) for any sequence of deterministic synchronized partitions $0 = \tau_0 < \tau_1 < \ldots < \tau_n = T$ with $\sup_j \{\tau_{j+1} - \tau_j\} \to 0$ as $n \to \infty$. This is the quadratic variation of $y$. Our interest is in estimating $[y, y]_T$ using $x$. Throughout we will assume that $y$ and the random times of trades $\{t_i, Z_i\}$ are stochastically independent. This is a strong assumption and commonly used in the literature (but note the discussion in, for example, Engle and Russell (1998) and Li, Mykland, Renault, Zhang, and Zheng (2009)). This assumption means we can make our inference conditional on $\{t_i, Z_i\}$ and so regard these times of trades as fixed. Throughout we assume that we see a blurred version of $y$, with our data being

$$x_i = Z_i y(t_i) + Z_i \varepsilon_i, \qquad i = 1, 2, \ldots, n,$$

where $\varepsilon_i$ is a vector of potential market microstructure effects. We will assume $E(\varepsilon_i) = 0$ and write $\operatorname{Cov}(\varepsilon_i) = \Lambda$, where $\Lambda$ is diagonal.¹ General time series discussions of missing data include Harvey (1989, Ch. 6.4), Durbin and Koopman (2001, Ch. 2.7) and Ljung (1989).

2.3 A Gaussian quasi-likelihood

We proxy the Brownian semimartingale by Brownian motion, which is non-synchronously observed. This will be used to generate a quasi-likelihood. We model

$$y(t) = \sigma W(t).$$

¹Corsi, Peluso, and Audrino (2012) use a slightly different approach. They update at time points $Ti/n$ whether there is new data or not. They used a linear Gaussian state space model $x_i = Z_i y(Ti/n) + \varepsilon_i$, where the selection matrix $Z_i$ is always $d \times d$, but some rows are entirely made up of zeros if a price is not available at that particular time. If a price is entirely missing, the input for their Kalman innovations $v_i$ is set to zero.

Then writing $\Sigma = \sigma \sigma'$, we have that

$$y(t_i) - y(t_{i-1}) \sim N(0, \Sigma (t_i - t_{i-1})),$$

while all the non-overlapping innovations are independent. Throughout we will write

$$u_i = y(t_i) - y(t_{i-1}), \qquad \Delta_i^n = t_i - t_{i-1}.$$

Of course $\Delta_i^n > 0$ is a scalar. At this point we assume that

$$\varepsilon_i \overset{\text{iid}}{\sim} N(0, \Lambda),$$

where $\Lambda$ is diagonal. Then we can think of the time series of observations $x_{1:n} = (x_1, \ldots, x_n)'$ as a Gaussian state space model. A discussion of the corresponding literature is available in, for example, Harvey (1989), West and Harrison (1989) and Durbin and Koopman (2001).

2.4 ML estimation via EM algorithm

Our goal is to develop positive semidefinite estimators of $\Sigma$, noting that for us $\Lambda$ is a nuisance. We would like our methods to work in quite high dimensions and so the EM approach to maximising the log-likelihood function is attractive. EM algorithms are discussed in, for example, Tanner (1996) and Durbin and Koopman (2001, Ch. 7.3.4). Writing $e_i = x_i - Z_i y(t_i)$, $u_i = y(t_i) - y(t_{i-1})$ and $y_{1:n} = (y_1, \ldots, y_n)'$, and assuming $y_1 \sim N(\hat{y}_1, P_1)$ which is independent of $(\Sigma, \Lambda)$, the complete log-likelihood is of the form

$$\log f(x_{1:n} | y_{1:n}; \Lambda) + \log f(y_{1:n}; \Sigma) = c - \frac{1}{2} \sum_{i=1}^{n} \log \left| Z_i \Lambda Z_i' \right| - \frac{1}{2} \sum_{i=1}^{n} e_i' \left( Z_i \Lambda Z_i' \right)^{-1} e_i - \frac{1}{2} \sum_{i=2}^{n} \log |\Sigma| - \frac{1}{2} \sum_{i=2}^{n} \frac{1}{\Delta_i^n} u_i' \Sigma^{-1} u_i.$$

Then the EM algorithm works with the

$$E\left[ \left\{ \log f(x_{1:n} | y_{1:n}; \Lambda) + \log f(y_{1:n}; \Sigma) \right\} \mid x_{1:n}; \Lambda, \Sigma \right] = c - \frac{1}{2} \sum_{i=1}^{n} \log \left| Z_i \Lambda Z_i' \right| - \frac{1}{2} \sum_{i=1}^{n} E\left\{ e_i' \left( Z_i \Lambda Z_i' \right)^{-1} e_i \mid x_{1:n}; \Lambda, \Sigma \right\} - \frac{1}{2} \sum_{i=2}^{n} \log |\Sigma| - \frac{1}{2} \sum_{i=2}^{n} \frac{1}{\Delta_i^n} E\left( u_i' \Sigma^{-1} u_i \mid x_{1:n}; \Lambda, \Sigma \right).$$

Writing $\hat{e}_{i|n} = E(e_i | x_{1:n})$ and $D_{i|n} = \operatorname{Mse}(e_i | x_{1:n})$, then

$$E\left\{ e_i' \left( Z_i \Lambda Z_i' \right)^{-1} e_i \mid x_{1:n} \right\} = \operatorname{tr}\left\{ \left( Z_i \Lambda Z_i' \right)^{-1} E\left( e_i e_i' | x_{1:n} \right) \right\} = \operatorname{tr}\left[ \left( Z_i \Lambda Z_i' \right)^{-1} \left\{ \hat{e}_{i|n} \hat{e}_{i|n}' + D_{i|n} \right\} \right],$$

and, writing $\hat{u}_{i|n} = E(u_i | x_{1:n})$ and $N_{i|n} = \operatorname{Mse}(u_i | x_{1:n})$, then

$$E\left( u_i' \Sigma^{-1} u_i | x_{1:n} \right) = \operatorname{tr}\left\{ \Sigma^{-1} E\left( u_i u_i' | x_{1:n} \right) \right\} = \operatorname{tr}\left[ \Sigma^{-1} \left\{ \hat{u}_{i|n} \hat{u}_{i|n}' + N_{i|n} \right\} \right].$$

Then the EM update is

$$\hat{\Sigma} = \frac{1}{n-1} \sum_{i=2}^{n} \frac{1}{\Delta_i^n} \left\{ \hat{u}_{i|n} \hat{u}_{i|n}' + N_{i|n} \right\}, \qquad \operatorname{diag} \hat{\Lambda} = \left( \sum_{i=1}^{n} Z_i' Z_i \right)^{-1} \operatorname{diag}\left( \sum_{i=1}^{n} Z_i' \left\{ \hat{e}_{i|n} \hat{e}_{i|n}' + D_{i|n} \right\} Z_i \right).$$

Iterating these updates, the sequence of $(\hat{\Sigma}, \hat{\Lambda})$ converges to a maximum in the likelihood function.
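To make the mechanics concrete, here is a minimal univariate sketch ($d = 1$, $Z_i = 1$, equal spacing) of one EM iteration, built from a Kalman filter and the disturbance smoother recursions recalled in the next subsection. This is our own illustration, not the authors' code, and all variable names are ours; by the EM property each iteration cannot decrease the quasi log-likelihood.

```python
import numpy as np

def loglik_and_filter(x, Sig, Lam, delta, y1=0.0, P1=1.0):
    """Kalman filter for the univariate local level model; returns the
    quasi log-likelihood and the quantities the smoother needs."""
    n = len(x)
    v, F, K, L = (np.zeros(n) for _ in range(4))
    yhat, P = y1, P1
    ll = -0.5 * n * np.log(2.0 * np.pi)
    for i in range(n):
        v[i] = x[i] - yhat            # innovation
        F[i] = P + Lam                # innovation variance
        K[i] = P / F[i]
        L[i] = 1.0 - K[i]
        ll -= 0.5 * (np.log(F[i]) + v[i] ** 2 / F[i])
        yhat = yhat + K[i] * v[i]
        P = P * L[i] + delta * Sig    # P_{i+1} = P_i L_i' + Delta * Sigma
    return ll, v, F, K, L

def em_step(x, Sig, Lam, delta):
    """One EM update of (Sigma, Lambda) via the disturbance smoother."""
    n = len(x)
    ll, v, F, K, L = loglik_and_filter(x, Sig, Lam, delta)
    r, M = np.zeros(n + 1), np.zeros(n + 1)        # r_n = 0, M_n = 0
    e_sm, D_sm = np.zeros(n), np.zeros(n)
    for i in range(n - 1, -1, -1):                 # backwards recursion
        e_sm[i] = Lam * (v[i] / F[i] - K[i] * r[i + 1])
        D_sm[i] = Lam - Lam ** 2 * (1.0 / F[i] + K[i] ** 2 * M[i + 1])
        r[i] = v[i] / F[i] + L[i] * r[i + 1]
        M[i] = 1.0 / F[i] + L[i] ** 2 * M[i + 1]
    dSig = delta * Sig
    u_sm = dSig * r[1:n]                           # smoothed increments
    N_sm = dSig - dSig ** 2 * M[1:n]
    Sig_new = np.sum(u_sm ** 2 + N_sm) / (delta * (n - 1))
    Lam_new = np.mean(e_sm ** 2 + D_sm)
    return Sig_new, Lam_new, ll

# simulate a noisy scaled Brownian motion, then take one EM step
rng = np.random.default_rng(0)
n, T, Sig0, Lam0 = 500, 1.0, 0.5, 1e-3
delta = T / n
y = np.cumsum(np.sqrt(delta * Sig0) * rng.standard_normal(n))
x = y + np.sqrt(Lam0) * rng.standard_normal(n)
Sig1, Lam1, ll_old = em_step(x, 0.2, 5e-3, delta)  # deliberately poor start
ll_new = loglik_and_filter(x, Sig1, Lam1, delta)[0]
```

The multivariate version replaces the scalar operations by the matrix recursions below; the sparsity of $L_i$ is what keeps the cost manageable in large dimensions.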

2.5 Recalling the disturbance smoother

Computing $\hat{e}_{i|n}$, $\hat{u}_{i|n}$, $D_{i|n}$ and $N_{i|n}$ is routine and rapid, if rather tedious to write down. It is carried out computationally efficiently using the "disturbance smoother." The following subsection is entirely computational and can be skipped on first reading without loss. The smoother starts by running the Kalman filter (e.g. Durbin and Koopman (2001, p. 67)) forward in time $i = 1, 2, \ldots, n$ through the data. In our case it takes on the form

$$v_i = x_i - Z_i \hat{y}_i, \quad F_i = Z_i (P_i + \Lambda) Z_i', \quad K_i = P_i Z_i' F_i^{-1}, \quad L_i = I - K_i Z_i,$$

then $\hat{y}_{i+1} = \hat{y}_i + K_i v_i$ and $P_{i+1} = P_i L_i' + \Delta_{i+1}^n \Sigma$. Here $\hat{y}_i = E(y_i | x_{1:i-1})$ and $F_i = \operatorname{Cov}(x_i | x_{1:i-1})$. These recursions need some initial conditions $\hat{y}_1$ and $P_1$. Throughout we will assume their choice does not depend upon $\Sigma$ or $\Lambda$. A typical selection for $\hat{y}_1$ is the opening auction price, whereas an alternative is to use a diffuse prior. Here $v_i$ is $d_i \times 1$, $F_i$ is $d_i \times d_i$, $K_i$ is $d \times d_i$, $\hat{y}_{i+1|i}$ is $d \times 1$ and $P_{i+1}$ and $L_i$ are $d \times d$. Note that for large $d$, the update for $P_{i+1}$ is the most expensive, but it is highly sparse as $L_i$ is sparse. Each iteration of the EM algorithm will lead to a non-negative change in the quasi log-likelihood

$$\log f(x_{1:n}; \Lambda, \Sigma) = c - \frac{1}{2} \sum_{i=1}^{n} \log |F_i| - \frac{1}{2} \sum_{i=1}^{n} v_i' F_i^{-1} v_i.$$

The disturbance smoother (e.g. Durbin and Koopman (2001, p. 76)) is run backwards $i = n, n-1, \ldots, 1$ through the data. It takes the form, writing $H_i = Z_i \Lambda Z_i'$, a $d_i \times d_i$ matrix,

$$\hat{e}_{i|n} = H_i (F_i^{-1} v_i - K_i' r_i), \quad D_{i|n} = H_i - H_i (F_i^{-1} + K_i' M_i K_i) H_i, \quad \hat{u}_{i|n} = \Delta_i^n \Sigma r_{i-1}, \quad N_{i|n} = \Delta_i^n \Sigma - (\Delta_i^n)^2 \Sigma M_{i-1} \Sigma,$$

where we recursively compute $r_{i-1} = Z_i' F_i^{-1} v_i + L_i' r_i$ and $M_{i-1} = Z_i' F_i^{-1} Z_i + L_i' M_i L_i$, starting out with $r_n = 0$, $M_n = 0$. Here $\hat{e}_{i|n}$ is $d_i \times 1$ and $D_{i|n}$ is $d_i \times d_i$, while $\hat{u}_{i|n}$ and $r_i$ are $d \times 1$, and $N_{i|n}$ and $M_i$ are $d \times d$. Notice again the updates for $M_i$ are highly sparse.

3 Econometric theory

In this section, we develop the asymptotic theory for the bivariate case, as the general multivariate case can be derived similarly. To get to the heart of the issues our analysis follows four steps. First we look at the benchmark bivariate ML estimator case where the volatility matrix is fixed and there are equidistant observations. Secondly we show how those results change when the noise is non-Gaussian and we have stochastic volatility effects, but still have equidistant observations.

8 Thirdly and more realistically we discuss the impact of having unequally spaced data which is wrongly synchronised in the quasi-likelihood. Finally we discuss the impact of non-synchronised data on a fully non-synchronised quasi-likelihood.

3.1 Step one: benchmark bivariate MLE

3.1.1 The model

We start with the constant covariance matrix case with equidistant observations, which means that

$Z_i = I_2$ and $t_i = Ti/n$. This means that we have synchronised trading. We also assume the market microstructure effects are independent and initially normal. Then we observe returns

$$r_{j,i} = x_{j,i} - x_{j,i-1}, \qquad i = 1, 2, \ldots, n, \quad j = 1, 2,$$

with, we assume,

$$x_i = y(i/n) + \varepsilon_i, \quad i = 0, 1, 2, \ldots, n, \qquad y(i/n) = y((i-1)/n) + \sqrt{T/n}\, u_i, \qquad (2)$$

where

$$\begin{pmatrix} \varepsilon_i \\ u_i \end{pmatrix} \overset{\text{i.i.d.}}{\sim} N\left( 0, \begin{pmatrix} \Lambda & 0 \\ 0 & \Sigma \end{pmatrix} \right), \qquad \Lambda = \begin{pmatrix} \Lambda_{11} & 0 \\ 0 & \Lambda_{22} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12} & \Sigma_{22} \end{pmatrix}. \qquad (3)$$

In discrete time $x$ would be called a bivariate "local level model" (e.g. Durbin and Koopman (2001, Ch. 2)). Suppose the observed returns are collected as $r = (r_{1,1}, r_{1,2}, \ldots, r_{1,n}, r_{2,1}, \ldots, r_{2,n})'$, then the likelihood can be rewritten as

$$L = -n \log(2\pi) - \frac{1}{2} \log(\det \Omega) - \frac{1}{2} r' \Omega^{-1} r, \qquad (4)$$

where $\Omega = \Delta \Sigma \otimes I_n + \Lambda \otimes J_n$. Here $\otimes$ denotes the Kronecker product and $J_n$ is an $n \times n$ matrix

$$J_n = \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \ddots & \vdots \\ 0 & -1 & 2 & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{pmatrix}. \qquad (5)$$

The likelihood (4) is tractable as we know the eigenvalues and eigenvectors of $J_n$ and so of $\Omega$.
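The tractability rests on $J_n$ being a tridiagonal Toeplitz matrix whose spectrum is known in closed form, $\lambda_k = 4 \sin^2\{k\pi/(2(n+1))\}$, $k = 1, \ldots, n$. A quick numerical check (our own sketch, using numpy, not code from the paper):

```python
import numpy as np

n = 60
# J_n from (5): 2 on the diagonal, -1 on the two off-diagonals
J = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

# closed-form spectrum of this tridiagonal Toeplitz matrix
k = np.arange(1, n + 1)
lam_theory = 4 * np.sin(k * np.pi / (2 * (n + 1))) ** 2

lam_numeric = np.sort(np.linalg.eigvalsh(J))
```

Rotating each return series by the (sine) eigenvectors of $J_n$ reduces the $2n \times 2n$ problem to $n$ small $2 \times 2$ problems, which is why the Gaussian likelihood can be evaluated so cheaply.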

3.1.2 The asymptotic theory

Before we give the bivariate case we recall the univariate one

$$n^{1/4} \left( \hat{\Sigma}_{11} - \Sigma_{11} \right) \xrightarrow{L} N\left( 0, \, 8 \Lambda_{11}^{1/2} \Sigma_{11}^{3/2} T^{-1/2} \right),$$

see, for example, Stein (1987), Gloter and Jacod (2001a), Gloter and Jacod (2001b), Aït-Sahalia, Mykland, and Zhang (2005) and Xiu (2010). This result establishes the optimal rate, $n^{1/4}$, that can be obtained with noisy data. We now go on to the Gaussian bivariate case.

Theorem 1 (Bivariate MLE) Assume the model (2)-(3) is true. Then the ML estimators $\hat{\Sigma}$ and $\hat{\Lambda}$ satisfy the central limit theorem as $n \to \infty$

$$n^{1/4} \begin{pmatrix} \hat{\Sigma}_{11} - \Sigma_{11} \\ \hat{\Sigma}_{12} - \Sigma_{12} \\ \hat{\Sigma}_{22} - \Sigma_{22} \end{pmatrix} \xrightarrow{L} N(0, \Pi).$$

Here, for the $3 \times 1$ vector $\Sigma_\theta = \operatorname{vech}(\Sigma) = (\Sigma_{1,1}, \Sigma_{1,2}, \Sigma_{2,2})'$, the $\Pi$ matrix is such that $\Pi^{-1} = \frac{\partial \Psi_\theta}{\partial \Sigma_\theta'}$ with

$$\frac{\partial \Psi_{\Sigma_{u,v}}}{\partial \Sigma_{i,j}} = -\left(1 + 1_{u \neq v}\right) \int_0^\infty \frac{1}{2} \frac{\partial \omega^{i,j}(\Sigma, \Lambda, x)}{\partial \Sigma_{u,v}} \, \mathrm{d}x, \qquad i, j, u, v = 1, 2,$$

where, writing $\Lambda_{ii}^* = \Lambda_{ii}/T$,

$$\omega^{1,1}(\Sigma, \Lambda, x) = \frac{\Sigma_{22} + \Lambda_{22}^* \pi^2 x^2}{(\Sigma_{11} + \Lambda_{11}^* \pi^2 x^2)(\Sigma_{22} + \Lambda_{22}^* \pi^2 x^2) - \Sigma_{12}^2},$$

$$\omega^{2,2}(\Sigma, \Lambda, x) = \frac{\Sigma_{11} + \Lambda_{11}^* \pi^2 x^2}{(\Sigma_{11} + \Lambda_{11}^* \pi^2 x^2)(\Sigma_{22} + \Lambda_{22}^* \pi^2 x^2) - \Sigma_{12}^2},$$

$$\omega^{1,2}(\Sigma, \Lambda, x) = \frac{-\Sigma_{12}}{(\Sigma_{11} + \Lambda_{11}^* \pi^2 x^2)(\Sigma_{22} + \Lambda_{22}^* \pi^2 x^2) - \Sigma_{12}^2}.$$

Proof. Given in the Appendix.

Each of the integrals in $\Pi^{-1}$ has an analytic solution (e.g. Mathematica will solve the integrals), but the result is not informative and so we prefer to leave it in this compact form.

When $\Sigma_{12} = 0$, there is no "externality," i.e. the asymptotic variances for $\hat{\Sigma}_{11}$ and $\hat{\Sigma}_{22}$ in the bivariate case reproduce the one-dimensional MLE case. As the correlation increases from 0 to 1, the multivariate MLE $\hat{\Sigma}_{11}$ becomes more efficient than the univariate one, because more information is collected via the correlation with the other series. This is illustrated in Figure 1.

3.2 Step two: bivariate QMLE with equidistant observations

3.2.1 Assumptions

We now move to the cases which are more realistic. We deal with them one at a time: stochastic volatility, irregularly spaced synchronised data and, finally and crucially, asynchronous data.

Assumption 1 The underlying latent $d$-dimensional log-price process satisfies (1), where the drift $\mu$ is predictable and locally bounded, the $d \times d$ volatility process $\sigma$ is a locally bounded Itô semimartingale, and $W$ is a $d$-dimensional Brownian motion.


Figure 1: The figure plots the relative efficiency of the bivariate MLE of $\Sigma_{11}$ over the univariate alternative, against the correlation $\rho$, with $\Sigma_{11} = 0.25^2$. As the number falls below one the gains from the bivariate MLE become greater. The blue line, the red dashed line, and the black dotted line correspond to the cases with $\Sigma_{22} = 0.3^2$, $0.25^2$, and $0.2^2$, respectively.

Assumption 2 The noise $\varepsilon_i$ is a vector random variable which is independent and identically distributed, independent of $t_i$, $W$, $\sigma$, and has fourth moments.²

Assumption 3 The trades are synchronised at times $t_i = Ti/n$ and $Z_i = I_d$.

3.2.2 The asymptotic theory

Before we give the bivariate case we define $R_T = \left( \frac{1}{T} \int_0^T \sigma_t^4 \, \mathrm{d}t \right) / \left( \frac{1}{T} \int_0^T \sigma_t^2 \, \mathrm{d}t \right)^2 \ge 1$, by Jensen's inequality, and recall the univariate result

$$n^{1/4} \left( \hat{\Sigma}_{11} - \frac{1}{T} \int_0^T \sigma_t^2 \, \mathrm{d}t \right) \xrightarrow{L_s} MN\left( 0, \, (5 R_T + 3) \Lambda_{11}^{1/2} \left( \frac{1}{T} \int_0^T \sigma_t^2 \, \mathrm{d}t \right)^{3/2} T^{-1/2} \right),$$

which is a rewrite of the result due to Xiu (2010). Here the asymptotic variance of the estimator increases with $R_T$ keeping $\frac{1}{T} \int_0^T \sigma_t^2 \, \mathrm{d}t$ fixed. We now extend this to the multivariate case. The asymptotic theory for the realised QML estimator $\hat{\Sigma}$ is given below.

²The i.i.d. assumption can be replaced by a more general noise process which is independent conditionally on $Y$. This allows some dependence and endogeneity, but the noise is still uncorrelated with $Y$. This assumption is the focus of, for example, Jacod, Podolskij, and Vetter (2010). We choose not to adopt it as the idea of the proof remains the same except for some technicalities.

Theorem 2 (Bivariate QMLE) Under Assumptions 1-3, we have

$$n^{1/4} \begin{pmatrix} \hat{\Sigma}_{11} - \frac{1}{T} \int_0^T \Sigma_{11,t} \, \mathrm{d}t \\ \hat{\Sigma}_{12} - \frac{1}{T} \int_0^T \Sigma_{12,t} \, \mathrm{d}t \\ \hat{\Sigma}_{22} - \frac{1}{T} \int_0^T \Sigma_{22,t} \, \mathrm{d}t \end{pmatrix} \xrightarrow{L_s} MN(0, \Pi_Q), \qquad (6)$$

where $\Pi_Q$ is

$$\Pi_Q = \frac{1}{4} \left( \frac{\partial \Psi_\theta}{\partial \Sigma_\theta'} \right)^{-1} \left\{ \operatorname{Avar}^{(2)} + \operatorname{Avar}^{(3)} + \operatorname{Avar}^{(4)} \right\} \left\{ \left( \frac{\partial \Psi_\theta}{\partial \Sigma_\theta'} \right)^{-1} \right\}',$$

where

$$\operatorname{Avar}^{(2)} = 2 \sum_{l,s,u,v=1}^{2} \left( \int_0^\infty \frac{\partial \omega^{v,u}(\Sigma, \Lambda, x)}{\partial \Sigma_\theta} \frac{\partial \omega^{l,s}(\Sigma, \Lambda, x)}{\partial \Sigma_\theta'} \, \mathrm{d}x \right) \left( \frac{1}{T} \int_0^T \Sigma_{sv,t} \Sigma_{ul,t} \, \mathrm{d}t \right),$$

$$\operatorname{Avar}^{(3)} = 4 \sum_{l,s,v=1}^{2} \Lambda_{ll}^* \left( \int_0^\infty \frac{\partial \omega^{l,s}(\Sigma, \Lambda, x)}{\partial \Sigma_\theta} \frac{\partial \omega^{l,v}(\Sigma, \Lambda, x)}{\partial \Sigma_\theta'} \pi^2 x^2 \, \mathrm{d}x \right) \left( \frac{1}{T} \int_0^T \Sigma_{sv,t} \, \mathrm{d}t \right),$$

$$\operatorname{Avar}^{(4)} = 2 \sum_{l,s=1}^{2} \Lambda_{ll}^* \Lambda_{ss}^* \int_0^\infty \frac{\partial \omega^{l,s}(\Sigma, \Lambda, x)}{\partial \Sigma_\theta} \frac{\partial \omega^{l,s}(\Sigma, \Lambda, x)}{\partial \Sigma_\theta'} \pi^4 x^4 \, \mathrm{d}x.$$

Here all the derivatives are evaluated at $\Sigma = \frac{1}{T} \int_0^T \Sigma_t \, \mathrm{d}t$.

Proof. Given in the Appendix.

In independent and concurrent work Liu and Tang (2012) have established a similar result, although their derivation follows the steps in Xiu (2010), which is different from what we suggest here.

3.3 Step 3: bivariate QMLE with irregularly spaced synchronised observations

We now build a quasi-likelihood upon irregularly spaced but synchronised times $\{t_1, t_2, \ldots, t_n\}$, so that $Z_i = I_2$. These times imply a collection of increments $\{\Delta_i^n = t_i - t_{i-1}, \, 1 \le i \le n\}$. We then make the following assumption.

Assumption 4 We assume that $\Delta_i^n = \bar{\Delta}(1 + \xi_i)$, $i = 1, 2, \ldots, n$, $\bar{\Delta} = \frac{T}{n}$, $E(\xi_i) = 0$, where $\operatorname{Var}(\xi_i) < \infty$ and $\{\xi_i, \, 1 \le i \le n\}$ are i.i.d. Also assume $Y$ and $\{\xi_i\}$ are independent.

Assumption 4 means that $\xi_i = O_p(1)$ and so the individual gaps shrink at rate $O_p(n^{-1})$.

Corollary 1 Assume Assumptions 1, 2 and 4 hold and additionally Σ is constant and µ = 0. Then the asymptotic variance of the MLE is the same as that in Theorem 1.

Proof. Given in the Appendix.

This makes it clear that in the synchronised case irregular spacing of the data has no impact on the asymptotic distribution of the multivariate realised QML estimator. The reason for this is that the effect is less important than the presence of noise, a result which appears in the univariate work of Aït-Sahalia, Mykland, and Zhang (2005) and Aït-Sahalia and Mykland (2003). This result contrasts with the realised variance where Mykland and Zhang (2006) have shown that the quadratic variation of the sampling times impacts the asymptotic distribution in the absence of noise. This synchronised case is important in practice. Some researchers have analysed covariances by applying a synchronisation scheme to the non-synchronous high frequency observations. This delivers an irregularly spaced sequence of synchronised times of trades, although some prices could be somewhat stale. Typically there is a very large drop in the sample size due to synchronisation. The most well known such scheme is the Refresh Time method analysed by Barndorff-Nielsen, Hansen, Lunde, and Shephard (2011) and subsequently employed by, for example, Christensen, Kinnebrock, and Podolskij (2010) and Aït-Sahalia, Fan, and Xiu (2010). See also the earlier more informal papers by Harris, McInish, Shoesmith, and Wood (1995) and Martens (2003). Alternatively, Zhang (2011) discusses the Previous Tick approach, which always discards more data than is suggested by the Refresh Time.
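The Refresh Time scheme itself is simple to implement: each refresh time is the first instant by which every asset has traded again. A minimal sketch (our own illustration of the scheme, not code from any of the papers cited; the function name is ours):

```python
def refresh_times(series):
    """series: one sorted list of trade times per asset. tau_1 is the first
    time by which every asset has traded at least once; tau_{j+1} is the
    first time by which every asset has traded again after tau_j."""
    ptrs = [0] * len(series)
    out = []
    while all(p < len(s) for p, s in zip(ptrs, series)):
        tau = max(s[p] for p, s in zip(ptrs, series))   # slowest asset decides
        out.append(tau)
        for a, s in enumerate(series):                  # advance each asset past tau
            p = ptrs[a]
            while p < len(s) and s[p] <= tau:
                p += 1
            ptrs[a] = p
    return out

# two assets with 5 and 3 trades: 8 price updates collapse to 3 refresh times
taus = refresh_times([[1, 3, 4, 7, 9], [2, 5, 8]])
```

Note how, even in this tiny example, most of the frequent asset's observations are discarded, which is the sample-size loss referred to above.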

3.4 Step 4: bivariate QMLE with non-synchronous observations

In this paper, synchronisation is not needed for our method. Instead we write

$$x_{j,i} = y_j(t_{j,i}) + \varepsilon_{j,i}, \qquad j = 1, 2, \quad i = 1, 2, \ldots, n_j,$$

where $t_{j,i}$ is the time of the $i$-th observation on the $j$-th asset. The corresponding returns are $r_{j,i} = x_{j,i} - x_{j,i-1}$.

Assumption 5 We define $\Delta_i^{n_j} = t_{j,i} - t_{j,i-1}$, $i = 1, 2, \ldots, n_j$, and $\Delta^{n_j} = \operatorname{diag}(\Delta_i^{n_j})$. Assume that $\Delta_i^{n_j} = \bar{\Delta}_j$, for $j = 1, 2$, and $\bar{\Delta}_2 = m \bar{\Delta}_1$, where $\bar{\Delta}_j = T/n_j$.

These assumptions mean that the data are asynchronous, unless $m = 1$, but always equally spaced in time. The latter assumption is made as the previous subsection has shown that irregular spacing of the data does not impact the asymptotic analysis of the realised QML estimator, and so it is simply cumbersome to include that case here without any increase in understanding. This notation means that $\bar{\Delta}_j$ is the average time gap between observations while $n_j$ is the sample size for the $j$-th asset. Collecting the observed returns $r = (r_{1,1}, r_{1,2}, \ldots, r_{1,n_1}, r_{2,1}, \ldots, r_{2,n_2})'$, our likelihood can be rewritten as

$$L = -\frac{1}{2} (n_1 + n_2) \log(2\pi) - \frac{1}{2} \log(\det \Omega) - \frac{1}{2} r' \Omega^{-1} r, \qquad (7)$$

where

$$\Omega = \begin{pmatrix} \Sigma_{11} \Delta^{n_1} + \Lambda_{11} J_{n_1} & \Sigma_{12} \Delta^{n_1,n_2} \\ \Sigma_{12} \Delta^{n_2,n_1} & \Sigma_{22} \Delta^{n_2} + \Lambda_{22} J_{n_2} \end{pmatrix}, \qquad (8)$$

with element $(i, k)$ of the $n_1 \times n_2$ matrix $\Delta^{n_1,n_2}$ given by the length of the overlap of the return intervals $(t_{1,i-1}, t_{1,i}]$ and $(t_{2,k-1}, t_{2,k}]$,

$$\left( \Delta^{n_1,n_2} \right)_{i,k} = \max\left\{ 0, \, \min(t_{1,i}, t_{2,k}) - \max(t_{1,i-1}, t_{2,k-1}) \right\}, \qquad (9)$$

and $\Delta^{n_2,n_1} = \left( \Delta^{n_1,n_2} \right)'$.
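Under the Brownian quasi-model, $\operatorname{Cov}(r_{1,i}, r_{2,k}) = \Sigma_{12}$ times the length of the intersection of the two return intervals, since only the common stretch of the Brownian path is shared; this is what the cross blocks of $\Omega$ collect. A small sketch of our own (function and variable names are ours) computing such an overlap matrix:

```python
import numpy as np

def overlap_matrix(t1, t2):
    """Element (i, k) is the length of the intersection of the return
    intervals (t1[i-1], t1[i]] and (t2[k-1], t2[k]]."""
    n1, n2 = len(t1) - 1, len(t2) - 1
    D = np.zeros((n1, n2))
    for i in range(1, n1 + 1):
        for k in range(1, n2 + 1):
            D[i - 1, k - 1] = max(0.0, min(t1[i], t2[k])
                                  - max(t1[i - 1], t2[k - 1]))
    return D

# asset 1 trades twice as often as asset 2 over [0, 1]
D12 = overlap_matrix([0.0, 0.25, 0.5, 0.75, 1.0], [0.0, 0.5, 1.0])
```

Each row of the overlap matrix sums to the corresponding $\Delta_i^{n_1}$ and each column to $\Delta_k^{n_2}$, a useful sanity check when assembling $\Omega$.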

Theorem 3 Assume Assumptions 1, 2 and 5 hold and additionally $\Sigma$ is constant and $\mu = 0$. Assume also that $\Delta_i^{n_j} = \bar{\Delta}_j$, for $j = 1, 2$, and $\bar{\Delta}_2 = m \bar{\Delta}_1$, where $n_1 \gg n_2$, i.e. $m \to \infty$. Then in this asynchronous case, the central limit theorem is given by:

$$\begin{pmatrix} n_1^{1/4} (\hat{\Sigma}_{11} - \Sigma_{11}) \\ n_2^{1/4} (\hat{\Sigma}_{12} - \Sigma_{12}) \\ n_2^{1/4} (\hat{\Sigma}_{22} - \Sigma_{22}) \end{pmatrix} \xrightarrow{L} N(0, \Pi_A), \quad \text{where} \quad \Pi_A^{-1} = \begin{pmatrix} \frac{\partial \Psi_{\Sigma_{1,1}}}{\partial \Sigma_{1,1}} & 0 & 0 \\ 0 & \frac{\partial \Psi_{\Sigma_{1,2}}}{\partial \Sigma_{1,2}} & \frac{\partial \Psi_{\Sigma_{1,2}}}{\partial \Sigma_{2,2}} \\ 0 & \frac{\partial \Psi_{\Sigma_{2,2}}}{\partial \Sigma_{1,2}} & \frac{\partial \Psi_{\Sigma_{2,2}}}{\partial \Sigma_{2,2}} \end{pmatrix},$$

such that

$$\frac{\partial \Psi_{\Sigma_{u,v}}}{\partial \Sigma_{i,j}} = -\left(1 + 1_{u \neq v}\right) \int_0^\infty \frac{1}{2} \frac{\partial \omega^{i,j}(\Sigma, \Lambda, x)}{\partial \Sigma_{u,v}} \, \mathrm{d}x, \qquad i, j, u, v = 1, 2,$$

where, writing $\Lambda_{ii}^* = \Lambda_{ii}/T$,

$$\omega^{1,1}(\Sigma, \Lambda, x) = \frac{1}{\Sigma_{11} + \Lambda_{11}^* \pi^2 x^2}, \qquad \omega^{2,2}(\Sigma, \Lambda, x) = \frac{\Sigma_{11}}{\Sigma_{11} (\Sigma_{22} + \Lambda_{22}^* \pi^2 x^2) - \Sigma_{12}^2},$$

$$\omega^{1,2}(\Sigma, \Lambda, x) = \frac{-\Sigma_{12}}{\Sigma_{11} (\Sigma_{22} + \Lambda_{22}^* \pi^2 x^2) - \Sigma_{12}^2}.$$

Proof. Given in the Appendix.

This Theorem suggests that including extremely illiquid assets in the estimation should not greatly affect the variance and covariance estimates of the liquid ones. This property is distinct from any approach in the literature for which the synchronization method matters and the estimates become considerably worse when some asset is illiquid.

Interestingly, the asymptotic covariance of $\hat{\Sigma}_{22}$ and $\hat{\Sigma}_{12}$ can be obtained by plugging $\Lambda_{11} = 0$ into Theorem 1, as if the liquid asset were not affected by the microstructure noise. This is due to a type of "sparse sampling" induced by the substantial difference in the number of observations.

3.4.1 Extension: general m case

Note that Theorems 1 and 3 represent two points at either end of an important continuum. Theorem 1 in effect deals with the $m = 1$ case and Theorem 3 deals with $m \to \infty$. Of course results for finite values of $m > 1$ would be of great practical importance, but we do not yet have sufficient control of the terms to state that result entirely confidently. However, our theoretical studies and some Monte Carlo results we do not report here suggest that the result is simply the asymptotic distribution given in Theorem 1 but

$$\begin{pmatrix} n_1^{1/4} (\hat{\Sigma}_{11} - \Sigma_{11}) \\ n_2^{1/4} (\hat{\Sigma}_{12} - \Sigma_{12}) \\ n_2^{1/4} (\hat{\Sigma}_{22} - \Sigma_{22}) \end{pmatrix} \xrightarrow{L} N(0, \Pi_{A,m}),$$

where $\Pi_{A,m}$ takes on the form given in Theorem 1 except with $\Lambda_{22}^* = \Lambda_{22}/(mT)$. Hence the role of $m$ is simply to rescale the measurement error variances.

3.4.2 Extension: irregularly and asynchronously spaced data

It is clear from Corollary 1 that the previous results hold under the type of irregularly spaced data which obeys Assumption 6 which simply extends Assumption 4.

Assumption 6 We assume that $\Delta^{n_j}_i = t_{j,i} - t_{j,i-1} = \bar{\Delta}_j(1 + \xi_{j,i})$, $i = 1, 2, \ldots, n_j$, $\Delta^{n_j} = \mathrm{diag}(\Delta^{n_j}_i)$, where $E(\xi_{j,i}) = 0$ and $\mathrm{Var}(\xi_{j,i}) < \infty$, $1 \le i \le n_j$. Also assume $Y$ and $\xi_{j,i}$ are independent.

3.4.3 Extension: allowing stochastic volatility

Likewise the extension to more interesting dynamics, stated in Assumption 1, combined with Assumption 6 is now clear. Again

\[
\begin{pmatrix}
n_1^{1/4}(\hat{\Sigma}_{11} - \Sigma_{11}) \\
n_2^{1/4}(\hat{\Sigma}_{12} - \Sigma_{12}) \\
n_2^{1/4}(\hat{\Sigma}_{22} - \Sigma_{22})
\end{pmatrix}
\xrightarrow{L_s} MN(0, \Pi_{Q,m}), \tag{10}
\]
where $\Pi_{Q,m}$ simply replaces $\Pi_Q$ in Theorem 2 by adjusting $\Lambda^*_{22} = \Lambda_{22}/(mT)$. The proof of such a result just combines the proofs of the previous theorems; it is tedious.

4 Additional developments

4.1 Realised QML correlation and regression estimator

The theorems above have an immediate corollary for the daily “realised QML correlation estimator”

\[
\hat{\rho}_{12} = \frac{\hat{\Sigma}_{12}}{\sqrt{\hat{\Sigma}_{11}\hat{\Sigma}_{22}}} \in [-1, 1], \quad \text{of} \quad
\rho_{12} = \frac{\frac{1}{T}\int_0^T \Sigma_{12,t}\,dt}{\sqrt{\left(\frac{1}{T}\int_0^T \Sigma_{11,t}\,dt\right)\left(\frac{1}{T}\int_0^T \Sigma_{22,t}\,dt\right)}} \in [-1, 1].
\]

Also of importance is the corresponding regression or “realised QML beta”

\[
\hat{\beta}_{1|2} = \frac{\hat{\Sigma}_{12}}{\hat{\Sigma}_{22}}, \quad \text{which estimates} \quad
\beta_{1|2} = \frac{\frac{1}{T}\int_0^T \Sigma_{12,t}\,dt}{\frac{1}{T}\int_0^T \Sigma_{22,t}\,dt}.
\]

The corresponding limit theory follows by the application of the delta method: $\mathrm{Avar}(\hat{\rho}_{12}) = \nu_\rho' V_Q \nu_\rho$ and $\mathrm{Avar}(\hat{\beta}_{1|2}) = \nu_\beta' V_Q \nu_\beta$, where

\[
\nu_\rho = \left( -\frac{1}{2}\frac{\Sigma_{12}}{\sqrt{\Sigma_{11}^3\Sigma_{22}}},\; \frac{1}{\sqrt{\Sigma_{11}\Sigma_{22}}},\; -\frac{1}{2}\frac{\Sigma_{12}}{\sqrt{\Sigma_{11}\Sigma_{22}^3}} \right)', \quad \text{and} \quad
\nu_\beta = \left( 0,\; \frac{1}{\Sigma_{22}},\; -\frac{\Sigma_{12}}{\Sigma_{22}^2} \right)'.
\]

These are noise and asynchronous trading robust versions of the realised quantities studied by Andersen, Bollerslev, Diebold, and Labys (2003) and Barndorff-Nielsen and Shephard (2004).
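The delta-method gradient $\nu_\rho$ is easy to verify numerically by comparing the analytic derivatives of $\rho(\Sigma) = \Sigma_{12}/\sqrt{\Sigma_{11}\Sigma_{22}}$ with central finite differences. This is a small illustrative check, not part of the estimator itself; the numerical values of $\Sigma$ are hypothetical.

```python
import math

def rho(s11, s12, s22):
    """Correlation implied by the integrated covariance matrix."""
    return s12 / math.sqrt(s11 * s22)

# Hypothetical integrated (co)variances; any positive-definite choice works.
s11, s12, s22 = 0.04, 0.018, 0.09

# Analytic gradient nu_rho = (d rho/d Sigma_11, d rho/d Sigma_12, d rho/d Sigma_22).
nu_rho = (
    -0.5 * s12 / math.sqrt(s11 ** 3 * s22),
    1.0 / math.sqrt(s11 * s22),
    -0.5 * s12 / math.sqrt(s11 * s22 ** 3),
)

# Central finite differences as a numerical check.
h = 1e-6
num = (
    (rho(s11 + h, s12, s22) - rho(s11 - h, s12, s22)) / (2 * h),
    (rho(s11, s12 + h, s22) - rho(s11, s12 - h, s22)) / (2 * h),
    (rho(s11, s12, s22 + h) - rho(s11, s12, s22 - h)) / (2 * h),
)
```

The same check applies to $\nu_\beta$, whose non-zero entries are the derivatives of $\Sigma_{12}/\Sigma_{22}$.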

4.2 Miniature realised QML based estimation

So far we have carried out realised QML estimation using all the data at once over the interval 0 to T. It is possible to follow a different track, which is to break the time interval [0, T] into non-stochastic blocks 0 = b0 < b1 < ... < bB = T. Then we can compute a realised QML estimator within each block. We then sum the resulting estimators to produce our estimator of the required covariance matrix. Such a blocking strategy was used, in a different context, by Mykland and Zhang (2009) and Mykland, Shephard, and Sheppard (2012) for example.
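The blocking and recombination step can be sketched as follows. The per-block realised variance here stands in for the miniature realised QML estimator (the actual block estimator would run the Kalman-filter/EM quasi-likelihood on the block's data), and the block boundaries and the assignment of each return to the block containing its right endpoint are illustrative assumptions.

```python
def blocked_estimate(times, prices, breaks):
    """Aggregate per-block estimates over 0 = b_0 < b_1 < ... < b_B = T.

    Each return is assigned to the block containing its right endpoint, the
    block estimator is an annualised realised variance (a stand-in for the
    miniature realised QML), and blocks are recombined with weights
    (b_i - b_{i-1}) / T as in the text.
    """
    T = breaks[-1]
    block_rv = [0.0] * (len(breaks) - 1)
    for k in range(1, len(times)):
        r2 = (prices[k] - prices[k - 1]) ** 2
        # locate the block whose interval (b_{i-1}, b_i] holds t_k
        i = next(j for j in range(1, len(breaks)) if times[k] <= breaks[j]) - 1
        block_rv[i] += r2
    # per-block estimator of the average spot variance on that block
    sigma2_blocks = [rv / (breaks[i + 1] - breaks[i]) for i, rv in enumerate(block_rv)]
    # weighted recombination: (1/T) * sum_i (b_i - b_{i-1}) * Sigma_hat_i
    return sum((breaks[i + 1] - breaks[i]) * s for i, s in enumerate(sigma2_blocks)) / T

# toy example: 9 observations on [0, 1], three equal blocks
times = [0.0, 0.1, 0.25, 0.3, 0.45, 0.55, 0.7, 0.8, 1.0]
prices = [0.0, 0.01, -0.02, 0.0, 0.03, 0.02, 0.05, 0.04, 0.06]
est = blocked_estimate(times, prices, [0.0, 1 / 3, 2 / 3, 1.0])
```

Because each return is assigned to exactly one block, this recombination reproduces the full-sample realised variance divided by T, which is a convenient consistency check on the weights.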

We call the i-th block estimator the “miniature realised QML” estimator and write it as $\hat{\Sigma}_i$. For fixed block sizes the resulting estimator, as the sample size goes to infinity, is

\[
n_i^{1/4}\left( \hat{\Sigma}_{11,i} - \frac{1}{b_i - b_{i-1}}\int_{b_{i-1}}^{b_i} \sigma_t^2\,dt \right)
\xrightarrow{L_s} MN\left( 0,\; \frac{(5R_{ii}+3)\Lambda_{11}^{1/2}}{(b_i - b_{i-1})^{1/2}} \left( \frac{1}{b_i - b_{i-1}}\int_{b_{i-1}}^{b_i} \sigma_t^2\,dt \right)^{3/2} \right),
\]
where $n_i = n(b_i - b_{i-1})/T$,
\[
R_{ii} = \frac{\frac{1}{b_i - b_{i-1}}\int_{b_{i-1}}^{b_i} \sigma_t^4\,dt}{\left( \frac{1}{b_i - b_{i-1}}\int_{b_{i-1}}^{b_i} \sigma_t^2\,dt \right)^2} \ge 1.
\]

For fixed non-overlapping blocks the joint limit theory for the group of miniature realised QML estimators is normal with uncorrelated errors across blocks.

Corollary 2 Define the unblocked QML estimator of $\Sigma_{11}$, $\tilde{\Sigma}_{11} = \frac{1}{T}\sum_{i=1}^B (b_i - b_{i-1})\hat{\Sigma}_{11,i}$; then for fixed $b_i$ and $B$ we have as $n \to \infty$

\[
n^{1/4}\left( \tilde{\Sigma}_{11} - \frac{1}{T}\int_0^T \sigma_t^2\,dt \right)
\xrightarrow{L_s} MN\left( 0,\; \frac{1}{T^{3/2}}\sum_{i=1}^B (b_i - b_{i-1})(5R_{ii}+3)\Lambda_{11}^{1/2}\left( \frac{1}{b_i - b_{i-1}}\int_{b_{i-1}}^{b_i} \sigma_t^2\,dt \right)^{3/2} \right).
\]

Proof. Immediate extension of Xiu (2010), noting each block is conditionally independent.

At first sight this does not look like much of an advance. The virtue though is that $\int_0^t \sigma_s^2\,ds$ and $\int_0^t \sigma_s^4\,ds$ are of bounded variation in $t$ and so are both $O_p(t)$ as $t \downarrow 0$. This means that $R_{ii} \simeq 1 + O_p(b_i - b_{i-1})$ and $(b_i - b_{i-1})^{-1}\int_{b_{i-1}}^{b_i} \sigma_t^2\,dt - \sigma_{b_{i-1}}^2 = O_p(b_i - b_{i-1})$. Now $R_{ii}$ is crucial, for it drives the inefficiency of this quasi-likelihood approach to inference. Driving it down to one allows

the estimator to be optimal in the limit, and is achieved by allowing $\max_i (b_i - b_{i-1}) = o(1)$ as a function of n. In practice we let the gaps shrink very slowly with n. Of course this shrinkage requires B to increase with n, again very slowly. The result is very simple and achieves the non-parametric efficiency bound

\[
n^{1/4}\left( \tilde{\Sigma}_{11} - \frac{1}{T}\int_0^T \sigma_t^2\,dt \right)
\xrightarrow{L_s} MN\left( 0,\; 8\Lambda_{11}^{1/2}\left( \frac{1}{T}\int_0^T \sigma_t^3\,dt \right) T^{-1/2} \right).
\]

This approach is also efficient when the variance of the noise is time-varying, for $\Lambda_{11}$ is estimated separately within each block. Reiss (2011) and Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008) discuss other estimators which achieve this bound.

4.3 Multistep realised QML estimator

There may be robustness advantages in estimating the integrated variances $\hat{\Sigma}_{11}$, $\hat{\Sigma}_{22}$ using univariate QML methods. These two estimates can then be combined with the QML correlation estimator $\hat{\rho}_{12}$, obtained by simply maximising the quasi-likelihood with respect to $\rho_{12}$ keeping $\Sigma_{11}$, $\Sigma_{22}$ fixed at the first stage $\hat{\Sigma}_{11}$, $\hat{\Sigma}_{22}$. We call such an estimator the “multistep covariance estimator”.

A potential advantage of this approach is that model misspecification for one asset price will not impact the estimator of the integrated variance for the other asset. Of course volatility estimation is crucial in terms of risk scaling.
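To illustrate the second stage, here is a minimal sketch of profiling a Gaussian quasi-likelihood in $\rho$ alone, holding the variances fixed at first-stage values. It uses synchronised, noise-free toy returns and a grid search rather than the paper's Kalman-filter likelihood and EM algorithm, so everything below is an illustrative assumption, not the actual estimator.

```python
import math

def profile_rho(x, y, s11, s22, grid_step=0.0005):
    """Maximise the bivariate Gaussian quasi-likelihood over rho only,
    holding the variances fixed at first-stage estimates s11, s22."""
    n = len(x)
    sxx = sum(u * u for u in x) / (n * s11)
    syy = sum(v * v for v in y) / (n * s22)
    sxy = sum(u * v for u, v in zip(x, y)) / (n * math.sqrt(s11 * s22))
    best, best_ll = 0.0, -float("inf")
    r = -0.999
    while r < 0.999:
        # per-observation Gaussian quasi log-likelihood as a function of rho
        ll = -0.5 * math.log(1 - r * r) \
             - (sxx - 2 * r * sxy + syy) / (2 * (1 - r * r))
        if ll > best_ll:
            best, best_ll = r, ll
        r += grid_step
    return best

# toy synchronised returns (no microstructure noise)
x = [0.01, -0.02, 0.015, 0.005, -0.01, 0.02, -0.005, 0.01]
y = [0.012, -0.015, 0.01, 0.0, -0.012, 0.018, -0.002, 0.008]
s11 = sum(u * u for u in x) / len(x)   # first-stage variance estimates
s22 = sum(v * v for v in y) / len(y)
rho_hat = profile_rho(x, y, s11, s22)
```

When the variances are fixed at exactly their Gaussian QML values, the profiled maximiser coincides with the sample correlation, which gives a simple check on the code.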

4.4 Sparse and subsampled realised QML

A virtue of the realised QML is that it is applied to all of the high frequency data. However, this estimator may face challenges if the noise has more complicated dynamics. Although we have proved results assuming the noise is i.i.d., it is clear from the techniques in the literature that the results will hold more generally if the noise is a martingale difference sequence (e.g. this covers some forms of price discreteness and diurnal volatility clustering in the noise). However, dependence which introduces autocorrelation in the noise could be troublesome. We might sometimes expect this feature if there are differential rates of price discovery in the different markets, e.g. an index fund leading price movements in the thinly traded Washington Post. To overcome this kind of dependence in asset returns we define a “sparse realised QML” estimator, which corresponds to the sparse sampling realised variance. The approach we have explored is as follows.

We first list all the times of trades for asset i, written as $t_{j,i}$, with sample size $n_i$. Now think about collecting a subset of these times, taking every k-th time of trade. We write these times as $t^*_{j,i}$ and the corresponding sample size as $n^*_i$. We perform the same thinning operation for each asset. The union of the corresponding times will be written as $t^*_i$. This subset of the data can be analysed using the realised QML approach. Our asymptotic theory applies immediately to these $t^*_{j,i}$ and $n^*_i$, and the corresponding prices.

In practice it makes sense to amend this approach so that for each asset $n^*_i \ge n_{\min}$, where $n_{\min}$ is something like 20 or 50. This enforces that there is little thinning on infrequently traded assets.

Once we have defined a sparse realised QML, it is obvious that we could also simply subsample this approach, which means constructing k sets of subsampled datasets and for each computing the corresponding quasi-likelihood. We then average the k quasi-likelihoods and maximise them using the corresponding EM algorithm. We call this the “subsampled realised QML” estimator. This is simple to code and has the virtue that it employs all of the data in the sample while being less sensitive to the i.i.d. assumption.
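The thinning step can be sketched as follows; the `offset` argument makes the same routine reusable for building the k subsampled datasets, and the floor on the surviving sample size (here 20, one of the two values the text suggests) is enforced by shrinking k for thin series. This is an illustrative sketch of the data preparation only, not of the QML estimation itself.

```python
def thin_times(times, k, offset=0, n_min=20):
    """Keep every k-th trade time (starting at `offset`), but shrink k
    so that at least n_min times survive for infrequently traded assets."""
    while k > 1 and len(times[offset::k]) < n_min:
        k -= 1
    return times[offset::k]

def sparse_sample(all_times, k, offset=0, n_min=20):
    """Thin each asset's trade times, then pool the surviving times.

    `all_times` holds one sorted list of trade times per asset; the pooled,
    sorted union is what the sparse realised QML would work from."""
    thinned = [thin_times(t, k, offset, n_min) for t in all_times]
    pooled = sorted(set().union(*map(set, thinned)))
    return thinned, pooled

# liquid asset: 200 trades; illiquid asset: 30 trades
liquid = [i / 200 for i in range(200)]
illiquid = [i / 30 for i in range(30)]
thinned, pooled = sparse_sample([liquid, illiquid], k=5)
```

With k = 5 the liquid asset keeps 40 of its 200 times, while the illiquid asset's k collapses to 1 so all 30 of its times survive, little thinning on the infrequently traded asset, as intended.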

5 Monte Carlo experiments

5.1 Monte Carlo design

Throughout we follow the design of A¨ıt-Sahalia, Fan, and Xiu (2010), which is a bivariate model. Each day that financial markets are open will be taken as lasting T = 1/252 units of time, so T = 1 would represent a financial year. Here we recall the structure of their model

\[
dy_{it} = \alpha_{it}\,dt + \sigma_{it}\,dW_{it}, \qquad
d\sigma_{it}^2 = \kappa_i\left( \bar{\sigma}_i^2 - \sigma_{it}^2 \right)dt + s_i\sigma_{it}\,dB_{it} + \sigma_{it-}^2 J_{it}^V\,dN_{it},
\]
where $E(dW_{it}\,dB_{jt}) = \delta_{ij}\rho_i\,dt$ and $E(dW_{1t}\,dW_{2t} \mid \rho^*) = \rho^*\,dt$. Here $\kappa_i > 0$.

Ignoring the impact of jumps, the variance process $\sigma_{it}^2$ has a marginal distribution given by $\Gamma(2\kappa_i\bar{\sigma}_i^2/s_i^2,\; s_i^2/2\kappa_i)$. Throughout, when jumps happen the log-jumps $\log J_{it}^V \overset{iid}{\sim} N(\theta_i, \mu_i)$, while $N_{it}$ is a Poisson process with intensity $\lambda_i$. Likewise $\varepsilon_{it} \overset{iid}{\sim} N(0, a_i^2)$.

We now depart slightly from their setup. For each day we draw independently $\sigma_{i0}^2 \sim \Gamma(2\kappa_i\bar{\sigma}_i^2/s_i^2,\; s_i^2/2\kappa_i)$ over $i = 1, 2$, which means each replication will be independent. For each separate day we simulate independently $\rho^* \sim \rho_0\,\mathrm{Beta}(\rho_1^*, \rho_2^*)$, where $\rho_0 = \sqrt{(1-\rho_1^2)(1-\rho_2^2)}$ guarantees the positive-definiteness of the covariance matrix of $(W_1, W_2, B_1, B_2)$. This means $E(\rho^*) = \rho_0\rho_1^*/(\rho_1^* + \rho_2^*)$ and $\mathrm{sd}(\rho^*) = \rho_0\sqrt{\rho_1^*\rho_2^*}\,/\left( (\rho_1^* + \rho_2^*)\sqrt{\rho_1^* + \rho_2^* + 1} \right)$. The values of $a_i$, $\alpha_i$, $\rho_i$, $\kappa_i$, $\theta_i$, $\mu_i$, $\lambda_i$, $s_i$, $\bar{\sigma}_i^2$, $\rho_1^*$ and $\rho_2^*$ are given in Table 1. To check our limit theory calculations, Figure 2 plots the histograms of the

         a_i     α_i    ρ_i     κ_i   θ_i   μ_i   λ_i   σ̄²_i   s_i
  i = 1  0.005   0.05   -0.6    3     -5    0.8   12    0.16   0.8     ρ*_1 = 2,  ρ_0 = 0.529
  i = 2  0.001   0.01   -0.75   2     -6    1.2   36    0.09   0.5     ρ*_2 = 1,  E(ρ*) = 0.176
                                                                       sd(ρ*) = 0.125

Table 1: Parameter values which index the Monte Carlo design for the bivariate model.

standardized pivotal statistics (standardising using the infeasible true random asymptotic variance

in each case) with 1,000 Monte Carlo repetitions sampled regularly in time at a frequency of every 10 seconds, that is n = 2,340. This corresponds to a 6.5 hour trading day, which is the case for the NYSE and NASDAQ (we note the LSE and Xetra are open for 8.5 hours a day). The histograms show the limiting result provides a reasonable guide to the finite sample behaviour in these cases.
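A minimal Euler-scheme sketch of one day of this design for a single asset is given below. Leverage correlation in the Brownian draws and the second asset are omitted, the variance is fully truncated at zero, and $\mu_i$ is treated as the log-jump variance; all of these are simplifying assumptions about the design, not the authors' exact simulator.

```python
import math
import random

def simulate_day(n=2340, T=1 / 252, alpha=0.05, kappa=3.0, sbar2=0.16,
                 s=0.8, lam=12.0, theta=-5.0, mu=0.8, a=0.005, seed=42):
    """Euler scheme for dy = alpha dt + sigma dW,
    dsigma^2 = kappa (sbar2 - sigma^2) dt + s sigma dB + jumps,
    with the price observed with i.i.d. N(0, a^2) microstructure noise."""
    rng = random.Random(seed)
    dt = T / n
    # draw the initial variance from its marginal Gamma distribution
    v = rng.gammavariate(2 * kappa * sbar2 / s ** 2, s ** 2 / (2 * kappa))
    y, vols, obs = 0.0, [], []
    for _ in range(n):
        sig = math.sqrt(v)
        y += alpha * dt + sig * math.sqrt(dt) * rng.gauss(0, 1)
        dv = kappa * (sbar2 - v) * dt + s * sig * math.sqrt(dt) * rng.gauss(0, 1)
        if rng.random() < lam * dt:          # a variance jump arrives
            dv += v * math.exp(rng.gauss(theta, math.sqrt(mu)))
        v = max(v + dv, 0.0)                 # full truncation at zero
        vols.append(v)
        obs.append(y + rng.gauss(0, a))      # price observed with noise
    return obs, vols

obs, vols = simulate_day()
```

Running many such days, estimating on each, and standardising by the (infeasible) asymptotic variance is what produces histograms of the kind shown in Figure 2.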

[Figure 2 here: three histogram panels, one each for Σ11, Σ12 and Σ22.]

Figure 2: The figure plots the histograms of the standardized pivotal statistics, which verify the asymptotic theory developed in Theorem 2. The standardisation is carried out using the infeasible true random asymptotic variance for each replication.

In our main Monte Carlo we take n ∈ {117, 1170, 11700} and all results are based on 1,000 stochastically independent replications. Having fixed the overall sample size n, we randomly and uniformly scatter these points over the time interval t ∈ [0, T], recalling T = 1/252. For asset 1 we will scatter exactly n̥ points and for asset 2 there will be exactly n(1 − ̥) points. This kind of stratified scatter corresponds to a sample from a Poisson bridge process with intensity n̥/T and n(1 − ̥)/T, respectively. We call ̥ the mixture rate and take ̥ ∈ {0.1, 0.5, 0.9}.

We will report on the accuracy of the daily estimation of the random $\Sigma_{11} = \frac{1}{T}\int_0^T \Sigma_{11,t}\,dt$, $\Sigma_{22} = \frac{1}{T}\int_0^T \Sigma_{22,t}\,dt$, $\Sigma_{12} = \frac{1}{T}\int_0^T \Sigma_{12,t}\,dt$, $\rho_{1,2} = \Sigma_{12}/\sqrt{\Sigma_{11}\Sigma_{22}}$, $\beta_{1|2} = \Sigma_{12}/\Sigma_{22}$ and $\beta_{2|1} = \Sigma_{12}/\Sigma_{11}$.

5.2 Our suite of estimators

We will compute six estimators of Σ11,Σ22,Σ12, β1|2, β2|1 and ρ1,2. The six are: (i) realised QML, (ii) multistep realised QML estimator, (iii) blocked realised QML3, (iv) realised QML but using the reduced data synchronised by Refresh Time, (v) the realised kernel of Barndorff-Nielsen,

3The number of blocks was taken to be √n/3. Sometimes this yielded very unequally sized blocks in which case we decreased the number of blocks so that there were at least two observations for each asset.

Hansen, Lunde, and Shephard (2011) which uses Refresh Time, (vi) A¨ıt-Sahalia, Fan, and Xiu

(2010) which uses polarisation and Refresh time. We will use the notation θQML, θStep, θBloc, θRT ,

θKern, θPol, respectively, where θ is some particular parameter. We write this generically as θL, with L ∈ {QML, Step, Bloc, RT, Kern, Pol}, and the corresponding estimator as $\hat{\theta}_L$. All the estimators but (vi) deliver positive semi-definite estimates. Only (i) and (ii) use all the data; the others are based on Refresh Time. (i)-(iv) and (vi) converge at the optimal rate. (iii) should be the most efficient, followed by (i), then (ii), then (iv), then (vi) and finally (v).

5.3 Results

Throughout we report in Table 2 simulation based estimates of the 0.9 quantiles of $n^{1/4}|\hat{\theta}_L - \theta_L|$ for various values of n, L and ̥. The table also shows $n^{1/4}$, which allows us to see the actual speed at which the quantiles of $\hat{\theta}_L - \theta_L$ contract.
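The reported statistic is straightforward to reproduce from a set of replications; the empirical-quantile convention below (linear interpolation between order statistics) is our own choice, not necessarily the one used for the table.

```python
def quantile(xs, q):
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(xs)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def scaled_quantile(estimates, truth, n, q=0.9):
    """0.9 quantile of n^(1/4) |theta_hat - theta| across replications."""
    errs = [n ** 0.25 * abs(e - t) for e, t in zip(estimates, truth)]
    return quantile(errs, q)

# toy check: absolute errors 0.0, 0.1, ..., 0.9 with n = 1
est = [0.1 * i for i in range(10)]
tru = [0.0] * 10
val = scaled_quantile(est, tru, n=1)
```

With these ten toy errors the interpolated 0.9 quantile sits between the two largest order statistics.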

The results indicate that all six estimators perform roughly similarly for Σ11 and Σ22 when ̥ = 0.5, with a small degree of underperformance for RT, Kern and Pol. When the data were more unbalanced, with ̥ = 0.1 or 0.9, RT, Kern and Pol were considerably worse, while realised QML was the best by a small margin. Bloc was a little disappointing when estimating the volatility of the less liquid asset. QML almost always outperformed Step, but not by a great deal. The quantiles for Kern mildly increase with n, which is what we would expect due to its slower rate of convergence.

For the measure of dependence Σ12 there are signs that the realised QML type estimators “QML”, “Step” and “Bloc” perform better than “RT”. All seem to perform more strongly than the existing “Kern” and “Pol” estimators. The differences are less important in the case where ̥ = 0.5.

When we move on to ρ1,2 the differences become more significant, although we recall that realised QML and Step are identical in this case. When ̥ = 0.9 the Pol estimator struggles, with quantiles around twice those of QML and Bloc. A doubling of the quantile is massive, for these estimators converge at rate n^{1/4}, so halving a quantile needs the sample size to increase 2^4 = 16 fold. The results for Kern sit between Pol and QML, while RT is disappointing. This latter result shows it is the effect of Refresh Time sampling which is hitting these estimators. QML is able to coordinate the data more effectively. Similar results hold for ̥ = 0.1. Overall the new methods seem to deliver an order of magnitude improvement in the accuracy of the estimator. In the balanced sampling case of ̥ = 0.5 the differences are more moderate but similar. Before we progress to the regression case it is helpful to calibrate how accurately we have estimated ρ1,2 in the realised QML case. When ̥ = 0.1 the quantile is 2.61 with n = 117, so

20 ̥ = 0.9 ̥ = 0.5 ̥ = 0.1 n n1/4 QML Step Bloc RT Kern Pol QML Step Bloc RT Kern Pol QML Step Bloc RT Kern Pol 117 3.29 0.40 0.43 0.42 0.75 0.78 0.79 0.49 0.52 0.61 0.49 0.55 0.54 0.72 0.72 0.77 0.74 0.69 0.72 b Σ11 1,170 5.84 0.38 0.39 0.43 0.70 0.84 0.79 0.44 0.45 0.59 0.48 0.74 0.47 0.69 0.73 1.25 0.70 0.83 0.73 11,700 10.3 0.36 0.35 0.43 0.63 0.95 0.64 0.42 0.43 0.57 0.43 0.95 0.43 0.63 0.67 1.93 0.65 0.96 0.67 117 3.29 0.23 0.22 0.24 0.27 0.32 0.31 0.19 0.18 0.19 0.18 0.22 0.22 0.27 0.24 0.31 0.28 0.33 0.32 b Σ12 1,170 5.84 0.19 0.19 0.21 0.25 0.35 0.31 0.17 0.17 0.19 0.17 0.22 0.20 0.25 0.24 0.30 0.25 0.35 0.32 11,700 10.3 0.18 0.18 0.19 0.23 0.36 0.27 0.17 0.17 0.19 0.17 0.22 0.18 0.22 0.22 0.28 0.24 0.34 0.27 117 3.29 0.29 0.35 0.31 0.29 0.39 0.35 0.16 0.20 0.16 0.19 0.24 0.19 0.13 0.16 0.13 0.29 0.40 0.35 b Σ22 1,170 5.84 0.23 0.29 0.36 0.23 0.46 0.29 0.14 0.16 0.15 0.15 0.26 0.16 0.11 0.14 0.13 0.22 0.46 0.28 11,700 10.3 0.21 0.22 0.24 0.21 0.42 0.22 0.14 0.14 0.14 0.14 0.25 0.15 0.11 0.11 0.11 0.21 0.45 0.24 117 3.29 2.35 2.35 2.34 3.05 2.96 8.14 1.83 1.83 1.69 2.02 1.58 2.46 2.61 2.61 2.62 3.21 2.81 7.06 b 1.60 1.60 1.62 1.62 2.27 2.27 ρ1,2 1,170 5.84 1.83 2.53 2.54 3.10 1.66 1.66 1.72 1.86 2.31 2.50 2.52 3.04 1.48 1.48 1.43 1.43 1.99 1.99 21 11,700 10.3 1.92 2.17 2.61 2.47 1.70 1.54 2.07 1.67 2.77 2.11 2.57 2.47 117 3.29 6.36 11.70 6.26 7.10 7.69 14.09 3.20 3.28 3.66 3.36 3.41 4.05 5.12 4.97 5.33 7.76 7.91 13.98 b 3.18 2.61 3.56 β1|2 1,170 5.84 3.24 3.99 4.09 4.75 4.82 2.62 2.98 2.82 3.00 3.03 3.65 4.68 4.12 5.06 4.93 11,700 10.3 2.82 2.86 4.23 4.10 5.50 4.36 2.62 2.64 3.00 2.82 3.44 3.04 3.52 3.54 4.49 3.78 5.06 4.65 117 3.29 2.97 2.55 2.57 7.62 3.22 23.27 3.28 3.38 1.90 3.85 1.75 4.17 5.60 11.05 3.41 7.18 3.42 15.16 b 2.02 2.14 3.07 β2|1 1,170 5.84 2.05 2.13 3.91 3.17 4.44 2.23 2.17 2.25 2.60 2.51 3.31 3.79 3.89 3.29 4.72 11,700 10.3 1.68 1.67 2.67 3.04 3.26 3.20 1.75 1.76 2.72 1.91 3.61 1.97 2.66 2.65 4.90 2.83 3.50 3.24

Table 2: Monte Carlo results for the volatility, covariance, correlation and beta estimation. Throughout we report the 0.9 quantiles of $n^{1/4}|\hat{\theta}_L - \theta_L|$ over the 1,000 independent replications. ̥ denotes the proportion of the data corresponding to trades in asset 1. “QML” is our multivariate QMLE. “Step” is our multistep QMLE. “Bloc” is our blocked multivariate QMLE. “RT” is our multivariate QML using the Refresh

Time. “Kern” is the existing multivariate realised kernel. “Pol” is the existing polarisation and Refresh Time estimator.

the corresponding quantile for $\hat{\theta}_{QML} - \theta_{QML}$ is 0.794. When n = 1,170 it is 0.388. When n = 11,700 it is 0.191. In the balanced case ̥ = 0.5 the corresponding results are 0.556, 0.277 and 0.137. Hence balanced data helps, but not by very much so long as n is moderately large and the realised QML method is used. Balancing is much more important for RT, Kern and Pol. We think this makes the realised QML approach distinctly promising. A final point is worth noting. Even though n = 11,700, the quantile in the balanced case of 0.137 is not close to zero. Hence although we can non-parametrically estimate the correlation between assets, the estimation in practice is not without important error. This is important econometrically when we come to use these objects for forecasting or decision making. The regression cases deliver the same type of results as the correlation, with again QML, Step, and Bloc performing around an order of magnitude better than RT, Kern and Pol.

6 Empirical implementation

6.1 Our database

We use data from the cleaned trade database developed by Lunde, Shephard, and Sheppard (2012). It is taken from the TAQ database accessed through the Wharton Research Data Services (WRDS) system. They followed the step-by-step cleaning procedure used in Barndorff-Nielsen, Hansen, Lunde, and Shephard (2009). Their cleaning rules include using data from a single exchange, selected as the one which generates the most trades on each day. It should also be noted that no quote information is used in the data cleaning process. The exchanges open at 9.30 and close at 16.00 local time. An important feature of this TAQ data is that times are recorded to the nearest second, so we take the median of multiple trades which occur in the same second. This can be thought of as a form of miniature preaveraging. As prices are recorded in seconds, the maximum sample size is 60 × 60 × 6.5 = 23,400. The data range from 1st January 2006 until 31st December 2009. We have selected 13 stocks from the S&P 500 with the aim of having 2 infrequently traded and 11 highly traded assets. This will allow us to assess the estimator in different types of data environments. The assets we study are the Spyder (SPY), an S&P 500 ETF, along with some of the most liquid stocks in the Dow Jones 30 index. These are: Alcoa (AA), American Express (AXP), Bank of America (BAC), Coca Cola (KO), Du Pont (DD), General Electric (GE), International Business Machines (IBM), JP Morgan (JPM), Microsoft (MSFT), and Exxon Mobil (XOM). We supplement these 11 series with two relatively infrequently traded stocks: Washington Post (WPO) and Berkshire Hathaway Inc. New Com (BRK-B). These 13 series are “unbalanced” in terms of

22 individual daily sample sizes, while the restricted 11 series are reasonably “balanced”.
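The same-second median rule mentioned above can be sketched as follows; the timestamps and prices are toy values for illustration.

```python
from collections import defaultdict
from statistics import median

def collapse_to_seconds(timestamps, prices):
    """Replace multiple trades within the same second by their median price,
    a form of miniature preaveraging, as described in the text."""
    by_second = defaultdict(list)
    for t, p in zip(timestamps, prices):
        by_second[int(t)].append(p)
    seconds = sorted(by_second)
    return seconds, [median(by_second[s]) for s in seconds]

# three trades in second 0, one in second 2
ts = [0.1, 0.4, 0.9, 2.3]
px = [100.0, 100.2, 100.1, 100.3]
secs, med_px = collapse_to_seconds(ts, px)
```

After collapsing, at most one observation per second remains, consistent with the stated maximum daily sample size of 23,400.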

6.2 Summaries

6.2.1 Sample sizes

Figure 3 shows the sample sizes of each asset on each day through time. What we plot is the median sample size of the 13 series together with the following quantile ranges: 0 to 25%, 25% to 75%, 75% to maximum. These ranges are indicated by shading. This is backed up by a line indicating the median. In addition we show the Refresh Time sample sizes when we use the 11 assets (denoted RefT1) and the corresponding result for all 13 assets (RefT2). This type of “trading intensity graph” is highly informative and has the property that it scales with the number of assets. It first appeared in Lunde, Shephard, and Sheppard (2012) (who used it to look at hundreds of assets).

[Figure 3 here: the trading intensity graph, with log-scaled daily sample sizes (min-25%, 25-75% and 75%-max bands plus the median) and the Refresh Time sample sizes for the 11 and 13 asset sets.]

Figure 3: The trading intensity graph for our 13 assets. The figure plots the min, max, 25%, 50%, and 75% quantiles of the number of observations for the cleaned dataset. The Refresh Time sample sizes for the 11 asset (RefT1) and 13 asset (RefT2) databases are also plotted.

The trading intensity graph shows the median intensity for the 13 assets is around 5, 000 a day, slightly increasing through time. The maximum daily sample sizes are around 15, 000, a tad below the feasible maximum of 23, 400. The Refresh time for the 11 assets delivers a sample size of around 1, 000 a day. However, for the 13 asset case the Refresh time dives down to around 60 a day. This is, of course, driven by the presence of the slow trading WPO and BRK-B. This trading intensity graph demonstrates that the Refresh time approach is limited, for in large unbalanced systems it will lead to a significant reduction in the amount of data available to us. This could damage the effectiveness of the realised kernel or preaveraging in properly estimating

23 the covariances between highly active assets.
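For reference, the Refresh Time coordination that the kernel-type estimators rely on can be sketched as follows: the first refresh time is the first instant by which every asset has traded at least once, and each subsequent one is the first instant by which every asset has traded again. This is a generic sketch of the standard algorithm, not code from the paper.

```python
def refresh_times(trade_times):
    """Refresh Time sampling for a list of sorted per-asset trade-time lists."""
    idx = [0] * len(trade_times)
    tau = max(t[0] for t in trade_times)   # first time all assets have traded
    out = []
    while True:
        out.append(tau)
        fresh = []
        for j, t in enumerate(trade_times):
            i = idx[j]
            while i < len(t) and t[i] <= tau:  # advance past the refresh time
                i += 1
            if i == len(t):                    # some asset never trades again
                return out
            idx[j] = i
            fresh.append(t[i])
        tau = max(fresh)                       # next time all have traded again

times = refresh_times([[1, 3, 5, 7, 9], [2, 4, 8, 10]])
```

A slow asset drags the refresh grid down to roughly its own trading frequency, which is exactly the data loss for unbalanced systems discussed above.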

6.3 Volatilities

6.3.1 Summary statistics

We start our analysis of the realised quantities by looking at the univariate volatilities. We will compare realised QML with realised kernels and realised volatilities. The comparison will be made using estimators which use the one dimensional datasets and those based on the 13 dimensional series. The question is whether the use of the high dimensional series will disrupt the behaviour of the volatility estimators, due to the use of Refresh Time⁴. In our web appendix we give a detailed analysis of each of the 13 sets of series and their associated volatility estimators. Here we will focus on a single representative series, Exxon Mobil Corporation common stock (XOM), and some cross sectional summaries. It is important not to overreact to the specific features of a single series, so we will make remarks only on characteristics which hold in the cross section. Figure 4 shows 6 graphs. The first graph has the daily open to close returns, scaled by √252 to place the data on an annual scale. Here 2 roughly represents 200% annualised volatility. The middle and right hand top graphs show the time series of the daily estimates of the QML volatility in the series, drawn on the log10 scale. The first uses the univariate data, the second comes out of the 13 dimensional covariance fit. There is not a great deal of difference. For those unfamiliar with these types of non-parametric graphs, there is absolutely no time series smoothing here: the open to close volatility each day is estimated only with data on that day. The results are stunning: the volatility changes by an order of magnitude during this period. As the web appendix shows, this is entirely common across the assets. We can now turn to Table 3, which shows summaries of the volatility estimators. They work in the following ways. Means are the square root of the average daily variance measure. For Open to Close returns this is simply the daily standard deviation. For the realised quantities this is the square root of the time series mean of the daily squared volatility. For the autocorrelations, we report results for lag 1 and lag 100. The results are always the autocorrelations of the

⁴To be explicit, when we compute the realised kernel we take the data and approximately synchronise it using Refresh Time. This synchronised dataset is then used in all the realised kernel calculations. In the case where the analysis is carried out using the univariate databases, Refresh Time has no impact. In the 13 dimensional case it dramatically reduces the sample size due to the inclusion of the slow trading markets. When we sparsely sample, we first sparsely sample and then compute the Refresh Time coordination. This has less impact than one might expect at first sight, as sparse sampling has little impact on the slow trading stocks and these largely determine the Refresh Times. Hence the realised kernel will not be very impacted by sparse sampling in the multivariate case. There is an argument that with the realised kernel we should only report results for sparsity being one, as that is the way it was introduced. For completeness though we have recomputed it for all the different levels of sparsity.

[Figure 4 here: six panels — daily OtC XOM returns; QML 1D and QML 13D annualised volatilities; the acf of the volatility estimators (QML, RK, RVol, absolute and squared returns); and volatility signature plots for QML and for RK.]
Figure 4: The log daily returns scaled by √252 to produce annual numbers. Vol for XOM computed using QML on the univariate series and in the 13 dimensional case. Multiply numbers by 100 to get yearly % changes. All acf calculations are based on volatilities, not variances, except for the squared returns. Vol sig are volatility signature plots, which plot, as a function of sparsity, the square root of the temporal average of the realised variance, realised QML, etc. The middle picture shows the results for QML; the right hand side shows the results for the realised kernel (RK).

volatilities, not the squared volatilities. In terms of Open to Close returns, this means we report the autocorrelations of the absolute returns. The Table reports results for 1 and 13 dimensions. It also gives results for different levels of sparse sampling, ranging from 1 trade (which means all the available 1 second return data is used) to 300 trades (which means taking every 300th trade). As the level of sparsity increases we might expect any bias in these estimators to fall, but for them to become more variable.
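The summary statistics in Table 3 are simple to compute from a daily series of variance estimates. The sketch below uses our own conventions for the mean and the empirical acf (mean-centred, normalised by the lag-0 sum), which are assumptions rather than the paper's exact code.

```python
import math

def mean_vol(daily_var):
    """Square root of the time series mean of the daily variance measures."""
    return math.sqrt(sum(daily_var) / len(daily_var))

def acf(x, lag):
    """Empirical autocorrelation at a given lag, normalised by the lag-0 sum."""
    m = sum(x) / len(x)
    d = [v - m for v in x]
    num = sum(d[t] * d[t - lag] for t in range(lag, len(d)))
    den = sum(v * v for v in d)
    return num / den

vols = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy daily volatility series
r1 = acf(vols, 1)
```

For the OtC row of the table the same `acf` function would be applied to the absolute daily returns rather than a realised volatility series.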

 Vol                     QML                   RK                    RV
  Mkt   Dimen  Spar   Mean  acf 1  acf 100  Mean  acf 1  acf 100  Mean  acf 1  acf 100
  XOM   OtC           0.27  0.32   0.03
  XOM   1      1      0.30  0.89   0.13     0.29  0.84   0.10     0.29  0.88   0.10
  XOM   1      2      0.29  0.84   0.10     0.29  0.83   0.09     0.29  0.87   0.11
  XOM   1      3      0.28  0.83   0.10     0.29  0.82   0.09     0.29  0.86   0.10
  XOM   1      5      0.28  0.83   0.10     0.29  0.81   0.09     0.29  0.85   0.10
  XOM   1      10     0.29  0.82   0.11     0.29  0.80   0.09     0.29  0.83   0.09
  XOM   1      15     0.28  0.82   0.10     0.29  0.80   0.09     0.29  0.83   0.09
  XOM   1      20     0.28  0.81   0.10     0.28  0.80   0.09     0.29  0.83   0.09
  XOM   1      30     0.28  0.79   0.09     0.28  0.81   0.09     0.29  0.82   0.09
  XOM   1      60     0.28  0.80   0.10     0.28  0.82   0.08     0.29  0.79   0.09
  XOM   1      120    0.28  0.81   0.10     0.27  0.81   0.09     0.29  0.79   0.09
  XOM   1      180    0.28  0.78   0.09     0.27  0.80   0.09     0.29  0.79   0.08
  XOM   1      300    0.28  0.76   0.09     0.27  0.78   0.08     0.28  0.81   0.08
  XOM   13     1      0.27  0.86   0.12     0.26  0.78   0.11     0.29  0.88   0.10
  XOM   13     2      0.28  0.84   0.10     0.25  0.78   0.12     0.29  0.87   0.11
  XOM   13     3      0.28  0.84   0.10     0.25  0.78   0.12     0.29  0.86   0.10
  XOM   13     5      0.28  0.83   0.10     0.25  0.76   0.11     0.29  0.85   0.10
  XOM   13     10     0.29  0.82   0.11     0.24  0.74   0.09     0.29  0.83   0.09
  XOM   13     15     0.29  0.82   0.11     0.24  0.72   0.09     0.29  0.83   0.09
  XOM   13     20     0.29  0.81   0.10     0.24  0.72   0.09     0.29  0.83   0.09
  XOM   13     30     0.29  0.80   0.10     0.24  0.71   0.09     0.29  0.82   0.09
  XOM   13     60     0.29  0.82   0.10     0.24  0.74   0.09     0.29  0.79   0.09
  XOM   13     120    0.29  0.78   0.10     0.24  0.72   0.08     0.29  0.79   0.09
  XOM   13     180    0.30  0.78   0.09     0.24  0.71   0.08     0.29  0.79   0.08
  XOM   13     300    0.29  0.80   0.10     0.24  0.72   0.09     0.28  0.81   0.08

Table 3: OtC summaries. Unconditional volatilities of annualised daily log returns. Multiply by 100 to produce percentages. The acf of the OtC is the acf of the absolute value of the daily returns.

We first focus on the one dimensional case. The Table indicates that at a sparsity of 1 the QML estimator is slightly above the OtC average level of volatility, while this average level is roughly constant as sparsity varies above 1. The autocorrelation of these realised quantities is very much higher than for the absolute returns, both at lag 1 and 100. Roughly similar results hold for the realised kernel. Here the realised volatility does quite well, with roughly the same average value and a high degree of autocorrelation. This is the case in roughly half the series; the other half show quite pronounced upward bias in the estimator for low levels of sparsity. This is the famous upward bias of realised volatility due to market microstructure noise which has prompted

 GARCHX                  QML                       RK                        RV
  Mkt   Dimen  Spar   α     β     γ     logL    α     β     γ     logL    α     β     γ     logL
  XOM   1             0.10  0.86
  XOM   1      1      0.03  0.58  0.24  30.8    0.01  0.62  0.26  31.7    0.01  0.51  0.34  31.4
  XOM   1      2      0.01  0.62  0.26  33.3    0.01  0.64  0.26  31.8    0.01  0.55  0.30  31.8
  XOM   1      3      0.01  0.63  0.27  32.8    0.01  0.64  0.25  31.6    0.01  0.60  0.27  31.9
  XOM   1      5      0.01  0.63  0.27  32.7    0.01  0.66  0.24  31.4    0.01  0.63  0.25  31.7
  XOM   1      10     0.01  0.67  0.23  31.3    0.01  0.69  0.22  31.1    0.01  0.64  0.26  32.5
  XOM   1      15     0.01  0.70  0.21  30.3    0.01  0.70  0.22  30.2    0.01  0.64  0.25  31.7
  XOM   1      20     0.01  0.73  0.19  29.6    0.01  0.71  0.21  29.7    0.01  0.65  0.24  31.7
  XOM   1      30     0.01  0.74  0.18  27.8    0.01  0.72  0.20  28.8    0.01  0.67  0.23  31.3
  XOM   1      60     0.01  0.75  0.18  27.3    0.00  0.72  0.22  28.0    0.01  0.71  0.20  30.0
  XOM   1      120    0.01  0.74  0.19  27.3    0.00  0.74  0.21  26.2    0.01  0.74  0.19  28.1
  XOM   1      180    0.00  0.75  0.19  27.3    0.01  0.75  0.20  27.1    0.01  0.75  0.17  26.9
  XOM   1      300    0.00  0.77  0.17  25.7    0.01  0.76  0.19  26.3    0.01  0.75  0.18  25.8
  XOM   13     1      0.02  0.63  0.27  31.9    0.00  0.72  0.25  29.9    0.01  0.51  0.34  31.4
  XOM   13     2      0.01  0.63  0.29  33.0    0.00  0.71  0.26  30.3    0.01  0.54  0.30  31.7
  XOM   13     3      0.01  0.63  0.29  32.7    0.00  0.74  0.23  29.3    0.02  0.56  0.29  31.6
  XOM   13     5      0.01  0.63  0.27  32.1    0.00  0.77  0.23  30.0    0.01  0.60  0.27  31.8
  XOM   13     10     0.01  0.65  0.24  31.2    0.00  0.76  0.24  27.5    0.01  0.62  0.26  32.1
  XOM   13     15     0.01  0.67  0.23  30.7    0.00  0.77  0.23  27.2    0.01  0.63  0.26  32.5
  XOM   13     20     0.01  0.69  0.21  29.4    0.00  0.76  0.24  28.7    0.01  0.62  0.26  32.5
  XOM   13     30     0.01  0.70  0.20  29.0    0.00  0.77  0.24  29.2    0.01  0.65  0.24  32.4
  XOM   13     60     0.02  0.71  0.20  28.3    0.00  0.76  0.25  27.8    0.01  0.69  0.22  31.4
  XOM   13     120    0.01  0.75  0.18  27.5    0.00  0.78  0.23  26.8    0.01  0.70  0.21  30.4
  XOM   13     180    0.01  0.76  0.16  27.0    0.00  0.80  0.22  26.5    0.01  0.72  0.19  28.9
  XOM   13     300    0.01  0.76  0.16  25.7    0.00  0.81  0.20  21.3    0.01  0.74  0.18  26.7

Table 4: Forecasting exercises. GARCHX models. LogL denotes the increase in the log-likelihood compared to the GARCH model. $\sigma_t^2 = \omega + \alpha y_{t-1}^2 + \beta\sigma_{t-1}^2 + \gamma x_{t-1}$, where $x_t$ is a realised quantity.

such a large amount of theoretical work (e.g. Zhou (1996), Zhang, Mykland, and A¨ıt-Sahalia (2005), Jacod, Li, Mykland, Podolskij, and Vetter (2009) and Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008)). When we move to the 13 dimensional case the only substantial impact is on the realised kernel, where the autocorrelations fall considerably. This is consistent with a deterioration in their quality caused by the use of Refresh Time. The QML estimator is hardly affected and of course realised volatility is not affected at all. If we return to Figure 4, some of these points are reiterated. The bottom left shows the autocorrelation function of the different estimators and shows the deterioration in the realised kernel with dimension. It also shows the acf of the absolute and squared returns, indicating how much noisier these estimators are. The middle graph shows the volatility signature plot of the realised quantities as a function of the level of sparsity. Recall the volatility signature plot is the square root of the temporal average of the realised estimator of the variance. It should be roughly flat as a function of sparsity if the estimator is robust to market microstructure effects. This is

true here. The seemingly small upward bias in some of the estimators is not material and will be matched by some other series which go the other way; QML and realised kernel are basically unbiased in this cross section. The results do not change when we look at the right hand picture, which shows the same thing for the 13 dimensional case. Realised kernels do not really exhibit bias in this case; rather, Refresh Time impacts their variability.

6.3.2 Volatility forecasting

Another way of seeing the relative importance of these realised measures is through a prediction exercise. Here we use GARCHX models, supplementing the usual GARCH models of returns with X variables which are lagged realised variance type quantities. In particular we fit σ²_t = Var(y_t | F^{y,x}_{t−1}) where

σ²_t = ω + α y²_{t−1} + β σ²_{t−1} + γ x_{t−1}.

Here y_t is the t-th open to close return. These kinds of extended GARCH models are now common in the literature; examples include Engle and Gallo (2006), Brownlees and Gallo (2010), Shephard and Sheppard (2010), Noureldin, Shephard, and Sheppard (2012), Hansen, Huang, and Shek (2011) and Hansen, Lunde, and Voev (2010). The model is fitted using a Gaussian quasi-likelihood with

log L = −(1/2) ∑^n_{t=12} ( log σ²_t + y²_t/σ²_t ),

taking σ²_1 = (1/11) ∑^{11}_{t=1} y²_t. Here we will report only the estimated α, β, γ and the change in the log-likelihood in comparison with the simpler GARCH model. Clearly those changes are going to be non-negative by construction. If the presence of the realised quantity moves the likelihood up a great deal we will think this is evidence for its statistical usefulness. The results for all 13 assets are in our Web Appendix. We first just focus on the XOM case. Table 4 shows the results in the univariate and 13 dimensional cases. The results show across the board important improvements when using the realised quantities, and α is basically forced to zero. This is the common feature of these models in the literature: once the realised quantities are there, there is no need for the squared return (Shephard and Sheppard (2010)). Further, β falls dramatically and meaningfully. It means that the average lookback of the forecast on past data has reduced considerably, making it also more robust to structural breaks. For some series the realised volatility adds little when the sparsity is 1 (due to the impact of the market microstructure), but this is not the case for XOM. There is some evidence that the QML

[Figure 5 appears here: four panels (QML1, RK1, QML13, RK13) plotting the logL improvement against sparsity for AA, AXP, BAC, DD, GE, IBM, JPM, KO, MSFT, SPY, XOM and the cross-sectional median.]

Figure 5: logL improvement in GARCHX compared to GARCH, where X is the realised quantity. High values are thus good. X-axis is the level of sparsity. Top left is the 1 dimensional realised QML result. Top right is the 1 dimensional realised kernel result. Bottom left is the 13 dimensional realised QML. Bottom right is the 13 dimensional realised kernel.

estimator does a little better when sparsity is a tiny amount above 1. All the estimators trail off as the sparsity gets large. When we move to the 13 dimensional case the QML results hardly change and obviously the RVol case does not change at all. The realised kernel results are reasonably consistently damaged in this case, although the damage is not exceptional. We can now average the log-likelihood changes using the cross-section of thickly traded stocks. The results are given in Figures 5, 6 and 7. Figure 5 shows the change in the likelihood by including the realised quantities for the different assets, drawn against the level of sparsity. The thick line is the cross-sectional median. QML1 is the QML estimator based on the univariate series, while the QML13 estimator uses all 13 assets. RK denotes the corresponding results for the realised kernel. There are three noticeable features. The QML1 and QML13 results are very close. QML1 results get better as we move sparsity to 2 or 3 and then tail off. RK just tails off. RK13 looks a little worse than RK1. Figure 6 reports the likelihood for QML minus the likelihood for RK. On the left hand side we deal with the univariate case. On the right the 13 dimensional case is the focus. So negative numbers prefer RK. This shows that in the univariate case at sparsity of 1 RK is better, but this

[Figure 6 appears here: two panels, "1 dim difference, >0 favours QML" and "13 dim difference, >0 favours QML", plotted against sparsity for each asset and the cross-sectional median.]

Figure 6: logL improvement of realised QML minus logL of RK, with numbers greater than 0 being supportive of QML. X-axis is the level of sparsity. Left hand side is the 1 dimensional case. Right hand side is the 13 dimensional case.

preference is removed by the time we reach sparsity of 3. After that they are basically the same. When we look at the 13 dimensional case, except for sparsity of 1, QML is better. This is consistent across many different levels of sparsity. Finally, Figure 7 shows the difference in likelihood of the 1 dimensional model minus the likelihood for the 13 dimensional model. Thus a positive number means some damage is done to the predictions by using the 13 dimensional data rather than the univariate data. On the left hand side we report QML, on the right is RK. There are two things to see. First, QML shows little change on average and the scatter in changes has little width. This means QML is hardly affected by the increase in dimension of the problem. The results for RK are very different: there is a substantial reduction in the average fit, shown by the dark line, at all levels of sparsity. But further, the scatter in changes is quite large. Hence RK is indeed sensitive to the dimension, as expected.
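The GARCHX recursion and Gaussian quasi-likelihood used throughout this subsection can be sketched as follows. This is our own minimal implementation, not the paper's code, and the simulated x series and parameter values are purely illustrative; the burn-in convention follows the text.

```python
import numpy as np

def garchx_filter(y, x, omega, alpha, beta, gamma, sigma2_init):
    """GARCHX conditional variance recursion:
       sigma2[t] = omega + alpha*y[t-1]**2 + beta*sigma2[t-1] + gamma*x[t-1]."""
    sigma2 = np.empty(len(y))
    sigma2[0] = sigma2_init
    for t in range(1, len(y)):
        sigma2[t] = omega + alpha * y[t - 1] ** 2 + beta * sigma2[t - 1] + gamma * x[t - 1]
    return sigma2

def gaussian_quasi_loglik(y, sigma2, burn=11):
    """-(1/2) * sum over t > burn of (log sigma2[t] + y[t]**2 / sigma2[t])."""
    s2, yy = sigma2[burn:], y[burn:]
    return -0.5 * float(np.sum(np.log(s2) + yy ** 2 / s2))

# Illustrative use: in the paper x would be a lagged realised quantity
# (realised QML, realised kernel or realised variance); here it is simulated.
rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, 250)
x = y ** 2
sigma2 = garchx_filter(y, x, omega=0.01, alpha=0.05, beta=0.80, gamma=0.10,
                       sigma2_init=float(np.mean(y[:11] ** 2)))
ll = gaussian_quasi_loglik(y, sigma2)
```

The logL improvements reported in Table 4 correspond to this objective evaluated at the GARCHX optimum minus the same objective maximised with γ = 0.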

6.4 Dependence

6.4.1 Summary statistics

We now turn to looking at covariation amongst the assets. We will focus on QML, realised kernel and realised covariance estimators, which are computed each day. In all cases they will be based

[Figure 7 appears here: two panels, "QML change, >0 bad" and "RK change, >0 bad", plotted against sparsity for each asset and the cross-sectional median.]

Figure 7: logL from the GARCHX model using the 1 dimensional realised measure minus the logL from the GARCHX model using the 13 dimensional realised measure. Positive results thus suggest a fall in the fit as the dimension of the realised quantity increases. X-axis is the level of sparsity. Left hand side has the realised QML results. Right hand side has the realised kernel results.

[Figure 8 appears here. Top row: "BAC and SPY QML Vol", "BAC and SPY beta", "QML Cor: BAC, SPY, Spar=10", plotted over 2006-2010. Bottom row: "Acf for BAC, SPY, Spar=10", "Cov sig plot for BAC, SPY", "logL comparison for BAC, SPY", plotted against sparsity.]

Figure 8: Summary picture for the BAC and SPY pair. All realised quantities are computed using all 13 series. Top left shows the conditional volatilities based upon the realised QMLs. Top middle, the conditional beta for BAC on SPY. Top right is the daily realised QML correlation. Bottom left is the acf of the realised QML, realised kernel and realised covariance estimators; also shown is the acf of the cross-product of the daily returns. Bottom middle is the covariance signature plot, graphed against the level of sparsity. Bottom right is the log likelihood improvement in model fit by including the realised quantity, graphed against the level of sparsity.

on the 13 dimensional unbalanced database, which means we compute each day a 13 by 13 positive semidefinite estimate of the covariance matrix. To look inside the covariance matrix we will focus on pairs of assets. To be concrete our focus will be on Bank of America (BAC) and SPY; the other 77 pairs are discussed in our web appendix. Again we will only flag up issues which hold up in the cross section. The top left of Figure 8 shows the time series evolution of the conditional volatilities for these series based upon the past QMLs. Again these are plotted on the log10 scale. Of course it shows the typically lower level in the SPY series. What is key is that the wedge between these two series dramatically opens from 2008 onwards, which means the ratio of the standard deviation of BAC to SPY has increased a great deal. If the correlation between the two series is stable this would deliver a massive increase in the "beta" of BAC. This is what actually happened, as can be seen in the middle top graph, which we will discuss in a moment. The top right hand graph shows the time series of the daily correlations computed using the QML method. This shows a moderate increase in correlation during the crisis from around 0.55

up to around 0.7, with some weakening of the correlation from 2009 onwards after TARP. If we return to the top middle graph we can see how enormously the beta changes through time. It was relatively stable until close to the end of 2007, but then it rapidly exploded, reaching around 5 during some periods of the crisis. Nearly all of this move is a volatility induced change, although the correlation shift also has some impact. This boosting of the beta is also seen in our sample, but to a lesser extent, for J P Morgan and GE. The other stocks have more stable betas. Of course, like Bank of America, J P Morgan is another financial company, while GE at the time had a significant exposure to the finance business. Bottom left shows the autocorrelations of the QML, realised kernel, realised covariance and cross products of the open to close returns. These time series are quite heavy tailed and so heavily influenced by a handful of datapoints. This means it is particularly important to be careful in thinking through what these pictures mean. There are a number of interesting features here. First the raw open to close returns have a small amount of autocorrelation in them, just as squared daily returns are only moderately autocorrelated. The realised kernel is quite a bit better, but the realised covariance and QML show stronger autocorrelation. This indicates they are less noisy estimators than the realised kernel and the open to close return based measure, although they have biases.

Covariance
              QML                      RK                      RV
Spar    Mean  acf 1  acf 100    Mean  acf 1  acf 100    Mean  acf 1  acf 100
OtC     0.09  0.20    0.07
  1     0.03  0.79    0.21      0.06  0.55    0.17      0.01  0.68    0.16
  2     0.04  0.80    0.19      0.06  0.53    0.16      0.02  0.69    0.13
  3     0.05  0.80    0.18      0.06  0.45    0.14      0.03  0.74    0.14
  5     0.05  0.79    0.18      0.06  0.51    0.16      0.04  0.72    0.15
 10     0.06  0.76    0.18      0.06  0.51    0.18      0.05  0.74    0.15
 15     0.06  0.76    0.19      0.06  0.54    0.18      0.05  0.75    0.16
 20     0.07  0.74    0.20      0.05  0.54    0.19      0.06  0.76    0.17
 30     0.07  0.70    0.20      0.05  0.51    0.19      0.06  0.74    0.17
 60     0.07  0.61    0.16      0.05  0.55    0.20      0.07  0.67    0.18
120     0.07  0.52    0.16      0.05  0.56    0.21      0.07  0.61    0.16
180     0.08  0.66    0.22      0.05  0.54    0.20      0.07  0.55    0.14
300     0.08  0.56    0.21      0.05  0.44    0.17      0.07  0.63    0.18

Table 5: BAC and SPY pair summaries for the covariances. OtC denotes the open to close (annualised) daily log returns and the figure which follows it is the sample covariance of the returns. The acf of the OtC is the acf of the cross product of the returns, here reported at lags 1 and 100. QML, RK and RV are, respectively, the realised QML, the realised kernel and the realised covariance. All estimate the daily covariance of the series and are computed using the 13 dimensional database. Spar denotes sparsity. Mean is the temporal average of the time series.

Table 5 is more informative. It first shows the average open to close covariance during this

sample. Then the Table shows the average level of the daily covariance estimator, using the QML, realised kernel and realised covariance. This is printed out for different levels of sparsity. The focus here is thus on the different biases in the estimators.⁵ Estimating covariances is hard. The famous Epps effect can be seen through RV, which underestimates the covariance by an order of magnitude when using every trade. It takes sparsity of 10 to reproduce half of the correlation. The same impression can be gleaned from looking at the lower part of Figure 8, which is a covariance signature plot: the temporal average of the daily covariance estimators as a function of the level of sparsity. We can see that for low levels of sparsity the realised kernel is the least biased and the realised covariance the most. For higher levels of sparsity the realised kernel gets a little worse while both the QML and realised covariance improve and overtake the realised kernel. Throughout, QML is better than the realised covariance by a considerable margin. The Table also shows the autocorrelation of the individual time series, recording results at lags 1 and 100. These results are striking, with more dependence for the QML series than the realised kernel. This matches results we saw for the volatilities in the previous subsection.
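The Epps effect itself is easy to reproduce on simulated data. The sketch below is illustrative and is not the paper's sampling scheme: two perfectly correlated efficient prices are observed at independent random times, aligned by previous-tick interpolation, and the realised covariance is computed on coarse and fine grids.

```python
import numpy as np

def previous_tick(times, logp, grid):
    """Previous-tick interpolation: the most recent observation at or before
    each grid point (the first observation is reused before any tick)."""
    idx = np.searchsorted(times, grid, side="right") - 1
    return logp[np.maximum(idx, 0)]

def realised_covariance(t1, p1, t2, p2, n_intervals):
    """Realised covariance of two asynchronous log-price series sampled
    onto a regular grid on [0, 1] by previous-tick interpolation."""
    grid = np.linspace(0.0, 1.0, n_intervals + 1)
    r1 = np.diff(previous_tick(t1, p1, grid))
    r2 = np.diff(previous_tick(t2, p2, grid))
    return float(np.sum(r1 * r2))

# One simulated day: a single efficient Brownian log price drives both
# assets (so the true covariation equals the integrated variance, 1e-4),
# observed at 2,000 independent random times per asset.
rng = np.random.default_rng(2)
fine = np.linspace(0.0, 1.0, 23401)
efficient = np.insert(np.cumsum(rng.normal(0.0, 0.01 / np.sqrt(23400), 23400)), 0, 0.0)
t1 = np.sort(rng.uniform(0.0, 1.0, 2000))
t2 = np.sort(rng.uniform(0.0, 1.0, 2000))
p1 = np.interp(t1, fine, efficient)
p2 = np.interp(t2, fine, efficient)

rc_coarse = realised_covariance(t1, p1, t2, p2, 78)     # roughly 5 minute sampling
rc_fine = realised_covariance(t1, p1, t2, p2, 23400)    # roughly 1 second sampling
```

On the fine grid few intervals contain a fresh observation of both assets, so rc_fine collapses towards zero while rc_coarse stays near the true value: exactly the pattern of the covariance signature plot.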

6.4.2 Dependence forecasting

We now move on to forecasting. The focus will be on the conditional covariance matrix

Σ_t = Cov(y_t | F^{y,x}_{t−1}),

where y_t is the d-dimensional open to close daily return vector. We will write Σ_t = D_t R_t D_t where

D_t is a diagonal matrix with conditional standard deviations on the diagonal, where the conditional variances are σ²_{i,t} = Var(y_{i,t} | F^{y,x}_{t−1}) with

σ²_{i,t} = ω_i + α_i y²_{i,t−1} + β_i σ²_{i,t−1} + γ_i x_{i,t−1},    ω_i, α_i, β_i, γ_i ≥ 0.

This is the same conditional volatility model as we used in the previous subsection. We use these volatilities to construct

e_{i,t} = y_{i,t}/σ_{i,t},    (11)

the i-th devolatilised series.

Here R_t is a conditional correlation matrix with i, j-th element

Cor(y_{i,t}, y_{j,t} | F^{y,x}_{t−1}) = Cor(e_{i,t}, e_{j,t} | F^{y,x}_{t−1}).

It is the focus of this subsection.

⁵ Recall the realised kernel uses Refresh Time which is a kind of sparse sampling. Hence the fact that it has less bias at small levels of sparse sampling is not a surprise.

In this exercise we will assume the following dynamic evolution

R_t = ωΠ + αC_{t−1} + βR_{t−1} + γX_{t−1},    ω, α, β, γ ≥ 0,    (12)

where ω + α + β + γ = 1, X_t is a realised type correlation matrix and Π is a matrix of parameters which forms a correlation matrix. Here the ij-th element of C_t is a moving block correlation of the devolatilised series

C_{i,j,t} = ∑^M_{s=1} e_{i,t−s} e_{j,t−s} / √( (∑^M_{s=1} e²_{i,t−s}) (∑^M_{s=1} e²_{j,t−s}) ) ∈ [−1, 1].

In the case where M = d and there are no X variables, (12) is the Tse and Tsui (2002) model.⁶ However, throughout we take M = 66, representing around 3 months of past data. In our experiments X will represent the realised QML, realised kernel and realised covariance matrices. We will take R_t = C_M for all t ≤ M.

The key feature of (12) is that it is made up of the weighted sum of four correlation matrices, where the weights sum to one. Hence R_t is always a correlation matrix. If X is biased, for example, then Π can partially compensate by not being the unconditional correlation of the innovations (11). We will see this happen in practice.

In order to tune the model and to assess the fit, we will work with the joint log-likelihood function

log L = −(1/2) ∑_t ( log|Σ_t| + y′_t Σ^{−1}_t y_t )

= log L_M + log L_C,

where

log L_M = −(1/2) ∑_t ( log|D_t|² + (D^{−1}_t y_t)′(D^{−1}_t y_t) ) = ∑^d_{i=1} log L_{M_i},

log L_{M_i} = −(1/2) ∑^n_{t=M+1} ( log σ²_{it} + y²_{it}/σ²_{it} ),

log L_C = −(1/2) ∑^n_{t=M+1} ( log|R_t| + e′_t R^{−1}_t e_t − e′_t e_t ).

⁶ An alternative would be to use the DCC model, which would have the form

Q_t = ωΠ + α e_{t−1} e′_{t−1} + β Q_{t−1} + γ X_{t−1}, where

R_t = diag(Q_t)^{−1/2} Q_t diag(Q_t)^{−1/2}.

Unfortunately the impact of the non-linear transform for R_t could be rather gruesome on the realised correlation matrix X_{t−1}, as the rescaling by the diagonal elements of Q_t destroys all of its attractive properties. DCC models are discussed in Engle (2009).

We define here e_t = D^{−1}_t y_t, the vector of "devolatilised returns". The log L_C term is set up to be a copula type likelihood. We estimate the model using a two-step procedure, which can be formalised using the method of moments (e.g. Newey and McFadden (1994)). First we estimate the univariate models, and fix the volatility dynamic parameters at those estimated values. We then estimate the dependence model by optimising log L_C. A review of the literature on multivariate models is given by Silvennoinen and Teräsvirta (2009) and Engle (2009).
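The correlation recursion (12), the moving block correlation and the log L_C objective can be sketched as follows. This is our own illustrative implementation, not the paper's code, and the parameter values in the usage below are invented rather than estimated.

```python
import numpy as np

def block_correlation(E, t, M):
    """Moving block correlation C_t built from e_{t-M}, ..., e_{t-1}
    (rows t-M..t-1 of the devolatilised return matrix E)."""
    block = E[t - M:t]
    cross = block.T @ block
    scale = np.sqrt(np.diag(cross))
    return cross / np.outer(scale, scale)

def correlation_filter(E, X, Pi, alpha, beta, gamma, M):
    """R_t = omega*Pi + alpha*C_{t-1} + beta*R_{t-1} + gamma*X_{t-1} with
    omega = 1 - alpha - beta - gamma: a weighted sum of correlation
    matrices, hence itself always a correlation matrix."""
    omega = 1.0 - alpha - beta - gamma
    n, d = E.shape
    R = np.empty((n, d, d))
    R[:M + 1] = block_correlation(E, M, M)          # R_t = C_M for t <= M
    for t in range(M + 1, n):
        R[t] = (omega * Pi + alpha * block_correlation(E, t - 1, M)
                + beta * R[t - 1] + gamma * X[t - 1])
    return R

def log_L_C(E, R, M):
    """Copula-type objective:
       -(1/2) * sum_t (log|R_t| + e_t' R_t^{-1} e_t - e_t' e_t)."""
    total = 0.0
    for t in range(M + 1, len(E)):
        e = E[t]
        _, logdet = np.linalg.slogdet(R[t])
        total += -0.5 * (logdet + e @ np.linalg.solve(R[t], e) - e @ e)
    return total
```

In the two-step procedure the univariate GARCHX fits supply E; log L_C is then maximised over (Π, α, β, γ) subject to the weights being non-negative and summing to at most one.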

                  QML                               RK                                RV
Spar     ρ     α     β     γ   logL       ρ     α     β     γ   logL       ρ     α     β     γ   logL
base   0.69  0.59  0.00  0.00          0.69  0.59  0.00  0.00           0.69  0.59  0.00  0.00
  1    0.99  0.31  0.13  0.25   5.5    0.94  0.06  0.73  0.14   6.8     0.99  0.43  0.00  0.17   2.0
  2    1.00  0.12  0.41  0.25   8.3    0.99  0.09  0.69  0.17   8.3     0.99  0.31  0.00  0.23   3.7
  3    1.00  0.08  0.53  0.23   9.0    0.80  0.09  0.71  0.11   3.2     0.99  0.13  0.34  0.18   4.8
  5    0.99  0.05  0.61  0.21   9.6    0.78  0.09  0.70  0.10   2.8     0.99  0.07  0.44  0.18   6.0
 10    0.99  0.06  0.64  0.20   7.5    0.81  0.05  0.77  0.10   4.4     0.99  0.07  0.50  0.19   5.9
 15    0.99  0.07  0.72  0.15   5.1    0.80  0.07  0.71  0.12   4.8     0.99  0.07  0.55  0.18   5.9
 20    0.99  0.07  0.74  0.14   4.1    0.80  0.07  0.74  0.10   2.9     0.99  0.08  0.55  0.19   5.5
 30    0.99  0.06  0.78  0.11   4.0    0.84  0.05  0.74  0.12   5.1     0.99  0.10  0.57  0.18   4.9
 60    0.96  0.07  0.79  0.11   3.4    0.87  0.04  0.76  0.12   5.9     0.99  0.10  0.64  0.16   4.7
120    0.95  0.08  0.76  0.11   3.7    0.82  0.08  0.71  0.09   3.2     0.99  0.14  0.56  0.20   4.9
180    0.89  0.09  0.71  0.14   4.3    0.82  0.08  0.72  0.08   2.5     0.99  0.13  0.57  0.21   4.7
300    0.85  0.09  0.70  0.12   2.9    0.82  0.08  0.72  0.08   1.8     0.99  0.10  0.69  0.15   4.6

Table 6: Forecasting exercises for cross-sectional dependence between BAC (Bank of America) and SPY (S&P 500 exchange traded fund). The estimated model is (12). Here ρ is the non-unit element of Π. Note that ω = 1 − α − β − γ and so is not reported here. LogL is the improvement in the log likelihood compared to the base model with no realised quantities, that is γ = 0.

The results from this forecasting exercise are given in Table 6. These are based upon the innovations from the univariate volatility models for BAC and SPY conditioning on lagged realised QML statistics. The results for the dependence model when we do not condition on any additional realised quantities, that is γ = 0, are given above the line in the Table. As β = 0 it means ω = 1 − α and so is roughly 0.41. Here ρ is the non-unit element of Π. Hence the estimated model for the conditional correlation is 0.41 × 0.69 + 0.59 C_{1,2,t−1}, where C_{1,2,t−1} is the block correlation amongst the BAC and SPY innovations. This can be thought of as simply a shrunk block correlation. When we condition on lagged realised quantities the log likelihood will typically rise. The improvement is recorded as logL in the Table. The results are reported for the QML, realised kernel and realised covariance. Obviously the results vary with the level of sparsity. For low levels of sparsity, RK does best. It drives α down to near zero, reminding us of the results we saw in the univariate cases discussed in the previous subsection. However, the improvement in the log likelihood is relatively modest, certainly less than we were used to from the univariate cases.

[Figure 9 appears here: four panels plotted against sparsity ("logL comparison for QML", "logL comparison for RK", "logL comparison for RC" and "logL change: QML minus RK and QML minus RC"), each marking the cross-sectional median.]

Figure 9: Improvements in logL for pairs involving SPY by including the realised quantities. On the x-axis is the level of sparsity. The heavy line denotes the median at each level of sparsity. All realised quantities are computed using all 13 series. Bottom right is the log likelihood for the model including realised QML minus the corresponding figure for realised kernel. This is plotted against sparsity. Also drawn is the corresponding result for the realised QML against realised covariance.

For low levels of sparsity QML is downward biased and so ρ is estimated to be high to compensate. In the QML case we need larger sparsity to successfully drive down α, but that estimate is certainly low once sparsity is 5 or more. This level of sparsity delivers a better fitting model than the results for RK, but the difference is not particularly large.

6.4.3 Cross section

Here we just focus on the cross section involving SPY based pairs. Of course there are 12 of these.

The top left of Figure 9 shows the log-likelihood improvement in log L_C by including the realised QML information, i.e. allowing γ to be greater than zero. The improvement is shown for each level of sparsity and is plotted separately for each of the 12 pairs. The median improvement is shown by the dark line. Almost throughout the improvement is modest, though for a sole series the improvement is quite large. The realised QML performs better as the level of sparsity increases, but once again it tails off at the very end with very large sparsity, for in those cases the sample sizes tend to be moderate and so the realised estimator is noisy. Bottom left shows the corresponding results for the realised covariance. The results here are

poor for low levels of sparsity, adding basically nothing to the forecasting model. This is the influence of the Epps effect again. For higher levels of sparsity the realised covariance performs much better and approaches the realised QML estimator in terms of added value. The top right shows the results for the realised kernel. This does best for very low levels of sparsity, making an important improvement in forecasting performance. However, as the level of sparsity increases the improvement due to the realised kernel falls away. The bottom right graph shows the median log-likelihood improvement of QML minus the median log-likelihood improvement for the realised kernel. Positive numbers give a preference for QML. For low levels of sparsity, as we would expect, the realised kernel outperforms. However, for moderate degrees of sparsity the QML estimator has better performance. Overall we can see that for the dependence modelling the inclusion of the realised information does add value, but the effects are not enormous. Realised QML again needs a moderate degree of sparsity to be competitive. For this level of sparsity realised QML slightly outperforms the realised kernel. Unlike the univariate case, the open to close information is not tested out of the model. But its importance does reduce a great deal by including the realised quantities.

7 Conclusion

This paper proposes and systematically studies a new method for estimating the dependence amongst financial asset price processes. The realised QML estimator is robust to certain types of market microstructure noise and can deal with non-synchronised time stamps. It is also guaranteed to be positive semi-definite and converges at the optimal asymptotic rate. This combination of properties is unique in the literature and so it is worthwhile exploring this estimator in some detail. In this paper we develop the details of the quasi-likelihood and show how to numerically optimise it in a simple way even in large dimensions. We also develop some of the theory needed to understand the properties of the estimator and the corresponding results for realised QML estimators of betas and correlations. Particularly important is our theory for asynchronous data. Our Monte Carlo experiments are extensive, comparing the estimator to various alternatives. The realised QML performs well in these comparisons, in particular in unbalanced cases. Our initial empirical results are somewhat encouraging, although much work remains. The volatilities seem to be robust to the presence of slowly trading stocks in the dataset. The improvement in the fit of the model by including these realised quantities is large. The results for measures of dependence are mixed, with the improvements from the realised quantities being modest. The realised QML is underestimating long-run dependence in these empirical experiments unless the level of sparsity is quite high. This underestimation does not appear in our Monte

Carlo experiments. There are various explanations for this, but it seems clear that a more sophisticated model of market microstructure effects is needed. At the moment our recommendation for empirical work is for researchers to use realised QML or realised kernel estimators inside their conditional volatility models. When modelling dependence amongst the devolatilised returns the decision to use extra realised information is more balanced: it does increase the performance of the model, but not by a great deal.

8 Acknowledgements

We thank Siem Jan Koopman for some early comments on some aspects of filtering with massive missing data, and Markus Bibinger for discussions on quadratic variation of time. We are particularly grateful to Fulvio Corsi, Stefano Peluso and Francesco Audrino for sharing with us a copy of their related work on this topic. The same applies to Cheng Liu and Cheng Yong Tang. We also thank Kevin Sheppard for allowing us to use the cleaned high frequency data he developed for Lunde, Shephard, and Sheppard (2012), as well as advice on all things multivariate. Last but not least, we thank seminar participants at CEMFI, especially Stéphane Bonhomme, Enrique Sentana, and David Veredas for helpful comments. This research was supported in part by the FMC Faculty Scholar Fund at the University of Chicago Booth School of Business.

References

Aït-Sahalia, Y., J. Fan, and D. Xiu (2010). High frequency covariance estimates with noisy and asynchronous financial data. Journal of the American Statistical Association 105, 1504–1517.
Aït-Sahalia, Y., J. Jacod, and J. Li (2012). Testing for jumps in noisy high frequency data. Journal of Econometrics 168, 207–222.
Aït-Sahalia, Y. and P. A. Mykland (2003). The effects of random and discrete sampling when estimating continuous-time diffusions. Econometrica 71, 483–549.
Aït-Sahalia, Y. and P. A. Mykland (2009). Estimating volatility in the presence of market microstructure noise: A review of the theory and practical considerations. In T. G. Andersen, R. Davis, J.-P. Kreiss, and T. Mikosch (Eds.), Handbook of Financial Time Series, pp. 577–598.
Aït-Sahalia, Y., P. A. Mykland, and L. Zhang (2005). How often to sample a continuous-time process in the presence of market microstructure noise. Review of Financial Studies 18, 351–416.
Aït-Sahalia, Y. and D. Xiu (2012). Likelihood-based volatility estimators in the presence of market microstructure noise: A review. In L. Bauwens, C. M. Hafner, and S. Laurent (Eds.), Handbook of Volatility Models and their Applications, Chapter 14. Forthcoming.
Andersen, T. G., T. Bollerslev, F. X. Diebold, and H. Ebens (2001). The distribution of realized stock return volatility. Journal of Financial Economics 61, 43–76.
Andersen, T. G., T. Bollerslev, F. X. Diebold, and P. Labys (2000). Great realizations. Risk 13, 105–108.
Andersen, T. G., T. Bollerslev, F. X. Diebold, and P. Labys (2001). The distribution of exchange rate volatility. Journal of the American Statistical Association 96, 42–55.
Andersen, T. G., T. Bollerslev, F. X. Diebold, and P. Labys (2003). Modeling and forecasting realized volatility. Econometrica 71, 579–625.
Bandi, F. M. and J. R. Russell (2008). Microstructure noise, realized variance, and optimal sampling. Review of Economic Studies 75, 339–369.

Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde, and N. Shephard (2008). Designing realised kernels to measure the ex-post variation of equity prices in the presence of noise. Econometrica 76, 1481–1536.
Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde, and N. Shephard (2009). Realised kernels in practice: trades and quotes. Econometrics Journal 12, C1–C32.
Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde, and N. Shephard (2011). Multivariate realised kernels: consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. Journal of Econometrics 162, 149–169.
Barndorff-Nielsen, O. E. and N. Shephard (2002). Econometric analysis of realised volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society, Series B 64, 253–280.
Barndorff-Nielsen, O. E. and N. Shephard (2004). Econometric analysis of realised covariation: high frequency covariance, regression and correlation in financial economics. Econometrica 72, 885–925.
Brownlees, C. T. and G. M. Gallo (2010). Comparison of volatility measures: a risk management perspective. Journal of Financial Econometrics 8, 29–56.
Cartea, A. and D. Karyampas (2011). Volatility and covariation of financial assets: A high-frequency analysis. Journal of Banking and Finance 35, 3319–3334.
Christensen, K., S. Kinnebrock, and M. Podolskij (2010). Pre-averaging estimators of the ex-post covariance matrix in noisy diffusion models with non-synchronous data. Journal of Econometrics 159, 116–133.
Corsi, F., S. Peluso, and F. Audrino (2012). Missing asynchronicity: a Kalman-EM approach to multivariate realized covariance estimation. Unpublished paper: University of St. Gallen.
Durbin, J. and S. J. Koopman (2001). Time Series Analysis by State Space Methods. Oxford: Oxford University Press.
Elerian, O., S. Chib, and N. Shephard (2001). Likelihood inference for discretely observed non-linear diffusions. Econometrica 69, 959–993.
Engle, R. F. (2009). Anticipating Correlations. Princeton University Press.
Engle, R. F. and G. M. Gallo (2006). A multiple indicator model for volatility using intra daily data. Journal of Econometrics 131, 3–27.
Engle, R. F. and J. R. Russell (1998). Forecasting transaction rates: the autoregressive conditional duration model. Econometrica 66, 1127–1162.
Epps, T. W. (1979). Comovements in stock prices in the very short run. Journal of the American Statistical Association 74, 291–296.
Ghysels, E., A. C. Harvey, and E. Renault (1996). Stochastic volatility. In C. R. Rao and G. S. Maddala (Eds.), Statistical Methods in Finance, pp. 119–191. Amsterdam: North-Holland.
Gloter, A. and J. Jacod (2001a). Diffusions with measurement errors. I — local asymptotic normality. ESAIM: Probability and Statistics 5, 225–242.
Gloter, A. and J. Jacod (2001b). Diffusions with measurement errors. II — measurement errors. ESAIM: Probability and Statistics 5, 243–260.
Hansen, P. R. and G. Horel (2009). Quadratic variation by Markov chains. Unpublished paper: Department of Economics, Stanford University.
Hansen, P. R., Z. Huang, and H. H. Shek (2011). Realized GARCH: a joint model for returns and realized measures of volatility. Journal of Applied Econometrics 27, 877–906.
Hansen, P. R., J. Large, and A. Lunde (2008). Moving average-based estimators of integrated variance. Econometric Reviews 27, 79–111.
Hansen, P. R. and A. Lunde (2006). Realized variance and market microstructure noise (with discussion). Journal of Business and Economic Statistics 24, 127–218.
Hansen, P. R., A. Lunde, and V. Voev (2010). Realized beta GARCH: a multivariate GARCH model with realized measures of volatility and covolatility. Working paper: CREATES.
Harris, F. H. d., T. H. McInish, G. Shoesmith, and R. A. Wood (1995). Cointegration, error correction and price discovery on informationally linked security markets. Journal of Financial and Quantitative Analysis 30, 563–579.
Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.

Hayashi, T. and N. Yoshida (2005). On covariance estimation of non-synchronously observed diffusion processes. Bernoulli 11, 359–379.
Jacod, J. (2012). Statistics and high frequency data. In M. Kessler, A. Lindner, and M. Sorensen (Eds.), Statistical Methods for Stochastic Differential Equations, pp. 191–310. Chapman and Hall.
Jacod, J., Y. Li, P. A. Mykland, M. Podolskij, and M. Vetter (2009). Microstructure noise in the continuous case: the pre-averaging approach. Stochastic Processes and Their Applications 119, 2249–2276.
Jacod, J., M. Podolskij, and M. Vetter (2010). Limit theorems for moving averages of discretized processes plus noise. Annals of Statistics 38, 1478–1545.
Jacod, J. and A. N. Shiryaev (2003). Limit Theorems for Stochastic Processes (2 ed.). Berlin: Springer.
Kalnina, I. and O. Linton (2008). Estimating quadratic variation consistently in the presence of correlated measurement error. Journal of Econometrics 147, 47–59.
Kunitomo, N. and S. Sato (2009). Separating information maximum likelihood estimation of realized volatility and covariance with micro-market noise. Unpublished paper: Graduate School of Economics, University of Tokyo.
Large, J. (2011). Estimating quadratic variation when quoted prices jump by a constant increment. Journal of Econometrics 160, 2–11.
Li, Y. and P. Mykland (2007). Are volatility estimators robust to modelling assumptions? Bernoulli 13, 601–622.
Li, Y., P. Mykland, E. Renault, L. Zhang, and X. Zheng (2009). Realized volatility when endogeneity of time matters. Working paper: Department of Statistics, University of Chicago.
Liu, C. and C. Y. Tang (2012). A quasi-maximum likelihood approach to covariance matrix with high frequency data. Unpublished paper: Department of Statistics and Applied Probability, National University of Singapore.
Ljung, G. M. (1989). A note on the estimation of missing values in time series. Communications in Statistics - Simulation and Computation 18, 459–465.
Lunde, A., N. Shephard, and K. K. Sheppard (2012). Econometric analysis of vast covariance matrices using composite realized kernels. Unpublished paper: Department of Economics, University of Oxford.
Malliavin, P. and M. E. Mancino (2002). Fourier series method for measurement of multivariate volatilities. Finance and Stochastics 6, 49–61.
Malliavin, P. and M. E. Mancino (2009). A Fourier transform method for nonparametric estimation of multivariate volatility. Annals of Statistics 37, 1983–2010.
Mancino, M. E. and S. Sanfelici (2008). Robustness of Fourier estimator of integrated volatility in the presence of microstructure noise. Department of Economics, Parma University.
Mancino, M. E. and S. Sanfelici (2009). Covariance estimation and dynamic asset allocation under microstructure effects via Fourier methodology. DiMaD Working Papers 2009-09, Dipartimento di Matematica per le Decisioni, Università degli Studi di Firenze.
Martens, M. (2003). Estimating unbiased and precise realized covariances. Unpublished paper: Department of Finance, Erasmus School of Economics, Rotterdam.
McAleer, M. and M. C. Medeiros (2008). Realized volatility: a review. Econometric Reviews 27, 10–45.
Mykland, P. A., N. Shephard, and K. K. Sheppard (2012). Efficient and feasible inference for the components of financial variation using blocked multipower variation. Unpublished paper: Department of Economics, Oxford University.
Mykland, P. A. and L. Zhang (2006). ANOVA for diffusions and Ito processes. Annals of Statistics 34, 1931–1963.
Mykland, P. A. and L. Zhang (2009). Inference for continuous semimartingales observed at high frequency. Econometrica 77, 1403–1455.
Newey, W. K. and D. McFadden (1994). Large sample estimation and hypothesis testing. In R. F. Engle and D. McFadden (Eds.), The Handbook of Econometrics, Volume 4, pp. 2111–2245. North-Holland.
Noureldin, D., N. Shephard, and K. Sheppard (2012). Multivariate high-frequency-based volatility (HEAVY) models. Journal of Applied Econometrics 27, 907–933.
Owens, J. P. and D. G. Steigerwald (2006). Noise reduced realized volatility: A Kalman filter approach. Unpublished paper: University of California at Santa Barbara.

41 Papaspiliopoulos, O. and G. Roberts (2012). Importance sampling techniques for estimation of diffusion models. In M. Kessler, A. Lindner, and M. Sørensen (Eds.), Statistical Methods for Stochastic Differ- ential Equations, pp. 311–337. Chapman and Hall. Monographs on Statistics and Applied Probability. Park, S. and O. B. Linton (2012a). Estimating the quadratic covariation matrix for an asynchronously observed continuous time signal masked by additive noise. Unpublished paper: Financial Markets Group, London School of Economics. Park, S. and O. B. Linton (2012b). Realized volatility: Theory and application. In L. Bauwens, C. Hafner, and L. Sebastien (Eds.), Volatility Models And Their Applications, pp. 293–312. London: Wiley. Peluso, S., F. Corsi, and A. Mira (2012). A Bayesian high-frequency estimator of the multivariate covari- ance of noisy and asynchronous returns. Unpublished paper: University of Lugano, Switzerland. Protter, P. (2004). Stochastic Integration and Differential Equations. New York: Springer-Verlag. Reiss, M. (2011). Asymptotic equivalence for inference on the volatility from noisy observations. Annals of Statistics 39, 772–802. Roberts, G. O. and O. Stramer (2001). On inference for nonlinear diffusion models using the Hastings- Metropolis algorithms. Biometrika 88, 603–621. Sanfelici, S. and M. E. Mancino (2008). Covariance estimation via fourier method in the presence of asynchronous trading and microstructure noise. Economics Department Working Papers 2008-ME01, Department of Economics, Parma University (Italy). Shephard, N. and K. K. Sheppard (2010). Realising the future: forecasting with high-frequency-based volatility (HEAVY) models. Journal of Applied Econometrics 25, 197–231. Silvennoinen, A. and T. Ter¨asvirta (2009). Multivariate GARCH models. In T. G. Andersen, R. A. Davis, J. P. Kreiss, and T. Mikosch (Eds.), Handbook of Financial Time Series, pp. 201–229. Springer-Verlag. Stein, M. L. (1987). 
Minimum norm quadratic estimation of spatial variograms. Journal of the American Statistical Association 82, 765–772. Tanner, M. A. (1996). Tools for Statistical Inference: Methods for Exploration of Posterior Distributions and Likelihood Functions (3 ed.). New York: Springer-Verlag. Tse, Y. K. and K. C. Tsui (2002). A multivariate generalized autoregressive conditional heteroscedasticity model with time-varying correlations. Journal of Business and Economic Statistics 20, 351–362. Voev, V. and A. Lunde (2007). Integrated covariance estimation using high-frequency data in the presence of noise. Journal of Financial Econometrics 5, 68–104. West, M. and J. Harrison (1989). Bayesian Forecasting and Dynamic Models. New York: Springer-Verlag. Xiu, D. (2010). Quasi-maximum likelihood estimation of volatility with high frequency data. Journal of Econometrics 159, 235–250. Zhang, L. (2011). Estimating covariation: Epps effect and microstructure noise. Journal of Economet- rics 160, 33–47. Zhang, L., P. A. Mykland, and Y. A¨ıt-Sahalia (2005). A tale of two time scales: determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association 100, 1394– 1411. Zhou, B. (1996). High-frequency data and volatility in foreign-exchange rates. Journal of Business and Economic Statistics 14, 45–52. Zhou, B. (1998). Parametric and nonparametric volatility measurement. In C. L. Dunis and B. Zhou (Eds.), Nonlinear Modelling of High Frequency Financial Time Series, Chapter 6, pp. 109–123. New York: John Wiley Sons Ltd. Appendices

A Mathematical proofs

A.1 Proof of Theorem 1

There exists an orthogonal matrix $U = (u_{ij})$, given below, such that
\[
\begin{pmatrix} U & 0 \\ 0 & U \end{pmatrix}
\begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{12} & \Omega_{22} \end{pmatrix}
\begin{pmatrix} U' & 0 \\ 0 & U' \end{pmatrix}
= \begin{pmatrix} \operatorname{diag}(\mu_{1j}) & \Omega_{12} \\ \Omega_{12} & \operatorname{diag}(\mu_{2j}) \end{pmatrix} =: V,
\]
where

\[
\Omega_{12} = \Sigma_{12}\Delta \otimes I, \qquad
\Omega_{ii} = \Sigma_{ii}\Delta \otimes I + \Lambda_{ii} \otimes J, \quad i = 1, 2,
\]
\[
u_{ij} = \sqrt{\frac{2}{n+1}}\,\sin\Big(\frac{i\cdot j}{n+1}\pi\Big), \quad i, j = 1, \ldots, n,
\]
\[
\mu_{ij} = \Sigma_{ii}\Delta + 2\Lambda_{ii}\Big(1 - \cos\Big(\frac{j}{n+1}\pi\Big)\Big), \quad i = 1, 2, \ \text{and} \ j = 1, \ldots, n.
\]
Since $U = U'^{-1}$, we have

\[
\Omega^{-1} = \begin{pmatrix} U' & 0 \\ 0 & U' \end{pmatrix} V^{-1} \begin{pmatrix} U & 0 \\ 0 & U \end{pmatrix},
\]
where

\[
V^{-1} = \begin{pmatrix}
\operatorname{diag}\Big(\dfrac{\mu_{2j}}{\mu_{1j}\mu_{2j} - \Sigma_{12}^2\Delta^2}\Big) &
\operatorname{diag}\Big(-\dfrac{\Sigma_{12}\Delta}{\mu_{1j}\mu_{2j} - \Sigma_{12}^2\Delta^2}\Big) \\[1ex]
\operatorname{diag}\Big(-\dfrac{\Sigma_{12}\Delta}{\mu_{1j}\mu_{2j} - \Sigma_{12}^2\Delta^2}\Big) &
\operatorname{diag}\Big(\dfrac{\mu_{1j}}{\mu_{1j}\mu_{2j} - \Sigma_{12}^2\Delta^2}\Big)
\end{pmatrix}.
\]
One important observation is that $U$ does not depend on the parameters, so this decomposition makes taking derivatives of $\Omega$ very convenient. Note that

\[
\frac{1}{\sqrt n}\frac{\partial L}{\partial \theta}
= -\frac{1}{2\sqrt n}\left\{ \operatorname{tr}\Big(\Omega^{-1}\frac{\partial \Omega}{\partial \theta}\Big) - \operatorname{tr}\Big(\Omega^{-1}\frac{\partial \Omega}{\partial \theta}\Omega^{-1} r r'\Big) \right\},
\]
where for $\theta = \Sigma_{11}$ and $\Sigma_{12}$,
\[
\operatorname{tr}\Big(\Omega^{-1}\frac{\partial \Omega}{\partial \Sigma_{11}}\Big) = \operatorname{tr}\Big(V^{-1}\frac{\partial V}{\partial \Sigma_{11}}\Big) = \sum_{i=1}^n \frac{\mu_{2i}\Delta}{\mu_{1i}\mu_{2i} - \Sigma_{12}^2\Delta^2},
\]
\[
\operatorname{tr}\Big(\Omega^{-1}\frac{\partial \Omega}{\partial \Sigma_{12}}\Big) = \operatorname{tr}\Big(V^{-1}\frac{\partial V}{\partial \Sigma_{12}}\Big) = \sum_{i=1}^n \frac{-2\Sigma_{12}\Delta^2}{\mu_{1i}\mu_{2i} - \Sigma_{12}^2\Delta^2},
\]
and

\[
\operatorname{tr}\Big(\Omega^{-1}\frac{\partial \Omega}{\partial \Sigma_{11}}\Omega^{-1} r r'\Big)
= \operatorname{tr}\left(
\begin{pmatrix} U' & 0 \\ 0 & U' \end{pmatrix}
\begin{pmatrix}
\operatorname{diag}\Big(\frac{\mu_{2i}^2\Delta}{(\mu_{1i}\mu_{2i} - \Sigma_{12}^2\Delta^2)^2}\Big) &
\operatorname{diag}\Big(\frac{-\Sigma_{12}\mu_{2i}\Delta^2}{(\mu_{1i}\mu_{2i} - \Sigma_{12}^2\Delta^2)^2}\Big) \\[1ex]
\operatorname{diag}\Big(\frac{-\Sigma_{12}\mu_{2i}\Delta^2}{(\mu_{1i}\mu_{2i} - \Sigma_{12}^2\Delta^2)^2}\Big) &
\operatorname{diag}\Big(\frac{\Sigma_{12}^2\Delta^3}{(\mu_{1i}\mu_{2i} - \Sigma_{12}^2\Delta^2)^2}\Big)
\end{pmatrix}
\begin{pmatrix} U & 0 \\ 0 & U \end{pmatrix} r r'\right),
\]
\[
\operatorname{tr}\Big(\Omega^{-1}\frac{\partial \Omega}{\partial \Sigma_{12}}\Omega^{-1} r r'\Big)
= \operatorname{tr}\left(
\begin{pmatrix} U' & 0 \\ 0 & U' \end{pmatrix}
\begin{pmatrix}
\operatorname{diag}\Big(\frac{-2\mu_{2i}\Sigma_{12}\Delta^2}{(\mu_{1i}\mu_{2i} - \Sigma_{12}^2\Delta^2)^2}\Big) &
\operatorname{diag}\Big(\frac{\Sigma_{12}^2\Delta^3 + \mu_{1i}\mu_{2i}\Delta}{(\mu_{1i}\mu_{2i} - \Sigma_{12}^2\Delta^2)^2}\Big) \\[1ex]
\operatorname{diag}\Big(\frac{\Sigma_{12}^2\Delta^3 + \mu_{1i}\mu_{2i}\Delta}{(\mu_{1i}\mu_{2i} - \Sigma_{12}^2\Delta^2)^2}\Big) &
\operatorname{diag}\Big(\frac{-2\mu_{1i}\Sigma_{12}\Delta^2}{(\mu_{1i}\mu_{2i} - \Sigma_{12}^2\Delta^2)^2}\Big)
\end{pmatrix}
\begin{pmatrix} U & 0 \\ 0 & U \end{pmatrix} r r'\right).
\]
Therefore, by direct calculation and using symmetry, we have

\[
E\Big(-\frac{1}{\sqrt n}\frac{\partial^2 L}{\partial \Sigma_{11}^2}\Big)
= \frac{1}{2\sqrt n}\sum_{i=1}^n \frac{\mu_{2i}^2\Delta^2}{(\mu_{1i}\mu_{2i} - \Sigma_{12}^2\Delta^2)^2}
\sim \frac12\int_0^\infty \frac{(\Sigma_{22}T + \Lambda_{22}\pi^2x^2)^2 T^2}{\big((\Sigma_{11}T + \Lambda_{11}\pi^2x^2)(\Sigma_{22}T + \Lambda_{22}\pi^2x^2) - \Sigma_{12}^2T^2\big)^2}\,dx =: I^{\Sigma}_{11},
\]
\[
E\Big(-\frac{1}{\sqrt n}\frac{\partial^2 L}{\partial \Sigma_{11}\partial \Sigma_{12}}\Big)
= \frac{1}{2\sqrt n}\sum_{i=1}^n \frac{-2\Sigma_{12}\mu_{2i}\Delta^3}{(\mu_{1i}\mu_{2i} - \Sigma_{12}^2\Delta^2)^2}
\sim \int_0^\infty \frac{-\Sigma_{12}(\Sigma_{22}T + \Lambda_{22}\pi^2x^2)T^3}{\big((\Sigma_{11}T + \Lambda_{11}\pi^2x^2)(\Sigma_{22}T + \Lambda_{22}\pi^2x^2) - \Sigma_{12}^2T^2\big)^2}\,dx =: I^{\Sigma}_{12},
\]
\[
E\Big(-\frac{1}{\sqrt n}\frac{\partial^2 L}{\partial \Sigma_{11}\partial \Sigma_{22}}\Big)
= \frac{1}{2\sqrt n}\sum_{i=1}^n \frac{\Sigma_{12}^2\Delta^4}{(\mu_{1i}\mu_{2i} - \Sigma_{12}^2\Delta^2)^2}
\sim \frac12\int_0^\infty \frac{\Sigma_{12}^2 T^4}{\big((\Sigma_{11}T + \Lambda_{11}\pi^2x^2)(\Sigma_{22}T + \Lambda_{22}\pi^2x^2) - \Sigma_{12}^2T^2\big)^2}\,dx =: I^{\Sigma}_{13},
\]
\[
E\Big(-\frac{1}{\sqrt n}\frac{\partial^2 L}{\partial \Sigma_{12}^2}\Big)
= \frac{1}{2\sqrt n}\sum_{i=1}^n \frac{2(\Sigma_{12}^2\Delta^4 + \Delta^2\mu_{1i}\mu_{2i})}{(\mu_{1i}\mu_{2i} - \Sigma_{12}^2\Delta^2)^2}
\sim \int_0^\infty \frac{(\Sigma_{11}T + \Lambda_{11}\pi^2x^2)(\Sigma_{22}T + \Lambda_{22}\pi^2x^2)T^2 + \Sigma_{12}^2T^4}{\big((\Sigma_{11}T + \Lambda_{11}\pi^2x^2)(\Sigma_{22}T + \Lambda_{22}\pi^2x^2) - \Sigma_{12}^2T^2\big)^2}\,dx =: I^{\Sigma}_{22}.
\]
By symmetry, $I^{\Sigma}_{23}$ and $I^{\Sigma}_{33}$ are obtained by simply switching the indices 1 and 2 in $I^{\Sigma}_{12}$ and $I^{\Sigma}_{11}$. The asymptotic variance of $(\hat\Sigma_{11}, \hat\Sigma_{12}, \hat\Sigma_{22})$ is then given by
\[
\Pi = \begin{pmatrix} I^{\Sigma}_{11} & I^{\Sigma}_{12} & I^{\Sigma}_{13} \\ \cdot & I^{\Sigma}_{22} & I^{\Sigma}_{23} \\ \cdot & \cdot & I^{\Sigma}_{33} \end{pmatrix}^{-1}.
\]
Similar derivations on the $1/n$-scaled likelihood show that the asymptotic variance of $(\hat\Lambda_{11}, \hat\Lambda_{22})$ is given by
\[
\begin{pmatrix} I^{\Lambda}_{11} \\ I^{\Lambda}_{22} \end{pmatrix}^{-1} = \begin{pmatrix} 2\Lambda_{11}^2 \\ 2\Lambda_{22}^2 \end{pmatrix}.
\]
Notice that the above integrals have explicit forms, which can be obtained easily with Mathematica. However, the explicit formulae are tedious and hence omitted here.
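The orthogonal decomposition at the heart of this proof is easy to verify numerically. The sketch below (all parameter values are illustrative choices, not estimates from the paper) builds the equidistant bivariate covariance matrix $\Omega$ and checks that the discrete sine transform matrix $U$ block-diagonalises it exactly:

```python
import numpy as np

# Illustrative parameters for the equidistant bivariate model.
n, T = 50, 1.0
Delta = T / n
Sig11, Sig22, Sig12 = 1.0, 0.8, 0.3
Lam11, Lam22 = 0.01, 0.02

I = np.eye(n)
J = 2 * I - np.eye(n, k=1) - np.eye(n, k=-1)   # MA(1) noise structure
Omega = np.block([[Sig11 * Delta * I + Lam11 * J, Sig12 * Delta * I],
                  [Sig12 * Delta * I, Sig22 * Delta * I + Lam22 * J]])

# U from the proof: u_ij = sqrt(2/(n+1)) sin(ij*pi/(n+1)); it diagonalises J
# with eigenvalues 2(1 - cos(j*pi/(n+1))).
idx = np.arange(1, n + 1)
U = np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(idx, idx) * np.pi / (n + 1))
mu1 = Sig11 * Delta + 2 * Lam11 * (1 - np.cos(idx * np.pi / (n + 1)))
mu2 = Sig22 * Delta + 2 * Lam22 * (1 - np.cos(idx * np.pi / (n + 1)))

Z = np.zeros((n, n))
U2 = np.block([[U, Z], [Z, U]])
V = U2 @ Omega @ U2.T
V_expected = np.block([[np.diag(mu1), Sig12 * Delta * I],
                       [Sig12 * Delta * I, np.diag(mu2)]])
assert np.allclose(U @ U.T, I)        # U is orthogonal (and symmetric)
assert np.allclose(V, V_expected)     # block-diagonalisation of Omega
```

Because $U$ does not depend on the parameters, the same rotation applies to $\partial\Omega/\partial\theta$, which is what makes the Fisher information calculations above tractable.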

A.2 Proof of Theorem 2

The proof consists of the following steps. First, we show that the differences between the score vectors, scaled at appropriate rates, and their target "conditional expectations" converge uniformly to 0 and satisfy the identification condition. (This step follows easily from the calculations below.) Second, we derive stable central limit theorems for these differences; this is where the higher-order moments of the volatility process come into play. Third, we solve the equations that set the targets equal to 0, and find that the difference between the pseudo-true parameter values and the parameters of interest is asymptotically negligible. Last, we use the sandwich theorem and consistency to establish the CLT for the QMLE.

To clarify our notation, we use a subscript 0 to mark quantities built from the true values; the true values of the Brownian covariances are naturally written in integral form. The pseudo-true parameters carry a bar, such as $\bar\Sigma$ and $\bar\Lambda$, and the QML estimators are written $\hat\Sigma$ and $\hat\Lambda$. Symbols $\Sigma$, $\Lambda$, etc. without any special marks represent arbitrary parameter values within the parameter space, which is assumed to be a compact set. The drift term can be ignored without loss of generality, as a simple change of measure argument makes it sufficient to investigate the case without drift.

Recall that in (4) we have
\[
L = -n\log(2\pi) - \frac12\log(\det\Omega) - \frac12 r'\Omega^{-1}r.
\]
Now we consider the following function:
\[
\bar L = -n\log(2\pi) - \frac12\log(\det\Omega) - \frac12\operatorname{tr}(\Omega^{-1}\Omega_0),
\]
where the subscript 0 denotes the true value,
\[
\Omega_0 = \begin{pmatrix} \Omega_0^{11} & \Omega_0^{12} \\ \Omega_0^{21} & \Omega_0^{22} \end{pmatrix},
\]
with $\Omega^{ll}_{0,ii} = \int_{t_{i-1}}^{t_i}\Sigma_{ll,t}\,dt + 2\Lambda_{0,ll}$, $\Omega^{ll}_{0,i,i+1} = \Omega^{ll}_{0,i,i-1} = -\Lambda_{0,ll}$, and $\Omega^{12}_{0,i,i} = \Omega^{21}_{0,i,i} = \int_{t_{i-1}}^{t_i}\Sigma_{12,t}\,dt$. Therefore, the difference between $L$ and $\bar L$ is given by
\[
L - \bar L = -\frac12\operatorname{tr}\big(\Omega^{-1}(rr' - \Omega_0)\big)
= -\frac12\operatorname{tr}\left(V^{-1}\begin{pmatrix} U & 0 \\ 0 & U\end{pmatrix}(rr' - \Omega_0)\begin{pmatrix} U' & 0 \\ 0 & U'\end{pmatrix}\right)
\]
\[
= \frac12\left\{\sum_{l=1}^2\sum_{s=1}^2\sum_{i=1}^n \omega^{ls}_{ii}\Big((\Delta_i y_l)(\Delta_i y_s) - \int_{t_{i-1}}^{t_i}\Sigma_{ls,t}\,dt\Big)
+ \sum_{l=1}^2\sum_{s=1}^2\sum_{i=1}^n\sum_{j<i} \omega^{ls}_{ij}\,\Delta^n_i y_l\,\Delta^n_j y_s + \cdots\right\},
\]
where $\omega^{ls}_{ij}$ denotes the $(i,j)$ element of the $(l,s)$ block of $\Omega^{-1}$.

For the diagonal blocks we have $\omega^{11}_{ij} = \sum_k u_{ik}u_{jk}\,\mu_{2k}/(\mu_{1k}\mu_{2k} - \Sigma_{12}^2\Delta^2)$, so we study, for integer $l$,
\[
\frac{1}{n+1}\sum_{k=1}^n \frac{\mu_{2k}}{\mu_{1k}\mu_{2k} - \Sigma_{12}^2\Delta^2}\cos\Big(\frac{kl}{n+1}\pi\Big)
= \frac{1}{2(n+1)}\sum_{k=1}^n \frac{\mu_{2k}}{\mu_{1k}\mu_{2k} - \Sigma_{12}^2\Delta^2}\,
\frac{\sin\big(\frac{(k+1)l}{n+1}\pi\big) - \sin\big(\frac{(k-1)l}{n+1}\pi\big)}{\sin\big(\frac{l}{n+1}\pi\big)}
\]
\[
= \frac{1}{2(n+1)}\sum_{k=0}^n \frac{\sin\big(\frac{(k+1)l}{n+1}\pi\big)}{\sin\big(\frac{l}{n+1}\pi\big)}
\left(\frac{\mu_{2k}}{\mu_{1k}\mu_{2k} - \Sigma_{12}^2\Delta^2} - \frac{\mu_{2,k+2}}{\mu_{1,k+2}\mu_{2,k+2} - \Sigma_{12}^2\Delta^2}\right)
\]
\[
= \frac{1}{2(n+1)}\sum_{k=0}^n \frac{\sin\big(\frac{(k+1)l}{n+1}\pi\big)}{\sin\big(\frac{l}{n+1}\pi\big)}\,
\frac{\mu_{2,k+2}\mu_{2,k}(\mu_{1,k+2} - \mu_{1,k}) + (\mu_{2,k+2} - \mu_{2,k})\Sigma_{12}^2\Delta^2}{(\mu_{1k}\mu_{2k} - \Sigma_{12}^2\Delta^2)(\mu_{1,k+2}\mu_{2,k+2} - \Sigma_{12}^2\Delta^2)}
\]
\[
= \frac{2}{n+1}\sum_{k=0}^n \frac{\sin\big(\frac{(k+1)l}{n+1}\pi\big)}{\sin\big(\frac{l}{n+1}\pi\big)}\,
\frac{\sin\big(\frac{\pi}{n+1}\big)\sin\big(\frac{k+1}{n+1}\pi\big)\big(\mu_{2,k+2}\mu_{2,k}\Lambda_{11} + \Lambda_{22}\Sigma_{12}^2\Delta^2\big)}{(\mu_{1k}\mu_{2k} - \Sigma_{12}^2\Delta^2)(\mu_{1,k+2}\mu_{2,k+2} - \Sigma_{12}^2\Delta^2)}.
\]
Clearly, $\omega^{11}_{i,j} = \omega^{11}_{j,i} = \omega^{11}_{n+1-i,n+1-j}$. For any $n^{1/2+\delta} \le l \le [\frac{n+1}{2}]$, we have
\[
\left|\frac{1}{n+1}\sum_{k=1}^n \frac{\mu_{2k}}{\mu_{1k}\mu_{2k} - \Sigma_{12}^2\Delta^2}\cos\Big(\frac{kl}{n+1}\pi\Big)\right|
\le C\,\frac{1}{n}\sum_{k=1}^n \frac{1}{l^2\big(\frac{1}{n} + \frac{k^2}{n^2}\big)} = o(\sqrt n),
\]
hence, for any $n^{1/2+\delta} \le i \le n - n^{1/2+\delta}$,
\[
\omega^{11}_{i,i} = \left(\int_0^\infty \frac{\Sigma_{22}T + \Lambda_{22}\pi^2x^2}{(\Sigma_{11}T + \Lambda_{11}\pi^2x^2)(\Sigma_{22}T + \Lambda_{22}\pi^2x^2) - \Sigma_{12}^2T^2}\,dx\right)\cdot\sqrt n\,(1 + o(1)).
\]
Similarly, we can derive

\[
\omega^{22}_{i,i} = \left(\int_0^\infty \frac{\Sigma_{11}T + \Lambda_{11}\pi^2x^2}{(\Sigma_{11}T + \Lambda_{11}\pi^2x^2)(\Sigma_{22}T + \Lambda_{22}\pi^2x^2) - \Sigma_{12}^2T^2}\,dx\right)\cdot\sqrt n\,(1 + o(1)),
\]
\[
\omega^{12}_{i,i} = \left(\int_0^\infty \frac{-\Sigma_{12}T}{(\Sigma_{11}T + \Lambda_{11}\pi^2x^2)(\Sigma_{22}T + \Lambda_{22}\pi^2x^2) - \Sigma_{12}^2T^2}\,dx\right)\cdot\sqrt n\,(1 + o(1)).
\]
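These $\sqrt n$ asymptotics can be checked by brute force: build $\Omega$, invert it, and compare an interior element of $\Omega^{-1}$ with the corresponding integral. A sketch for $\omega^{12}_{i,i}$ (all parameter values are illustrative; the integral is truncated at $x = 200$, which is harmless since the integrand decays like $x^{-4}$):

```python
import numpy as np

# Illustrative parameters.
n, T = 1000, 1.0
Delta = T / n
Sig11, Sig22, Sig12, Lam11, Lam22 = 1.0, 1.0, 0.5, 0.1, 0.1

I = np.eye(n)
J = 2 * I - np.eye(n, k=1) - np.eye(n, k=-1)
Omega = np.block([[Sig11 * Delta * I + Lam11 * J, Sig12 * Delta * I],
                  [Sig12 * Delta * I, Sig22 * Delta * I + Lam22 * J]])

i = n // 2                                 # an interior index
e = np.zeros(2 * n)
e[n + i] = 1.0
w12_ii = np.linalg.solve(Omega, e)[i]      # omega^{12}_{ii} = (Omega^{-1})_{i, n+i}

# Trapezoidal approximation of int_0^inf omega^{12}(Sigma, Lambda, x) dx.
x = np.linspace(0.0, 200.0, 400001)
D = (Sig11 * T + Lam11 * np.pi**2 * x**2) * (Sig22 * T + Lam22 * np.pi**2 * x**2) \
    - Sig12**2 * T**2
f = -Sig12 * T / D
integral = float(np.sum((f[1:] + f[:-1]) / 2.0) * (x[1] - x[0]))

# The rescaled matrix element should be close to the integral for large n.
assert abs(w12_ii / np.sqrt(n) - integral) < 0.05 * abs(integral)
```

The same check applies verbatim to $\omega^{11}_{i,i}$ and $\omega^{22}_{i,i}$ with the corresponding integrands.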

To simplify our notation, let
\[
\omega^{11}(\Sigma, \Lambda, x) = \frac{\Sigma_{22}T + \Lambda_{22}\pi^2x^2}{(\Sigma_{11}T + \Lambda_{11}\pi^2x^2)(\Sigma_{22}T + \Lambda_{22}\pi^2x^2) - \Sigma_{12}^2T^2},
\]
\[
\omega^{22}(\Sigma, \Lambda, x) = \frac{\Sigma_{11}T + \Lambda_{11}\pi^2x^2}{(\Sigma_{11}T + \Lambda_{11}\pi^2x^2)(\Sigma_{22}T + \Lambda_{22}\pi^2x^2) - \Sigma_{12}^2T^2},
\]
\[
\omega^{12}(\Sigma, \Lambda, x) = \frac{-\Sigma_{12}T}{(\Sigma_{11}T + \Lambda_{11}\pi^2x^2)(\Sigma_{22}T + \Lambda_{22}\pi^2x^2) - \Sigma_{12}^2T^2}.
\]
We define the score vectors and their targets as

\[
\Psi_\Sigma = -\frac{1}{\sqrt n}\frac{\partial L}{\partial \Sigma}, \quad
\bar\Psi_\Sigma = -\frac{1}{\sqrt n}\frac{\partial \bar L}{\partial \Sigma}, \quad
\Psi_\Lambda = -\frac{1}{n}\frac{\partial L}{\partial \Lambda}, \quad\text{and}\quad
\bar\Psi_\Lambda = -\frac{1}{n}\frac{\partial \bar L}{\partial \Lambda},
\]
where
\[
\frac{\partial}{\partial \Sigma} = \begin{pmatrix} \partial/\partial \Sigma_{11} \\ \partial/\partial \Sigma_{12} \\ \partial/\partial \Sigma_{22} \end{pmatrix}, \quad\text{and}\quad
\frac{\partial}{\partial \Lambda} = \begin{pmatrix} \partial/\partial \Lambda_{11} \\ \partial/\partial \Lambda_{22} \end{pmatrix}.
\]
Then we have

\[
\Psi_\Sigma - \bar\Psi_\Sigma = \frac{1}{2\sqrt n}\Big( M_1^{(\Sigma)} + 2M_2^{(\Sigma)} + 2M_3^{(\Sigma)} + M_4^{(\Sigma)} \Big),
\]
where

\[
M_1^{(\Sigma)} = \sum_{l=1}^2\sum_{s=1}^2\sum_{i=1}^n \frac{\partial \omega^{ls}_{ii}}{\partial \Sigma}\Big((\Delta_i y_l)(\Delta_i y_s) - \int_{\tau_{i-1}}^{\tau_i}\Sigma_{ls,t}\,dt\Big),
\]
\[
M_2^{(\Sigma)} = \sum_{l=1}^2\sum_{s=1}^2\sum_{i=1}^n\sum_{j<i} \frac{\partial \omega^{ls}_{ij}}{\partial \Sigma}\,\Delta^n_i y_l\,\Delta^n_j y_s,
\]
while $M_3^{(\Sigma)}$ and $M_4^{(\Sigma)}$ collect the cross terms between returns and noise and the pure noise terms, respectively.

\[
n^{-\frac14}\big(M_1^{(\Sigma)} + 2M_2^{(\Sigma)}\big) \xrightarrow{\;\mathcal L_X\;} MN\big(0, \mathrm{Avar}^{(2)}(\Sigma)\big),
\]
where

\[
\mathrm{Avar}^{(2)}(\Sigma) = \lim_{n\to\infty} 4n^{-\frac12}\sum_{l,s,u,v=1}^2\sum_{i=1}^n\sum_{j<i}
\frac{\partial \omega^{vu}_{i,j}}{\partial \Sigma}\,\frac{\partial \omega^{ls}_{i,j}}{\partial \Sigma'}\,\Sigma_{sv,t_i}\,\Sigma_{ul,t_j}\,\Delta^2. \tag{A.1}
\]

\[
n^{-\frac14}\,2M_3^{(\Sigma)} \xrightarrow{\;\mathcal L_X\;} MN\big(0, \mathrm{Avar}^{(3)}(\Sigma)\big),
\]
where

\[
\mathrm{Avar}^{(3)}(\Sigma)
= \lim_{n\to\infty} 4n^{-\frac12}\sum_{j=1}^n\sum_{l,s,v=1}^2\sum_{i=1}^n \Lambda_{0,ll}\,
\frac{\partial \omega^{ls}_{ij}}{\partial \Sigma}
\Big(2\frac{\partial \omega^{lv}_{ij}}{\partial \Sigma'} - \frac{\partial \omega^{lv}_{i,j-1}}{\partial \Sigma'} - \frac{\partial \omega^{lv}_{i,j+1}}{\partial \Sigma'}\Big)\Sigma_{sv,t_j}\,\Delta
\]
\[
= 4\sum_{l,s,v=1}^2 \Lambda_{0,ll}\int_0^\infty \frac{\partial \omega^{ls}(\Sigma,\Lambda,x)}{\partial \Sigma}\,\frac{\partial \omega^{lv}(\Sigma,\Lambda,x)}{\partial \Sigma'}\,\pi^2x^2\,dx \int_0^T \Sigma_{sv,t}\,dt. \tag{A.3}
\]

Finally, we have
\[
n^{-\frac14} M_4^{(\Sigma)} \xrightarrow{\;\mathcal L\;} N\big(0, \mathrm{Avar}^{(4)}(\Sigma)\big),
\]
where

\[
\mathrm{Avar}^{(4)}(\Sigma)
= \lim_{n\to\infty} n^{-\frac12}\sum_{i,j,k,l=1}^n\left(
\frac{\partial \omega^{11}_{i,j}}{\partial \Sigma}\frac{\partial \omega^{11}_{k,l}}{\partial \Sigma'}K^{ij,kl}_{11}
+ 4\frac{\partial \omega^{12}_{i,j}}{\partial \Sigma}\frac{\partial \omega^{12}_{k,l}}{\partial \Sigma'}K^{i,k}_{11}K^{j,l}_{22}
+ \frac{\partial \omega^{22}_{i,j}}{\partial \Sigma}\frac{\partial \omega^{22}_{k,l}}{\partial \Sigma'}K^{ij,kl}_{22}\right)
\]
\[
= \lim_{n\to\infty} n^{-\frac12}\left(
V_1\Big(\frac{\partial \omega^{11}}{\partial \Sigma}, \frac{\partial \omega^{11}}{\partial \Sigma}\Big)
+ V_2\Big(\frac{\partial \omega^{11}}{\partial \Sigma}, \frac{\partial \omega^{11}}{\partial \Sigma}\Big)
+ 2V_2\Big(\frac{\partial \omega^{12}}{\partial \Sigma}, \frac{\partial \omega^{12}}{\partial \Sigma}\Big)
+ V_1\Big(\frac{\partial \omega^{22}}{\partial \Sigma}, \frac{\partial \omega^{22}}{\partial \Sigma}\Big)
+ V_2\Big(\frac{\partial \omega^{22}}{\partial \Sigma}, \frac{\partial \omega^{22}}{\partial \Sigma}\Big)\right),
\]
and

\[
V_1\Big(\frac{\partial \omega^{ll}}{\partial \Sigma}, \frac{\partial \omega^{ll}}{\partial \Sigma}\Big)
= \sum_{i=1}^{n-1}\Big(-8\frac{\partial \omega^{ll}_{i,i+1}}{\partial \Sigma}\frac{\partial \omega^{ll}_{i+1,i+1}}{\partial \Sigma'}
+ 2\frac{\partial \omega^{ll}_{i,i}}{\partial \Sigma}\frac{\partial \omega^{ll}_{i+1,i+1}}{\partial \Sigma'}
+ 4\frac{\partial \omega^{ll}_{i,i+1}}{\partial \Sigma}\frac{\partial \omega^{ll}_{i,i+1}}{\partial \Sigma'}\Big)
+ 2\sum_{i=1}^n \frac{\partial \omega^{ll}_{ii}}{\partial \Sigma}\frac{\partial \omega^{ll}_{ii}}{\partial \Sigma'}\,\mathrm{cum}_4[\epsilon_l]
\sim O(1),
\]
\[
V_2\Big(\frac{\partial \omega^{ll}}{\partial \Sigma}, \frac{\partial \omega^{ll}}{\partial \Sigma}\Big)
= 2(\Lambda_{0,ll})^2\sum_{i,j=1}^n \frac{\partial \omega^{ll}_{i,j}}{\partial \Sigma}
\Big(\frac{\partial \omega^{ll}_{j-1,i-1}}{\partial \Sigma'} + \frac{\partial \omega^{ll}_{j-1,i+1}}{\partial \Sigma'} - 2\frac{\partial \omega^{ll}_{j-1,i}}{\partial \Sigma'}
+ \frac{\partial \omega^{ll}_{j+1,i-1}}{\partial \Sigma'} + \frac{\partial \omega^{ll}_{j+1,i+1}}{\partial \Sigma'} - 2\frac{\partial \omega^{ll}_{j+1,i}}{\partial \Sigma'}
- 2\frac{\partial \omega^{ll}_{j,i-1}}{\partial \Sigma'} - 2\frac{\partial \omega^{ll}_{j,i+1}}{\partial \Sigma'} + 4\frac{\partial \omega^{ll}_{j,i}}{\partial \Sigma'}\Big)
\]
\[
\sim 2(\Lambda_{0,ll})^2\left(\int_0^\infty \frac{\partial \omega^{ll}}{\partial \Sigma}\frac{\partial \omega^{ll}}{\partial \Sigma'}\,\pi^4x^4\,dx\right)n^{\frac12},
\]
\[
V_2\Big(\frac{\partial \omega^{12}}{\partial \Sigma}, \frac{\partial \omega^{12}}{\partial \Sigma}\Big)
= 2\Lambda_{0,11}\Lambda_{0,22}\sum_{i,j=1}^n \frac{\partial \omega^{12}_{i,j}}{\partial \Sigma}
\Big(\frac{\partial \omega^{12}_{j-1,i-1}}{\partial \Sigma'} + \frac{\partial \omega^{12}_{j-1,i+1}}{\partial \Sigma'} - 2\frac{\partial \omega^{12}_{j-1,i}}{\partial \Sigma'}
+ \frac{\partial \omega^{12}_{j+1,i-1}}{\partial \Sigma'} + \frac{\partial \omega^{12}_{j+1,i+1}}{\partial \Sigma'} - 2\frac{\partial \omega^{12}_{j+1,i}}{\partial \Sigma'}
- 2\frac{\partial \omega^{12}_{j,i-1}}{\partial \Sigma'} - 2\frac{\partial \omega^{12}_{j,i+1}}{\partial \Sigma'} + 4\frac{\partial \omega^{12}_{j,i}}{\partial \Sigma'}\Big)
\]
\[
\sim 2\Lambda_{0,11}\Lambda_{0,22}\left(\int_0^\infty \frac{\partial \omega^{12}}{\partial \Sigma}\frac{\partial \omega^{12}}{\partial \Sigma'}\,\pi^4x^4\,dx\right)n^{\frac12}.
\]
Here, $K^{i,j}_{11}$, $K^{i,j}_{22}$, $K^{ij,kl}_{11}$ and $K^{ij,kl}_{22}$ are the corresponding cumulants of $\Delta^n_i\epsilon_1$ and $\Delta^n_i\epsilon_2$, and $\mathrm{cum}_4[\epsilon_1]$ and $\mathrm{cum}_4[\epsilon_2]$ are the fourth cumulants of $\epsilon_1$ and $\epsilon_2$. Therefore, we have

\[
\mathrm{Avar}^{(4)}(\Sigma)
= 2\sum_{l,s=1}^2 \Lambda_{0,ll}\Lambda_{0,ss}\int_0^\infty \frac{\partial \omega^{ls}(\Sigma,\Lambda,x)}{\partial \Sigma}\,\frac{\partial \omega^{ls}(\Sigma,\Lambda,x)}{\partial \Sigma'}\,\pi^4x^4\,dx. \tag{A.4}
\]

In summary, we have
\[
n^{-\frac14}(\Psi_\Sigma - \bar\Psi_\Sigma) \xrightarrow{\;\mathcal L\;} N\Big(0, \tfrac14\big(\mathrm{Avar}^{(2)}(\Sigma) + \mathrm{Avar}^{(3)}(\Sigma) + \mathrm{Avar}^{(4)}(\Sigma)\big)\Big).
\]
Similarly, we can obtain

\[
n^{\frac12}\begin{pmatrix} \Psi_{\Lambda_{11}} - \bar\Psi_{\Lambda_{11}} \\ \Psi_{\Lambda_{22}} - \bar\Psi_{\Lambda_{22}} \end{pmatrix}
\xrightarrow{\;\mathcal L\;} N\left(\begin{pmatrix}0\\0\end{pmatrix},
\begin{pmatrix} \dfrac{2(\Lambda_{0,11})^2 + \mathrm{cum}_4[\epsilon_1]}{4\Lambda_{11}^4} & 0 \\ 0 & \dfrac{2(\Lambda_{0,22})^2 + \mathrm{cum}_4[\epsilon_2]}{4\Lambda_{22}^4} \end{pmatrix}\right). \tag{A.5}
\]
Further, we need to solve $\bar\Psi_\Sigma = 0$ and $\bar\Psi_\Lambda = 0$ for the pseudo-true parameters $\theta^*$, and show that the distance between $\theta^*$ and the values of interest is asymptotically negligible. In fact, for any $\Sigma_{uv} \in \{\Sigma_{11}, \Sigma_{12}, \Sigma_{22}\}$, we have
\[
\bar\Psi_{\Sigma_{uv}} = \frac{1}{2\sqrt n}\left\{\operatorname{tr}\Big(\Omega^{-1}\frac{\partial \Omega}{\partial \Sigma_{uv}}\Big) + \frac{\partial \operatorname{tr}(\Omega^{-1}\Omega_0)}{\partial \Sigma_{uv}}\right\}
= \frac{1}{2\sqrt n}\left\{\operatorname{tr}\Big(\Omega^{-1}\frac{\partial \Omega}{\partial \Sigma_{uv}}\Big) + \frac{\partial \operatorname{tr}\big(\Omega^{-1}(\Omega + J\otimes(\Lambda_0 - \Lambda) + \Gamma)\big)}{\partial \Sigma_{uv}}\right\}
\]
\[
= \frac{1}{2\sqrt n}\left\{\operatorname{tr}\Big(\frac{\partial \Omega^{-1}}{\partial \Sigma_{uv}}\,J\otimes(\Lambda_0 - \Lambda)\Big) + \operatorname{tr}\Big(\frac{\partial \Omega^{-1}}{\partial \Sigma_{uv}}\,\Gamma\Big)\right\}
\]
\[
= \frac{1}{2\sqrt n}\left\{\sum_{l=1}^2\sum_{i=1}^n\Big(2\frac{\partial \omega^{ll}_{ii}}{\partial \Sigma_{uv}} - \frac{\partial \omega^{ll}_{i,i-1}}{\partial \Sigma_{uv}} - \frac{\partial \omega^{ll}_{i,i+1}}{\partial \Sigma_{uv}}\Big)(\Lambda_{0,ll} - \Lambda_{ll})
+ \sum_{l,s=1}^2\sum_{i=1}^n \frac{\partial \omega^{ls}_{ii}}{\partial \Sigma_{uv}}\,\Gamma^{ls}_{ii}\right\}
\]
\[
= \frac12\left\{\sum_{l,s=1}^2\Big(\int_0^\infty \frac{\partial \omega^{ls}(\Sigma,\Lambda,x)}{\partial \Sigma_{uv}}\,dx\Big)\Big(\int_0^T \Sigma_{ls,t}\,dt - \Sigma_{ls}T\Big)
+ \sum_{l=1}^2 (\Lambda_{0,ll} - \Lambda_{ll})\int_0^\infty \frac{\partial \omega^{ll}(\Sigma,\Lambda,x)}{\partial \Sigma_{uv}}\,\pi^2x^2\,dx\right\}(1 + o(1)),
\]
where $\Lambda_0$ denotes the true covariance matrix of the noise, $\Gamma$ is a block-diagonal matrix with $\Gamma^{ls}_{ii} = \int_{\tau_{i-1}}^{\tau_i}\Sigma_{ls,t}\,dt - \Sigma_{ls}\Delta$, and $J$ is an $n\times n$ tridiagonal matrix with diagonal elements equal to 2 and off-diagonal elements equal to $-1$.

Similarly, for $\Lambda_{uu} \in \{\Lambda_{11}, \Lambda_{22}\}$, we have
\[
\bar\Psi_{\Lambda_{uu}} = \frac{1}{2n}\left\{\operatorname{tr}\Big(\Omega^{-1}\frac{\partial \Omega}{\partial \Lambda_{uu}}\Big) + \frac{\partial \operatorname{tr}(\Omega^{-1}\Omega_0)}{\partial \Lambda_{uu}}\right\}
= \frac{1}{2n}\left\{\operatorname{tr}\Big(\frac{\partial \Omega^{-1}}{\partial \Lambda_{uu}}\,J\otimes(\Lambda_0 - \Lambda)\Big) + \operatorname{tr}\Big(\frac{\partial \Omega^{-1}}{\partial \Lambda_{uu}}\,\Gamma\Big)\right\}
\]
\[
= \frac12\left\{\frac{1}{\sqrt n}\sum_{l,s=1}^2\Big(\int_0^\infty \frac{\partial \omega^{ls}(\Sigma,\Lambda,x)}{\partial \Lambda_{uu}}\,dx\Big)\Big(\int_0^T \Sigma_{ls,t}\,dt - \Sigma_{ls}T\Big)(1 + o(1))
+ \frac{1}{\sqrt n}\sum_{l=1}^2\Big(\int_0^\infty \frac{\partial \omega^{ll}(\Sigma,\Lambda,x)}{\partial \Lambda_{uu}}\,\pi^2x^2\,dx\Big)(\Lambda_{0,ll} - \Lambda_{ll})
- \sum_{l=1}^2 \frac{\delta_{lu}}{2\Lambda_{ll}^2}(\Lambda_{0,ll} - \Lambda_{ll})\right\}.
\]
Therefore, solving for $\bar\Sigma$ and $\bar\Lambda$, we obtain:

\[
\bar\Lambda_{ll} = \Lambda_{0,ll} + \frac{\bar\Lambda_{ll}^2}{\sqrt n}\left\{\sum_{l,s=1}^2\Big(\int_0^\infty \frac{\partial \omega^{ls}(\bar\Sigma,\bar\Lambda,x)}{\partial \Lambda_{ll}}\,dx\Big)\Big(\int_0^T \Sigma_{ls,t}\,dt - \bar\Sigma_{ls}T\Big)
+ \sum_{l=1}^2\Big(\int_0^\infty \frac{\partial \omega^{ll}(\bar\Sigma,\bar\Lambda,x)}{\partial \Lambda_{ll}}\,\pi^2x^2\,dx\Big)(\Lambda_{0,ll} - \bar\Lambda_{ll})\right\}(1 + o_p(1)), \quad l = 1, 2,
\]
\[
\bar\Sigma_{ls} = \frac1T\int_0^T \Sigma_{ls,t}\,dt + O_p(n^{-\frac12}) = \Sigma_{0,ls} + O_p(n^{-\frac12}), \quad l, s = 1, 2.
\]
Further, applying Theorem 2 in Xiu (2010), we have

\[
\hat\Sigma_{ls} - \bar\Sigma_{ls} = o_p(1), \quad\text{and}\quad \hat\Lambda_{ll} - \bar\Lambda_{ll} = o_p(1),
\]
hence consistency is established. To find the central limit theorem, we do the usual "sandwich" calculations. Denote

\[
\frac{\partial \bar\Psi_\Sigma}{\partial \Sigma} =
\begin{pmatrix}
\frac{\partial \bar\Psi_{\Sigma_{11}}}{\partial \Sigma_{11}} & \frac{\partial \bar\Psi_{\Sigma_{11}}}{\partial \Sigma_{12}} & \frac{\partial \bar\Psi_{\Sigma_{11}}}{\partial \Sigma_{22}} \\[0.5ex]
\frac{\partial \bar\Psi_{\Sigma_{12}}}{\partial \Sigma_{11}} & \frac{\partial \bar\Psi_{\Sigma_{12}}}{\partial \Sigma_{12}} & \frac{\partial \bar\Psi_{\Sigma_{12}}}{\partial \Sigma_{22}} \\[0.5ex]
\frac{\partial \bar\Psi_{\Sigma_{22}}}{\partial \Sigma_{11}} & \frac{\partial \bar\Psi_{\Sigma_{22}}}{\partial \Sigma_{12}} & \frac{\partial \bar\Psi_{\Sigma_{22}}}{\partial \Sigma_{22}}
\end{pmatrix}
\xrightarrow{\;P\;} \frac{\partial \bar\Psi_{\Sigma_0}}{\partial \Sigma},
\]
where

\[
\frac{\partial \bar\Psi_{\Sigma_0,uv}}{\partial \Sigma_{ij}}
= -\frac{T}{2}\int_0^\infty \frac{\partial \omega^{ij}(\Sigma_0,\Lambda_0,x)}{\partial \Sigma_{uv}}\,dx
- \frac{T}{2}\Big(\int_0^\infty \frac{\partial \omega^{ij}(\Sigma_0,\Lambda_0,x)}{\partial \Sigma_{uv}}\,dx\Big)\mathbf 1_{\{u\ne v\}},
\]
and $\Sigma_0$ denotes the true parameter value. So, the central limit theorem is:

\[
n^{\frac14}(\hat\Sigma - \Sigma_0) = n^{\frac14}
\begin{pmatrix}
\hat\Sigma_{11} - \frac1T\int_0^T \Sigma_{11,t}\,dt \\[0.5ex]
\hat\Sigma_{12} - \frac1T\int_0^T \Sigma_{12,t}\,dt \\[0.5ex]
\hat\Sigma_{22} - \frac1T\int_0^T \Sigma_{22,t}\,dt
\end{pmatrix}
\xrightarrow{\;\mathcal L_X\;} MN(0, \Pi_Q),
\]
where
\[
\Pi_Q = \frac14\Big(\frac{\partial \bar\Psi_{\Sigma_0}}{\partial \Sigma}\Big)^{-1}
\big(\mathrm{Avar}^{(2)}(\Sigma_0) + \mathrm{Avar}^{(3)}(\Sigma_0) + \mathrm{Avar}^{(4)}(\Sigma_0)\big)
\left(\Big(\frac{\partial \bar\Psi_{\Sigma_0}}{\partial \Sigma}\Big)^{-1}\right)'.
\]
Note that

\[
\frac{\partial \Psi_\Lambda}{\partial \Lambda}\bigg|_{\Lambda_0} = \begin{pmatrix} -\dfrac{1}{2\Lambda_{0,11}^2} & 0 \\ 0 & -\dfrac{1}{2\Lambda_{0,22}^2} \end{pmatrix},
\]
hence the CLT for $\hat\Lambda$ follows immediately from (A.5). This concludes the proof of Theorem 2.

A.3 Proof of Corollary 1

Denote $\Delta_i = \bar\Delta(1 + \xi_i)$, where the $\xi_i$ are i.i.d. and $O_p(1)$. Note that

\[
\Omega = \bar\Omega + \bar\Delta\,\Sigma\otimes\Xi,
\]
where $\Xi = \operatorname{diag}(\xi_1, \ldots, \xi_i, \ldots, \xi_n)$, and $\bar\Omega$ is the covariance matrix of the equidistant case with $\Delta$ replaced by $\bar\Delta$. It turns out that

\[
\Omega^{-1} = \big(\bar\Omega(I + \bar\Delta\,\bar\Omega^{-1}\Sigma\otimes\Xi)\big)^{-1}
= (I + \bar\Delta\,\bar\Omega^{-1}\Sigma\otimes\Xi)^{-1}\bar\Omega^{-1}
= \bar\Omega^{-1} + \sum_{k=1}^\infty (-1)^k\,\bar\Delta^k\,(\bar\Omega^{-1}\Sigma\otimes\Xi)^k\,\bar\Omega^{-1}.
\]
For any $\theta_1, \theta_2 \in \{\Sigma_{11}, \Sigma_{12}, \Sigma_{22}\}$, we have
\[
\frac{\partial \Omega}{\partial \theta_1} = \frac{\partial \bar\Omega}{\partial \theta_1} + \bar\Delta\,\frac{\partial \Sigma}{\partial \theta_1}\otimes\Xi,
\]
hence,
\[
E\Big(\frac{\partial \log(\det\Omega)}{\partial \theta_1}\Big) = E\Big(\operatorname{tr}\Big(\Omega^{-1}\frac{\partial \Omega}{\partial \theta_1}\Big)\Big)
= E\Big(\operatorname{tr}\Big(\bar\Omega^{-1}\frac{\partial \bar\Omega}{\partial \theta_1}\Big)\Big)
+ \bar\Delta\,E\Big(\operatorname{tr}\Big(-\bar\Omega^{-1}(\Sigma\otimes\Xi)\bar\Omega^{-1}\frac{\partial \bar\Omega}{\partial \theta_1} + \bar\Omega^{-1}\frac{\partial \Sigma}{\partial \theta_1}\otimes\Xi\Big)\Big)
\]
\[
+ \bar\Delta^2\,E\Big(\operatorname{tr}\Big((\bar\Omega^{-1}\Sigma\otimes\Xi)^2\bar\Omega^{-1}\frac{\partial \bar\Omega}{\partial \theta_1} - \bar\Omega^{-1}(\Sigma\otimes\Xi)\bar\Omega^{-1}\frac{\partial \Sigma}{\partial \theta_1}\otimes\Xi\Big)\Big) + o(\bar\Delta^2).
\]
Because $E(\Xi) = 0$,
\[
E\Big(\operatorname{tr}\Big(-\bar\Omega^{-1}(\Sigma\otimes\Xi)\bar\Omega^{-1}\frac{\partial \bar\Omega}{\partial \theta_1} + \bar\Omega^{-1}\frac{\partial \Sigma}{\partial \theta_1}\otimes\Xi\Big)\Big) = 0.
\]
Also,
\[
E\Big(\operatorname{tr}\Big((\bar\Omega^{-1}\Sigma\otimes\Xi)^2\bar\Omega^{-1}\frac{\partial \bar\Omega}{\partial \theta_1} - \bar\Omega^{-1}(\Sigma\otimes\Xi)\bar\Omega^{-1}\Big(\frac{\partial \Sigma}{\partial \theta_1}\otimes\Xi\Big)\Big)\Big)
= \operatorname{tr}\Big(\bar\Omega^{-1}(\Sigma\otimes I)D(\Sigma\otimes I)\bar\Omega^{-1}\frac{\partial \bar\Omega}{\partial \theta_1} - \bar\Omega^{-1}(\Sigma\otimes I)D\Big(\frac{\partial \Sigma}{\partial \theta_1}\otimes I\Big)\Big)\operatorname{var}(\xi),
\]
where
\[
D = \begin{pmatrix} \operatorname{diag}(\bar\Omega^{-1})_{11} & \operatorname{diag}(\bar\Omega^{-1})_{12} \\ \operatorname{diag}(\bar\Omega^{-1})_{12} & \operatorname{diag}(\bar\Omega^{-1})_{22} \end{pmatrix},
\]
and $\bar\Omega^{-1}_{ij}$ is the $(i,j)$ block of $\bar\Omega^{-1}$. Therefore,
\[
E\Big(-\frac{\partial^2 L}{\partial \theta_1\partial \theta_2}\Big) = -\frac12 E\Big(\frac{\partial^2 \log(\det\Omega)}{\partial \theta_1\partial \theta_2}\Big)
= \frac12\operatorname{tr}\Big(\bar\Omega^{-1}\frac{\partial \bar\Omega}{\partial \theta_2}\bar\Omega^{-1}\frac{\partial \bar\Omega}{\partial \theta_1}\Big)
+ \phi_{\theta_1,\theta_2}(\Sigma, \bar\Omega, \bar\Delta)\operatorname{var}(\xi) + o(\bar\Delta^2),
\]
where
\[
\phi_{\theta_1,\theta_2}(\Sigma, \bar\Omega, \bar\Delta)
= -\frac12\frac{\partial}{\partial \theta_2}\operatorname{tr}\Big(\bar\Omega^{-1}(\Sigma\otimes I)D(\Sigma\otimes I)\bar\Omega^{-1}\frac{\partial \bar\Omega}{\partial \theta_1}
- \bar\Omega^{-1}(\Sigma\otimes I)D\Big(\frac{\partial \Sigma}{\partial \theta_1}\otimes I\Big)\Big)\bar\Delta^2.
\]
In fact, we can show that

\[
\phi_{\theta_1,\theta_2}(\Sigma, \bar\Omega, \bar\Delta) = o(\bar\Delta^{3/2}).
\]

Hence, the new Fisher information converges to the one given in the proof of Theorem 1 as $\bar\Delta \to 0$, which concludes the proof.
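The Neumann-series expansion of $\Omega^{-1}$ used in this proof can be illustrated numerically. The sketch below uses the univariate analogue for simplicity (so $\Sigma\otimes\Xi$ reduces to $\Sigma\,\Xi$); all parameter values are illustrative, and the truncation error after the $k = 2$ term is $O(\max_i|\xi_i|^3)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, Sigma, Lam = 200, 1.0, 1.0, 0.01
Dbar = T / n
xi = rng.uniform(-0.2, 0.2, size=n)      # mean-zero, bounded spacing perturbations
Xi = np.diag(xi)

I = np.eye(n)
J = 2 * I - np.eye(n, k=1) - np.eye(n, k=-1)
Omega_bar = Sigma * Dbar * I + Lam * J   # equidistant covariance matrix
Omega = Sigma * Dbar * np.diag(1 + xi) + Lam * J

# Omega = Omega_bar + Dbar * Sigma * Xi holds exactly.
assert np.allclose(Omega, Omega_bar + Dbar * Sigma * Xi)

# Truncate Omega^{-1} = sum_k (-1)^k Dbar^k (Omega_bar^{-1} Sigma Xi)^k Omega_bar^{-1}
# at k = 2 and compare with the exact inverse.
inv_bar = np.linalg.inv(Omega_bar)
B = inv_bar @ (Sigma * Xi)               # Omega_bar^{-1} Sigma Xi
approx = inv_bar - Dbar * B @ inv_bar + Dbar**2 * B @ B @ inv_bar
exact = np.linalg.inv(Omega)
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
assert rel_err < 0.02                    # remainder is O(max|xi|^3)
```

The series converges because the spectral norm of $\bar\Delta\,\bar\Omega^{-1}\Sigma\,\Xi$ is bounded by $\max_i|\xi_i| < 1$ here.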

A.4 Proof of Theorem 3

Since $n_1 \gg n_2$, we have:

\[
\Delta^{n_1,n_2} = (\Delta^{n_2,n_1})' =
\begin{pmatrix}
\Delta^{n_1}_{1:m} & 0 & \cdots & 0 \\
0 & \Delta^{n_1}_{m+1:2m} & 0 & \vdots \\
0 & 0 & \ddots & 0 \\
0 & 0 & \cdots & \Delta^{n_1}_{n_1-m+1:n_1}
\end{pmatrix}_{n_1\times n_2},
\]
where $\Delta^{n_1}_{km+1:(k+1)m} = (\Delta^{n_1}_{km+1}, \Delta^{n_1}_{km+2}, \ldots, \Delta^{n_1}_{(k+1)m})'$ is an $m$-dimensional vector. Using similar orthogonal matrices $U^{n_1}$ and $U^{n_2}$, such that

\[
\begin{pmatrix} U^{n_1} & 0 \\ 0 & U^{n_2} \end{pmatrix} \Omega \begin{pmatrix} (U^{n_1})' & 0 \\ 0 & (U^{n_2})' \end{pmatrix}
= \begin{pmatrix} V_{11} & V_{12} \\ V_{12}' & V_{22} \end{pmatrix} =: V,
\]
where

\[
u_{ij} = \sqrt{\frac{2}{n_k+1}}\,\sin\Big(\frac{i\cdot j}{n_k+1}\pi\Big), \quad i, j = 1, \ldots, n_k, \; k = 1, 2,
\]
\[
V_{ii} = \operatorname{diag}(\mu_{ij}) = \operatorname{diag}\Big(\Sigma_{ii}\bar\Delta_i + 2\Lambda_{ii}\Big(1 - \cos\Big(\frac{j}{n_i+1}\pi\Big)\Big)\Big), \quad i = 1, 2, \ \text{and} \ j = 1, \ldots, n_i,
\]
\[
V_{12} = (v^{12}_{i,j}) = \Sigma_{12}\,U^{n_1}\Delta^{n_1,n_2}(U^{n_2})'.
\]

So for i = 1, 2,...,n1, j = 1, 2,...,n2, we have

n1 n2 ml 12 ¯ n1 n2 ¯ n2 n1 vi,j =Σ12∆1 ui,kuj,[(k−1)/m]+1 =Σ12∆1 uj,l ui,k Xk=1 Xl=1 k=mX(l−1)+1 imπ 1 sin 2(1+n1) = Σ ∆¯ (n + 1)(n + 1) iπ 12 1 s 1 2 sin 2(1+n1) n2(j+jn1−im−in1)π (i−j)π n2(j+jn1+im+in1)π (i+j)π sin 2(1+n1)(1+n2) cos 2 sin 2(1+n1)(1+n2) cos 2 . sin (j+jn1−im−in1)π − sin (j+jn1+im+in1)π  2(1+n1)(1+n2) 2(1+n1)(1+n2)  Moreover,

(U n1 )′ U n1 Ω−1 = V −1 (U n2 )′ U n2     where by the Woodbury formula:

(V V V −1V ′ )−1 (V V V −1V ′ )−1V V −1 V −1 = 11 − 12 22 12 − 11 − 12 22 12 12 22 V −1V ′ (V V V −1V ′ )−1 V −1 + V −1V ′ (V V V −1V ′ )−1V V −1  − 22 12 11 − 12 22 12 22 22 12 11 − 12 22 12 12 22  V −1 + V −1V (V V ′ V −1V )−1V ′ V −1 V −1V (V V ′ V −1V )−1 = 11 11 12 22 − 12 11 12 12 11 − 11 12 22 − 12 11 12 . (V V ′ V −1V )−1V ′ V −1 (V V ′ V −1V )−1  − 22 − 12 11 12 12 11 22 − 12 11 12  Then, for any θ ,θ Σ , Σ , Σ , 1 2 ∈ { 11 12 22} ∂ log(det Ω) ∂Ω ∂V = tr Ω−1 = tr V −1 ∂θ1 ∂θ1 ∂θ1     52 so ∂2L 1 ∂2 log(det Ω) 1 ∂ ∂V E = = tr V −1 − ∂θ1∂θ2 −2 ∂θ1∂θ2 −2 ∂θ2 ∂θ1     Hence, our goal now is to calculate the following expressions, as n ,n . 1 2 −→ ∞ ∂V ∂V ∂V tr V −1 , tr V −1 , and tr V −1 . ∂Σ11 ∂Σ12 ∂Σ22       Since n1 >> n2, we have

−1 −1 ∂V ¯ ′ −1 tr V =∆2tr V22 V12V11 V12 ∂Σ22 −      −1 =∆¯ tr diag V V ′ V −1V (1 + o(1)) 2 22 − 12 11 12 n2 n1   −1  =∆¯ µ (v12)2µ−1 (1 + o(1)). 2 2j − i,j 1i Xj=1  Xi=1  Notice that 2 n1 2 2 n1 imπ Σ ∆¯ sin 2(1+n1) (v12)2µ−1 = 12 1 i,j 1i (n + 1)(n + 1) iπ 1 2 sin 2(1+n1) ! Xi=1 Xi=1 n2(j+jn1−im−in1)π (i−j)π n2(j+jn1+im+in1)π (i+j)π 2 sin 2(1+n1)(1+n2) cos 2 sin 2(1+n1)(1+n2) cos 2 µ−1 (j+jn1−im−in1)π − (j+jn1+im+in1)π 1i sin 2(1+n1)(1+n2) sin 2(1+n1)(1+n2) ! We can prove that the dominant terms in the second brackets of the last summation are contributed by those is such that either (j + jn im in )π (j + jn + im + in )π sin 1 − − 1 or sin 1 1 2(1 + n1)(1 + n2) 2(1 + n1)(1 + n2) is close to 0. As n >> n , for each 1 j n1/2+δ, there exists m different is [1,n ] such that 1 2 ≤ ≤ 2 ∈ 1 (j + jn im in )π (j + jn + im + in )π sin 1 − − 1 0, or sin 1 1 0. 2(1 + n1)(1 + n2) ≈ 2(1 + n1)(1 + n2) ≈ On the other hand,

imπ 2 sin 2(1+n1) m2, when i [1,n1/2+δ] iπ ≈ ∈ 2 sin 2(1+n1) ! and decrease rapidly once i>n . Hence, the dominant term when j [1,n1/2+δ] is the one with 2 ∈ 2 i j, in which case ≈ n2(j+jn1−im−in1)π (i−j)π n2(j+jn1+im+in1)π (i+j)π sin 2(1+n1)(1+n2) cos 2 sin 2(1+n1)(1+n2) cos 2 n2, and 1 (j+jn1−im−in1)π ≈ (j+jn1+im+in1)π ≈− sin 2(1+n1)(1+n2) sin 2(1+n1)(1+n2) Therefore,

n1 12 2 −1 −1 −1 −1 2 2 (vi,j) µ1i = n1 n2 µ1j Σ12T (1 + o(1)) Xi=1 53 so as n , n and m , 1 →∞ 2 →∞ →∞ n2 n1 ¯ −1 1 −1 ∂V ∆2 12 2 −1 tr V = µ2j (vi,j) µ1i √n2 ∂Σ22 √n2 −   Xj=1  Xi=1  ∞ Σ T 2 11 dx −→ Σ T (Σ T + Λ π2x2) Σ2 T 2 Z0 11 22 22 − 12 Similarly, we can derive

−1 −1 ∂V ¯ −1 −1 ′ −1 ′ −1 tr V =∆1 tr V11 + tr V11 V12 V22 V12V11 V12 V12V11 ∂Σ11 − !         n1 n2 n1 n1 −1 ¯ −1 12 2 −1 12 2 −2 =∆1 µ1i + µ2j (vi,j) µ1i (vi,j) µ1i (1 + o(1)) − !! Xi=1 Xj=1  Xi=1  Xi=1 ∞ Σ2 T = √n 12 dx 2 Σ (Σ (Σ T + Λ π2x2) Σ2 T ) Z0 11 11 22 22 − 12 ∞ T + √n dx (1 + o(1)) 1 Σ T + Λ π2x2 Z0 11 11 ! and

−1 ∂V 2 −1 ′ −1 −1 ′ tr V = tr V11 V12(V22 V12V11 V12) V12 ∂Σ12 Σ12 − −     ∞ 2Σ T = 12 dx √n (1 + o(1)). Σ (Σ T + Λ π2x2) Σ2 T 2 Z0 11 22 22 − 12 ! −1 ∂V Notice that because n1 >> n2, tr V ∂Σ11 is dominated by the O(√n1) term, hence the conver- gence rate of the liquid asset is not affected by the illiquid asset. The Fisher information matrix can then be constructed by calculating the following derivatives multiplied by appropriate rates:

∞ 2 Σ 1 ∂ −1 ∂V T I11 = lim tr V = dx, n1→∞,n2→∞ −2√n ∂Σ ∂Σ 2(Σ T + Λ π2x2)2 1 11 11 Z0 11 11   ∞ 2 2 2 Σ 1 ∂ −1 ∂V T (Σ11(Σ22T + Λ22π x )+Σ12T ) I22 = lim tr V = dx, n1→∞,n2→∞ −2√n ∂Σ ∂Σ (Σ (Σ T + Λ π2x2) Σ2 T )2 2 12 12 Z0 11 22 22 12   ∞ 2 − Σ 1 ∂ −1 ∂V Σ11Σ12T I23 = lim tr V = dx, n1→∞,n2→∞ −2√n ∂Σ ∂Σ (Σ (Σ T + Λ π2x2) Σ2 T )2 2 22 12 Z0 11 22 22 12   ∞ 2 4 − Σ 1 ∂ −1 ∂V Σ11T I33 = lim tr V = 2 2 2 2 2 dx, n1→∞,n2→∞ −2√n2 ∂Σ22 ∂Σ22 2(Σ T (Σ T + Λ π x ) Σ T ) Z0 11 22 22 − 12 Σ Σ   I12 = I13 = 0.

Hence,

Σ −1 I11 0 0 Π = 0 IΣ IΣ , A  22 23  0 IΣ · 33   and this concludes the proof.

54 Journal of Econometrics 162 (2011) 149–169

Contents lists available at ScienceDirect

Journal of Econometrics

journal homepage: www.elsevier.com/locate/jeconom

Multivariate realised kernels: Consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading✩

Ole E. Barndorff-Nielsen a,b, Peter Reinhard Hansen c, Asger Lunde d,b,∗, Neil Shephard e,f a The T.N. Thiele Centre for Mathematics in Natural Science, Department of Mathematical Sciences, University of Aarhus, Ny Munkegade, DK-8000 Aarhus C, Denmark b CREATES, University of Aarhus, Denmark c Department of Economics, Stanford University, Landau Economics Building, 579 Serra Mall, Stanford, CA 94305-6072, USA d School of Economics and Management, Aarhus University, Bartholins Allé 10, DK-8000 Aarhus C, Denmark e Oxford-Man Institute, University of Oxford, Eagle House, Walton Well Road, Oxford OX2 6ED, UK f Department of Economics, University of Oxford, UK article info a b s t r a c t

Article history: We propose a multivariate realised kernel to estimate the ex-post covariation of log-prices. We show this Received 14 July 2010 new consistent estimator is guaranteed to be positive semi-definite and is robust to measurement error of Received in revised form certain types and can also handle non-synchronous trading. It is the first estimator which has these three 14 July 2010 properties which are all essential for empirical work in this area. We derive the large sample asymptotics Accepted 28 July 2010 of this estimator and assess its accuracy using a Monte Carlo study. We implement the estimator on some Available online 27 January 2011 US equity data, comparing our results to previous work which has used returns measured over 5 or 10 min intervals. We show that the new estimator is substantially more precise. Keywords: HAC estimator © 2011 Elsevier B.V. All rights reserved. Long run variance estimator Market frictions Quadratic variation Realised variance

1. Introduction throughout the paper. These observations could be trades or quote updates. The observation times for the i-th asset will be written as The last seven years has seen dramatic improvements in the (i) (i) (i) (i) t1 , t2 ,.... This means the available database of prices is X (tj ), way econometricians think about time-varying financial volatility, for j = 1, 2,..., N(i)(1), and i = 1, 2,..., d. Here N(i)(t) counts first brought about by harnessing high frequency data and then by the number of distinct data points available for the i-th asset up to mitigating the influence of market microstructure effects. Extend- time t. ing this work to the multivariate case is challenging as this needs to X is assumed to be driven by Y , the efficient price, abstracting additionally remove the effects of non-synchronous trading while from market microstructure effects. The efficient price is modelled simultaneously requiring that the covariance matrix estimator be as a Brownian semimartingale (Y ∈ BSM) defined on some filtered positive semi-definite. In this paper we provide the first estimator probability space (Ω, F ,(Ft ), P), which achieves all these objectives. This will be called the multi- ∫ t ∫ t variate realised kernel, which we will define in Eq. (1). Y (t) = a(u)du + σ (u)dW (u), We study a d-dimensional log price process X = (X (1), X (2),..., 0 0 X (d))′. These prices are observed irregularly and non-synchronous where a is a vector of elements which are predictable locally over the interval [0, T ]. For simplicity of exposition we take T = 1 bounded drifts, σ is a càdlàg volatility matrix process and W is a vector of independent Brownian motions. For reviews of the econometrics of this type of process see, for example, Ghysels et al. ✩ The second and fourth author are also affiliated with CREATES, a research center (1996). If Y ∈ BSM then its ex-post covariation, which we will funded by the Danish National Research Foundation. 
The Ox language of Doornik focus on for reasons explained in a moment, is (2006) was used to perform the calculations reported here. We thank Torben ∫ 1 Andersen, Tim Bollerslev, Ron Gallant, Xin Huang, Oliver Linton and Kevin Sheppard ′ for comments on a previous version of this paper. [Y ](1) = Σ(u)du, where Σ = σ σ , ∗ Corresponding author at: School of Economics and Management, Aarhus 0 University, Bartholins Allé 10, DK-8000 Aarhus C, Denmark. where E-mail addresses: [email protected] (O.E. Barndorff-Nielsen), n − ′ [email protected] (P.R. Hansen), [email protected] (A. Lunde), [Y ](1) = plim {Y (tj) − Y (tj−1)}{Y (tj) − Y (tj−1)} , n→∞ [email protected] (N. Shephard). j=1

0304-4076/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.07.009 150 O.E. Barndorff-Nielsen et al. / Journal of Econometrics 162 (2011) 149–169

O.E. Barndorff-Nielsen et al. / Journal of Econometrics 162 (2011) 149–169

(e.g. Protter (2004, pp. 66–77) and Jacod and Shiryaev (2003, p. 51)) for any sequence of deterministic synchronised partitions 0 = t_0 < t_1 < ··· < t_n = 1 with sup_j {t_{j+1} − t_j} → 0 for n → ∞. This is the quadratic variation of Y.

The contribution of this paper is to construct a consistent, positive semi-definite (psd) estimator of [Y](1) from our database of asset prices. The challenges of doing this are three-fold: (i) there are market microstructure effects U = X − Y, (ii) the data is irregularly spaced and non-synchronous, (iii) the market microstructure effects are not statistically independent of the Y process.

Quadratic variation is crucial to the economics of financial risk. This is reviewed by, for example, Andersen et al. (2010) and Barndorff-Nielsen and Shephard (2007), who provide very extensive references. The economic importance of this line of research has recently been reinforced by the insight of Bollerslev et al. (2009), who showed that expected stock returns seem well explained by the variance risk premium (the difference between the implied and realised variance) and that this risk premium is only detectable using the power of high frequency data. See also the papers by Drechsler and Yaron (2011), Fleming et al. (2003) and de Pooter et al. (2008).

Our analysis builds upon earlier work on the effect of noise on univariate estimators of [Y](1) by, amongst others, Zhou (1996), Andersen et al. (2000), Bandi and Russell (2008), Zhang et al. (2005), Hansen and Lunde (2006), Hansen et al. (2008), Kalnina and Linton (2008), Zhang (2006), Barndorff-Nielsen et al. (2008), Renault and Werker (2011), Hansen and Horel (2009), Jacod et al. (2009) and Andersen et al. (2011). The case of no noise is dealt with in the same spirit as the papers by Andersen et al. (2001), Barndorff-Nielsen and Shephard (2002, 2004), Mykland and Zhang (2006, 2009), Goncalves and Meddahi (2009) and Jacod and Protter (1998).

A distinctive feature of multivariate financial data is the phenomenon of non-synchronous trading or non-trading. These two terms are distinct. The first refers to the fact that any two assets rarely trade at the same instant. The latter refers to situations where one asset trades frequently over a period while some other assets do not trade. The treatment of non-synchronous trading effects dates back to Fisher (1966). For several years researchers focused mainly on the effects that stale quotes have on daily closing prices. Campbell et al. (1997, Chapter 3) provide a survey of this literature. When increasing the sampling frequency beyond the inter-hour level, several authors have demonstrated a severe bias towards zero in covariation statistics. This phenomenon is often referred to as the Epps effect. Epps (1979) found this bias for stock returns, and it has also been demonstrated to hold for foreign exchange returns, see Guillaume et al. (1997). This is confirmed in our empirical work, where realised covariances computed using high frequency data over specified fixed time periods, such as 15 s, dramatically underestimate the degree of dependence between assets. Some recent econometric work on this topic includes Malliavin and Mancino (2002), Reno (2003), Martens (unpublished paper), Hayashi and Yoshida (2005), Bandi and Russell (unpublished paper), Voev and Lunde (2007), Griffin and Oomen (2011) and Large (unpublished paper). We will draw ideas from this work.

Our estimator, the multivariate realised kernel, differs from the univariate realised kernel estimator by Barndorff-Nielsen et al. (2008) in important ways. The latter converges at rate n^{1/4} but critically relies on the assumption that the noise is a white noise process, and Barndorff-Nielsen et al. (2008) stress that their estimator cannot be applied to tick-by-tick data. In order not to be in obvious violation of the iid assumption, Barndorff-Nielsen et al. (2008) apply their estimator to prices that are (on average) sampled every minute or so. Here, in the present paper, we allow for a general form of noise that is consistent with the empirical features of tick-by-tick data. For this reason we adopt a larger bandwidth, which has the implication that our multivariate realised kernel estimator converges at rate n^{1/5}. Although this rate is slower than n^{1/4}, it is, from a practical viewpoint, important to acknowledge that there are only 390 one-minute returns in a typical trading day, while many shares trade several thousand times, and 390^{1/4} < 2000^{1/5}. So the rates of convergence will not (alone) tell us which estimators will be most accurate in practice — even for the univariate estimation problem. In addition to being robust to noise with a general form of dependence, the n^{1/5} convergence rate enables us to construct an estimator that is guaranteed to be psd, which is not the case for the estimator by Barndorff-Nielsen et al. (2008). Moreover, our analysis of irregularly spaced and non-synchronous observations causes the asymptotic distribution of our estimator to be quite different from that in Barndorff-Nielsen et al. (2008). We discuss the differences between these estimators in greater detail in Section 6.1.

The structure of the paper is as follows. In Section 2 we synchronise the timing of the multivariate data using what we call Refresh Time. This allows us to define high frequency returns and in turn the multivariate realised kernel. Further, we make precise the assumptions we make use of in our theorems to study the behaviour of our statistics. In Section 3 we give a detailed discussion of the asymptotic distribution of realised kernels in the univariate case. The analysis is then extended to the multivariate case. Section 4 contains a summary of a simulation experiment designed to investigate the finite sample properties of our estimator. Section 5 contains some results from implementing our estimators on some US stock price data taken from the TAQ database. We analyse up to 30 dimensional covariance matrices, and demonstrate efficiency gains that are around 20-fold compared to using daily data. This is followed by a section on extensions and further remarks, while the main part of the paper is finished by a conclusion. This is followed by an Appendix which contains the proofs of various theorems given in the paper, and an Appendix with results related to Refresh Time sampling. More details of our empirical results and simulation experiments are given in a web appendix which can be found at http://mit.econ.au.dk/vip_htm/alunde/BNHLS/BNHLS.htm.

2. Defining the multivariate realised kernel

2.1. Synchronising data: refresh time

Non-synchronous trading delivers fresh (trade or quote) prices at irregularly spaced times which differ across stocks. Dealing with non-synchronous trading has been an active area of research in financial econometrics in recent years, e.g. Hayashi and Yoshida (2005), Voev and Lunde (2007) and Large (unpublished paper). Stale prices are a key feature of estimating covariances in financial econometrics, as recognised at least since Epps (1979), for they induce cross-autocorrelation amongst asset price returns.

Write the number of observations in the i-th asset made up to time t as the counting process N^{(i)}(t), and the times at which trades are made as t_1^{(i)}, t_2^{(i)}, .... We now define refresh time, which will be key to the construction of multivariate realised kernels. This time scale was used in a cointegration study of price discovery by Harris et al. (1995), and Martens (unpublished paper) has used the same idea in the context of realised covariances.

Definition 1 (Refresh Time). For t ∈ [0, 1], we define the first refresh time as τ_1 = max(t_1^{(1)}, ..., t_1^{(d)}), and then subsequent refresh times as

τ_{j+1} = max( t^{(1)}_{N^{(1)}(τ_j)+1}, ..., t^{(d)}_{N^{(d)}(τ_j)+1} ).
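Definition 1 maps directly onto a short algorithm. A minimal sketch in Python (the function name and list-of-lists interface are our own illustration, not part of the paper):

```python
def refresh_times(trade_times):
    """Refresh Time sampling (Definition 1).

    trade_times: list of d sorted lists; trade_times[i] holds the
    observation times t_1^(i), t_2^(i), ... of asset i within [0, 1].
    Returns the refresh times tau_1, tau_2, ... at which every asset
    has posted at least one fresh price.
    """
    d = len(trade_times)
    idx = [0] * d                     # index of the next unused time per asset
    taus = []
    while all(idx[i] < len(trade_times[i]) for i in range(d)):
        # first time by which all d assets have refreshed their price
        tau = max(trade_times[i][idx[i]] for i in range(d))
        taus.append(tau)
        # skip every observation at or before tau; stale ones are discarded
        for i in range(d):
            while idx[i] < len(trade_times[i]) and trade_times[i][idx[i]] <= tau:
                idx[i] += 1
    return taus
```

With N = len(taus) and n^{(i)} = len(trade_times[i]), the retained-data proportion discussed below is p = d·N / Σ_i n^{(i)}.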

The resulting Refresh Time sample size is N, while we write n^{(i)} = N^{(i)}(1).

Fig. 1. This figure illustrates Refresh Time in a situation with three assets. The dots represent the times {t_j^{(i)}}. The vertical lines represent the sampling times generated from the three assets with refresh time sampling. Note, in this example, N = 7, n^{(1)} = 8, n^{(2)} = 9 and n^{(3)} = 10.

Here τ_1 is the time it has taken for all the assets to trade, i.e. all their posted prices have been updated, and τ_2 is the first time when all the prices are again refreshed. This process is displayed in Fig. 1 for d = 3.

Our analysis will now be based on this time clock {τ_j}. Our approach will be to:
• Assume the entire vector of up-to-date prices is seen at these refreshed times X(τ_j), which is not correct — for we only see a single new price and d − 1 stale prices.¹
• Show these stale pricing errors have no impact on the asymptotic distribution of the realised kernels.

This approach to dealing with non-synchronous data converts the problem into one where the Refresh Time sample size N is determined by the degree of non-synchronicity and n^{(1)}, n^{(2)}, ..., n^{(d)}. The degree to which we keep data is measured by the size of the retained data over the original size of the database. For Refresh Time this is p = dN / Σ_{i=1}^{d} n^{(i)}. For the data in Fig. 1, p = 21/27 ≃ 0.78.

2.2. Jittering end conditions

It turns out that our asymptotic theory dictates that we need to average m prices at the very beginning and end of the day to obtain a consistent estimator.² The theory behind this will be explained in Section 6.4, where experimentation suggests the best choice for m is around two for the kind of data we see in this paper. Now we define what we mean by jittering. Let n, m ∈ ℕ, with n − 1 + 2m = N; then set the vector observations X_0, X_1, ..., X_n as X_j = X(τ_{N,j+m}), j = 1, 2, ..., n − 1, and

X_0 = (1/m) Σ_{j=1}^{m} X(τ_{N,j})   and   X_n = (1/m) Σ_{j=1}^{m} X(τ_{N,N−m+j}).

So X_0 and X_n are constructed by jittering the initial and final time points. By allowing m to be moderately large but very small in comparison with n, these observations record the efficient price without much error, as the error is averaged away. These prices allow us to define the high frequency vector returns x_j = X_j − X_{j−1}, j = 1, 2, ..., n, that the realised kernels are built out of.

2.3. Realised kernel

Having synchronised the high frequency vector returns {x_j}, we can define our class of positive semi-definite multivariate realised kernels (RK). It takes on the following form

K(X) = Σ_{h=−n}^{n} k(h/H) Γ_h,   (1)

where Γ_h = Σ_{j=h+1}^{n} x_j x′_{j−h} for h ≥ 0, and Γ_h = Γ′_{−h} for h < 0. Here Γ_h is the h-th realised autocovariance and k : ℝ → ℝ is a non-stochastic weight function. We focus on the class of functions, K, that is characterised by:

Assumption K. (i) k(0) = 1, k′(0) = 0; (ii) k is twice differentiable with continuous derivatives; (iii) defining k_•^{0,0} = ∫_0^∞ k(x)² dx, k_•^{1,1} = ∫_0^∞ k′(x)² dx and k_•^{2,2} = ∫_0^∞ k″(x)² dx, then k_•^{0,0}, k_•^{1,1}, k_•^{2,2} < ∞; (iv) ∫_{−∞}^{∞} k(x) exp(ixλ) dx ≥ 0 for all λ ∈ ℝ.

The assumption k(0) = 1 means Γ_0 gets unit weight, while k′(0) = 0 means the kernel gives close to unit weight to Γ_h for small values of |h|. Condition (iv) guarantees K(X) to be positive semi-definite (e.g. Bochner's theorem and Andrews (1991)). The multivariate realised kernel has the same form as a standard heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimator familiar in econometrics (e.g. Gallant (1987), Newey and West (1987) and Andrews (1991)). But there are a number of important differences. For example, the sums that define the realised autocovariances are not divided by the sample size, and k′(0) = 0 is critical in our framework. Unlike the situation in the standard HAC literature, an estimator based on the Bartlett kernel will not be consistent for the ex-post variation of prices, measured by quadratic variation, in the present setting. Later we will recommend using the Parzen kernel (its form is given in Table 1) instead.

In some of our results we use the following additional assumption on the Brownian semimartingale.

Assumption SH. Assume μ and σ are bounded, and write σ_+ = sup_{t∈[0,1]} |σ(t)|.

This can be relaxed to locally bounded if (μ, σ) is an Ito process — e.g. Mykland and Zhang (forthcoming).

2.4. Some assumptions about refresh time and noise

Having defined the positive semi-definite realised kernel, we will now write out our assumptions about the refresh times {τ_{N,j}} and the market microstructure effects U that govern the properties of the vector returns {x_j} and so K(X).

2.4.1. Assumptions about the refresh time

We use the subscript N to make the dependence on N explicit. Note that N is random, and we write the durations as τ_{N,i} − τ_{N,i−1} = Δ_{N,i} = D_{N,i}/N for all i. We make the following assumptions about the durations between observation times.

Assumption D. (i) E(D^r_{N,⌊tN⌋} | F_{τ_{N,⌊tN⌋−1}}) →p ϰ_r(t), 0 < r ≤ 2, as N → ∞, where we assume the ϰ_r(t) are strictly positive càdlàg processes adapted to {F_t}; (ii) max_{i∈{j+1,...,j+R}} D_{N,i} = o_p(R^{1/2}) for any j; (iii) τ_{N,0} ≤ 0 and τ_{N,N+1} ≥ 1.

Remark 1. If we have Poisson sampling the durations are exponential and max_{i∈{1,...,N}} Δ_{N,i} = O_p(log(N)/N), so max_{i∈{1,...,N}} D_{N,i} = O_p(log(N)). Note both Barndorff-Nielsen et al. (2008) and Mykland and Zhang (2006) assume that max_{i∈{1,...,N}} D_{N,i} = O_p(1).

¹ Their degree of staleness will be limited by the Refresh Time construction to a single lag in Refresh Time. The extension to a finite number of lags is given in Section 6.5.
² This kind of averaging appears in, for example, Jacod et al. (2009).

Phillips and Yu (unpublished paper) provided a novel analysis of realised volatility under random times of trades. We use their
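Sections 2.2 and 2.3 combine into a short estimator: jitter the two end points, form the refresh-time returns, and weight the realised autocovariances as in Eq. (1). A minimal NumPy sketch (function names and the array interface are ours; the Parzen weight anticipates the recommendation made later in the paper):

```python
import numpy as np

def parzen(x):
    """Parzen weight function (its form is given in Table 1)."""
    x = abs(x)
    if x <= 0.5:
        return 1 - 6 * x ** 2 + 6 * x ** 3
    if x <= 1.0:
        return 2 * (1 - x) ** 3
    return 0.0

def realised_kernel(prices, H, m=2):
    """Multivariate realised kernel K(X) of Eq. (1) with jittered ends.

    prices: (N, d) array of refresh-time prices X(tau_{N,1}),...,X(tau_{N,N}).
    """
    prices = np.asarray(prices, dtype=float)
    N, d = prices.shape
    n = N + 1 - 2 * m                   # from n - 1 + 2m = N
    X = np.empty((n + 1, d))
    X[0] = prices[:m].mean(axis=0)      # jittered opening price X_0
    X[n] = prices[-m:].mean(axis=0)     # jittered closing price X_n
    X[1:n] = prices[m:m + n - 1]
    x = np.diff(X, axis=0)              # high frequency returns x_1, ..., x_n
    K = np.zeros((d, d))
    for h in range(-n, n + 1):
        a = abs(h)
        gamma = np.zeros((d, d))        # realised autocovariance Gamma_h
        for j in range(a, n):
            gamma += np.outer(x[j], x[j - a])
        if h < 0:
            gamma = gamma.T             # Gamma_h = Gamma_{-h}' for h < 0
        K += parzen(h / H) * gamma
    return K
```

Because the Parzen weight satisfies Assumption K(iv), the returned matrix equals Σ_{j,l} k((j−l)/H) x_j x′_l and is positive semi-definite by Bochner's theorem, up to floating-point error.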

Assumption D here, applied to the realised kernel. Deriving results for realised volatility under random times of trades is an active research area.³

Example 1 (Refresh Time). If each individual series has trade times which arrive as independent Poisson processes with the same intensity λN, then their scaled durations are D^{(j)} ~ i.i.d. exp(λ), j = 1, 2, ..., d, so the refresh time durations are D_{N,i} =^L max{D^{(1)}, ..., D^{(d)}}, and so (e.g. Embrechts et al. (1997, p. 189)) the refresh times have the form of a renewal process τ_{N,i} − τ_{N,i−1} = N^{−1} D_{N,i}, with D_{N,i} =^L Σ_{j=1}^{d} j^{−1} D^{(j)}. In particular ϰ_1(t) = λ^{−1} Σ_{j=1}^{d} j^{−1} and ϰ_2(t) = λ^{−2} {Σ_{j=1}^{d} j^{−2} + (Σ_{j=1}^{d} j^{−1})²}. Of interest is how these terms change as d increases. The former is the harmonic series and divergent at the slow rate log(d). The conditional variance converges to π²/(6λ²) as d → ∞, so lim_{d→∞} ϰ_2(t)/ϰ_1(t)² = 1. For d = 1, ϰ_1(t) = λ^{−1} and ϰ_2(t) = 2λ^{−2}, so ϰ_2(t)/ϰ_1(t) = 2λ^{−1}.

2.4.2. Assumptions about the noise

The assumptions about the noise are stated in observation time — that is, we only model the noise at exactly the times where there are trades or quote updates. This follows, for example, Zhou (1998), Bandi and Russell (unpublished paper), Zhang et al. (2005), Barndorff-Nielsen et al. (2008) and Hansen and Lunde (2006). We define the noise associated with X(τ_{N,j}) at the observation time τ_{N,j} as U_{N,j} = X(τ_{N,j}) − Y(τ_{N,j}).

Assumption U. Assume the component model

U_{N,i} = v_{N,i} + ζ_{N,i},   where v_{N,i} = Σ_{h=0}^{∞} ψ_h(τ_{N,i−1−h}) ε_{N,i−h},   with ε_{N,i} = Δ^{−1/2}_{N,i} [W̃(τ_{N,i}) − W̃(τ_{N,i−1})].

Here W̃ is a standard Brownian motion and {ζ_{N,i}} is a sequence of independent random variables, with E(ζ_{N,i} | F_{τ_{N,i−1}}) = 0 and var(ζ_{N,i} | F_{τ_{N,i−1}}) = Σ_ζ(τ_{N,i−1}). Further, ε_{N,i} and ζ_{N,i} are assumed to be independent, while (ψ_h, Σ_ζ) are bounded and adapted to {F_t}, with Σ_{j=0}^{∞} |ψ_j(t)| < ∞ a.s. uniformly in t. We also assume that D ⊥⊥ ζ.

Remark 2. The auxiliary Brownian motion W̃ facilitates a general form of endogenous noise through correlation between W̃ and the Brownian motion, W, that drives the underlying process, Y. In fact, the case W̃ = W is permitted under our assumptions.

Remark 3. The standard assumption in this literature is that ψ_h(t) is zero for all t and h, but this assumption is known to be shallow empirically. A ψ_0(t) type term appears in Hansen and Lunde (2006, Example 1) and Kalnina and Linton (2008) in their analysis of endogeneity and a two scale estimator.

The ''local'' long run variance of v is given by Σ_ν(t) = Σ_{h=−∞}^{∞} γ_h(t), where γ_h(t) = Σ_{j=0}^{∞} ψ_{j+h}(t) ψ_j(t)′ for h ≥ 0 and γ_h(t) = γ_{−h}(t)′ for h < 0, so that the local long run variance of U is given by

Σ_U(t) = Σ_ν(t) + Σ_ζ(t).

It is convenient to define the average long run variance of U by

Ω = ∫_0^1 Σ_U(u) du,

which is a d × d matrix. When d = 1 we write ω² in place of Ω, σ_U²(t) in place of Σ_U(t), etc. ω² appears frequently later. It reflects the variance of the average noise a frequent trader would be exposed to.

3. Asymptotic results

3.1. Consistency

We note that the multivariate realised kernel can be written as

K(X) = K(Y) + K(Y, U) + K(U, Y) + K(U),   where K(Y, U) = Σ_{h=−n+1}^{n−1} k(h/H) Σ_j y_j u′_{j−h},

with y_j and u_j defined analogously to the definition of x_j. This implies immediately that

Theorem 1. Let K hold and suppose that K(Y) = O_p(1). Then K(X) − K(Y) = K(U) + O_p(√K(U)).

Theorem 1 is a very powerful result for dealing with endogenous noise. Note that whatever the relationship between Y and U, if K(U) →p 0 then K(X) − K(Y) = o_p(1), so if also K(Y) →p [Y] then K(X) →p [Y]. Hansen and Lunde (2006) have shown that endogenous noise is empirically important, particularly for mid-quote data. The above theorem means endogeneity does not matter for consistency. What matters is that the realised kernel applied to the noise process vanishes in probability.

Because realised kernels are built out of these n high frequency returns, it is natural to state asymptotic results in terms of n (rather than N).

Lemma 1. Let K, SH, D, and U hold. Then as H, n, m → ∞ with m/n → 0, H = c_0 n^η, c_0 > 0, and η ∈ (0, 1),

(H²/n) K(X) →p |k″(0)| Ω,   if η < 1/2,   (2)

K(X) = ∫_0^1 Σ(u) du + c_0^{−2} |k″(0)| Ω + O_p(1),   if η = 1/2,   (3)

K(X) →p ∫_0^1 Σ(u) du,   if η > 1/2.   (4)

This shows the crucial role of H. H needs to increase with n quite quickly to remove the influence of the noise on the estimator. For very slow rates of increase in H the realised kernel will actually estimate a scaled version of the integrated long-run variance of the noise. To estimate the integrated variance our preferred approach is to set H = c_0 n^{3/5}, because n^{3/5} is the optimal rate in the trade-off between (squared) bias and variance. In the next section we give a rule for selecting the best possible c_0. This approach delivers a positive semi-definite consistent estimator which is robust to endogeneity and semi-staleness of prices.

The results for H ∝ n^{1/2} achieve the best available rate of convergence (see below), but the estimator is inconsistent. The O_p(1) remainder in (3) can, in some cases, be shown to be o_p(1), in which case the bias, c_0^{−2}|k″(0)|Ω, can be removed by estimating Ω using (2) with η = 1/5, for example. However, the resulting estimator is not necessarily positive semi-definite and we do not recommend this in applications.

To study these results in more detail we will develop a central limit theory for the realised kernels, which allows us to select a sensible H. Before introducing the multivariate results, it is helpful to consider the univariate case.

3.2. Univariate asymptotic distribution

3.2.1. Core points

The univariate version of the main results in our paper is the following.

Theorem 2. Let K, SH, D, and U hold. If n → ∞, H = c_0 n^{3/5} and m^{−1} = o(√(H/n)) = o(n^{−1/5}), we have that

n^{1/5} ( K(X) − ∫_0^1 σ²(u) du ) →^{Ls} MN( c_0^{−2}|k″(0)|ω², 4 c_0 k_•^{0,0} ∫_0^1 σ⁴(u) (ϰ_2(u)/ϰ_1(u)) du ).

The notation →^{Ls} MN means stable convergence to a mixed Gaussian distribution. This notion is important for the construction of confidence intervals and the use of the delta method. The reason is that ∫_0^1 σ⁴(u)(ϰ_2(u)/ϰ_1(u)) du is random, and stable convergence guarantees the joint convergence that is needed here. Stable convergence is discussed, for example, in Mykland and Zhang (2006), who also provide extensive references.

The minimum mean square error of the H = c_0 n^{3/5} estimator is achieved by setting c_0 = c* ξ^{4/5}, so H = c* ξ^{4/5} n^{3/5}, where

c* = { k″(0)² / k_•^{0,0} }^{1/5},   ξ² = ω² / √IQ,   IQ = ∫_0^1 σ⁴(u) (ϰ_2(u)/ϰ_1(u)) du.

Notice that the serial dependence in the noise will impact the choice of c_0, with, ceteris paribus, increasing dependence leading to larger values of H. Then

c_0^{−2} |k″(0)| ω² = κ   and   c_0 k_•^{0,0} IQ = κ²,   where κ = κ_0 {IQ ω}^{2/5},   κ_0 = ( |k″(0)| (k_•^{0,0})² )^{1/5}.

Then

n^{1/5} ( K(X) − ∫_0^1 σ²(u) du ) →^{Ls} MN(κ, 4κ²).   (5)

This shows both the bias and variance of the realised kernel will increase with the value of the long-run variance of the noise. Interestingly, time-variation in the noise does not, in itself, change the precision of the realised kernel — all that matters is the average level of the long-run variance of the noise. For the Parzen kernel we have κ_0 = 0.97.

3.2.2. Some additional comments

The condition on m is caused by end effects, as these induce a bias in K(U) that is of the order 2m^{−1}ω². Empirically ω² is tiny, so 2m^{−1}ω² will be small even with m = 1, but theoretically this is an important observation.

Assumption D(i) implies Var(D_{N,⌊tN⌋} | F_{τ_{⌊tN⌋−1}}) →p ϰ_2(t) − ϰ_1²(t), which is non-negative. Thus we have the inequality ϰ_2(t)/ϰ_1(t) ≥ ϰ_1(t), which means that ∫_0^1 σ⁴(u)(ϰ_2(u)/ϰ_1(u)) du ≥ ∫_0^1 σ⁴(u) ϰ_1(u) du. So the asymptotic variance above is higher than for a process with time-varying but non-stochastic durations. The random nature of the durations inflates the asymptotic variance.

The result looks weak compared to the corresponding result for the flat-top kernel K^F(X) introduced by Barndorff-Nielsen et al. (2008) with k′(0) = 0. They had the nicer result that⁴

n^{1/4} ( K^F(X) − ∫_0^1 σ²(u) du ) →^{Ls} MN( 0, 4 c_0 k_•^{0,0} IQ + (8/c_0) k_•^{1,1} ω² ∫_0^1 σ²(u) du + (4/c_0³) k_•^{2,2} ω⁴ ),

when H = c_0 n^{1/2}, under the (far more restrictive) assumption that U is white noise. Hence, the implication is that the kernel estimators proposed in this paper will be (asymptotically) inferior to K^F(X) in the special case where U is white noise. The advantage of our estimator, which has H = c_0 n^{3/5}, is that it is based on far more realistic assumptions about the noise. This has the practical implication that K(X) can be applied to prices that are sampled at the highest possible frequency. This point is forcefully illustrated in Section 6.1.2, where we compare the two estimators, K(X) and K^F(X), and show the importance of being robust to endogeneity and serial dependence. A simulation design shows that K(X) is far more accurate than K^F(X) when the noise is serially dependent. Moreover, an extra benefit of constructing our estimator from k ∈ K is that it ensures positive semi-definiteness. Naturally, one can always truncate an estimator to be psd, for instance by replacing negative eigenvalues with zeros. Still, we find it convenient that the estimator is guaranteed to be psd, because it makes a check for positive definiteness, and correction for lack thereof, entirely redundant.

Having an asymptotic bias term in the asymptotic distribution is familiar from kernel density estimation with the optimal bandwidth. The bias is modest so long as H increases at a faster rate than √n. If k″(0) = 0 we could take H ∝ n^{1/2}, which would result in a faster rate of convergence. However, no weight function with k″(0) = 0 can guarantee a positive semi-definite estimate, see Andrews (1991, p. 832, comment 5).

The following result rules out an important class of estimators which seems to be attractive to empirical researchers.

Lemma 2. Given U and a kernel function with k′(0) ≠ 0 which otherwise satisfies K, then, as n, H, m → ∞, we have that

(H/n) K(U) →p 2|k′(0)| ∫_0^1 {Σ_ζ(u) + γ_0(u)} du.

Remark 4. If k′(0) ≠ 0 then there does not exist a consistent K(X). This rules out, for example, the well-known Bartlett type estimator in this context.

3.2.3. Choosing the bandwidth H and weight function

The relative efficiencies of different realised kernels in this class are determined solely by the constant {|k″(0)|(k_•^{0,0})²}^{1/5} and so can be universally determined for all Brownian semimartingales and noise processes. This constant is computed for a variety of kernel weight functions in Table 1. This shows that the Quadratic Spectral (QS), Parzen and Fejér weight functions are attractive in this context. The optimal weight function minimises {|k″(0)|(k_•^{0,0})²}^{1/5}, which is also the situation for HAC estimators, see Andrews (1991). Thus, using Andrews' analysis of HAC estimators, it follows from our results that the QS kernel is the optimal weight function within the class of weight functions that are guaranteed to produce a non-negative realised kernel estimate. A drawback of the QS and Fejér weight functions is that they, in principle, require all n realised autocovariances to be computed, whereas the number of realised autocovariances needed for the Parzen kernel is only H — hence we advocate the use of the Parzen weight function. We will discuss estimating ξ in Section 3.4.

³ The earliest research on this that we know of is Jacod (1994). Mykland and Zhang (2006) and Barndorff-Nielsen et al. (2008) provided an analysis based on the assumption that D_{N,i} = O_p(1), which is perhaps too strong an assumption here (see Remark 1). Barndorff-Nielsen and Shephard (2005) allowed very general spacing, but assumed times and prices were independent. More recent important contributions include Hayashi et al. (unpublished paper), Li et al. (2009), Mykland and Zhang (forthcoming) and Jacod (unpublished paper), who provide insightful analyses.
⁴ See also Zhang (2006), who independently obtained an n^{1/4} consistent estimator using a multiscale approach.

Table 1
Properties of some realised kernels.

Kernel function, k(x)                                            |k″(0)|   k_•^{0,0}   c*     {|k″(0)|(k_•^{0,0})²}^{1/5}
Parzen:             k(x) = 1 − 6x² + 6x³,  0 ≤ x ≤ 1/2;
                    k(x) = 2(1 − x)³,      1/2 ≤ x ≤ 1;
                    k(x) = 0,              x > 1                  12        0.269       3.51   0.97
Quadratic spectral: k(x) = (3/x²)(sin x / x − cos x), x ≥ 0       1/5       3π/5        0.46   0.93
Fejér:              k(x) = (sin x / x)², x ≥ 0                    2/3       π/3         0.84   0.94
Tukey–Hanning_∞:    k(x) = sin²{(π/2) e^{−x}}, x ≥ 0              π²/2      0.52        2.16   1.06
BNHLS (2008):       k(x) = (1 + x) e^{−x}, x ≥ 0                  1         5/4         0.96   1.09

{|k″(0)|(k_•^{0,0})²}^{1/5} measures the relative asymptotic efficiency of k ∈ K.
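The entries of Table 1 can be reproduced by one-dimensional quadrature. A sketch for the Parzen row (the trapezoid grid is our own choice; |k″(0)| = 12 is supplied analytically):

```python
def parzen(x):
    """Parzen weight function from Table 1."""
    x = abs(x)
    if x <= 0.5:
        return 1 - 6 * x ** 2 + 6 * x ** 3
    if x <= 1.0:
        return 2 * (1 - x) ** 3
    return 0.0

def kernel_constants(k, kpp0, upper=1.0, steps=100000):
    """k00 = int_0^upper k(x)^2 dx by the trapezoid rule, then the
    bandwidth constant c* = {|k''(0)|^2 / k00}^{1/5} and the relative
    efficiency constant {|k''(0)| k00^2}^{1/5} of Table 1."""
    h = upper / steps
    k00 = 0.5 * (k(0.0) ** 2 + k(upper) ** 2) * h
    k00 += sum(k(i * h) ** 2 for i in range(1, steps)) * h
    return k00, (kpp0 ** 2 / k00) ** 0.2, (kpp0 * k00 ** 2) ** 0.2
```

kernel_constants(parzen, 12.0) recovers the tabulated Parzen values 0.269, 3.51 and 0.97 up to rounding (the exact k_•^{0,0} is 151/560 ≈ 0.2696); the QS and Fejér rows need a much larger upper limit since those weights decay slowly.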

3.3. Multivariate asymptotic distribution

To start, we extend the definition of the integrated quarticity to the multivariate context:

IQ = ∫_0^1 {Σ(u) ⊗ Σ(u)} (ϰ_2(u)/ϰ_1(u)) du,

which is a d² × d² random matrix.

Theorem 3. Suppose H = c_0 n^{3/5}, m^{−1} = o(n^{−1/5}), and K, SH, D, and U hold; then

n^{1/5} ( K(X) − ∫_0^1 Σ(u) du ) →^{Ls} MN{ c_0^{−2}|k″(0)| Ω, 4 c_0 k_•^{0,0} IQ }.

This is the multivariate extension of Theorem 2, yielding a limit theorem for the consistent multivariate estimator in the presence of noise. The bias is determined by the long-run variance Ω, whereas the variance depends solely on the integrated quarticity.

Corollary 1. An implication of Theorem 3 is that for a, b ∈ ℝ^d we have

n^{1/5} a′( K(X) − ∫_0^1 Σ(u) du ) b →^{Ls} MN{ c_0^{−2}|k″(0)| a′Ωb, 4 c_0 k_•^{0,0} v′_{ab} IQ v_{ab} },

where v_{ab} = vec{(ab′ + ba′)/2}. For two different elements, a′K(X)b and c′K(X)d say, their asymptotic covariance is given by 4 c_0 k_•^{0,0} v′_{ab} IQ v_{cd}. So once a consistent estimator for IQ is obtained, Corollary 1 makes it straightforward to compute a confidence interval for any element of the integrated variance matrix.

Example 2. In the bivariate case we can write the results as

n^{1/5} ( K(X^{(i)}) − ∫_0^1 Σ_{ii} du,  K(X^{(i)}, X^{(j)}) − ∫_0^1 Σ_{ij} du,  K(X^{(j)}) − ∫_0^1 Σ_{jj} du )′ →^{Ls} MN(A, B),   (6)

where

A = c_0^{−2}|k″(0)| ( Ω_{ii}, Ω_{ij}, Ω_{jj} )′

and

B = 2 c_0 k_•^{0,0} ∫_0^1 ( 2Σ_{ii}²   2Σ_{ii}Σ_{ij}          2Σ_{ij}²
                           •          Σ_{ii}Σ_{jj} + Σ_{ij}²  2Σ_{jj}Σ_{ij}
                           •          •                       2Σ_{jj}² ) (ϰ_2/ϰ_1) du,

which has features in common with the noiseless case discussed in Barndorff-Nielsen and Shephard (2004, Eq. 18). By the delta method we can deduce the asymptotic distribution of the kernel based regression and correlation (extending the work of, for example, Andersen et al. (2003), Barndorff-Nielsen and Shephard (2004) and Dovonon et al. (forthcoming)). For example, with β^{(i,j)} = ∫_0^1 Σ_{ij} du / ∫_0^1 Σ_{jj} du,

n^{1/5} ( K(X^{(i)}, X^{(j)}) / K(X^{(j)}) − β^{(i,j)} ) →^{Ls} MN(A, B),

where

A = c_0^{−2}|k″(0)| (Ω_{ij} − Ω_{jj} β^{(i,j)}) / ∫_0^1 Σ_{jj} du

and

B = {2 c_0 k_•^{0,0} / (∫_0^1 Σ_{jj} du)²} (1, −β^{(i,j)}) [ ∫_0^1 ( Σ_{ii}Σ_{jj} + Σ_{ij}²  2Σ_{jj}Σ_{ij}
                                                                     2Σ_{jj}Σ_{ij}           2Σ_{jj}² ) (ϰ_2/ϰ_1) du ] (1, −β^{(i,j)})′.

3.4. Practical issues: choice of H

A main feature of multivariate kernels is that there is a single bandwidth parameter H which controls the number of leads and lags used for all the series. It must grow with n at rate n^{3/5}; the key question here is how to estimate a good constant of proportionality — which controls the efficiency of the procedure.

If we applied the univariate optimal mean square error bandwidth selection to each asset price individually we would get d bandwidths H^{(i)} = c* ξ_i^{4/5} n^{3/5}, where c* = {k″(0)²/k_•^{0,0}}^{1/5} and ξ_i² = Ω_{ii}/√IQ_{ii}, with Σ_{ii}(u) the spot variance of the i-th asset. In practice we usually approximate √IQ_{ii} by ∫_0^1 Σ_{ii}(u) du and use ξ_i² = Ω_{ii}/∫_0^1 Σ_{ii}(u) du, which can be estimated relatively easily by using a low frequency estimate of ∫_0^1 Σ_{ii}(u) du and one of many sensible estimators of Ω_{ii} which use high frequency data. Then we could construct some ad hoc rules for choosing the global H, such as H_min = min(H^{(1)}, ..., H^{(d)}), H_max = max(H^{(1)}, ..., H^{(d)}), or H̄ = d^{−1} Σ_{i=1}^{d} H^{(i)}, or many others. In our empirical work we have used H̄, while our web Appendix provides an analysis of the impact of this choice.

An interesting alternative is to optimise the problem for a portfolio, e.g. letting ι be a d-dimensional vector of ones, then d^{−2} ι′K(X)ι = K(d^{−1}ι′X), which is like a ''market portfolio'' if X contains many assets. This is easy to carry out, for having converted everything into Refresh Time one computes the market return (ι′X/ι′ι) and then carries out a univariate analysis on it, choosing an optimal H for the market. This single H is then applied to the multivariate problem.

From the results in Example 2 it is straightforward to derive the optimal choice for H when the objective is to estimate a covariance, a correlation, the inverse covariance matrix (which is important for portfolio choice) or β^{(i,j)}. For β^{(1,2)} the trade-off is between

c_0^{−4}|k″(0)|² (Ω_{12} − Ω_{22} β^{(1,2)})²,   and   2 c_0 k_•^{0,0} ∫_0^1 ( Σ_{11}Σ_{22} + Σ_{12}² − 4β^{(1,2)} Σ_{12}Σ_{22} + 2β^{(1,2)2} Σ_{22}² ) (ϰ_2/ϰ_1) du.

4. Simulation study

So far the analysis has been asymptotic, as n → ∞. Here we carry out a simulation analysis to assess the accuracy of the asymptotic predictions in finite samples. We simulate over the interval t ∈ [0, 1].⁵ The following multivariate factor stochastic volatility model is used:

dY^{(i)} = μ^{(i)} dt + dV^{(i)} + dF^{(i)},   dV^{(i)} = ρ^{(i)} σ^{(i)} dB^{(i)},   dF^{(i)} = √(1 − (ρ^{(i)})²) σ^{(i)} dW,

where the elements of B are independent standard Brownian motions and W ⊥⊥ B. Here F is the common factor, whose strength is determined by √(1 − (ρ^{(i)})²). This model means that each Y^{(i)} is a diffusive SV model with constant drift μ^{(i)} and random spot volatility σ^{(i)}. In turn the spot volatility obeys the independent processes σ^{(i)} = exp(β_0^{(i)} + β_1^{(i)} ϱ^{(i)}) with dϱ^{(i)} = α^{(i)} ϱ^{(i)} dt + dB^{(i)}. Thus there is perfect statistical leverage (correlation between their innovations) between V^{(i)} and σ^{(i)}, while the leverage between Y^{(i)} and ϱ^{(i)} is ρ^{(i)}. The correlation between Y^{(1)}(t) and Y^{(2)}(t) is √(1 − (ρ^{(1)})²) √(1 − (ρ^{(2)})²).

The price process is simulated via an Euler scheme, using the fact that the OU process has an exact discretisation (e.g. Glasserman (2004, p. 110)). Our simulations are based on the following configuration: (μ^{(i)}, β_0^{(i)}, β_1^{(i)}, α^{(i)}, ρ^{(i)}) = (0.03, −5/16, 1/8, −1/40, −0.3), so that β_0^{(i)} = (β_1^{(i)})²/(2α^{(i)}). Throughout we have imposed that E(∫_0^1 σ^{(i)2}(u) du) = 1. The stationary distribution of ϱ^{(i)} is utilised in our simulations to restart the process each day at ϱ^{(i)}(0) ~ N(0, (−2α^{(i)})^{−1}). For our design the variance of σ² is exp(−2(β_1^{(i)})²/α^{(i)}) − 1 ≃ 2.5. This is comparable to the empirical results found in e.g. Hansen and Lunde (2005), which motivates our choice for α^{(i)}.

We add noise simulated as

U_j | σ, Y ~ i.i.d. N(0, ω²),   with ω² = ξ² √( N^{−1} Σ_{j=1}^{N} σ^{(i)4}(j/N) ),

where the noise-to-signal ratio ξ² takes the values 0, 0.001 and 0.01. This means that the variance of the noise increases with the volatility of the efficient price (e.g. Bandi and Russell (2006)).

To model the non-synchronously spaced data we use two independent Poisson process sampling schemes to generate the times of the actual observations {t_j^{(i)}} to which we apply our realised kernel. We control the two Poisson processes by λ = (λ_1, λ_2), such that for example λ = (5, 10) means that on average X^{(1)} and X^{(2)} are observed every 5 and 10 s, respectively. This means that the simulated number of observations will differ between repetitions, but on average the processes will have 23,400/λ_1 and 23,400/λ_2 observations, respectively. We vary λ through the following configurations: (3, 6), (5, 10), (10, 20), (15, 30), (30, 60) and (60, 120), motivated by the kind of data we see in databases of equity prices.

In order to calculate K(X) we need to select H. To do this we evaluate ω̂_δ^{(i)2} = [X_δ^{(i)}](1)/(2n) and [X^{(i)}_{1/900}](1), the realised variance estimator based on 15 min returns. These give us the feasible values Ĥ_i = c* n^{3/5} ( ω̂_δ^{(i)2} / [X^{(i)}_{1/900}](1) )^{2/5}. The results for H_mean are presented in Table 2.

Panel A of the table reports the univariate results for estimating the integrated variance. We give the bias and root mean square error (MSE) for the realised kernel and compare it to the standard realised variance. In the no-noise case of ξ² = 0 the RV statistic is quite a bit more precise, especially when n is large. The positive bias of the realised kernel can be seen when ξ² is quite large, but it is small compared to the estimator's variance. In that situation the realised kernel is far more precise than the realised variance. None of these results is surprising or novel.

In Panel B we break new ground, as it focuses on estimating the integrated covariance. We compare the realised kernel estimator with a realised covariance. The high frequency realised covariance is a very precise estimator of the wrong quantity, as its bias is very close to its very large root mean square error. In this case its bias does not really change very much as n increases. The realised kernel delivers a very precise estimator of the integrated covariance. It is downward biased due to the non-synchronous data, but the bias is very modest when n is large and its sampling variance dominates the root MSE. Taken together, this implies the realised kernel estimators of the correlation and regression (beta) are strongly negatively biased — which is due to each being a non-linear function of the noisy estimates of the integrated variance. The bias is the dominant component of the root MSE in the correlation case.

5. Empirical illustration

We analyse high-frequency asset prices for thirty assets.⁶ In the analysis the main focus will be on the empirical properties of 30 × 30 realised kernel estimates. To conserve space we will only present detailed results for a 10 × 10 submatrix of the full 30 × 30 matrix. The ten assets we will focus on are Alcoa Inc. (AA), American International Group Inc. (AIG), American Express Co. (AXP), Boeing Co. (BA), Bank of America Corp. (BAC), Citigroup Inc. (C), Caterpillar Inc. (CAT), Chevron Corp. (CVX), E.I. DuPont de Nemours & Co. (DD), and Standard & Poor's Depository Receipt (SPY). The SPY is an exchange-traded fund that holds all of the S&P 500 Index stocks and has enormous liquidity. The sample period runs from January 3, 2002 to July 31, 2008, delivering 1503 distinct days. The data is the collection of trades and quotes recorded on the NYSE, taken from the TAQ database through the Wharton Research Data Services (WRDS) system. We present empirical results for both transaction and mid-quote prices.

Throughout our analysis we will estimate quantities each day, in the tradition of the realised volatility literature following Andersen et al. (2001) and Barndorff-Nielsen and Shephard (2002). This means the target becomes functions of [Y]_s = [Y](s) − [Y](s − 1), s ∈ ℕ. The functions we will deal with are covariances, correlations and betas.

5.1. Procedure for cleaning the high-frequency data

Careful data cleaning is one of the most important aspects of volatility estimation from high-frequency data. Numerous

⁵ We normalise one second to be 1/23,400, so that the interval [0, 1] contains 6.5 h. In generating the observed price, we discretise [0, 1] into a number N = 23,400 of intervals. The price process is simulated via an Euler scheme, using the fact that the OU process has an exact discretisation (e.g. Glasserman (2004, p. 110)).
⁶ The ticker symbols of these assets are AA, AIG, AXP, BA, BAC, C, CAT, CVX, DD, DIS, GE, GM, HD, IBM, INTC, JNJ, JPM, KO, MCD, MMM, MRK, MSFT, PFE, PG, SPY, T, UTX, VZ, WMT, and XOM.

156 O.E. Barndorff-Nielsen et al. / Journal of Econometrics 162 (2011) 149–169

Table 2
Simulation results.

Panel A: Integrated variance

                      Series A                            Series B
               RV1m     RV15m    K(X)              RV1m     RV15m    K(X)
               R.mse    R.mse    Bias     R.mse    R.mse    R.mse    Bias     R.mse
ξ² = 0.0
λ = (3, 6)     0.113    0.505    0.006    0.147    0.122    0.436    0.003    0.134
λ = (10, 20)   0.111    0.547    0.011    0.262    0.114    0.450    0.011    0.224
λ = (60, 120)  0.229    0.504    0.003    0.557    0.227    0.517    0.001    0.490
ξ² = 0.001
λ = (3, 6)     1.509    0.654    0.040    0.253    1.417    0.488    0.033    0.215
λ = (10, 20)   1.432    0.660    0.041    0.359    1.318    0.492    0.035    0.295
λ = (60, 120)  1.013    0.559    0.014    0.557    0.636    0.554    0.013    0.551
ξ² = 0.01
λ = (3, 6)     14.39    1.531    0.096    0.410    13.67    1.168    0.084    0.351
λ = (10, 20)   14.01    1.452    0.106    0.568    13.15    1.305    0.081    0.424
λ = (60, 120)  8.893    1.222    0.077    0.611    5.386    1.322    0.080    0.776

Panel B: Integrated covariance/correlation

                        Cov1m            Cov15m           K(X) Covar       K(X) Corr        K(X) beta
               #rets    Bias    R.mse    Bias    R.mse    Bias    R.mse    Bias    R.mse    Bias    R.mse
ξ² = 0.0
λ = (3, 6)     3121    −0.051   0.076   −0.004   0.183   −0.007   0.062   −0.012   0.016   −0.016   0.061
λ = (5, 10)    1921    −0.085   0.108   −0.006   0.183   −0.009   0.076   −0.015   0.020   −0.019   0.064
λ = (10, 20)   982     −0.160   0.186   −0.011   0.186   −0.009   0.097   −0.018   0.026   −0.023   0.084
λ = (30, 60)   332     −0.342   0.395   −0.038   0.188   −0.021   0.142   −0.028   0.042   −0.035   0.125
λ = (60, 120)  166     −0.445   0.510   −0.071   0.203   −0.034   0.189   −0.036   0.054   −0.035   0.178
ξ² = 0.001
λ = (3, 6)     3121    −0.046   0.091   −0.005   0.191   −0.000   0.090   −0.027   0.032   −0.034   0.085
λ = (5, 10)    1921    −0.082   0.123   −0.006   0.186   −0.002   0.099   −0.029   0.036   −0.033   0.083
λ = (10, 20)   982     −0.156   0.189   −0.010   0.195   −0.004   0.118   −0.032   0.040   −0.042   0.111
λ = (30, 60)   332     −0.344   0.400   −0.039   0.187   −0.019   0.150   −0.039   0.052   −0.049   0.153
λ = (60, 120)  166     −0.445   0.513   −0.074   0.206   −0.034   0.195   −0.044   0.060   −0.049   0.204
ξ² = 0.01
λ = (3, 6)     3121    −0.027   0.398   −0.009   0.263    0.000   0.123   −0.063   0.071   −0.072   0.132
λ = (5, 10)    1921    −0.073   0.431   −0.005   0.257   −0.002   0.133   −0.067   0.076   −0.082   0.149
λ = (10, 20)   982     −0.139   0.407   −0.001   0.263   −0.005   0.153   −0.074   0.084   −0.099   0.198
λ = (30, 60)   332     −0.354   0.486   −0.044   0.236   −0.017   0.180   −0.089   0.104   −0.119   0.242
λ = (60, 120)  166     −0.451   0.561   −0.083   0.265   −0.032   0.222   −0.092   0.111   −0.120   0.310

Simulation results for the realised kernel using a factor SV model with non-synchronous observations and measurement noise. Panel A looks at estimating integrated variance using realised variance and the Parzen type realised kernel K(X). Panel B looks at estimating integrated covariance and correlation using realised covariance and the realised kernel. Bias and root mean square error are reported. The results are based on 1000 repetitions.
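For concreteness, the simulation design above can be sketched as follows. This is our own illustration, not the authors' code; the function name and defaults are ours. It uses the exact discretisation for the OU volatility factor, an Euler step for the price, and sets the noise variance to ω² = ξ²√(N⁻¹Σσ⁴), as in the design (we assume the square root belongs in that formula, as in the related univariate design).

```python
import numpy as np

def simulate_factor_sv(n_steps=23_400, mu=0.03, beta0=-5/16, beta1=1/8,
                       alpha=-1/40, rho=-0.3, xi2=0.001, seed=0):
    """One day of one asset on [0, 1]: Euler step for the price, exact
    discretisation for the OU volatility factor, then added noise."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    # Exact OU step: varrho' = e^{alpha dt} varrho + sqrt((e^{2 alpha dt}-1)/(2 alpha)) eps
    phi = np.exp(alpha * dt)
    sd_ou = np.sqrt((np.exp(2 * alpha * dt) - 1) / (2 * alpha))
    eps = rng.standard_normal(n_steps)
    varrho = np.empty(n_steps + 1)
    varrho[0] = rng.normal(0.0, np.sqrt(-1.0 / (2 * alpha)))  # stationary start
    for j in range(n_steps):
        varrho[j + 1] = phi * varrho[j] + sd_ou * eps[j]
    sigma = np.exp(beta0 + beta1 * varrho[:-1])     # spot volatility
    dB = np.sqrt(dt) * eps      # same shocks as the OU factor: leverage (Euler approx.)
    dW = np.sqrt(dt) * rng.standard_normal(n_steps)
    dY = mu * dt + rho * sigma * dB + np.sqrt(1 - rho**2) * sigma * dW
    Y = np.concatenate(([0.0], np.cumsum(dY)))      # efficient log-price
    omega2 = xi2 * np.sqrt(np.mean(sigma**4))       # noise variance
    X = Y + rng.normal(0.0, np.sqrt(omega2), n_steps + 1)   # observed price
    return X, Y, sigma
```

Non-synchronous observation times would then be drawn as Poisson arrivals with mean spacing λ seconds, and the observed path read off at those times.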
problems and solutions are discussed in Falkenberry (2001), Hansen and Lunde (2006), Brownlees and Gallo (2006) and Barndorff-Nielsen et al. (2009). In this paper we follow the step-by-step cleaning procedure used in Barndorff-Nielsen et al. (2009), who discuss in detail the various choices available and their impact on univariate realised kernels. For convenience we briefly review these steps.

All data. (P1) Delete entries with a timestamp outside the 9:30 a.m.–4 p.m. window when the exchange is open. (P2) Delete entries with a bid, ask or transaction price equal to zero. (P3) Retain entries originating from a single exchange (NYSE, except INTC and MSFT from NASDAQ and SPY, for which all retained observations are from Pacific). Delete other entries.

Quote data only. (Q1) When multiple quotes have the same timestamp, we replace all these with a single entry with the median bid and median ask price. (Q2) Delete rows for which the spread is negative. (Q3) Delete rows for which the spread is more than 10 times the median spread on that day. (Q4) Delete rows for which the mid-quote deviated by more than 10 mean absolute deviations from a centred median (excluding the observation under consideration) of 50 observations.

Trade data only. (T1) Delete entries with corrected trades. (Trades with a Correction Indicator, CORR ≠ 0.) (T2) Delete entries with abnormal Sale Condition. (Trades where COND has a letter code, except for "E" and "F".) (T3) If multiple transactions have the same timestamp: use the median price. (T4) Delete entries with prices that are above the ask plus the bid-ask spread. Similarly for entries with prices below the bid minus the bid-ask spread.

We note steps P2, T1, T2, T4, Q2, Q3 and Q4 collectively reduce the sample size by less than 1%.

5.2. Sampling schemes

We applied three different sampling schemes depending on the particular estimator. The simplest one is the estimator by Hayashi and Yoshida (2005) that uses all the available observations for a particular asset combination. Following Andersen et al. (2003) the realised covariation estimator is based on calendar time sampling. Specifically, we consider 15 s, 5 min, and 30 min intraday returns, aligned using the previous-tick approach. This results in 1560, 78 and 13 daily observations, respectively. For the realised kernel the Refresh Time sampling scheme discussed in Section 2.1 is used. In our analysis we present estimates for the upper left 10 × 10 block of the full 30 × 30 integrated covariance matrix. The estimates are constructed using three different sampling schemes: (a) Refresh Time sampling applied to the full set of DJ stocks, (b) Refresh Time sampling applied to only the 10 stocks that we focus on, and (c) Refresh Time sampling applied to each unique pair of assets. So in our analysis we will present three sets of realised kernel estimates of the elements of the integrated covariance matrix: one set that comes from a 30 × 1 vector of returns, the same set estimated using only the required 10 × 1 vector of returns, and finally a set constructed from the 45 distinct 2 × 2 covariance matrix estimates. Note that the first two estimators are positive semidefinite by construction, while the latter is not guaranteed to be so. We compute these covariance matrix estimates for each day in our sample.
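The two multivariate alignment schemes just described can be made concrete with a small sketch; this is our illustration (the function names are ours, not from the paper).

```python
import numpy as np

def refresh_times(times):
    """Refresh Time: each tau is the first time at which every asset has
    traded at least once since the previous tau; `times` is a list of
    increasing arrays of observation times."""
    idx = [0] * len(times)
    taus = []
    while all(i < len(t) for i, t in zip(idx, times)):
        tau = max(t[i] for i, t in zip(idx, times))
        taus.append(tau)
        # move every asset strictly past tau
        idx = [int(np.searchsorted(t, tau, side="right")) for t in times]
    return np.array(taus)

def previous_tick(times, prices, grid):
    """Previous-tick alignment: for each grid point take the last price at
    or before it (grid points before the first observation reuse the first
    price, a simplification)."""
    pos = np.searchsorted(times, grid, side="right") - 1
    return np.asarray(prices)[np.clip(pos, 0, None)]
```

For two assets with n(1) = 4 and n(2) = 3 observations yielding N = 3 refresh times, the retained fraction of Table 3 is p = dN/Σ n(i) = 2·3/7 ≈ 0.86.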

Table 3
Summary statistics for the refresh sampling scheme, 2 × 2 case.

2 × 2 case  AA      AIG     AXP     BA      BAC     C       CAT     CVX     DD      SPY
AA          –       0.589   0.599   0.595   0.586   0.565   0.593   0.582   0.602   0.574
AIG         0.635   –       0.607   0.596   0.616   0.608   0.584   0.605   0.598   0.601
AXP         0.639   0.651   –       0.613   0.617   0.595   0.596   0.594   0.619   0.584
BA          0.634   0.642   0.652   –       0.596   0.577   0.595   0.591   0.609   0.582
BAC         0.636   0.656   0.662   0.649   –       0.631   0.579   0.612   0.597   0.613
C           0.635   0.657   0.660   0.647   0.680   –       0.554   0.610   0.575   0.619
CAT         0.630   0.631   0.641   0.636   0.633   0.630   –       0.575   0.597   0.569
CVX         0.641   0.657   0.659   0.651   0.671   0.675   0.634   –       0.588   0.618
DD          0.639   0.646   0.656   0.649   0.653   0.651   0.639   0.652   –       0.575
SPY         0.609   0.642   0.630   0.622   0.667   0.685   0.602   0.670   0.625   –

Average over daily number of high frequency observations available before the refresh time transformation
Trades      3442    4228    3461    3529    4,544   5,480   3330    4,845   3307    5,412
Quotes      8460    9270    8626    8553    10,091  10,809  8026    10,254  8521    15,973

Summary statistics for the refresh sampling scheme. In the upper panel we present averages over the daily data of the proportion of data maintained by the refresh sampling scheme, measured by p = dN/Σ_{i=1}^d n(i). The upper panel displays this in the 2 × 2 case. The upper diagonal is based on transaction prices, whereas the lower diagonal is based on mid-quotes. In the lower panel we average over the daily number of high frequency observations.

The fraction of the data we retained by constructing Refresh Time is recorded in Table 3 for each of the 45 distinct 2 × 2 matrices. It records the average of the daily p statistics defined in Section 2.1 for each pair. It emerges that we never lose more than half the observations for the most frequently traded assets. For the least active assets we typically lose between 30% and 40% of the observations. For the 10 × 1 case the data loss is more pronounced. Still, on average more than 25% of the observations remain in the sample. For transaction data the average number of Refresh Time observations is 1470, whereas the corresponding number is 4491 for the quote data. So in most cases we have an observation on average more often than every 5 s for quote data and 15 s for trade data. We observed that the data loss levels off as the dimension increases. For the 30 × 1 case we have on average more than 17% of the observations remaining in the sample. For transaction data the average number of Refresh Time observations is 966, and 2978 for the quote data. This gives an observation on average more often than every 8 s for quote data and 24 s for trade data.

5.3. Analysis of the covariance estimators: Cov_s^K, Cov_s^HY, Cov_s^OtoC and Cov_s^ℓ

Throughout this subsection the target which we wish to estimate is [Y(i), Y(j)]_s, i, j = 1, 2, ..., d, s ∈ N. In what follows the pair i, j will only be referred to implicitly. All kernels are computed with Parzen weights.

We compute the realised kernel for the full 30-dimensional vector, the 10-dimensional sub-vector and (all possible) pairs of the ten assets. The resulting estimates of [Y(i), Y(j)]_s are denoted by Cov_s^K30×30, Cov_s^K10×10 and Cov_s^K2×2, respectively. These estimators differ in a number of ways, such as the bandwidth selection and the sampling times (due to the construction of Refresh Time). To provide useful benchmarks for these estimators we also compute: Cov_s^HY, the Hayashi and Yoshida (2005) covariance estimator; Cov_s^ℓ, the realised covariance based on intraday returns that span an interval of length ℓ, e.g. 5 or 30 min (the previous-tick method is used); and Cov_s^OtoC, the outer product of the open-to-close returns, which when averaged over many days provides an estimator of the average covariance between asset returns.

The empirical analysis of our estimators of the covariance starts by recalling the main statistical impact of market microstructure and the Epps effect. Table 4 contains the time series average covariance computed using the Hayashi and Yoshida (2005) estimator Cov_s^HY and the open-to-close estimator Cov_s^OtoC. Quite a few of these types of tables will be presented and they all have the same structure. The numbers above the leading diagonal are results from trade data; the numbers below are from mid-quotes. It is interesting to note that the Cov_s^HY estimates are typically much lower than the corresponding Cov_s^OtoC estimates. Numbers in bold font indicate estimates that are significantly different from Cov_s^OtoC at the one percent level. This assessment is carried out in the following way. For a given estimator, e.g. Cov_s^K2×2, we consider the difference e_s = Cov_s^K2×2 − Cov_s^OtoC, and compute the sample bias as ē and its robust (HAC) variance as s_η² = γ₀ + 2 Σ_{h=1}^q (1 − h/(q+1)) γ_h, where γ_h = (T − h)⁻¹ Σ_{s=h+1}^T η_s η_{s−h}. Here η_s = e_s − ē and q = int{4(T/100)^{2/9}}. The number is boldfaced if |√T ē/s_η| > 2.326. The results in Table 4 indicate that Cov_s^HY is severely downward biased. Every covariance estimator for every pair of assets, for both trades and quotes, is statistically significantly biased.

5.4. Results for Cov_s^K30×30, Cov_s^K10×10 and Cov_s^K2×2

We now move on to more successful estimators. The upper panel of Table 5 presents the time series average estimates for Cov_s^K30×30, the middle panel for Cov_s^K10×10, and the lower panel gives results for Cov_s^K2×2. The diagonal elements are the estimates based on transactions. Off-diagonal numbers are boldfaced if they are significantly biased (compared to Cov_s^OtoC) at the 1% level. These results are quite encouraging for all three estimators. The average levels of the three estimators are roughly the same.

A much tougher comparison is to replace the noisy e_s = Cov_s^K − Cov_s^OtoC with e_s = Cov_s^Kd×d − Cov_s^Kd′×d′, where the two estimates come from applying the realised kernel to price vectors of dimension d and d′. Our tests will then ask if there is a significant difference in the average. The results reported in our web Appendix suggest very little difference in the level of the three realised kernel estimators. When we compute the same test based on e_s = Cov_s^Kd×d − Cov_s^5m we find that the realised kernels and the realised covariances based on 5 min returns are also quite similar.

The result in that analysis is reinforced by the information in the summary Table 6, which shows results averaged over all asset pairs for both trades and quotes. The results are not very different for most estimators as we move from trades to quotes; the counter example is Cov_s^HY, which is sensitive to this. The table shows Cov_s^K30×30, Cov_s^K10×10 and Cov_s^K2×2 have roughly the same average value, which is slightly below Cov_s^OtoC. Cov_s^K2×2 has a seven times smaller variance than Cov_s^OtoC, which shows it is a lot more precise. Of course integrated variance is itself random,

Table 4
Average high frequency realised covariance and open to close covariance.

            AA      AIG     AXP     BA      BAC     C       CAT     CVX     DD      SPY

Average of Hayashi–Yoshida covariances (all times)

AA          4.331   0.415   0.484   0.383   0.372   0.446   0.424   0.360   0.465   0.349
AIG         0.225   2.904   0.612   0.387   0.520   0.597   0.415   0.332   0.415   0.361
AXP         0.251   0.315   3.383   0.447   0.580   0.661   0.476   0.382   0.477   0.430
BA          0.197   0.206   0.230   3.274   0.344   0.400   0.393   0.319   0.389   0.336
BAC         0.198   0.275   0.299   0.177   2.466   0.596   0.374   0.315   0.375   0.345
C           0.237   0.305   0.319   0.199   0.303   4.014   0.431   0.354   0.430   0.382
CAT         0.227   0.224   0.254   0.210   0.197   0.220   2.274   0.332   0.413   0.341
CVX         0.194   0.196   0.219   0.176   0.180   0.190   0.187   2.192   0.340   0.305
DD          0.240   0.225   0.251   0.206   0.196   0.224   0.222   0.186   2.643   0.342
SPY         0.149   0.152   0.179   0.143   0.142   0.147   0.151   0.141   0.142   1.068

Open-to-close covariance

AA          –       1.100   1.228   1.091   1.019   1.267   1.377   1.027   1.219   1.036
AIG         1.092   –       1.776   0.978   1.808   2.076   1.062   0.675   1.076   1.120
AXP         1.220   1.763   –       1.045   1.661   2.055   1.191   0.853   1.101   1.180
BA          1.075   0.976   1.049   –       0.882   1.071   0.990   0.630   0.850   0.820
BAC         1.015   1.804   1.653   0.887   –       2.036   0.967   0.619   0.927   0.988
C           1.265   2.088   2.053   1.077   2.041   –       1.230   0.819   1.134   1.261
CAT         1.365   1.045   1.178   0.981   0.965   1.233   –       0.768   1.005   0.933
CVX         1.026   0.671   0.862   0.627   0.624   0.824   0.771   –       0.658   0.705
DD          1.211   1.070   1.092   0.850   0.923   1.134   0.996   0.656   –       0.829
SPY         1.029   1.120   1.177   0.816   0.985   1.268   0.927   0.708   0.829   –

The upper panel presents average estimates for Cov_s^HY and the lower panel displays these for Cov_s^OtoC. In both panels the upper diagonal is based on transaction prices, whereas the lower diagonal is based on mid-quotes. The diagonal elements are computed with transaction prices. In the upper panel numbers outside the diagonal are boldfaced if the bias is significant at the 1% level.
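The Hayashi–Yoshida estimator used in the upper panel sums products of returns whose observation intervals overlap, so it needs no alignment of the two series. A direct, quadratic-time sketch (ours, for illustration only):

```python
import numpy as np

def hayashi_yoshida(t1, p1, t2, p2):
    """HY covariance of two irregularly observed log-price series:
    sum r1_i * r2_j over all pairs whose observation intervals
    (t[i], t[i+1]] overlap.  t1, t2 increasing times; p1, p2 log-prices."""
    r1, r2 = np.diff(p1), np.diff(p2)
    cov = 0.0
    for i in range(len(r1)):
        for j in range(len(r2)):
            if t1[i] < t2[j + 1] and t2[j] < t1[i + 1]:  # intervals overlap
                cov += r1[i] * r2[j]
    return cov
```

On a common observation grid the overlap condition only holds for i = j, so the estimator reduces to the usual realised covariance.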

Table 5
Average for alternative integrated covariance estimators.

            AA      AIG     AXP     BA      BAC     C       CAT     CVX     DD      SPY

Average of Parzen covariances (30 × 30)

AA          3.338   0.945   1.033   0.808   0.865   1.118   1.055   0.818   1.047   0.862
AIG         0.933   2.725   1.300   0.722   1.254   1.556   0.856   0.533   0.820   0.878
AXP         1.003   1.273   2.580   0.780   1.326   1.656   0.974   0.578   0.900   0.930
BA          0.792   0.730   0.773   2.194   0.654   0.831   0.765   0.488   0.702   0.670
BAC         0.850   1.226   1.305   0.658   2.057   1.681   0.805   0.485   0.768   0.811
C           1.102   1.528   1.627   0.840   1.649   2.942   1.011   0.638   0.986   1.057
CAT         1.028   0.841   0.939   0.760   0.792   1.001   2.225   0.603   0.882   0.776
CVX         0.822   0.534   0.570   0.500   0.489   0.640   0.607   1.655   0.576   0.589
DD          1.049   0.813   0.890   0.706   0.759   0.984   0.874   0.589   1.810   0.736
SPY         0.862   0.876   0.918   0.678   0.808   1.058   0.775   0.602   0.744   0.745

Average of Parzen covariances (10 × 10)

AA          3.400   0.953   1.024   0.808   0.863   1.121   1.040   0.825   1.046   0.868
AIG         0.931   2.766   1.298   0.734   1.248   1.553   0.860   0.544   0.829   0.885
AXP         0.995   1.271   2.593   0.778   1.315   1.642   0.957   0.581   0.900   0.926
BA          0.798   0.739   0.783   2.222   0.656   0.845   0.772   0.501   0.712   0.679
BAC         0.844   1.204   1.290   0.660   2.080   1.669   0.799   0.493   0.765   0.811
C           1.101   1.518   1.616   0.854   1.625   2.985   1.013   0.653   0.992   1.062
CAT         1.019   0.843   0.931   0.767   0.785   1.003   2.234   0.613   0.879   0.779
CVX         0.824   0.544   0.583   0.512   0.501   0.656   0.619   1.682   0.592   0.603
DD          1.045   0.819   0.892   0.719   0.755   0.991   0.876   0.603   1.850   0.745
SPY         0.865   0.876   0.917   0.688   0.806   1.061   0.779   0.616   0.752   0.755

Average of Parzen covariances (2 × 2)

AA          3.531   0.931   0.986   0.812   0.837   1.098   1.004   0.824   1.034   0.867
AIG         0.912   2.803   1.273   0.775   1.180   1.490   0.863   0.583   0.847   0.891
AXP         0.956   1.238   2.707   0.832   1.268   1.599   0.945   0.633   0.919   0.941
BA          0.793   0.773   0.825   2.371   0.679   0.892   0.800   0.560   0.753   0.731
BAC         0.820   1.156   1.237   0.683   2.096   1.565   0.786   0.541   0.763   0.811
C           1.078   1.469   1.568   0.893   1.550   3.108   1.011   0.719   0.995   1.066
CAT         0.975   0.851   0.914   0.794   0.775   1.005   2.299   0.642   0.883   0.798
CVX         0.810   0.595   0.642   0.577   0.553   0.723   0.648   1.738   0.637   0.662
DD          1.018   0.842   0.904   0.759   0.760   0.999   0.873   0.644   1.961   0.774
SPY         0.853   0.875   0.910   0.726   0.795   1.054   0.791   0.664   0.768   0.783

The upper panel presents average estimates for Cov_s^K30×30, the middle panel for Cov_s^K10×10, and the lower panel gives results for Cov_s^K2×2. In all panels the upper diagonal is based on transaction prices, whereas the lower diagonal is based on mid-quotes. The diagonal elements are computed with transaction prices. Outside the diagonals numbers are boldfaced if the bias is significant at the 1% level.
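All kernel estimates in Tables 3–5 use the Parzen weight function applied to refresh-time returns. A minimal multivariate sketch (ours; the paper's data-driven bandwidth choice H is taken as given here):

```python
import numpy as np

def parzen(u):
    """Parzen weight function."""
    u = abs(u)
    if u <= 0.5:
        return 1 - 6 * u**2 + 6 * u**3
    if u <= 1.0:
        return 2 * (1 - u)**3
    return 0.0

def realised_kernel(x, H):
    """K(X) = sum_{h=-n}^{n} k(h/H) Gamma_h for an (n, d) array of
    high-frequency returns x, with unscaled autocovariance matrices
    Gamma_h = sum_j x_j x_{j-|h|}' (transposed for h < 0)."""
    n, d = x.shape
    K = np.zeros((d, d))
    for h in range(-n + 1, n):
        w = parzen(h / H)
        if w == 0.0:
            continue
        a = abs(h)
        g = x[a:].T @ x[:n - a]
        K += w * (g if h >= 0 else g.T)
    return K
```

Because the Parzen function is itself a positive semi-definite kernel, the implied weight matrix is PSD and so is the resulting estimate; this is the positivity property discussed further in Section 6.1.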

Table 6
Summary statistics across all asset pairs.

Transaction prices

Estimator    Average  HAC      Stdev   Bias     cor(., K)  acf1  acf2  acf3  acf4  acf5  acf10

Summary stats for covariances
CovK30×30    0.8844   [0.089]  1.607   −0.229   1.000      0.67  0.58  0.52  0.45  0.44  0.35
CovK10×10    0.8862   [0.089]  1.596   −0.227   0.992      0.69  0.61  0.54  0.47  0.45  0.36
CovK2×2      0.8900   [0.088]  1.518   −0.223   0.960      0.75  0.66  0.59  0.53  0.51  0.42
CovHY        0.2113   [0.022]  0.362   −0.902   0.767      0.80  0.73  0.68  0.64  0.62  0.53
Cov1/4m      0.4534   [0.050]  0.805   −0.660   0.660      0.84  0.74  0.68  0.64  0.61  0.52
Cov5m        0.8505   [0.085]  1.511   −0.262   0.942      0.71  0.62  0.54  0.48  0.46  0.37
Cov30m       0.9049   [0.091]  1.838   −0.208   0.866      0.49  0.46  0.37  0.34  0.35  0.25
Cov3h        0.9566   [0.105]  2.659   −0.156   0.640      0.22  0.25  0.20  0.17  0.16  0.15
CovOtoC      1.1116   [0.150]  4.255   –        0.508      0.12  0.15  0.17  0.14  0.08  0.15

Summary stats for correlations
CorrK30×30   0.3862   [0.008]  0.203   –        1.000      0.28  0.26  0.23  0.23  0.22  0.19
CorrK10×10   0.3825   [0.008]  0.188   –        0.975      0.32  0.29  0.26  0.26  0.25  0.22
CorrK2×2     0.3698   [0.007]  0.155   –        0.824      0.44  0.40  0.37  0.35  0.34  0.30
Corr1/4m     0.1836   [0.007]  0.113   –        0.277      0.78  0.75  0.72  0.71  0.70  0.66
Corr5m       0.3619   [0.007]  0.169   –        0.756      0.34  0.31  0.28  0.26  0.24  0.22
Corr30m      0.4030   [0.010]  0.298   –        0.684      0.14  0.13  0.11  0.12  0.11  0.10
Corr3h       0.3869   [0.016]  0.594   –        0.347      0.03  0.03  0.03  0.04  0.02  0.02
Average unconditional open-to-close correlation = 0.5185

Mid-quotes

Estimator    Average  HAC      Stdev   Bias     cor(., K)  acf1  acf2  acf3  acf4  acf5  acf10

Summary stats for covariances
CovK30×30    0.8917   [0.089]  1.656   −0.221   1.000      0.62  0.55  0.48  0.43  0.42  0.32
CovK10×10    0.8940   [0.090]  1.636   −0.219   0.992      0.66  0.58  0.51  0.45  0.44  0.34
CovK2×2      0.9000   [0.089]  1.546   −0.213   0.941      0.74  0.66  0.59  0.53  0.51  0.41
CovHY        0.4144   [0.038]  0.627   −0.699   0.788      0.82  0.74  0.69  0.65  0.62  0.53
Cov1/4m      0.4470   [0.048]  0.776   −0.666   0.669      0.83  0.74  0.68  0.64  0.61  0.52
Cov5m        0.8530   [0.084]  1.481   −0.260   0.922      0.72  0.61  0.55  0.50  0.47  0.39
Cov30m       0.9056   [0.091]  1.833   −0.207   0.897      0.50  0.46  0.37  0.34  0.35  0.25
Cov3h        0.9574   [0.105]  2.661   −0.156   0.672      0.22  0.25  0.21  0.17  0.16  0.16
CovOtoC      1.1143   [0.150]  4.234   –        0.534      0.12  0.15  0.18  0.14  0.08  0.15

Summary stats for correlations
CorrK30×30   0.3904   [0.009]  0.221   –        1.000      0.25  0.23  0.21  0.20  0.20  0.18
CorrK10×10   0.3870   [0.008]  0.200   –        0.968      0.30  0.27  0.24  0.24  0.23  0.20
CorrK2×2     0.3763   [0.008]  0.165   –        0.818      0.41  0.37  0.34  0.32  0.31  0.27
Corr1/4m     0.1815   [0.006]  0.103   –        0.282      0.75  0.71  0.69  0.67  0.66  0.62
Corr5m       0.3650   [0.007]  0.168   –        0.724      0.35  0.30  0.28  0.26  0.25  0.22
Corr30m      0.4027   [0.010]  0.299   –        0.734      0.14  0.13  0.11  0.12  0.11  0.10
Corr3h       0.3873   [0.016]  0.593   –        0.382      0.03  0.04  0.03  0.04  0.02  0.02
Average unconditional open-to-close correlation = 0.5169

Summary statistics across all asset pairs. The first column identifies the estimator, and the second gives the average estimate across all asset combinations, followed by the average Newey–West type standard error. The fourth gives the average standard deviation of the estimator. The fifth is the average bias. Next is the average sample correlation with our realised kernel. The remaining columns give average autocorrelations. The upper panel is based on transaction prices, whereas the lower panel is based on mid-quotes.
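The Newey–West type assessment behind the boldfacing and the HAC column (sample bias with a Bartlett-weighted variance and lag truncation q = int{4(T/100)^{2/9}}) can be sketched as follows (our code; the function name is ours):

```python
import numpy as np

def hac_bias_test(e):
    """Mean of e with a Bartlett (Newey-West) HAC variance:
    s2 = gamma_0 + 2 * sum_{h=1}^q (1 - h/(q+1)) gamma_h, where
    gamma_h = sum_s eta_s eta_{s-h} / (T - h), q = int(4 (T/100)^{2/9}).
    Returns (mean, t-statistic sqrt(T) * mean / s)."""
    e = np.asarray(e, dtype=float)
    T = len(e)
    eta = e - e.mean()
    q = int(4 * (T / 100.0) ** (2.0 / 9.0))
    gam = [eta[h:] @ eta[:T - h] / (T - h) for h in range(q + 1)]
    s2 = gam[0] + 2.0 * sum((1 - h / (q + 1)) * gam[h] for h in range(1, q + 1))
    return e.mean(), np.sqrt(T) * e.mean() / np.sqrt(s2)
```

A number is boldfaced in the tables when the absolute t-statistic exceeds 2.326.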

so seven underestimates the efficiency gain of using Cov_s^K2×2. If volatility is close to being persistent then Cov_s^K30×30 is at least 4.255²/{1.61²(1 − acf1)} ≃ 20 times more informative than the cross product of daily returns. The same observation holds for mid-quotes.

Cov_s^15s and Cov_s^HY are very precise estimates of the wrong quantity. Cov_s^5m is quite close to Cov_s^K30×30, Cov_s^K10×10 and Cov_s^K2×2, with Cov_s^5m and Cov_s^K30×30 having a correlation of 0.942. We note that the realised kernel results seem to show some bias compared to Cov_s^OtoC; the difference is however statistically insignificantly different from zero, as Cov_s^OtoC turns out to be very noisy.

The corresponding results for correlations are interesting. Naturally, the computation of the correlation involves a non-linear transformation of roughly unbiased and noisy estimates. We should therefore (by a Jensen inequality argument) expect all these estimates to be biased. The most persistent estimator is Corr_s^1/4m, but the high autocorrelation merely reflects the large distortion that noise has on this estimator, as is also evident from the sample average of this correlation estimator. The largest autocorrelation amongst the more reliable estimators is that of Corr_s^K2×2, which suggests that this is the most effective estimate of the correlation.

In our web appendix we give time series plots and autocorrelograms for the various estimates of realised covariance for the AA-SPY asset combination using trade data. They show Cov_s^K2×2 performing much better than the 30 min realised covariance, but there not being a great deal of difference between the statistics when the realised covariance is based on 5 min returns. The web appendix also presents scatter plots of estimates based on transaction prices (vertical axis) against the same estimate based on mid-quotes (horizontal axis) for the same days. These show a remarkable agreement between estimates based on Cov_s^K2×2, Cov_s^5m and Cov_s^30m, while once again Cov_s^HY struggles. Overall Cov_s^K2×2 and Cov_s^5m behave in a similar manner, with Cov_s^K2×2 slightly stronger. Cov_s^K10×10 estimates roughly the same level as Cov_s^K2×2 but is discernibly noisier.
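The correlation and beta statistics analysed in the next two subsections are simple transformations of a daily covariance matrix estimate K_s; for reference, a sketch (ours):

```python
import numpy as np

def kernel_correlation(K, i, j):
    """rho_hat_s^{(i,j)K} = K_s^{(i,j)} / sqrt(K_s^{(i,i)} K_s^{(j,j)})."""
    return K[i, j] / np.sqrt(K[i, i] * K[j, j])

def kernel_beta(K, i, j):
    """beta_hat_s^{(i,j)K} of asset i on asset j: K_s^{(i,j)} / K_s^{(j,j)}."""
    return K[i, j] / K[j, j]
```

Because numerator and denominator are both measured with noise, these non-linear transformations inherit the Jensen-type downward bias discussed in the text.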

Fig. 2. Scatter plots for daily realised kernel betas for the AA and SPY asset combination.

5.5. Analysis of the correlation estimates

In this subsection we will focus on estimating ρ_s^(i,j) = [Y(i), Y(j)]_s / √([Y(i)]_s [Y(j)]_s) by the realised kernel correlation ρ̂_s^(i,j)K = K_s^(i,j) / √(K_s^(i,i) K_s^(j,j)) and the corresponding realised correlation based on ℓ min returns. A table in our web appendix reports the average estimates for ρ̂_s^K2×2, ρ̂_s^K10×10 and ρ̂_s^5m. It shows the expected result that ρ̂_s^K2×2 is more precise than ρ̂_s^K10×10. Both have average values which are quite a bit below the unconditional correlation of the daily open-to-close returns. This is not surprising. All three ingredients of ρ̂_s^K2×2 are measured with noise, and so when we form ρ̂_s^(i,j)K it will be downward biased.

5.6. Analysis of the beta estimates

Here we will focus on estimating β_s^(i,j) = [Y(i), Y(j)]_s/[Y(j)]_s by the realised kernel beta β_s^(i,j)K = K_s^(i,j)/K_s^(j,j). Fig. 2 presents scatter plots of beta estimates based on transaction prices (vertical axis) against the same estimate based on mid-quotes (horizontal axis). The two estimators are β_s^K2×2 and β_s^5m. The results are not very different in these two cases.

Fig. 3 compares the fitted values from ARMA models for the kernel and 5 min estimates of realised betas for the AA-SPY asset combination. These are based on the model estimates for the daily kernel based realised betas

  β_s^K = 1.20 + 0.923 β_{s−1}^K + u_s − 0.726 u_{s−1},   adj-R² = 0.213,
         (0.06)  (0.027)                (0.048)

and for 5 min based realised betas

  β_s^5min = 1.16 + 0.950 β_{s−1}^5min + u_s − 0.821 u_{s−1},   adj-R² = 0.145.
            (0.06)  (0.024)                   (0.039)

Both models have a significant memory, with autoregressive roots well above 0.9 and with large moving average roots. The fit of the realised kernel beta is a little bit better than that for the realised beta.

We also calculate the encompassing regressions. The estimates for the realised kernel betas are

  β_s^K = 0.084 + 0.858 β_{s−1}^K + 0.074 β_{s−1}^5min + u_s − 0.726 u_{s−1},   adj-R² = 0.215,
         (0.031)  (0.053)          (0.043)                    (0.044)

with the corresponding 5 min based realised betas

  β_s^5min = 0.056 + 0.879 β_{s−1}^5min + 0.069 β_{s−1}^K + u_s − 0.822 u_{s−1},   adj-R² = 0.150.
            (0.026)  (0.047)             (0.035)                 (0.040)

This shows that neither estimator dominates the other in terms of encompassing, although the realised kernel has a slightly stronger t-statistic.

5.7. A scalar BEKK

An important use of realised quantities is to forecast future volatilities and correlations of daily returns. The use of reduced form forecasting models has been pioneered by Andersen et al. (2001, 2003). One useful way of thinking about the forecasting problem is to fit a GARCH type model with lagged realised quantities as explanatory variables, e.g. Engle and Gallo (2006). Here we follow this route, fitting multivariate GARCH models with E(r_s|F_{s−1}) = 0 and Cov(r_s|F_{s−1}) = H_s, where r_s is the d × 1 vector of daily close-to-close returns and F_{s−1} is the information available at time s − 1 to predict r_s. A standard Gaussian quasi-likelihood, −(1/2) Σ_{s=1}^T (log|H_s| + r_s′H_s⁻¹r_s), is used to make inference. The model we fit is a variant on the scalar BEKK (e.g. Engle and Kroner (1995))

  H_s = C′C + βH_{s−1} + αr_{s−1}r_{s−1}′ + γK_{s−1},   α, β, γ ≥ 0.

Here we follow the literature and use H_s to denote the conditional variance matrix (not to be confused with our bandwidth parameters). Instead of estimating the d(d + 1)/2 unique elements of C we use a variant of variance targeting as suggested in Engle and Mezrich (1996). The general idea is to estimate the intercept matrix by an auxiliary estimator that is given by

  Ĉ′Ĉ = S̄ ⊙ (1 − α − β − γκ),   S̄ = T⁻¹ Σ_{s=1}^T r_s r_s′,   (7)

Fig. 3. ARMA(1, 1) model for transaction based realised kernel betas for the AA and SPY combination.

Table 7
Scalar BEKK models for close-to-close returns.

Panel A: 30 × 30 case
H_{s−1}   r_{s−1}r_{s−1}′   K_{s−1}   RV5m_{s−1}   log L
0.943     0.005             0.040     –            −27,029.9
(0.006)   (0.000)           (0.004)
0.742     0.013             –         0.115        −27,077.7
(0.014)   (0.001)                     (0.006)
0.984     0.008             –         –            −28,477.5
(0.001)   (0.000)
0.777     –                 0.076     0.061        −26,948.3
(0.012)                     (0.004)   (0.005)
0.784     0.009             0.067     0.059        −26,904.3
(0.013)   (0.001)           (0.004)   (0.005)

Panel B: 10 × 10 case
H_{s−1}   r_{s−1}r_{s−1}′   K_{s−1}   RV5m_{s−1}   log L
0.768     0.015             0.151     –            −7920.7
(0.017)   (0.003)           (0.011)
0.687     0.022             –         0.160        −7935.9
(0.025)   (0.003)                     (0.013)
0.965     0.023             –         –            −8307.5
(0.003)   (0.001)
0.705     –                 0.126     0.067        −7923.3
(0.023)                     (0.014)   (0.014)
0.716     0.017             0.106     0.065        −7903.0
(0.023)   (0.003)           (0.014)   (0.013)

Panel C: 2 × 2 cases

AIG-CAT
H_{s−1}   r_{s−1}r_{s−1}′   K_{s−1}   RV5m_{s−1}   log L
0.837     0.038             0.126     –            −2584.9
(0.028)   (0.008)           (0.030)
0.863     0.044             –         0.098        −2591.2
(0.023)   (0.007)                     (0.025)
0.951     0.045             –         –            −2629.7
(0.006)   (0.005)
0.764     –                 0.236     0            −2592.4
(0.036)                     (0.063)   (–)
0.837     0.038             0.126     0            −2584.9
(0.028)   (0.008)           (0.050)   (–)

BA-SPY
H_{s−1}   r_{s−1}r_{s−1}′   K_{s−1}   RV5m_{s−1}   log L
0.844     0.031             0.094     –            −1516.9
(0.043)   (0.008)           (0.032)
0.843     0.032             –         0.091        −1517.8
(0.041)   (0.008)                     (0.030)
0.958     0.036             –         –            −1544.4
(0.006)   (0.005)
0.717     –                 0.125     0.083        −1521.1
(0.074)                     (0.071)   (0.065)
0.837     0.031             0.068     0.031        −1516.6
(0.045)   (0.009)           (0.047)   (0.045)

Estimation results for scalar BEKK models for close-to-close d = 30, 10, 2 dimensional return vectors.

where ⊙ denotes the Hadamard product. There is a slight deviation from the situation considered by Engle and Mezrich (1996) because K_{s−1} is only estimated for the part of the day when the NYSE is open. To accommodate this we follow Shephard and Sheppard (2010), who introduce the scaling matrix, κ, in (7), which we estimate by

  κ̂_ij = μ̂_ij / (μ̂_RK)_ij,   μ̂ = T⁻¹ Σ_{s=1}^T r_s r_s′,   μ̂_RK = T⁻¹ Σ_{s=1}^T K_s.

Having S̄ and κ̂ at hand, the remaining parameters are simply estimated by maximising the concentrated quasi-log-likelihood, with

  H_s = S̄ ⊙ (1 − α − β − γκ̂) + βH_{s−1} + αr_{s−1}r_{s−1}′ + γK_{s−1},   α, β, γ ≥ 0.

An interesting question is whether γ is statistically different from zero, because this would mean that high frequency data enhances the forecast of future covariation. In our analysis we will also augment the model with RV5m_{s−1}.

We estimate scalar BEKK models for the 30 × 30, 10 × 10, and the 45 2 × 2 cases. In Table 7 we present estimates for the two larger dimensions and three selected 2 × 2 cases. The results in Table 7 suggest that lagged daily returns are no longer significant for this multivariate GARCH model once we have the realised kernel covariance. This is even though the realised kernel covariance misses out the overnight effect: the information in the close-to-open returns. An interesting feature of the series is that in most cases including K_{s−1} reduces the size of the estimated H_{s−1} term. It is also interesting to note that including K_{s−1} in general gives a higher log-likelihood than including RV5m_{s−1}. This holds for both the 30-dimensional and the 10-dimensional cases, and for 40 of the 45 2-dimensional cases. In our web appendix we report summary statistics of two likelihood ratio tests applied to all the 45 2-dimensional cases. The average LR statistic for removing RV5m_{s−1} from our most general specification is 0.66, whereas the corresponding average for removing K_{s−1} is 11.9. These tests can be interpreted as encompassing tests, and provide overwhelming evidence that the information in RV5m_{s−1} is contained in K_{s−1}.
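The targeted recursion can be sketched as follows. This is our illustration of the filtering step only, with hypothetical names; the quasi-likelihood maximisation over (α, β, γ) is omitted.

```python
import numpy as np

def scalar_bekk_filter(returns, RK, alpha, beta, gamma):
    """Targeted scalar BEKK recursion:
        H_s = Sbar o (1 - alpha - beta - gamma * kappa)
              + beta H_{s-1} + alpha r_{s-1} r_{s-1}' + gamma K_{s-1},
    with Sbar the sample covariance of daily returns and kappa the
    element-wise open/close scaling.  returns: (T, d); RK: (T, d, d)."""
    T, d = returns.shape
    Sbar = returns.T @ returns / T
    kappa = Sbar / RK.mean(axis=0)          # kappa_ij = mu_ij / (mu_RK)_ij
    intercept = Sbar * (1.0 - alpha - beta - gamma * kappa)  # Hadamard product
    H = np.empty((T, d, d))
    H[0] = Sbar                             # initialise at the target
    for s in range(1, T):
        r = returns[s - 1][:, None]
        H[s] = intercept + beta * H[s - 1] + alpha * (r @ r.T) + gamma * RK[s - 1]
    return H
```

In estimation one would then maximise the Gaussian quasi-likelihood −(1/2)Σ(log|H_s| + r_s′H_s⁻¹r_s) over (α, β, γ) subject to α, β, γ ≥ 0.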

6. Additional remarks

6.1. Relating K(X) to the flat-top realised kernel K^F(X)

In the univariate case the realised kernel

  K(X) = Σ_{h=−n}^n k(h/H) Γ_h,   Γ_h = Σ_{j=|h|+1}^n x_j x_{j−|h|},

is at first sight very similar to the unbiased flat-top realised kernel of Barndorff-Nielsen et al. (2008),

  K^F(X) = Γ_0^F + Σ_{h=1}^n k((h − 1)/H) (Γ_h^F + Γ_{−h}^F),   Γ_h^F = Σ_{j=1}^n x_j x_{j−h}.

Here the Γ_h and Γ_h^F are not divided by the sample size. This means that the end conditions, the observations at the start and end of the sample, can have influential effects. Jittering eliminates the end effects in K(X), whereas the presence of x_{−1}, x_{−2}, ... and x_{n+1}, x_{n+2}, ... in the definition of Γ_h^F removes the end effects from K^F(X). However, an implication of this is that the resulting estimator is not guaranteed to be positive semi-definite, whatever the choice of the weight function.

The alternative K^F(X) has the advantage that it (under the restrictive independent noise assumption) converges at an n^{1/4} rate and is close to the parametric efficiency bound. It has the disadvantage that it can go negative, while we see in the next subsection that it is sensitive to deviations from independent noise, such as serial dependence in the noise and endogenous noise, which K(X) is robust to. The requirement that K(X) be positive results in the bias-variance trade-off and reduces the best rate of convergence from n^{1/4} to n^{1/5}. This resembles the effects seen in the literature on density estimation with kernel functions. The property ∫u²k(u)du = 0 reduces the order of the asymptotic bias, but kernel functions that satisfy ∫u²k(u)du = 0 can result in negative density estimates; see Silverman (1986, Sections 3.3 and 3.6).

6.1.1. Positivity

There are three reasons that K^F(X) can go negative.⁷ The most obvious is the use of a kernel function that does not satisfy ∫_{−∞}^∞ k(x)exp(ixλ)dx ≥ 0 for all λ ∈ R, such as the Tukey–Hanning kernel or the cubic kernel, k(x) = 1 − 3x² + 2x³. Second, the flat-top kernels give unit weight to γ_1 and γ_{−1}, which can mean K^F(X) may be negative. This can be verified by rewriting the estimator as a quadratic form, x′Mx, where M is a symmetric band matrix M = band(1, 1, k(1/H), k(2/H), ...). The determinant of the upper-left matrix is given by −{k(1/H) − 1}², so that k(1/H) = 1 is needed to avoid negative eigenvalues. Repeating this argument leads to k(h/H) = 1 for all h, which violates the condition that k(h/H) → 0 as h → ∞. Finally, the third reason that the flat-top kernel could produce a negative estimate is the construction of the realised autocovariances, γ_h = Σ_{j=1}^n x_j x_{j−h}. This requires the use of "out-of-period" intraday returns, such as x_{1−H}.

6.1.2. Efficiency

An important question is how inefficient K(X) is in practice compared to the flat-top realised kernel, K^F(X). The answer is quite a bit when U is white noise. Table 8 gives E[n^{1/4}{K(X) − [Y]}]²/ω and E[n^{1/4}{K^F(X) − [Y]}]²/ω, the mean square error normalised by the rate of convergence of K_P^F(X), which is the flat-top realised kernel using the Parzen weight function. (An implication is that the scaled MSE for K(X) will increase without bound as n → ∞, because this estimator converges at a rate that is slower than n^{1/4}.) The results are given in the case of Brownian motion observed with different types of noise. Results for two flat-tops are given, the Bartlett (K_B^F(X)) and Parzen (K_P^F(X)) weight functions. Similar types of results hold for other weight functions.

Consider first the case with Gaussian white noise U with variance ω². The results show that the variance of K(X) is much bigger than its squared bias. For small n there is not much difference between the three estimators, but by the time n = 4096 (which is realistic for our applications) the flat-top K^F(X) has roughly half the MSE of K(X) in the univariate case. Hence in ideal (but unrealistic) circumstances K^F(X) has advantages over K(X), but we are attracted to the positivity and robustness of K(X).

The robustness advantage of K(X) can be seen using four simulation designs where U_j is modelled as a dependent process. We consider the moving average specification, U_j = ϵ_j − θϵ_{j−1}, with θ = ±0.5, and the autoregressive specification, U_j = ϕU_{j−1} + ϵ_j, with ϕ = ±0.5, where ϵ_j is Gaussian white noise. The bandwidths for all estimators were chosen to be "optimal" under U being white noise, which is the default in the literature, so H_B^F = 2.28 ω^{4/3} n^{2/3}, H_P^F = 4.77 ω n^{1/2}, and H_P = 3.51 ω^{4/5} n^{3/5}, where ω² = Σ_{h=−∞}^∞ cov(U_j, U_{j−h}). The results show the robustness of K(X) and the strong asymptotic bias of K_P^F and K_B^F under the non-white noise assumption. The specifications θ = 0.5 and ϕ = −0.5 induce a negative first-order autocorrelation, while θ = −0.5 and ϕ = 0.5 induce positive autocorrelation. Negative first-order autocorrelation can be the product of bid-ask bounce effects; this is particularly the case if sampling only occurs when the price changes. Positive first-order autocorrelation would, for example, be relevant for the noise in bid prices, because variation in the bid-ask spread would induce such dependence.

6.2. Preaveraging without bias correction

6.2.1. Estimating multivariate QV

In independent and concurrent work, Vetter (2008, p. 29 and Section 3.2.4) has studied a univariate suboptimal preaveraging estimator of [Y] whose bias is sufficiently small that the estimator does not need to be explicitly bias corrected to be consistent (the bias corrected version can be negative). Its rate of convergence does not achieve the optimal n^{−1/4} rate. Hence his suboptimal preaveraging estimator has some similarities to our non-negative realised kernel. Implicit in his work is that his non-corrected preaveraging estimator is non-negative.
tion can be the product of bid-ask bounce effects, this is particularly the case if sampling only occurs when the price changes. Positive 6.1.1. Positivity first-order autocorrelation would, for example, be relevant for the There are three reasons that K F (X) can go negative.7 The most noise in bid prices because variation in the bid-ask spread would obvious is the use of a kernel function that does not satisfy, induce such dependence.  ∞ −∞ k(x) exp(ixλ)dx ≥ 0 for all λ ∈ R, such as the Tukey–Hanning kernel or the cubic kernel, k(x) = 1 − 3x2 + 2x3. The flat-top 6.2. Preaveraging without bias correction F kernels give unit weight to γ1 and γ−1, which can mean K (X) may be negative. This can be verified by rewriting the estimator ′ 6.2.1. Estimating multivariate QV as a quadratic form estimator, x Mx, where M is a symmetric band In independent and concurrent work Vetter (2008, p. 29 and matrix M = band(1, 1, k( 1 ), k( 2 ), . . . , ). The determinant of the H H Section 3.2.4) has studied a univariate suboptimal preaveraging −{ 1 − }2 1 = upper-left matrix is given by k( H ) 1 , so that k( H ) 1 estimator of [Y ] whose bias is sufficiently small that the estimator is needed to avoid negative eigenvalues. Repeating this argument does not need to be explicitly bias corrected to be consistent (the h = leads to k( H ) 1 for all h, which violates the condition that bias corrected version can be negative). Its rate of convergence h → → ∞ −1/4 k( H ) 0, as h . Finally, the third reason that the flat- does not achieve the optimal n rate. Hence his suboptimal top kernel could produce a negative estimate was due to the preaveraging estimator has some similarities to our non-negative = ∑n construction of realised autocovariances, γh j=1 xjxj−h. This realised kernel. Implicit in his work is that his non-corrected requires the use of ‘‘out-of-period’’ intraday returns, such as x1−H . preaveraging estimator is non-negative. 
However, this is not remarked upon explicitly, nor developed into the multivariate case where non-synchronously spaced data is crucial.

Here we outline what a simple multivariate uncorrected preaveraging estimator based on refresh time would look like. We define it as V̂ = Σ_{j=1}^{n−H} x̄_j x̄_j', where x̄_j = (ψ₂H)^{−1/2} Σ_{h=1}^{H} g(h/H) x_{j+h} and ψ₂ = ∫_0^1 g²(u)du. Here g(u), u ∈ [0, 1], is a non-negative, continuously differentiable weight function with the properties that g(0) = g(1) = 0 and ψ₂ > 0. Now if we set H = θn^{3/5}, then the univariate result in Vetter (2008) would suggest that V̂ converges at rate n^{−1/5}, like the univariate version of our multivariate realised kernel. There is no simple guidance, even in the univariate case, as to how to choose θ.

⁷ The flat-top kernel is only rarely negative with modern data. However, if [Y] is very small and the ω² very large, which we saw on slow days on the NYSE when the tick size was $1/8, then it can happen quite often when the flat-top realised kernel is used. We are grateful to Kevin Sheppard for pointing out these negative days.
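The uncorrected preaveraging estimator V̂ just defined can be sketched as follows; this is an illustrative implementation under the ''natural choice'' g(u) = min(u, 1 − u) mentioned in the next subsection (so ψ₂ = 1/12), not the authors' code.

```python
import numpy as np

def preaveraged_qv(x, H):
    """Sketch of the uncorrected preaveraging estimator:
    V = sum_{j=1}^{n-H} xbar_j xbar_j', with
    xbar_j = (psi2*H)^{-1/2} sum_{h=1}^{H} g(h/H) x_{j+h},
    using g(u) = min(u, 1-u), so psi2 = int_0^1 g(u)^2 du = 1/12.
    x is an (n, d) array of synchronised (e.g. refresh time) returns."""
    n, d = x.shape
    psi2 = 1.0 / 12.0
    h = np.arange(1, H + 1) / H
    w = np.minimum(h, 1.0 - h)  # g(h/H), h = 1, ..., H (g(1) = 0)
    V = np.zeros((d, d))
    for j in range(n - H):  # j = 1, ..., n-H in the paper's 1-based indexing
        xbar = (w[:, None] * x[j + 1:j + H + 1]).sum(axis=0) / np.sqrt(psi2 * H)
        V += np.outer(xbar, xbar)  # sum of outer products, hence PSD
    return V
```

Because V̂ is a sum of outer products it is positive semi-definite by construction, which is the point being made in the text.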

Table 8
Relative efficiency of the realised kernel K(X). ω² = 0.001.

                         Normalised bias²            Normalised variance         Normalised MSE
  n                      K_B^F    K_P^F    K(X)      K_B^F   K_P^F   K(X)        K_B^F    K_P^F    K(X)

U ∈ WN
  250                    0.0      0.0      0.8       16.2    16.3    18.0        16.2     16.3     18.8
  1000                   0.0      0.0      2.5       11.7    12.1    16.9        11.7     12.1     19.4
  4000                   0.0      0.0      3.1       10.4    10.4    19.0        10.4     10.4     22.1
  16,000                 0.0      0.0      4.6       10.5    9.5     20.8        10.5     9.5      25.4

U_j = ε_j + 0.5ε_{j−1}
  250                    1.5      1.2      0.6       15.3    15.7    17.6        16.9     16.9     18.2
  1000                   22.1     7.3      2.2       11.0    11.9    16.9        33.0     19.2     19.1
  4000                   175.7    18.5     3.2       9.3     10.2    19.0        185.0    28.8     22.2
  16,000                 898.5    41.0     4.4       9.0     9.4     20.9        907.6    50.4     25.4

U_j = ε_j − 0.5ε_{j−1}
  250                    122.7    96.9     3.9       27.5    24.2    18.3        150.2    121.1    22.2
  1000                   1,769.1  588.0    6.1       44.8    20.4    16.9        1,813.9  608.3    23.0
  4000                   14,195.1 1490.4   5.0       73.1    13.9    19.3        14,268.2 1504.4   24.3
  16,000                 72,797.6 3326.8   5.5       88.6    10.9    20.8        72,886.2 3337.7   26.3

U_j = −0.5U_{j−1} + ε_j
  250                    39.1     30.9     1.3       18.9    18.1    17.9        58.0     49.0     19.2
  1000                   1,261.0  74.9     3.3       35.9    13.2    16.8        1,296.9  88.1     20.0
  4000                   7,751.7  141.1    3.5       40.8    10.8    18.8        7,792.5  151.9    22.4
  16,000                 40,973.1 253.8    4.8       52.0    9.7     20.9        41,025.2 263.5    25.7

U_j = 0.5U_{j−1} + ε_j
  250                    0.5      0.4      0.3       14.8    15.3    17.7        15.3     15.7     18.0
  1000                   9.6      6.3      1.5       9.8     10.8    16.6        19.4     17.1     18.2
  4000                   96.0     39.6     2.7       8.5     9.7     19.1        104.4    49.2     21.8
  16,000                 505.8    141.5    4.2       8.5     9.2     21.1        514.3    150.7    25.3

Notes to Table 8: Relative efficiency of the realised kernel K(X) and the flat-top realised kernel, K^F(X). Results for five different types of noise are presented. In the MA(1) and AR(1) designs, the variance of U was scaled such that Var(U) = ω². The squared bias, variance, and MSE have been scaled by n^{1/2}/ω. In the special case with Gaussian white noise the asymptotic lower bound for the normalised MSE is 8.00 (the normalised MSE for K_P^F(X) converges to 8.54 as n → ∞ in this special case). The results are based on 50,000 repetitions.

In the univariate bias corrected form, Jacod et al. (2009) show that V̂ is asymptotically equivalent to using a K^F(X) with k(x) = ψ₂^{−1} ∫ g(u)g(u − x)du and H ∝ n^{1/2}. It is clear the same result will hold for the relationship between V̂ and K(X) in the multivariate case when H = θn^{3/5}. A natural choice of g is g(x) = (1 − x) ∧ x, which delivers ∫_0^1 g²(u)du = 1/12 and a k function which is the Parzen weight function. Hence one might investigate using θ = c₀, as in our paper, to drive the choice of H for V̂ when applied to refresh time based high frequency returns.

Following the initial draft of this paper, Christensen et al. (2009) have defined a bias corrected preaveraging estimator of the multivariate [Y] with H = θn^{1/2}, for which they derive limit theory. Their estimator has the disadvantage that it is not guaranteed to be positive semi-definite.

6.2.2. Estimating integrated quarticity

In order to construct feasible confidence intervals for our realised quantities (see Barndorff-Nielsen and Shephard (2002)) we have to estimate the stochastic d² × d² matrix, IQ. Our approach is based on the no-noise Barndorff-Nielsen and Shephard (2004) bipower type estimator applied to suboptimally preaveraged data, taking H = θn^{3/5}. This is not an optimal estimator — it will converge at rate n^{1/5} — but it will be positive semidefinite. The proposed (positive semi-definite) estimator of vec(IQ) is

Q̂ = n Σ_{j=1}^{n−H−1} {c_j c_j' − ½(c_j c_{j+H}' + c_{j+H} c_j')}, where c_j = vec(x̄_j x̄_j').

That the elements of Q̂ are consistent using this choice of bandwidth is implicit in the thesis of Vetter (2008, p. 29 and Section 3.2.4).

6.3. Finite sample improvements

The realised kernel is non-negative, so we can use the log-transform to improve its finite sample performance:

n^{1/5} {log K(X) − log ∫_0^1 σ²(u)du} →Ls MN( κ / ∫_0^1 σ²(u)du, 4κ² / {∫_0^1 σ²(u)du}² ).

When the data is regularly spaced and the volatility is constant, then κσ^{−2} = (ω/σ)^{2/5} |k''(0)|^{1/5} (k•^{0,0})^{2/5}, which depends less on σ than the non-transformed version.

6.4. Subtlety of end effects

We have introduced jittering to eliminate end-effects. The larger is m, the smaller are the end-effects; however, increasing m has the drawback that it reduces the sample size, n, that can be used to compute the realised autocovariances. Given N observations, the sample size available after jittering is n = N − 2(m − 1), so extensive jittering will increase the variance of the estimator. In this subsection we study this trade-off.

We focus on the univariate case where U is white noise. The mean square error caused by end-effects is simply the squared bias plus the variance of U₀U₀' + U_nU_n', which is given by 4m^{−2}ω⁴ + 4m^{−2}ω⁴ = 8ω⁴m^{−2}, see the proof of Proposition A.2. The asymptotic mean square error (abstracting from end-effects) is 5κ²n^{−2/5} = 5|k''(0)ω²|^{2/5}{k•^{0,0}IQ}^{4/5}n^{−2/5}. So the trade-off between the contributions from end-effects and the asymptotic variance is given by

g²_{N,ω²,IQ}(m) = m^{−2} 8ω⁴ + 5|k''(0)ω²|^{2/5} {k•^{0,0}IQ}^{4/5} (N − m)^{−2/5}.

This function is plotted in Fig. 4 for the case where N = 1000, IQ = 1, and ω² = 0.0025 and 0.001. The optimal value of m ranges from 1 to 2. The effect of increasing n on the optimal m can be seen from Fig. 4, where the optimal value of m has increased a little as n is increased to 5000. However, the optimal amount of jittering is still rather modest.
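The trade-off function g² can be minimised numerically over integer m. The sketch below uses the Parzen-kernel constants |k''(0)| = 12 and k•^{0,0} ≈ 0.269 as assumed inputs (they are not stated at this point in the text), and the function names are ours.

```python
def jitter_mse(m, N, omega2, IQ, kpp0=12.0, k00=0.269):
    """End-effect / asymptotic-variance trade-off of Section 6.4:
    g^2(m) = 8*omega^4/m^2
             + 5*|k''(0)*omega^2|^{2/5} * (k00*IQ)^{4/5} * (N - m)^{-2/5}.
    kpp0 = |k''(0)| and k00 = int k(x)^2 dx (Parzen values assumed)."""
    return (8.0 * omega2**2 / m**2
            + 5.0 * (kpp0 * omega2)**0.4 * (k00 * IQ)**0.8 * (N - m)**(-0.4))

def optimal_jitter(N, omega2, IQ, m_max=50):
    """Integer m minimising the trade-off, scanned over 1..m_max-1."""
    vals = {m: jitter_mse(m, N, omega2, IQ) for m in range(1, m_max)}
    return min(vals, key=vals.get)
```

For N = 1000, IQ = 1 and ω² ∈ {0.001, 0.0025} this scan gives an optimal m of 1 or 2, consistent with the figures reported in the text.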

Fig. 4. Sensitivity to the choice of m. The figure shows the RMSE as a function of m for the sample sizes N = 1000 and N = 5000, and ω² = 0.001 and ω² = 0.0025.
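For concreteness, the non-flat-top realised kernel K(X) whose MSE is traded off above can be sketched as follows, using the Parzen weight function and the white-noise bandwidth rule H_P = 3.51ω^{4/5}n^{3/5} from Section 6.1.2 (with IQ normalised to 1). This is an illustrative sketch, not the authors' code.

```python
import numpy as np

def parzen(u):
    """Parzen weight function: k(0) = 1, k'(0) = 0, smooth, k(u) = 0 for |u| >= 1."""
    u = abs(u)
    if u <= 0.5:
        return 1.0 - 6.0 * u**2 + 6.0 * u**3
    if u <= 1.0:
        return 2.0 * (1.0 - u)**3
    return 0.0

def realised_kernel(x, H):
    """K(X) = sum_{h=-n}^{n} k(h/H) Gamma_h, with
    Gamma_h = sum_{j=|h|+1}^{n} x_j x_{j-|h|}, for a 1-D array of returns x."""
    K = np.dot(x, x)  # Gamma_0
    for h in range(1, min(int(np.ceil(H)), len(x) - 1) + 1):
        K += 2.0 * parzen(h / H) * np.dot(x[h:], x[:-h])  # Gamma_h + Gamma_{-h}
    return K

def bandwidth(n, omega2):
    """White-noise rule H_P = 3.51 * omega^{4/5} * n^{3/5} (omega2 = omega^2)."""
    return 3.51 * omega2**0.4 * n**0.6
```

Because the Parzen function vanishes beyond |u| = 1, only lags up to H enter the sum, and since its Fourier transform is non-negative the estimate cannot go negative — the positivity property emphasised in Section 6.1.1.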

6.5. Finite lag refresh time

In this paper we roughly synchronise our return data using the concept of Refresh Time. Refresh Time guarantees that our returns are not stale by more than one lag in Refresh Time. Our proofs need a somewhat less tight condition, that returns are not stale by more than a finite number of lags. This suggests it may be possible to find a different way of synchronising data which throws information away less readily than Refresh Time. We leave this problem to further research.

6.6. Jumps

In this paper we have assumed that Y is a pure BSM. The analysis could be extended to the situation where Y is a pure BSM plus a finite activity jump process. The analysis in Barndorff-Nielsen et al. (2008, Section 5.6) suggests that the realised kernel is consistent for the quadratic variation, [Y], at the same rate of convergence as before, but with a different asymptotic distribution.

7. Conclusions

In this paper we have proposed the multivariate realised kernel, which is a non-normalised HAC type estimator applied to high frequency financial returns, as an estimator of the ex-post variation of asset prices in the presence of noise and non-synchronous trading. The choice of kernel weight function is important here — for example, the Bartlett weight function yields an inconsistent estimator in this context.

Our analysis is based on three innovations: (i) we use a weight function which delivers biased kernels, allowing us to use positive semi-definite estimators; (ii) we coordinate the collection of data through the idea of refresh time; (iii) we show the estimator is robust to the remaining staleness in the data. We are able to show consistency and asymptotic mixed Gaussianity of our estimator.

Our simulation study indicates our estimator is close to being unbiased for covariances under realistic situations. Not surprisingly, the estimators of correlations are downward biased due to the sampling variance of our estimators of variance. The empirical results based on our new estimator are striking, providing much sharper estimates of dependence amongst assets than has previously been available. We have analysed problems of up to 30 dimensions and have found that the efficiency gains of using the high frequency data are around 20 fold.

Multivariate realised kernels have potentially many areas of application, improving our ability to estimate covariances. In particular, this allows us to utilise high frequency data to significantly improve our predictive models, as well as providing a better understanding of asset pricing and the management of risk in financial markets.

In the appendices that follow, we give some proofs and treat the errors induced by stale prices. Under the assumptions given in this paper, our line of argument will be as follows.
• Show the realised kernel is consistent and work out its limit theory for synchronised data. This is shown in Appendix A, where Propositions A.1–A.5 and Theorems A.4 and A.5 are used to establish the multivariate result in Theorem 3; the univariate result in Theorem 2 then follows as a corollary to Theorem 3.
• Show the staleness left by the definition of refresh time has no impact on the asymptotic distribution of the equally spaced realised kernel. This is shown in Appendix B.

Appendix A. Proofs for synchronised data

Proof of Theorem 1. We note that for all i, j,

K( (Y^{(i)}, U^{(j)})' ) = ( K(Y^{(i)})          K(Y^{(i)}, U^{(j)}) )
                          ( K(Y^{(i)}, U^{(j)})  K(U^{(j)})          )

is positive semi-definite. This means that by taking the determinant of this matrix and rearranging we see that K(Y^{(i)}, U^{(j)})² ≤ K(Y^{(i)})K(U^{(j)}), so that

K(X) = K(Y) + O( √{max_i K(Y^{(i)}) · max_j K(U^{(j)})} ) + K(U),

and the result follows. □
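As an aside before the remaining proofs, the Refresh Time synchronisation used throughout (Section 6.5 and Appendix B) can be sketched as follows. This is an illustration of our reading of the scheme — take the latest first observation across assets, then repeatedly advance every asset past the current refresh time — and not the authors' code.

```python
import numpy as np

def refresh_times(times):
    """times: one sorted 1-D array of observation times per asset.
    tau_1 = max over assets of each asset's first observation time; thereafter
    tau_{j+1} = max over assets of that asset's first time strictly after tau_j."""
    taus = []
    tau = max(t[0] for t in times)
    while True:
        taus.append(tau)
        nxt = []
        for t in times:
            i = np.searchsorted(t, tau, side="right")  # first observation after tau
            if i == len(t):          # some asset has no further data: stop
                return np.array(taus)
            nxt.append(t[i])
        tau = max(nxt)
```

At each refresh time every asset has traded at least once since the previous refresh time, so a price sampled at τ_j is stale by at most one refresh-time lag, which is the property used in the text.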

 1  +  Next collect limit results about K(Y ) and K(U). Due to Note that 0 Σζ (u) γ0(u) du is the average local variance Theorem 1 we can safely ignore the cross terms K(U, Y ) as long =  1 + of U as oppose to the average long-run variance Ω 0 Σζ (u) as K(U) vanishes at the appropriate rate. ∑∞  h=−∞ γh(u) du. A.1. Results concerning K(U) Proof of Proposition A.2. The first result follows by the definition − of V and U. Next, since U = m−1 ∑m 1 U(t ) it follows that Z is The aim of this subsection to is prove the following proposition. h 0 j=0 j h stochastic for any m < ∞, and

Theorem A.4. Under K and U then m−1 m−1 ′ −1 − − ′ p 2 mU0U = m U(tj)U(ti) → ΣU (0), H p ′′ 2 0 K(U) → −k (0)Ω, as n, H, m → ∞ with H /(mn) → 0. j=0 i=0 n ′ →p = Before we prove Theorem A.4, we establish some intermediate and similar mUnUn ΣU (1). So the result for h 0 follows from = ′ + ′ results. The following definitions lead to a useful representation of Z0 2(U0U0 UnUn). Next, for h > 0, = K(U). For h 0, 1,..., we define m−1 ∞ p n−1 ′ − ′ − mU0U = U(tj)U(tm−1+h) → γ−j−h(0) = − ′ + ′ h Vh UjUj−h Uj−hUj , j=0 j=0 j=h+1 ∞ − ′ and = γj+h(0) , = ′ + ′ + ′ + ′ j=0 Zh (U0Uh UhU0) (UnUn−h Un−hUn). ′ p ∑m−1 and similarly we find mU U → γ + (1). Proposition A.1. The realised autocovariances of U can be written n n−h j=0 j h  as Proof of Theorem A.4. Since k′(0) = 0 and k′′(x) is continuous we −2 ′′ −1 1 have k0 − k1 = −H k (ϵ)/2, for some 0 ≤ ϵ ≤ H . Define ′′ 2 Γ0(U) = V0 − V1 + Z0 − Z1 (A.1) a = −k (ϵ) and a = H (−k + 2k − k ), and write 2 0 h |h|+1 |h| |h|−1 ′ V -terms of K(U), see (A.3), as Γh(U) + Γh(U) = −Vh−1 + 2Vh − Vh+1 + Zh − Zh+1, (A.2) n−1 so with k = k h we have − h ( H ) (k0 − k1)V0 − (kh+1 − 2kh + kh−1)Vh = n−1 h 1 − n−1 K(U) = (k − k )V − (k + − 2k + k − )V 0 1 0 h 1 h h 1 h = −2 − − ′ h=1 H ah UjUj−h h=−n+1 j n−1 1 − −2 − − ′ −2 − − ′ + Z0 − (kh − kh−1)Zh. (A.3) = H ah UjU − + H ah UjU − . 2 √ j h √ j h h=1 |h|≤ H j |h|> H j Proof. The first expression, (A.1), follows from By the continuity of k′′(x) it follows that n  H2  = − − − ′  + ′′  → Γ0(U) (Uj Uj−1)(Uj Uj−1) sup√  ahn k (0) 0, j=1 |h|≤ H  n  n−1 as H, n → ∞ with H/n = o(1), = ′ + ′ + − ′ + ′ U0U0 UnUn (UjUj UjUj ) n ∑ √ 1 ∑ ′ ′′ n so that the first term ah UjU = −k (0) Ω + j=1 H2 |h|≤ H n j j−h H2 o( n ). The second term vanishes because n−1 H2 − ′ ′ − (U U + U U )   j j−1 j−1 j   j=2 n  − 1 − ′   ah UjU −  ′ ′ ′ ′ 2 √ j h − (U U + U − U + U U + U U ), H  n  n n−1 n 1 n 0 1 1 0 |h|> H j  and (A.2) is proven similarly.  
  n − 2  1 − ′  ≤ |H ah| · sup  UjU −  , We note that end-effects can only have an impact on K(U) H2 √ √  n j h | | |h|> H  j  through Zh, h = 0, 1,..., because U0 and Un do not appear in the h > H expressions for V , h = 0, 1,.... h √ | 1 ∑ ′ | = and sup|h|> H n j UjUj−h op(1). −1 Proposition A.2. Given U. Then For the Z-terms we have by Proposition A.2 that Z0 = Op(m ),  ∫ 1 and { + } = 2 Σζ (u) γ0(u) du for h 0, n−1 n−1 1 p  0 − 1 1 − ′ Vh → (kh − kh−1)Zh = {k (h/H) + o(1)}mZh n ∫ 1 m H  ′ h=1 h=1  {γh(u) + γh(u) }du for h > 0, 0 −1 = Op(m ).  −1 and Zh = Op(m ) for all h = 0, 1,..., and as m → ∞, ′  Proof of Lemma 2. When k (0) ̸= 0 we see that the first term of 2{Σ (0) + Σ (1)} for h = 0,  U U H p ′  1 p  ∞ − → − { + } (A.3) is such that n (k0 k1)V0 k (0)2 0 Σζ (u) γ0(u) du. mZh → − ′ ′ {γj+h(0) + γj+h(0) + γj+h(1) + γj+h(1) } for h > 0. From the proof of Theorem A.4 it follows that the other terms in  j=0 (A.3) are of lower order.  166 O.E. Barndorff-Nielsen et al. / Journal of Econometrics 162 (2011) 149–169

A.2. Results concerning K(Y)

The aim of this subsection is to prove the following theorem, which concerns K(Y) in the univariate case. We then extend the result to the multivariate case in the next subsection.

Theorem A.5. Suppose K, SH, D, and U hold. Then as n, m, H → ∞ with H/n = o(1) and m^{−1} = o(√(H/n)), we have

√(n/H) { K(Y) − ∫_0^1 σ²(u)du } →Ls MN( 0, 4k•^{0,0} ∫_0^1 σ⁴(u) (ν₂(u)/ν₁(u)) du ).  (A.4)

Before we prove this theorem for K(Y) we introduce and analyse two related quantities,

K̃(Y) = Σ_{i=1}^{N} (η^{(1)}_{N,i} + η^{(2)}_{N,i}) and K̂(Y) = Σ_{i=1}^{N} (η̂^{(1)}_{N,i} + η̂^{(2)}_{N,i}),

where y_{N,i} = Y(τ_{N,i}) − Y(τ_{N,i−1}), ŷ_{N,i} = σ(τ_{N,i−1})(W_{τ_{N,i}} − W_{τ_{N,i−1}}), and

η^{(1)}_{N,i} = y²_{N,i},  η̂^{(1)}_{N,i} = ŷ²_{N,i},  η^{(2)}_{N,i} = 2y_{N,i} Σ_{h=1}^{N−1} k_h y_{N,i−h},  η̂^{(2)}_{N,i} = 2ŷ_{N,i} Σ_{h=1}^{N−1} k_h ŷ_{N,i−h}.

K̃ is similar to K, except that it is not subjected to the jittering, and K̂ is similar to K̃, but is computed with auxiliary intraday returns. Note that we have (uniformly over i) the strong approximation (under SH)

y_{N,i} = ∫_{τ_{N,i−1}}^{τ_{N,i}} μ(u)du + ∫_{τ_{N,i−1}}^{τ_{N,i}} σ(u)dW(u) = ŷ_{N,i}{1 + o_p(N^{−1/2})},  (A.5)

see Jacod (unpublished paper, (6.25)) and Phillips and Yu (unpublished paper, Eq. (66)). Let ε_{N,i} = Δ_{N,i}^{−1/2}(W_{τ_{N,i}} − W_{τ_{N,i−1}}), so that ε_{N,i} ~ iid N(0, 1), and note that ŷ_{N,i} = N^{−1/2}σ(τ_{N,i−1})D^{1/2}_{N,i}ε_{N,i}. We use ŷ_{N,i} as our estimate of y_{N,i} throughout, later showing it makes no impact on the result.

Note that y_{N,i} − ŷ_{N,i} = ∫_{τ_{N,i−1}}^{τ_{N,i}} {σ(u) − σ(τ_{N,i−1})}dW(u), so with d[σ]_t = λ²_t dt we find

E[ ( ∫_{τ_{N,i−1}}^{τ_{N,i}} {σ(u) − σ(τ_{N,i−1})}dW(u) )² | F_{τ_{N,i−1}} ] = ∫_{τ_{N,i−1}}^{τ_{N,i}} λ²_{τ_{N,i−1}}(u − τ_{N,i−1})du {1 + o_p(1)} = (λ²_{τ_{N,i−1}} D²_{N,i}/(2N²)) {1 + o_p(1)}.

Proposition A.3. Suppose K, SH, and D hold. Then as n → ∞ with H = o(n) and m³ = O(Hn),

√(n/H) { K(Y) − K̃(Y) } = o_p(1).

Proof. The difference between K(Y) and K̃(Y) is tied to the first m and last m observations, so the difference vanishes if m does not grow at too fast a rate. We have

Σ_{i=1}^{m} y²_{N,i} = Σ_{i=1}^{m} (D_{N,i}/N) σ²(τ_{N,i−1}) ε²_{N,i} {1 + o_p(N^{−1/2})} = o_p(m^{3/2}/N),

since max_{i=1,...,m} D_{N,i} = o_p(m^{1/2}), σ²(t) is bounded, and Σ_{i=1}^{m} ε²_{N,i} = O_p(m). So we need √(n/H)(m^{3/2}/N) = O(1), which is implied by m³/(HN) ≤ m³/(Hn) = O(1). □

Proposition A.4. Suppose SH and D hold. Then, so long as H = o(N),

√(N/H) { K̃(Y) − K̂(Y) } = o_p(1).

Proof. From, for example, Phillips and Yu (unpublished paper) it is known that Σ_{i=1}^{N} (η^{(1)}_{N,i} − η̂^{(1)}_{N,i}) = o_p(N^{−1/2}). The only thing left to do is to prove that Σ_{i=1}^{N} (η^{(2)}_{N,i} − η̂^{(2)}_{N,i}) = o_p(√(H/N)). First note that

½ Σ_{i=1}^{N} (η^{(2)}_{N,i} − η̂^{(2)}_{N,i}) = Σ_{i=1}^{N} ( y_{N,i} Σ_{h>0} k_h y_{N,i−h} − ŷ_{N,i} Σ_{h>0} k_h ŷ_{N,i−h} )
 = Σ_{i=1}^{N} (y_{N,i} − ŷ_{N,i}) Σ_{h>0} k_h y_{N,i−h} + Σ_{i=1}^{N} ŷ_{N,i} Σ_{h>0} k_h (y_{N,i−h} − ŷ_{N,i−h}).  (A.6)

The first term of (A.6) is a sum of martingale difference sequences. Its conditional variance is

Σ_{i=1}^{N} (λ²_{τ_{N,i−1}} D²_{N,i}/(2N²)) ( Σ_{h>0} k_h y_{N,i−h} )²
 ≤ max_{i=1,...,N} { λ²_{τ_{N,i−1}} D²_{N,i}/2 } N^{−2} Σ_{i=1}^{N} ( Σ_{h>0} k_h y_{N,i−h} )² = N^{−2} o_p(N) · N O_p(H/N) = o_p(H/N),

where we have used that Σ_{h>0} k_h y_{N,i−h} = O_p(√(H/N)). The second term is

Σ_{i=1}^{N} ŷ_{N,i} Σ_{h>0} k_h (y_{N,i−h} − ŷ_{N,i−h}) = Σ_{i=1}^{N} ŷ_{N,i} Σ_{h>0} k_h ∫_{τ_{N,i−h−1}}^{τ_{N,i−h}} {σ(u) − σ(τ_{N,i−h−1})}dW(u).

It has zero conditional mean, and its conditional variance is

N^{−1} Σ_{i=1}^{N} σ²(τ_{N,i−1}) D_{N,i} Σ_{h>0} k²_h ∫_{τ_{N,i−h−1}}^{τ_{N,i−h}} {σ(u) − σ(τ_{N,i−h−1})}² du
 = N^{−1} Σ_{i=1}^{N} σ²(τ_{N,i−1}) D_{N,i} Σ_{h>0} k²_h λ²_{τ_{N,i−h−1}} N^{−2} D²_{N,i−h}/2 {1 + o_p(1)}
 ≤ (2N³)^{−1} Σ_{i=1}^{N} σ²(τ_{N,i−1}) max_{i=1,...,N} D_{N,i} max_i λ²_{N,i} Σ_{j=0}^{⌊N/q⌋} max_{i=qj+1,...,qj+q} D²_{N,i} Σ_{h>qj} k²_h {1 + o_p(1)}
 = N^{−3} Σ_{i=1}^{N} σ²(τ_{N,i−1}) o_p(N^{1/2}) O_p(1) o_p(q) H o(log(N/q)) = o_p(H/N),

taking q = N^{1/2}/log(N). Here we have used that max_{i∈{c+1,...,c+m}} D_{N,i} = o_p(m^{1/2}); that H^{−1} Σ_{h>qj} k(h/H)² is at most of order o(j^{−1}), since H^{−1} Σ_{h=1}^{∞} k(h/H)² ≃ Σ_{j=0}^{⌊N/q⌋} H^{−1} Σ_{h=q(j−1)+1}^{qj} k(h/H)² is convergent by Riemann integration; and that Σ_{j=1}^{m} j^{−1} = O(log m). □

Proposition A.5. Suppose SH and D hold and H = o(N). Then as N → ∞,

√(N/H) { K̂(Y) − ∫_0^1 σ²(u)du } →Ls MN( 0, 4k•^{0,0} ∫_0^1 σ⁴(u) (ν₂(u)/ν₁(u)) du ).

Proof. We have K̂(Y) = Σ_{i=1}^{N} σ²(τ_{N,i−1})Δ_{N,i} + Σ_{i=1}^{N} (η̂^{(1)}_{N,i} + η̂^{(2)}_{N,i}). Phillips and Yu (unpublished paper) imply that √(N/H){ Σ_{i=1}^{N} σ²(τ_{N,i−1})Δ_{N,i} − ∫_0^1 σ²(u)du } = o_p(1/√H). This means that

√(N/H){ K̂(Y) − ∫_0^1 σ²(u)du } = √(N/H) Σ_{i=1}^{N} (η̂^{(1)}_{N,i} + η̂^{(2)}_{N,i}) + o_p(1),

which is the sum of the martingale differences {η̂^{(1)}_{N,i} + η̂^{(2)}_{N,i}, F_{τ_{N,i}}}. So we just need to compute its contribution to the conditional variance.

The first term, η̂^{(1)}_{N,i}, is the sampling error of the well-known realised variance. In the present context it was studied in Phillips and Yu (unpublished paper), and it follows that √(N/H) Σ_{i=1}^{N} η̂^{(1)}_{N,i} = O_p(H^{−1}), so unless H = O(1) this term is asymptotically irrelevant for the realised kernel. Next,

(η̂^{(2)}_{N,i})² = 4ŷ²_{N,i} ( Σ_{h>0} k_h ŷ_{N,i−h} )², where (N/H) E{ ( Σ_{h>0} k_h ŷ_{N,i−h} )² | F_{τ_{N,i−1}} } →p k•^{0,0} σ²(τ_{N,i}).

Since Nŷ²_{N,i} = D_{N,i} σ²(τ_{N,i−1}) ε²_{N,i}, we follow Phillips and Yu (unpublished paper) in replacing D²_{N,i} by {E(D²_{N,i}|F_{τ_{N,i−1}})/E(D_{N,i}|F_{τ_{N,i−1}})} D_{N,i}, at the cost of a term proportional to D_{N,i} − E(D_{N,i}|F_{τ_{N,i−1}}), which is a temporal average of a martingale difference sequence and hence o_p(1). This means that

(N/H) Σ_{i=1}^{N} Var(η̂^{(2)}_{N,i} | F_{τ_{N,i−1}}) = k•^{0,0} Σ_{i=1}^{N} σ⁴(τ_{N,i−1}) {E(D²_{N,i}|F_{τ_{N,i−1}}) / E(D_{N,i}|F_{τ_{N,i−1}})} Δ_{N,i} + o_p(1)
 = k•^{0,0} ∫_0^1 σ⁴(u) (ν₂(u)/ν₁(u)) du + o_p(1).

The result then follows by the martingale array CLT. □

Proof of Theorem A.5. Follows by combining the results of Propositions A.3–A.5. □

A.3. Multivariate results

Proof of Lemma 1. The results (2) and (4) follow by combining Theorem 1 with Proposition A.4 and Theorem A.5. From the proof of Theorem 1 we have K(X) = K(Y) + K(U) + O_p(√K(U)), and (3) follows since K(Y) →p [Y] and K(U) →p −k''(0)Ω when H = c₀n^{1/2}. □

Proof of Theorem 3. We analyse the joint characteristic function of the realised kernel matrix,

E exp[ i tr{A K(X)} ] = E exp[ i Σ_{j=1}^{d} λ_j tr{K(X) a_j a_j'} ],

where A = Σ_{j=1}^{d} λ_j a_j a_j' is a symmetric matrix of constants.⁸ Hence it is sufficient for us to study the joint law of a_j'K(X)a_j for any fixed a_j, j = 1, ..., d. This is a convenient form, as a_j'K(X)a_j = K(a_j'X), the univariate kernel applied to the process a_j'X. This is very convenient, as a_j'X is simply a univariate process in our class.

The univariate results imply that the only thing left to study is the joint distribution of K(a_j'Y). Now, under the conditions of the theorem, with n, H → ∞ and H ∝ n^η for η ∈ (0, 1) and m/n → 0, we will establish that

√(n/H) { K(Y) − ∫_0^t Σ(u)du } →Ls MN( 0, 4k•^{0,0} ∫_0^1 Ψ(u) (ν₂(u)/ν₁(u)) du ),  (A.7)

where Ψ(u) = Σ(u) ⊗ Σ(u). This will then complete the theorem. The univariate proof implies that we can replace K(a_j'Y) by K̂(a_j'Y) = Σ_{i=1}^{n} x²_{n,j,i} + 2 Σ_{i=1}^{n} x_{n,j,i} Σ_{h=1}^{n−1} k_h x_{n,j,i−h}, where x_{n,j,i} = a_j' σ(τ_{n,i−1}) Δ^{1/2}_{n,i} ε_{n,i} and ε_{n,i} = Δ^{−1/2}_{n,i}(W_{τ_{n,i}} − W_{τ_{n,i−1}}). But this raises no new principles, and so, using the same method as before,

√(n/H) ( K(a_j'Y) − a_j'{∫_0^t Σ(u)du}a_j , K(a_k'Y) − a_k'{∫_0^t Σ(u)du}a_k )'
 →Ls MN( 0, 4k•^{0,0} ∫_0^1 ( {a_j'Σ(u)a_j}²   {a_j'Σ(u)a_k}² ; {a_j'Σ(u)a_k}²   {a_k'Σ(u)a_k}² ) (ν₂(u)/ν₁(u)) du ).

Unwrapping the results delivers (A.7), as required. □

Proof of Theorem 2. Follows as a corollary to Theorem 3. □

⁸ It is well known that the distribution is characterised by the matrix characteristic function E[exp{i tr(A'K(X))}]. Without loss of generality we can assume A is symmetric, as K(X) is symmetric and tr(A'K(X)) = Σ_{i,j} a_{ij} K(X)_{ji} = Σ_i a_{ii} K(X)_{ii} + Σ_{i<j} (a_{ij} + a_{ji}) K(X)_{ij}, so only the symmetric part of A matters.

A.4. Optimal choice of bandwidth Barndorff-Nielsen, O.E., Shephard, N., 2004. Econometric analysis of realised covariation: high frequency covariance, regression and correlation in financial The problem is simply to minimise the squared bias plus the economics. Econometrica 72, 885–925. Barndorff-Nielsen, O.E., Shephard, N., 2005. Power variation and time change. contribution from the asymptotic variance with respect to c0. Set Theory of Probability and its Applications 50, 1–15.  1 4 −4 ′′ 2 Bollerslev, T., Tauchen, G., Zhou, H., 2009. Expected stock returns and variance risk IQ = σ (u)du. The first order conditions of minc {−c k (0) 0 0 0 premia. Review of Financial Studies 22, 4463–4492. 4 0,0 ω + c04k• IQ} yield the optimal value for c0 Brownless, C.T., Gallo, G.M., 2006. Financial econometric analysis at ultra-high frequency: data handling concerns. Computational Statistics & Data Analysis 51, ′′ 1/5  2 4  2232–2245. ∗ k (0) ω ∗ 4/5 ∗ ′′ 2 0,0 1/5 c = = c with c = {k 0 k } Campbell, J.Y., Lo, A.W., MacKinlay, A.C., 1997. The Econometrics of Financial 0 0,0 ξ , ( ) / • . k• IQ Markets. Princeton University Press. ∗ ∗ 4/5 3/5 Christensen, K., Kinnebrock, S., Podolskij, M., 2009. Pre-averaging estimators of With H = c ξ n the asymptotic bias is given by the ex-post covariance matrix. Research paper 2009-45, CREATES, Aarhus − University.  ′′ 2 4  2/5 k (0) ω ′′ − de Pooter, M., Martens, M., van Dijk, D., 2008. Predicting the daily covariance − k 0 2n 1/5 0,0 ( )ω matrix for s&p 100 stocks using intraday data — but which frequency to use? k• IQ Econometric Reviews 27, 199–229. ′′ − = |k (0)ω2|1/5{k0,0IQ}2/5n 1/5, Doornik, J.A., 2006. Ox: An Object-Orientated Matrix Programming Langguage, fifth • ed. Timberlake Consultants Ltd., London. and the asymptotic variance is Dovonon, P., Goncalves, S., Meddahi, N., 2011. Bootstrapping realized multivariate volatility measures. Journal of Econometrics (forthcoming).  ′′ 2 4 1/5 Drechsler, I., Yaron, A., 2011. 
What’s vol got to do with it. Journal of Econometrics k (0) ω − ′′ − 4k0,0IQn 2/5 = 4|k 0 2|2/5{k0,0IQ}4/5n 2/5 24, 1–45. 0,0 • ( )ω • .  k• IQ Embrechts, P., Klüppelberg, C., Mikosch, T., 1997. Modelling Extremal Events for Insurance and Finance. Springer, Berlin. Engle, R.F., Gallo, J.P., 2006. A multiple indicator model for volatility using intra daily Appendix B. Errors induced by stale prices data. Journal of Econometrics 131, 3–27. Engle, R.F., Kroner, K.F., 1995. Multivariate simultaneous generalized ARCH. Econometric Theory 11, 122–150. The stale prices induce a particular form of noise with an Engle, R.F., Mezrich, J., 1996. GARCH for groups. Risk 36–40. Epps, T.W., 1979. Comovements in stock prices in the very short run. Journal of the endogenous component. The price indexed by time τj is, in fact, American Statistical Association 74, 291–296. (i) ≤ = the price recorded at time tj τj, for i 1,..., d. With Refresh Falkenberry, T.N., 2001. High frequency data filtering, Technical Report, Tick Data. ≥ (i) Fisher, L., 1966. Some new stock-market indexes. Journal of Business 39, 191–225. Time we have τj tj > τj−1 so that Fleming, J., Kirby, C., Ostdiek, B., 2003. The economic values of volatility timing using ‘realized’ volatility. Journal of Financial Economics 67, 473–509. (i) = (i) (i) + ˜ (i) (i) X (τj) Y (tj ) U (tj ) Gallant, A.R., 1987. Nonlinear Statistical Models. John Wiley, New York. Ghysels, E., Harvey, A.C., Renault, E., 1996. Stochastic volatility. In: Rao, C.R., = (i) + ˜ (i) (i) − { (i) − (i) } Maddala, G.S. (Eds.), Statistical Methods in Finance. North-Holland, Amsterdam, Y (τj) U (tj ) Y (τj) Y (tj ) . pp. 119–191.    Glasserman, P., 2004. Monte Carlo Methods in Financial Engineering. Springer- U(i)(τ ) j Verlag, New York, Inc. The endogenous component that is induced by refresh time is Goncalves, S., Meddahi, N., 2009. Bootstrapping realized volatility. Econometrica 77, 283–306. (i) − (i) Y (τj) Y (tj ). 
But this is exactly the sort of dependence that Griffin, J.E., Oomen, R.C.A., 2011. Covariance measurement in the presence of non- ˜ synchronous trading and market microstructure noise. Journal of Econometrics Assumption U can accommodate through correlation between W 160, 58–68. and W , and the (random) coefficients ψh(t), h = 0, 1,.... Guillaume, D.M., Dacorogna, M.M., Dave, R.R., Müller, U.A., Olsen, R.B., Pictet, O.V., 1997. From the bird’s eye view to the microscope: a survey of new stylized facts of the intra-daily foreign exchange markets. Finance and Stochastics 2, 95–130. References Hansen, P.R., Horel, G., 2009. Quadratic Variation by Markov Chains, working paper. http://www.stanford.edu/people/peter.hansen. Andersen, T.G., Bollerslev, T., Diebold, F.X., 2010. Parametric and nonparametric Hansen, P.R., Large, J., Lunde, A., 2008. Moving average-based estimators of measurement of volatility. In: Aït-Sahalia, Y., Hansen, L.P. (Eds.), Handbook of integrated variance. Econometric Reviews 27, 79–111. Financial Econometrics. North Holland, Amsterdam, pp. 67–138. Hansen, P.R., Lunde, A., 2005. A realized variance for the whole day based on Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2000. Great realizations. Risk intermittent high-frequency data. Journal of Financial Econometrics 3, 525–554. 13, 105–108. Hansen, P.R., Lunde, A., 2006. Realized variance and market microstructure noise Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2001. The distribution of (with discussion). Journal of Business and Economic Statistics 24, 127–218. exchange rate volatility. Journal of the American Statistical Association 96, Harris, F., McInish, T., Shoesmith, G., Wood, R., 1995. Cointegration, error correction 42–55. Correction published in 2003 volume 98, page 501. and price discovery on informationally-linked security markets. Journal of Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2003. 
O.E. Barndorff-Nielsen et al. / Journal of Econometrics 162 (2011) 149–169

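As a numerical check on the optimal-bandwidth variance expression quoted at the start of this excerpt, the two sides of the identity $4\,c^\star\,k^{0,0}_\bullet\,\mathrm{IQ}\,n^{-2/5} = 4\,|k''_\bullet(0)\,\omega^2|^{2/5}\{k^{0,0}_\bullet\,\mathrm{IQ}\}^{4/5}\,n^{-2/5}$ can be evaluated directly. This is a sketch: the Parzen-kernel constants $k^{0,0}_\bullet \approx 0.269$ and $k''_\bullet(0) = -12$, and the values of $\omega^2$, IQ and $n$, are illustrative assumptions.

```python
def optimal_c(kpp0, k00, omega2, iq):
    # c* = { k''(0)^2 * omega^4 / (k^{0,0} * IQ) }^{1/5}
    return (kpp0 ** 2 * omega2 ** 2 / (k00 * iq)) ** 0.2

def minimised_avar(kpp0, k00, omega2, iq, n):
    # 4 |k''(0) omega^2|^{2/5} {k^{0,0} IQ}^{4/5} n^{-2/5}
    return 4.0 * (abs(kpp0) * omega2) ** 0.4 * (k00 * iq) ** 0.8 * n ** -0.4

# Illustrative inputs: assumed Parzen constants, noise variance 0.01, IQ = 1.
kpp0, k00, omega2, iq, n = -12.0, 0.269, 0.01, 1.0, 23400
c = optimal_c(kpp0, k00, omega2, iq)
lhs = 4.0 * c * k00 * iq * n ** -0.4   # plug c* into 4 c k00 IQ n^{-2/5}
assert abs(lhs - minimised_avar(kpp0, k00, omega2, iq, n)) < 1e-12
```

The assertion confirms that substituting $c^\star$ into $4\,c\,k^{0,0}_\bullet\,\mathrm{IQ}\,n^{-2/5}$ reproduces the closed-form minimised variance term by term.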