
Principal Component Analysis of High-Frequency Data


JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2019, VOL. 114, NO. 525, 287–303 (Theory and Methods)


Yacine Aït-Sahalia (Department of Economics, Princeton University and NBER, Princeton, NJ) and Dacheng Xiu (Booth School of Business, University of Chicago, Chicago, IL)

ABSTRACT: We develop the necessary methodology to conduct principal component analysis at high frequency. We construct estimators of realized eigenvalues, eigenvectors, and principal components, and provide the asymptotic distribution of these estimators. Empirically, we study the high-frequency covariance structure of the constituents of the S&P 100 Index using as little as one week of high-frequency data at a time, and examine whether it is compatible with the evidence accumulated over decades of lower frequency returns. We find a surprising consistency between the low- and high-frequency structures. During the recent financial crisis, the first principal component becomes increasingly dominant, explaining up to 60% of the variation on its own, while the second principal component drives the common variation of financial sector stocks. Supplementary materials for this article are available online.

ARTICLE HISTORY: Received April; revised August.

KEYWORDS: Eigenvalue; Eigenvector; High frequency; Itô semimartingale; Principal components; Spectral functions.

1. Introduction

Principal component analysis (PCA) provides information about any latent common structure that might exist in a dataset. As a result, it is one of the most popular techniques in multivariate statistics, especially when analyzing large datasets (see, e.g., Hastie, Tibshirani, and Friedman 2009). The central idea of PCA is to identify a small number of common or principal components which effectively summarize a large part of the variation of the data, and serve to reduce the dimensionality of the problem and achieve parsimony.

Classical PCA originated in Pearson (1901) and Hotelling (1933) and is widely used in macroeconomics and finance, often as a parsimonious summary of the data, or as a precursor or in addition to factor analysis. Among applications, Litterman and Scheinkman (1991) employed PCA to document a three-factor structure of the term structure of bond yields, labeling the first three components as the level, the slope, and the curvature factors. Egloff, Leippold, and Wu (2010) suggested a two-factor model for volatility based on the PCA of variance swaps. Baker and Wurgler (2006) created a sentiment measure, to reflect the optimistic or pessimistic view of investors, using the first principal component of a number of sentiment proxies. Baker, Bloom, and Davis (2016) created a policy uncertainty index using principal components of newspaper coverage of policy-related economic uncertainty, reports by the Congressional Budget Office, and the Survey of Professional Forecasters.

The estimation of the eigenvalues of the sample covariance matrix is the key step toward PCA. Classical low-frequency asymptotic results, starting with Anderson (1958) and Anderson (1963), show that eigenvalues of the sample covariance matrix are consistent and asymptotically normal estimators of the population eigenvalues, at least when the data follow a multivariate normal distribution. Even under normality, the asymptotic distribution becomes rather involved when repeated eigenvalues are present. Waternaux (1976) showed that a similar central limit result holds for simple eigenvalues as long as the distribution of the data has finite fourth moment, while Tyler (1981) and Eaton and Tyler (1991) obtained the asymptotic distribution of eigenvectors and eigenvalues under more general assumptions; see also the discussions in the books by Jolliffe (2002) and Jackson (2003).

The classical PCA approach to statistical inference of eigenvalues suffers from three main drawbacks. First is the curse of dimensionality. For instance, it is well known that even the largest eigenvalue is no longer consistently estimated when the cross-sectional dimension grows at the same rate as the sample size along the time domain. Second, the asymptotic theory is essentially dependent on frequency-domain analysis under stationarity (and often additional) assumptions, see, for example, Brillinger (2001). Third, the principal components are linear combinations of the data, which fail to capture potentially nonlinear patterns therein.

These three drawbacks create difficulties when PCA is employed on asset returns data. A typical portfolio may consist of dozens of stocks. For instance, a portfolio with 30 stocks has 465 parameters in its covariance matrix, if no additional structure is imposed, while one with 100 stocks contains 5050 parameters. As a result, years of time series data are required for estimation, raising issues of survivorship bias, potential nonstationarity, and parameter constancy.

Moreover, asset returns are known to exhibit time-varying volatility and heavy tails, leading to deviations from the assumptions required by the classical PCA asymptotic theory. In addition, prices of derivatives, for example, options, are often nonlinear functions of the underlying asset's price and volatility. Ruling out nonlinear combinations disconnects their principal components from the underlying factors that drive derivative returns.

These issues motivate the development in this article of the tools necessary to conduct PCA for continuous-time stochastic processes using high-frequency data and high-frequency or "in-fill" asymptotics, whereby the number of observations grows within a fixed time window. This approach not only adds the important method of PCA to the high-frequency toolkit, but it also addresses the three drawbacks mentioned above. First, the large amount of intraday time series data vastly improves the time series versus cross-sectional dimensionality trade-off. And these are not redundant observations for the problem at hand. Unlike for expected returns, it is well established theoretically that increasing the sampling frequency improves variance and covariance estimation, at least until market microstructure concerns start biting. Yet, market microstructure noise is not a serious concern at the 1-minute sampling frequency we will employ empirically, for liquid stocks which typically trade on infra-second time scales. As a result of the large increase in the time series dimension, it is plausible to expect high-frequency asymptotic analysis with the cross-sectional dimension fixed to serve as an accurate approximation. In fact, we will show both in simulations and empirically that PCA works quite well at high frequency for a portfolio of 100 stocks using as little as one week of 1-minute returns. The use of a fixed-dimension asymptotic analysis makes our inference feasible. Second, the high-frequency asymptotic framework enables nonparametric analysis of general stochastic processes, namely, Itô semimartingales, thereby allowing strong dependence, nonstationarity, and heteroscedasticity due to stochastic volatility and jumps, and freeing the analysis from the strong parametric assumptions that are required in a low-frequency setting. In effect, the existence of population moments becomes irrelevant and PCA becomes applicable at high frequency for general Itô semimartingales. Third, our principal components are built upon instantaneous (or locally) linear combinations of the stochastic processes, thereby capturing general nonlinear relationships by virtue of Itô's lemma.

Within a continuous-time framework, we first develop the concept of "realized" or high-frequency PCA for data sampled from a stochastic process within a fixed time window. Realized PCA depends on realizations of eigenvalues, which are stochastic processes evolving over time. Our implementation of realized PCA is designed to extend classical PCA from its low-frequency setting into the high-frequency one as seamlessly as possible. Therefore, we start by estimating realized eigenvalues from the realized covariance matrix. One technical challenge we must overcome is the fact that when an eigenvalue is simple, it is a smooth function of the instantaneous covariance matrix; when it comes to a repeated eigenvalue, however, this function is not differentiable, which hinders statistical inference, as the asymptotic theory requires at least second-order differentiability. To tackle the issue of nonsmoothness of repeated eigenvalues, we propose an estimator constructed by averaging all repeated eigenvalues. Using the theory of spectral functions, we show how to obtain differentiability of the proposed estimators, and the required derivatives. We therefore construct an estimator of realized spectral functions, and develop the high-frequency asymptotic theory for this estimator. Estimation of eigenvalues, principal components, and eigenvectors then arises as special cases of this estimator, by selecting a specific spectral function. Aggregating local estimates results in an asymptotic bias, due to the nonlinearity of the spectral functions. The bias is fortunately of higher order, which enables us to construct a bias-corrected estimator, and show that the latter is consistent and achieves stable convergence in law to a mixed normal distribution.

As far as asymptotic theory is concerned, our estimators are constructed by aggregating functions of instantaneous covariance estimates. Other examples of this general type arise in the high-frequency context. Jacod and Rosenbaum (2013) analyzed such estimators with the objective of estimating the integrated quarticity. Li, Todorov, and Tauchen (2016) estimated volatility functional dependencies, while Kalnina and Xiu (2017) estimated the leverage effect measured by the integrated correlation with an additional volatility instrument. Li and Xiu (2016) employed the generalized method of moments to investigate structural economic models which include the instantaneous volatility as an explanatory variable. An alternative inference theory in Mykland and Zhang (2009) is designed for a class of estimators based on an aggregation of local estimates using a finite number of blocks.

Also related to the problem we study is the rank inference developed by Jacod, Lejay, and Talay (2008) and Jacod and Podolskij (2013), where the cross-sectional dimension is fixed and the processes follow continuous Itô semimartingales. Allowing for an increasing dimension yet more parametric structure, Zheng and Li (2011) established a Marcenko–Pastur type theorem for analyzing the spectral distribution of the integrated covariance matrix for a special class of diffusion processes. In a similar setting, Heinrich and Podolskij (2014) studied the spectral distribution of empirical covariation matrices of Brownian integrals.
A limitation of our setup is the fixed dimensionality required for the present asymptotic inference. In the low-frequency setting, when the cross-sectional dimension of the data grows at the same rate as the sample size, eigenvalues of the sample covariance matrix are no longer consistent with respect to their population counterparts (see, e.g., Bai 1999; Johnstone 2001). These analyses are related to a large and evolving literature on random matrix theory, with numerous results on the distribution of the extreme eigenvalues of random matrices, the limiting spectral distribution of large random matrices, and the so-called universality phenomenon; see Bai and Silverstein (2006) and Tao (2012) for detailed reviews and discussions. Another related strand of the literature focuses on spiked eigenvalue models, where the largest few eigenvalues differ from the rest in population, yet are still bounded. In this scenario, Johnstone and Lu (2009) and Paul (2007) showed that the estimated eigenvector corresponding to the largest eigenvalue is inconsistent unless the sample size grows at a rate faster than the one at which the cross-sectional dimension increases. Recently, Wang and Fan (2017) extended this setting to allow for diverging eigenvalue spikes, and characterized the limiting distribution of the extreme eigenvalues and certain entries of the eigenvectors under an ultra-high-dimensional regime where the dimension can grow even faster than the sample size. An alternative avenue to resolving the inconsistency of PCA "sparsifies" the PCA directly, that is, imposing sparsity on the eigenvectors (see, e.g., Jolliffe, Trendafilov, and Uddin 2003; Zou, Hastie, and Tibshirani 2006; d'Aspremont et al. 2007; Amini and Wainwright 2009, and Johnstone and Lu 2009). The aforementioned papers all rely on the iid assumption, and some require additional structures (spiky eigenvalues or sparse eigenvectors), so that their analyses are not applicable to the general nonparametric setting we analyze. Despite our conjecture that similar results shall hold in our high-frequency setting when the dimension grows, extensions to this case are beyond the scope of this paper, which we leave for future work.

While the PCA developments in this article are fully nonparametric, without any assumptions on the existence of a factor structure, techniques related to those developed here can be employed to estimate parametric or semiparametric continuous-time models with a factor structure allowing for an increasing dimensionality: see Fan, Furger, and Xiu (2016) and Aït-Sahalia and Xiu (2017) for using factor analysis for covariance estimation and Pelger (2015a) and Pelger (2015b) for estimating factors and their loadings. Factor analysis is a related yet distinct problem from PCA, just as it is in a low-frequency setting. That being said, PCA is often employed to analyze (approximate) factor structures (Ross 1976; Chamberlain and Rothschild 1983), including estimating the number of approximate factors (see Connor and Korajczyk 1993; Bai and Ng 2002; Amengual and Watson 2007; Kapetanios 2010; Onatski 2010, and Trapani 2017), estimating factors and their loadings (see Bai 2003), and forecasting (see Stock and Watson 1999; Stock and Watson 2002). The above factor models are static, as opposed to the dynamic factor models discussed in Forni et al. (2000, 2004), Forni and Lippi (2001), and their recent extensions with infinite-dimensional factor space discussed in Forni et al. (2015, 2017), for which frequency-domain PCA is applied. Recently, Wu and Zaffaroni (2018) provided uniform convergence results of lag-window spectral density estimates for nonlinear, non-Gaussian, and nonstrong mixing processes, which paves the way for analyzing generalized dynamic factor models under weak conditions.

The article is organized as follows. Section 2 sets up the notation and the model. Section 3 provides the estimators and the main asymptotic theory. Section 4 reports the results of Monte Carlo simulations. We then implement the method to analyze by PCA the covariance structure of the S&P 100 stocks in Section 5, asking whether high-frequency cross-sectional patterns in stock returns are compatible with the well-established low-frequency evidence of a low-dimensional common factor structure (e.g., Fama and French 1993). We find that, excluding jump variation, three Brownian factors explain between 50 and 60% of the continuous variation of the stock returns, that their explanatory power varies over time, and that during the recent financial crisis the first principal component becomes increasingly dominant, explaining up to over 60% of the variation on its own, capturing market-wide, systemic, risk. Despite the differences in methods, time periods, and length of observation, these empirical findings at high frequency are surprisingly consistent with the well-established low-frequency Fama–French results of three (or up to five) common factors.

The empirical finding that the low-dimensional PC structure that is known to hold at low frequency using decades of daily or lower frequency returns data also holds at high frequency using a week's worth of data is not obvious, and could not have been obtained without the methodology developed in this article. More often than not, time aggregation results in patterns that are quite different in data sampled at vastly different frequencies. Of course, the inherent design limitation of PCA is that it does not lend itself naturally to an identification of what the principal components are in terms of economic factors. Nevertheless, we find evidence that the first principal component shares time series characteristics with the overall market return, and, using biplots, we also find that the second principal component drives the common variation of financial sector stocks during the weeks most associated with the recent financial crisis, but not in other weeks. Section 6 concludes. Proofs are in the appendix.

2. The Setup

2.1. Standard Notations and Definitions

In what follows, R^d denotes the d-dimensional Euclidean space, and its subset R_+^d contains the nonnegative real vectors. Let e^k be the unit vector in R_+^d with the kth entry equal to 1, 1 = Σ_{k=1}^d e^k, and let I be the identity matrix. M_d denotes the Euclidean space of all d × d real-valued symmetric matrices. M_d^+ is the subset of M_d which includes all positive-semidefinite matrices. We use M_d^{++} to denote the set of all positive-definite matrices. The Euclidean space M_d is equipped with the inner product ⟨A, B⟩ = Tr(AB). We use ‖·‖ to denote the Euclidean norm for vectors or matrices, and the superscript "+" to denote the Moore–Penrose inverse of a real matrix (see, e.g., Magnus and Neudecker 1999).

All vectors are column vectors. The transpose of any matrix A is denoted by A'. The ith row of A is written as A_{i,·} and the jth column of A is A_{·,j}. A_{ij} denotes the (i, j)th entry of A. The operator diag: M_d → R^d is defined as diag(A) = (A_11, A_22, ..., A_dd)'. In addition, we define its inverse operator diag, which maps a vector x to a diagonal matrix.

Let f be a function from R_+^d to R. The gradient of f is written as ∂f, and its Hessian matrix is denoted as ∂²f. The derivative of a matrix function F: M_d^+ → R is denoted by ∂F ∈ M_d, with each element written as ∂_{ij}F, for 1 ≤ i, j ≤ d. Note that the derivative is defined in the usual sense, so that for any A ∈ M_d^+, ∂_{ij}A = J^{ij}, where J^{ij} is a single-entry matrix with the (i, j)th entry equal to 1. The Hessian matrix of F is written as ∂²F, with each entry referred to as ∂²_{jk,lm}F, for 1 ≤ j, k, l, m ≤ d. We use ∂^k to denote kth-order derivatives and δ_{i,j} to denote Kronecker's delta function, giving 1 if i = j and 0 otherwise. A function f is Lipschitz if there exists a constant K such that |f(x + h) − f(x)| ≤ K‖h‖. In the proofs, K is a generic constant which may vary from line to line. A function with k continuous derivatives is called a C^k function.

Data are sampled discretely every Δ_n units of time. All limits are taken as Δ_n → 0. "⟹ u.c.p." denotes uniform convergence on compacts in probability, and "→p" denotes convergence in probability. We use "→L-s" to denote stable convergence in law.
We write a_n ≍ b_n if, for some c ≥ 1, b_n/c ≤ a_n ≤ c b_n for all n. Finally, [·,·] denotes the quadratic covariation between Itô semimartingales, and [·,·]^c its continuous part.

2.2. Eigenvalues and Eigenvectors

We now collect some preliminary results about eigenvalues and eigenvectors. For any vector x ∈ R_+^d, x̄ denotes the vector with the same entries as x, ordered in nonincreasing order. We use R̄_+^d to denote the subset of R_+^d containing vectors x satisfying x_1 ≥ x_2 ≥ ··· ≥ x_d ≥ 0.

By convention, for any x ∈ R̄_+^d, we can write

x_1 = \cdots = x_{g_1} > x_{g_1+1} = \cdots = x_{g_2} > \cdots > x_{g_{r-1}+1} = \cdots = x_{g_r} \ge 0,    (1)

where g_r = d and r is the number of distinct elements. {g_1, g_2, ..., g_r} depends on x. We then define a corresponding partition of indices as I_j = {g_{j−1} + 1, g_{j−1} + 2, ..., g_j} for j = 1, 2, ..., r.

For any A ∈ M_d^+, λ(A) = (λ_1(A), λ_2(A), ..., λ_d(A))' is the vector of its eigenvalues in nonincreasing order. This notation also permits us to consider λ as a mapping from M_d^+ to R̄_+^d. An important result establishing the continuity of λ is (see, e.g., Tao 2012):

Lemma 1. λ: M_d^+ → R̄_+^d is Lipschitz.

Associated with any eigenvalue λ_g of A ∈ M_d^+, we denote its eigenvector as γ_g, which satisfies Aγ_g = λ_gγ_g and γ_g'γ_g = 1. The eigenvector, apart from its sign, is uniquely defined when λ_g is simple. In such a case, without loss of generality, we require the first nonzero element of the eigenvector to be positive. In the presence of repeated eigenvalues, the eigenvector is determined up to an orthogonal transformation. In any case, we can choose eigenvectors such that for any g ≠ h, γ_g'γ_h = 0 (see, e.g., Anderson 1958).

When λ_g is a simple root, we regard γ_g as another vector-valued function of A. It turns out that in this case both λ_g(·) and γ_g(·) are infinitely smooth at A (see, e.g., Magnus and Neudecker 1999).

Lemma 2. Suppose λ_g is a simple root of A ∈ M_d^+. Then λ_g: M_d^+ → R̄_+ and γ_g: M_d^+ → R^d are C^∞ at A. Moreover, we have

\partial_{jk}\lambda_g(A) = \gamma_{gj}(A)\gamma_{gk}(A), \quad \partial_{jk}\gamma_g(A) = (\lambda_g I - A)^+_{\cdot,j}\,\gamma_{gk}(A),

where γ_{gk} is the kth entry of γ_g. If in addition all the eigenvalues of A are simple, with (γ_1, γ_2, ..., γ_d) being the corresponding eigenvectors, then

\partial^2_{jk,lm}\gamma_{gh} = -\sum_{p\neq g}\frac{1}{(\lambda_g-\lambda_p)^2}\left(\gamma_{gl}\gamma_{gm}\gamma_{ph}\gamma_{pj}\gamma_{gk} - \gamma_{pl}\gamma_{pm}\gamma_{ph}\gamma_{pj}\gamma_{gk}\right) + \sum_{p\neq g}\sum_{q\neq p}\frac{1}{(\lambda_g-\lambda_p)(\lambda_p-\lambda_q)}\gamma_{ql}\gamma_{pm}\gamma_{qh}\gamma_{pj}\gamma_{gk} + \sum_{p\neq g}\sum_{q\neq p}\frac{1}{(\lambda_g-\lambda_p)(\lambda_p-\lambda_q)}\gamma_{ql}\gamma_{pm}\gamma_{qj}\gamma_{ph}\gamma_{gk} + \sum_{p\neq g}\sum_{q\neq g}\frac{1}{(\lambda_g-\lambda_p)(\lambda_g-\lambda_q)}\gamma_{qk}\gamma_{ql}\gamma_{gm}\gamma_{ph}\gamma_{pj}.

In general, while the eigenvalue is always a continuous function, the eigenvector associated with a repeated root is not necessarily continuous. In what follows, we only consider estimation of an eigenvector when it is associated with a simple eigenvalue.

2.3. Dynamics of the Variable

The process we analyze is a general d-dimensional Itô semimartingale, defined on a filtered probability space (Ω, F, (F_t)_{t≥0}, P), with the following Grigelionis representation:

X_t = X_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s + (\delta 1_{\{\|\delta\|\le 1\}}) * (\mu - \nu)_t + (\delta 1_{\{\|\delta\| > 1\}}) * \mu_t,    (2)

where W is a d-dimensional Brownian motion, μ is a Poisson random measure on R_+ × R^d with compensator ν(dt, dx) = dt ⊗ ν̄(dx), and ν̄ is a σ-finite measure. This model allows for price and volatility jumps and imposes no restriction on the dependence among the various components of the processes, hence it accommodates the leverage effect (Black 1976). As shown by Delbaen and Schachermayer (1994), a semimartingale is appropriate for modeling asset returns, as it essentially rules out arbitrage opportunities. Therefore, such a model accommodates many applications in finance and is commonly used for deriving infill asymptotic results for high-frequency data. More details on high-frequency models and asymptotics can be found in the book by Aït-Sahalia and Jacod (2014).

The volatility process σ_s is càdlàg, and c_s = (σσ')_s ∈ M_d^+ for any 0 ≤ s ≤ t. We denote the d nonnegative eigenvalues of c_s by λ_{1,s} ≥ λ_{2,s} ≥ ··· ≥ λ_{d,s}, summarized in a vector λ_s. As discussed above, we sometimes write λ_s = λ(c_s), regarding λ(·) as a function of c_s. By Lemma 1, λ(·) is a continuous function, so that λ(c_s) is càdlàg.

2.4. Principal Component Analysis at High Frequency

Similar to classical PCA, the PCA procedure in this setting consists in searching repeatedly for instantaneous linear combinations of X, namely principal components, which maximize a certain measure of variation, while being orthogonal to the principal components already constructed, at any time between 0 and t. In contrast to the classical setting, where one maximizes the variance of the combination (see, e.g., Anderson 1958), the criterion here is the continuous part of the quadratic variation.

Lemma 3. Suppose that X is a d-dimensional vector-valued process described in (2). Then there exists a sequence {λ_{g,s}, γ_{g,s}}_{1≤g≤d, 0≤s≤t} such that

c_s\gamma_{g,s} = \lambda_{g,s}\gamma_{g,s}, \quad \gamma_{g,s}'\gamma_{g,s} = 1, \quad \gamma_{h,s}'c_s\gamma_{g,s} = 0 \text{ for } h \neq g,

where λ_{1,s} ≥ λ_{2,s} ≥ ··· ≥ λ_{d,s} ≥ 0. Moreover, for any càdlàg and vector-valued adapted process γ_s such that γ_s'γ_s = 1 and γ_s'c_sγ_{h,s} = 0, 1 ≤ h ≤ g − 1,

\int_0^u \lambda_{g,s}\,ds \ \ge\ \left[\int_0^{\cdot} \gamma_{s-}'\,dX_s,\ \int_0^{\cdot} \gamma_{s-}'\,dX_s\right]^c_u \quad \text{for any } 0 \le u \le t.

When λ_{g,s} is a simple root of c_s between 0 and t, then γ_{g,s} is an adapted càdlàg process, due to the continuity of γ_g(·) by Lemma 2, so that we can construct its corresponding principal component ∫_0^t γ_{g,s−}' dX_s.

Given Lemma 3, we can define our parameters of interest: the realized eigenvalue, that is, ∫_0^t λ_s ds, and the realized principal components associated with some simple root λ_g, that is, ∫_0^t γ_{g,s−}' dX_s.
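To make these conventions concrete, here is a minimal numerical sketch (ours, not part of the paper; NumPy assumed): it orders the eigenvalues of a symmetric positive-definite matrix in nonincreasing order, applies the sign convention that the first nonzero entry of each eigenvector is positive, and checks the gradient formula of Lemma 2, ∂λ_g(A) = γ_g(A)γ_g(A)', through a directional derivative.

```python
import numpy as np

def ordered_eig(A, tol=1e-12):
    """Eigenvalues of a symmetric PSD matrix in nonincreasing order,
    with each eigenvector's first nonzero entry made positive."""
    lam, G = np.linalg.eigh(A)          # ascending order; columns are eigenvectors
    lam, G = lam[::-1], G[:, ::-1]      # reorder to nonincreasing
    for g in range(G.shape[1]):
        j = np.argmax(np.abs(G[:, g]) > tol)   # first nonzero entry
        if G[j, g] < 0:
            G[:, g] = -G[:, g]          # sign convention
    return lam, G

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T                             # random positive-definite matrix
lam, G = ordered_eig(A)

# Lemma 2: for a simple eigenvalue, the directional derivative along a symmetric H
# is <gamma_g gamma_g', H> = gamma_g' H gamma_g.
H = rng.standard_normal((5, 5)); H = (H + H.T) / 2
eps, g = 1e-6, 0
num = (ordered_eig(A + eps * H)[0][g] - lam[g]) / eps
ana = G[:, g] @ H @ G[:, g]
print(num, ana)                          # the two numbers should agree closely
```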

It may also be worth investigating the average loading on the principal component, that is, the realized eigenvector, which is ∫_0^t γ_{g,s} ds. Since the definitions of these quantities all rely on integrals, we call them integrated eigenvalues, integrated principal components, and integrated eigenvectors.

The above analysis naturally leads us to consider inference for ∫_0^t λ(c_s) ds, for which the differentiability of λ(·) is critical: we start with the convergence of an estimator ĉ_s to c_s, from which we need, in a delta-method sense, to infer the convergence of the integrated eigenvalue estimator. A simple eigenvalue is C^∞-differentiable, but for repeated eigenvalues this is not necessarily the case. For this reason, in order to fully address the estimation problem, we need to introduce spectral functions.

2.5. Spectral Functions

A real-valued function F defined on a subset of M_d^+ is called a spectral function (see, e.g., Friedland 1981) if, for any orthogonal matrix O in M_d and X in M_d, F(X) = F(O'XO). We describe sets in R_+^d and functions from R_+^d to R as symmetric if they are invariant under coordinate permutations. That is, for any symmetric function f with an open symmetric domain in R_+^d, we have f(x) = f(Px) for any permutation matrix P ∈ M_d. Associated with any spectral function F, we define a function f on R_+^d so that f(x) = F(diag(x)). Since permutation matrices are orthogonal, f is symmetric, and f ∘ λ = F, where ∘ denotes function composition.

We will need to differentiate the spectral function F. We introduce a matrix function associated with the corresponding symmetric function f of F, which, for any x ∈ R̄_+^d in the form of (1), is given by

\mathcal{A}^f_{p,q}(x) = \begin{cases} 0 & \text{if } p = q; \\ \partial^2_{pp}f(x) - \partial^2_{pq}f(x) & \text{if } p \neq q \text{ and } p, q \in I_l \text{ for some } l \in \{1, 2, \ldots, r\}; \\ \left(\partial_p f(x) - \partial_q f(x)\right)/\left(x_p - x_q\right) & \text{otherwise.} \end{cases}

The following lemma collects some known and useful results regarding the continuous differentiability and convexity of spectral functions, which we will use below:

Lemma 4. The symmetric function f is twice continuously differentiable at a point λ(A) ∈ R̄_+^d if and only if the spectral function F = f ∘ λ is twice continuously differentiable at the point A ∈ M_d^+. The gradient and the Hessian matrix are given by

\partial_{jk}(f\circ\lambda)(A) = \sum_{p=1}^d O_{pj}\,\partial_p f(\lambda(A))\,O_{pk},

\partial^2_{jk,lm}(f\circ\lambda)(A) = \sum_{p,q=1}^d \partial^2_{pq} f(\lambda(A))\,O_{pl}O_{pm}O_{qj}O_{qk} + \sum_{p,q=1}^d \mathcal{A}^f_{pq}(\lambda(A))\,O_{pl}O_{pj}O_{qk}O_{qm},

where O is any orthogonal matrix that satisfies A = O' diag(λ(A)) O. More generally, f is a C^k function at λ(A) if and only if F is C^k at A, for any k = 0, 1, ..., ∞. In addition, f is a convex function if and only if F is convex.

Next, we note that both simple and repeated eigenvalues can be viewed as special cases of spectral functions.

Example 1 (A simple eigenvalue). Suppose the kth eigenvalue of A ∈ M_d^+ is simple, that is, λ_{k−1}(A) > λ_k(A) > λ_{k+1}(A). Define, for any x ∈ R_+^d,

f(x) = \text{the } k\text{th largest entry in } x = \bar{x}_k.

Apparently, f is a symmetric function, and it is C^∞ at any point y ∈ R̄_+^d with y_{k−1} > y_k > y_{k+1}. Indeed, ∂f(y) = e^k and ∂^l f(y) = 0 for l ≥ 2. By Lemma 4, λ_k(A) = (f ∘ λ)(A) is C^∞ at A.

Example 2 (Nonsimple eigenvalues). Suppose the eigenvalues of A ∈ M_d^+ satisfy

\lambda_{g_{l-1}}(A) > \lambda_{g_{l-1}+1}(A) \ge \cdots \ge \lambda_{g_l}(A) > \lambda_{g_l+1}(A),

for some 1 ≤ g_{l−1} < g_l ≤ d. By convention, when g_l = d, the last ">" above is not used. Consider the following function f, evaluated at x ∈ R_+^d:

f(x) = \frac{1}{g_l - g_{l-1}} \sum_{j=g_{l-1}+1}^{g_l} \bar{x}_j.

It is easy to verify that f is symmetric and C^∞ at any point y ∈ R̄_+^d that satisfies y_{g_{l−1}} > y_{g_{l−1}+1} ≥ ··· ≥ y_{g_l} > y_{g_l+1}. Moreover, ∂f(y) = (1/(g_l − g_{l−1})) Σ_{k=g_{l−1}+1}^{g_l} e^k, and ∂^l f(y) = 0 for any l ≥ 2. As a result of Lemma 4, the corresponding spectral function

F(A) = (f\circ\lambda)(A) = \frac{1}{g_l - g_{l-1}} \sum_{j=g_{l-1}+1}^{g_l} \lambda_j(A)

is C^∞ at A. In the special case where λ_{g_{l−1}}(A) > λ_{g_{l−1}+1}(A) = ··· = λ_{g_l}(A) > λ_{g_l+1}(A), that is, there is a repeated eigenvalue, F(A) = λ_{g_{l−1}+1}(A) = ··· = λ_{g_l}(A). By contrast, λ_j(A) is not differentiable, for any g_{l−1} + 1 ≤ j ≤ g_l.

Example 3 (Trace and determinant). For any x ∈ R_+^d, define

f_1(x) = \sum_{j=1}^d x_j \quad \text{and} \quad f_2(x) = \prod_{j=1}^d x_j.

Both f_1 and f_2 are symmetric and C^∞. Therefore, for any A ∈ M_d^+, Tr(A) = (f_1 ∘ λ)(A) and det(A) = (f_2 ∘ λ)(A) are C^∞ at A.

The previous examples make it clear that the objects of interest, eigenvalues, are special cases of spectral functions, whether they are simple or repeated. (We are also able to estimate the trace and determinant "for free".) Lemma 4 links the differentiability of a spectral function F, which is needed for statistical inference, to that of its associated symmetric function f. The key advantage is that differentiability of f is easier to establish.

Before turning to inference, we prove a useful result that characterizes the topology of the set of matrices with special eigenvalue structures. The domain of the spectral functions we consider will be confined to this set, in which these spectral functions are smooth.
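The following sketch (ours; NumPy assumed) implements the group-average symmetric function of Example 2 and verifies numerically that, at a matrix with distinct eigenvalues, the gradient of the associated spectral function F = f ∘ λ is the average of the outer products γ_jγ_j' over the group, as implied by Lemma 4.

```python
import numpy as np

def lam_desc(A):
    """Eigenvalues of a symmetric matrix, sorted in nonincreasing order."""
    return np.sort(np.linalg.eigvalsh(A))[::-1]

def F_group_avg(A, lo, hi):
    """Spectral function of Example 2: average of the lo-th..hi-th largest eigenvalues (1-indexed)."""
    return lam_desc(A)[lo - 1:hi].mean()

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
A = B @ B.T                                     # eigenvalues are a.s. distinct
w, V = np.linalg.eigh(A)
V = V[:, ::-1]                                  # columns ordered by descending eigenvalue
lo, hi = 2, 4                                   # average the 2nd..4th largest eigenvalues

# Gradient implied by Lemma 4: (1/(hi-lo+1)) * sum_j gamma_j gamma_j' over the group.
grad = sum(np.outer(V[:, j - 1], V[:, j - 1]) for j in range(lo, hi + 1)) / (hi - lo + 1)

H = rng.standard_normal((6, 6)); H = (H + H.T) / 2
eps = 1e-6
num = (F_group_avg(A + eps * H, lo, hi) - F_group_avg(A, lo, hi)) / eps
ana = np.tensordot(grad, H)                     # <grad, H> = Tr(grad @ H)
print(num, ana)                                 # should agree closely
```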

Lemma 5. For any 1 ≤ g_1 < g_2 < ··· < g_r ≤ d, the set

\mathcal{M}(g_1, g_2, \ldots, g_r) = \left\{ A \in \mathcal{M}_d^{++} \mid \lambda_{g_l}(A) > \lambda_{g_l+1}(A), \text{ for any } l = 1, 2, \ldots, r-1 \right\}    (3)

is dense and open in M_d^{++}. In particular, the set of positive-definite matrices with distinct eigenvalues, that is, M(1, 2, ..., d), is dense and open in M_d^{++}.

We conclude this section with some additional notation. First, we introduce a symmetric open subset of R̄_+^d \ {0}, the image of M(g_1, g_2, ..., g_r) under λ(·):

\mathcal{D}(g_1, g_2, \ldots, g_r) = \left\{ x \in \bar{\mathbb{R}}_+^d \setminus \{0\} \mid x_{g_l} > x_{g_l+1}, \text{ for any } l = 1, 2, \ldots, r-1 \right\}.    (4)

We also introduce a subset of M(g_1, g_2, ..., g_r) in which our spot covariance matrix c_t will take values:

\mathcal{M}^*(g_1, g_2, \ldots, g_r) = \left\{ A \in \mathcal{M}_d^{++} \mid \lambda_1(A) = \cdots = \lambda_{g_1}(A) > \lambda_{g_1+1}(A) = \cdots = \lambda_{g_2}(A) > \cdots > \lambda_{g_{r-1}}(A) > \lambda_{g_{r-1}+1}(A) = \cdots = \lambda_{g_r}(A) \right\}.

Finally, we introduce the following set, which becomes relevant if we are only interested in a simple eigenvalue λ_g:

\mathcal{M}(g) = \left\{ A \in \mathcal{M}_d^{++} \mid \lambda_{g-1}(A) > \lambda_g(A) > \lambda_{g+1}(A) \right\}, \quad \text{and} \quad \mathcal{D}(g) = \lambda(\mathcal{M}(g)).

By convention, we ignore the first (resp. second) inequality in the definition of M(g) when g = 1 (resp. g = d).

3. Estimators and Asymptotic Theory

3.1. Assumptions

We start with the standard assumption on the process X:

Assumption 1. The drift term b_t is progressively measurable and locally bounded. The spot covariance matrix c_t = (σσ')_t is an Itô semimartingale. Moreover, for some γ ∈ [0, 1), there is a sequence of stopping times (τ_n) increasing to ∞, and a deterministic function δ̄_n such that ∫_{R^d} δ̄_n(x)^γ ν̄(dx) < ∞ and ‖δ(ω, t, x)‖ ∧ 1 ≤ δ̄_n(x), for all (ω, t, x) with t ≤ τ_n(ω). (Using the notation of Jacod and Protter (2012), Assumption 1 states that the process X satisfies Assumption (H-) and that c_t satisfies Assumption (H-).)

The constant γ serves as an upper bound for the generalized Blumenthal–Getoor index (i.e., the activity) of price jumps. The spot covariance matrix is another Itô semimartingale, hence allowing for general forms of volatility-of-volatility and volatility jumps. Although this condition admits many volatility models in finance, it does exclude an important class of long-memory volatility models driven by fractional Brownian motion, see Comte and Renault (1996, 1998). The generalization in this direction seems to deserve its own research.

In view of the previous examples, our main theory is tailored to statistical inference on ∫_0^t F(c_s) ds. Thanks to Lemma 4, we can make assumptions directly on f instead of F, which are much easier to verify.

Assumption 2. Suppose F is a vector-valued spectral function, and f is the corresponding vector-valued symmetric function such that F = f ∘ λ. f is a continuous function and satisfies ‖f(x)‖ ≤ K(1 + ‖x‖^ζ), for some ζ > 0.

In all the examples of Section 2.5, this assumption holds. By Lemma 1 and Assumption 2, F(c_s) is càdlàg, and the integral ∫_0^t F(c_s) ds is well defined.

The above assumptions are sufficient to ensure the desired consistency of the estimators we will propose. Additional assumptions are required to establish their asymptotic distribution.

Assumption 3. There exists some open and convex set C such that its closure C̄ ⊂ M(g_1, g_2, ..., g_r), where 1 ≤ g_1 < g_2 < ··· < g_r ≤ d, and such that for any 0 ≤ s ≤ t, c_s ∈ C ∩ M*(g_1, g_2, ..., g_r). Moreover, f is C³ on D(g_1, g_2, ..., g_r).

Assumption 3, in particular c_s ∈ M*(g_1, g_2, ..., g_r), guarantees that different groups of eigenvalues do not cross over within [0, t]. This condition is mainly used to deliver the joint central limit theorem for spectral functions that depend on all eigenvalues, although it may not be necessary for some special cases, such as det(·) and Tr(·), which are smooth everywhere. The convexity condition on C is in principle not difficult to satisfy, given that M(g_1, g_2, ..., g_r) can be embedded into some Euclidean space of real vectors. This condition is imposed to ensure that the domain of the spectral function can be restricted to a certain neighborhood of {c_s}_{0≤s≤t}, in which the function is smooth and the mean-value theorem can be applied. This assumption is not needed if c_t is continuous.

It is also worth mentioning that all eigenvalues of c_t being distinct is a special case, which is perhaps the most relevant scenario in practice, as is clear from Lemma 5. Even in this scenario, Assumption 3 is required (with g_1, g_2, ..., g_r being chosen as 1, 2, ..., d), because the eigenvalue function λ(·) is not everywhere differentiable.

Assumption 3 resembles the spatial localization assumption in Li, Todorov, and Tauchen (2017) and Li and Xiu (2016), which is different from the growth conditions proposed by Jacod and Rosenbaum (2013). The growth conditions are not satisfied in our setting when the difference between two groups of repeated eigenvalues approaches zero.

If we are only interested in a spectral function that depends on one simple eigenvalue, for example, the largest and simple integrated eigenvalue ∫_0^t λ_1(c_s) ds, then it is only necessary to ensure that λ_1(c_s) > λ_2(c_s), for 0 ≤ s ≤ t, regardless of whether the remaining eigenvalues are simple or not. We thereby introduce the following assumption for this scenario, which is much weaker than Assumption 3.

Assumption 4. There exists some open and convex set C, such that C̄ ⊂ M(g), for some g = 1, 2, ..., d, and such that for any 0 ≤ s ≤ t, c_s ∈ C. Moreover, f is C³ on D(g).

3.2. Realized Spectral Functions

We now turn to the construction of the estimators. To estimate the integrated spectral function, we start with estimation of the spot covariance matrix. Suppose we have equidistant observations on X over the interval [0, t], separated by a time interval Δ_n. We form nonoverlapping blocks of length k_nΔ_n. At each ik_nΔ_n, we estimate c_{ik_nΔ_n} by

\hat{c}_{ik_n\Delta_n} = \frac{1}{k_n\Delta_n} \sum_{j=1}^{k_n} \left(\Delta^n_{ik_n+j}X\right)\left(\Delta^n_{ik_n+j}X\right)' 1_{\{\|\Delta^n_{ik_n+j}X\| \le u_n\}},    (5)

where u_n = α Δ_n^ϖ and Δ^n_l X = X_{lΔ_n} − X_{(l−1)Δ_n}. Choices of α and ϖ are standard in the literature (see, e.g., Aït-Sahalia and Jacod 2014) and are discussed below when implemented in simulations.

We then estimate the eigenvalues of ĉ_{ik_nΔ_n} by solving for the roots of |ĉ_{ik_nΔ_n} − λI| = 0. Using the notation in Lemma 1, we have λ(ĉ_{ik_nΔ_n}) = λ̂_{ik_nΔ_n}. Almost surely, the eigenvalues stacked in λ̂_{ik_nΔ_n} are distinct (see, e.g., Okamoto 1973), so that we have λ̂_{1,ik_nΔ_n} > λ̂_{2,ik_nΔ_n} > ··· > λ̂_{d,ik_nΔ_n}. Our proposed estimator of the integrated spectral function is then given by

V(\Delta_n, X; F) = k_n\Delta_n \sum_{i=0}^{[t/(k_n\Delta_n)]} f\left(\hat{\lambda}_{ik_n\Delta_n}\right).    (6)

(We prefer this estimator to the alternative one using overlapping windows because the overlapping implementation runs much slower. In a finite sample, both estimators have a decent performance.)

We start by establishing consistency of this estimator:

Theorem 1. Suppose Assumptions 1 and 2 hold. Also, either ζ ≤ 1, or ζ > 1 with ϖ ∈ [(ζ−1)/(2ζ−γ), 1/2), holds. Then the estimator (6) is consistent: as k_n → ∞ and k_nΔ_n → 0,

V(\Delta_n, X; F) \ \overset{u.c.p.}{\Longrightarrow}\ \int_0^t F(c_s)\,ds.    (7)
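As an illustration of the construction in (5)–(6), the sketch below (ours, not the authors' code; NumPy assumed) computes truncated local covariance estimates on nonoverlapping blocks and aggregates a user-supplied spectral function of their eigenvalues; the toy example uses a constant-covariance Brownian motion, for which the integrated largest eigenvalue is known. The plug-in estimator (6) carries the small higher-order bias characterized next, which the correction of Section 3.3 removes.

```python
import numpy as np

def spot_covariances(X, k_n, Delta_n, alpha=3.0, varpi=0.47):
    """Truncated local covariance estimates c_hat_{i k_n Delta_n} of eq. (5),
    computed on nonoverlapping blocks of k_n returns.
    X is an (N+1, d) array of log-prices observed every Delta_n time units."""
    dX = np.diff(X, axis=0)                           # increments Delta_j^n X
    u_n = alpha * Delta_n ** varpi                    # truncation threshold u_n = alpha * Delta_n^varpi
    dX = dX * (np.linalg.norm(dX, axis=1) <= u_n)[:, None]
    return [dX[i * k_n:(i + 1) * k_n].T @ dX[i * k_n:(i + 1) * k_n] / (k_n * Delta_n)
            for i in range(dX.shape[0] // k_n)]

def realized_spectral_function(c_hat, k_n, Delta_n, f):
    """Plug-in estimator V(Delta_n, X; F) of eq. (6) for F = f o lambda,
    where f acts on the eigenvalues sorted in nonincreasing order."""
    return k_n * Delta_n * sum(f(np.sort(np.linalg.eigvalsh(c))[::-1]) for c in c_hat)

# Toy check: constant spot covariance Sigma over [0, 1], so the integrated
# largest eigenvalue equals lambda_max(Sigma).
rng = np.random.default_rng(2)
d, N, k_n = 3, 23400, 30
Delta_n = 1.0 / N
Sigma = 1e-4 * np.array([[1.0, 0.3, 0.2], [0.3, 0.8, 0.1], [0.2, 0.1, 0.5]])
dX = rng.multivariate_normal(np.zeros(d), Sigma * Delta_n, size=N)
X = np.vstack([np.zeros(d), np.cumsum(dX, axis=0)])
c_hat = spot_covariances(X, k_n, Delta_n)
print(realized_spectral_function(c_hat, k_n, Delta_n, lambda lam: lam[0]),
      np.linalg.eigvalsh(Sigma).max())               # estimate vs. true value
```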

Next, obtaining a central limit theorem for the estimator is more involved, as there is a second-order asymptotic bias associated with the estimator (6), a complication that is also encountered in other situations such as the quarticity estimator in Jacod and Rosenbaum (2013), the GMM estimator in Li and Xiu (2016), or the leverage effect estimator of Kalnina and Xiu (2017). The bias here is characterized as follows:

Proposition 1. Suppose Assumptions 1–3 hold. In addition, k_n ≍ Δ_n^{−ς} and u_n ≍ Δ_n^ϖ for some ς ∈ (γ/2, 1/2) and ϖ ∈ [(1−ς)/(2−γ), 1/2). As Δ_n → 0, we have

k_n\left( V(\Delta_n, X; F) - \int_0^t F(c_s)\,ds \right) \ \overset{p}{\longrightarrow}\ \frac{1}{2}\int_0^t \sum_{j,k,l,m=1}^d \partial^2_{jk,lm}F(c_s)\left(c_{jl,s}c_{km,s} + c_{jm,s}c_{kl,s}\right) ds.    (8)

The characterization of the bias in (8) suggests the following bias-corrected estimator, denoted V′(Δ_n, X; F):

V'(\Delta_n, X; F) = k_n\Delta_n \sum_{i=0}^{[t/(k_n\Delta_n)]} \Bigg[ F(\hat{c}_{ik_n\Delta_n}) - \frac{1}{2k_n} \sum_{j,k,l,m=1}^d \partial^2_{jk,lm}F(\hat{c}_{ik_n\Delta_n})\left( \hat{c}_{jl,ik_n\Delta_n}\hat{c}_{km,ik_n\Delta_n} + \hat{c}_{jm,ik_n\Delta_n}\hat{c}_{kl,ik_n\Delta_n} \right) \Bigg].    (9)

We then derive the asymptotic distribution of the bias-corrected estimator:

Theorem 2. Suppose Assumptions 1–3 hold. In addition, k_n ≍ Δ_n^{−ς} and u_n ≍ Δ_n^ϖ for some ς ∈ (γ/2 ∨ 1/3, 1/2) and ϖ ∈ [(1−ς)/(2−γ), 1/2). As Δ_n → 0, we have

\frac{1}{\sqrt{\Delta_n}}\left( V'(\Delta_n, X; F) - \int_0^t F(c_s)\,ds \right) \ \overset{L-s}{\longrightarrow}\ W_t,    (10)

where W is a continuous process defined on an extension of the original probability space, which, conditionally on F, is a continuous centered Gaussian martingale with covariance given by

E\left( W_{p,t} W_{q,t} \mid \mathcal{F} \right) = \int_0^t \sum_{j,k,l,m=1}^d \partial_{jk}F_p(c_s)\,\partial_{lm}F_q(c_s)\left( c_{jl,s}c_{km,s} + c_{jm,s}c_{kl,s} \right) ds.    (11)

Remark 1. As previously noted, in the case when the spectral function F only depends on a simple eigenvalue λ_g, the same result holds under the weaker Assumption 4 instead of 3.

A feasible implementation of this distribution requires an estimator of the asymptotic variance, which we construct as follows:

Proposition 2. The asymptotic variance of V′(Δ_n, X; F) can be estimated consistently by

k_n\Delta_n \sum_{i=0}^{[t/(k_n\Delta_n)]} \sum_{j,k,l,m=1}^d \partial_{jk}F_p(\hat{c}_{ik_n\Delta_n})\,\partial_{lm}F_q(\hat{c}_{ik_n\Delta_n}) \left( \hat{c}_{jl,ik_n\Delta_n}\hat{c}_{km,ik_n\Delta_n} + \hat{c}_{jm,ik_n\Delta_n}\hat{c}_{kl,ik_n\Delta_n} \right) \ \overset{p}{\longrightarrow}\ \int_0^t \sum_{j,k,l,m=1}^d \partial_{jk}F_p(c_s)\,\partial_{lm}F_q(c_s)\left( c_{jl,s}c_{km,s} + c_{jm,s}c_{kl,s} \right) ds.    (12)

3.3. Realized Eigenvalues

We next specialize the previous theorem to obtain a central limit theorem for the realized eigenvalue estimators. With the structure of the eigenvalues in mind, we use a particular vector-valued spectral function F^λ, tailor-made to deliver the asymptotic theory we need for realized eigenvalues:

F^{\lambda}(\cdot) = \left( \frac{1}{g_1}\sum_{j=1}^{g_1} \lambda_j(\cdot),\ \frac{1}{g_2 - g_1}\sum_{j=g_1+1}^{g_2} \lambda_j(\cdot),\ \ldots,\ \frac{1}{g_r - g_{r-1}}\sum_{j=g_{r-1}+1}^{g_r} \lambda_j(\cdot) \right)'.    (13)

Apparently, if a group contains only one λ_j, then the corresponding entry of F^λ is equal to this single eigenvalue; if within a certain group all eigenvalues are identical, then the corresponding entry of F^λ yields the common eigenvalue of the group.
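In code, F^λ simply averages the ordered eigenvalues within each group of the partition; a minimal sketch (ours, NumPy assumed):

```python
import numpy as np

def F_lambda(c, groups):
    """The vector-valued spectral function F^lambda of eq. (13): within-group averages of
    the ordered eigenvalues of c, for group endpoints g_1 < ... < g_r = d (1-indexed)."""
    lam = np.sort(np.linalg.eigvalsh(c))[::-1]
    out, lo = [], 0
    for g in groups:
        out.append(lam[lo:g].mean())
        lo = g
    return np.array(out)

# Example with d = 5 and eigenvalue groups {1}, {2}, {3, 4, 5}:
print(F_lambda(np.diag([5.0, 3.0, 1.0, 1.0, 1.0]), groups=[1, 2, 5]))   # -> [5. 3. 1.]
```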

Corollary 1. Suppose k_n ≍ Δ_n^{−ς} and u_n ≍ Δ_n^ϖ for some ς ∈ (γ/2 ∨ 1/3, 1/2) and ϖ ∈ [(1−ς)/(2−γ), 1/2).

(i) Under Assumption 1, the estimator of the integrated eigenvalue vector given by

V(\Delta_n, X; \lambda) = k_n\Delta_n \sum_{i=0}^{[t/(k_n\Delta_n)]} \lambda(\hat{c}_{ik_n\Delta_n})    (14)

is consistent.

(ii) If Assumption 3 further holds, the estimator corresponding to the pth entry of F^λ given by (9) can be written explicitly as

V'(\Delta_n, X; F^{\lambda}_p) = \frac{k_n\Delta_n}{g_p - g_{p-1}} \sum_{i=0}^{[t/(k_n\Delta_n)]} \sum_{h=g_{p-1}+1}^{g_p} \left[ \hat{\lambda}_{h,ik_n\Delta_n} - \frac{1}{k_n}\,\mathrm{Tr}\!\left( (\hat{\lambda}_{h,ik_n\Delta_n} I - \hat{c}_{ik_n\Delta_n})^+ \hat{c}_{ik_n\Delta_n} \right) \hat{\lambda}_{h,ik_n\Delta_n} \right].

The joint central limit theorem for V′(Δ_n, X; F^λ) = (V′(Δ_n, X; F^λ_1), ..., V′(Δ_n, X; F^λ_r))' is given by

\frac{1}{\sqrt{\Delta_n}}\left( V'(\Delta_n, X; F^{\lambda}) - \int_0^t F^{\lambda}(c_s)\,ds \right) \ \overset{L-s}{\longrightarrow}\ W^{\lambda}_t,    (15)

where W^λ is a continuous process defined on an extension of the original probability space, which, conditionally on F, is a continuous centered Gaussian martingale with a diagonal covariance matrix given by

E\left( W^{\lambda}_t (W^{\lambda}_t)' \mid \mathcal{F} \right) = \mathrm{diag}\!\left( \frac{2}{g_1}\int_0^t \lambda^2_{g_1,s}\,ds,\ \frac{2}{g_2 - g_1}\int_0^t \lambda^2_{g_2,s}\,ds,\ \ldots,\ \frac{2}{g_r - g_{r-1}}\int_0^t \lambda^2_{g_r,s}\,ds \right),    (16)

where F^λ(c_s) = (λ_{g_1,s}, λ_{g_2,s}, ..., λ_{g_r,s})'.

(iii) Under Assumptions 1 and 4, our estimator with respect to the gth simple eigenvalue λ_g(·) is given by

V'(\Delta_n, X; \lambda_g) = k_n\Delta_n \sum_{i=0}^{[t/(k_n\Delta_n)]} \left[ \hat{\lambda}_{g,ik_n\Delta_n} - \frac{1}{k_n}\,\mathrm{Tr}\!\left( (\hat{\lambda}_{g,ik_n\Delta_n} I - \hat{c}_{ik_n\Delta_n})^+ \hat{c}_{ik_n\Delta_n} \right) \hat{\lambda}_{g,ik_n\Delta_n} \right],    (17)

which satisfies

\frac{1}{\sqrt{\Delta_n}}\left( V'(\Delta_n, X; \lambda_g) - \int_0^t F^{\lambda_g}(c_s)\,ds \right) \ \overset{L-s}{\longrightarrow}\ W^{\lambda_g}_t,    (18)

where W^{λ_g} is a continuous process defined on an extension of the original probability space, which, conditionally on F, is a continuous centered Gaussian martingale with variance 2∫_0^t λ²_{g,s} ds.

Remark 2. Corollary 1 is the analog in our context of the classical results on asymptotic theory for PCA by Anderson (1963), who showed that

\sqrt{n}\,(\hat{\lambda} - \lambda) \ \overset{d}{\longrightarrow}\ N\!\left( 0,\ 2\,\mathrm{diag}(\lambda_1^2, \lambda_2^2, \ldots, \lambda_d^2) \right),    (19)

where λ̂ and λ are the vectors of eigenvalues of the sample and population covariance matrices and λ is simple. Note the similarity between (19) and (16) in the special case where the eigenvalues are simple (in which case g_i = i) and are constant instead of being stochastic. The classical setting requires the strong assumption that the covariance matrix follows a Wishart distribution. By contrast, our results are fully nonparametric, relying instead on high-frequency asymptotics.

Remark 3. If there are eigenvalues of c_s that are equal to 0 for any 0 ≤ s ≤ t, then our estimator is super-efficient. This is an interesting case, as the rank of the covariance matrix is determined by the number of nonzero eigenvalues; see, for example, the rank inference in Jacod, Lejay, and Talay (2008) and Jacod and Podolskij (2013).
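A sketch (ours) of the bias-corrected estimator of a simple integrated eigenvalue in Corollary 1(iii), together with a feasible confidence interval based on (12) and (18); the inputs c_hat, k_n, and Delta_n are assumed to come from a spot-covariance step such as the sketch following Theorem 1.

```python
import numpy as np

def realized_eigenvalue(c_hat, k_n, Delta_n, g=0, z=1.96):
    """Bias-corrected estimator of the integrated g-th (simple) eigenvalue of the spot
    covariance, as in eq. (17), with a feasible confidence interval from (12) and (18).
    c_hat is a list of local covariance estimates, one per block of k_n returns."""
    V, avar = 0.0, 0.0
    for c in c_hat:
        lam = np.sort(np.linalg.eigvalsh(c))[::-1]        # eigenvalues, nonincreasing order
        lam_g, others = lam[g], np.delete(lam, g)
        # Tr((lam_g I - c)^+ c) = sum over p != g of lam_p / (lam_g - lam_p)
        correction = np.sum(others / (lam_g - others)) / k_n
        V += lam_g * (1.0 - correction)                   # summand of eq. (17)
        avar += 2.0 * lam_g ** 2                          # summand of the variance estimator (12)
    V *= k_n * Delta_n
    avar *= k_n * Delta_n
    half = z * np.sqrt(Delta_n * avar)                    # the CLT (18) has rate 1/sqrt(Delta_n)
    return V, (V - half, V + half)

# Usage, with c_hat, k_n, Delta_n from the earlier spot-covariance sketch:
# V1, ci = realized_eigenvalue(c_hat, k_n, Delta_n, g=0)
```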

3.4. Realized Principal Components

As we have remarked before, when eigenvalues are simple, the corresponding eigenvectors are uniquely determined. Therefore, we can estimate the instantaneous eigenvectors together with the eigenvalues, which eventually leads us to construct the corresponding principal component:

Proposition 3. Suppose Assumptions 1 and 4 hold. In addition, γ_{g,s} is a vector-valued function that corresponds to the eigenvector of c_s with respect to a simple root λ_{g,s}. Suppose k_n ≍ Δ_n^{−ς} and u_n ≍ Δ_n^ϖ for some ς ∈ (γ/2, 1/2) and ϖ ∈ [(1−ς)/(2−γ), 1/2). Then we have, as Δ_n → 0,

\sum_{i=1}^{[t/(k_n\Delta_n)]-1} \hat{\gamma}'_{g,(i-1)k_n\Delta_n}\left( X_{(i+1)k_n\Delta_n} - X_{ik_n\Delta_n} \right) \ \overset{u.c.p.}{\Longrightarrow}\ \int_0^t \gamma'_{g,s-}\,dX_s.

3.5. Realized Eigenvectors

So far we have constructed the principal components and estimated integrated eigenvalues. It remains to figure out the loading of each entry of X on each principal component, that is, the eigenvector. As the eigenvector is stochastic, we estimate the integrated eigenvector. Apart from its sign, an eigenvector is uniquely identified if its associated eigenvalue is simple. We determine the sign, hence identify the eigenvector, by requiring a priori a certain entry of the eigenvector to be positive, for example, the first nonzero entry.

Corollary 2. Suppose Assumptions 1 and 4 hold. In addition, γ_{g,s} is a vector-valued function that corresponds to the eigenvector of c_s with respect to a simple root λ_{g,s}, for each s ∈ [0, t]. Then we have, as Δ_n → 0,

\frac{1}{\sqrt{\Delta_n}}\left( k_n\Delta_n \sum_{i=0}^{[t/(k_n\Delta_n)]} \left[ \hat{\gamma}_{g,ik_n\Delta_n} + \frac{1}{2k_n} \sum_{p\neq g} \frac{\hat{\lambda}_{g,ik_n\Delta_n}\hat{\lambda}_{p,ik_n\Delta_n}}{(\hat{\lambda}_{g,ik_n\Delta_n} - \hat{\lambda}_{p,ik_n\Delta_n})^2}\,\hat{\gamma}_{g,ik_n\Delta_n} \right] - \int_0^t \gamma_{g,s}\,ds \right) \ \overset{L-s}{\longrightarrow}\ W^{\gamma}_t,

where W^γ is a continuous process defined on an extension of the original probability space, which, conditionally on F, is a continuous centered Gaussian martingale with covariance matrix given by

E\left( W^{\gamma}_t (W^{\gamma}_t)' \mid \mathcal{F} \right) = \int_0^t \lambda_{g,s}\,(\lambda_{g,s} I - c_s)^+\, c_s\, (\lambda_{g,s} I - c_s)^+\,ds.

3.6. "PCA" on the Integrated Covariance Matrix

One may wonder why we chose to estimate the integrated eigenvalues of the spot covariance matrix rather than the eigenvalues of the integrated covariance matrix. Because eigenvalues are complicated functionals of the covariance matrix, the two quantities are not much related, and it turns out that the latter procedure is less informative than the former. For instance, the rank of the instantaneous covariance matrix is determined by the number of nonzero instantaneous eigenvalues. The eigenstructure of the integrated covariance matrix is rather opaque due to the aggregation of instantaneous covariances. Nevertheless, it is possible to construct a "PCA" procedure on the integrated covariance matrix, whose asymptotic results are immediately available by applying the "delta" method, using Lemma 2 and Theorem 13.2.4 in Jacod and Protter (2012).

The corresponding principal component is then defined as γ_g'(X_t − X_0), where γ_g is the eigenvector of ∫_0^t c_s ds. However, its corresponding eigenvalue and the principal component do not satisfy the fundamental relationship between the eigenvalue and the variance of the principal component:

\lambda_g\!\left( \int_0^t c_s\,ds \right) \ \neq\ \left[ \gamma_g'(X_t - X_0),\ \gamma_g'(X_t - X_0) \right]^c,    (20)

which is a desired property of any sensible PCA procedure. This fact alone makes this second procedure much less useful than the one we proposed above, based on integrated eigenvalues of the spot covariance. Another feature that distinguishes this second "PCA" procedure from both the classical one and our realized PCA is that the asymptotic covariance matrix of the eigenvalues of ∫_0^t c_s ds is no longer diagonal.

An analogy for the relationship between the two PCA procedures is that between the integrated quarticity ∫_0^t σ_s^4 ds and the squared integrated volatility (∫_0^t σ_s^2 ds)². If the covariance matrix is constant, or at least its eigenvectors are, the two procedures are equivalent, but not in general. One can in principle develop procedures to test for a constant covariance matrix or its eigenstructure, in the spirit of Reiß, Todorov, and Tauchen (2015) and Kao, Trapani, and Urga (2018), to justify the use of the second and simpler procedure. We leave this endeavor to future work.
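The point of this section can be seen in a two-dimensional toy example (ours): when the spot eigenvalues are held fixed but the eigenvectors rotate over time, the time average of the largest spot eigenvalue and the largest eigenvalue of the time-averaged covariance matrix differ.

```python
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

Lam = np.diag([2.0, 0.5])                      # spot eigenvalues, held fixed
thetas = np.linspace(0.0, np.pi / 2, 200)      # eigenvectors rotate over [0, t]
c_path = [rot(th) @ Lam @ rot(th).T for th in thetas]

int_eigen = np.mean([np.linalg.eigvalsh(c).max() for c in c_path])   # (1/t) * int lambda_1(c_s) ds
eigen_int = np.linalg.eigvalsh(np.mean(c_path, axis=0)).max()        # lambda_1( (1/t) * int c_s ds )
print(int_eigen, eigen_int)   # 2.0 vs. roughly 1.73: the two procedures disagree
```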
The correlation matrix of dW is denoted   c λ = γ ( − ), γ ( − ) , as ρF . We allow for time-varying σ , , γ ,andβ , ,whichevolve g csds g Xt X0 g Xt X0 (20) j t t ij t 0 according to the following system of equations: which is a desired property of any sensible PCA procedure. This 2 2  σ 2 dσ , = κ j (θ j − σ , )dt + η jσ j,t dWj,t + dJ , , fact alone makes this second procedure much less useful than j t j t j t γ 2 = κ(θ − γ 2 )dt + ηγ dB¯ , the one we proposed above, based on integrated eigenvalues of t t t t β = the spot covariance. Another feature that distinguishes this sec- d ij,t ond“PCA”procedurewithboththeclassicaloneandourreal- κ˜ (θ˜ − β ) + ξ˜ β  j ij ij,t dt j ij,t dBij,t if the jth factor is the “market,” , ized PCA is that the asymptotic covariance matrix of eigenvalues κ˜ (θ˜ − β ) + ξ˜  t j ij ij,t dt jdBij,t otherwise. of 0 csds is no longer diagonal.  F An analogy of the relationship between the two PCA pro- where the correlation between dWj,· and dWj,· is ρ j, {J }1≤ j≤r t σ 4 j cedures is that between integrated quarticity t 0 s ds and the { Z} ≤ ≤ t and Ji 1 i d are driven by two Poisson processes with arrival ( σ 2 )2 F Z squaredintegratedvolatility 0 s ds . If the covariance matrix rates λ and λ , respectively. Their jump sizes follow double F F Z is constant or at least its eigenvectors are, the two procedures exponential distributions with means denoted by μ+, μ−, μ+, are equivalent but not in general. One can in principle develop μZ { σ 2 } F and −,respectively. Jj 1≤ j≤r cojumps with J ,andtheirjump procedures to test for a constant covariance matrix or its eigen- sizes follow exponential distributions with the mean equal to structure, in the spirit of Reiß, Todorov, and Tauchen (2015)and μσ 2 .AllthemodelparametersaregiveninTable 1.Theyare Kao, Trapani, and Urga (2018), to justify the use of the second and simpler procedure. We leave this endeavor to future work. Table . Parameters in Monte Carlo simulations. ˜ ˜ κ j θ j η j ρ j μ j κ˜ j θi, j ξ j 4. Simulation Evidence j = 1  . . −. .  U[0.25, 1.75] . j = 2  . . −. .  N (0, 0.52 ) . We now investigate the small sample performance of the esti- j = 3  . . −. .  N (0, 0.52 ) . mators in conditions that approximate the empirical setting of λF μF λZ μZ μσ 2 ρF ρF ρF PCA for large stock portfolios. In order to generate data with a √+/− √+/− √ 12 13 23 /t  /t   common factor structure, we simulate a factor model in which 1 4 2 6 . . . a vector of log-stock prices X follows the continuous-time κθη dynamics  . . NOTE: In this table, we report the parameter values used in the simulations. The = β + , ˜ dXt t dFt dZt (21) constant matrix θi, j is generated randomly from the described distribution, and is X β fixed throughout all replications. The dimension of t is , whereas the dimen- where F is a collection of unknown factors, is the matrix of sion of Ft is .  is the sampling frequency, andt is the length of the time window. factor loadings, and Z is a vector of idiosyncratic components, The number of Monte Carlo replications is . 296 Y. AÏT-SAHALIA AND D. XIU

Table . Simulation results: First eigenvalue estimation. Table . Simulation results: Second eigenvalue estimation.

 week,  sec  week,  min  week,  sec  week,  min # Stocks True Bias Stdev True Bias Stdev # Stocks True Bias Stdev True Bias Stdev

.− . . . . .  . . . . . .  . . . . − . .  . − . . . . .  . − . . . − . .  . − . . . . .  . . . . − . .  . − . . . − . .  . − . . . − . .  . − . . . − . .  . . . . − . .  . − . . . − . .  . − . . . − . .  . − . . . − . .  week,  sec  month,  min  week,  sec  month,  min

 . . . . . . # Stocks True Bias Stdev True Bias Stdev  . . . . − . .  . . . . − . .  . . . . − . .  . . . . − . .  . . . . − . .  . . . . − . .  . . . . − . .  . . . . − . .  . . . . − . .  . . . . − . .  . . . . − . .  . − . . . − . . NOTE: In this table, we report the summary statistics of  Monte Carlo simula-  . − . . . − . . tions for estimating the first integrated eigenvalue. Column “True” corresponds to the average of the true integrated eigenvalue; Column “Bias” corresponds to NOTE: In this table, we report the summary statistics of  Monte Carlo simu- the mean of the estimation error; Column “Stdev”is the standard deviation of the lations for estimating the second integrated eigenvalue. Column “True” corre- estimation error. sponds to the average of the true integrated eigenvalue; Column “Bias” corre- sponds to the mean of the estimation error; Column “Stdev”is the standard devi- ation of the estimation error. chosen to be realistic given the empirical characteristics of the cross-section of stock returns and their volatilities. we provide estimates for the first eigenvectors. Similar to the Anticipating our empirical application to the S&P 100 Index estimation for eigenvalues, the estimates are very accurate, even = constituents, we simulate intraday returns of d 100 stocks at with 100 stocks. various frequencies and with horizons T spanning 1 week to 1 To further verify the accuracy of the asymptotic distribu- = , month. When r 3 the model implies three distinct eigenval- tioninsmallsamplesasthesamplesizeincreases,weprovide uesreflectingthelocalfactorstructureofthesimulateddata.The in Figure 1 histograms of the standard estimates of integrated remaining population eigenvalues are identical, due to idiosyn- eigenvalues using 30 stocks with 5-sec returns. Finally, we exam- cratic variations. Throughout, we fix kn to be the closest divisors ine the finite sample accuracy of the integrated simple eigenval- / θ−1/2 ( ) θ = . of [t n]to n log d ,with 0 5andd is the dimen- ues, in the more challenging scenario with 100 stocks sampled sion of X. Our choice of log(d) is motivated from the literature every minute, which matches the setup of the empirical analy- on high-dimensional covariance matrix estimation, although sis we will conduct below. To verify that the detection of three our asymptotic design does not take into account an increas- eigenvalues is not an artifact, we vary the number of common ing dimensionality. Other choices of θ, for example, 0.05−0.5, or functions of d, e.g., d log d, deliver the same results. To Table . Simulation results: Third eigenvalue estimation truncate off jumps from spot covariance estimates, we adopt =  week,  sec  week,  min theusualprocedureintheliterature,thatis,choosing ui,n t . . t ( , / )0 50 47 ≤ ≤ , # Stocks True Bias Stdev True Bias Stdev 3 0 cii sds t n ,for1 i d,where 0 cii sds canbeesti- mated by, for instance, bipower variations.  . . . . . . We then apply the realized PCA procedure. 
We first examine  . . . . . . the curse of dimensionality – how increasing number of stocks  . . . . . .  . . . . . . affects the estimation – and how the sampling frequency affects  . . . . . . the estimation. In light of Corollary 1, we estimate three simple  . . . . − . . integrated eigenvalues as well as the average of the remaining  . . . . − . . t λ = , , identical eigenvalues, denoted as 0 isds, i 1 2 3, and 4. We  week,  sec  month,  min report the mean and standard errors of the estimates as well as the root-mean-square errors of the standardized estimates # Stocks True Bias Stdev True Bias Stdev with d = 5, 10, 15, 20, 30, 50, and 100 stocks, respectively, using  . . . . . .  . . . . − . . returns sampled every n = 5 sec, 1 min, and 5 min over one  . . . . . . week and one month horizons. The results, as shown from  . . . . . . Tables 2–5, suggest that, as expected, the estimation is more  . . . . . . difficult as the dimensionality increases, but the large amount  . . . . − . . − − of high-frequency data and in-fill asymptotic techniques deliver  . . . . . . very satisfactory finite sample approximations. The eigenval- NOTE: In this table, we report the summary statistics of  Monte Carlo simula- ues are accurately recovered. The repeated eigenvalues are tions for estimating the third integrated eigenvalue. Column “True”corresponds to the average of the true integrated eigenvalue; Column “Bias” corresponds to estimated with smaller biases and standard errors, due to the the mean of the estimation error; Column “Stdev”is the standard deviation of the extra averaging taken at the estimation stage. In Tables 6–7, estimation error. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 297
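The tuning choices described above (the block length k_n and the per-asset truncation levels u_{i,n}) can be summarized in the following sketch (ours; the time-unit convention and the example inputs are assumptions, not values from the paper):

```python
import numpy as np

def block_length(n_obs, Delta_n, d, theta=0.5):
    """Block length used in Section 4: the divisor of n_obs = [t / Delta_n]
    closest to theta * Delta_n^(-1/2) * log(d)."""
    target = theta * Delta_n ** (-0.5) * np.log(d)
    divisors = [k for k in range(1, n_obs + 1) if n_obs % k == 0]
    return min(divisors, key=lambda k: abs(k - target))

def truncation_thresholds(iv_diag, t, Delta_n):
    """Per-asset truncation levels u_{i,n} = 3 * (IV_ii / t)^0.5 * Delta_n^0.47, where IV_ii
    is an estimate of asset i's integrated variance, e.g., from bipower variation."""
    return 3.0 * np.sqrt(np.asarray(iv_diag) / t) * Delta_n ** 0.47

# Example: one week of 1-minute returns on 100 stocks, measuring time in weeks (t = 1),
# with an assumed integrated variance of 0.0016 per week for every stock.
n_obs = 5 * 390
Delta_n = 1.0 / n_obs
print(block_length(n_obs, Delta_n, d=100))
print(truncation_thresholds([0.0016] * 100, t=1.0, Delta_n=Delta_n)[0])
```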

Table . Simulation results: Repeated eigenvalue estimation, fourth and beyond. Table . Simulation results: Eigenvector estimation.

 week,  sec  week,  min  week,  sec  week,  min # Stocks True Bias Stdev True Bias Stdev # Stocks True Bias Stdev True Bias Stdev

 . . . . . . . −4 × 10−6 . . . .  . . . . . . . 3 × 10−7 . . − . .  . . . . . . . − . . . − . .  . . . . . . . 1 × 10−6 . . . .  . . . . . . . − . . . . .  . . . . . . . −3 × 10−6 . . − . .  . . . . . . . . . . . .  week,  sec  week,  min . . . . . . . . . . . . # Stocks True Bias Stdev True Bias Stdev . − . . . . . . . . . . .  . . . . . . . . . . − . .  . . . . . . . − . . . . .  . . . . . . . − . . . − . .  . . . . . . × −7 −  . . . . . . . 6 10 . . . .  . − . . . . .  . . . . . . −  . . . . . . . . . . . . . − . . . . . − − NOTE: In this table, we report the summary statistics of  Monte Carlo simula- . . . . . . tions for estimating the repeated integrated eigenvalues. Column “True” corre- . . . . . . − − sponds to the average of the true integrated eigenvalue; Column “Bias” corre- . . . . . . − − sponds to the mean of the estimation error; Column “Stdev”is the standard devi- . . . . . . − ation of the estimation error. . . . . . . . − . . . − . . . . . . − . . factors in the data-generating process from r = 2to4,byremov- . − . . . . . ing the third factor or adding another factor that shares the same . . . . . . . − . . . − . . parameters as the third factor. The histograms are provided in . . . . . . Figure 2. The results collectively show that the finite sample per- . − . . . . . formance of the method is quite good even for a 100-stock port- NOTE: In this table, we report the summary statistics of  Monte Carlo simula- folio with 1-min returns and one-week horizon. tions for estimating the integrated eigenvectors associated with the first eigen- values. The column “True”corresponds to the average of the true integrated vec- tor; “Bias”corresponds to the mean of the estimation error; “Stdev”is the standard 5. High-Frequency Principal Components in the S&P deviation of the estimation error. The setup for these simulations is identical to that of Tables –. To save space, we do not report the eigenvectors in dimensions 100 Stocks larger than . Given the encouraging Monte Carlo results, we now con- 3 duct PCA on intraday returns of S&P 100 Index (OEX) con- Stock Exchange (NYSE). Due to entry and exit from the index, stituents. 
We collect stock prices over the 2003–2012 period there are in total 154 tickers over this 10-year period. We con- fromtheTradeandQuote(TAQ)databaseoftheNewYork duct PCA on a weekly basis. One of the key advantages of the large amount of high-frequency data is that we are effectively Table . Simulation results: Eigenvector estimation. able to create a 10-year long time series of eigenvalues and prin- cipalcomponentsattheweeklyfrequencybycollatingtheresults  week,  sec  week,  min obtained over each week in the sample. # Stocks True Bias Stdev True Bias Stdev After removing those weeks during which the index con- . − . . . . . stituents are switching, we are left with 482 weeks. To clean the . . . . . . data, for each ticker on each trading day, we only keep the intra-  . . . . − . . day prices from the single exchange which has the largest num- . . . . . . . . . . . . ber of transaction records. We provide in Figure 3 the quan- . . . . . . tiles of the number of transactions between 9:35 a.m. EST and . . . . . . 3:55 p.m. EST, across 100 stocks each day. These stocks have . . . . . . . . . . . . excellent liquidity over the sampling period. We thereby employ  . . . . . . 1-min subsamples for the most liquid 90 stocks in the index . . . . . . (for simplicity, we still refer to this 90-stock subsample as the . . . . . . . . . . . . S&P 100) in order to address any potential concerns regarding 4 . . . . . . microstructure noise and asynchronous trading. These stocks . . . . . .

NOTE: In this table, we report the summary statistics of  Monte Carlo simula-  While the constituents of the OEX Index change over time, we keep track of the tions for estimating the integrated eigenvector associated with the first eigen- changes to ensure that our choice of stocks is always in line with the index con- value. Column “True” corresponds to the average of the true integrated vector; stituents. Column “Bias” corresponds to the mean of the estimation error; Column “Stdev”  Estimators that are robust to both noise and asynchronicity are available for inte- is the standard deviation of the estimation error. The setup for these simulations grated covariance estimation in Aït-Sahalia, Fan, and Xiu (), Christensen, Kin- is identical to that of Tables –. nebrock, and Podolskij (), Barndorff-Nielsen et al. (), and Shephard and 298 Y. AÏT-SAHALIA AND D. XIU
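To make the sampling scheme concrete, the following Python sketch (using pandas; the function name one_minute_returns and the column names 'price' and 'exchange' are hypothetical) implements, for a single ticker on a single day, the cleaning steps just described: keep the exchange with the most transaction records, sample the last trade price within each minute between 9:35 a.m. and 3:55 p.m. EST, and compute within-day log-returns so that no overnight returns enter.

import numpy as np
import pandas as pd

def one_minute_returns(trades):
    # trades : DataFrame for one ticker on one trading day, with a DatetimeIndex
    #          and (hypothetical) columns 'price' and 'exchange' holding the TAQ records
    # keep only the exchange with the largest number of transaction records that day
    top_exchange = trades["exchange"].value_counts().idxmax()
    px = trades.loc[trades["exchange"] == top_exchange, "price"]
    # restrict to 9:35 a.m.-3:55 p.m. EST and take the latest trade within each minute
    px = px.between_time("09:35", "15:55")
    grid = px.resample("1min").last().ffill()
    # within-day log-returns; overnight returns never enter by construction
    return np.log(grid).diff().dropna()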

Figure . Finite sample distribution of the standardized statistics. Note: In this figure, we report the histograms of the  simulation results for estimating the first four integrated eigenvalues using -sec returns for  stocks over one week. The purpose of this figure is to validate the asymptotic theory, hence the use of a short -sec sampling interval. The smallest  eigenvalues are identical so that the fourth eigenvalue is repeated. The solid lines plot the standard normal density; the dashed histograms report the distribution of the estimates before bias correction; the solid histograms report the distribution of the estimates after bias correction is applied and is to be compared to the asymptotic standard normal. Because the fourth eigenvalue is small, the dashed histogram on the fourth subplot is out of the x-axis range, to the right.

Figure . Finite sample distribution of the standardized statistics. Note: In this figure, we report the histograms of the integrated simple eigenvalues using weekly -min returns of  stocks, a setting that matches thatofourempiricalanal- ysis below. Columns , , and  report the results with r = 2, , and  common factors in the data generating process, respectively. The number of Monte Carlo replications is . JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 299

Figure . The liquidity of S&P  index components. Note: In this figure, we provide quartiles of the average time between successive Figure . The scree graph. transactions, or sampling frequencies, for the  stocks S&P  index constituents Note: In this figure, we report the time series average over the entire sample of the between  and . We exclude the first and the last  min from the . regular estimated eigenvalues against their order, or “scree graph.”These integrated eigen- trading hours in the calculation, during which trading intensities are unusually high. values are estimated assuming that all of them are simple. The y-axis reports the corresponding average sampling frequencies in seconds, computed from the high-frequency transactions in the sample. From these trans- actions, we construct a synchronous -min sample using the latest tick recorded within that minute. The  least liquid stocks in the index are included in this fig- the appropriate number of components in empirical applica- ure but excluded from the empirical analysis that follows. tions. The graph reveals that there are a handful of eigenvalue estimates, three, which are separated from the rest. The fact that three factors on average explain the cross-sectional variation of stock returns is surprisingly consistent with the well-established low-frequency Fama–French common factor analysis, despite the large differences in methods, time periods, sample frequen- cies, and length of observation. We also plot the estimates of the percentage variation explained by the first three integrated eigenvalues in Figure 6, along with the average variation explained by the remaining eigenvalues.Therearesubstantialtimevariationofthefirstthree eigenvalues, all of which are at peak around the recent finan- cial crisis, indicating an increased level of comovement, which in this context is the definition of systemic risk. The idiosyn- cratic factors become relatively less important and even more dominated by the common factors during the crisis. Figure . Time series of the cumulative returns of S&P  index components. The first eigenvalue accounts for on average 30%–40% of the Note: In this figure, we plot the time series of cumulative daily open-to-close returns total variations of the 90 constituents, capturing the extent of of  S&P  index components from  to . The thick black solid line plots the cumulative daily open-to-close S&P  index returns. All overnight returns are excluded. Companies that exited the index during the sample period, including bankruptcies, are represented by a time series truncated at the time of delisting. trade multiple times per sampling interval and we compute returnsusingthelatestpricesrecordedwithineachsampling interval. Overnight returns are removed so that there is no con- cern of price changes due to dividend distributions or stock splits. The 158 time series of cumulative returns are plotted in Figure 4.

We also plot the estimates of the percentage variation explained by the first three integrated eigenvalues in Figure 6, along with the average variation explained by the remaining eigenvalues. There is substantial time variation in the first three eigenvalues, all of which peak around the recent financial crisis, indicating an increased level of comovement, which in this context is the definition of systemic risk. The idiosyncratic factors become relatively less important and even more dominated by the common factors during the crisis.

The first eigenvalue accounts for on average 30%–40% of the total variation of the 90 constituents, capturing the extent of the market variation in the sample. The second and third components together account for an additional 15%–20% of the total variation. Therefore, there remains a significant amount of idiosyncratic variation, beyond what can be explained by a three-common-factor model. The average variation explained by the remaining 87 components is around 0.4%–0.6%, which corresponds to the idiosyncratic contributions relative to the first three principal components.

Figure 6. Time series of realized eigenvalues. Note: In this figure, we plot the time series of the three largest realized eigenvalues, representing the percentage variations explained by the corresponding principal components, using weekly 1-min returns of the 90 most liquid S&P 100 index constituents from 2003 to 2012. For comparison, we also plot the average of the remaining 87 eigenvalues.
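The percentage variation attributed to each component, as plotted in Figure 6, is simply each weekly integrated eigenvalue divided by the sum of all of them for that week. A minimal sketch of this bookkeeping, with hypothetical names, is:

import numpy as np

def variation_explained(int_eigvals):
    # int_eigvals : (weeks x d) array of weekly integrated eigenvalue estimates, with d > 3
    lam = np.asarray(int_eigvals, dtype=float)
    shares = lam / lam.sum(axis=1, keepdims=True)   # each row sums to one
    top3 = shares[:, :3]                            # shares of the first three components
    rest_avg = shares[:, 3:].mean(axis=1)           # average share of the remaining ones
    return top3, rest_avg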

Figure . Time series of cumulative realized principal components. Figure . Biplot of the integrated eigenvector: Pre-crisis week. Note: In this figure, we compare the cumulative returns of the first three principal Note: This biplot shows the integrated eigenvector estimated during the week components, using weekly -min returns of the S&P  index constituents from March –, . Each dot and the vector joining it and the origin represents a firm.  to . The x and y coordinates of a firm denote its average loading on the first and second principal components, respectively. The dots and lines associated with firms in the financial sector are colored in red and dashed, while nonfinancial firms are colored themarketvariationinthesample.Thesecondandthirdcom- in blue and drawn as solid lines. The figures shows little difference between the two ponents together account for an additional 15%–20% of the total groups during that week. variation. Therefore, there exists significant amount of remain- ing idiosyncratic variation, beyond what can be explained by a the variation of the variable. Moreover, vectors associated with three-common-factor model. The average variation explained similar variables tend to point toward the same direction, hence by the remaining 87 components are around 0.4%–0.6%, which abiplotcanbeusedforclassificationoftheassets.Whileitis corresponds to the idiosyncratic contributions relative to the possible to compute biplots for the spot eigenvectors, the biplots first three principal components. for the integrated eigenvectors are more stable. Next, we compare the cumulative returns of the first three We report two representative snapshots of the biplots of the cumulative principal components. The time series plots are first two integrated eigenvectors in Figures 8 (March 7–11, 2005, shown in Figure 7 . The empirical correlations among the three preceding the crisis) and 9 (March 10–14, 2008, during the crisis, components are −0.020, 0.005, and −0.074, respectively, which although not at its most extreme), in order to interpret the sec- agrees with the design that the components are orthogonal. The ond principal component. We identify differently in the figures first principal component accounts for most of the common the vectors that correspond to financial stocks (identified using variation, and it shares the time series features of the overall mar- the Global Industrial Classification Standard (GICS) codes). The ket return. This is further reinforced by the fact that the loadings Federal Reserve made several important announcement during of all the stocks on the first principal component, although time- varying, remain positive throughout the sample period, which is not the case for the additional principal components. It is worth pointing out however that these principal components are not constrained to be portfolio returns, as their weights at each point in time are nowhere constrained to add up to one. This means that the first principal component cannot be taken directly to be the market portfolio, or more generally identified with addi- tional Fama–French or additional mimicking portfolios.

5.2. Biplots

Although PCA is by design not suited to identifying the underlying economic factors corresponding to the principal components, the time series evidence in Figure 7 suggests that the first principal component captures the features of the overall market return, as already noted. Can we extract more information about the economic nature of the principal components? For this purpose, we can use our results to produce biplots (see Gabriel 1971) and see which assets line up together in the basis of the principal components. A biplot gives the relative position of the d variables on a plot where two principal components are the axes. The length of a vector from the origin pointing toward the position of a variable on the biplot tends to reflect the magnitude of the variation of that variable. Moreover, vectors associated with similar variables tend to point in the same direction, hence a biplot can be used for classification of the assets. While it is possible to compute biplots for the spot eigenvectors, the biplots for the integrated eigenvectors are more stable.

We report two representative snapshots of the biplots of the first two integrated eigenvectors in Figures 8 (March 7–11, 2005, preceding the crisis) and 9 (March 10–14, 2008, during the crisis, although not at its most extreme), in order to interpret the second principal component. We identify differently in the figures the vectors that correspond to financial stocks (identified using the Global Industrial Classification Standard (GICS) codes). The Federal Reserve made several important announcements during the week of March 10–14, 2008 to address heightened liquidity pressures in the financial system, including the creation of the Term Securities Lending Facility and the approval of a financing arrangement between J.P. Morgan Chase and Bear Stearns. We find that the financial sector is clearly separated from the rest of the stocks during March 10–14, 2008, in sharp contrast with the biplot for the week of March 7–11, 2005, during which no such separation occurs. This suggests that PCA based on low-frequency data is likely to overlook these patterns. Interestingly, we can also see from the March 2008 biplot that Lehman Brothers (LEH) stands out, having the largest loadings on both components. The corresponding scree plots for these two weeks, in Figures 10 and 11, both suggest the presence of at least three factors, but the eigenvalues in March 10–14, 2008 are much larger in magnitude, consistent with a higher systematic component to asset returns. For each week, the figures report 95% confidence intervals for the first three eigenvalues computed based on the distribution given in Corollary 1.

Figure 8. Biplot of the integrated eigenvector: Pre-crisis week. Note: This biplot shows the integrated eigenvector estimated during the week of March 7–11, 2005. Each dot and the vector joining it and the origin represents a firm. The x and y coordinates of a firm denote its average loading on the first and second principal components, respectively. The dots and lines associated with firms in the financial sector are colored in red and dashed, while nonfinancial firms are colored in blue and drawn as solid lines. The figure shows little difference between the two groups during that week.

Figure 9. Biplot of the integrated eigenvector: Crisis week. Note: This biplot shows the integrated eigenvector estimated during the week of March 10–14, 2008. Each dot and the vector joining it and the origin represents a firm. The x and y coordinates of a firm denote its average loading on the first and second principal components, respectively. The dots and lines associated with firms in the financial sector are colored in red and dashed, with the following singled out: American International Group (AIG), Bank of America (BOA), Citigroup (C), Goldman Sachs (GS), J.P. Morgan Chase (JPM), and Lehman Brothers (LEH). Nonfinancial firms are colored in blue and drawn as solid lines. The figure shows a sharp distinction between the two groups' dependence on the second principal component during that week.
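Biplots of the kind shown in Figures 8 and 9 can be reproduced along the following lines; this is a minimal matplotlib sketch with hypothetical inputs (a d × 2 array of loadings on the first two integrated eigenvectors, ticker labels, and a GICS-based financial-sector indicator), not the exact plotting code used for the figures.

import matplotlib.pyplot as plt

def biplot(loadings, labels, financial):
    # loadings  : (d x 2) array of each firm's loadings on the first two integrated eigenvectors
    # labels    : list of d ticker strings
    # financial : list of d booleans flagging financial-sector firms (e.g., from GICS codes)
    fig, ax = plt.subplots()
    for (x, y), name, fin in zip(loadings, labels, financial):
        style = {"color": "red", "linestyle": "--"} if fin else {"color": "blue"}
        ax.plot([0, x], [0, y], **style)            # vector from the origin to the firm
        ax.scatter([x], [y], color=style["color"], s=10)
        ax.annotate(name, (x, y), fontsize=6)
    ax.set_xlabel("loading on first principal component")
    ax.set_ylabel("loading on second principal component")
    return ax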

Figure 10. Scree plot of the integrated eigenvalues: Pre-crisis week. Note: This figure contains a scree plot of the integrated eigenvalues estimated during the week of March 7–11, 2005. The blue solid lines provide 95% confidence intervals for the first three eigenvalues computed based on the results of Corollary 1.

Figure 11. Scree plot of the integrated eigenvalues: Crisis week. Note: This figure contains a scree plot of the integrated eigenvalues estimated during the week of March 10–14, 2008. The blue solid lines provide 95% confidence intervals for the first three eigenvalues computed based on the results of Corollary 1.

6. Conclusions

This article develops the tools necessary to implement PCA at high frequency, constructing and estimating realized eigenvalues, eigenvectors, and principal components. This development complements the classical PCA theory in a number of ways. Compared to its low-frequency counterpart, PCA becomes feasible over short windows of observation (of the order of one week) and relatively large dimensions (90 in our application), and is further free from the need to impose strong parametric assumptions on the distribution of the data, applying instead to a broad class of semimartingales. The estimators perform well in simulations and reveal that the joint dynamics of the S&P 100 stocks at high frequency are well explained by a three-factor model, a result that is broadly consistent with the Fama–French factor model at low frequency, surprisingly so given the large differences in time scale, sampling, and returns horizon.

This article represents a necessary first step to bring PCA tools to a high-frequency setting. Although not empirically relevant in the context of the above analysis of a portfolio of highly liquid stocks at the 1-min frequency, the next steps in the development of the theory will likely include the incorporation of noise-robust and asynchronicity-robust covariance estimation methods. We hope to pursue these extensions in future work.

Acknowledgments

The authors thank the Associate Editor and two referees for helpful comments and suggestions. The authors also thank Hristo Sendov for valuable discussions on eigenvalues and spectral functions. In addition, the authors benefited from comments by Torben Andersen, Tim Bollerslev, Oleg Bondarenko, Marine Carrasco, Gary Chamberlain, Kirill Evdokimov, Jianqing Fan, Christian Hansen, Jerry Hausman, Jean Jacod, Ilze Kalnina, Jia Li, Yingying Li, Oliver Linton, Nour Meddahi, Per Mykland, Ulrich Müller, Andrew Patton, Eric Renault, Jeffrey Russell, Neil Shephard, George Tauchen, Viktor Todorov, Ruey Tsay, and Xinghua Zheng, as well as seminar and conference participants at Brown University, CEMFI, Duke University, Harvard University, MIT, Monash University, Northwestern University, Peking University, Princeton University, Singapore Management University, Stanford University, University of Amsterdam, University of Chicago, University of Illinois at Chicago, University of Tokyo, the 2015 North American Winter Meeting of the Econometric Society, the CEME Young Econometricians Workshop at Cornell, the NBER-NSF Time Series Conference in Vienna, the 10th International Symposium on Econometric Theory and Applications, the 7th Annual SoFiE Conference, the 2015 Financial Econometrics Conference in Toulouse, and the 6th French Econometrics Conference.

References

Aït-Sahalia, Y., Fan, J., and Xiu, D. (2010), "High-Frequency Covariance Estimates with Noisy and Asynchronous Data," Journal of the American Statistical Association, 105, 1504–1517.
Aït-Sahalia, Y., and Jacod, J. (2014), High Frequency Financial Econometrics, Princeton, NJ: Princeton University Press.
Aït-Sahalia, Y., and Xiu, D. (2017), "Using Principal Component Analysis to Estimate a High Dimensional Factor Model with High-Frequency Data," Journal of Econometrics, 201, 384–399.
Amengual, D., and Watson, M. W. (2007), "Consistent Estimation of the Number of Dynamic Factors in a Large N and T Panel," Journal of Business and Economic Statistics, 25, 91–96.
Amini, A. A., and Wainwright, M. J. (2009), "High-Dimensional Analysis of Semidefinite Relaxations for Sparse Principal Components," Annals of Statistics, 37, 2877–2921.

Anderson, T. W. (1958), An Introduction to Multivariate Statistical Analysis, New York: Wiley.
——— (1963), "Asymptotic Theory for Principal Component Analysis," Annals of Mathematical Statistics, 34, 122–148.
Bai, J. (2003), "Inferential Theory for Factor Models of Large Dimensions," Econometrica, 71, 135–171.
Bai, J., and Ng, S. (2002), "Determining the Number of Factors in Approximate Factor Models," Econometrica, 70, 191–221.
Bai, Z. (1999), "Methodologies in Spectral Analysis of Large Dimensional Random Matrices: A Review," Statistica Sinica, 9, 611–677.
Bai, Z., and Silverstein, J. W. (2006), Spectral Analysis of Large Dimensional Random Matrices, New York: Springer.
Baker, M., and Wurgler, J. (2006), "Investor Sentiment and the Cross-Section of Stock Returns," The Journal of Finance, 61, 1645–1680.
Baker, S. R., Bloom, N., and Davis, S. J. (2016), "Measuring Economic Policy Uncertainty," The Quarterly Journal of Economics, 131, 1593–1636.
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2011), "Multivariate Realised Kernels: Consistent Positive Semi-Definite Estimators of the Covariation of Equity Prices with Noise and Non-Synchronous Trading," Journal of Econometrics, 162, 149–169.
Black, F. (1976), "Studies of Stock Price Volatility Changes," in Proceedings of the 1976 Meetings of the American Statistical Association, pp. 171–181.
Brillinger, D. R. (2001), Time Series: Data Analysis and Theory, Classics in Applied Mathematics (Book 36), Philadelphia, PA: Society for Industrial and Applied Mathematics.
Chamberlain, G., and Rothschild, M. (1983), "Arbitrage, Factor Structure, and Mean-Variance Analysis on Large Asset Markets," Econometrica, 51, 1281–1304.
Christensen, K., Kinnebrock, S., and Podolskij, M. (2010), "Pre-Averaging Estimators of the Ex-Post Covariance Matrix in Noisy Diffusion Models with Non-Synchronous Data," Journal of Econometrics, 159, 116–133.
Comte, F., and Renault, E. (1996), "Long Memory Continuous Time Models," Journal of Econometrics, 73, 101–149.
Comte, F., and Renault, E. (1998), "Long Memory in Continuous-Time Stochastic Volatility Models," Mathematical Finance, 8, 291–323.
Connor, G., and Korajczyk, R. (1993), "A Test for the Number of Factors in an Approximate Factor Model," The Journal of Finance, 48, 1263–1291.
d'Aspremont, A., Ghaoui, L. E., Jordan, M. I., and Lanckriet, G. R. G. (2007), "A Direct Formulation for Sparse PCA Using Semidefinite Programming," SIAM Review, 49, 434–448.
Delbaen, F., and Schachermayer, W. (1994), "A General Version of the Fundamental Theorem of Asset Pricing," Mathematische Annalen, 300, 463–520.
Eaton, M. L., and Tyler, D. E. (1991), "On Wielandt's Inequality and Its Application to the Asymptotic Distribution of the Eigenvalues of a Random Symmetric Matrix," The Annals of Statistics, 19, 260–271.
Egloff, D., Leippold, M., and Wu, L. (2010), "The Term Structure of Variance Swap Rates and Optimal Variance Swap Investments," Journal of Financial and Quantitative Analysis, 45, 1279–1310.
Fama, E. F., and French, K. R. (1993), "Common Risk Factors in the Returns on Stocks and Bonds," Journal of Financial Economics, 33, 3–56.
Fan, J., Furger, A., and Xiu, D. (2016), "Incorporating Global Industrial Classification Standard into Portfolio Allocation: A Simple Factor-Based Large Covariance Matrix Estimator with High Frequency Data," Journal of Business and Economic Statistics, 34, 489–503.
Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2000), "The Generalized Dynamic-Factor Model: Identification and Estimation," The Review of Economics and Statistics, 82, 540–554.
——— (2004), "The Generalized Dynamic Factor Model: Consistency and Rates," Journal of Econometrics, 119, 231–255.
Forni, M., Hallin, M., Lippi, M., and Zaffaroni, P. (2015), "Dynamic Factor Models with Infinite-Dimensional Factor Spaces: One-Sided Representations," Journal of Econometrics, 185, 359–371.
——— (2017), "Dynamic Factor Models with Infinite-Dimensional Factor Space: Asymptotic Analysis," Journal of Econometrics, 199, 74–92.
Forni, M., and Lippi, M. (2001), "The Generalized Dynamic Factor Model: Representation Theory," Econometric Theory, 17, 1113–1141.
Friedland, S. (1981), "Convex Spectral Functions," Linear and Multilinear Algebra, 9, 299–316.
Gabriel, K. (1971), "The Biplot Graphic Display of Matrices with Application to Principal Component Analysis," Biometrika, 58, 453–467.
Hastie, T., Tibshirani, R., and Friedman, J. (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.), Springer Series in Statistics, New York: Springer-Verlag.
Heinrich, C., and Podolskij, M. (2014), "On Spectral Distribution of High Dimensional Covariation Matrices," CREATES Research Paper 2014-54, University of Aarhus.
Hotelling, H. (1933), "Analysis of a Complex of Statistical Variables into Principal Components," Journal of Educational Psychology, 24, 417–441, 498–520.
Jackson, J. E. (2003), A User's Guide to Principal Components, Wiley.
Jacod, J., Lejay, A., and Talay, D. (2008), "Estimation of the Brownian Dimension of a Continuous Itô Process," Bernoulli, 14, 469–498.
Jacod, J., and Podolskij, M. (2013), "A Test for the Rank of the Volatility Process: The Random Perturbation Approach," Annals of Statistics, 41, 2391–2427.
Jacod, J., and Protter, P. (2012), Discretization of Processes, New York: Springer-Verlag.
Jacod, J., and Rosenbaum, M. (2013), "Quarticity and Other Functionals of Volatility: Efficient Estimation," Annals of Statistics, 41, 1462–1484.
Johnstone, I. M. (2001), "On the Distribution of the Largest Eigenvalue in Principal Components Analysis," Annals of Statistics, 29, 295–327.
Johnstone, I. M., and Lu, A. Y. (2009), "On Consistency and Sparsity for Principal Components Analysis in High Dimensions," Journal of the American Statistical Association, 104, 682–693.
Jolliffe, I. T. (2002), Principal Component Analysis, Springer-Verlag.
Jolliffe, I. T., Trendafilov, N. T., and Uddin, M. (2003), "A Modified Principal Component Technique Based on the LASSO," Journal of Computational and Graphical Statistics, 12, 531–547.
Kalnina, I., and Xiu, D. (2017), "Nonparametric Estimation of the Leverage Effect: A Trade-Off Between Robustness and Efficiency," Journal of the American Statistical Association, 112, 384–396.
Kao, C., Trapani, L., and Urga, G. (2018), "Testing for Instability in Covariance Structures," Bernoulli, 24, 740–771.
Kapetanios, G. (2010), "A Testing Procedure for Determining the Number of Factors in Approximate Factor Models," Journal of Business and Economic Statistics, 28, 397–409.
Li, J., Todorov, V., and Tauchen, G. (2017), "Adaptive Estimation of Continuous-Time Regression Models Using High-Frequency Data," Journal of Econometrics, 200, 36–47.
——— (2016), "Inference Theory on Volatility Functional Dependencies," Journal of Econometrics, 193, 17–34.
Li, J., and Xiu, D. (2016), "Generalized Method of Integrated Moments for High-Frequency Data," Econometrica, 84, 1613–1633.
Litterman, R., and Scheinkman, J. (1991), "Common Factors Affecting Bond Returns," Journal of Fixed Income, 1, 54–61.
Magnus, J. R., and Neudecker, H. (1999), Matrix Differential Calculus With Applications in Statistics and Economics, Hoboken, NJ: Wiley.
Mykland, P. A., and Zhang, L. (2009), "Inference for Continuous Semimartingales Observed at High Frequency," Econometrica, 77, 1403–1445.
Okamoto, M. (1973), "Distinctness of the Eigenvalues of a Quadratic Form in a Multivariate Sample," Annals of Statistics, 1, 763–765.
Onatski, A. (2010), "Determining the Number of Factors from Empirical Distribution of Eigenvalues," Review of Economics and Statistics, 92, 1004–1016.
Paul, D. (2007), "Asymptotics of Sample Eigenstructure for a Large Dimensional Spiked Covariance Model," Statistica Sinica, 17, 1617–1642.
Pearson, K. (1901), "On Lines and Planes of Closest Fit to Systems of Points in Space," Philosophical Magazine, 2, 559–572.
Pelger, M. (2015a), "Large-Dimensional Factor Modeling Based on High-Frequency Observations," Discussion paper, Stanford University.

——— (2015b), "Understanding Systematic Risk: A High-Frequency Approach," Discussion paper, Stanford University.
Reiß, M., Todorov, V., and Tauchen, G. E. (2015), "Nonparametric Test for a Constant Beta between Itô Semi-Martingales Based on High-Frequency Data," Stochastic Processes and their Applications, 125, 2955–2988.
Ross, S. A. (1976), "The Arbitrage Theory of Capital Asset Pricing," Journal of Economic Theory, 13, 341–360.
Shephard, N., and Xiu, D. (2017), "Econometric Analysis of Multivariate Realized QML: Estimation of the Covariation of Equity Prices Under Asynchronous Trading," Journal of Econometrics, 201, 19–42.
Stock, J. H., and Watson, M. W. (1999), "Forecasting Inflation," Journal of Monetary Economics, 44, 293–335.
——— (2002), "Forecasting Using Principal Components from a Large Number of Predictors," Journal of the American Statistical Association, 97, 1167–1179.
Tao, T. (2012), Topics in Random Matrix Theory, Providence, RI: American Mathematical Society.
Trapani, L. (2017), "A Randomised Sequential Procedure to Determine the Number of Factors," Journal of the American Statistical Association, forthcoming.
Tyler, D. E. (1981), "Asymptotic Inference for Eigenvectors," Annals of Statistics, 9, 725–736.
Wang, W., and Fan, J. (2017), "Asymptotics of Empirical Eigenstructure for Ultra-High Dimensional Spiked Covariance Model," Annals of Statistics, 45, 1342–1374.
Waternaux, C. M. (1976), "Asymptotic Distribution of the Sample Roots for a Nonnormal Population," Biometrika, 63, 639–645.
Wu, W. B., and Zaffaroni, P. (2018), "Asymptotic Theory for Spectral Density Estimates of General Multivariate Time Series," Econometric Theory, 34, 1–22.
Zheng, X., and Li, Y. (2011), "On the Estimation of Integrated Covariance Matrices of High Dimensional Diffusion Processes," Annals of Statistics, 39, 3121–3151.
Zou, H., Hastie, T., and Tibshirani, R. (2006), "Sparse Principal Component Analysis," Journal of Computational and Graphical Statistics, 15, 265–286.