IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 67, NO. 5, MARCH 1, 2019 1353

Estimation of the Evolutionary Spectra With Application to Stationarity Test

Yu Xiang, Member, IEEE, Jie Ding, Member, IEEE, and Vahid Tarokh, Fellow, IEEE

Abstract—In this paper, we propose a new inference procedure for understanding non-stationary processes, under the framework of evolutionary spectra developed by Priestley. Among various frameworks of modeling non-stationary processes, the distinguishing feature of the evolutionary spectra is its focus on the physical meaning of frequency. The classical estimate of the evolutionary spectral density is based on a double-window technique consisting of a short-time Fourier transform and a smoothing step. However, smoothing is known to suffer from the so-called bias leakage problem. By incorporating Thomson’s multitaper method, which was originally designed for stationary processes, we propose an improved estimate of the evolutionary spectral density, and analyze its bias/variance/resolution tradeoff. As an application of the new estimate, we further propose a non-parametric rank-based stationarity test, and provide various experimental studies.

Index Terms—Non-stationary processes, evolutionary spectra, spectral analysis, multitaper method, stationarity test.

Manuscript received February 24, 2018; revised September 26, 2018 and October 29, 2018; accepted November 26, 2018. Date of publication January 1, 2019; date of current version January 16, 2019. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. D. Robert Iskander. This work was supported by Defense Advanced Research Projects Agency (DARPA) under Grants W911NF-14-1-0508, W911NF-16-1-0561, and N66001-15-C-4028. (Corresponding author: Yu Xiang.)
Y. Xiang is with the Electrical and Computer Engineering Department, University of Utah, Salt Lake City, UT 84112 USA (e-mail: [email protected]).
J. Ding is with the School of Statistics, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: [email protected]).
V. Tarokh is with the Electrical and Computer Engineering Department, Duke University, Durham, NC 27708 USA (e-mail: [email protected]).
This paper has supplementary downloadable material available at http://ieeexplore.ieee.org, including software implementations (in MATLAB) and real data used in the paper. This material is 20 KB in size.
Digital Object Identifier 10.1109/TSP.2018.2890369
1053-587X © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

NONSTATIONARY processes are common across a variety of areas and serve as a natural generalization of the classical wide-sense stationary processes. Because of their wide range of applications, they have been actively studied in many different areas including signal processing, statistics, neuroscience, and economics.

However, the intrinsic complexity of the non-stationarity precludes a unique way of modeling the non-stationary processes. Various frameworks have been developed over the past few decades: instantaneous power spectra [1], evolutionary spectra [2], Wigner-Ville spectral analysis [3], locally stationary processes [4], and local cosine basis [5] among others. In this work, we adopt the evolutionary spectra framework developed by Priestley and his colleagues [2], [6]–[8], which is one of the first attempts to model non-stationary processes from the spectral point of view. The appealing aspect of this framework is its emphasis on the physical meaning of frequency, while generalizing the spectral representation of the stationary processes to that of the non-stationary processes [9].

Perhaps the most closely related framework is the locally stationary processes framework. Since Dahlhaus developed this framework in a series of papers [4], [10], [11], it has been extensively studied (see for example [12], [13] and references therein). We attempt to summarize the main differences between the two frameworks as follows. The evolutionary spectra framework is motivated by the physical interpretation of frequency, but does not guarantee the uniqueness of the spectral density. On the other hand, the locally stationary processes framework guarantees the uniqueness of the spectral density by providing an asymptotic analysis of the non-stationary processes. However, the rescaling technique, which is central to that framework, may sacrifice physical interpretations in some real applications. One related framework is developed based on SLEX (smooth localized complex exponential basis) functions. Interestingly, it is shown to be asymptotically mean square equivalent to Dahlhaus’s framework [14]. These SLEX based methods are quite useful for very long time series, as the dyadic segmentation could help as a first approximation step. More comparisons between Priestley’s and Dahlhaus’s frameworks can also be found in [15] and detailed discussions on the other frameworks can be found in [9], [16] and the references therein.

The estimation procedure of the evolutionary spectra in [2] is based on the so-called double-window technique, consisting of a short-time Fourier transform and smoothing. However, the smoothing step is known to suffer from the so-called bias leakage problem [17]. To overcome this problem for stationary processes, various tapering methods have been developed and Thomson’s multitaper method [18] is arguably the most widely used one. In this work, we apply the multitaper method to the estimation of evolutionary spectral density and analyze the bias/variance/resolution tradeoff of the estimate. We show that the non-stationarity calls for additional considerations of the tradeoff, which provides insights into window design, choice of frequency resolution and number of tapers. As an application of the estimate, we propose a non-parametric rank-based stationarity test and compare it with the stationarity test investigated by Priestley and Subba Rao in [7]. Our test is more robust to the underlying distribution of the data, and our numerical experiments suggest that it can serve as a complementary test to the existing stationarity tests.

There are a few other related works in the literature. In [19], the authors expressed Priestley’s two step approach in the form of the multitaper formulation, where the number of tapers becomes the number of neighbors of the targeted time $t$ for the smoothing step. However, in this work we have removed the smoothing step and apply the multitaper method over the same segment of the process, i.e., with the same targeted time $t$ and frequency $w$. With additional smoothness assumptions on the underlying spectra, the authors in [20] have investigated the statistical properties of the spectra estimate.

The paper is organized as follows. In Section II, the evolutionary spectra framework is briefly reviewed. In Section III, the main results are summarized and the key steps of the proofs are presented. In Section IV, the estimate based on the short-time Fourier transform is evaluated through a new time-domain approach, which facilitates the analysis of the estimate in later sections and serves as a much simplified proof compared with the original one. In Section V, the estimate based on the multitaper method is analyzed in the evolutionary spectra framework. In Section VI, a non-parametric stationarity test is proposed and various experimental studies are presented.

A. Notation

Let $\mathbb{Z}$ and $\mathbb{R}$ denote the set of integers and real numbers, respectively. For integers $a$ and $b$ such that a

Remark 1: Note that $h_w(t) \equiv 1$ corresponds to the case when $\{X(t)\}$ is a stationary process, which leads to $H_w(v) \equiv \delta(v)$, where $\delta(\cdot)$ is the Dirac delta function.

Throughout this paper, we assume that $\mu(w)$ is absolutely continuous with respect to Lebesgue measure. Thus the evolutionary spectral density at time $t$ is

$$f_t(w) = |A_t(w)|^2 \frac{d\mu(w)}{dw}.$$

As mentioned above, for any fixed $w$, $H_w(v)$ is the Fourier transform of $h_w(t)$, i.e.,

$$H_w(v) = \sum_{t=-\infty}^{\infty} h_w(t) e^{-ivt}.$$

Without loss of generality, $A_t(w)$ can be normalized so that for all $w$,

$$A_0(w) = 1, \qquad (2)$$

which implies that $d\mu(w)$ represents the evolutionary spectrum at $t = 0$ and $|A_t(w)|^2$ represents the change relative to $t = 0$. Let

$$B_F(w) \triangleq \int_{-\pi}^{\pi} |v|\,|H_w(v)|\,dv,$$

the first window $\{g(u), u \in \mathbb{R}\}$ is defined as

$$B_g = \sum_{u=-\infty}^{\infty} |u|\,|g(u)|,$$

and $g(u)$ is time-limited, i.e., there exists $L$ such that $g(u) = 0$ for $|u| > L$. Let $G(w)$ denote the Fourier transform of $g(u)$, i.e.,

$$G(w) = \sum_{u=-\infty}^{\infty} g(u) e^{-iuw}. \qquad (4)$$

We assume that $g(u)$ is square integrable and without loss of generality it is normalized,

$$2\pi \sum_{u=-\infty}^{\infty} |g(u)|^2 = \int_{-\pi}^{\pi} |G(w)|^2\,dw = 1. \qquad (5)$$

B. Uniformly Modulated Processes

It is hard to characterize $B_X$ exactly for semi-stationary processes [22]. However, there is one important class of processes whose characteristic widths can be bounded from below. This class, termed the uniformly modulated processes [2], is of the following form:

$$X(t) = c(t) Y(t), \qquad (6)$$

where $Y(t)$ is a stationary process with zero mean and spectral density $f_Y(w)$, and the Fourier transform of $c(t)$ has an absolute maximum at the origin. Thus it follows straightforwardly that

$$X(t) = \int_{-\pi}^{\pi} c(t) e^{iwt}\,dZ(w),$$

where $E|dZ(w)|^2 = dF_Y(w)$. The process introduced in (6) is an oscillatory process since $\mathcal{F}_Y = \{c(t)e^{iwt}\}$ is a family of oscillatory functions. The evolutionary spectrum with respect to $\mathcal{F}_Y$ is

$$f_t(w) = c^2(t) f_Y(w).$$

The name, uniformly modulated processes, follows from the fact that for two different frequencies $w_1$ and $w_2$ in $[-\pi, \pi]$, the spectrum is modulated in the same way, i.e.,

$$\frac{f_{t_1}(w_1)}{f_{t_2}(w_1)} = \frac{f_{t_1}(w_2)}{f_{t_2}(w_2)}.$$

From the definition of $B_X$, we have $B_X \ge B_{\mathcal{F}_Y}$.

III. STATEMENTS OF THE MAIN RESULTS

For semi-stationary processes $\{X(t), 0 \le t \le T-1\}$, it is natural to apply the multitaper method [18], which identifies $K$ sequences of length $N$ denoted by $\{g_k(u), 1 \le k \le K, 1 \le u \le N\}$ (assume $N$ to be odd for simplicity of notation). Let $2\pi/N < W < \pi$ denote the frequency resolution of the multitaper method. The details are postponed to Section V. The estimate of the evolutionary spectral density $f_t(w)$ is given as below

$$\hat{f}_t^K(w) = \frac{1}{K} \sum_{k=0}^{K-1} |U_t^{(k)}(w)|^2, \qquad (7)$$

where we have

$$U_t^{(k)}(w) = \sum_{u=0}^{T-1} g_k(u-t) X(u) e^{-iwu}$$

with $g_k(u)$ being a set of sequences (shifted so that they are centered around 0) each of length $N$. The expectation of its evolutionary spectral density estimate using the multitaper method is given below.

Theorem 1:

$$E[\hat{f}_t^K(w)] = \int_{-\pi}^{\pi} \rho_K(w-\lambda) f_t(\lambda)\,d\lambda + O\big(B_g^{(K)}/B_X\big) = \int_{-\pi}^{\pi} \rho_K(\lambda) f_t(w-\lambda)\,d\lambda + O\big(B_g^{(K)}/B_X\big),$$

where

$$\rho_K(\lambda) \triangleq \frac{1}{K} \sum_{k=0}^{K-1} |G_k(\lambda)|^2, \qquad B_g^{(K)} \triangleq \max_k B_{g_k},$$

and $B_g^{(K)}$ is sufficiently smaller than $B_X$.

Assuming that $\|f_t(w)\|_\infty$ is bounded for all $t$, the bias of the estimate can be bounded as follows.

Theorem 2: Assume that $\|f_t(w)\|_\infty < \infty$,

$$\big|\mathrm{Bias}(\hat{f}_t^K(w))\big| = \big|E[\hat{f}_t^K(w)] - f_t(w)\big| = O\left(\frac{\log N}{K} + W^2 + \frac{B_g^{(K)}}{B_X}\right).$$

When $\{X(t)\}$ is further assumed to be a normal process, the variance of $\hat{f}_t^K(w)$ can be characterized as follows.

Theorem 3: Assume that $\|f_t(w)\|_\infty < \infty$ and that $\{X(t)\}$ is a normal process,

$$\mathrm{Var}(\hat{f}_t^K(w)) = O\left(\frac{1}{K} + \frac{B_g^{(K)}}{B_X}\right).$$

From Theorems 2 and 3, the mean squared error (MSE) of $\hat{f}_t^K(w)$ is given by the following.

Corollary 1: Given the same assumptions as in Theorem 3,

$$\mathrm{MSE}(\hat{f}_t^K(w)) = O\left(\left(\frac{\log N}{K}\right)^2 + W^4 + \frac{1}{K} + \frac{B_g^{(K)}}{B_X}\right).$$

The proofs of the results are postponed to Section V and the appendices, and we briefly overview the main ingredients here. Firstly, we analyze $|U(w)|^2$ for general $\{g(u)\}$, which serves as a preliminary estimate (Proposition 1 and Proposition 2) before applying the multitaper method. We take a different approach than Priestley did in [2]; in particular, we apply the pseudo δ-function argument (see Definition 1) directly in the time domain (Lemma 1) instead of in the frequency domain. This alternative approach allows us to carry out the analysis without introducing the generalized transfer function (for details see equation (6.6) and Theorem 7.2 in [2]). The benefits of this new approach are twofold: it makes the variance analysis for the multitaper method straightforward (Theorem 3) and provides a much simplified alternative proof of Proposition 2, which is a slightly different version of Theorem 8.1 in [2], and which then leads to Theorem 1. Secondly, by leveraging a recent approximation result (Theorem 4) on the multitaper method by Abreu and Romero [23], we analyze the bias/variance/resolution tradeoffs in the evolutionary spectra framework as in Corollary 1 based on Theorem 2 and Theorem 3.

IV. APPROXIMATELY UNBIASED ESTIMATE OF THE EVOLUTIONARY SPECTRA

In this section, we start with analyzing a preliminary estimate of $f_t(w)$. Recall that $g(u)$ is assumed to be a time-limited function. First introduce $J_t(w)$ as follows, for fixed $t \in \mathbb{Z}$ and $w \in [-\pi, \pi]$,

$$\begin{aligned}
J_t(w) &= \sum_{u=-\infty}^{\infty} g(u-t) X(u) e^{-iwu} \\
&= \sum_{u=-\infty}^{\infty} g(u-t) \left( \int_{-\pi}^{\pi} A_u(\lambda) e^{i\lambda u}\,dZ(\lambda) \right) e^{-iwu} \\
&\overset{(a)}{=} \int_{-\pi}^{\pi} \sum_{u=-\infty}^{\infty} g(u-t) A_u(\lambda) e^{-i(w-\lambda)u}\,dZ(\lambda) \\
&= \int_{-\pi}^{\pi} e^{-i(w-\lambda)t} \sum_{v=-\infty}^{\infty} g(v) A_{v+t}(\lambda) e^{-i(w-\lambda)v}\,dZ(\lambda),
\end{aligned}$$

where the summation and the integral can be exchanged in (a) because $g(u)$ is a time-limited function.

Remark 3: Note that $J_t(w)$ is equivalent to its counterpart $Y_t(w)$ in [2] when $g(u)$ is a symmetric function, i.e., $g(u) = g(-u)$ for all $u$.

In the following, we introduce the pseudo δ-function argument but apply it to the time domain directly. The analysis of the spectra estimate of $f_t(w)$ in [2] depends on an approximation called the pseudo δ-function, and the discrete counterpart can be defined as below. The continuous version can be defined similarly and is used in [2].

Definition 1: Consider two functions $a(\cdot): \mathbb{Z} \to \mathbb{R}$ and $b(\cdot): \mathbb{Z} \to \mathbb{R}$. Then $a(u)$ is a pseudo δ-function of order $\epsilon$ with respect to $b(u)$ if, for any $t \in \mathbb{Z}$, there exists $\epsilon$ not depending on $t$ such that

$$\left| \sum_{u=-\infty}^{\infty} a(u) b(u+t) - b(t) \sum_{u=-\infty}^{\infty} a(u) \right| < \epsilon.$$

Now we show the following.

Lemma 1: For family $\mathcal{F}^*$, $a(u) \triangleq g(u)e^{-iwu}$ is a pseudo δ-function of $b(u) \triangleq A_u(w)$ with order $O(B_g/B_X)$.

Proof: To clarify the role of $u$ as the argument, we write $A_w(u) = A_u(w)$ in this proof. For any $t \in \mathbb{Z}$,

$$\sum_{u=-\infty}^{\infty} g(u) A_w(u+t) e^{-iwu} = \sum_{u=-\infty}^{\infty} g(u) A_w(t) e^{-iwu} + R(t)$$

with

$$\begin{aligned}
|R(t)| &\overset{(a)}{=} \frac{1}{2\pi} \left| \sum_{u=-\infty}^{\infty} g(u) e^{-iwu} \int_{-\pi}^{\pi} H_w(v) \big(e^{iv(t+u)} - e^{ivt}\big)\,dv \right| \\
&\le \frac{1}{2\pi} \sum_{u=-\infty}^{\infty} |g(u)| \int_{-\pi}^{\pi} |H_w(v)|\,|e^{ivu} - 1|\,dv \\
&\overset{(b)}{\le} \frac{1}{2\pi} \sum_{u=-\infty}^{\infty} |u|\,|g(u)| \int_{-\pi}^{\pi} |v|\,|H_w(v)|\,dv \\
&\overset{(c)}{\le} O\big(B_g/B_X\big),
\end{aligned}$$

where in (a), $A_w(u)$ is substituted by

$$A_w(u) = \frac{1}{2\pi} \int_{-\pi}^{\pi} H_w(v) e^{ivu}\,dv,$$

(b) follows since $|e^{ix} - 1| \le |x|$, and (c) follows from the definition of $B_g$ and

$$\int_{-\pi}^{\pi} |v|\,|H_w(v)|\,dv = B_{\mathcal{F}^*}(w) \le 1/B_X. \qquad \blacksquare$$

Remark 4: Lemma 1 provides a delta approximation in the time domain directly, unlike Priestley’s frequency domain approach. The new machinery proposed here allows us to carry out the analysis without introducing the generalized transfer function [2]. The generalized transfer function also plays a key role in performing the mean and variance analysis (for details of the variance analysis, see Section III in [6]), thus our approach leads to a straightforward analysis of both the mean (Proposition 2) and the variance (Theorem 3).

From Lemma 1, $J_t(w)$ can be further expressed as follows.

Proposition 1:

$$\begin{aligned}
J_t(w) &= \int_{-\pi}^{\pi} e^{-i(w-\lambda)t} A_t(\lambda) \sum_{v=-\infty}^{\infty} g(v) e^{-i(w-\lambda)v}\,dZ(\lambda) + O\big(B_g/B_X\big) \int_{-\pi}^{\pi} e^{-i(w-\lambda)t}\,dZ(\lambda) \\
&= \int_{-\pi}^{\pi} A_t(\lambda) G(w-\lambda) e^{-i(w-\lambda)t}\,dZ(\lambda) + O\big(B_g/B_X\big) \int_{-\pi}^{\pi} e^{-i(w-\lambda)t}\,dZ(\lambda), \qquad (8)
\end{aligned}$$

where $G(w) = \sum_{v=-\infty}^{+\infty} g(v) e^{-iwv}$.

As one shall see, the relationship between the window choice $g(u)$ and the estimate $|J_t(w)|^2$ of $f_t(w)$ is revealed directly through this time domain approach. This leads to the following proposition.

Proposition 2:

$$E[|J_t(w)|^2] = \int_{-\pi}^{\pi} |G(w-\lambda)|^2 f_t(\lambda)\,d\lambda + O\big(B_g/B_X\big).$$

gible. This holds since we are dealing with $g(u)$ that is time-limited, i.e., $g(u) = 0$ for $|u| > N$ for some $N$. Thus we have for $N/2$

review Thomson’s multitaper method [18] and the discrete prolate spheroidal sequences (DPSS) or Slepian sequences [17], [27]–[29].

A. Thomson’s Multitaper Method

Consider $N$ sample records $\{X(0), \ldots, X(N-1)\}$.¹ Assume that the sampling frequency is 1, then for a sequence of length $N$, the fundamental frequency is $2\pi/N$ and the Nyquist frequency is $\pi$. For $2\pi/N < W < \pi$, one wishes to find sequences with spectral densities concentrated over $[-W, W]$. We will refer to $W$ as the resolution of the estimate. This problem was first investigated in a series of papers by Slepian, Landau, and Pollak [27]–[29]. The solution turns out to be a set of sequences $v_k(N,W;u)$, $0 \le u \le N-1$, $0 \le k \le N-1$, which satisfy the following eigenvalue equation

$$\sum_{u'=0}^{N-1} \frac{\sin W(u-u')}{\pi(u-u')}\, v_k(N,W;u') = \lambda_k(N,W)\, v_k(N,W;u).$$

These $N$ eigenvectors $v_k(N,W;\cdot)$ are called the discrete prolate spheroidal sequences and they are ordered by their eigenvalues $1 > \lambda_0(N,W) > \lambda_1(N,W) > \cdots > \lambda_{N-1}(N,W) > 0$. It is well-known that the first $K = \lfloor 2NW/2\pi \rfloor = \lfloor NW/\pi \rfloor$ eigenvalues are close to 1.

¹These $N$ sample records are a consecutive subsequence from $\{X(0), \ldots, X(T-1)\}$.

Remark 5: The choice of using $N$ as the length of the sample records instead of $T$ is on purpose. $T$ is the length of the whole sample records, while $N$ will be used as the length of a time-limited function $g(u)$ discussed later in this section. If the process is indeed stationary, one would choose $T = N$.

The discrete prolate spheroidal wave functions are denoted by $V_k(N,W;\lambda)$ for $1 \le k \le K$, where

$$V_k(N,W;\lambda) = (-1)^k \epsilon_k \sum_{u=0}^{N-1} v_k(N,W;u)\, e^{-i\lambda(u-(N-1)/2)},$$

where $\epsilon_k = 1$ when $k$ is even and $\epsilon_k = \sqrt{-1}$ when $k$ is odd. For simplicity of notation, we suppress $N$ and $W$ and write $v_k(u) = v_k(N,W;u)$, $V_k(\lambda) = V_k(N,W;\lambda)$, and $\lambda_k = \lambda_k(N,W)$. These $K$ functions satisfy two types of orthogonality over $[-W, W]$ and $[-\pi, \pi]$, respectively

$$\int_{-W}^{W} V_k(\lambda) V_l(\lambda)\,d\lambda = \begin{cases} \lambda_k, & \text{for } k = l; \\ 0, & \text{otherwise,} \end{cases}$$

$$\int_{-\pi}^{\pi} V_k(\lambda) V_l(\lambda)\,d\lambda = \begin{cases} 1, & \text{for } k = l; \\ 0, & \text{otherwise.} \end{cases} \qquad (15)$$

Observe that $|V_k(\lambda)|^2$ can be rewritten as follows,

$$|V_k(\lambda)|^2 = \left| \sum_{u=0}^{N-1} v_k(u) e^{-i\lambda u} \right|^2. \qquad (16)$$

Consider the average of the $K$ tapered estimates,

$$\rho_K(\lambda) \triangleq \frac{1}{K} \sum_{k=0}^{K-1} |V_k(\lambda)|^2. \qquad (17)$$

It has been observed numerically by Thomson [18] that $\rho_K(\lambda)$ is close to $(1/2W)\mathbb{1}_{[-W,W]}(\cdot)$, which was justified recently by Abreu and Romero [23] as given below.

Theorem 4 ([23]). Let $N \ge 2$ denote the length of the sequence, $2\pi/N < W < \pi$ and set $K = \lfloor NW/\pi \rfloor$. Then

$$\left\| \rho_K(\cdot) - \frac{1}{2W} \mathbb{1}_{[-W,W]}(\cdot) \right\|_1 = O\left(\frac{\log N}{K}\right).$$

In the following section, we apply this result to analyze the performance of the multitaper method for semi-stationary processes.

B. Estimate of the Evolutionary Spectra Based on the Multitaper Method

For stationary processes, the bias and variance of the multitaper spectral estimate (17) have been investigated [30], [31], [32]. In this section, we investigate its performance for semi-stationary processes. Let $g(u)$ be a time-limited function, i.e.,

$$|g(u)| = 0, \quad \text{for } |u| > (N-1)/2,$$

where $N$ is assumed to be odd. Apply the multitaper method on $\{X(t), 0 \le t \le T-1\}$ with

$$g_k(u) \triangleq v_k\big(u + (N-1)/2\big) \quad \text{for } 0 \le k \le K-1,$$

then for $t > (N-1)/2$ we have

$$U_t^{(k)}(w) = \sum_{u=0}^{T-1} g_k(u-t) X(u) e^{-iwu} = \sum_{u=t-(N-1)/2}^{t+(N-1)/2} g_k(u-t) X(u) e^{-iwu}.$$

From Proposition 2 and (16),

$$E[|U_t^{(k)}(w)|^2] = \int_{-\pi}^{\pi} |G_k(w-\lambda)|^2 f_t(\lambda)\,d\lambda + O\big(B_{g_k}/B_X\big),$$

where

$$G_k(\lambda) = \sum_{u=-\infty}^{\infty} g_k(u) e^{-i\lambda u} = \sum_{u=-(N-1)/2}^{(N-1)/2} g_k(u) e^{-i\lambda u}.$$

The estimate of $f_t(w)$ is the average of $|U_t^{(k)}(w)|^2$,

$$\hat{f}_t^K(w) = \frac{1}{K} \sum_{k=0}^{K-1} |U_t^{(k)}(w)|^2,$$

and the mean of the estimate is given in Theorem 1.

There is a bias/variance/resolution tradeoff for the estimate $\hat{f}_t^K(w)$. Assuming that $\|f_t(w)\|_\infty$ is bounded for all $t$, Theorem 2 can be proved by invoking Theorem 4 as given below.

Proof: First, the bias can be bounded,

$$\big|\mathrm{Bias}(\hat{f}_t^K(w))\big| = \big|E[\hat{f}_t^K(w)] - f_t(w)\big| \le \left| \int_{-\pi}^{\pi} \rho_K(\lambda) f_t(w-\lambda)\,d\lambda - f_t(w) \right| + O\big(B_g^{(K)}/B_X\big).$$

From Theorem 4,

$$\begin{aligned}
\left| \int_{-\pi}^{\pi} \rho_K(\lambda) f_t(w-\lambda)\,d\lambda - f_t(w) \right|
&\le \left| \int_{-\pi}^{\pi} \left( \rho_K(\lambda) - \frac{1}{2W} \mathbb{1}_{[-W,W]}(\lambda) \right) f_t(w-\lambda)\,d\lambda \right| \\
&\quad + \left| \int_{-\pi}^{\pi} \frac{1}{2W} \mathbb{1}_{[-W,W]}(\lambda) f_t(w-\lambda)\,d\lambda - f_t(w) \right| \\
&\overset{(a)}{\le} C\big(\log(N)/K + W^2\big),
\end{aligned}$$

where (a) follows from Theorem 4 and the assumption that $\|f_t(w)\|_\infty < \infty$, and $C$ is some positive constant. $\blacksquare$

When $X(t)$ is a normal process, the variance of $\hat{f}_t^K(w)$ can be characterized as in Theorem 3 and the proof can be found in Appendix B. Now the MSE of $\hat{f}_t^K(w)$ can be bounded as shown in Corollary 1,

$$O\left(\left(\frac{\log N}{K}\right)^2 + W^4 + \frac{1}{K} + \frac{B_g^{(K)}}{B_X}\right),$$

where we have used the fact that the cross term $W^2(\log N/K)$ is dominated by either $W^2$ or $(\log N/K)$ depending on whether $W^2 \ge (\log N/K)$ or $W^2 \le (\log N/K)$, and this argument applies to all the other cross terms as well.

For the uniformly modulated processes $X(t) = c(t)Y(t)$, where $Y(t)$ is a stationary process as defined in Section II-B, $B_X$ can be lower bounded by $B_{\mathcal{F}_Y}$. Thus, based on Corollary 1, the MSE of the estimate in this case can be further bounded as

$$\left(\frac{\log N}{K}\right)^2 + W^4 + \frac{1}{K} + \frac{B_g^{(K)}}{B_{\mathcal{F}_Y}}. \qquad (18)$$

Recall that $K = NW/\pi$ and $W$ can be as small as $2\pi/N$. Thus for stationary processes, the MSE of the spectral density estimate decreases as $N$ grows. However, this is no longer the case for the semi-stationary processes, as $B_g^{(K)}/B_X$ may become the dominant term for $N$ large enough. We demonstrate this point through the following example that appeared in [2], [6], [7].

Example. Consider the following semi-stationary process

$$X(t) = c(t) Y(t), \quad \text{for } 1 \le t \le T,$$

where $a = 200$, $c(t) = e^{(t-T/2)^2/2a^2}$, and

$$Y_t = 0.8 Y_{t-1} - 0.4 Y_{t-2} + Z_t,$$

with $Z_t \sim \mathcal{N}(0, 100^2)$. It is shown in [6] that $B_{\mathcal{F}_Y} = a\sqrt{\pi/2}$. We now use this fact to evaluate (18) and compare it with the MSE without considering non-stationarity, i.e.,

$$\left(\frac{\log N}{K}\right)^2 + W^4 + \frac{1}{K}. \qquad (19)$$

In Fig. 1, we compare the MSE in these two cases, and the parameter $K$ is optimized over $\{2, 3, \ldots, N-1\}$ since $2\pi/N < W < \pi$. In Fig. 1, for each $N$, the MSE is computed with respect to the optimal $K$. We can see that the MSE in (18) starts to increase for larger $N$, i.e., larger $N$ will no longer be beneficial for estimating the evolutionary spectra of semi-stationary processes. This is because larger $N$ does not provide more information about the spectra since they are changing over time. This is in contrast with the stationary case, where larger $N$ would improve the performance of the estimate. The relationship between $N$ and the corresponding optimal $K$ is shown in Fig. 2. Note that (19) is derived in [23] and the optimal $K$ scales roughly as $N^{4/5}$. For each $N$, the values of $K$ that minimize the MSE do not vary much between the two cases.

Fig. 1. The blue circle line corresponds to the MSE in (18) with respect to the optimal K for each N and the red dot line corresponds to the MSE in (19) with respect to the optimal K for each N.

Fig. 2. The blue circle line is the relationship between N and the corresponding optimal K evaluated according to (18) and the red dot line is the relationship between N and the corresponding optimal K evaluated according to (19).

For non-stationary processes, it is a natural idea to approximate them by stationary processes locally. Heuristic methods, such as segmentation, have been developed [33] to deal with non-stationarity. In the evolutionary spectra framework, $B_X$ defined in (3) can be roughly interpreted as the longest “approximately stationary” segment [2] of the semi-stationary process. It is thus tempting to estimate $B_X$. However, it seems that $B_X$ is more of a theoretical technique rather than providing fundamental meanings. Its definition is tailored to get the first order approximation of the estimate, which can be partially seen from Section IV. Furthermore, characterizing $B_X$ is highly non-trivial as shown by Mélard in [22]. As a comparison, in the locally stationary processes framework [4], the authors characterized the optimal choice of $N$ as $N_{\mathrm{opt}}$, the length of the stationary segment [34], through minimizing the MSE of a local covariance estimate. While the characterization is interesting from the theoretical point of view, its application is limited due to its dependence on the true unknown parameters.

As a natural application of the evolutionary spectral density estimate, we propose a non-parametric stationarity test in the next section.
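As a rough numerical illustration of the tradeoff in (19), the following sketch brute-forces the $K$ that minimizes the bound for a few window lengths $N$. It substitutes $W = \pi K/N$ (from $K = NW/\pi$) and treats the hidden $O(\cdot)$ constant as 1, which is an assumption made only for this sketch; the function names are hypothetical. It covers only the stationary bound (19): evaluating (18) would additionally require $B_g^{(K)}$ for the chosen tapers, which is not attempted here.

```python
import math


def mse_bound_stationary(N, K):
    """Right-hand side of (19) with W = pi*K/N and the O(.) constant set to 1 (assumption)."""
    W = math.pi * K / N
    return (math.log(N) / K) ** 2 + W ** 4 + 1.0 / K


def optimal_K(N):
    """Brute-force minimiser of the bound over K in {2, ..., N-1}."""
    return min(range(2, N), key=lambda K: mse_bound_stationary(N, K))


if __name__ == "__main__":
    for N in (128, 512, 2048):
        K = optimal_K(N)
        # Print N, the minimising K, and the value of the bound at that K.
        print(N, K, mse_bound_stationary(N, K))
```

Under these assumptions the minimizing $K$ grows with $N$ while the minimized bound shrinks, which is the stationary-case behavior described in the text.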

VI. STATIONARITY TEST

The evolutionary spectral density estimate suggests a statistical test for the stationarity of a process, as first discussed in Priestley’s paper [2] and later investigated by Priestley and Subba Rao (PSR test) in [7]. The original version of the PSR test uses the smoothing technique by introducing a second window, which suffers from bias leakage problems as discussed in Section IV. In a recent package developed by Constantine and Percival [35], smoothing is replaced by the multitaper method. This modified PSR test has served as a baseline when compared with other stationarity tests, e.g., in [36]. Based on the results from Section V-B, we attempt to provide some insights into the choice of the parameters in the test. Furthermore, a non-parametric version of the stationarity test is proposed, which is based on the Friedman test [37], [38] and is robust to the underlying distribution. It serves as a complementary test to the existing stationarity tests, in the sense that it is more conservative than PSR, see Section VI-C for details.

A. PSR Stationarity Test With the Multitaper Method

Let $f_t(w)$ denote the evolutionary spectral density of a semi-stationary process $\{X(t): 0 \le t \le T-1\}$. Consider the estimate $\hat{f}_t^K(w)$ based on the multitaper method as in Section V and recall that

$$\hat{f}_t^K(w) = \frac{1}{K} \sum_{k=0}^{K-1} |U_t^{(k)}(w)|^2,$$

where

$$U_t^{(k)}(w) = \sum_{u=t-(N-1)/2}^{t+(N-1)/2} g_k(u-t) X(u) e^{-iwu}.$$

It is a common practice to take the logarithm of the estimate, which stabilizes its variance [39]. Let

$$Y_{ij} = \log \hat{f}_{t_i}^K(w_j).$$

Moreover, to apply the two-way analysis of variance (ANOVA) test [40], it has to be assumed that the distribution of $\log \hat{f}_t^K(w)$ is approximately normal [7]. More specifically, it can be shown that $W_{ij} = Y_{ij} - \psi(K) + \log(K)$ is approximately distributed according to the normal distribution with mean 0 and variance $\sigma^2 = \psi'(K)$ for $K \ge 5$, where $\psi(\cdot)$ and $\psi'(\cdot)$ denote the digamma function and the trigamma function, respectively (see [41, Section II] and [42] for details).

The approximate independence in time is obtained by choosing non-overlapping short windows of length $N$ and the approximate independence in frequency is obtained by choosing frequencies that are $2\pi(K+1)/(N+1)$ apart.² Now the problem reduces to a two-way ANOVA test for $W_{ij}$ for $i \in [1:I]$ and $j \in [1:J]$, where $I = T/N$ and $J$ is the number of frequencies chosen $2\pi(K+1)/(N+1)$ apart. Let

$$W_{\cdot\cdot} = (1/IJ) \sum_{i=1}^{I} \sum_{j=1}^{J} W_{ij}, \qquad W_{i\cdot} = (1/J) \sum_{j=1}^{J} W_{ij}, \qquad W_{\cdot j} = (1/I) \sum_{i=1}^{I} W_{ij}.$$

²Buffers are needed at the beginning and at the end when sampling in frequency to overcome the edge effect. The size of the buffer could be chosen from $[B/2, B]$, where $B \triangleq 2\pi(K+1)/(N+1)$.

The between-times variance with degrees of freedom $I-1$ concerns how uniform $\{W_{ij}\}$ are over the time indices $1 \le i \le I$,

$$S_T = J \sum_{i=1}^{I} (W_{i\cdot} - W_{\cdot\cdot})^2.$$

Similarly, the between-frequencies variance with degrees of freedom $J-1$ is

$$S_F = I \sum_{j=1}^{J} (W_{\cdot j} - W_{\cdot\cdot})^2.$$

The interaction and residual variance with degrees of freedom $(I-1)(J-1)$ is

$$S_{I+R} = \sum_{i=1}^{I} \sum_{j=1}^{J} (W_{ij} - W_{i\cdot} - W_{\cdot j} + W_{\cdot\cdot})^2.$$

The null hypothesis is that the process is stationary and the alternative hypothesis is that the process is non-stationary. The test steps are described in the following.
1) First, test the interaction and residual sum of squares using $S_{I+R}/\sigma^2 = \chi^2_{(I-1)(J-1)}$.
2) If the interaction and residual is not significant, one concludes that the process is a uniformly modulated process.³ Then proceed to test $S_T/\sigma^2 = \chi^2_{(I-1)}$. If the between-times is not significant, conclude that the process is stationary. Otherwise, conclude that the process is non-stationary.
3) If the interaction and residual is significant, conclude that the process is non-stationary.

³The test of the sum of interaction and residual being not significant implies that the time and frequency can be decoupled and thus the process can be expressed in the form of a uniformly modulated process. See [7] for details.

B. A Non-Parametric Stationarity Test

There are two main assumptions of the two-way ANOVA test: (1) the samples are uncorrelated and (2) the residuals are normally distributed. There has been extensive research on the robustness of the assumptions for the ANOVA test. It is known that the test statistics depend heavily on the first assumption and are less sensitive to the second assumption. The latter is shown empirically first in [43] and later in [44].

More specifically, the degree of violation of the normal distribution is usually characterized by the skewness $\beta_1$ and flatness $\beta_2$ of the distribution, where $\beta_1 = E[(X-\mu)^3]/\delta^3$ and $\beta_2 = E[(X-\mu)^4]/\delta^4$, where $\mu$ and $\delta^2$ denote the mean and variance of $X$, respectively. The test statistics are less sensitive to the skewness and flatness of $X$, essentially due to the central limit theorem, as the test statistics are based on summation of many terms. In the PSR test, the test results are more reliable when the degrees of freedom of time and frequency are large. On the other hand, a nonparametric test, e.g., the rank-based Friedman test [37], [38], has an edge when the number of test samples is relatively small.

We now propose the non-parametric test, which will be referred to as the rank-based stationarity test, or RS test in short. Take $\{W_{ij}\}$ introduced in the previous section. In the time-frequency table filled by $\{W_{ij}\}$, rank the elements in each column in increasing order (i.e., 1 corresponds to the smallest element) to form a table of ranks: $\{R_{ij}\}$. Whenever there is a tie among $k$ elements in the same column, assign the mean rank of the $k$ elements. Similar to the two-way ANOVA test, let $R_{\cdot\cdot}$ denote the mean rank of all ranks, and let $R_{i\cdot}$ denote the mean rank of row $i$. The sum of squares of ranks $SS_R$ is $SS_R = J \sum_i (R_{i\cdot} - R_{\cdot\cdot})^2$. The test statistic is $t_R = SS_R/\mathrm{const}$, where $\mathrm{const} = I(I+1)/12$. It is known that $t_R$ is (approximately) distributed according to $\chi^2_{I-1}$ [37], [38].

Remark 6: Conventionally, the rows are ranked and then the ranks in each column are summed up to form the test statistic. To be consistent with the two-way ANOVA test, the roles of row and column are switched in this work.

Fig. 3. The blue square line corresponds to the MSE in (18) plus a penalty term K and the black circle line corresponds to the MSE in (18).

TABLE I
EMPIRICAL SIZE COMPARISON (%)
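The RS statistic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the table `W` is a plain list of `I` rows (time blocks) by `J` columns (frequencies), the function names are hypothetical, ties receive the mean rank as described in the text, and no further tie correction is applied to the constant $I(I+1)/12$.

```python
def column_ranks(values):
    """Ranks of one column (1 = smallest); tied entries share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for q in range(pos, end + 1):
            ranks[order[q]] = mean_rank
        pos = end + 1
    return ranks


def rs_statistic(W):
    """t_R = SS_R / (I(I+1)/12), with SS_R = J * sum_i (R_i. - R_..)^2."""
    I, J = len(W), len(W[0])
    R = [[0.0] * J for _ in range(I)]
    for j in range(J):
        col = column_ranks([W[i][j] for i in range(I)])
        for i in range(I):
            R[i][j] = col[i]
    row_means = [sum(R[i]) / J for i in range(I)]
    grand_mean = sum(row_means) / I
    ss_r = J * sum((m - grand_mean) ** 2 for m in row_means)
    return ss_r / (I * (I + 1) / 12.0)
```

The returned value would then be compared against a $\chi^2_{I-1}$ quantile at the chosen nominal size; that lookup is left out so the sketch needs only the standard library.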

C. Simulations

In this section, the performance of the proposed non-parametric stationarity test is evaluated and compared with the PSR test on a variety of synthetic and real data. In our simulations, we use the multitaper function pmtm in MATLAB (R2016a) with the default values as in [35]: the number of tapers is 5, the number of non-overlapping blocks is max{2, log(T)}, and the buffer size is 0.7B, where $B = 2\pi(K+1)/(N+1)$. The number of tapers is much smaller than the one we used for estimating the evolutionary spectra in Section V-B.

Before the simulations, we first justify the choice of the number of tapers. One key factor is that testing stationarity is a task of a different nature from estimating the underlying evolutionary spectra. In particular, testing stationarity requires a good number of "independent" samples in both the time and frequency domains. Recall that this rough independence between the samples is achieved by sampling in non-overlapping blocks as well as by sampling frequencies that are $2\pi(K+1)/(N+1)$ apart. Thus there is a hidden penalty for choosing a large K (but less than 2NW), because a large K reduces the number of samples available for the hypothesis test. Therefore, instead of focusing solely on the MSE, it is reasonable to add to it a penalty c(K) that is an increasing function of K. For simplicity of demonstration, we choose c(K) = K and evaluate the example in Section V based on the sum of the MSE in (18) and K in Fig. 3. Here N is chosen to be T/max{2, log(T)} with T = 512. Thus the number of tapers should be 4 or 5. A thorough study of this penalized approach is beyond the scope of this paper and will be pursued in future work. Note that window design (choosing the weights and length of the window) is hard even for stationary processes, and "in practice it is advisable to experiment with a range of windows" (see Section 10.4 in [21] for details).

1) Synthetic Data: The performance of a test is evaluated based on its empirical size and power. We generate M = 1000 sample paths (realizations), each of length T = 512, and let the nominal size of the test be 0.05. The null hypothesis H0 is that the process is stationary and the alternative hypothesis H1 is that the process is not stationary.

For the size comparison, we generate sample paths from various stationary processes and count the number of rejections of the null hypothesis. Consider the following set of stationary autoregressive and moving-average (ARMA) models used in [45]. The noise term Z(t) is distributed according to N(0, 1).
a) i.i.d. standard normal.
b) AR(1): X(t) = 0.9X(t − 1) + Z(t).
c) AR(1): X(t) = −0.9X(t − 1) + Z(t).
d) MA(1): X(t) = Z(t) + 0.8Z(t − 1).
e) MA(1): X(t) = Z(t) − 0.8Z(t − 1).
f) ARMA(1, 2): X(t) = −0.4X(t − 1) + Z(t) − 0.8Z(t − 1).
g) AR(2): X(t) = α1 X(t − 1) + α2 X(t − 2) + Z(t) with α1 = 1.385929 and α2 = −0.9604 (from [46]).

The empirical sizes of PSR reported in [45] are smaller than those in Table I, but they are still at least twice as large as the empirical sizes of RS for all the models in Table I. The differences from the PSR results in [45] could be due to differences in parameters such as the number of tapers, the number of non-overlapping blocks, and the buffer size.

For the power comparison, we generate sample paths from semi-stationary processes and count the number of acceptances of the null hypothesis. We focus on the uniformly modulated processes as in [2], [7]. As in Section V-B, we focus on the following model,

$$X(t) = e^{(t-T/2)^2/2a^2}\, Y(t), \tag{20}$$

where a = 200 and $Y(t) = 0.8Y(t-1) - 0.4Y(t-2) + Z(t)$ with $Z(t) \sim N(0, 100^2)$. For all the models from Table I, we generate uniformly modulated processes by multiplying each of them by $e^{(t-T/2)^2/2a^2}$. To make the numbering consistent with Table I, these models are also numbered from (a) to (g), and model (20) is numbered as (h) in the table below.

TABLE II
EMPIRICAL POWER COMPARISON (%)

Since the empirical size of RS is smaller than that of PSR but the empirical power is also smaller, RS is a more conservative test compared with PSR.

2) Real Data: We consider a real data example called ecgrr, which is used in [35] and comes from the 'RR interval time series modeling: A challenge from PhysioNet and Computers in Cardiology 2002' site of PhysioNet. The data are the RR intervals (beat-to-beat intervals measured between successive peaks of the QRS complex) for patients in normal sinus rhythm (record 16265 of the MIT-BIH database). The length of the sample is T = 512 and the nominal size of the test is 0.05, as in the previous section. Recall that the test statistics for PSR are $S_T/\sigma^2$ and $S_{I+R}/\sigma^2$ and that for RS is $t_R$. Both PSR and RS suggest that the process is non-stationary, and their test statistics are summarized in the following tables.

TABLE III
TEST RESULT OF PSR

TABLE IV
TEST RESULT OF RS

3) Real Data: Purchasing Power Parity: In economics, a common practice is to test the purchasing power parity (PPP) hypothesis by testing the stationarity of real exchange rates (RER). Some earlier studies using unit root tests yielded results that were not favorable to PPP (see, for example, [47]–[49]). In this real data study, we test the stationarity of the RER of four countries (Canada, China, Japan, UK) with respect to the US over the period from January 1970 to December 2017. The monthly RER data were calculated as $E \cdot P^{*}/P$, where $E$, $P^{*}$ and $P$ respectively denote the nominal exchange rate, the foreign price level (evaluated using the consumer price index) and the domestic price level, using data from the International Financial Statistics of the International Monetary Fund, the Financial Statistics of the Federal Reserve Board, Haver Analytics, and the Pacific Exchange Rate Service. In the experiments, we take the widely used transform, the log first-order difference, i.e., $\log(X_t) - \log(X_{t-1})$ for $2 \le t \le N$, where $\{X_t : 1 \le t \le N\}$ denotes the RER.

TABLE V
TEST RESULTS OF PSR

Based on the RS test and the statistic $S_T/\sigma^2$ of the PSR test, one could order the four countries (in terms of RER) from the most stationary to the least stationary as: Japan, UK, Canada, China. When choosing 1% as the nominal size of the test, the RS test suggests that the RER of both Japan and the UK are stationary, with Canada close to stationary, while the PSR test suggests that the RER of Japan is stationary and the RER of the UK is close to stationary.

TABLE VI
TEST RESULTS OF RS

Both tests (at both the 5% and 1% levels) suggest that the RER of China is non-stationary.

VII. CONCLUDING REMARKS

In this work, we investigate spectrum estimation for an important class of non-stationary processes developed by Priestley. We propose and analyze an improved estimate within the evolutionary spectra framework. The analysis is based on a novel alternative delta approximation in the time domain, as well as on a recent concentration result on the multitaper method. The estimate is then applied to develop a non-parametric stationarity test, which is complementary to the existing stationarity tests. There are several interesting future directions. For parameter design, the penalized approach to understanding the parameter design for stationarity tests will be pursued. Regarding the computational constraints of spectra estimation, which are crucial for very long time series, it will be interesting to incorporate the SLEX-based methods into the framework to handle the computational issues.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper. The first author would like to thank Luís Daniel Abreu and José Luis Romero for clarifying proofs in [23].
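As a companion to the synthetic-data experiment above, the following Python sketch shows one way to generate the stationary models (a) to (g) and their uniformly modulated counterparts, and to tally an empirical rejection rate. It is a minimal illustration under stated assumptions, not the authors' MATLAB setup: the names `simulate_arma`, `modulate`, `rejection_rate` and `STATIONARY_MODELS`, the burn-in length, the zero-based time index, and the `reject_fn` callback (a stand-in for any stationarity test such as RS or PSR) are all hypothetical. The AR term of model (f) is read as X(t − 1), and the exponent sign of the envelope follows (20) as printed; pass `sign=-1.0` if a decaying envelope is intended.

```python
import math
import random


def simulate_arma(ar, ma, n, sigma=1.0, burn_in=500, seed=None):
    """Simulate X(t) = sum_i ar[i-1] X(t-i) + Z(t) + sum_j ma[j-1] Z(t-j).

    Z(t) is i.i.d. N(0, sigma^2). The first `burn_in` values are discarded
    so that the zero initial condition has little influence (assumption).
    """
    rng = random.Random(seed)
    total = n + burn_in
    z = [rng.gauss(0.0, sigma) for _ in range(total)]
    x = [0.0] * total
    for t in range(total):
        value = z[t]
        for i, coef in enumerate(ar, start=1):
            if t - i >= 0:
                value += coef * x[t - i]
        for j, coef in enumerate(ma, start=1):
            if t - j >= 0:
                value += coef * z[t - j]
        x[t] = value
    return x[burn_in:]


def modulate(y, a=200.0, sign=1.0):
    """Multiply y(t) by exp(sign * (t - T/2)^2 / (2 a^2)) for t = 0..T-1."""
    T = len(y)
    return [math.exp(sign * (t - T / 2) ** 2 / (2.0 * a * a)) * y[t]
            for t in range(T)]


def rejection_rate(paths, reject_fn):
    """Fraction of sample paths for which the (hypothetical) test rejects H0."""
    return sum(1 for p in paths if reject_fn(p)) / len(paths)


# (AR coefficients, MA coefficients) for models (a) to (g) as listed in the text.
STATIONARY_MODELS = {
    "a": ([], []),
    "b": ([0.9], []),
    "c": ([-0.9], []),
    "d": ([], [0.8]),
    "e": ([], [-0.8]),
    "f": ([-0.4], [-0.8]),  # as printed, with the AR term read as X(t-1)
    "g": ([1.385929, -0.9604], []),
}

T = 512
paths = {name: simulate_arma(ar, ma, T, seed=0)
         for name, (ar, ma) in STATIONARY_MODELS.items()}
modulated = {name: modulate(x) for name, x in paths.items()}

# Model (h): the modulated AR(2) process of (20) with noise standard deviation 100.
y = simulate_arma([0.8, -0.4], [], T, sigma=100.0, seed=0)
modulated["h"] = modulate(y)
```

To estimate an empirical size or power, one would draw M independent paths per model (varying `seed`) and pass them to `rejection_rate` together with an implementation of the test of interest.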

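The real exchange rate construction E · P*/P and the log first-order difference used in the purchasing power parity study can be sketched in a few lines of Python. The function names `real_exchange_rate` and `log_first_difference` are hypothetical, and the inputs are assumed to be equal-length monthly series of strictly positive numbers; the data sources named in the text are not accessed here.

```python
import math


def real_exchange_rate(nominal_rate, foreign_price, domestic_price):
    """Return the series E * P_star / P, element by element."""
    if not (len(nominal_rate) == len(foreign_price) == len(domestic_price)):
        raise ValueError("all three series must have the same length")
    return [e * p_star / p
            for e, p_star, p in zip(nominal_rate, foreign_price, domestic_price)]


def log_first_difference(x):
    """Return log(x[t]) - log(x[t-1]) for t = 2..N, i.e. N - 1 values."""
    if any(v <= 0 for v in x):
        raise ValueError("the log difference requires strictly positive values")
    logs = [math.log(v) for v in x]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]
```

The transformed series is one sample shorter than the RER series, which matters when choosing the block length for a stationarity test.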
APPENDIX A
PROOF OF LEMMA 1

Proof: The statement can be proved by observing

$$\int_{-\pi}^{\pi} |G(w)|^2 f_t(w+v)\,dw = f_t(v)\int_{-\pi}^{\pi} |G(w)|^2\,dw + R(v),$$

where, by the mean value theorem,

$$|R(v)| = \left|\int_{-\pi}^{\pi} w\,|G(w)|^2 f_t'(v+\eta(w)w)\,dw\right| \le \sup_{-\pi \le w \le \pi} |f_t'(w)| \int_{-\pi}^{\pi} |w|\,|G(w)|^2\,dw,$$

with $f_t'$ denoting the derivative of $f_t$ with respect to frequency and $0 \le \eta(w) \le 1$ for all $w$.

APPENDIX B
PROOF OF THEOREM 3

Proof: The variance of $\hat{f}_t^K(w)$ can be expressed as

$$\mathrm{Var}\big(\hat{f}_t^K(w)\big) = \mathrm{Var}\Big(\frac{1}{K}\sum_{k=0}^{K-1} |U_t^{(k)}(w)|^2\Big) = \frac{1}{K^2}\sum_{k,l=0}^{K-1} \mathrm{Cov}\big(|U_t^{(k)}(w)|^2, |U_t^{(l)}(w)|^2\big).$$

First, $\mathrm{Cov}\big(|U_t^{(k)}(w)|^2, |U_t^{(l)}(w)|^2\big)$ can be rewritten as (21). Similar to Proposition 2, we can further express $S_1$ and $S_2$ as (22) and (23), respectively. Now, since $\|f_t(w)\|_\infty < \infty$,

$$\Big|\sum_{k,l=0}^{K-1} \mathrm{Cov}\big(|U_t^{(k)}(w)|^2, |U_t^{(l)}(w)|^2\big)\Big| \le \Big|\sum_{k,l=0}^{K-1} S_1\Big| + \Big|\sum_{k,l=0}^{K-1} S_2\Big| \overset{(a)}{\le} 2K + \sum_{k=0}^{K-1} O(B_{g_k}/B_X).$$

The covariance expansion referred to above is

$$\begin{aligned}
\mathrm{Cov}\big(|U_t^{(k)}(w)|^2, |U_t^{(l)}(w)|^2\big)
&= \mathrm{Cov}\Big(\sum_{u_1,u_2} g_k(u_1-t)g_k(u_2-t)X(u_1)X(u_2)e^{-iwu_1}e^{iwu_2},\ \sum_{u_3,u_4} g_l(u_3-t)g_l(u_4-t)X(u_3)X(u_4)e^{-iwu_3}e^{iwu_4}\Big) \\
&\overset{(a)}{=} S_1 + S_2,
\end{aligned} \tag{21}$$

where (a) follows from Isserlis' theorem [50] and $u_i \in [t-(N-1)/2 : t+(N-1)/2]$ for $i \in [1:4]$, with

$$\begin{aligned}
S_1 &= \sum_{u_1,u_2,u_3,u_4} g_k(u_1-t)g_k(u_2-t)g_l(u_3-t)g_l(u_4-t)\,E[X(u_1)X(u_3)]\,E[X(u_2)X(u_4)]\,e^{-iwu_1}e^{iwu_2}e^{-iwu_3}e^{iwu_4}, \\
S_2 &= \sum_{u_1,u_2,u_3,u_4} g_k(u_1-t)g_k(u_2-t)g_l(u_3-t)g_l(u_4-t)\,E[X(u_1)X(u_4)]\,E[X(u_2)X(u_3)]\,e^{-iwu_1}e^{iwu_2}e^{-iwu_3}e^{iwu_4}.
\end{aligned}$$
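Step (a) in (21) rests on Isserlis' theorem, which for zero-mean jointly Gaussian variables gives Cov(X1 X2, X3 X4) = E[X1 X3] E[X2 X4] + E[X1 X4] E[X2 X3], the same pairing pattern as in S1 and S2. The following Python sketch checks this identity by Monte Carlo on a toy four-dimensional Gaussian vector. The covariance structure Cov(Xi, Xj) = ρ^|i−j|, the sample size, and the function names are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def sample_gaussian_vector(rho, rng):
    """Draw zero-mean Gaussian (X1, X2, X3, X4) with Cov(Xi, Xj) = rho**|i-j|."""
    scale = math.sqrt(1.0 - rho * rho)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(3):
        # AR(1)-style recursion keeps unit variance and geometric correlations.
        x.append(rho * x[-1] + scale * rng.gauss(0.0, 1.0))
    return x


def isserlis_check(rho=0.5, n=200_000, seed=0):
    """Return (empirical, predicted) values of Cov(X1*X2, X3*X4)."""
    rng = random.Random(seed)
    sum_y = 0.0
    sum_w = 0.0
    sum_yw = 0.0
    for _ in range(n):
        x1, x2, x3, x4 = sample_gaussian_vector(rho, rng)
        y = x1 * x2
        w = x3 * x4
        sum_y += y
        sum_w += w
        sum_yw += y * w
    empirical = sum_yw / n - (sum_y / n) * (sum_w / n)

    def cov(i, j):
        return rho ** abs(i - j)

    # Isserlis pairing: (1,3)(2,4) corresponds to S1, (1,4)(2,3) to S2.
    predicted = cov(1, 3) * cov(2, 4) + cov(1, 4) * cov(2, 3)
    return empirical, predicted
```

Calling `isserlis_check()` returns the Monte Carlo estimate next to the value given by the pairing formula; the two should agree up to sampling error, which shrinks as `n` grows.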

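$$\begin{aligned}
S_1 &= \sum_{v_1,v_2,v_3,v_4} g_k(v_1)g_k(v_2)g_l(v_3)g_l(v_4)\,e^{-iw(v_1+t)}e^{iw(v_2+t)}e^{-iw(v_3+t)}e^{iw(v_4+t)} \\
&\qquad \cdot \int_{-\pi}^{\pi} A_{v_1+t}(\lambda)A^{*}_{v_3+t}(\lambda)e^{i\lambda(v_1+t)}e^{-i\lambda(v_3+t)}\,d\mu(\lambda) \int_{-\pi}^{\pi} A_{v_2+t}(\xi)A^{*}_{v_4+t}(\xi)e^{i\xi(v_2+t)}e^{-i\xi(v_4+t)}\,d\mu(\xi) \\
&= \int_{-\pi}^{\pi} \sum_{v_1,v_3} g_k(v_1)g_l(v_3)\,e^{-iw(v_1+t)}e^{-iw(v_3+t)}A_{v_1+t}(\lambda)A^{*}_{v_3+t}(\lambda)e^{i\lambda(v_1+t)}e^{-i\lambda(v_3+t)}\,d\mu(\lambda) \\
&\qquad \cdot \int_{-\pi}^{\pi} \sum_{v_2,v_4} g_k(v_2)g_l(v_4)\,e^{iw(v_2+t)}e^{iw(v_4+t)}A_{v_2+t}(\xi)A^{*}_{v_4+t}(\xi)e^{i\xi(v_2+t)}e^{-i\xi(v_4+t)}\,d\mu(\xi) \\
&\overset{(a)}{=} \int_{-\pi}^{\pi} |A_t(\lambda)|^2 \sum_{v_1,v_3} g_k(v_1)g_l(v_3)\,e^{-iw(v_1+t)}e^{-iw(v_3+t)}e^{i\lambda(v_1+t)}e^{-i\lambda(v_3+t)}\,d\mu(\lambda) \\
&\qquad \cdot \int_{-\pi}^{\pi} |A_t(\xi)|^2 \sum_{v_2,v_4} g_k(v_2)g_l(v_4)\,e^{iw(v_2+t)}e^{iw(v_4+t)}e^{i\xi(v_2+t)}e^{-i\xi(v_4+t)}\,d\mu(\xi) + O(B_{g_k}/B_X) + O(B_{g_l}/B_X) \\
&= \int_{-\pi}^{\pi}\int_{-\pi}^{\pi} G_k(w-\lambda)G_l(w+\lambda)G_k(-w-\xi)G_l(-w+\xi)\,f_t(\lambda)\,d\lambda\,f_t(\xi)\,d\xi + O(B_{g_k}/B_X) + O(B_{g_l}/B_X),
\end{aligned} \tag{22}$$

where (a) follows from Lemma 1. Similarly, we have

$$\begin{aligned}
S_2 &= \sum_{v_1,v_2,v_3,v_4} g_k(v_1)g_k(v_2)g_l(v_3)g_l(v_4)\,e^{-iw(v_1+t)}e^{iw(v_2+t)}e^{-iw(v_3+t)}e^{iw(v_4+t)} \\
&\qquad \cdot \int_{-\pi}^{\pi} A_{v_1+t}(\lambda)A^{*}_{v_4+t}(\lambda)e^{i\lambda(v_1+t)}e^{-i\lambda(v_4+t)}\,d\mu(\lambda) \int_{-\pi}^{\pi} A_{v_2+t}(\xi)A^{*}_{v_3+t}(\xi)e^{i\xi(v_2+t)}e^{-i\xi(v_3+t)}\,d\mu(\xi) \\
&= \int_{-\pi}^{\pi}\int_{-\pi}^{\pi} G_k(w-\lambda)G_l(-w+\lambda)G_k(-w-\xi)G_l(w+\xi)\,f_t(\lambda)\,d\lambda\,f_t(\xi)\,d\xi + O(B_{g_k}/B_X) + O(B_{g_l}/B_X).
\end{aligned} \tag{23}$$

Summing over $k$ and $l$ and applying the Cauchy–Schwarz inequality, we have

$$\begin{aligned}
\sum_{k,l=0}^{K-1} S_1 &= \int_{-\pi}^{\pi}\int_{-\pi}^{\pi} \sum_{k=0}^{K-1} G_k(w-\lambda)G_k(-w-\xi) \sum_{l=0}^{K-1} G_l(w+\lambda)G_l(-w+\xi)\,f_t(\lambda)\,d\lambda\,f_t(\xi)\,d\xi + \sum_{k=0}^{K-1} O(B_{g_k}/B_X) \\
&\le \Bigg( \int_{-\pi}^{\pi}\int_{-\pi}^{\pi} \Big|\sum_{k=0}^{K-1} G_k(w-\lambda)G_k(-w-\xi)\Big|^2 f_t(\lambda)\,d\lambda\,f_t(\xi)\,d\xi \\
&\qquad \cdot \int_{-\pi}^{\pi}\int_{-\pi}^{\pi} \Big|\sum_{l=0}^{K-1} G_l(w+\lambda)G_l(-w+\xi)\Big|^2 f_t(\lambda)\,d\lambda\,f_t(\xi)\,d\xi \Bigg)^{1/2} + \sum_{k=0}^{K-1} O(B_{g_k}/B_X).
\end{aligned} \tag{24}$$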

In the bound on the sum of covariances in Appendix B, step (a) follows from (24) and a similar argument for $S_2$, as well as the orthogonality condition (15). This finishes the proof.

REFERENCES

[1] C. H. Page, “Instantaneous power spectra,” J. Appl. Phys., vol. 23, no. 1, pp. 103–106, 1952.
[2] M. B. Priestley, “Evolutionary spectra and non-stationary processes,” J. Roy. Statistical Soc. B, vol. 28, no. 1, pp. 228–240, 1966.
[3] W. Martin and P. Flandrin, “Wigner-Ville spectral analysis of nonstationary processes,” IEEE Trans. Acoust., Speech, Signal Process., vol. 33, no. 6, pp. 1461–1470, Dec. 1985.
[4] R. Dahlhaus, “On the Kullback-Leibler information divergence for locally stationary processes,” Stochastic Processes Appl., vol. 62, pp. 139–168, 1996.
[5] S. Mallat, G. Papanicolaou, and Z. Zhang, “Adaptive covariance estimation of locally stationary processes,” Ann. Statist., vol. 26, no. 1, pp. 1–47, 1998.
[6] M. B. Priestley, “Design relations for non-stationary processes,” J. Roy. Statistical Soc. Ser. B (Methodological), vol. 28, pp. 228–240, 1966.
[7] M. B. Priestley and T. S. Rao, “A test for non-stationarity of time-series,” J. Roy. Statistical Soc. Ser. B (Methodological), vol. 31, pp. 140–149, 1969.
[8] M. B. Priestley and H. Tong, “On the analysis of bivariate non-stationary processes,” J. Roy. Statistical Soc. Ser. B (Methodological), vol. 35, pp. 153–166, 1973.
[9] M. B. Priestley, Spectral Analysis and Time Series. New York, NY, USA: Academic, 1981.
[10] R. Dahlhaus, “Fitting time series models to nonstationary processes,” Ann. Statist., vol. 25, no. 1, pp. 1–37, 1997.
[11] R. Dahlhaus, “A likelihood approximation for locally stationary processes,” Ann. Statist., vol. 28, no. 6, pp. 1762–1794, 2000.
[12] G. P. Nason, R. Von Sachs, and G. Kroisandt, “Wavelet processes and adaptive estimation of the evolutionary wavelet spectrum,” J. Roy. Statistical Soc.: Ser. B (Statistical Methodology), vol. 62, no. 2, pp. 271–292, 2000.
[13] H. C. Ombao, J. A. Raz, R. von Sachs, and B. A. Malow, “Automatic statistical analysis of bivariate nonstationary time series,” J. Amer. Statistical Assoc., vol. 96, no. 454, pp. 543–560, 2001.
[14] H. Ombao, J. Raz, R. Von Sachs, and W. Guo, “The SLEX model of a non-stationary random process,” Ann. Inst. Statistical Math., vol. 54, no. 1, pp. 171–200, 2002.
[15] R. Dahlhaus, “Asymptotic statistical inference for nonstationary processes with evolutionary spectra,” Athens Conf. Appl. Probability Time Ser., vol. 115, pp. 145–159, 1996.
[16] P. Flandrin, Time-Frequency/Time-Scale Analysis. New York, NY, USA: Academic, 1998, vol. 10.
[17] D. B. Percival and A. T. Walden, Spectral Analysis for Physical Applications. Cambridge, U.K.: Cambridge Univ. Press, 1993.
[18] D. J. Thomson, “Spectrum estimation and harmonic analysis,” Proc. IEEE, vol. 70, no. 9, pp. 1055–1096, Sep. 1982.
[19] A. T. Walden and E. A. Cohen, “Statistical properties for coherence estimators from evolutionary spectra,” IEEE Trans. Signal Process., vol. 60, no. 9, pp. 4586–4597, Sep. 2012.
[20] K. S. Riedel, “Optimal data-based kernel estimation of evolutionary spectra,” IEEE Trans. Signal Process., vol. 41, no. 7, pp. 2439–2447, Jul. 1993.
[21] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods. Berlin, Germany: Springer, 2013.
[22] G. Mélard and A. H.-d. Schutter, “Contributions to evolutionary spectral theory,” J. Time Ser. Anal., vol. 10, no. 1, pp. 41–63, 1989.
[23] L. D. Abreu and J. L. Romero, “MSE estimates for multitaper spectral estimation and off-grid compressive sensing,” IEEE Trans. Inf. Theory, vol. 63, no. 12, pp. 7770–7776, Dec. 2017.
[24] S. Haykin, “Cognitive radio: Brain-empowered wireless communications,” IEEE J. Sel. Areas Commun., vol. 23, no. 2, pp. 201–220, Feb. 2005.
[25] A. Delorme and S. Makeig, “EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis,” J. Neurosci. Methods, vol. 134, no. 1, pp. 9–21, 2004.
[26] G. Bond et al., “A pervasive millennial-scale cycle in North Atlantic Holocene and glacial climates,” Science, vol. 278, no. 5341, pp. 1257–1266, 1997.
[27] D. Slepian and H. O. Pollak, “Prolate spheroidal wave functions, Fourier analysis and uncertainty–I,” Bell Labs Tech. J., vol. 40, no. 1, pp. 43–63, 1961.
[28] H. J. Landau and H. O. Pollak, “Prolate spheroidal wave functions, Fourier analysis and uncertainty–II,” Bell Labs Tech. J., vol. 40, no. 1, pp. 65–84, 1961.
[29] D. Slepian, “Prolate spheroidal wave functions, Fourier analysis, and uncertainty–V: The discrete case,” Bell Labs Tech. J., vol. 57, no. 5, pp. 1371–1430, 1978.
[30] K. Lii and M. Rosenblatt, “Prolate spheroidal spectral estimates,” Statist. Probability Lett., vol. 78, no. 11, pp. 1339–1348, 2008.
[31] A. T. Walden, E. McCoy, and D. B. Percival, “The variance of multitaper spectrum estimates for real Gaussian processes,” IEEE Trans. Signal Process., vol. 42, no. 2, pp. 479–482, Feb. 1994.
[32] J. A. Hogan and J. D. Lakey, Duration and Bandwidth Limiting: Prolate Functions, Sampling, and Applications. Berlin, Germany: Springer, 2011.
[33] M. Ding, S. Bressler, W. Yang, and H. Liang, “Short-window spectral analysis of cortical event-related potentials by adaptive multivariate autoregressive modeling: Data preprocessing, model validation, and variability assessment,” Biol. Cybern., vol. 83, no. 1, pp. 35–45, 2000.
[34] R. Dahlhaus and L. Giraitis, “On the optimal segment length for parameter estimates for locally stationary time series,” J. Time Ser. Anal., vol. 19, no. 6, pp. 145–159, 1998.
[35] W. Constantine and D. Percival, “Fractal: Fractal time series modeling and analysis,” R Package Version, 2011.
[36] G. Nason, “A test for second-order stationarity and approximate confidence intervals for localized autocovariances for locally stationary time series,” J. Roy. Statistical Soc.: Ser. B (Statistical Methodology), vol. 75, no. 5, pp. 879–904, 2013.
[37] M. Friedman, “The use of ranks to avoid the assumption of normality implicit in the analysis of variance,” J. Amer. Statistical Assoc., vol. 32, no. 200, pp. 675–701, 1937.
[38] M. Friedman, “A comparison of alternative tests of significance for the problem of m rankings,” Ann. Math. Statist., vol. 11, no. 1, pp. 86–92, 1940.
[39] U. Grenander and M. Rosenblatt, Statistical Analysis of Stationary Time Series. New York, NY, USA: AMS Chelsea Publishing, 1957.
[40] F. Yates, “The analysis of multiple classifications with unequal numbers in the different classes,” J. Amer. Statistic. Assoc., vol. 29, no. 185, pp. 51–66, 1934.
[41] A. T. Walden, D. B. Percival, and E. J. McCoy, “Spectrum estimation by wavelet thresholding of multitaper estimators,” IEEE Trans. Signal Process., vol. 46, no. 12, pp. 3153–3165, Dec. 1998.
[42] M. S. Bartlett and D. Kendall, “The statistical analysis of variance-heterogeneity and the logarithmic transformation,” Suppl. J. Roy. Statistical Soc., vol. 8, no. 1, pp. 128–138, 1946.
[43] E. S. Pearson, “The analysis of variance in cases of non-normal variation,” Biometrika, vol. 23, pp. 114–133, 1931.
[44] G. V. Glass, P. D. Peckham, and J. R. Sanders, “Consequences of failure to meet assumptions underlying the fixed effects analyses of variance and covariance,” Rev. Educational Res., vol. 42, no. 3, pp. 237–288, 1972.
[45] G. Nason, “Simulation study comparing two tests of second-order stationarity and confidence intervals for localized autocovariance,” 2016, arXiv:1603.06415.
[46] C. Chatfield, The Analysis of Time Series: An Introduction. Boca Raton, FL, USA: CRC Press, 2016.
[47] W. Enders, “ARIMA and cointegration tests of PPP under fixed and flexible exchange rate regimes,” Rev. Econ. Statist., pp. 504–508, 1988.
[48] M. P. Taylor, “An empirical examination of long-run purchasing power parity using cointegration techniques,” Appl. Econ., vol. 20, no. 10, pp. 1369–1381, 1988.
[49] W. J. Crowder, “Testing stationarity of real exchange rates using Johansen tests,” Dept. Econ., Univ. Texas Arlington, Arlington, TX, USA, Working Paper, 2001.
[50] L. Isserlis, “On certain probable errors and correlation coefficients of multiple frequency distributions with skew regression,” Biometrika, vol. 11, no. 3, pp. 185–190, 1916.

Yu Xiang (S’10–M’15) received the B.S. degree with highest distinction in telecommunication engineering from Xidian University, Xi’an, China, in 2008, and the M.S. and Ph.D. degrees both in electrical and computer engineering from University of California, San Diego, CA, USA, in 2010 and 2015, respectively. From 2015 to 2018, he was a Postdoctoral Fellow with Harvard University, School of Engineering and Applied Sciences, Cambridge, MA, USA. He is currently an Assistant Professor with the Electrical and Computer Engineering Department, University of Utah, Salt Lake City. His research interests include statistical signal processing and information theory.

Jie Ding (M’12) received the Ph.D. degree in engineering sciences from Harvard University, Cambridge, MA, USA, in 2017, where he was also a Postdoctoral Researcher from January 2017 to December 2018. He was a Postdoctoral Researcher with the Rhodes Information Initiative, Duke University, Durham, North Carolina, USA, from January to August 2018. He is currently an Assistant Professor with the School of Statistics, University of Minnesota, Minneapolis, MN, USA. His research topics include signal processing, statistical inference, time series prediction, and machine learning with various applications.

Vahid Tarokh (F’09) received the Ph.D. degree in electrical engineering from the University of Waterloo, Waterloo, ON, Canada, in 1995. He was with AT&T Labs-Research and AT&T Wireless Services until August 2000 as Member, Principal Member of Technical Staff and, finally, as the Head of the Department of Wireless Communications and Signal Processing. In September 2000, he joined the Massachusetts Institute of Technology as an Associate Professor of Electrical Engineering and Computer Science. In June 2002, he joined Harvard University as a Gordon McKay Professor of Electrical Engineering and Hammond Vinton Hayes Senior Research Fellow. He was named Perkins Professor of Applied Mathematics in 2005. In January 2018, he joined Duke University, as the Rhodes Family Professor of Electrical and Computer Engineering, Computer Science, and Mathematics. From January 2018 to May 2018, he was also a Gordon Moore Distinguished Scholar in the California Institute of Technology.