arXiv:1803.03995v1 [stat.ME] 11 Mar 2018

Adaptive Smoothing of the Log-Spectrum with Multiple Tapering

K.S. Riedel and A. Sidorenko
Courant Institute of Mathematical Sciences
New York University
New York, NY 10012-1185

March 13, 2018

Abstract

A hybrid estimator of the log-spectral density of a stationary time series is proposed. First, a multiple taper estimate is performed, followed by kernel smoothing the log-multiple taper estimate. This procedure reduces the expected mean square error by $(\pi^2/4)^{4/5}$ over simply smoothing the log tapered periodogram. A data adaptive implementation of a variable bandwidth kernel smoother is given.

∗ The authors thank the referees for useful comments. Research funded by the U.S. Department of Energy.

1 INTRODUCTION

We consider a discrete, stationary, Gaussian time series $\{x_j,\ j = 1, \ldots, N\}$ with a smooth spectral density, $S(f)$, which is bounded away from zero. The autocovariance is the Fourier transform of the spectral density: $\mathrm{Cov}[x_j, x_k] = \int_{-1/2}^{1/2} S(f)\, e^{2\pi i (j-k) f}\, df$. When the logarithm of the spectral density, $\theta(f) \equiv \ln[S(f)]$, is desired, two common approaches are: 1) to estimate the spectral density and then transform to the logarithm; and 2) to smooth the logarithm of the tapered periodogram. The first approach can be sensitive to broad-band bias when the spectral range is large, while the second approach inflates the variance of the estimate [7, Ch. 6.15], [14]. We propose a combined estimator with the robustness properties of the second estimator of the log spectral density but without its variance inflation.

In Section 2, we consider quadratic estimates of the spectral density. In Section 3, we consider kernel smoothing the multi-taper spectral estimate. In Section 4, the logarithm of the multi-taper spectral estimate is kernel smoothed to estimate the log-spectrum. In Section 5, we consider a data adaptive variable bandwidth implementation of this method. In Section 6, we present our simulation results. Sections 7 and 8 discuss and summarize our results. In the appendix, we describe a new method for selecting the initial halfwidth.

2 STATISTICS OF MULTI-TAPER SPECTRAL ESTIMATORS

Every quadratic, modulation-invariant spectral estimator has the form

$$\hat{S}_{mt}(f) = \sum_{m,n=1}^{N} q_{mn}\, x_m x_n\, e^{2\pi i (m-n) f}, \tag{1}$$

where $Q = [q_{mn}]$ is a self-adjoint matrix [2, 5]. Decomposing $Q$ into its eigenvector representation, $Q = \sum_{k=1}^{K} \mu_k\, \nu^{(k)} \nu^{(k)\dagger}$, (1) can be recast as

$$\hat{S}_{mt}(f) = \sum_{k=1}^{K} \mu_k \left| \sum_{n=1}^{N} \nu_n^{(k)} x_n\, e^{-2\pi i n f} \right|^2, \tag{2}$$

where the $\nu^{(k)}$ are the orthonormal eigenvectors of $Q$ and the $\mu_k$ are the eigenvalues. We call (2) the multiple taper representation of the spectral estimate [7, 10, 13]. (This name is often shortened to multi-taper and sometimes referred to as a multiple spectral window estimate.) In practice, quadratic spectral estimators are constructed by specifying the eigenvectors/tapers and the weights. For concreteness, we will usually use the sinusoidal tapers $\nu_m^{(k)} = \sqrt{\frac{2}{N+1}} \sin\!\left(\frac{\pi k m}{N+1}\right)$ [12]. For these tapers, the spectral estimate (2) can be recast as

$$\hat{S}_{mt}(f) = \Delta \sum_{k=1}^{K} \mu_k\, |\zeta(f + k\Delta) - \zeta(f - k\Delta)|^2, \tag{3}$$

where $\Delta = \frac{1}{2N+2}$ and $\zeta(f)$ is the discrete Fourier transform of $\{x\}$: $\zeta(f) = \sum_{n=1}^{N} x_n e^{-2\pi i n f}$. The corresponding smoothed periodogram estimate, $\hat{S}_{sp}(f) = \sum_{k=-K}^{K} |\zeta(f + k\Delta)|^2 / (2KN + N)$, has an appreciably larger bias. The sinusoidal multi-taper estimate reduces the bias since the sidelobes of $\zeta(f + k\Delta)$ are partially cancelled by those of $\zeta(f - k\Delta)$.

To analyze the multi-taper estimate, we use the local white noise approximation [3], which corresponds to assuming that the combined estimator of $\theta(f)$ has its domain of dependence concentrated near frequency $f$. When $\mu_k = 1/K$, $\hat{S}_{mt}(f)/S(f)$ has a $\chi^2_{2K}/(2K)$ distribution to leading order in $K/N$ [14]. Note that $E[\ln(\chi^2_{2K}/(2K))] = \psi(K) - \ln(K)$ and $\mathrm{Var}[\ln(\chi^2_{2K}/(2K))] = \psi'(K)$, where $\psi$ is the digamma function and $\psi'$ is the trigamma function. The multi-taper estimate of the logarithm of the spectral density is

$$\hat{\theta}_{mt}(f) \equiv \ln[\hat{S}_{mt}(f)] - [\psi(K) - \ln(K)]. \tag{4}$$
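The equivalence of (2) and (3) for sinusoidal tapers, and the bias correction in (4), can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names (`zeta`, `smt_tapers`, `smt_shifted`, `theta_mt`, `digamma_int`) and the white-noise test series are illustrative assumptions, not part of the paper. It assumes uniform weights $\mu_k = 1/K$ and uses the integer-argument identity $\psi(K) = -\gamma + \sum_{j=1}^{K-1} 1/j$.

```python
import cmath
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(K):
    """psi(K) for a positive integer K: -gamma + sum_{j=1}^{K-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, K))

def zeta(x, f):
    """zeta(f) = sum_{n=1}^{N} x_n exp(-2 pi i n f)."""
    return sum(xn * cmath.exp(-2j * math.pi * n * f)
               for n, xn in enumerate(x, start=1))

def smt_tapers(x, K, f):
    """Eq. (2) with sinusoidal tapers and uniform weights mu_k = 1/K."""
    N = len(x)
    scale = math.sqrt(2.0 / (N + 1))
    total = 0.0
    for k in range(1, K + 1):
        coeff = sum(scale * math.sin(math.pi * k * n / (N + 1))
                    * xn * cmath.exp(-2j * math.pi * n * f)
                    for n, xn in enumerate(x, start=1))
        total += abs(coeff) ** 2 / K
    return total

def smt_shifted(x, K, f):
    """Eq. (3): the same estimate from frequency-shifted transforms."""
    N = len(x)
    delta = 1.0 / (2 * N + 2)
    return delta * sum(
        abs(zeta(x, f + k * delta) - zeta(x, f - k * delta)) ** 2 / K
        for k in range(1, K + 1))

def theta_mt(x, K, f):
    """Eq. (4): log multi-taper estimate with the chi-square bias removed."""
    return math.log(smt_shifted(x, K, f)) - (digamma_int(K) - math.log(K))

# Illustrative white-noise series (an assumption made for the demonstration).
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(64)]
print(smt_tapers(x, 4, 0.2), smt_shifted(x, 4, 0.2))
print(theta_mt(x, 4, 0.2))
```

The two printed spectral estimates should agree to rounding error, since (3) is an algebraic rewriting of (2) for these tapers.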

An alternative estimate of $\ln[S(f)]$ is to average the logarithms of the individual multi-taper estimates:

$$\ln[\hat{S}_{st}(f)] \equiv \frac{1}{K} \sum_{k=1}^{K} \ln\!\left( \frac{|\zeta(f + k\Delta) - \zeta(f - k\Delta)|^2}{2(N+1)} \right), \tag{5}$$

where the subscript "st" denotes single taper. Since the $\chi^2_2$ distribution has its most probable value at zero, the distribution of its logarithm has a very long lower tail. This lower tail induces bias and increases the variance in the estimate: $\mathrm{Bias}[\ln(\hat{S}_{st})] \simeq -0.577$, and $\mathrm{Var}[\ln(\hat{S}_{st})] = \psi'(1)/K = \pi^2/(6K)$. By averaging the $K$ estimates prior to taking the logarithm, we reduce both the bias and the variance. The variance reduction factor if one averages and then takes logarithms, $\ln[\bar{\hat{S}}(f)]$, is $K\psi'(K)/\psi'(1)$. For large $K$, $K\psi'(K) \simeq 1 + \frac{1}{2K}$, so the variance reduction factor (of reversing the order of the operations in (5)) is asymptotically $6/\pi^2$. The local bias of the multi-taper estimate is

$$E[\hat{S}_{mt}(f) - S(f)] \simeq \frac{S''(f)}{2} \sum_{k=1}^{K} \mu_k \int_{-1/2}^{1/2} |f'|^2\, |V^{(k)}(f')|^2\, df', \tag{6}$$

where the $k$-th spectral window, $V^{(k)}$, is the Fourier transform of the $k$-th taper, $\nu^{(k)}$: $V^{(k)}(f) = \sum_{n=1}^{N} \nu_n^{(k)} e^{-2\pi i n f}$. Equation (6) neglects the nonlocal bias and assumes $\sum_{k=1}^{K} \mu_k = 1$. For the sinusoidal tapers with uniform weighting ($\mu_k = 1/K$), (6) reduces to

$$\mathrm{Bias}[\hat{S}_{mt}(f)] \sim \frac{S''(f)}{8} \sum_{k=1}^{K} \mu_k \frac{k^2}{N^2} = S''(f) \frac{K^2}{24 N^2}, \tag{7}$$

where the intermediate equality is derived in [12]. Noting that $S''(f)/S(f) = [\theta''(f) + |\theta'(f)|^2]$, the local bias of the estimate (4) for the uniformly weighted sinusoidal tapers is

$$E[\hat{\theta}_{mt}(f) - \theta(f)] \sim [\theta''(f) + |\theta'(f)|^2] \frac{K^2}{24 N^2}. \tag{8}$$

3 SMOOTHED MULTI-TAPER ESTIMATE

We now consider kernel estimators of $\partial_f^q S(f)$ which smooth the multi-taper estimate:

$$\widehat{\partial_f^q S_\kappa}(f) \equiv \frac{1}{h^{q+1}} \int_{-1/2}^{1/2} \kappa\!\left( \frac{f' - f}{h} \right) \hat{S}_{mt}(f')\, df', \tag{9}$$

where the hat over $\partial_f^q S_\kappa$ denotes the estimate of the $q$th derivative. The subscript on $\hat{S}_\kappa$ denotes the two-stage estimator constructed by first multi-tapering and then kernel smoothing. Here $\kappa(f)$ is a kernel with Lipschitz smoothness of degree 2 with support in $[-1, 1]$, and $\kappa(\pm 1) = 0$. The bandwidth parameter is $h$. We say a kernel is of order $(q, p)$ if $\int f^m \kappa(f)\, df = m!\, \delta_{m,q}$, $m = 0, \ldots, p-1$. We denote the $p$th moment of a kernel of order $(q, p)$ by $B_p \equiv \int f^p \kappa(f)\, df / p!$. For function estimation ($q = 0$), we use $p = 2$ and $p = 4$. To estimate the second derivative, we use a kernel of order (2,4).

Smoothing the multi-taper estimate replaces the original quadratic estimator in (1) by another quadratic estimator, $\tilde{Q}$, with $\tilde{Q}_{mn} = \hat{\kappa}_{m-n} \sum_{k=1}^{K} \mu_k \nu_m^{(k)} \nu_n^{(k)}$, where $\hat{\kappa}_m$ is the Fourier transform of the kernel smoother: $\hat{\kappa}_m \equiv h^{-(q+1)} \int \kappa\!\left( \frac{f'}{h} \right) e^{i m f'}\, df'$. By Theorem 5.2 of Riedel & Sidorenko [12], this smoothed multi-taper estimator cannot outperform the pure multi-taper method with minimum bias tapers.
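The order conditions and the moment $B_p$ are easy to verify for a concrete kernel. The sketch below is illustrative only: the biweight kernel (order (0,2)) and the polynomial kernel $\frac{15}{32}(3 - 10u^2 + 7u^4)$ (order (0,4)) are example choices assumed here, not kernels prescribed by the paper, and the helper names are hypothetical. Both vanish at $\pm 1$; the moments are computed with a composite Simpson rule.

```python
import math

def biweight(u):
    """Example kernel of order (0,2): (15/16)(1 - u^2)^2 on [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def poly_order4(u):
    """Example kernel of order (0,4): (15/32)(3 - 10u^2 + 7u^4) on [-1, 1]."""
    return 15.0 / 32.0 * (3.0 - 10.0 * u**2 + 7.0 * u**4) if abs(u) <= 1.0 else 0.0

def moment(kernel, m, steps=2000):
    """Composite Simpson approximation of the integral of u^m kernel(u) on [-1, 1]."""
    step = 2.0 / steps
    total = 0.0
    for i in range(steps + 1):
        u = -1.0 + i * step
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * u**m * kernel(u)
    return total * step / 3.0

def is_order(kernel, q, p, tol=1e-8):
    """Check that the m-th moment equals m! * delta_{m,q} for m = 0, ..., p-1."""
    return all(
        abs(moment(kernel, m) - (math.factorial(m) if m == q else 0.0)) < tol
        for m in range(p))

def B(kernel, p):
    """B_p = (integral of u^p kernel(u)) / p!."""
    return moment(kernel, p) / math.factorial(p)

def norm2(kernel):
    """||kappa||^2 = integral of kernel(u)^2 over [-1, 1]."""
    return moment(lambda u: kernel(u) ** 2, 0)

print(is_order(biweight, 0, 2), B(biweight, 2), norm2(biweight))
print(is_order(poly_order4, 0, 4), B(poly_order4, 4))
```

For the biweight kernel the exact values are $B_2 = 1/14$ and $\|\kappa\|^2 = 5/7$; for the order (0,4) example, $B_4 = -1/504$.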

Theorem 3.1 Let $S(f)$ be twice continuously differentiable with $0 < S_{min} \le S(f) \le S_{max} < \infty$. Consider the kernel smoothed multi-tapered spectral estimate (9) with $K$ tapers. Let the kernel, $\kappa(f)$, be of order $(q, p)$ and have Lipschitz smoothness of degree 2. Let the envelope of the spectral windows, $V^{(k)}(f)$, decay as $(Nf)^{-1}$ or faster for $f > K/N$ and assume that $\nu_{n+m}^{(k)} \simeq \nu_n^{(k)} [1 + O(\frac{Km}{N})]$. Consider the limit that $N \to \infty$, $h \to 0$ and $K \to \infty$, such that $K/(Nh) \to 0$. The kernel smoothed multi-tapered estimate (9) has asymptotic variance:

$$\mathrm{Var}\left[ \widehat{\partial_f^q S_\kappa}(f) \right] \simeq \frac{\|\kappa\|^2 S(f)^2}{h^{2q+1}} \left( \sum_{k,k'=1}^{K} \mu_k \mu_{k'} \sum_{n=1}^{N} |\nu_n^{(k)}|^2 |\nu_n^{(k')}|^2 \right) + O_R\!\left( \left( \frac{K}{Nh} \right)^{4/5} + \left( \frac{h}{K} \right)^{2} \right), \tag{10}$$

where $\|\kappa\|^2 \equiv \int_{-1}^{1} \kappa(f)^2\, df$.

We use the notation $O_R(\cdot)$ to denote a size of $O(\cdot)$ relative to the main term. The condition, $\frac{K}{N}/h \to 0$, implies that the smoothing from multi-tapering is much less than the smoothing from kernel averaging. The condition that $\nu_{n+m}^{(k)} \simeq \nu_n^{(k)} [1 + O(\frac{Km}{N})]$ is fulfilled when the $k$-th taper has a scale length of variation of $N/k$. The sinusoidal tapers satisfy this condition, as do the Slepian tapers when their bandwidth parameter, $W$, is chosen as $K/N$.

Proof: We separate the variance into a broad-banded contribution $\approx 1/(N|f - f'|)^2$ for $|f - f'| \gg K/Nh$ and a local contribution $\approx |f - f'|^2$. The broad-band contribution is $O_R((\frac{h}{K})^2)$. The local contribution differs from a locally white process by $O_R(S''(f)^2 (\frac{K}{2Nh})^2)$. We now consider the local contribution in the locally white noise approximation [3]. Using the Gaussian fourth moment identity and resumming yields

$$\mathrm{Var}\left[ \widehat{\partial_f^q S_\kappa}(f) \right] \simeq S(f)^2\, \mathrm{tr}[\tilde{Q}\tilde{Q}] = S(f)^2 \sum_{k,k'=1}^{K} \mu_k \mu_{k'} \sum_{n=1}^{N} \sum_{m=1-n}^{N-n} \hat{\kappa}_m^2\, \nu_{n+m}^{(k)} \nu_n^{(k)} \nu_{n+m}^{(k')} \nu_n^{(k')}. \tag{11}$$

Our kernel, $\kappa(\cdot)$, is Lipschitz of degree 2, and therefore $\hat{\kappa}_m \sim O(\|\hat{\kappa}\|/(mh)^2)$ for $mh \gg 1$. Expanding $\nu_{n+m}^{(k)}$ in $mK/N$ and truncating in $m$ yields

$$\mathrm{Var}\left[ \widehat{\partial_f^q S_\kappa}(f) \right] \sim S(f)^2 \left( \sum_{k,k'=1}^{K} \mu_k \mu_{k'} \sum_{n=1}^{N} |\nu_n^{(k)}|^2 |\nu_n^{(k')}|^2 \right) \left( \sum_{m=1}^{N} \hat{\kappa}_m^2 \right)$$

$$= S(f)^2 \frac{\|\kappa\|^2}{h^{2q+1}} \left( \sum_{k,k'=1}^{K} \mu_k \mu_{k'} \sum_{n=1}^{N} |\nu_n^{(k)}|^2 |\nu_n^{(k')}|^2 \right). \tag{12}$$

The first line is valid to $O(1/(mh)^4) + O(Km/N)$, so we Taylor expand $\nu_{n+m}^{(k)}$ for $|mh| < O((Nh/K)^{1/5})$ and drop all terms with $|mh| > O((Nh/K)^{1/4})$. The resulting expression is accurate to $O_R((K/Nh)^{4/5})$. The final line follows from Parseval's identity. ✷

For $K = 1$, Eq. (12) reduces to the well known result [15] for the variance of the smoothed tapered periodogram:

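$$\mathrm{Var}\left[ \frac{1}{h^{q+1}} \int \kappa\!\left( \frac{f - f'}{h} \right) |\zeta_\nu(f')|^2\, df' \right] \sim \frac{S(f)^2 \|\kappa\|^2}{h^{2q+1}} \sum_{n=1}^{N} \nu_n^4, \tag{13}$$

where $\zeta_\nu(f)$ is the tapered Fourier transform. In (13), $\sum_{n=1}^{N} \nu_n^4$ is $O(1/N)$. For the sinusoidal tapers, (12) can be explicitly evaluated:

$$\mathrm{Var}\left[ \widehat{\partial_f^q S_\kappa}(f) \right] \sim \frac{\|\kappa\|^2 S(f)^2}{N h^{2q+1}} \left[ 1 + \frac{1}{2K} + O_R\!\left( \left( \frac{h}{K} \right)^2 \right) + O_R\!\left( \left( \frac{K}{Nh} \right)^{4/5} \right) \right], \tag{14}$$

where we have used

$$\frac{1}{K^2} \sum_{k,k'=1}^{K} \sum_{n=1}^{N} |\nu_n^{(k)}|^2 |\nu_n^{(k')}|^2 = \frac{4}{K^2 (N+1)^2} \sum_{k,k'=1}^{K} \sum_{n=1}^{N} \sin^2\!\left( \frac{\pi k n}{N+1} \right) \sin^2\!\left( \frac{\pi k' n}{N+1} \right) = \frac{2K+1}{2K(N+1)}. \tag{15}$$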
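Identity (15) can be confirmed by direct summation. The following is a small sketch with hypothetical helper names; it assumes $2K \le N$, so that no aliasing occurs in the trigonometric sums.

```python
import math

def taper_overlap(N, K):
    """(1/K^2) sum_{k,k'} sum_n |nu_n^(k)|^2 |nu_n^(k')|^2 for sinusoidal tapers."""
    # Squared taper values |nu_n^(k)|^2 = (2/(N+1)) sin^2(pi k n / (N+1)).
    squared = [[2.0 / (N + 1) * math.sin(math.pi * k * n / (N + 1)) ** 2
                for n in range(1, N + 1)]
               for k in range(1, K + 1)]
    # sum_{k,k'} a_k(n) a_k'(n) = (sum_k a_k(n))^2, so one pass over n suffices.
    column_sums = [sum(squared[k][n] for k in range(K)) for n in range(N)]
    return sum(c * c for c in column_sums) / K**2

def closed_form(N, K):
    """Right-hand side of (15): (2K + 1) / (2K(N + 1))."""
    return (2 * K + 1) / (2.0 * K * (N + 1))

for N, K in [(50, 1), (50, 5), (200, 12)]:
    print(N, K, taper_overlap(N, K), closed_form(N, K))
```

For $K = 1$ the closed form gives $\sum_n \nu_n^4 = 1.5/(N+1)$, which is the single-taper value used in Section 4.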

4 SMOOTHED LOG MULTI-TAPER ESTIMATE

We now show that combining kernel smoothing with multi-tapering does improve the estimation of the logarithm of the spectral density, $\theta(f) = \ln[S(f)]$. Let

$$\widehat{\partial_f^q \theta_\kappa}(f) \equiv \frac{1}{h^{q+1}} \int_{-1/2}^{1/2} \kappa\!\left( \frac{f' - f}{h} \right) \hat{\theta}_{mt}(f')\, df'. \tag{16}$$

For $h \ll 1$ and $Nh \gg 1$, we expand (16) in the bandwidth:

$$\mathrm{Bias}\left[ \widehat{\partial_f^q \theta_\kappa}(f) \right] \simeq B_p\, \partial_f^p \theta(f)\, h^{p-q} + \partial_f^q [\theta''(f) + |\theta'(f)|^2] \frac{K^2}{24 N^2}. \tag{17}$$

The first term is the bias from kernel smoothing and the second term is from the sinusoidal multi-taper estimate (8). Traditionally, the "delta approximation", $\mathrm{Var}[f(X)] = f'(E[X])^2\, \mathrm{Var}[X]$, is used to evaluate the variance of the smoothed log-periodogram. For the delta approximation to be valid, the characteristic scale of variation of $f(\cdot)$ must be large relative to $\sqrt{\mathrm{Var}[X]}$, where $f$ is continuously differentiable. This requirement is not fulfilled for the log-periodogram, and the resulting analysis makes an order one error in single taper estimation. For the multi-taper estimation, the expansion parameter for the delta approximation is $1/K$. To leading order in the $1/K$ expansion, the variance inflation factor from the long tail of the $\ln[\chi^2_{2K}]$ distribution is not visible. Recall that $\mathrm{Var}[\hat{\theta}_{mt}(f')] \approx [K\psi'(K)] \times \mathrm{Var}[\hat{S}_{mt}(f')]/S(f)^2$ for $|f - f'| \ll 1$. We believe that adding a $K\psi'(K)$ correction improves the accuracy of the delta approximation for $f'' \ne f'$. Therefore, we evaluate the variance of the smoothed log multi-taper estimate by using the approximate identity:

$$\mathrm{Cov}[\hat{\theta}_{mt}(f'), \hat{\theta}_{mt}(f'')] \approx \frac{[K\psi'(K)]}{S(f)^2} \times \mathrm{Cov}[\hat{S}_{mt}(f'), \hat{S}_{mt}(f'')], \tag{18}$$

for $|f' - f| \ll 1$ and $|f'' - f| \ll 1$. Using (18), the variance of $\hat{\theta}(f)$ is

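$$\mathrm{Var}\left[ \widehat{\partial_f^q \theta_\kappa}(f) \right] \sim \frac{K\psi'(K)}{S(f)^2\, h^{2(q+1)}} \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} \kappa\!\left( \frac{f' - f}{h} \right) \kappa\!\left( \frac{f'' - f}{h} \right) \mathrm{Cov}[\hat{S}_{mt}(f'), \hat{S}_{mt}(f'')]\, df'\, df''. \tag{19}$$

Thus, the variance of $\widehat{\partial_f^q \theta}(f)$ reduces to the same calculation as the variance of $\widehat{\partial_f^q S}(f)$:

$$\mathrm{Var}\left[ \widehat{\partial_f^q \theta_\kappa}(f) \right] \sim \frac{(K + \frac{1}{2})\psi'(K)\|\kappa\|^2}{N h^{2q+1}} + O_R\!\left( \frac{1}{K} \right) + O_R\!\left( \left( \frac{K}{Nh} \right)^{4/5} \right), \tag{20}$$

for the uniformly weighted sinusoidal tapers. (See the calculation in Theorem 3.1.) Combining (17) with (19) yields the expected asymptotic square error (EASE) in $\widehat{\partial_f^q \theta_\kappa}$:

Theorem 4.1 Let $S(f)$ have $p$ continuous derivatives. Consider the two-staged estimate (16) using the uniformly weighted sinusoidal tapers in the first stage. Under the hypotheses of Theorem 3.1 and the formal approximation (18), the expected asymptotic square error of $\widehat{\partial_f^q \theta_\kappa}$ is

$$E\left[ \left| \widehat{\partial_f^q \theta_\kappa}(f) - \partial_f^q \theta(f) \right|^2 \right] \approx \left[ B_p\, \partial_f^p \theta(f)\, h^{p-q} + \partial_f^q [\theta''(f) + |\theta'(f)|^2] \frac{K^2}{24 N^2} \right]^2 + \frac{(K + \frac{1}{2})\psi'(K)\|\kappa\|^2}{N h^{2q+1}} + O_R\!\left( h^{2(p-q)+1} \right) + O_R\!\left( \frac{1}{K} \right) + O_R\!\left( \left( \frac{K}{Nh} \right)^{4/5} \right). \tag{21}$$

The benefit of multi-tapering (in terms of the variance reduction) is significant when using 2 to 20 tapers. However, the marginal benefit of each additional taper tends rapidly to zero. Minimizing (21) with respect to $h$ and $K$ yields the following result: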

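Corollary 4.2 Under the hypotheses of Theorem 4.1, the expected asymptotic square error (EASE) of $\widehat{\partial_f^q \theta_\kappa}$ is minimized by

$$h_o(f) = \left[ \frac{2q+1}{2(p-q)} \, \frac{(K + \frac{1}{2})\psi'(K)\|\kappa\|^2}{B_p^2\, N\, |\partial_f^p \theta(f)|^2} \right]^{\frac{1}{2p+1}}, \tag{22}$$

and

$$B_p [\partial_f^p \theta(f)] \{ \partial_f^q [\theta''(f) + |\theta'(f)|^2] \}\, K_{opt}^3 \simeq 6 \|\kappa\|^2 N h_o^{-(p+q+1)}. \tag{23}$$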
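As a sanity check on (22), the sketch below evaluates the halfwidth and confirms that it minimizes the $h$-dependent leading terms of (21), namely the squared kernel bias plus the variance. It assumes, as (22) does, that the $K^2/(24N^2)$ taper bias is negligible in the $h$ minimization. The function names and the numerical inputs ($B_2 = 1/14$, $\|\kappa\|^2 = 5/7$ for a biweight kernel, and $\partial_f^2\theta = 3$) are illustrative assumptions; $\psi'(K)$ is computed from the integer-argument identity $\psi'(K) = \pi^2/6 - \sum_{j=1}^{K-1} j^{-2}$.

```python
import math
from fractions import Fraction

def trigamma_int(K):
    """psi'(K) for a positive integer K: pi^2/6 - sum_{j=1}^{K-1} 1/j^2."""
    return math.pi**2 / 6.0 - sum(1.0 / j**2 for j in range(1, K))

def variance_factor(K):
    """(K + 1/2) psi'(K), the taper-dependent factor in (20)-(22)."""
    return (K + 0.5) * trigamma_int(K)

def optimal_halfwidth(p, q, K, N, Bp, kappa_norm2, dp_theta):
    """Eq. (22): local halfwidth minimizing the leading-order EASE."""
    ratio = (2 * q + 1) / (2.0 * (p - q))
    inner = ratio * variance_factor(K) * kappa_norm2 / (Bp**2 * N * dp_theta**2)
    return inner ** (1.0 / (2 * p + 1))

def leading_ease(h, p, q, K, N, Bp, kappa_norm2, dp_theta):
    """Squared kernel bias plus variance from (21); taper bias is dropped."""
    bias = Bp * dp_theta * h ** (p - q)
    variance = variance_factor(K) * kappa_norm2 / (N * h ** (2 * q + 1))
    return bias**2 + variance

def kopt_exponent(p, q):
    """Exponent a in K_opt ~ N^a implied by (22) and (23)."""
    return Fraction(3 * p + q + 2, 6 * p + 3)

args = (2, 0, 8, 1024, 1.0 / 14.0, 5.0 / 7.0, 3.0)  # p, q, K, N, B_p, ||kappa||^2, d^p theta
h_o = optimal_halfwidth(*args)
print(h_o, leading_ease(h_o, *args))
print(kopt_exponent(2, 0))
```

Evaluating `leading_ease` slightly above and below `h_o` gives larger values, consistent with `h_o` being the minimizer.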

Thus $h_{opt} \sim N^{-1/(2p+1)}$ and $K_{opt} \sim N^{(3p+q+2)/(6p+3)}$. For kernels of order (0,2), this reduces to $h_{opt} \sim N^{-1/5}$ and $K_{opt} \sim N^{8/15}$. Thus the ordering $1 \ll K \ll Nh$ is justified. The EASE (21) depends only weakly on $K$ for $1 \ll K \ll Nh$, while the dependence on the choice of bandwidth is strong. When the bandwidth, $h_o$, satisfies (22), the leading order EASE reduces to

$$E\left[ \left| \widehat{\partial_f^q \theta}(f_j) - \partial_f^q \theta(f_j) \right|^2 \right] \simeq M_{q,p}\, |B_p\, \partial_f^p \theta(f_j)|^{\frac{2(2q+1)}{2p+1}} \left( \frac{(K + \frac{1}{2})\psi'(K)\|\kappa\|^2}{N} \right)^{\frac{2(p-q)}{2p+1}}, \tag{24}$$

where $M_{q,p} \equiv \left( \frac{2q+1}{2(p-q)} \right)^{\frac{2(p-q)}{2p+1}} + \left( \frac{2(p-q)}{2q+1} \right)^{\frac{2q+1}{2p+1}}$. Thus the EASE in estimating $\partial_f^q \theta$ is proportional to $N^{-2(p-q)/(2p+1)}$. We note that if $K = 1$ (a single taper), the variance term in (21) is inflated by a factor of $\frac{\pi^2}{6} N \sum_{n=1}^{N} \nu_n^4$. Thus using a moderate level of multi-tapering prior to smoothing the logarithm reduces the EASE by a factor of $[\frac{\pi^2}{6} N \sum_{n=1}^{N} \nu_n^4]^{4/5} = [\frac{\pi^2}{4}]^{4/5}$, where we substitute $N \sum_{n=1}^{N} \nu_n^4 = 1.5$ for the sinusoidal tapers.

From (24), using the best fixed halfwidth kernel smoother degrades performance by a factor of

$$\frac{\mathrm{EASE}(h_{global})}{\mathrm{EASE}(h_{variable})} = \left[ \int_{-1/2}^{1/2} |\theta''(f)|^2\, df \right]^{1/5} \Bigg/ \int_{-1/2}^{1/2} |\theta''(f)|^{2/5}\, df \tag{25}$$

over using an optimal variable halfwidth smoother [4]. In many cases, the spectral range is large, and thus it is often essential to allow the bandwidth to vary locally as a function of frequency.

Equation (22) gives an explicit solution for the bandwidth which minimizes the local bias versus variance trade-off. It shows that when $\theta(f)$ is rapidly varying ($|\theta''(f)|$ is large), the kernel bandwidth should be decreased. However, (22) has two major difficulties. First, (21)-(24) are based on a Taylor series expansion and the expansion parameter is $h_o \sim 1/N^{1/(2p+1)}$. Even when $1/N$ is small, $(1/N)^{1/(2p+1)}$ may not be so small. Second, $\theta''(f)$ and $h_o(f)$ are unknown and need to be estimated.
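Two of the factors quoted above are simple to evaluate numerically: the single-taper EASE inflation $[\frac{\pi^2}{6} N \sum_n \nu_n^4]^{4/5}$ and the fixed-versus-variable halfwidth ratio (25). The sketch below uses hypothetical function names, and the peaked $\theta''$ profile is an invented example chosen only to show that the ratio exceeds one when the curvature varies strongly. The integrals in (25) are approximated with a midpoint rule.

```python
import math

def single_taper_ease_factor(N):
    """[(pi^2/6) N sum_n nu_n^4]^(4/5) for the first sinusoidal taper."""
    quartic_sum = sum(
        (2.0 / (N + 1) * math.sin(math.pi * n / (N + 1)) ** 2) ** 2
        for n in range(1, N + 1))
    return (math.pi**2 / 6.0 * N * quartic_sum) ** 0.8

def fixed_vs_variable_ratio(theta2, steps=4000):
    """Eq. (25) for a curvature function theta2(f) = theta''(f) on [-1/2, 1/2]."""
    width = 1.0 / steps
    grid = [-0.5 + (i + 0.5) * width for i in range(steps)]
    numerator = sum(abs(theta2(f)) ** 2 for f in grid) * width
    denominator = sum(abs(theta2(f)) ** 0.4 for f in grid) * width
    return numerator ** 0.2 / denominator

def flat(f):
    """Constant curvature: a global halfwidth is already optimal."""
    return 2.0

def peaked(f):
    """Hypothetical curvature with a narrow peak at f = 0."""
    return 1.0 + 50.0 * math.exp(-((f / 0.02) ** 2))

print(single_taper_ease_factor(10000), (math.pi**2 / 4.0) ** 0.8)
print(fixed_vs_variable_ratio(flat), fixed_vs_variable_ratio(peaked))
```

By Jensen's inequality the ratio in (25) is never below one on the unit-length frequency interval, and it equals one only when $|\theta''|$ is constant.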

5 DATA-ADAPTIVE ESTIMATE

In practice, $\theta''(f)$ is unknown and we use a data-adaptive multiple stage kernel estimator where a pilot estimate of the optimal bandwidth is made prior to estimating $\theta(f)$. To simplify the implementation, we choose $K$ independent of frequency, and usually set $K \approx cN^{8/15}$, where $c$ is a constant. For nonparametric function estimation, data adaptive multiple stage schemes are given in [1, 4, 11]. A straightforward application of these schemes to multi-taper spectral estimation has the following steps:

0) Evaluate the multi-taper estimate of (3) on a grid of size $2N + 2$. If the computational effort is not important, set $K = N^{8/15}$; otherwise choose $K$ according to your computational budget.

1a) Kernel smooth $\hat{\theta}_{mt}(f)$ with a kernel of order (0,4) for a number of different bandwidths, $h_\ell$, and evaluate the average square residual (ASR) as a function of $h_\ell$:

$$\mathrm{ASR}(h_\ell) = \sum_{n=1}^{N} |\hat{\theta}_{st}(f_n) - \hat{\theta}_\kappa(f_n | h_\ell)|^2, \tag{26}$$

where $\hat{\theta}_\kappa(f_n | h_\ell)$ is the kernel estimate of $\theta(f)$ using bandwidth $h_\ell$ applied to $\hat{\theta}_{mt}(f)$, while $\hat{\theta}_{st}(f_n)$ is the single taper estimate: $\hat{\theta}_{st}(f) = \ln[|\zeta(f + \Delta) - \zeta(f - \Delta)|^2 / 2(N+1)] + .577$.

1b) Estimate the optimal (0,4) global halfwidth using a goodness of fit method. Relate this to the optimal (2,4) halfwidth using the halfwidth quotient relation. (See below.)

2) Estimate $\theta''(f)$ by smoothing the multi-taper estimate with global halfwidth $h_{2,4}$.

3) Estimate $\theta(f)$ by substituting $\hat{\theta}''(f)$ into the optimal halfwidth expression corresponding to the minimum of (14).

For Step 1b), Müller and Stadtmüller propose to determine the starting halfwidth by minimizing the Rice criterion. In [11], we describe a different method for selecting the initial bandwidth in step 1b). Our method is based on fitting the average square residual of (26) to a parametric expression based on (21). This parametric fit usually outperforms the Rice criterion because it uses an asymptotically valid expression.
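Step 1a) can be sketched as a bandwidth scan over (26). The code below is a simplified illustration under several assumptions that are not taken from the paper: the two log-spectral estimates are supplied as lists on an equally spaced frequency grid, the smoother wraps periodically in frequency, the discrete kernel weights are renormalized to sum to one, and the order (0,4) kernel is the example polynomial $\frac{15}{32}(3 - 10u^2 + 7u^4)$. All function names are hypothetical.

```python
def poly_order4(u):
    """Example kernel of order (0,4) on [-1, 1]; vanishes at u = +/-1."""
    return 15.0 / 32.0 * (3.0 - 10.0 * u**2 + 7.0 * u**4) if abs(u) <= 1.0 else 0.0

def kernel_smooth(values, h, spacing):
    """Periodic kernel smoother with halfwidth h on an equally spaced grid."""
    n = len(values)
    half = int(h / spacing)
    offsets = list(range(-half, half + 1))
    weights = [poly_order4(j * spacing / h) for j in offsets]
    norm = sum(weights)  # renormalize so that constants are reproduced
    return [sum(w * values[(i + j) % n] for j, w in zip(offsets, weights)) / norm
            for i in range(n)]

def average_square_residual(theta_st, theta_mt, h, spacing):
    """Eq. (26): residual of the single-taper estimate about the smoothed
    multi-taper estimate, summed over the frequency grid."""
    smooth = kernel_smooth(theta_mt, h, spacing)
    return sum((s - k) ** 2 for s, k in zip(theta_st, smooth))

def scan_bandwidths(theta_st, theta_mt, bandwidths, spacing):
    """Return (h, ASR(h)) pairs for the candidate halfwidths."""
    return [(h, average_square_residual(theta_st, theta_mt, h, spacing))
            for h in bandwidths]

# Toy inputs (assumptions): a flat multi-taper estimate and an alternating
# single-taper estimate on a 40-point grid.
spacing = 1.0 / 40
theta_mt_demo = [1.0] * 40
theta_st_demo = [1.0 + 0.5 * (-1) ** i for i in range(40)]
print(scan_bandwidths(theta_st_demo, theta_mt_demo, [0.05, 0.1, 0.2], spacing))
```

For the flat toy input, smoothing leaves the multi-taper estimate unchanged, so every bandwidth gives the same ASR of $40 \times 0.25 = 10$. With real data the ASR curve would then be fitted as described in step 1b).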

In (26), the ASR is computed relative to the single taper estimate, $\hat{\theta}_{st}(f_n)$, instead of the multi-taper estimate, $\hat{\theta}_{mt}(f_n)$. We do this because the multi-taper estimate is strongly autocorrelated for frequencies $f$ and $f'$ with $|f - f'| \le K/2N$. To correct for using $\hat{\theta}_{st}(f)$ in step 1 and $\hat{\theta}_{mt}(f)$ in steps 2 and 3, we inflate the variance in the (0,4) kernel estimate. The halfwidth quotient relation relates the optimal halfwidth for derivative estimates, $\hat{h}_{2,4}$, to the optimal halfwidth for a (0,4) kernel using (22):

$$\hat{h}_{2,4} = H(\kappa_{2,4}, \kappa_{0,4})\, \hat{h}_{0,4}, \quad \text{where} \quad H(\kappa_{2,4}, \kappa_{0,4}) \equiv \left( \frac{10 B_{0,4}^2 \|\kappa_{2,4}\|^2}{B_{2,4}^2 \|\kappa_{0,4}\|^2} \right)^{1/9} \left( \frac{\pi^2 N \sum_n |\nu_n^{(1)}|^4}{6} \right)^{1/9}. \tag{27}$$

The last term in parentheses is the variance inflation factor from using a single taper. To minimize the effects of tapering-induced autocorrelation, we recommend using a Tukey taper for $\hat{\theta}_{st}$.

When $\hat{\theta}''(f)$ is vanishingly small, the optimal halfwidth becomes large. Thus, $\hat{h}_{0,2}$ needs to be regularized. Following [11], we determine the size of the regularization from $\hat{h}_{0,4}$ in the previous stage. We say a "plug-in" scheme has a relative convergence rate of $N^{-\alpha}$ if

$$E\left[ |\hat{\theta}(f | \hat{h}_{0,2}) - \theta(f)|^2 \right] \simeq \left( 1 + O(C_r^2 N^{-2\alpha}) \right) E\left[ |\hat{\theta}(f | h_{0,2}) - \theta(f)|^2 \right],$$

where $h_{0,2}$ is the optimal halfwidth and $\hat{h}_{0,2}$ is the estimated halfwidth. In [1], a detailed analysis of the convergence properties of their similar scheme is given. Their scheme has an optimal convergence rate of $N^{-4/5}$ and a relative convergence rate of $N^{-1/4}$. Our simpler method has the same convergence rate of $N^{-4/5}$ and a slightly slower relative convergence rate: $N^{-2/9}$.

6 COMPARISON OF KERNEL SMOOTHER ESTIMATES

We now compare three different kernel smoother estimates of the log-spectrum: 1) Kernel smoothing the log-multitaper estimate, $\hat{\theta}_{mt}$, as in (16); 2) Kernel smoothing the log-single taper estimate, $\ln[\hat{S}_{st}]$; 3) The logarithm of the kernel smoothed multi-taper spectral estimate, $\ln[\hat{S}_\kappa]$, as in (21). In all cases, we u