IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. , NO. , MONTH YEAR

Spectral Estimation Using Multitaper Whittle Methods with a Lasso Penalty

Shuhan Tang, Peter F. Craigmile, and Yunzhang Zhu

Last updated July 25, 2019

Abstract—Spectral estimation provides key insights into the frequency domain characteristics of a time series. Naive nonparametric estimates of the spectral density, such as the periodogram, are inconsistent, and the more advanced lag window or multitaper estimators are often still too noisy. We propose an $L_1$ penalized quasi-likelihood Whittle framework based on multitaper spectral estimates which performs semiparametric spectral estimation for regularly sampled univariate stationary time series. Our new approach circumvents the problematic Gaussianity assumption required by least squares approaches and achieves sparsity for a wide variety of basis functions. We present an alternating direction method of multipliers (ADMM) algorithm to efficiently solve the optimization problem, and develop universal threshold and generalized information criterion (GIC) strategies for efficient tuning parameter selection that outperform cross-validation methods. Theoretically, a fast convergence rate for the proposed spectral estimator is established. We demonstrate the utility of our methodology on simulated series and in the spectral analysis of electroencephalogram (EEG) data.

Index Terms—Alternating direction method of multipliers (ADMM) algorithm; basis expansion; multitaper spectral estimates; wavelets.

S. Tang, P. F. Craigmile, Y. Zhu are with the Department of Statistics, The Ohio State University, Columbus, OH, 43210 USA (e-mail: [email protected]).
Manuscript received December 12, 2018; revised June 27, 2019.

I. INTRODUCTION

ESTIMATING the spectral density function (SDF) or spectrum of a series collected over time is an important tool in time series analysis and signal processing. It is used in many fields such as astronomy, cognitive science, earth sciences, electrical engineering, and finance. Examining the SDF allows us to explore periodicities in the data (e.g., [1, ch. 10]), provides an alternative way to analyze and estimate the covariance structure of stationary time series (e.g., [1, ch. 4]), and can also be used to understand the effect of preprocessing a time series (e.g., [2]).

There are many nonparametric estimators of the SDF of a univariate stationary time series. These include the periodogram, direct spectral estimators, lag window and overlapping segment averaging spectral estimators, and multitaper (MT) spectral estimators. (See [1] for a complete review.) While many of these estimators are developed to provide an adequate tradeoff between bias and variance, often these nonparametric estimates are still too noisy when a stable estimate of the SDF is required. An alternative strategy is to use a parametric approach; however, model misspecification caused by considering a limited class of models for the SDF can compromise estimation (see [1, ch. 9] and references therein).

A popular alternative approach is to consider a semiparametric model for the SDF, in which the log SDF is expressed in terms of a truncated basis expansion, where the number of basis functions is allowed to increase with the sample size. The statistical problem then becomes how to enforce sparsity by selecting the basis functions and estimating the model parameters so that we adequately estimate the SDF, but also have computational efficiency as the sample size is increased. Gao [3], [4], Moulin [5] and Walden et al. [6] enforce sparsity using a penalized least squares (LS) approach for estimating the log SDF with wavelet soft thresholding. In terms of computational complexity, wavelet thresholding methods are typically $O(N)$, for a time series of $N$ regularly sampled values.

A number of approaches enforce smoothness of the SDF via an $L_2$ penalty: Cogburn and Davis [7], Wahba and Wold [8] and Wahba [9] use penalized LS, and Pawitan and O'Sullivan [10] use a penalized Whittle method. To enforce sparsity, some of these $L_2$ methods of smoothing splines also use model selection, often in combination with cross-validation, to select the basis functions that are used to model the SDF. Alternatively, one can implement methods such as [11] to enforce sparsity on the basis expansion directly. (On a related topic, smoothness of spectral estimation can also be tuned with high-resolution approaches introduced in [12], [13] and using extended frameworks based on so-called beta and tau divergence families, such as [14]–[18]; see [19] for a general review of such divergences.)

Our method is also motivated by the need to enforce sparsity while adequately estimating the SDF. In addition, we seek computational efficiency as we increase the sample size. We develop a quasi-likelihood method for estimating SDFs using a Whittle likelihood [20] based on MT spectral estimates. A quasi-likelihood function [21], [22, ch. 9] has similar statistical properties to that of the log likelihood, and can be used for statistical inference, but does not have to match exactly the log of the joint probability density function of the data. MT estimates [23], [1] provide a good compromise between bias and variance and can yield more efficient estimates of the SDF [6]. We demonstrate that the addition of a Whittle likelihood method [20] improves estimation over traditional LS approaches.

We use a lasso penalty [24] to enforce sparsity, deriving two strategies to optimally select the tuning parameter that is key to obtaining estimates of the SDF with low integrated root mean squared error (IRMSE): "universal threshold" and generalized information criterion (GIC)-based methods. Neither method compromises on computational or statistical efficiency by requiring the use of cross-validation to select the tuning parameter. Theoretically, we derive the rate of convergence for our proposed spectral estimator under some technical conditions on the model sparsity and the MT spectral estimator.

We introduce a computationally efficient method to estimate the parameters in our model using the alternating direction method of multipliers (ADMM) algorithm. To reach an $\epsilon$-optimal solution with a time series of length $N$, our method is $O(N\epsilon^{-1})$ when using wavelet bases and $O(N^3 + \epsilon^{-1}N^2)$ for general bases. Although computationally more challenging, our method can be applied to SDF estimation using any collection of basis functions and, as mentioned above, outperforms LS-based methods such as wavelet thresholding in terms of estimation quality.

The rest of the paper is organized as follows. Section II presents models for SDFs in terms of basis methods. In Section III, we introduce the multitaper spectral estimator that we use in our penalized Whittle estimation method in Section IV. The ADMM algorithm is described in Section V and Section VI outlines two approaches for tuning parameter selection. We derive the rate of convergence for the proposed $L_1$ penalized MT-Whittle likelihood estimator in Section VII. Our methods are evaluated using Monte Carlo simulations in Section VIII and we perform a spectral analysis of electroencephalogram (EEG) data in Section IX. We close with some remarks in Section X. Proofs and further details of the ADMM algorithm are provided in the Appendix.

II. BASIS MODELS FOR SDFS

Let $\{X_t : t \in \mathbb{Z}\}$ be a univariate real-valued stationary process collected at sampling interval $\Delta > 0$. Without loss of generality assume $\Delta = 1$. Let $\gamma(h) = \operatorname{cov}(X_t, X_{t+h})$, $h \in \mathbb{Z}$, denote the (stationary) autocovariance function (ACVF) of $\{X_t\}$ and assume that the ACVF is absolutely summable: $\sum_{h=-\infty}^{\infty} |\gamma(h)| < \infty$. Then the spectral density function (SDF) $S(f)$ for a frequency $|f| < 1/2$ exists and is defined as the Fourier transform pair of the ACVF:

$$S(f) = \sum_{h=-\infty}^{\infty} \gamma(h)\, e^{-i 2\pi f h}, \tag{1}$$

with

$$\gamma(h) = \int_{-1/2}^{1/2} e^{i 2\pi f h}\, S(f)\, df. \tag{2}$$

The SDF is a non-negative, even, and real-valued function and decomposes the variance of the time series $\{X_t\}$: from (2) with $h = 0$,

$$\operatorname{var}(X_t) = \gamma(0) = \int_{-1/2}^{1/2} S(f)\, df.$$

There are a variety of options for families of basis functions $\phi(f)$ that can be used for spectral estimation. For example, polynomial and Fourier bases can be used to capture global patterns [26], smoothing splines allow for local and smooth patterns in the SDF [7], [8], and wavelet bases model local behaviour, while capturing second order effects such as peaks, troughs, and cusps [3]–[6]. When the SDF is spatially inhomogeneous, spatially adaptive bases, such as wavelets, have theoretical optimality properties (see, e.g., [27]–[29]). We can also combine families of basis functions to form dictionaries.

In Section IV we propose our estimator of the coefficient vector $\beta$, which is based on multitaper spectral estimates of time series data that we define in the next section. (In Section VII we provide assumptions on the basis representation (3) so that we can asymptotically recover the true log SDF.)

III. MULTITAPER SPECTRAL ESTIMATION

Suppose we observe $N$ observations, $X = (X_1, \ldots, X_N)^T$, from the stationary process $\{X_t\}$. Then a multitaper (MT) or multiple taper spectral estimate [23] is an average of a number of tapered spectral estimates. Specifically, let $\{h_{k,t} : k = 1, \ldots, K;\ t = 1, \ldots, N\}$ denote $K$ orthonormal data tapers; i.e., $\sum_t h_{k,t}^2 = 1$ and $\sum_t h_{k,t} h_{k',t} = 0$ for $k \neq k'$. Then the standard MT spectral estimator of the SDF, $\hat{S}^{(mt)}(f)$, is the average of the $K$ eigenspectra,

$$\hat{S}^{(mt)}(f) = \frac{1}{K} \sum_{k=1}^{K} \hat{S}^{(mt)}_k(f), \tag{4}$$

where the $k$th ($k = 1, \ldots, K$) eigenspectrum is defined by $\hat{S}^{(mt)}_k(f) = |J_k(f)|^2$, with

$$J_k(f) = \sum_{t=1}^{N} h_{k,t} X_t \exp(-i 2\pi f t).$$

Different tapers induce different statistical properties for the MT estimator. Discrete prolate spheroidal sequences (DPSS) and sine tapers are most commonly used [1]. DPSS tapers are designed to reduce the sidelobes in the spectral estimate. They solve the time-frequency concentration problem in which we find the time limited sequence which has most of its energy concentrated in a specified frequency band [1, ch. 8]. We use the easily calculated sine tapers [30],

$$h_{k,t} = \left( \frac{2}{N+1} \right)^{1/2} \sin\!\left( \frac{(k+1)\pi t}{N+1} \right), \quad k = 1, \ldots, K;\ t = 1, \ldots, N,$$

which are designed to reduce the smoothing bias, at a compromise to sidelobe reduction.