
Dynamic Bayesian Multitaper Spectral Analysis

Proloy Das and Behtash Babadi

Abstract—Spectral analysis using overlapping sliding windows is among the most widely used techniques in analyzing non-stationary time series. Although sliding window analysis is convenient to implement, the resulting estimates are sensitive to the window length and overlap size. In addition, it undermines the dynamics of the time series, as the estimate associated with each window uses only the data within. Finally, the overlap between consecutive windows hinders a precise statistical assessment. In this paper, we address these shortcomings by explicitly modeling the spectral dynamics through integrating the multitaper method with state-space models in a Bayesian estimation framework. The underlying states pertaining to the eigen-spectral quantities arising in multitaper analysis are estimated using instances of the Expectation-Maximization algorithm, and are used to construct spectrograms and their respective confidence intervals. We propose two spectral estimators that are robust to noise and are able to capture spectral dynamics at high spectrotemporal resolution. We provide theoretical analysis of the bias-variance trade-off, which establishes performance gains over the standard overlapping multitaper method. We apply our algorithms to synthetic data as well as real data from human EEG and electric network frequency recordings, the results of which validate our theoretical analysis.

Index Terms—Spectrogram analysis, Non-stationary spectral analysis, Multitaper analysis, State-space models, Bayesian filtering

arXiv:1706.01563v2 [cs.IT] 15 Dec 2017

The authors are with the Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742. This work has been presented in part at the IEEE Signal Processing in Medicine and Biology Symposium, 2017 [1]. This material is based upon work supported by the National Science Foundation under Grant No. 1552946.

I. INTRODUCTION

Spectral analysis techniques are among the most important tools for extracting information from time series data recorded from naturally occurring processes. Examples include speech [2], images [3], electroencephalography (EEG) [4], oceanography [5], climatic time series [6] and seismic data [7]. Due to the exploratory nature of most of these applications, non-parametric techniques based on Fourier methods and wavelets are among the most widely used. In particular, the multitaper (MT) method excels among the available non-parametric techniques due to both its simplicity and its control over the bias-variance trade-off via bandwidth adjustment [8]–[10].

Most existing spectral analysis techniques assume that the time series is stationary. In many applications of interest, however, the energy of the various oscillatory components in the data exhibits dynamic behavior. Extensions of stationary time series analysis to these non-stationary processes have led to 'time-varying' spectral descriptions such as the Wigner-Ville distribution [11], [12], the evolutionary spectra and its generalizations [13], [14], and the time-frequency operator symbol formulation [15] (see [16] for a detailed review). A popular approach to estimating such time-varying spectra is to subdivide the data into overlapping windows or segments and estimate the spectrum locally for each window using various Fourier or wavelet-based methods [17], assuming the underlying process is quasi-stationary, i.e., that the spectrum changes slowly with time. Thereby, the so-called spectrogram analysis is obtained by using sliding windows with overlap in order to capture non-stationarity.

Although sliding window processing is widely used due to its fast implementation, it has several major drawbacks. First, the window length and extent of overlap are subjective choices and can drastically change the overall attributes of the spectrogram if chosen poorly. Second, given that the estimate associated with a given window is obtained from only the data within, it ignores the common dynamic trends shared across multiple windows, and thereby fails to fully capture the degree of smoothness inherent in the signal; instead, the smoothness of the estimates is enforced by the amount of overlap between adjacent windows. Third, although techniques such as MT analysis are able to mitigate the variabilities arising from finite data duration, or the so-called 'sampling' noise, by averaging over multiple tapers, their spectral resolution degrades when applied to data within small windows due to the increase in the Rayleigh resolution [18]. In addition, they do not have a mechanism in place to suppress the additive measurement noise that commonly contaminates empirical observations. Fourth, the overlap between adjacent windows hinders a precise statistical assessment of the estimates, such as constructing confidence intervals, due to the high dependence of estimates across windows. To address this issue, statistical corrections for multiple comparisons need to be employed [19], which in turn limit the resulting test powers when multiple windows are involved.

In recent years, several alternative approaches to non-stationary spectral analysis have been proposed, such as the empirical mode decomposition (EMD) [20], the synchrosqueezed wavelet transform [21], [22], time-frequency reassignment [23], time-frequency ARMA models [24], and spectrotemporal pursuit [25]. These techniques aim at decomposing the data into a small number of smooth oscillatory components in order to produce spectral representations that are smooth in time but sparse or structured in frequency. Although they produce spectral estimates that are highly localized in the time-frequency plane, they require certain assumptions on the data to hold. For example, EMD analysis assumes the signal to be deterministic and does not take into account the effect of observation noise [20]. Other methods assume that the underlying spectrotemporal components pertain to certain structures such as amplitude-modulated narrowband mixtures [21], [22], sparsity [25] or chirp-like dynamics [23]. In addition, they lack a statistical characterization of the estimates. Finally, although these sophisticated methods provide spectrotemporal resolution improvements, they do not yield implementations as simple as those of the sliding window-based spectral estimators.

In this paper, we address the above-mentioned shortcomings of sliding window multitaper estimators by resorting to state-space modeling. State-space models provide a flexible and natural framework for analyzing systems that evolve with time [26]–[29], and have previously been used for parametric [24], [30] and non-parametric [25] spectral estimation. The novelty of our approach is in the integration of techniques from MT analysis and state-space modeling in a Bayesian estimation framework. To this end, we construct state-space models in which the underlying states pertain to the eigen-spectral quantities, such as the empirical eigen-coefficients and eigen-spectra arising in MT analysis. We employ state dynamics that capture the evolution of these quantities, coupled with observation models that reflect the effect of measurement and sampling noise. We then utilize Expectation-Maximization (EM) to find the maximum a posteriori (MAP) estimate of the states given the observed data, from which we construct our spectral estimators as well as statistical confidence intervals.

We provide theoretical analysis of the bias-variance trade-off, which reveals two major features of our proposed framework: 1) our methodology inherits the control mechanism of the bias-variance trade-off from the MT framework by means of changing the design bandwidth parameters [10], and 2) our algorithms enjoy the optimal data combining and denoising features of Bayesian filtering and smoothing. In addition, due to the simplicity and wide usage of Bayesian filtering and smoothing algorithms, our algorithms are nearly as simple to implement as the sliding window-based spectrograms. To further demonstrate the performance of our algorithms, we apply them to synthetic as well as real data, including human EEG recordings during sleep and electric network frequency data from audio recordings. Application of our proposed estimators to these data provides spectrotemporal features that are significantly denoised, smooth in time, and of high spectral resolution, thereby corroborating our theoretical results.

The rest of the paper is organized as follows: In Section II, we present the preliminaries and problem formulation. In Section III, we develop our proposed estimators. Applications of our estimators to synthetic and real data are given in Section IV, followed by our theoretical analysis in Section V. Finally, our concluding remarks are presented in Section VI.

II. PRELIMINARIES AND PROBLEM FORMULATION

A. Non-stationary Processes and Time-Varying Spectrum

Consider a finite realization of T samples from a discrete-time non-stationary process y_t, t = 1, 2, \cdots, T, obtained via sampling a continuous-time signal above the Nyquist rate. We assume that the non-stationary process y_t is harmonizable, so that it admits a Cramér representation [31] of the form:

y_t = \int_{-1/2}^{1/2} e^{i 2\pi f t} \, dz(f),   (1)

where dz(f) is the generalized Fourier transform of the process. This process has a covariance function of the form:

\Gamma_L(t_1, t_2) := \mathbb{E}[y_{t_1} y^*_{t_2}] = \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} e^{i 2\pi (t_1 f_1 - t_2 f_2)} \gamma_L(f_1, f_2) \, df_1 \, df_2,   (2)

where \gamma_L(f_1, f_2) := \mathbb{E}[dz(f_1) dz^*(f_2)] is referred to as the generalized spectral density or the Loève spectrum [32]. Due to the difficulty in extracting physically-plausible spectrotemporal information from the two-dimensional function \gamma_L(f_1, f_2), other forms of spectrotemporal characterization that are two-dimensional functions over time and frequency have gained popularity [17]. To this end, by defining the coordinate rotations t := (t_1 + t_2)/2, \tau := t_1 - t_2, f := (f_1 + f_2)/2, and g := f_1 - f_2, and by substituting in the definition of the covariance function in (2), we obtain:

\Gamma(\tau, t) := \Gamma_L(t_1, t_2) = \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} e^{i 2\pi (t g + \tau f)} \gamma(g, f) \, df \, dg,

where f and g are referred to as the ordinary and non-stationary frequencies, respectively [32], and \gamma(g, f) \, df \, dg := \gamma_L(f_1, f_2) \, df_1 \, df_2 is the Loève spectrum in the rotated coordinates. To obtain one such two-dimensional spectral density representation over time and frequency, we define:

D(t, f) := \int_{-1/2}^{1/2} e^{i 2\pi t g} \gamma(g, f) \, dg = \int e^{-i 2\pi \tau f} \, \mathbb{E}[y_{t+\tau/2} \, y^*_{t-\tau/2}] \, d\tau,   (3)

which coincides with the expected value of the Wigner-Ville distribution [11]. The 'time-varying' spectral representation D(t, f) captures the spectral information of the data as a function of time, and thus provides a useful framework for analyzing non-stationary time series. However, estimating D(t, f) from finite samples of the process is challenging, considering that the expectation needs to be replaced by time averages, which may smooth out the time-varying features of the signal [12].

In order to address this challenge, certain additional assumptions need to be imposed on the underlying process, which in effect restrict the extent of temporal or spectral variations the signal can exhibit. In this regard, two such popular assumptions are posed in terms of the quasi-stationary and underspread properties. In order to define these properties quantitatively, let us first define the Expected Ambiguity Function (EAF) [14], [16] as:

\Delta(\tau, g) := \int_{-1/2}^{1/2} e^{i 2\pi \tau f} \gamma(g, f) \, df = \int e^{-i 2\pi t g} \, \mathbb{E}[y_{t+\tau/2} \, y^*_{t-\tau/2}] \, dt.   (4)

A signal whose EAF has a limited spread along g (i.e., negligible spectral correlation) is called quasi-stationary. If the signal is concentrated around the origin with respect to both \tau and g (i.e., small spectral and temporal correlation), it is called underspread. There exists a large body of work on estimating time-varying spectra under the quasi-stationarity assumption, such as the short-time Fourier transform, pseudo-Wigner estimators [12], and estimates of the evolutionary spectra [13]. More recent methods, such as the Generalized Evolutionary Spectra (GES) estimators [14] and time-frequency autoregressive moving-average (TFARMA) [24] estimators, rely on the underspread property of the underlying signals.
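The sensitivity of sliding window spectrograms to the choice of window length, discussed in the Introduction, can be seen in a few lines. Below is a minimal sketch using SciPy's `spectrogram`; the chirp test signal, the sampling rate, and the two window lengths are our own illustrative choices, not parameters from this paper:

```python
import numpy as np
from scipy import signal

fs = 1000.0
t = np.arange(0, 4.0, 1 / fs)
# Linear chirp sweeping 50 -> 200 Hz: a simple non-stationary test signal.
y = signal.chirp(t, f0=50, t1=4.0, f1=200)

# Two subjective window lengths give markedly different spectrograms:
# short windows track the sweep but have coarse frequency bins (large
# Rayleigh resolution fs / nperseg); long windows smear the dynamics.
spectrograms = {}
for nperseg in (64, 512):
    f, tt, Sxx = signal.spectrogram(y, fs=fs, nperseg=nperseg,
                                    noverlap=nperseg // 2)
    spectrograms[nperseg] = Sxx
    print(nperseg, "freq bin width:", fs / nperseg, "Hz, shape:", Sxx.shape)
```

The 64-sample windows yield roughly 16 Hz frequency bins at fine time resolution, while the 512-sample windows yield roughly 2 Hz bins at much coarser time resolution, illustrating the window-length trade-off.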

We refer the reader to [33, Chapter 10] for a detailed discussion of these properties.

It is noteworthy that both assumptions are fairly general, encompass a broad range of naturally-occurring processes, and have resulted in successful applications to real-life problems. We will next discuss one of the widely used methods for estimating time-varying spectra under the quasi-stationarity assumption, which extends MT spectral analysis beyond second-order stationary processes and has gained popularity in exploratory studies of naturally occurring processes [32].

B. The Sliding Window MT Spectral Analysis

One of the popular non-parametric techniques for estimating the 'time-varying' spectral representation D(t, f) proceeds by subdividing the data into overlapping windows or segments and estimating the spectrum for each window independently using the MT method [32]. This method naturally and intuitively extends the popular non-parametric MT method to the non-stationary scenario under the assumption of quasi-stationarity, which enables one to treat the time series within segments (locally) as approximately second-order stationary [34]. The resulting spectrotemporal representations are smoothed versions of the Wigner-Ville distribution [16] and are referred to as spectrograms. In what follows, we briefly describe the MT spectrogram method, since it preludes the rest of our treatment.

The MT method is an extension of single-taper spectral analysis, where the data is element-wise multiplied by a taper prior to forming the spectral representation, in order to mitigate spectral leakage [8], [9]. In the MT method, the spectral representation is computed as the average of several such single-taper PSDs, where the tapers are orthogonal to each other and exhibit good leakage properties. This can be achieved by using the discrete prolate spheroidal sequences (dpss), or Slepian sequences [35], due to their orthogonality and optimal leakage properties.

Another viewpoint of the MT method with this particular choice of data tapers is the decomposition of the spectral representation of the process over a set of orthogonal basis functions. Indeed, these basis functions originate from an approximate solution to the integral equation expressing the projection of dz(f) onto the Fourier transform of the data:

y(f) = \int_{-1/2}^{1/2} \frac{\sin W\pi(f - \zeta)}{\sin \pi(f - \zeta)} \, e^{-i 2\pi (f - \zeta) \frac{W-1}{2}} \, dz(\zeta),

where W is the window length, i.e., the number of samples, and dz(\zeta) is an orthogonal increment process. This integral equation can be approximated using a local expansion of the increment process over an interval [-B, B], for some small design bandwidth B, in the space spanned by the eigenfunctions of the Dirichlet kernel \frac{\sin W\pi f}{\sin \pi f} [8], [9]. These eigenfunctions are known as the prolate spheroidal wave functions (PSWFs), which are a set of doubly-orthogonal functions over [-B, B] and [-1/2, 1/2], with time-domain representations given by the dpss sequences. Let u_l^{(k)} be the lth sample of the kth dpss sequence, for a given bandwidth B and window length W. The kth PSWF is then defined as:

U^{(k)}(f) := \sum_{l=0}^{W-1} u_l^{(k)} e^{-i 2\pi f l}.

Choosing K \leq \lfloor 2WB \rfloor - 1 dpss sequences having eigenvalues close to 1 as data tapers, the MT spectral estimate can be calculated as follows:

\widehat{S}^{(mt)}(f) := \frac{1}{K} \sum_{k=1}^{K} |\widehat{x}^{(k)}(f)|^2,   (5)

where \widehat{x}^{(k)}(f) := \sum_{l=0}^{W-1} e^{-i 2\pi f l} u_l^{(k)} y_l for k = 1, 2, \cdots, K are called the 'eigen-coefficients'. The 'eigen-spectra' \widehat{S}^{(k)}(f) := |\widehat{x}^{(k)}(f)|^2 can be viewed as the expansion coefficients of the decomposition.

To estimate time-varying spectra under the MT framework, sliding windows with overlap are used to enforce temporal smoothness and increase robustness of the estimates [32], [36], resulting in MT spectrogram estimates. Although this 'overlapping' MT procedure overcomes frequency leakage issues and produces consistent estimates, subjective choices of the window length and the degree of overlap can change the overall appearance of the spectrogram drastically when these parameters are poorly chosen. In addition, these estimates lack precise statistical inference procedures, such as hypothesis testing, due to the statistical dependence induced by the overlaps. The objective of this work is to overcome these limitations by directly modeling and estimating the evolution of the process without committing to overlapping sliding windows, while achieving fast and efficient implementations. Before presenting our proposed solutions, we give a brief overview of other existing approaches in the literature in order to put our contributions in context.

C. Motivation and Connection to Existing Literature

Our goal is to overcome the foregoing challenges faced by sliding window MT spectrogram analysis in estimating the expected value of the Wigner-Ville distribution (see Eq. (3)) under the quasi-stationarity assumption. In addition, our work can be viewed in the context of the spectrogram approximation to the evolutionary spectra [13]. The evolutionary spectra is obtained by considering an expansion of y_t over the set of complex sinusoids as

y_t = \int_{-1/2}^{1/2} x_t(f) \, e^{j 2\pi f t} \, df,   (6)

with uncorrelated time-varying expansion coefficients, i.e., \mathbb{E}[x_t(f_1) x_t^*(f_2)] = D(t, f_1) \delta(f_1 - f_2). In standard MT spectrogram analysis, the extent of overlap between consecutive segments dictates the amount of temporal smoothness in the estimates. Our approach is to avoid the usage of overlapping windows by modeling and estimating the dependence of the spectra across windows using state-space models, while retaining the favorable leakage properties of MT analysis. As will be revealed in the subsequent sections, in the same vein as the sliding window multitaper analysis, our methods pertain to the class of spectrogram estimates, which are viewed as smoothed versions of the Wigner-Ville spectrum [16].

Due to the underlying quasi-stationarity assumption, i.e., negligible spectral correlation, the domain of applicability of our methods might be narrower than that of the more general non-stationary spectral analysis methods such as GES and Weyl spectral estimation and TFARMA modeling; however, our methods admit simple and efficient implementations, which makes them attractive for exploratory applications in which sliding window processing is widely used with subjective and ad hoc choices of design parameters. In this context, the novelty of our contributions lies in:

1) Capturing the evolution of the spectra across windows by modeling the dynamics of certain eigen-spectral quantities arising in MT analysis (e.g., spectral eigen-coefficients and eigen-spectra);
2) Addressing the additive measurement noise and multiplicative sampling noise, which severely distort the spectrograms obtained by the multitaper framework; and
3) Constructing a framework for precise statistical assessment of the estimates, by addressing the dependency among windows using a Bayesian formulation.

As will become evident in Section V, the use of state-space models in the context of MT analysis results in adaptive weighting of the estimates of the eigen-coefficients or eigen-spectra across windows, thanks to the optimal data combining feature of Bayesian smoothing. These adaptive weights depend on the common dynamic trends shared across windows and hence result in capturing the degree of smoothness inherent in the signal, while producing estimates robust against uncertainties due to observation noise and limited data.

It is noteworthy that the use of state-space models in our work is significantly different from those used in parametric non-stationary spectral analysis methods such as TFARMA modeling [24]. In the TFARMA formulation, time delays and frequency shifts are used to model the non-stationary dynamics of the process in a physically intuitive way. These state-space models therefore determine the functional form of the resulting spectral estimates in closed form in terms of the finite set of ARMA coefficients. The state-space models used in our work, however, do not determine the functional form of the spectral estimates at each window; rather, they control the temporal smoothness of the eigen-spectral quantities by forming a regularization mechanism in the underlying Bayesian estimation framework (see Section II-E). In our approach, we indeed estimate the spectrogram at a given number of frequency bins in each window, which scales with the total number of samples, in the same vein as the sliding window MT spectrogram.

In light of the above, our algorithms belong to the class of semi-parametric estimation methods, as the underlying model is a hybrid of a parametric unobservable state evolution process and a non-parametric data generating process [37]. In Section IV-A, we will compare our proposed semi-parametric methodology with both non-parametric and parametric techniques, namely MT spectrogram analysis and the Time-Frequency Autoregressive (TFAR) modeling technique [24].

D. Problem Formulation

Assume, without loss of generality, that an arbitrary window of length W is chosen so that NW = T for some integer N, and let y_n := [y_{(n-1)W+1}, y_{(n-1)W+2}, \cdots, y_{nW}]^\top for n = 1, 2, \cdots, N denote the data in the nth window. This way, the entire data is divided into N non-overlapping segments of length W each. We then invoke the quasi-stationarity assumption by modeling y_t as stationary within each of these segments of length W. With this assumption, and motivated by the major sources of uncertainty in spectral estimation, i.e., measurement noise and sampling noise, we formulate two state-space frameworks in the following subsections.

1) Mitigating the Measurement Noise: Suppose that \tilde{y}_t is the noise-corrupted observation obtained from the true signal y_t, i.e., \tilde{y}_t = y_t + v_t, where (v_t)_{t=1}^{T} is an i.i.d. zero-mean Gaussian noise sequence with fixed variance \sigma^2. By discretizing the representation in (1) at a frequency spacing of 2\pi/J, with J an integer, at any arbitrary window n we have

\tilde{y}_n = F_n x_n + v_n,   (7)

where F_n is a matrix with elements (F_n)_{l,j} := \exp\big(i 2\pi ((n-1)W + l) \frac{j-1}{J}\big) for l = 1, 2, \cdots, W and j = 1, 2, \cdots, J; \tilde{y}_n := [\tilde{y}_{(n-1)W+1}, \tilde{y}_{(n-1)W+2}, \cdots, \tilde{y}_{nW}]^\top is the noisy observation of the true signal y_n; x_n(f) and x_n := [x_n(0), x_n(2\pi \frac{1}{J}), \cdots, x_n(2\pi \frac{J-1}{J})]^\top denote the orthogonal increment process and its discretized version, respectively, at window n; and v_n := [v_{(n-1)W+1}, v_{(n-1)W+2}, \cdots, v_{nW}]^\top is zero-mean Gaussian noise with covariance \mathrm{Cov}\{v_i, v_j\} = \sigma^2 I \delta_{i,j}.

Let u^{(k)} := [u_1^{(k)}, u_2^{(k)}, \cdots, u_W^{(k)}]^\top denote the kth dpss taper and \tilde{y}_n^{(k)} := u^{(k)} \circ \tilde{y}_n, where \circ denotes element-wise multiplication. Let x_n^{(k)}(f) and x_n^{(k)} := [x_n^{(k)}(0), x_n^{(k)}(2\pi \frac{1}{J}), \cdots, x_n^{(k)}(2\pi \frac{J-1}{J})]^\top denote the kth spectral eigen-coefficient of y_n and its discretized version, respectively, for k = 1, 2, \cdots, K. Then, following (7), we consider the following spectrotemporal representation of the tapered data segments:

\tilde{y}_n^{(k)} = F_n x_n^{(k)} + v_n^{(k)},   (8)

where v_n^{(k)} is the contribution of v_n to the kth tapered data, assumed to be independent of x_{1:n-1}^{(k)} and identically distributed according to a zero-mean Gaussian distribution with covariance \mathrm{Cov}\{v_i^{(k)}, v_j^{(k)}\} = \sigma^{(k)2} I \delta_{i,j}. We view \tilde{y}_n^{(k)} as a noisy observation corresponding to the true eigen-coefficient x_n^{(k)}, which provides a linear Gaussian forward model for the observation process.

In order to capture the evolution of the spectrum, and hence systematically enforce temporal smoothness, we impose a stochastic continuity constraint on the eigen-coefficients (x_n^{(k)})_{n=1}^{N} for k = 1, 2, \cdots, K, using a first-order difference equation:

x_n^{(k)} = \alpha^{(k)} x_{n-1}^{(k)} + w_n^{(k)},   (9)

where 0 \leq \alpha^{(k)} < 1, and w_n^{(k)} is independent of x_{1:n-1}^{(k)} and assumed to be independently distributed according to a zero-mean Gaussian distribution with diagonal covariance \mathrm{Cov}\{w_i^{(k)}, w_j^{(k)}\} = Q_i^{(k)} \delta_{i,j}. Under this assumption, the discrete-time process (x_n^{(k)})_{n=1}^{N} forms a jointly Gaussian random process with independent increments, while the process itself is statistically dependent. An estimate of the unobserved states (true eigen-coefficients) from the observations (tapered data) under this model suppresses the measurement noise and captures the state dynamics.

2) Mitigating the Sampling Noise: Suppose the additive measurement noise is negligible, i.e., v_t \approx 0. For now, consider only a single window of length W.
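The stationary MT estimate of Eq. (5) can be sketched in a few lines using SciPy's dpss routine. The following is a minimal illustration, not the paper's implementation; the test signal, NW = 4, K = 7 and the FFT grid size J = 1024 are our own illustrative choices:

```python
import numpy as np
from scipy.signal import windows

def mt_spectrum(y, NW=4, K=7, J=1024):
    """Multitaper estimate (Eq. (5)): average of K tapered periodograms."""
    tapers = windows.dpss(len(y), NW, Kmax=K)     # K dpss tapers, shape (K, W)
    # Eigen-coefficients x^(k)(f): FFT of the tapered data on a J-point grid.
    x = np.fft.rfft(tapers * y[None, :], n=J, axis=1)
    # Eigen-spectra |x^(k)(f)|^2, averaged over the K tapers.
    return np.mean(np.abs(x) ** 2, axis=0)

rng = np.random.default_rng(0)
W = 512
# Sinusoid at 0.1 cycles/sample buried in white noise.
y = np.sin(2 * np.pi * 0.1 * np.arange(W)) + 0.5 * rng.standard_normal(W)
S = mt_spectrum(y)
print(S.shape, np.argmax(S))   # peak near frequency bin 0.1 * J = 102
```

With K ≤ ⌊2WB⌋ − 1 = 2NW − 1 = 7, all tapers have eigenvalues close to 1, matching the taper-selection rule stated above.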
It is known that when the spectrum does not vary rapidly over the chosen design bandwidth B, the eigen-spectra are approximately uncorrelated and the following approximation holds for k = 1, 2, \cdots, K [9], [32]:

\frac{\widehat{S}^{(k)}(f)}{S(f)} \sim \frac{\chi_2^2}{2}, \quad 0 < f < 1/2,   (10)

where \widehat{S}^{(k)}(f) and S(f) are the tapered estimate and the true PSD, respectively. In other words, the empirical eigen-spectra of the process can be thought of as the true spectra corrupted by a multiplicative noise, due to sampling and having access to only a single realization of the process. We refer to this uncertainty induced by sampling as sampling noise. By defining \psi^{(k)}(f) := \log \widehat{S}^{(k)}(f) + \log 2 and s^{(k)}(f) := \log S(f), we can transform the multiplicative effect of the sampling noise in Eq. (10) to the following additive forward model [38]:

\psi^{(k)}(f) = s^{(k)}(f) + \phi^{(k)}(f),   (11)

where \phi^{(k)}(f) is a log-chi-square distributed random variable, capturing the uncertainty due to sampling noise. It can be shown that \phi^{(k)}(f) has a density given by:

p(\phi) = \frac{1}{2} \exp\Big(\phi - \frac{1}{2} \exp(\phi)\Big),   (12)

which belongs to the family of log-Gamma distributions, including the Gumbel and Bramwell-Holdsworth-Pinton distributions common in extreme value statistics [39].

In order to incorporate this observation model into our dynamic framework, we define the state vector s_n^{(k)} := [s_n^{(k)}(0), s_n^{(k)}(2\pi \frac{1}{J}), \cdots, s_n^{(k)}(2\pi \frac{J-1}{J})]^\top, the observation vector \psi_n^{(k)} := [\psi_n^{(k)}(0), \psi_n^{(k)}(2\pi \frac{1}{J}), \cdots, \psi_n^{(k)}(2\pi \frac{J-1}{J})]^\top, and the observation noise vector \phi_n^{(k)} := [\phi_n^{(k)}(0), \phi_n^{(k)}(2\pi \frac{1}{J}), \cdots, \phi_n^{(k)}(2\pi \frac{J-1}{J})]^\top. Then, the forward model at window n can be stated as:

\psi_n^{(k)} = s_n^{(k)} + \phi_n^{(k)},   (13)

where each element of \phi_n^{(k)} is log-chi-square distributed. Similar to the preceding model, we impose a stochastic continuity constraint over the logarithm of the eigen-spectra as follows:

s_n^{(k)} = \theta^{(k)} s_{n-1}^{(k)} + e_n^{(k)},   (14)

where 0 \leq \theta^{(k)} < 1, and e_n^{(k)} is assumed to be a zero-mean Gaussian vector independent of s_{1:n-1}^{(k)} and with diagonal covariance \mathrm{Cov}\{e_i^{(k)}, e_j^{(k)}\} = R_i^{(k)} \delta_{i,j}. Note that the logarithm maps the range of the eigen-spectra from [0, \infty) to (-\infty, \infty), which makes the Gaussian state evolution plausible. An estimate of the unobserved states (logarithm of the true spectra) from the observations (logarithm of the empirical eigen-spectra) under this model suppresses the sampling noise and captures the state dynamics.

In summary, through these models we project the data of each short window onto the functional space spanned by the PSWFs and impose stochastic continuity constraints ((9) and (14)) on these projections (eigen-coefficients or eigen-spectra) in order to recover spectral representations that are smooth in time and robust against measurement or sampling noise. Fig. 1 provides a visual illustration of the proposed modeling paradigm.

[Fig. 1: Schematic depiction of the proposed models. The data in each window (observation space) is tapered by the dpss sequences and mapped to the eigen-coefficients/eigen-spectra (state space), whose evolution across windows is governed by the state dynamics.]

E. The Inverse Problem

We formulate the spectral estimation problem as one of Bayesian estimation, in which the Bayesian risk/loss function, fully determined by the posterior density of (x_n^{(k)})_{n=1,k=1}^{N,K} (resp. (s_n^{(k)})_{n=1,k=1}^{N,K}) given the observations (\tilde{y}_n^{(k)})_{n=1,k=1}^{N,K} (resp. (\psi_n^{(k)})_{n=1,k=1}^{N,K}), is minimized. We first consider the forward model of (8), which provides the observation likelihood given the states. Under the state-space model of Eq. (9), the kth eigen-coefficient can be estimated by solving the following maximum a posteriori (MAP) problem:

\min_{x_1^{(k)}, x_2^{(k)}, \cdots, x_N^{(k)}} \sum_{n=1}^{N} \Big\{ \frac{1}{\sigma^2} \big\|\tilde{y}_n^{(k)} - F_n x_n^{(k)}\big\|_2^2 + (x_n^{(k)} - \alpha x_{n-1}^{(k)})^H Q_n^{(k)-1} (x_n^{(k)} - \alpha x_{n-1}^{(k)}) \Big\},   (15)

for k = 1, 2, \cdots, K. Similarly, in the second state-space framework (13), the eigen-spectra can be obtained by solving another MAP estimation problem:

\min_{s_1^{(k)}, s_2^{(k)}, \cdots, s_N^{(k)}} \sum_{n=1}^{N} \Big\{ \frac{1}{2} \mathbf{1}_J^\top \big[s_n^{(k)} - \psi_n^{(k)} + \exp(\psi_n^{(k)} - s_n^{(k)})\big] + (s_n^{(k)} - \theta s_{n-1}^{(k)})^H R_n^{(k)-1} (s_n^{(k)} - \theta s_{n-1}^{(k)}) \Big\},   (16)

for k = 1, 2, \cdots, K, where \mathbf{1}_J is the all-ones vector of length J. We call the MAP estimation problems in (15) and (16) the Dynamic Bayesian Multitaper (DBMT) and the log-DBMT estimation problems, respectively. Similarly, the respective spectrogram estimates will be denoted by the DBMT and log-DBMT estimates.
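The sampling-noise model of Eqs. (10)–(12) can be checked by simulation: since \widehat{S}^{(k)}/S \sim \chi_2^2/2, the residual \phi = \psi - s equals \log(\chi_2^2), whose density is exactly Eq. (12). A brief Monte Carlo sketch (the sample size and seed are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
# Eq. (10): S_hat/S ~ chi2_2 / 2, so with psi = log S_hat + log 2 and
# s = log S, the additive noise is phi = psi - s = log(chi2_2).
phi = np.log(rng.chisquare(df=2, size=200_000))

def p(phi):
    # Closed-form log-chi-square density of Eq. (12).
    return 0.5 * np.exp(phi - 0.5 * np.exp(phi))

# Check 1: E[exp(phi)] = E[chi2_2] = 2.
print(np.exp(phi).mean())   # ~ 2
# Check 2: the empirical histogram of phi matches the density (12).
hist, edges = np.histogram(phi, bins=60, range=(-6, 3), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - p(centers))))   # small
```

The skewed shape of this density is why the log-domain model treats the sampling noise as additive but non-Gaussian, motivating the non-quadratic likelihood term in (16).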

Equation (15) is a strictly convex function of $x_n^{(k)} \in \mathbb{C}^W$ and $Q_n^{(k)} \in \mathbb{S}_{++}^W$ for $n = 1, 2, \cdots, N$, and can be solved using standard optimization techniques. However, these techniques do not scale well with the data length $N$. A careful examination of the log-posterior reveals a block tri-diagonal structure of the Hessian, which can be used to develop efficient recursive solutions that exploit the temporal structure of the problem. A similar argument holds for the optimization problem in (16). However, the parameters of these state-space models need to be estimated from the data. In the next section, we show how the EM algorithm can be used to estimate both the parameters and the states efficiently from the optimization problems (15) and (16).

III. FAST RECURSIVE SOLUTIONS VIA THE EM ALGORITHM

In order to solve the MAP problem in (15), we need to find the parameters $Q_n^{(k)} \in \mathbb{S}_{++}^W$ and $\alpha^{(k)} \in (0, 1]$ for $n = 1, 2, \cdots, N$ and $k = 1, 2, \cdots, K$. Similarly, $R_n^{(k)} \in \mathbb{S}_{++}^W$ and $\theta^{(k)} \in (0, 1]$ need to be estimated for the problem in (16). If the underlying states were known, one could further maximize the log-posterior with respect to the parameters. This observation can be formalized in the EM framework [27], [28], [40]. To avoid notational complexity, we drop the dependence of the various variables on the taper index $k$ in the rest of this subsection.

A. The DBMT Spectrum Estimation Algorithm

By treating $x_n$, $n = 1, 2, \cdots, N$ as the hidden variables and $\alpha$, $Q_n$, $n = 1, 2, \cdots, N$ as the unknown parameters to be estimated, we can write the complete log-likelihood as:

$$\log L(\alpha, Q_{1:N}) := -\sum_{n=1}^{N}\Big\{ \frac{1}{\sigma^2}\|\widetilde{y}_n - F_n x_n\|_2^2 + \log|\det Q_n| + (x_n - \alpha x_{n-1})^H Q_n^{-1}(x_n - \alpha x_{n-1})\Big\} + c, \quad (17)$$

where $c$ represents the terms that do not depend on $\alpha$, $(Q_n)_{n=1}^{N}$ or $(x_n)_{n=1}^{N}$. For simplicity of exposition, we assume that $Q_n = Q$ for $n = 1, 2, \cdots, N$. The forthcoming treatment can be extended to the general case with little modification. Also, note that $\sigma^2$ can be absorbed into $Q$, and is thus assumed to be known. At the $l$th iteration, we have:

1) E-Step: Given $\alpha^{[l]}$ and $Q^{[l]}$, the expectations $x_{n|N} := \mathbb{E}[x_n \mid \widetilde{y}_{1:N}, \alpha^{[l]}, Q^{[l]}]$, $\Sigma_{n|N} := \mathbb{E}[(x_n - x_{n|N})(x_n - x_{n|N})^H \mid \widetilde{y}_{1:N}, \alpha^{[l]}, Q^{[l]}]$, and $\Sigma_{n,n-1|N} := \mathbb{E}[(x_n - x_{n|N})(x_{n-1} - x_{n-1|N})^H \mid \widetilde{y}_{1:N}, \alpha^{[l]}, Q^{[l]}]$, for $n = 1, 2, \cdots, N$, can be calculated using the Fixed Interval Smoother (FIS) [41] (lines 4 and 5 of Algorithm 1) and the state-space covariance smoothing algorithm [42] (line 6). These expectations can be used to compute the expectation of the complete data log-likelihood $\mathbb{E}[\log L(\alpha, Q) \mid \widetilde{y}_{1:N}, \alpha^{[l]}, Q^{[l]}]$.

2) M-Step: The parameters for the next iteration, $\alpha^{[l+1]}$ and $Q^{[l+1]}$, can be obtained by maximizing the expectation of (17). Although this expectation is convex in $\alpha$ and $Q$ individually, it is not a convex function of both. Hence, we perform cyclic iterative updates for $\alpha^{[l+1]}$ and $Q^{[l+1]}$ given by:

$$\alpha^{[l+1]} = \frac{\sum_{n=2}^{N}\operatorname{Tr}\big(\Sigma_{n,n-1|N}Q^{[l]-1}\big) + x_{n-1|N}^H Q^{[l]-1} x_{n|N}}{\sum_{n=2}^{N}\operatorname{Tr}\big(\Sigma_{n-1|N}Q^{[l]-1}\big) + x_{n-1|N}^H Q^{[l]-1} x_{n-1|N}} \quad (18)$$

and

$$Q^{[l+1]} = \frac{1}{N}\sum_{n=1}^{N}\Big[x_{n|N}x_{n|N}^H + \Sigma_{n|N} + \alpha^{[l+1]2}\big(x_{n-1|N}x_{n-1|N}^H + \Sigma_{n-1|N}\big) - \alpha^{[l+1]}\big(x_{n-1|N}x_{n|N}^H + x_{n|N}x_{n-1|N}^H + 2\Sigma_{n,n-1|N}\big)\Big]. \quad (19)$$

These iterations can be performed until convergence to a possibly local maximum. Moreover, with even one such update, the overall algorithm forms a majorization-minimization (MM) procedure, which generalizes the EM procedure and enjoys similar convergence properties [43]. One possible implementation of this iterative procedure is described in Algorithm 1.

Algorithm 1: The DBMT Estimate of the $k$th Eigen-coefficient

1: Initialize: observations $\widetilde{y}_{1:N}$; initial guess $x_{1:N}^{(0)}$; initial guess $Q^{[0]}$; initial condition $\Sigma_{0|0}$; tolerance $\mathrm{tol} \in (0, 10^{-3})$; maximum number of iterations $L_{\max} \in \mathbb{N}^+$.
2: $l = 0$.
3: repeat
4: Forward filter, for $n = 1, 2, \cdots, N$:
   $x_{n|n-1} = \alpha^{[l]} x_{n-1|n-1}$
   $\Sigma_{n|n-1} = \alpha^{[l]2}\Sigma_{n-1|n-1} + Q^{[l]}$
   $K_n = \Sigma_{n|n-1}F_n^H\big(F_n\Sigma_{n|n-1}F_n^H + \sigma^2 I\big)^{-1}$
   $x_{n|n} = x_{n|n-1} + K_n\big(\widetilde{y}_n - F_n x_{n|n-1}\big)$
   $\Sigma_{n|n} = \Sigma_{n|n-1} - K_n F_n \Sigma_{n|n-1}$
5: Backward smoother, for $n = N-1, N-2, \cdots, 1$:
   $B_n = \alpha^{[l]}\Sigma_{n|n}\Sigma_{n+1|n}^{-1}$
   $x_{n|N} = x_{n|n} + B_n\big(x_{n+1|N} - x_{n+1|n}\big)$
   $\Sigma_{n|N} = \Sigma_{n|n} + B_n\big(\Sigma_{n+1|N} - \Sigma_{n+1|n}\big)B_n^H$
6: Covariance smoothing, for $n = N-1, N-2, \cdots, 1$: $\Sigma_{n,n-1|N} = B_{n-1}\Sigma_{n|N}$
7: Let $\widehat{X}^{[l]} := [x_{1|N}^H, x_{2|N}^H, \cdots, x_{N|N}^H]^H$.
8: Update $\alpha^{[l+1]}$ and $Q^{[l+1]}$ according to (18) and (19).
9: Set $l \leftarrow l + 1$.
10: until $\|\widehat{X}^{[l]} - \widehat{X}^{[l-1]}\|_2 / \|\widehat{X}^{[l]}\|_2 < \mathrm{tol}$ or $l = L_{\max}$.
11: Output: denoised eigen-coefficients $\widehat{X}^{[L]}$, where $L$ is the index of the last iteration of the algorithm, and the error covariance matrices $\Sigma_{n|N}$, $n = 1, 2, \cdots, N$, from the last iteration.

Once the DBMT estimates $\widehat{x}_n^{(k)}$ of all the $K$ eigen-coefficients are obtained, for $n = 1, 2, \cdots, N$ and $k = 1, 2, \cdots, K$, the DBMT spectrum estimate is constructed similar to (5):

$$\widehat{D}_n(f_j) = \frac{1}{K}\sum_{k=1}^{K}\Big|\big(\widehat{x}_n^{(k)}\big)_j\Big|^2, \quad (20)$$

where $f_j := \frac{2\pi(j-1)}{J}$ for $j = 1, 2, \cdots, J$ and $n = 1, 2, \cdots, N$. Confidence intervals can be computed by mapping the Gaussian confidence intervals for the $\widehat{x}_n^{(k)}$'s to the final DBMT estimate.
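For illustration, a minimal scalar-state sketch of Algorithm 1 (forward filter, fixed-interval smoother, covariance smoothing, and the cyclic updates (18)-(19)) is given below. The real-valued scalar model, the simulated data, and all variable names are illustrative assumptions for this sketch; the estimator in the paper operates on $W$-dimensional complex eigen-coefficients, and the authors' MATLAB implementation is referenced in [45].

```python
import numpy as np

def dbmt_em(y, sigma2, alpha=0.5, q=1.0, n_iter=20):
    """Scalar-state sketch of Algorithm 1: x_n = alpha*x_{n-1} + v_n,
    y_n = x_n + w_n, with Var(v_n) = q, Var(w_n) = sigma2 (F_n = 1)."""
    N = len(y)
    for _ in range(n_iter):
        # --- E-step: forward Kalman filter (line 4) ---
        xf = np.zeros(N); Pf = np.zeros(N)   # x_{n|n}, Sigma_{n|n}
        xp = np.zeros(N); Pp = np.zeros(N)   # x_{n|n-1}, Sigma_{n|n-1}
        x_prev, P_prev = 0.0, q              # illustrative initial conditions
        for n in range(N):
            xp[n] = alpha * x_prev
            Pp[n] = alpha**2 * P_prev + q
            K = Pp[n] / (Pp[n] + sigma2)     # Kalman gain
            xf[n] = xp[n] + K * (y[n] - xp[n])
            Pf[n] = (1 - K) * Pp[n]
            x_prev, P_prev = xf[n], Pf[n]
        # --- E-step: backward fixed-interval smoother (lines 5-6) ---
        xs = xf.copy(); Ps = Pf.copy()       # x_{n|N}, Sigma_{n|N}
        Pcs = np.zeros(N)                    # Sigma_{n,n-1|N}
        for n in range(N - 2, -1, -1):
            B = alpha * Pf[n] / Pp[n + 1]
            xs[n] = xf[n] + B * (xs[n + 1] - xp[n + 1])
            Ps[n] = Pf[n] + B**2 * (Ps[n + 1] - Pp[n + 1])
            Pcs[n + 1] = B * Ps[n + 1]       # covariance smoothing (line 6)
        # --- M-step: scalar versions of (18) and (19), sums over n >= 2 ---
        num = np.sum(Pcs[1:] + xs[:-1] * xs[1:])
        den = np.sum(Ps[:-1] + xs[:-1] ** 2)
        alpha = num / den
        q = np.mean(xs[1:]**2 + Ps[1:]
                    + alpha**2 * (xs[:-1]**2 + Ps[:-1])
                    - 2 * alpha * (xs[:-1] * xs[1:] + Pcs[1:]))
    return alpha, q

# simulate an AR(1) state observed in white noise, then run the EM sketch
rng = np.random.default_rng(0)
N, alpha_true, q_true, sigma2 = 400, 0.95, 0.5, 1.0
x = np.zeros(N)
for n in range(1, N):
    x[n] = alpha_true * x[n - 1] + rng.normal(scale=np.sqrt(q_true))
y = x + rng.normal(scale=np.sqrt(sigma2), size=N)

alpha_hat, q_hat = dbmt_em(y, sigma2)
print(alpha_hat, q_hat)  # alpha_hat should move from 0.5 toward alpha_true
```

Starting from a deliberately poor guess $\alpha = 0.5$, the update (18) pulls the estimate toward the lag-one dependence of the smoothed states, illustrating how the E- and M-steps interact.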

B. The log-DBMT Spectrum Estimation Algorithm

We utilize a similar iterative procedure based on the EM algorithm to find the log-DBMT spectrum estimate. As before, we treat $s_n$, $n = 1, 2, \cdots, N$ as hidden variables and $\theta$, $R_n$, $n = 1, 2, \cdots, N$ as the unknown parameters to be estimated. In order to give more flexibility to the observation model, we consider the observation noise to be distributed as log-chi-square with $2\nu$ degrees of freedom, for some positive integer $\nu$ to be estimated. The density of each element of $\phi_n^{(k)}$ is then given by:

$$p(\phi) = \frac{1}{2^{\nu}\Gamma(\nu)}\exp\Big(\nu\phi - \frac{1}{2}\exp(\phi)\Big). \quad (21)$$

We can express the complete data log-likelihood as:

$$\log L(\nu, \theta, R_{1:N}) := -\sum_{n=1}^{N}\Big\{\mathbf{1}_J^\top\Big[\nu(s_n - \psi_n) + \frac{1}{2}\exp(\psi_n - s_n)\Big] + J\big(\nu\log 2 + \log\Gamma(\nu)\big) + (s_n - \theta s_{n-1})^H R_n^{-1}(s_n - \theta s_{n-1}) + \log|\det R_n|\Big\} + c, \quad (22)$$

where $c$ represents the terms that do not depend on $\nu$, $\theta$, $(R_n)_{n=1}^{N}$ or $(s_n)_{n=1}^{N}$. Again assuming $R_n = R$ for all $n = 1, 2, \cdots, N$ for simplicity, the following EM algorithm can be constructed:

1) E-Step: Computation of the conditional expectation of the log-likelihood in (22) requires evaluating $\mathbb{E}[s_n \mid \psi_{1:N}, R^{[l]}, \theta^{[l]}, \nu^{[l]}]$ and $\mathbb{E}[\exp(-s_n) \mid \psi_{1:N}, R^{[l]}, \theta^{[l]}, \nu^{[l]}]$ for $n = 1, 2, \cdots, N$. Unlike the DBMT estimation problem, the forward model in this case is non-Gaussian, and hence we cannot apply the Kalman filter and FIS to find the state expectations. To compute the conditional expectation, the distribution of $s_n \mid \psi_{1:n}, R^{[l]}, \theta^{[l]}, \nu^{[l]}$ or samples from it are required [29]. Computing this distribution involves intractable integrals, and sampling from it using numerical methods such as Metropolis-Hastings is not computationally efficient, especially for long data, given that the sampling has to be carried out at every iteration. Since the posterior distribution is unimodal and a deviation from the Gaussian posterior, we approximate the distribution of $s_n \mid \psi_{1:n}, R^{[l]}, \theta^{[l]}, \nu^{[l]}$ by a Gaussian distribution, matching its mean and covariance matrix to those of the log-posterior in Eq. (22). To this end, the mean is approximated by the mode of $f_{s_n \mid \psi_{1:n}, R^{[l]}, \theta^{[l]}, \nu^{[l]}}$, and the covariance is set to the inverse of the negative Hessian of the log-likelihood in (22) [28], [44]. Under this approximation, computing $\mathbb{E}[\exp(-s_n) \mid \psi_{1:n}, R^{[l]}, \theta^{[l]}, \nu^{[l]}]$ is also facilitated thanks to the closed-form moment generating function of $z \sim \mathcal{N}(\mu, \Sigma)$:

$$\mathbb{E}\big[\exp(a^\top z)\big] = \exp\Big(a^\top\mu + \frac{1}{2}a^\top\Sigma a\Big). \quad (23)$$

Similar to the case of DBMT, we can exploit the block tri-diagonal structure of the Hessian in (16) to carry out the E-step efficiently using forward filtering and backward smoothing.

2) M-Step: Once the conditional expectation of the log-likelihood in (22) given $\psi_{1:N}, R^{[l]}, \theta^{[l]}, \nu^{[l]}$ is available, we can update $R^{[l+1]}$ and $\theta^{[l+1]}$ using closed-form equations similar to (18) and (19). Updating $\nu^{[l+1]}$ by maximizing the conditional expectation of the log-likelihood in (22) with respect to $\nu^{[l+1]}$, however, requires solving the following nonlinear equation:

$$\nu^{[l+1]} = \frac{1 - \log 2 + \frac{1}{JN}\sum_{n=1}^{N}\mathbf{1}_J^\top\big(\psi_n - s_n^{[l]}\big) - z\big(\nu^{[l+1]}\big)}{\frac{2}{JN}\sum_{n=1}^{N}\mathbf{1}_J^\top\exp\big(\psi_n - s_n^{[l]}\big)}, \quad (24)$$

where $z(\cdot)$ is the digamma function. We can use Newton's method to solve this equation up to a given precision. An implementation of the log-DBMT is given by Algorithm 2. Note that, unlike the DBMT algorithm, which pertains to a Gaussian observation model, the forward filtering step that computes $s_{n|n}$ is nonlinear, and standard techniques such as Newton's method can be used to solve for $s_{n|n}$. We use the log-DBMT algorithm to find the $K$ estimates of the true log-spectra and construct the log-DBMT estimate as:

$$\widehat{D}_n(f_j) = \frac{1}{K}\sum_{k=1}^{K}\exp\Big(\big(\widehat{s}_n^{(k)}\big)_j\Big), \quad (25)$$

where $f_j := \frac{2\pi(j-1)}{J}$ for $j = 1, 2, \cdots, J$ and $n = 1, 2, \cdots, N$. Again, confidence intervals can be computed by mapping the Gaussian confidence intervals for the $\widehat{s}_n^{(k)}$'s to the final log-DBMT estimate.

Algorithm 2: The log-DBMT Estimate of the $k$th log-Eigen-spectra

1: Initialize: observations $\psi_{1:N}$; initial guess $s_{1:N}^{[0]}$; initial guess $R^{[0]}$; initial condition $\Omega_{0|0}$; tolerance $\mathrm{tol} \in (0, 10^{-3})$; maximum number of iterations $L_{\max} \in \mathbb{N}^+$.
2: $l = 0$.
3: repeat
4: Forward filter, for $n = 1, 2, \cdots, N$:
   $s_{n|n-1} = \theta^{[l]} s_{n-1|n-1}$
   $\Omega_{n|n-1} = \theta^{[l]2}\Omega_{n-1|n-1} + R^{[l]}$
   $s_{n|n} = s_{n|n-1} + \Omega_{n|n-1}\Big(\frac{1}{2}\exp(\psi_n - s_{n|n}) - \nu^{[l]}\mathbf{1}_J\Big)$ (implicit in $s_{n|n}$; solved, e.g., by Newton's method)
   $\Omega_{n|n} = \Big(\Omega_{n|n-1}^{-1} + \frac{1}{2}\operatorname{diag}\{\exp(\psi_n - s_{n|n})\}\Big)^{-1}$
5: Backward smoother, for $n = N-1, N-2, \cdots, 1$:
   $A_n = \theta^{[l]}\Omega_{n|n}\Omega_{n+1|n}^{-1}$
   $s_{n|N} = s_{n|n} + A_n\big(s_{n+1|N} - s_{n+1|n}\big)$
   $\Omega_{n|N} = \Omega_{n|n} + A_n\big(\Omega_{n+1|N} - \Omega_{n+1|n}\big)A_n^H$
6: Covariance smoothing, for $n = N-1, N-2, \cdots, 1$: $\Omega_{n,n-1|N} = A_{n-1}\Omega_{n|N}$
7: Let $\widehat{S}^{[l]} := [s_{1|N}^H, s_{2|N}^H, \cdots, s_{N|N}^H]^H$.
8: Update $\nu^{[l+1]}$ by solving (24), and update $\theta^{[l+1]}$ and $R^{[l+1]}$ as:

$$\theta^{[l+1]} = \frac{\sum_{n=2}^{N}\operatorname{Tr}\big(\Omega_{n,n-1|N}R^{[l]-1}\big) + s_{n-1|N}^H R^{[l]-1} s_{n|N}}{\sum_{n=2}^{N}\operatorname{Tr}\big(\Omega_{n-1|N}R^{[l]-1}\big) + s_{n-1|N}^H R^{[l]-1} s_{n-1|N}},$$

$$R^{[l+1]} = \frac{1}{N}\sum_{n=1}^{N}\Big[s_{n|N}s_{n|N}^H + \Omega_{n|N} + \theta^{[l+1]2}\big(s_{n-1|N}s_{n-1|N}^H + \Omega_{n-1|N}\big) - \theta^{[l+1]}\big(s_{n-1|N}s_{n|N}^H + s_{n|N}s_{n-1|N}^H + 2\Omega_{n,n-1|N}\big)\Big].$$

9: Set $l \leftarrow l + 1$.
10: until $\|\widehat{S}^{[l]} - \widehat{S}^{[l-1]}\|_2 / \|\widehat{S}^{[l]}\|_2 < \mathrm{tol}$ or $l = L_{\max}$.
11: Output: denoised log-eigen-spectra $\widehat{S}^{[L]}$, where $L$ is the index of the last iteration of the algorithm, and the error covariance matrices $\Omega_{n|N}$, $n = 1, 2, \cdots, N$, from the last iteration.

Fig. 2: Sample from the synthetic data from t = 384 s to t = 396 s (amplitude vs. time; plot omitted).
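As a concrete illustration of the nonlinear forward-filter step in Algorithm 2, the implicit equation for $s_{n|n}$ can be solved with a few Newton iterations. The sketch below treats the scalar case ($J = 1$); the parameter values and function names are illustrative assumptions, not the authors' implementation.

```python
import math

def logdbmt_filter_update(s_pred, omega_pred, psi, nu, n_newton=50, tol=1e-12):
    """Solve s = s_pred + omega_pred*(0.5*exp(psi - s) - nu) for the filtered
    mean s_{n|n} (scalar case), via Newton's method on
    g(s) = s - s_pred - omega_pred*(0.5*exp(psi - s) - nu)."""
    s = s_pred                                # initial guess: predicted state
    for _ in range(n_newton):
        e = 0.5 * math.exp(psi - s)
        g = s - s_pred - omega_pred * (e - nu)
        dg = 1.0 + omega_pred * e             # g'(s) > 0, so g is monotone
        step = g / dg
        s -= step
        if abs(step) < tol:
            break
    # filtered variance: inverse of the negative Hessian of the log-posterior
    omega_filt = 1.0 / (1.0 / omega_pred + 0.5 * math.exp(psi - s))
    return s, omega_filt

s_nn, om_nn = logdbmt_filter_update(s_pred=0.0, omega_pred=0.5, psi=1.0, nu=1.0)
# residual of the implicit equation should be ~0 at the solution
resid = s_nn - 0.0 - 0.5 * (0.5 * math.exp(1.0 - s_nn) - 1.0)
print(s_nn, om_nn, resid)
```

Because $g$ is strictly increasing, the Newton iteration converges from the predicted state, and the filtered variance is always smaller than the predicted one, mirroring the Gaussian case.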

C. Parameter Selection

The window length $W$, design bandwidth $B$, and number of tapers $K$ need to be carefully chosen. Since both proposed algorithms are motivated by the standard overlapping MT method, we use the same guidelines for choosing these parameters [32], [36]. The window length $W$ is determined based on the expected rate of change of the PSD (given domain-specific knowledge) in order to make sure that the quasi-stationarity assumption holds. The design bandwidth $B$ is chosen small enough to resolve the dominant frequency components in the data, while being large enough to keep the time-bandwidth product $\rho := WB > 1$. The number of tapers $K$ is then chosen as $K \leq \lfloor 2\rho \rfloor - 1$ [32].

IV. APPLICATION TO SYNTHETIC AND REAL DATA

Before presenting our theoretical analysis, we examine the performance of the DBMT and log-DBMT spectrogram estimators on synthetic data, and then demonstrate their utility in two real-world data applications, namely spectral analysis of human EEG during sleep and Electric Network Frequency signal detection.

A. Application to Synthetic Data

The synthetic data consist of the linear combination of two amplitude-modulated and frequency-modulated processes with high dynamic range (i.e., high-Q). The amplitude-modulated component $y_t^{(1)}$ is generated by modulating an AR(6) process tuned around 11 Hz with a cosine at the low frequency $f_0 = 0.02$ Hz. The frequency-modulated component $y_t^{(2)}$ is a realization of an ARMA(6, 4) process with varying pole loci. To this end, the process has a pair of 3rd-order poles at $\omega_t := 2\pi f_t$ and $-\omega_t$, where $f_t$ increases from 5 Hz, starting at $t = 0$, every $\sim 26$ s by increments of 0.48 Hz, to achieve frequency modulation. In summary, the noisy observations are given by:

$$y_t = y_t^{(1)}\cos(2\pi f_0 t) + y_t^{(2)} + \sigma v_t, \quad (26)$$

where $v_t$ is a white Gaussian noise process and $\sigma$ is chosen to achieve an SNR of 30 dB. The process is truncated at 600 s for spectrogram analysis. Figure 2 shows a 12-second sample window of the process.

In addition to the standard overlapping MT method, we present a comparison to the TFAR method as an example of parametric state-space modeling approaches to non-stationary spectral analysis. This method is known to be well suited to processes whose time-varying spectra exhibit sharp peaks, i.e., signals consisting of several narrow-band components [24]. The TFAR model is defined by the input-output relation

$$y_t := -\sum_{m=1}^{M_A}\sum_{l=-L_A}^{L_A} a_{m,l}\, e^{i\frac{2\pi l t}{N}}\, y_{t-m} + \sum_{l=-L_B}^{L_B} b_{0,l}\, e^{i\frac{2\pi l t}{N}}\, e_t,$$

where $e_t$ is a stationary white noise process with unit variance, and $(a_{m,l})_{m=1,l=-L_A}^{M_A,L_A}$ and $(b_{0,l})_{l=-L_B}^{L_B}$ are the autoregressive (AR) and zero-delay moving average (MA) parameters, respectively. The integers $M_A$ and $L_A$ are respectively the delay and Doppler model orders of the AR component, and $L_B$ denotes the Doppler model order of the zero-delay MA component. The AR and MA parameters are estimated by solving the time-frequency Yule-Walker equations, from which the evolutionary spectra can be constructed (see the methods described in [24] for more details).

Figure 3 shows the true spectrogram as well as those estimated by the standard overlapping MT, DBMT, log-DBMT and TFAR estimators. Each row consists of three panels: the left panel shows the entire spectrogram; the middle panel shows a zoomed-in spectrotemporal region marked by the dashed box in the left panel; and the right panel shows the PSD, along with its confidence interval (CI) as a gray hull, at a selected time point marked by a dashed vertical line in the middle panel. Note that for the standard MT estimates, the CIs are constructed assuming a $\chi^2_{2K}$ distribution of the estimates around the true values [9], [32], whereas for the DBMT and log-DBMT estimates they are obtained by mapping the Gaussian confidence intervals for the eigen-coefficients or eigen-spectra to the final estimates. We were not able to evaluate the CIs for the TFAR estimates, since to the best of our knowledge no method is available to do so. Figure 3A shows the true spectrogram of the synthetic process, in which the presence of both amplitude and frequency modulations makes spectrogram estimation a challenging problem.

Fig. 3B shows the standard overlapping MT spectrogram estimate. We used windows of length 6 s and the first 3 tapers, corresponding to a time-bandwidth product of 3, with 50% overlap to compute the estimates (the same window length, tapers, and time-bandwidth product are used for the DBMT and log-DBMT estimators). Although the standard MT spectrogram captures the dynamic evolution of both components, it is blurred by the background noise and picks up spectral artifacts (i.e., vertical lines) due to window overlap, frequency mixing, and sampling noise.

Fig. 3: Spectrogram analysis of the synthetic data. (A) Ground truth, (B) overlapping MT estimates, (C) DBMT estimates, (D) log-DBMT estimates, and (E) TFAR estimates. Left: spectrograms. Middle: zoomed-in views from t = 370 s to t = 440 s. The color scale is in decibels. Right: PSDs corresponding to a window of length 6 s starting at t = 474 s. Dashed and solid lines in row A show respectively the noiseless and noisy PSDs. Grey hulls show 95% confidence intervals.

Fig. 3C demonstrates how the DBMT spectrogram estimate overcomes these deficiencies of the overlapping MT spectrogram: the spectrotemporal localization is sharper and smoother across time, artifacts due to the overlap between windows vanish, and frequency mixing is further mitigated. Comparing the right panels of the second and third rows, two important observations can be made. First, the DBMT captures the true dynamic range of the original noiseless PSD, while the standard MT estimate fails to do so. Second, the CIs in Fig. 3C, as compared to 3B, are wider when the signal is weak (e.g., near 5 Hz) and tighter when the signal is strong (e.g., near 11 Hz). The latter observation highlights the importance of the model-based confidence intervals in interpreting the denoised estimates of DBMT: while the most likely estimate (i.e., the mean) captures the true dynamic range of the noiseless PSD, the estimator does not preclude cases in which the noise floor of −40 dB is part of the true signal, while showing high confidence in detecting the spectral content of the true signal that abides by the modeled dynamics.

Next, Fig. 3D shows the log-DBMT spectrogram estimate, which shares the artifact rejection feature of the DBMT spectrogram. However, the log-DBMT estimate is smoother than both the standard overlapping MT and DBMT spectrograms in time as well as in frequency (see the zoomed-in middle panels), due to its sampling noise mitigation feature (by design). In addition, similar to the standard MT estimate, the log-DBMT estimator is not able to denoise the estimate by removing the observation noise. The confidence intervals of the log-DBMT PSD estimate are, however, tighter than those of the standard overlapping MT estimate, due to averaging across multiple windows via Bayesian filtering/smoothing. As we will show in Section V, these qualitative observations can be established by our theoretical analysis.

Finally, Fig. 3E shows the TFAR spectral estimate. The model orders are chosen as $M_A = 20$, $L_A = 25$, $L_B = 25$, large enough to allow the parametric model to achieve high time-frequency resolution. As shown in the right panel, the TFAR method provides the smoothest estimate along the frequency axis. However, it does not capture the true dynamic range of the signal, due to spectral leakage (see the middle and right panels). In addition, it is contaminated by vertical frequency artifacts similar to those of the MT spectrogram (see Figure 3A).

In the spirit of easing reproducibility, we have deposited a MATLAB implementation of these algorithms, which generates Figure 3, on the open source repository GitHub [45].

B. Application to EEG Data

To illustrate the utility of our proposed spectrogram estimators, we apply them to human EEG data recorded during sleep. In the interest of space, in the remainder of this section we only present comparisons with the MT spectrogram as a non-parametric benchmark. The EEG data set is available online as part of the SHHS Polysomnography Database (https://www.physionet.org/pn3/shhpsgdb/). The data is 900 s long, recorded during stage 2 sleep, and sampled at 250 Hz. During stage 2 sleep, the EEG is known to manifest delta waves (0−4 Hz) and sleep spindles (transient wave packets with frequency 12−14 Hz) [46], [47]. Accurate localization of these spectrotemporal features has significant applications in studying sleep disorders and cognitive function [46]. Since the transient spindles occur at a time scale of seconds, we choose a window length of 2.25 s for all algorithms (with 50% overlap for the standard overlapping MT estimate). We also chose a time-bandwidth product of 2.25 for all algorithms, in order to keep the frequency resolution at 2 Hz. Figs. 4A, B and C show the MT, DBMT and log-DBMT spectrogram estimates, respectively, with a presentational structure similar to that of Fig. 3. As the middle panels reveal, the overlapping MT estimate is not able to clearly distinguish the delta waves and sleep spindles due to high background noise. The DBMT estimate shown in Fig. 4B, however, provides a significantly denoised spectrogram, in which the delta waves and sleep spindles are visually separable. The log-DBMT estimator shown in Fig. 4C provides significant spectrotemporal smoothing, and despite not fully reducing the background noise, provides a clear separation of the delta waves and spindles (see the PSD in the right panel). Similar to the analysis of the synthetic data, the same observations regarding the confidence intervals of the estimators can be made.

Fig. 4: Spectrogram analysis of the EEG data. (A) overlapping MT estimates, (B) DBMT estimates, and (C) log-DBMT estimates. Left: spectrograms. Middle: zoomed-in views from t = 700 s to t = 750 s. The color scale is in decibels. Right: PSD estimate corresponding to a window of length 2.25 s starting at t = 722.25 s. Grey hulls show 95% confidence intervals.

C. Application to ENF Data

Finally, we examine the performance of our proposed algorithms in tracking Electrical Network Frequency (ENF) signals from audio recordings. The ENF signal corresponds to the supply frequency of the power distribution network, which is embedded in audio recordings [48], [49]. The instantaneous values of this time-varying frequency and its harmonics form the ENF signal. The ability to detect and track the spectrotemporal dynamics of ENF signals embedded in audio recordings has been shown to be crucial in data forensics applications [48].

Fig. 5A shows the spectrogram estimates around the sixth harmonic of the nominal 60 Hz ENF signal (data from [49]). We used 1000 s of audio recordings, and constructed spectrograms with windows of length 5 s, using the first 3 tapers corresponding to a time-bandwidth product of 3 for all three methods (with 25% overlap for the overlapping MT estimate). The two dominant components around the sixth ENF harmonic exhibit temporal dynamics, but are hard to distinguish from the noisy background. Fig. 5B shows the DBMT spectrogram, in which the background noise is significantly suppressed, yielding a crisp and temporally smooth estimate of the ENF dynamics. The log-DBMT estimate is shown in Fig. 5C, which provides higher spectrotemporal smoothness than the standard MT estimate. Although the log-DBMT shows smaller variability in the estimates (middle and right panels), the gain is not as striking as in the cases of the synthetic data and EEG analysis, due to the use of longer windows, which mitigates the sampling noise for all algorithms. Similar observations as in the previous two cases regarding the statistical confidence intervals can be made, which highlight the advantage of modeling the spectrotemporal dynamics in spectrogram estimation.

V. THEORETICAL ANALYSIS

A. Filter Bank Interpretation

In order to characterize the spectral properties of any non-parametric spectrum estimator, the tapers applied to the data need to be carefully inspected. In the MT framework, the dpss sequences are used as tapers, which are known to produce negligible side-lobes in the frequency domain [8], [9]. The DBMT and log-DBMT algorithms also use the dpss tapers to alleviate the problem of frequency leakage. However, because of the stochastic continuity constraint we introduced, the estimate associated with any given window is now a function of the data in all the windows. Therefore, the theoretical properties of the MT method do not readily apply to our estimators.
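The parameter-selection rule of Section III-C ($\rho := WB$, $K \leq \lfloor 2\rho\rfloor - 1$) can be made concrete in a few lines. The sketch below builds dpss tapers from Slepian's standard tridiagonal eigenproblem (a textbook construction assumed here for self-containedness, not the authors' code) and forms a single-window multitaper eigen-spectrum average as in (5); the test signal and all names are illustrative.

```python
import numpy as np

def dpss_tapers(W, NW, K):
    """First K discrete prolate spheroidal sequences of length W with
    time-half-bandwidth product NW, via Slepian's tridiagonal eigenproblem."""
    t = np.arange(W)
    f = NW / W                               # half-bandwidth in cycles/sample
    diag = ((W - 1 - 2 * t) / 2.0) ** 2 * np.cos(2 * np.pi * f)
    off = t[1:] * (W - t[1:]) / 2.0
    T = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    vals, vecs = np.linalg.eigh(T)           # eigenvalues in ascending order
    tapers = vecs[:, ::-1][:, :K].T          # keep K largest-eigenvalue tapers
    tapers /= np.linalg.norm(tapers, axis=1, keepdims=True)
    tapers[tapers.sum(axis=1) < 0] *= -1     # sign convention for even orders
    return tapers

W, rho = 256, 3.0                            # window length, time-bandwidth product
K = int(2 * rho) - 1                         # taper-count rule: K <= 2*rho - 1
tapers = dpss_tapers(W, rho, K)

# multitaper estimate for one window: average of tapered periodograms
rng = np.random.default_rng(1)
y = np.cos(2 * np.pi * 0.1 * np.arange(W)) + 0.1 * rng.standard_normal(W)
eigenspec = np.abs(np.fft.rfft(tapers * y, axis=1)) ** 2
mt_psd = eigenspec.mean(axis=0)              # cf. the average in Eq. (20)
print(K, tapers.shape, mt_psd.argmax())
```

With $\rho = 3$ the rule gives $K = 5$ tapers, and the peak of the averaged eigen-spectra lands within the design bandwidth of the tone at normalized frequency 0.1.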

Fig. 5: Spectrogram analysis of the ENF data. (A) overlapping MT estimates, (B) DBMT estimates, and (C) log-DBMT estimates. Left: spectrograms. Middle: zoomed-in views from t = 480 s to t = 540 s. The color scale is in decibels. Right: PSD estimate corresponding to a window of length 5 s starting at t = 505 s. Grey hulls show 95% confidence intervals.

To characterize the statistical properties of our estimates, we first need to take a detour from the usual analysis of spectrum estimation techniques. In what follows, we mainly focus on the DBMT algorithm for the sake of presentation. By virtue of the FIS procedure, under the assumptions that: 1) the window length $W$ is an integer multiple of $J$, the number of discrete frequencies, so that $F_n = F_1, \forall n$, and 2) the state noise covariance matrices are time-invariant, i.e., $Q_n = Q, \forall n$, one obtains the following expansion of $x_{n|N}^{(k)}$ in terms of the observed data [25]:

$$x_{n|N}^{(k)} = \sum_{s=1}^{n-1}\prod_{m=s}^{n-1}\big[\alpha(I - K_m F_m)\big]K_s U^{(k)}\widetilde{y}_s + K_n U^{(k)}\widetilde{y}_n + \sum_{s=n+1}^{N}\prod_{m=n}^{s}B_m K_s U^{(k)}\widetilde{y}_s. \quad (27)$$

In other words, the DBMT algorithm maps the entire data $\widetilde{y} := [\widetilde{y}_1^\top, \widetilde{y}_2^\top, \cdots, \widetilde{y}_T^\top]^\top$ to the vector of coefficients $\widehat{X}^{(k)}$ according to [25]:

$$\widehat{X}^{(k)} = G^{(k)} F^H U^{(k)}\widetilde{y}, \quad (28)$$

where $F$ and $U^{(k)}$ are block-diagonal matrices with $F_1$ and $U_k := \operatorname{diag}[u^{(k)}]$ as the diagonal blocks, respectively, and $G^{(k)}$ is a weighting matrix which depends only on $Q_\infty = \lim_{l\to\infty}Q^{[l]}$, $\alpha_\infty = \lim_{l\to\infty}\alpha^{[l]}$, and the window length $W$. The rows of $G^{(k)}F^H U^{(k)}$ form a filter bank whose output is equivalent to the time-frequency representation.

In order to continue our analysis, we make two assumptions common in the analysis of adaptive filters [50], [51]. First, we assume that the parameter estimates $Q_\infty$ and $\alpha_\infty$ are close enough to the true values of $Q$ and $\alpha$, and therefore replace them by $Q$ and $\alpha$, i.e., as if the true parameters were known. Note that we have discarded the dependence of $Q$ and $\alpha$ on $k$ for lucidity of analysis. Second, noting that $\alpha(I - K_m F_m) = \alpha\Sigma_{m|m}\Sigma_{m|m-1}^{-1}$ and $B_m = \alpha\Sigma_{m|m}\Sigma_{m+1|m}^{-1}$, and that in steady state we have $\Sigma_{m|m} := \Sigma_\infty$ and $\Sigma_{m|m-1} = \alpha^2\Sigma_\infty + Q$, Eq. (27) can be approximated by:

$$x_{n|N}^{(k)} = \sum_{s=1}^{N}\Lambda^{|s-n|}\,\Gamma\, F_s^H U^{(k)}\widetilde{y}_s, \quad (29)$$

for $1 \ll n \ll N$, where $\Lambda = \alpha\Sigma_\infty\big(\alpha^2\Sigma_\infty + Q\big)^{-1}$ and $\Gamma = \big(\alpha^2\Sigma_\infty + Q\big)\Big[I - r^W\big((\alpha^2\Sigma_\infty + Q)^{-1} + r^W I\big)^{-1}\Big]$. That is, for values of $n$ far from the data boundaries, the weighting matrix is equivalent to a weighted set of dpss tapers in matrix form acting on all the data windows, with an exponential decay with respect to the $n$th window. As an example, the equivalent filters of the DBMT estimator corresponding to the first taper for the 11 Hz and 9 Hz frequencies around $t = 300$ s of the synthetic data example are shown in Fig. 6, where they are also compared, in the frequency domain, to the equivalent filters corresponding to the first taper of the standard MT method. As apparent from Fig. 6, the weighting matrix sets the gain of these filters in an adaptive fashion across all windows, unlike the standard MT method, which only uses the data in window $n$. In addition, the filter corresponding to the frequency of 9 Hz, which is negligible in the data, is highly attenuated, resulting in significant noise suppression. In this sense, the proposed estimation method can be identified as a data-driven denoising method for constructing time-frequency representations given noisy time series data. Next, we characterize the performance of the DBMT estimator in terms of the bias-variance trade-off.

Fig. 6: Equivalent filters corresponding to the first taper of the DBMT estimate of the synthetic data example. Left: equivalent filters in time around t = 300 s. Right: equivalent filters of MT (red) and DBMT (green) in frequency.

B. Bias and Variance Analysis

We first consider the implication of the stochastic continuity constraint of Eq. (9) on the evolution of the orthogonal increment processes governing the time series data. We first assume that the parameters $\alpha^{(k)} = \alpha$, for all $k = 1, 2, \cdots, K$. Suppose that the data in window $n$ has a Cramér representation with an orthogonal increment process $dz_n(f)$, $n = 1, 2, \cdots, N$. Then, one way to achieve the stochastic continuity of Eq. (9) is to assume:

$$dz_{n+1}(f) = \alpha\, dz_n(f) + d_n(f), \quad (30)$$

where $d_n(f)$ is a Gaussian orthogonal increment process, independent of $dz_n(f)$. In the forthcoming analysis we also assume the local stationarity condition, i.e., the generalized Fourier transform of the process remains stationary within each window. This assumption is common in the analysis of non-parametric spectral estimators [32], [36]. Finally, we assume a scaling of $K, N, W \to \infty$, $B \to 0$, $BW \to \rho$, for some constant $\rho$ [52]. The following theorems characterize the bias and variance of the DBMT estimator:

Theorem 1. Suppose that the locally stationary process $y_t$ is governed by orthogonal increment processes evolving according to the dynamics $dz_{n+1}(f) = \alpha\, dz_n(f) + d_n(f)$, with $\alpha < 1$, where the noise process $d_n(f)$, $\forall f \in (-1/2, 1/2]$, is a zero-mean Gaussian increment process with variance $q(f) > 0$, independent of $dz_n(f)$. If the process is corrupted by additive zero-mean Gaussian noise with variance $\sigma^2$, then for $f \in \{f_1, f_2, \cdots, f_J\}$, the DBMT estimate satisfies:

$$\Big|\mathbb{E}\big[\widehat{D}_n(f)\big] - D(f)\Big| \leq \bigg(1 - \frac{1}{K}\sum_{k=1}^{K}\lambda_k\bigg)\kappa_n(f)\sup_f\{D(f)\} + |1 - \kappa_n(f)|\,D(f) + \mu_n(f)\sigma^2 + \kappa_n(f)\,o(1),$$

where $\lambda_k$ is the eigenvalue associated with the $k$th PSWF, $D(f) := q(f)/(1-\alpha)$, and $\kappa_n(f)$, $\mu_n(f)$ are functions of $\alpha$ and $q(f)$ given explicitly in the proof.

Theorem 2. Under the assumptions of Theorem 1, the variance of the DBMT estimate $\widehat{D}_n(f)$ satisfies:

$$\operatorname{Var}\big\{\widehat{D}_n(f)\big\} \leq \frac{2}{K}\Big[\sup_f\big\{\kappa_n(f)D(f) + \mu_n(f)\sigma^2\big\}\Big]^2.$$

A 0.85 B 1.2 1

0.845 1.1

0.84 0.9 1.0

0.835 α

0.9 0.83 0.8

0.825 0.8 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 0.7 -1 0 1 10 10 10 2 Fig. 7: (A) µn versus α, (B) κn and its upper/lower bounds versus q/σ α for N = 100, n = 50, and q/σ2 = 10. 2 Fig. 8: α corresponding to κn ≈ 1 against q/σ The proofs of Theorems 1 and 2 integrate the treatment of and hence obtain lower variance than that of the standard [52] with the structure of the FIS estimates, and are presented MT estimate. Fig. 8 illustrates this statement by showing the 2 in Appendix A. In order to illustrate the implications of these value of α for which κn ≈ 1 vs. qn/σ . For models with theorems, several remarks are in order: high temporal dependence (i.e., α close to 1), it is possible Remark 1. The function κn(f) controls the trade-off be- to achieve κn < 1 and hence reduce the estimator variance, tween bias and variance: for values of κn(f) < 1, the bound due to the increase in the weight of data pooled from adjacent 2 on the variance decreases while the bias bound increases, and windows, even for small values of qn/σ (i.e., low SNR). for κn(f) ≈ 1, all the terms in the bias bound become negligi- However, this reduction in variance comes with the cost of ble, while the variance bound increases. The function µn(f), increasing the bias. On the contrary when the data across on the other hand, reflects observation noise suppression in windows have low temporal dependence (i.e., α  1), it both the bias and variance. Note that these upper bounds are is only possible to achieve a reduction in variance for high 2 tight and achieved for a signal with flat spectrum. values of qn/σ (i.e., high SNR). This is due to the fact that Remark 2. The bias and variance bounds of [52] for the at low SNR with low temporal dependence, pooling data from standard MT method can be recovered by setting κn(f) = 1, adjacent windows is not beneficial in reducing the variance in 2 µn(f) = 1, and σ = 0 in the results of Theorems 1 and a particular window, and indeed can result in higher bias. 
2, i.e., in the absence of signal dynamics and measurement Remark 3. Even though the parameter α is estimated in a noise. For the DBMT estimator, signal and measurement noise data-driven fashion, it can be viewed as a tuning parameter variances, respectively contribute to the bias/variance upper controlling the bias-variance trade-off, given the foregoing bounds in different fashions through κn(f) and µn(f), due discussion. For a given SNR, fixing α at a small value can to the distinction of the signal and measurement noise in our help reduce the variance but with a cost of increasing bias, state-space model. In contrast, in the standard MT method, and vice versa. In light of this observation, Fig. 8 can be possible measurement noise is treated in the same way as the thought of as a guideline for choosing α to achieve κn ≈ 1, true data, and hence both the signal and noise variances have so that the estimator is nearly unbiased, and achieves a lower equal contributions in the estimator bias/variance. variance than that of the standard overlapping MT estimator The functions κn(f) and µn(f) do not have closed-form due to the noise suppression virtue of the state-space model. expressions with respect to the state-space parameters α, σ2 Although we focused on the case of flat spectrum in the and q(f). In order to illustrate the roles of κn(f) and µn(f), foregoing discussion, it is possible to numerically compute we consider the scenario under which the upper bounds on the these trade-off curves for more general cases, given the general bias and variance are achieved, i.e., q(f) being independent expressions for κn(f) and µn(f) given in Appendix A. of f, and hence κn(f) = κn and µn(f) = µn, ∀f. In this Remark 4. Extending Theorems 1 and 2 to the log-DBMT scenario, even though the dependence of µn and κn on the algorithm is not straightforward, due to the high nonlinearity state-space parameters are quite involved, it is possible to of the underlying state-space models. 
However, under the com- obtain upper and lower bounds on κn and µn. As it is shown in mon Gaussian approximation of the log-posterior density, the Proposition 3 in Appendix B, the main parameters determining well-known variance reduction property of the fixed interval the behavior of κn and µn are qn/σ (i.e., the SNR) and α (i.e., smoother [41], [51] carries over to the estimator of log Sn(f). temporal signal dependence). Here, we present a numerical That is, the variance of the log-DBMT estimate of log Sn(f) example for clarification. Fig. 7A and B shows the plot of µn obtained using the state-space model is lower than that of the 2 vs. α and κn vs. α for n = 50 and q/σ = 10. It is apparent standard MT estimate which only uses the data within window that µn increases with α and does not exceed 1. The fact that n. This fact agrees with our earlier observation in Section µn < 1 implies that the DBMT estimator achieves a higher IV-A regarding the tightening of the confidence intervals for noise suppression compared to the standard MT method. This log-DBMT as compared to the overlapping MT estimates. fact agrees with the noise suppression performances observed in Section IV. VI.CONCLUDING REMARKS

Fig. 7B shows the plot of κn vs. α, which exhibits a similar Spectral analysis of non-stationary time series data poses increasing trend, but eventually exceeds 1. This result implies serious challenges for classical non-parametric techniques, in that with a careful choice α, it is possible to achieve κn < 1, which temporal smoothness of the spectral representations 13 are implicitly captured using sliding windows with overlap. Taking the expectation of both sides and after some simplifi- This widely-practiced approach tends to ignore the inherent cation, one arrives at: smoothness of the data and is not robust against measure- 2 K N N 0 h i E η (fj) X X X |s−n| |s −n|E (k) (k) ∗ ment/sampling noise. In this paper, we address these issues [Dbn|N (fj)] = γ(fj) γ(fj) (y )j(y 0 ) K es es j and provide an alternative to the sliding window spectro- k=1 s=1 s0=1  N N  gram analysis paradigm. We propose two semi-parametric 2 X X |s−n| |s0−n| |s−s0| = η(fj) γ(fj) γ(fj) α × spectrogram estimators, namely the DBMT and log-DBMT s=1 s0=1 K estimators, by integrating techniques from MT analysis and 1 X Z 1/2 U (f − β)D(β)U ∗(f − β)dβ Bayesian estimation. To this end, we explicitly model the K k j k j k=1 −1/2 temporal dynamics of the spectrum using a state-space model  N  2 X 2|s−n| 2 over the spectral features obtained by multitapering. Therefore + η(fj) γ(fj) σ . (34) our algorithms inherit the optimality features of both Bayesian s=1 estimators and MT analysis. Using the orthogonality of the PSWFs as in [52], and using Our algorithms admit efficient and simple implementations, the fact that D(β) sup D(f), ∀β, we get: thanks to the Expectation-Maximization algorithm and the 6 f well-known fixed interval state-space smoothing procedure. 
E | [Dbn|N (f)] − κn(f)D(f)| 6 κn(f)(sup{D(f)} − D(f))× Unlike existing approaches, our algorithms require no a priori f K assumptions about the structure of the spectral representation  1 X  1 − λ + µ (f)σ2 + κ (f)o(1), (35) and operate in a fully data-driven fashion. While both algo- K k n n k=1 rithms yield spectral estimates that are continuous in time, N N 2 X X |s−n| |s0−n| |s−s0| by design the DBMT algorithm significantly suppresses the where κn(f) := η(f) γ(f) γ(f) α , measurement noise in forming the spectrogram and the log- s=1 s0=1 DBMT algorithm mitigates sampling noise due to small obser- 2 PN 2|s−n| vation length. We establish the performance gains provided by and µn(f) := η(f) s=1 γ(f) , for f ∈ our algorithms through theoretical analysis of the bias-variance {f1, f2, ··· , fJ }. Using the triangle inequality, the bound of E trade-off, as well as application to synthetic and real data from Theorem 1 on | [Dbn|N (f)] − D(f)| follows.  human EEG recording during sleep and ENF signals from B. Proof of Theorem 2 audio recordings. Using the notation of Appendix A, we have: APPENDIX A n (k) (l) 0 o PROOFSOF THEOREMS 1 AND 2 Cov Dbn (f), Dbm (f ) = N Recall from Eq. (29) that the kth eigen-coefficient estimate 4 X |s−n| |s0−n| 0 |t−m| 0 |t0−m| at window n can be written in terms of the observed data as: η(f) γ(f) γ(f) γ(f ) γ(f ) × s,s0,t,t0=1 N  ZZ |s−t| |s0−t0| 0 0 (k) X |s−n| H (k) α α Uk(f − β)Ul(f + β)Uk(−β − f)× xbn|N = Λ ΓFs yes , (31) 0 0 2 0 2 0 0 0 s=1 Ul(β − f )(D(β) + σ δ(s − t))(D(β ) + σ δ(s − t ))dβdβ + ZZ (k) (k) (k) |s−t0| |s0−t| 0 0 where yes = u yes = U yes. Given that Q is a α α Uk(f − β)Ul(β − f )Uk(−β − f)×  diagonal matrix with elements q(fj), j = 1, 2, ··· ,J, it can 0 0 2 0 0 2 0 0 Ul(β + f )(D(β) + σ δ(s − t ))(D(β ) + σ δ(s − t))dβdβ (36) be shown that Σ∞, Λ and Γ are also diagonal matrices. 
Denoting the elements of Σ , Γ and Λ, respectively by ∞ Note that we have omitted the integral limits, as they are τ(f), η(f) and γ(f), for f ∈ {f1, f2, ··· , fJ }, we have 2 understood to be same as in (1) henceforth. After summing η(f) = α τ(f)+q(f) . 1+rW (α2τ(f)+q(f)) over all tapers and rearranging the summations, the first expression within the brackets in (36) becomes: A. Proof of Theorem 1 ZZ  N  2 X |s−n| 0 |t−m| |s−t| 2 E ∗ 0 η(f) γ(f) γ(f ) α (D(β)+σ δ(s−t))× First note that Eq. (30) implies that [dzn(f)dzn+t(f )] = t 0 0 s,t=1 α D(f)δ(f−f )dfdf . By invoking the Cramer´ representation,  N  X 0 0 0 0 the covariance of the data tapered by the kth and lth dpss η(f)2 γ(f)|s −n|γ(f 0)|t −m|α|s −t | (D(β0)+σ2δ(s0 −t0))× s0,t0=1 sequences can be expressed as: K K 1/2 X 0 X 0 0 0 0 h i 0 Z Uk(f −β)Uk(−β −f) Ul(f +β)Ul(β −f )dβdβ . (37) E (k) (l) ∗ |s−s | ∗ (yes )j(yes0 )j0 = α Uk(fj − β)D(β)Ul (fj0 − β)dβ k=1 l=1 −1/2 Z 1/2 2 0 ∗ + U (f − β)σ δ(s − s )U (f 0 − β)dβ. (32) Let N k j l j  X  −1/2 A(n, m, f) := η2(f) γ(f)|s−n|γ(f 0)|t−m|α|s−t| D(f) From Eqs. (20) and (31), we get: s,t=1  N  2 K N N 2 X 2|s−n| 2 η (fj) X X X |s−n| |s0−n| (k) (k) ∗ + η(f) γ(f) σ (38) Dbn|N (fj) = γ(fj) γ(fj) (y )j(y 0 ) . (33) K es es j k=1 s=1 s0=1 s=1 14
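The window-coupling weights κn(f) and µn(f) defined in the proof of Theorem 1 are finite sums and can be evaluated directly. As a minimal numerical sketch (the values of γ, α and η here are arbitrary illustrative choices, not quantities estimated from data):

```python
import numpy as np

def kappa_mu(n, N, gamma, alpha, eta):
    """Evaluate kappa_n and mu_n for an f-independent gamma (flat-spectrum case)."""
    s = np.arange(1, N + 1)
    g = gamma ** np.abs(s - n)                    # gamma^{|s-n|}
    A = alpha ** np.abs(s[:, None] - s[None, :])  # alpha^{|s-s'|}
    kappa = eta ** 2 * (g @ A @ g)                # double sum over s, s'
    mu = eta ** 2 * np.sum(g ** 2)                # single sum over s
    return kappa, mu
```

Combined with the steady-state expressions of Appendix B, which tie γ and η to α and q/σ², these sums produce bias-variance trade-off curves of the kind shown in Fig. 7.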

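Appendix B reduces the Riccati equation to a scalar fixed point in the flat-spectrum case, with a closed-form positive root. As a quick numerical check of that closed form (a sketch; the values of rW, α and q/σ² are illustrative assumptions):

```python
import math

def steady_state(alpha, rW, snr):
    """Positive root of the scalar Riccati equation (flat spectrum, Q = qI).

    snr denotes q / sigma^2; returns (zeta / sigma^2, gamma).
    """
    a = 1.0 - alpha ** 2 - rW * snr
    x = (-a + math.sqrt(a * a + 4.0 * rW * snr)) / (2.0 * rW)  # zeta / sigma^2
    gamma = (1.0 - snr / x) / alpha  # gamma = (1 - (q/sigma^2)/(zeta/sigma^2)) / alpha
    return x, gamma
```

The returned x can be checked against the fixed-point form x = α²x/(rW·x + 1) + q/σ², which is algebraically equivalent to the scalar Riccati equation of Appendix B.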
Using the Schwarz inequality, the integral in Eq. (37) can be bounded by:

$$\Big[\iint \Big|\sum_{k=1}^{K} U_k(f-\beta)\,U_k(-\beta'-f)\Big|^2 A(n,m,\beta)\,A(n,m,\beta')\, d\beta\, d\beta' \times \iint \Big|\sum_{l=1}^{K} U_l(f'+\beta)\,U_l(\beta'-f')\Big|^2 A(n,m,\beta)\,A(n,m,\beta')\, d\beta\, d\beta'\Big]^{1/2}. \qquad (39)$$

Using bounds on the convolutions of PSWFs from [52], and upper bounding A(n, m, β) by sup_f{κn(f)D(f) + µn(f)σ²}, the statement of the theorem on the variance of the DBMT estimate D̂n(f) follows. ∎

APPENDIX B
CHARACTERIZATION OF κn(f): BOUNDS AND PARAMETER DEPENDENCE

A. Lower and Upper Bounds on κn(f)

Consider the scenario where q(f) = q, i.e., a flat spectrum. Then, the dependence of γ(f) and κn(f) on f is suppressed. We have the following bound on κn:

Proposition 3. For 0 < γ, α < 1, the quantity κn can be bounded as:

$$\eta^2\,\frac{1+\alpha\gamma-2(\alpha\gamma)^N}{1-\alpha\gamma}\, T_0 \;\le\; \kappa_n \;\le\; \eta^2\Big(\frac{1+\alpha\gamma-2(\alpha\gamma)^N}{1-\alpha\gamma}\, T_0 + \frac{2\gamma}{(1-\gamma)^2}\Big),$$

where $T_0 := \dfrac{1+\gamma^2-\gamma^{2n}-\gamma^{2(N-n+1)}}{1-\gamma^2}$.

Proof: To get an upper bound on κn, we rewrite the expression defining κn as:

$$\kappa_n = \eta^2 \sum_{s=1}^{N}\sum_{s'=1}^{N} \gamma^{|s-n|}\gamma^{|s'-n|}\alpha^{|s-s'|} = \eta^2 \sum_{t=-N+1}^{N-1} \alpha^{|t|} \sum_{s=1}^{N} \gamma^{|s-n|}\gamma^{|s-t-n|}.$$

Now, let us define $T_t := \sum_{s=1}^{N} \gamma^{|s-n|}\gamma^{|s-t-n|}$. Then it can be verified that:

$$T_{t+1} \begin{cases} = \gamma T_t, & \text{when } t \ge N-n \\ \le \gamma T_t + \gamma^{t+1}, & \text{when } 0 \le t < N-n \end{cases} \qquad (40)$$

and

$$T_{t-1} \begin{cases} = \gamma T_t, & \text{when } t \le -n+1 \\ \le \gamma T_t + \gamma^{|t-1|}, & \text{when } -n+1 < t < 0. \end{cases} \qquad (41)$$

Also, we have:

$$\sum_{t=0}^{N-1} \alpha^t T_t \le \sum_{t=0}^{N-1} \alpha^t \gamma^t T_0 + \sum_{t=1}^{N-n-1} t\gamma^t + \sum_{t=N-n}^{N-1} (N-n)\gamma^t \le \frac{1-(\alpha\gamma)^N}{1-\alpha\gamma}\, T_0 + \frac{\gamma}{(1-\gamma)^2}. \qquad (42)$$

Similarly, we have:

$$\sum_{t=-N+1}^{0} \alpha^{|t|} T_t \le \frac{1-(\alpha\gamma)^N}{1-\alpha\gamma}\, T_0 + \frac{\gamma}{(1-\gamma)^2}, \qquad (43)$$

which along with (42) leads to the claimed upper bound. For the lower bound, we use the fact that

$$\gamma T_t \begin{cases} \le T_{t+1}, & \text{when } t \ge 0 \\ \le T_{t-1}, & \text{when } t \le 0, \end{cases} \qquad (44)$$

which implies $\sum_{t=0}^{N-1}\alpha^t T_t \ge \frac{1-(\alpha\gamma)^N}{1-\alpha\gamma}\,T_0$ and $\sum_{t=-N+1}^{0}\alpha^{|t|}T_t \ge \frac{1-(\alpha\gamma)^N}{1-\alpha\gamma}\,T_0$. Using the latter lower bounds for κn yields the claimed lower bound. ∎

B. Relation between κn and Q

The expressions for κn(f) and µn(f) in Appendix A depend on γ(f) and α. But γ(f) itself depends on q(f) and α, and it is not straightforward to give a closed-form expression for γ(f) merely in terms of q(f) and α, since it requires computation of the filtered error covariance matrix Σn|n given Q. Again, by invoking the steady-state approximation, and defining Σn|n−1 =: Σ, ∀n = 1, 2, ..., N, the matrix Σ can be obtained by solving the following algebraic Riccati equation:

$$\Sigma = \alpha^2\Sigma - \alpha^2\Sigma F_n^H\big(\sigma^2 I + F_n\Sigma F_n^H\big)^{-1} F_n\Sigma + Q, \qquad (45)$$

and thereby the steady-state error covariance matrix Σn|n =: Σ∞ is given by $\Sigma_\infty = \frac{1}{\alpha^2}(\Sigma - Q)$.

Although this procedure can be carried out numerically, in general it is not possible to solve the Riccati equation to get a closed-form expression for arbitrary Q. In order to illustrate the explicit dependence of κn(f) on the state-space parameters, we consider the case of a flat spectrum where Q = qI. In this case, it can be shown that Σ = ζI for some ζ > 0, and the matrix equation (45) reduces to a simpler scalar equation for ζ, given by:

$$\frac{\zeta}{\sigma^2} = \alpha^2\,\frac{\zeta}{\sigma^2}\Big(1 - \frac{rW(\zeta/\sigma^2)}{rW(\zeta/\sigma^2)+1}\Big) + \frac{q}{\sigma^2}. \qquad (46)$$

Following simplification, a quadratic equation for ζ results, and since ζ > 0, the positive solution for ζ is given by:

$$\frac{\zeta}{\sigma^2} = \frac{1}{2rW}\bigg[-\Big(1-\alpha^2-rW\frac{q}{\sigma^2}\Big) + \sqrt{\Big(1-\alpha^2-rW\frac{q}{\sigma^2}\Big)^2 + 4rW\frac{q}{\sigma^2}}\,\bigg]. \qquad (47)$$

Then, γ can be computed as $\gamma = \frac{1}{\alpha}\big(1 - \frac{q/\sigma^2}{\zeta/\sigma^2}\big)$. Using these expressions for γ and α, the functions κn and µn can be computed. Fig. 7B shows the upper and lower bounds on κn evaluated for n = 50 and different values of α for q/σ² = 10. The upper and lower bounds are simple functions of α and q/σ², and can be used to further inspect the performance trade-offs of the DBMT algorithm with respect to the state-space model parameters.

ACKNOWLEDGMENT

We would like to thank Adi Hajj-Ahmad and Min Wu for providing us the ENF data from [49].

REFERENCES

[1] P. Das and B. Babadi, “A Bayesian multitaper method for nonstationary data with application to EEG analysis,” in IEEE Signal Processing in Medicine and Biology Symposium (SPMB17), December 2, Philadelphia, PA, 2017.
[2] T. F. Quatieri, Discrete-time Speech Signal Processing: Principles and Practice. Prentice Hall, 2008.
[3] J. S. Lim, Two-Dimensional Signal and Image Processing. Englewood Cliffs, NJ: Prentice Hall, 1990.
[4] G. Buzsáki, Rhythms of the Brain. Oxford University Press, 2009.
[5] W. J. Emery and R. E. Thomson, Data Analysis Methods in Physical Oceanography. Elsevier Science, 2001.
[6] M. Ghil, M. Allen, M. Dettinger, K. Ide, D. Kondrashov, M. Mann, A. W. Robertson, A. Saunders, Y. Tian, F. Varadi et al., “Advanced spectral methods for climatic time series,” Reviews of Geophysics, vol. 40, no. 1, p. 1003, 2002.
[7] Ö. Yilmaz, Seismic Data Analysis: Processing, Inversion, and Interpretation of Seismic Data. SEG Books, 2001, no. 10.
[8] D. J. Thomson, “Spectrum estimation and harmonic analysis,” Proc. IEEE, vol. 70, no. 9, pp. 1055–1096, Sept. 1982.

[9] D. B. Percival and A. T. Walden, Spectral Analysis for Physical Applications. Cambridge University Press, 1993.
[10] T. P. Bronez, “On the performance advantage of multitaper spectral analysis,” IEEE Trans. Signal Process., vol. 40, no. 12, pp. 2941–2946, 1992.
[11] L. Cohen, Time-Frequency Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1995.
[12] W. Martin and P. Flandrin, “Wigner-Ville spectral analysis of nonstationary processes,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 6, pp. 1461–1470, 1985.
[13] M. B. Priestley, “Evolutionary spectra and non-stationary processes,” J. R. Stat. Soc. Ser. B Stat. Methodol., pp. 204–237, 1965.
[14] G. Matz, F. Hlawatsch, and W. Kozek, “Generalized evolutionary spectral analysis and the Weyl spectrum of nonstationary random processes,” IEEE Trans. Signal Process., vol. 45, no. 6, pp. 1520–1534, 1997.
[15] G. Matz and F. Hlawatsch, “Nonstationary spectral analysis based on time-frequency operator symbols and underspread approximations,” IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 1067–1086, 2006.
[16] W. Kozek, “Matched Weyl-Heisenberg expansions of nonstationary environments,” Ph.D. dissertation, University of Technology Vienna, 1997.
[17] J. Hammond and P. White, “The analysis of non-stationary signals using time-frequency methods,” Journal of Sound and Vibration, vol. 190, no. 3, pp. 419–447, 1996.
[18] B. J. W. S. Rayleigh, The Collected Optics Papers of Lord Rayleigh. Optical Society of America, 1994.
[19] B. Efron, The Jackknife, the Bootstrap and Other Resampling Plans. SIAM, 1982.
[20] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N.-C. Yen, C. C. Tung, and H. H. Liu, “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903–995, 1998.
[21] I. Daubechies, J. Lu, and H.-T. Wu, “Synchrosqueezed wavelet transforms: An empirical mode decomposition-like tool,” Appl. Comput. Harmon. Anal., vol. 30, no. 2, pp. 243–261, 2011.
[22] I. Daubechies, Y. G. Wang, and H.-T. Wu, “ConceFT: concentration of frequency and time via a multitapered synchrosqueezed transform,” Phil. Trans. R. Soc. A, vol. 374, no. 2065, p. 20150193, 2016.
[23] J. Xiao and P. Flandrin, “Multitaper time-frequency reassignment for nonstationary spectrum estimation and chirp enhancement,” IEEE Trans. Signal Process., vol. 55, no. 6, pp. 2851–2860, 2007.
[24] M. Jachan, G. Matz, and F. Hlawatsch, “Time-frequency ARMA models and parameter estimators for underspread nonstationary random processes,” IEEE Trans. Signal Process., vol. 55, no. 9, pp. 4366–4381, Sept. 2007.
[25] D. Ba, B. Babadi, P. L. Purdon, and E. N. Brown, “Robust spectrotemporal decomposition by iteratively reweighted least squares,” Proc. Natl. Acad. Sci., vol. 111, no. 50, pp. E5336–E5345, 2014.
[26] L. Fahrmeir and G. Tutz, Multivariate Statistical Modelling Based on Generalized Linear Models. Springer Science & Business Media, 2013.
[27] R. H. Shumway and D. S. Stoffer, “An approach to time series smoothing and forecasting using the EM algorithm,” J. Time Series Anal., vol. 3, no. 4, pp. 253–264, July 1982.
[28] A. C. Smith and E. N. Brown, “Estimating a state-space model from point process observations,” Neural Comput., vol. 15, pp. 965–991, 2003.
[29] G. Kitagawa, “Non-Gaussian state-space modeling of nonstationary time series,” J. Amer. Statist. Assoc., vol. 82, no. 400, pp. 1032–1041, 1987.
[30] T. Bohlin, “Analysis of EEG signals with changing spectra using a short-word Kalman estimator,” Math. Biosci., vol. 35, no. 3-4, pp. 221–259, 1977.
[31] M. Loève, Probability Theory. London: D. Van Nostrand Co., 1963.
[32] D. J. Thomson, “Multitaper analysis of nonstationary and nonlinear time series data,” in Nonlinear and Nonstationary Signal Processing. London, UK: Cambridge Univ. Press, 2000, pp. 317–394.
[33] F. Hlawatsch and F. Auger, Time-Frequency Analysis. John Wiley & Sons, 2013.
[34] M. B. Priestley, Spectral Analysis and Time Series. Academic Press, 1982.
[35] D. Slepian, “Prolate spheroidal wave functions, Fourier analysis, and uncertainty—V: The discrete case,” Bell Syst. Tech. J., vol. 57, no. 5, pp. 1371–1430, May 1978.
[36] B. Babadi and E. N. Brown, “A review of multitaper spectral analysis,” IEEE Trans. Biomed. Eng., vol. 61, no. 5, pp. 1555–1564, May 2014.
[37] J. L. Powell, “Estimation of semiparametric models,” Handbook of Econometrics, vol. 4, pp. 2443–2521, 1994.
[38] O. Rosen, D. S. Stoffer, and S. Wood, “Local spectral analysis via a Bayesian mixture of smoothing splines,” J. Amer. Statist. Assoc., vol. 104, no. 485, pp. 249–262, 2009.
[39] R. L. Prentice, “A log gamma model and its maximum likelihood estimation,” Biometrika, vol. 61, no. 3, pp. 539–544, 1974.
[40] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. R. Stat. Soc. Ser. B Stat. Methodol., pp. 1–38, 1977.
[41] H. E. Rauch, F. Tung, and C. T. Striebel, “Maximum likelihood estimates of linear dynamic systems,” AIAA Journal, vol. 3, pp. 1445–1450, August 1965.
[42] P. de Jong and M. J. Mackinnon, “Covariances for smoothed estimates in state space models,” Biometrika, vol. 75, no. 3, p. 601, 1988.
[43] K. Lange, Optimization. Springer, 2004.
[44] L. Fahrmeir, “Posterior mode estimation by extended Kalman filtering for multivariate dynamic generalized linear models,” J. Amer. Statist. Assoc., vol. 87, no. 418, pp. 501–509, 1992.
[45] Dynamic Bayesian Multitaper Spectral Estimators. Available on GitHub Repository: https://github.com/proloyd/DBMT, 2017.
[46] L. De Gennaro and M. Ferrara, “Sleep spindles: an overview,” Sleep Medicine Reviews, vol. 7, no. 5, pp. 423–440, 2003.
[47] D. A. McCormick and T. J. Sejnowski, “Thalamocortical oscillations in the sleeping and aroused brain,” Science, vol. 262, no. 5134, pp. 679–685, October 1993.
[48] R. Garg, A. L. Varna, A. Hajj-Ahmad, and M. Wu, “‘Seeing’ ENF: Power-signature-based timestamp for digital multimedia via optical sensing and signal processing,” IEEE Trans. Inf. Forensics Security, vol. 8, no. 9, pp. 1417–1432, 2013.
[49] A. Hajj-Ahmad, R. Garg, and M. Wu, “Spectrum combining for ENF signal estimation,” IEEE Signal Process. Lett., vol. 20, no. 9, pp. 885–888, Sept. 2013.
[50] S. Haykin, Adaptive Filter Theory, ser. Prentice-Hall Information and System Sciences Series. Prentice Hall, 1991.
[51] B. D. Anderson and J. B. Moore, Optimal Filtering. Englewood Cliffs, NJ: Prentice-Hall, 1979.
[52] K. Lii and M. Rosenblatt, “Prolate spheroidal spectral estimates,” Statistics & Probability Letters, vol. 78, no. 11, pp. 1339–1348, 2008.