arXiv:1906.09359v1 [cs.IT] 22 Jun 2019

Multitaper Analysis of Evolutionary Spectra from Multivariate Spiking Observations

Anuththara Rupasinghe and Behtash Babadi

Abstract—Extracting the spectral representations of the neural processes that underlie spiking activity is key to understanding how brain rhythms mediate cognitive functions. While spectral estimation of continuous time-series is well studied, inferring the spectral representation of latent non-stationary processes based on spiking observations is a challenging problem. In this paper, we address this issue by developing a multitaper spectral estimation methodology that can be directly applied to multivariate spiking observations in order to extract the evolutionary spectral density of the latent non-stationary processes that drive the spiking activity. We establish theoretical bounds on the bias-variance trade-off of the proposed estimator. Finally, we compare the performance of our proposed technique with existing methods using simulation studies and application to real data, which reveal significant gains in terms of the bias-variance trade-off.

Index Terms—Point process model, multivariate non-stationary processes, evolutionary spectral analysis, spectral density matrix, multitaper analysis, binary spiking observations, latent processes.

The authors are with the Department of Electrical & Computer Engineering, University of Maryland, College Park, MD 20742. E-mails: {raar, behtash}@umd.edu. This work has been supported in part by the National Science Foundation Award No. 1807216 and the National Institutes of Health Award No. 1U19NS107464-01. This work was presented in part at the IEEE Data Science Workshop, Minneapolis, MN, USA, June 2–5, 2019.

I. INTRODUCTION

Neural oscillations are known to play a significant role in mediating the cognitive and motor functions of the brain [1]–[4]. The advent of high-density electrophysiology recordings [5]–[7] from multiple locations in the brain at the neuronal scale has opened a unique window of opportunity to probe the brain for inferring the mechanisms of such oscillations. In order to exploit such experimental data, analysis techniques tailored for neuronal spiking data are required [8].

Existing techniques for spectral analysis of neuronal data use point process theory [8]–[10] and time-domain smoothing procedures [11]–[13] for recovering the latent processes that drive the spiking activity. Due to the binary nature of the spiking observations, the resulting power spectral density (PSD) estimates undergo distortion in the spectral domain. Alternative methods that directly estimate the PSD from spiking data have recently been proposed in [14], [15].

These existing methods consider univariate spiking observations and assume the latent process to be stationary during the observation period. However, it is known that the brain oscillations that drive neuronal spiking may be non-stationary and exhibit rapid second-order changes corresponding to the brain state or behavioral dynamics [12], [16].

Non-stationary time series analysis has been well studied for multivariate continuous signals, and various methods have been proposed to quantify the energy-frequency-time distribution [17]–[26]. One notable example is the evolutionary spectral characterization [22], [23], which defines the evolutionary power spectral density matrix of a non-stationary multivariate process in order to quantify the local energy distributions at each instant of time. This characterization has a physical interpretation similar to that of classical Fourier analysis, and reduces to the classical power spectra when the processes are stationary [22]. A unified approach to inferring the evolutionary spectra of latent processes driven by spiking observations is lacking, but highly desired given the emerging demands of modern neuronal data analysis.

In this work, we close this gap by developing a framework to estimate the evolutionary spectral density (ESD) matrix of a multivariate non-stationary latent process, given multiple realizations of spiking observations. We model the spiking observations as point processes with logistic links to the latent continuous processes. We then pose the estimation problem within a multitapering framework. Multitapering is a widely-used PSD estimation technique with a desirable bias-variance trade-off performance [27], due to the usage of discrete prolate spheroidal sequences (dpss) [28] as data tapers, which are known for their minimal spectral leakage [29]–[31].

We employ a state-space model to characterize the dynamics of the evolutionary spectra, with the underlying states pertaining to the eigen-spectra [25]. We derive an Expectation-Maximization (EM) algorithm for efficiently computing the maximum a posteriori (MAP) estimate of the multivariate latent process given the spiking observations, which we then use to construct the ESD matrix estimates from the smoothed states. We establish theoretical bounds on the performance of the proposed estimator, given the favorable asymptotic properties of the classical multitaper framework.

We compare the performance of our proposed method to existing techniques through simulation studies and application to experimentally-recorded neuronal data. We present two case studies using simulated multivariate autoregressive processes, whose non-stationary dynamics are inspired by neuronal oscillations. These studies reveal that the proposed method outperforms two widely used methods for deriving spectral representations from spiking data. Finally, we apply our proposed estimator to multi-unit spike recordings and local field potential (LFP) recordings from a human subject undergoing general anesthesia [12], [14]. Our proposed method corroborates existing hypotheses on the relation between the LFP and spiking dynamics, by providing a high-resolution characterization of the underlying spectrotemporal couplings.

The rest of the paper is organized as follows: We present our

problem formulation in Section II, followed by the proposed estimation framework in Section III. We provide our theoretical results on the bias-variance performance of the proposed estimator in Section IV. Our simulation studies are presented in Section V, followed by application to experimentally-recorded data in Section VI. Finally, we close the paper with our concluding remarks in Section VII.

II. PROBLEM FORMULATION

Let N(t) and H(t) denote the point process representing the number of spikes and the spiking history of a neuron in [0, t), respectively, where t ∈ [0, T] and T denotes the observation duration. The Conditional Intensity Function (CIF) [32] of a point process N(t) is defined as:

  λ(t | H_t) := lim_{Δ→0} P[N(t+Δ) − N(t) = 1 | H_t] / Δ.   (1)

To discretize the continuous process, we consider time bins of length Δ, small enough that the probability of having two or more spikes in an interval of length Δ is negligible. Thus, the discretized point process can be modeled by a Bernoulli process with success probability λ_k := λ(kΔ | H_k)Δ, for 1 ≤ k ≤ K, where K := T/Δ is an integer (with no loss of generality). We refer to λ_k as the CIF hereafter, for brevity.

In a similar fashion, we consider spiking observations from an ensemble of J neurons, with CIFs {λ_{k,j}}_{k=1}^{K}, for j = 1, 2, …, J. Suppose that for each neuron, L independent realizations of the spiking activity are observed. The collection of the binary spiking observations is represented as {n_{k,j}^{(l)}}_{l,k,j=1}^{L,K,J}. We model the j-th CIF by a logistic link to a latent random process, x_j = [x_{1,j}, x_{2,j}, …, x_{K,j}]^⊤, which need not be stationary in general. Thus, the matrix X = [x_1, x_2, …, x_J]_{K×J} represents a J-variate random process.

Accordingly, for 1 ≤ j ≤ J, 1 ≤ k ≤ K and 1 ≤ l ≤ L, we have

  n_{k,j}^{(l)} ∼ Bernoulli(λ_{k,j}),   (2)

where λ_{k,j} = logistic(x_{k,j}) = 1/(1 + exp(−x_{k,j})). Our goal is to estimate the time-variant power spectral density of each process x_j, for 1 ≤ j ≤ J, and the time-variant cross spectra between each pair of processes. Following the formulation of Priestley's Evolutionary Spectra [22], each random process x_{k,j}, with mean μ_{k,j}, admits a representation of the form

  x_{k,j} − μ_{k,j} = ∫_{−π}^{π} e^{ikω} A_{k,j}(ω) dZ_j(ω),   (3)

where A_{k,j}(ω) is the time-varying amplitude function and dZ_j(ω) is an orthogonal increment process. To define a discrete-parameter harmonic process, we approximate Z_j(ω) by a jump process over N frequency bins [14], and thereby replace dZ_j(ω_n) with (a_{j,n} + i b_{j,n}), at ω_n = nπ/N, for 1 ≤ n ≤ N − 1. Given that the random process is real, using the symmetry Z_j(ω) = Z_j(−ω), we express the discretization of Eq. (3) as

  x_{k,j} = μ_{k,j} + (2π/N) Σ_{n=1}^{N−1} ( c_{k,j,n} cos(ω_n k) − d_{k,j,n} sin(ω_n k) ),

where c_{k,j,n} = A_{k,j}(ω_n) a_{j,n} and d_{k,j,n} = A_{k,j}(ω_n) b_{j,n} are real-valued random variables. Further, the evolutionary spectrum [22] at time k is defined as

  ψ_{k,j}(ω_n) dω_n = |A_{k,j}(ω_n)|² E[|dZ_j(ω_n)|²].

Hence, according to our model, the ESD function can be approximated by

  ψ_{k,j}(ω_n) = (π/N) E[(c_{k,j,n} + i d_{k,j,n})(c_{k,j,n} − i d_{k,j,n})].

The spectral density matrix of a multivariate stationary random process is defined as Ψ(ω) = (1/2π) Σ_{ℓ=−∞}^{∞} e^{−iℓω} Γ(ℓ), where Γ(·) is the covariance matrix of the process [33]. Further, considering a vector-valued orthogonal increment process Z(ω) = [Z_1(ω), Z_2(ω), …, Z_J(ω)]^⊤, the spectral density matrix is characterized by Ψ(ω) dω = E[dZ(ω) dZ(ω)^H]. We extend this to the evolutionary spectra, and formulate the ESD matrix at time k and frequency ω_n for our model as

  Ψ_k(ω_n) = (π/N) E[(c_{k,n} + i d_{k,n})(c_{k,n} − i d_{k,n})^⊤],

where c_{k,n} = [c_{k,1,n}, c_{k,2,n}, …, c_{k,J,n}]^⊤ and d_{k,n} = [d_{k,1,n}, d_{k,2,n}, …, d_{k,J,n}]^⊤.

Further, we assume the processes to be quasi-stationary [25], i.e., the J-variate random process is assumed to be jointly stationary in windows of length W. The total data duration K is divided into M non-overlapping segments of length W, with K = MW. Thus, the vector process [x_{k,1}, x_{k,2}, …, x_{k,J}] is assumed to be jointly stationary for (m−1)W + 1 ≤ k ≤ mW, 1 ≤ m ≤ M. Simplifying the previous model under this quasi-stationarity assumption, for (m−1)W + 1 ≤ k ≤ mW, 1 ≤ m ≤ M, we get

  x_{k,j} = μ_{m,j} + (2π/N) Σ_{n=1}^{N−1} ( p_{m,j,n} cos(ω_n k) − q_{m,j,n} sin(ω_n k) ),

where p_{m,j,n} and q_{m,j,n} are real-valued random variables. Defining X_{m,j} = [x_{(m−1)W+1,j}, x_{(m−1)W+2,j}, …, x_{mW,j}]^⊤, v_{m,j} = [(N/2π) μ_{m,j}, p_{m,j,1}, q_{m,j,1}, …, p_{m,j,N−1}, q_{m,j,N−1}]^⊤, X̄_m = [X_{m,1}, X_{m,2}, …, X_{m,J}]_{W×J}, V_m = [v_{m,1}, v_{m,2}, …, v_{m,J}]_{(2N−1)×J} and A_m as in Eq. (4), we get X_{m,j} = A_m v_{m,j} and X̄_m = A_m V_m.

Further, we define p_{m,n} = [p_{m,1,n}, p_{m,2,n}, …, p_{m,J,n}]^⊤, q_{m,n} = [q_{m,1,n}, q_{m,2,n}, …, q_{m,J,n}]^⊤, w_{m,0} = [μ_{m,1}, μ_{m,2}, …, μ_{m,J}]^⊤, w_{m,n} = [p_{m,n}^⊤, q_{m,n}^⊤]^⊤_{2J×1} for 1 ≤ n ≤ N − 1, and w_m = [w_{m,0}^⊤, w_{m,1}^⊤, …, w_{m,N−1}^⊤]^⊤_{J(2N−1)×1}. Note that w_m is the vectorization of the matrix V_m and both are equivalent representations for the discrete-parameter harmonic process that drives the spiking observations in the time window m.

Accordingly, the ESD matrix,

  Ψ_m(ω_n) = (π/N) E[(p_{m,n} + i q_{m,n})(p_{m,n} − i q_{m,n})^⊤],   (5)

can be determined if E[w_{m,n} w_{m,n}^⊤] is estimated for 1 ≤ n ≤ N − 1. Thus, the task of determining the evolutionary power spectra of the J-variate random process reduces to computing E[w_m w_m^⊤], for m = 1, 2, …, M, given the ensemble of spiking data D = {n_{k,j}^{(l)}}_{l,k,j=1}^{L,K,J}.
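To make the observation model concrete, the following sketch simulates an ensemble of L Bernoulli spike trains through the logistic link of Eq. (2) and forms the ensemble mean. The dimensions and the sinusoidal stand-in for the latent process are hypothetical choices for illustration, not the harmonic model above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: K time bins, J latent processes, L realizations.
K, J, L = 1000, 2, 20

# Stand-in latent J-variate process x_{k,j}: a negative offset plus slow
# oscillations, chosen only to produce low, realistic spiking probabilities.
k = np.arange(K)
x = -4.0 + np.stack([np.sin(2 * np.pi * 0.01 * k),
                     0.5 * np.cos(2 * np.pi * 0.02 * k)], axis=1)

# Logistic link of Eq. (2): lambda_{k,j} = 1 / (1 + exp(-x_{k,j})).
lam = 1.0 / (1.0 + np.exp(-x))

# L independent Bernoulli realizations n^{(l)}_{k,j} per neuron.
n = rng.binomial(1, lam, size=(L, K, J))

# Ensemble mean: the sufficient statistic that the estimation framework
# operates on (see Eq. (8) in Section III).
n_bar = n.mean(axis=0)
print(n_bar.shape)  # (1000, 2)
```

Note that with the offset of −4 the spiking probabilities stay well below one, matching the sparse-firing regime the paper targets.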

           ⎡ 1   cos(((m−1)W+1)π/N)   −sin(((m−1)W+1)π/N)   …   cos((N−1)((m−1)W+1)π/N)   −sin((N−1)((m−1)W+1)π/N) ⎤
  A_m := (2π/N) ⎢ 1   cos(((m−1)W+2)π/N)   −sin(((m−1)W+2)π/N)   …   cos((N−1)((m−1)W+2)π/N)   −sin((N−1)((m−1)W+2)π/N) ⎥   (4)
           ⎢ ⋮        ⋮                    ⋮                  ⋱        ⋮                        ⋮            ⎥
           ⎣ 1   cos(mWπ/N)           −sin(mWπ/N)           …   cos((N−1)mWπ/N)           −sin((N−1)mWπ/N)  ⎦

III. MULTITAPER ESTIMATE OF THE SPECTRAL DENSITY MATRIX

A. The Multitaper Framework

Multitapering is a technique widely used in power spectral estimation of stationary random processes, to overcome the bias and variance limitations of conventional spectral estimation methods [27]. The multitaper spectral estimate of a process x_1, x_2, …, x_K is defined as

  S^{mt}(ω) = (1/P) Σ_{p=1}^{P} |y^{(p)}(ω)|², with y^{(p)}(ω) = Σ_{k=1}^{K} ν_k^{(p)} x_k e^{−iωk},   (6)

where ν_k^{(p)} is the k-th time sample of the p-th discrete prolate spheroidal sequence (dpss) [28], for 1 ≤ p ≤ P. The dpss tapers are a set of orthogonal tapers that maximally concentrate their spectral power within a design bandwidth of [−ξf_s/K, ξf_s/K], for some positive constant ξ. While originally designed for univariate time series, multitapering has likewise been extended to multivariate time series [34]. Given a second-order jointly stationary J-dimensional random process x_1, x_2, …, x_K, where x_k = [x_{k,1}, x_{k,2}, …, x_{k,J}], the multitaper cross-spectral estimate between the r-th process and the t-th process (r, t ∈ {1, 2, …, J}) has been defined as

  S_{r,t}^{mt}(ω) = (1/P) Σ_{p=1}^{P} y_r^{(p)}(ω) (y_t^{(p)}(ω))^*, where y_r^{(p)}(ω) = Σ_{k=1}^{K} ν_k^{(p)} x_{k,r} e^{−iωk}.   (7)

The spectral estimate y_r^{(p)}(ω)(y_t^{(p)}(ω))^* corresponding to the p-th taper is referred to as the p-th eigen-spectral estimate.

Given that our observed data are binary random variables parameterized by a function of X (and not the actual process X), determining the multitaper spectral estimate is not straightforward. However, note that the data log-likelihood,

  log f(D | X) = Σ_{j=1}^{J} Σ_{k=1}^{K} Σ_{l=1}^{L} { n_{k,j}^{(l)} x_{k,j} − log(1 + exp(x_{k,j})) }
              = L Σ_{j=1}^{J} Σ_{k=1}^{K} { n̄_{k,j} x_{k,j} − log(1 + exp(x_{k,j})) },   (8)

depends on the observations only through the ensemble mean, n̄_{k,j} := (1/L) Σ_{l=1}^{L} n_{k,j}^{(l)}, resulting in n̄_{k,j} being a sufficient statistic. Thus, if one could characterize the effect of tapering the time series on the ensemble mean, it would be possible to form the multitaper spectral estimate of the unobserved time series.

Given that n_{k,j}^{(l)} ∼ Bernoulli(λ_{k,j}), with λ_{k,j} = logistic(x_{k,j}) := 1/(1 + exp(−x_{k,j})), we have x_{k,j} = logit(λ_{k,j}) := log(λ_{k,j}/(1 − λ_{k,j})) and

  n̄_{k,j} = (1/L) Σ_{l=1}^{L} n_{k,j}^{(l)} → logistic(x_{k,j}) a.s.,

by the strong law of large numbers, for n̄_{k,j} ≠ 0, 1. Further, consider the log-likelihood log f({n_{k,j}^{(l)}}_{l=1}^{L} | x_{k,j}) = L { n̄_{k,j} x_{k,j} − log(1 + exp(x_{k,j})) } in Eq. (8). We notice that

  ∂ log f({n_{k,j}^{(l)}}_{l=1}^{L} | x_{k,j}) / ∂x_{k,j} |_{x_{k,j} = logit(n̄_{k,j})} = 0,

which implies that logit(n̄_{k,j}) is the maximum likelihood estimator of x_{k,j}. Thus, we take logit(n̄_{k,j}) as an estimator of x_{k,j}. By extending this argument to the tapered processes {ν_{k mod W + 1}^{(p)} x_{k,j}}_{k,j,p=1}^{K,J,P}, we can similarly take

  (n̄_{k,j})^{(p)} = logistic( ν_{k mod W + 1}^{(p)} logit(n̄_{k,j}) ),   (9)

to be the estimator of the ensemble mean that would have been generated if the random process had been filtered by the p-th data taper, for 1 ≤ k ≤ K. Note that when n̄_{k,j} = 0 or n̄_{k,j} = 1, the function logit(n̄_{k,j}) is not defined. In such cases, we directly estimate (n̄_{k,j})^{(p)} by n̄_{k,j}.

We thus need to compute the evolutionary spectra corresponding to each of the P tapers, and finally derive the multitaper estimate by averaging the P eigen-spectral estimates. In the next subsection, we consider estimating the power spectral density matrix of the untapered process first, and then extend it to the case of P tapers by replacing the ensemble average of spiking data {n̄_{k,j}} with the tapered ensemble mean {(n̄_{k,j})^{(p)}}, for p = 1, 2, …, P.

B. MAP Estimation of the Parameters via the EM Algorithm

In order to capture the evolution of the spectra, we impose a stochastic continuity constraint on the random variables w_m, following the discrete state-space model,

  w_m = Φ w_{m−1} + η_m,   (10)

where the state transition matrix Φ is a constant matrix and η_m ∼ N(0, Q_m). We consider the special case where Φ takes the form αI, for simplicity. Nevertheless, Φ can also be estimated from data within the same Expectation-Maximization framework that follows next [35].

The parameters to be estimated are θ := {Q_m, 1 ≤ m ≤ M}, and the observations are the binary spiking data D = {n_{k,j}^{(l)}}_{k,j,l=1}^{K,J,L}. Considering the variable V = {V_m, 1 ≤ m ≤ M} to be the hidden data, we aim at recovering θ via

a MAP estimation problem. First, the log-likelihood of the complete data has the form

  log f(D, V | θ) = log f(D | V, θ) + log f_{V|θ}(V | θ)
    = Σ_{j=1}^{J} Σ_{m=1}^{M} Σ_{w=1}^{W} Σ_{l=1}^{L} { n_{(m−1)W+w,j}^{(l)} (A_m V_m)_{w,j} − log(1 + exp((A_m V_m)_{w,j})) }
      − (1/2) Σ_{m=1}^{M} { log|Q_m| + (w_m − Φw_{m−1})^⊤ Q_m^{−1} (w_m − Φw_{m−1}) } + C_1,   (11)

where w_0 = 0 and C_1 represents terms that are not functions of D, V or θ. Next, considering that neuronal spiking activity is typically sparse in time and the number of observed realizations (L) is limited, an appropriate prior distribution on the parameters helps in reducing the estimation variance. We consider a diagonal covariance matrix Q_m, whose i-th diagonal entry is denoted by Q_{m,i}. Further, we assume Q_m to be independent and identically distributed for 1 ≤ m ≤ M, with a distribution of the form,

  f(Q_m) ∝ exp( −ρ Σ_{j=1}^{2J} Σ_{n=1}^{N−2} ( log(Q_{m,J(2n−1)+j}) − log(Q_{m,J(2n+1)+j}) )² ).

This prior distribution encourages continuity in the log scale of the spectral estimates corresponding to adjacent frequency bins of each CIF, and can be controlled by appropriately selecting the hyper-parameter ρ. Accordingly, considering (D, V) to be the set of complete data, the joint distribution of the MAP estimation problem is formulated as,

  log f(D, V, θ) = log f_{V|θ}(V | θ) + log f(θ) + C_2
    = −(1/2) Σ_{m=1}^{M} { log|Q_m| + (w_m − Φw_{m−1})^⊤ Q_m^{−1} (w_m − Φw_{m−1}) }
      − ρ Σ_{m=1}^{M} Σ_{j=1}^{2J} Σ_{n=1}^{N−2} ( log(Q_{m,J(2n−1)+j}) − log(Q_{m,J(2n+1)+j}) )² + C_3,

where C_2 and C_3 represent terms that do not depend on θ. We next construct the EM algorithm for solving the MAP estimation problem:

1) E Step: Suppose that the current estimate of θ at the r-th iteration is given by θ̂^{(r)}. Then, the Q-function

  Q^{(r)} := E[ log f(D, V, θ) | D, θ̂^{(r)} ]   (12)

can be evaluated if the conditional expectations

  w_{m|M} := E[w_m | D, θ̂^{(r)}],
  Σ_{m|M} := E[(w_m − w_{m|M})(w_m − w_{m|M})^⊤ | D, θ̂^{(r)}],
  Σ_{m,m−1|M} := E[(w_m − w_{m|M})(w_{m−1} − w_{m−1|M})^⊤ | D, θ̂^{(r)}]

are known. To compute the conditional expectations w_{m|M}, Σ_{m|M} and Σ_{m,m−1|M}, we utilize the Fixed Interval Smoothing [36] and the Covariance Smoothing [37] algorithms. However, considering that the forward model is not Gaussian in our formulation, we cannot directly use these algorithms to estimate w_{m|m} and Σ_{m|m}.

Hence, we employ an alternative method to estimate these conditional moments, utilizing the distribution f({V_s}_{s=1}^{m} | D_1^{m}, θ̂^{(r)}), where D_1^{m} = {n_{k,j}^{(l)}}_{k,j,l=1}^{mW,J,L}. From Bayes' theorem we see that this is proportional to the product of the two distributions, f(D_1^{m} | {V_s}_{s=1}^{m}, θ̂^{(r)}) and f({V_s}_{s=1}^{m} | θ̂^{(r)}), which are Binomial and Gaussian distributed, respectively. Accordingly, we see that the distribution {V_s}_{s=1}^{m} | D_1^{m}, θ̂^{(r)} is unimodal, and hence we approximate it by a multivariate Gaussian, and derive the mean of the distribution, w_{m|m}^{(r)}, by the mode of log f({V_s}_{s=1}^{m} | D_1^{m}, θ̂^{(r)}):

  w_{m|m}^{(r)} = argmax_{w_m} log f({V_s}_{s=1}^{m} | D_1^{m}, θ̂^{(r)})
    = argmax_{w_m} { Σ_{j=1}^{J} Σ_{s=1}^{m} Σ_{w=1}^{W} L { n̄_{(s−1)W+w,j} (A_s V_s)_{w,j} − log(1 + exp((A_s V_s)_{w,j})) }
      − (1/2) Σ_{s=1}^{m} { log|Q_s^{(r)}| + (w_s − Φw_{s−1})^⊤ (Q_s^{(r)})^{−1} (w_s − Φw_{s−1}) } },   (13)

and its covariance, Σ_{m|m}^{(r)}, by the negative of the inverse Hessian of log f({V_s}_{s=1}^{m} | D_1^{m}, θ̂^{(r)}). Observing that the objective function is a combination of convex functions and is differentiable, we perform the above optimization for w_{m|m}^{(r)} using the Newton-Raphson method. Further, we concurrently estimate Σ_{m|m}^{(r)} using the Hessian matrix of the objective.

2) M Step: Noticing that the Q-function Q^{(r)} is separable in terms of the Q_m's, we can update each Q_m independently, for 1 ≤ m ≤ M:

  Q_m^{(r+1)} = argmax_{Q_m} Q^{(r)}.   (14)

The partial derivative of the log-likelihood function with respect to each diagonal element of Q_m takes the form,

  ∂Q^{(r)}/∂Q_{m,i} = −(1/2) ( 1/Q_{m,i} − P_{m,i}/Q_{m,i}² ) − (2ρ/Q_{m,i}) log( Q_{m,i}² / (Q_{m,i−2J} Q_{m,i+2J}) ),

where P_{m,i} is the i-th diagonal element of the matrix

  P_m := E[(w_m − Φw_{m−1})(w_m − Φw_{m−1})^⊤ | D, θ̂^{(r)}]
    = Σ_{m|M} + w_{m|M} w_{m|M}^⊤ + Φ(Σ_{m−1|M} + w_{m−1|M} w_{m−1|M}^⊤)Φ^⊤
      − (Σ_{m,m−1|M} + w_{m|M} w_{m−1|M}^⊤)Φ^⊤ − Φ(Σ_{m,m−1|M}^⊤ + w_{m−1|M} w_{m|M}^⊤).   (15)

Accordingly, we employ the multivariate Newton-Raphson algorithm to perform this maximization and derive the updates for Q_m, 1 ≤ m ≤ M. Following convergence, we use the final estimates of w_{m|M} and Σ_{m|M} derived through the above EM procedure to compute

  R_m := E[w_m w_m^⊤ | D, θ̂] = Σ_{m|M} + w_{m|M} w_{m|M}^⊤,   (16)

for 1 ≤ m ≤ M. Then, for 1 ≤ n ≤ N − 1, the ESD matrix according to Eq. (5) is estimated as

  Ψ̂_m(ω_n) = (π/N) { R_m^n(1:J,1:J) + R_m^n(J+1:2J,J+1:2J) + i ( R_m^n(J+1:2J,1:J) − R_m^n(1:J,J+1:2J) ) },   (17)

where R_m^n := E[w_{m,n} w_{m,n}^⊤ | D, θ̂] is a submatrix of R_m defined as:

  R_m^n := (R_m)_{(J(2n−1)+1 : J(2n+1), J(2n−1)+1 : J(2n+1))}.

An implementation of this estimation procedure is outlined in Algorithm 1. As mentioned in Section III-A, the same EM procedure can be carried out for {(n̄_{k,j})^{(p)}}, for p = 1, 2, …, P, in order to estimate the multivariate eigen-spectra. Finally, the multitaper spectral density matrix is formed by averaging the eigen-spectral estimates as outlined in Algorithm 2. We refer to our proposed algorithm as the Point Process Multitaper Evolutionary Spectral Density (PPMT-ESD) estimator.

Algorithm 1 Estimation of the Evolutionary Spectral Density Matrix via the EM Algorithm
Inputs: Ensemble averages of the spiking observations {n̄_{k,j}}_{k,j=1}^{K,J}, hyper-parameters ρ and α, maximum number of EM iterations R_max.
Outputs: Estimates of the ESD matrices Ψ̂_m(ω_n) for 1 ≤ m ≤ M, 1 ≤ n ≤ N − 1.
Initialization: Initial choice of Q_m^{(0)}, w_{0|0} = 0, Σ_{0|0} = 0, r = 1.
1: for r ≤ R_max do
2:   Forward filter, for m = 1, 2, …, M:
       w_{m|m−1} = Φ w_{m−1|m−1}
       Σ_{m|m−1} = Φ Σ_{m−1|m−1} Φ^⊤ + Q_m^{(r)}
     Compute w_{m|m} and Σ_{m|m} using Newton's method as described in Eq. (13).
3:   Backward smoother, for m = M−1, M−2, …, 1:
       B_m = Σ_{m|m} Φ^⊤ Σ_{m+1|m}^{−1}
       w_{m|M} = w_{m|m} + B_m (w_{m+1|M} − w_{m+1|m})
       Σ_{m|M} = Σ_{m|m} + B_m (Σ_{m+1|M} − Σ_{m+1|m}) B_m^⊤
4:   Covariance smoother, for m = M−1, M−2, …, 1:
       Σ_{m,m−1|M} = Σ_{m|M} B_{m−1}^⊤
5:   Update the Q_m's independently, for m = 1, 2, …, M, using the multivariate Newton-Raphson method to solve Q_m^{(r+1)} = argmax_{Q_m} Q^{(r)}.
6:   Set r ← r + 1
7: end for
8: for 1 ≤ m ≤ M do
9:   R_m = Σ_{m|M} + w_{m|M} w_{m|M}^⊤
10:  for 1 ≤ n ≤ N−1 do
11:    R_m^n = (R_m)_{(J(2n−1)+1 : J(2n+1), J(2n−1)+1 : J(2n+1))}
12:    Ψ̂_m(ω_n) = (π/N) { R_m^n(1:J,1:J) + R_m^n(J+1:2J,J+1:2J) + i ( R_m^n(J+1:2J,1:J) − R_m^n(1:J,J+1:2J) ) }
13:  end for
14: end for
15: Return Ψ̂_m(ω_n) for 1 ≤ m ≤ M, 1 ≤ n ≤ N − 1

Algorithm 2 Estimation of the Multitaper Evolutionary Spectral Density Matrix
Inputs: Collection of ensemble averages of the observations {n̄_{k,j}}_{k,j=1}^{K,J}, the set of P dpss tapers of length W, {ν_w^{(p)}}_{w,p=1}^{W,P}.
Outputs: The multitaper estimates of the ESD matrices Ψ̂_m^{mt}(ω_n) for 1 ≤ m ≤ M, 1 ≤ n ≤ N − 1.
1: for p = 1, 2, …, P do
2:   for 1 ≤ w ≤ W, 1 ≤ m ≤ M, 1 ≤ j ≤ J do
3:     k = (m−1)W + w
4:     if n̄_{k,j} ≠ 0 and n̄_{k,j} ≠ 1 then
5:       (n̄_{k,j})^{(p)} = logistic( ν_w^{(p)} logit(n̄_{k,j}) )
6:     else
7:       (n̄_{k,j})^{(p)} = n̄_{k,j}
8:     end if
9:   end for
10:  Compute the p-th tapered spectral density matrix estimate, Ψ̂_m^{(p)}(ω_n) for 1 ≤ m ≤ M, 1 ≤ n ≤ N − 1, using Algorithm 1, with {(n̄_{k,j})^{(p)}}_{k,j=1}^{K,J} as the input collection of ensemble averages.
11: end for
12: for 1 ≤ m ≤ M, 1 ≤ n ≤ N − 1 do
13:   Ψ̂_m^{mt}(ω_n) = (1/P) Σ_{p=1}^{P} Ψ̂_m^{(p)}(ω_n)
14: end for
15: return Ψ̂_m^{mt}(ω_n) for 1 ≤ m ≤ M, 1 ≤ n ≤ N − 1

IV. THEORETICAL ANALYSIS

In this section we derive bounds on the bias and variance of the proposed PPMT-ESD estimator. We first briefly review the corresponding bounds for the direct multitaper estimator (Eq. (6)) of a time-series. As proven in [38], the bias and variance of the multitaper estimate of a stationary process x_1, x_2, …, x_K with a uniformly continuous PSD S(ω) are bounded as follows:

  |bias(S^{mt}(ω))| ≤ (sup_ω S(ω)) { 1 − (1/P) Σ_{p=1}^{P} c_p } + o(1),
  Var(S^{mt}(ω)) = (1 + β(ω)) (1/P²) Σ_{p=1}^{P} c_p² S²(ω) + O( (1/P) Σ_{p=1}^{P} (1 − c_p) ) + o(1),   (18)

as K → ∞. Here c_p is the eigenvalue corresponding to the taper ν^{(p)}, and β(ω) = 0 if ω/2π ≠ 0, 1/2 (mod 1) and is equal to 1 otherwise. It is evident that the multitaper estimator S^{mt}(ω) is asymptotically unbiased.

We state our main theorem for a univariate second-order stationary process x_1, x_2, …, x_K, corresponding to the special case of J = 1, for clarity of exposition. We later on provide extensions to the multivariate and quasi-stationary cases. In order to apply the same treatment to the case of the multitaper estimate of the evolutionary spectra from spiking observations, we need to make two extra technical assumptions.

Assumption (1). From Eq. (8), the data likelihood for the univariate case can be expressed as

  f(D) := ∫ exp( L Σ_{k=1}^{K} { n̄_k x_k − log(1 + exp(x_k)) } ) Π_{k=1}^{K} dx_k,

where n̄_k := (1/L) Σ_{l=1}^{L} n_k^{(l)}. Given the nonlinear functional form of the integrand, we consider the saddle point approximation [39] of the integral, and take logit(n̄_k) as an estimator of

⊤ xk. Under this approximation, the multitaper spectral estimate xk = [xk,1, xk,2, , xk,J ] . Suppose that the observa- is given by ··· (l) K,J,L (l) tions are binary spiking data, nk,j k,j,l=1 with nk,j P 1{ L } (l) ∼ mt 1 Bernoulli(λ ) and n := n , where λ = S (ω)= y(p)(ω) 2, where, (19) k,j k,j L l=1 k,j k,j P | | logistic (xk,j ). Then, under Assumptions (1) and (2), the bias p=1 X and variance of the multitaper cross-spectralP estimate between b th th K b the r and t processes, given by (p) (p) −iωk y (ω) := νk logit(nk) e . (20) k=1 P X mt 1 (p) S (ω) = y(p)(ω)(y (ω))∗, with Assumptionb (2). We assume that xk B for all k = r,t P r t | | ≤ p=1 1, 2, ,K, for some fixed upper bound B. In defining the X ··· b K bias and variance, we condition the expectations on the event (p) (p)b b −iωk y (ω) = ν logit(nk,ℓ) e , A := nk nk = 0, nk = 1, 1 k K , which is highly ℓ k { | 6 6 ≤ ≤ } k=1 probable due to the absolute boundedness of xk, for large X L (See Appendix A for details). We denote the conditional are boundedb as follows: bias and variance by biasA( ) and VarA( ), respectively. mt ′ log L mt · · biasA(S (ω)) g K + bias(S (ω)) , Note that for the multivariate case, we naturally extend this | r,t | ≤ 1 √L | r,t | assumption to xk,j B for all k, j, and define the set A as | |≤ b 2 A := nk,j nk,j =0, nk,j =1, 1 k K, 1 j J . mt ′ log L mt { | 6 6 ≤ ≤ ≤ ≤ } VarA(Sr,t(ω)) g2K + Var(Sr,t(ω)) , Theorem 1 . ≤ ( √L ) (Univariate Case) Under the Assumptions (1) q and (2) and for sufficiently large L, the conditional bias and ′b ′ mt where g1 and g2 are bounded constants and depend only on variance of S (ω) in Eq. (19) is bounded with respect to B, K, and L. those of the direct multitaper estimate Smt(ω) given in Eq. (18) as: b Proof. The proof is given in Appendix B. mt mt log L mt It is not difficult to verify that bias(Sr,t(ω)) and biasA(S (ω)) g1K + bias(S (ω)) , (21) mt | |≤ √L | | Var(Sr,t(ω)) can be upper bounded in a similar fashion to 2 Eq. 
(18), with the true cross-spectra Sr,t(ω) replacing S(ω). b log L Before extending the result of Corollary 1 to the quasi- mt mt (22) VarA(S (ω)) g2K + Var(S (ω)) stationary case, we need an additional assumption: ≤ ( √L ) p Assumption (3). Given that Corollary 1 holds for large L, where band are bounded constants depending on , g1 g2 B K in this regime we relax the prior distribution on Qm to be and L. flat, i.e., f(Q ) 1. Recall that the rationale for using a m ∝ Proof of Theorem 1. The proof of Theorem 1 is given in prior on Qm in Section III-B was to reduce the variance of Appendix A. the estimates in the low spiking regime, i.e., small L. Finally, combining Corollary 1 and the treatment of [25], Remark. In words, Theorem 1 states that the cost of the we have the following corollary on the bias and variance of indirect access to the process x K through spiking ob- the PPMT-ESD estimator: { k}k=1 servations with L trials appears as excess terms in both the Corollary 2 (Quasi-stationary Multivariate Case). Suppose L bias and variance, which would go to zero as 2 2 . that the J-variate process in Corollary 1 is quasi-stationary K log L → ∞ Hence, for large enough number of realizations L, one can (jointly stationary within consecutive windows of length W ). th th expect a performance close to the direct multitaper estimate Let Ψm,r,t(ωn) be the cross-spectra between the r and t K mt of xk k=1. The result of Theorem 1 can be extended processes over window , , and { } m 1 m K/W Ψm,r,t(ωn) to the case of unconditional expectations, if the inverse be the corresponding multitaper≤ estimate≤ obtained from spik- mapping max(B, min( B, logit(n ))) is adopted instead of − k ing observations. Then, under Assumptions (1)–(3),b the bias logit(nk). Then, the bounds follow directly even without and variance of the proposed PPMT-ESD estimator at window conditioning on A. 
While Assumption (2) on the boundedness m can be bounded as, of the time-series is natural in practical scenarios, as long as mt ′′ log L B ǫ log L, for some ǫ< 1/2, the excess bias and variance biasA(Ψm,r,t(ωn)) g1 (ωn)W + Ψm,r,t(ωn) 1 κm(ωn) | |≤ √L | || − | terms≤ will be bounded by (log2 L/L1/2−ǫ), which will O 1 P converge to zero asymptotically. Thus, the bias and variance + κ (ωb ) sup Ψ (ω) 1 c + o(1) , mt mt m n {| m,r,t |} − P p of the estimator S (ω) will converge to those of S (ω) even ( ω p=1 ! ) X under the milder condition, xk ǫ log L for 1 k K, 2 | | ≤ ≤ ≤ mt and some ǫ< 1/b2. ′′ log L 2 VarA(Ψm,r,t(ωn)) g2 (ωn)W + sup κm(ω) Ψm,r,t(ω) , The following corollary extends Theorem 1 to the multi- ≤ ( √L rP ω { | |} ) ′′ ′′ variate case: where gb1 (ω), g2 (ω) are bounded functions of B, L, W and κ ( ) is a function of ω, given in Appendix B. Corollary 1 (Stationary Multivariate Case). Consider a sec- m · ond order jointly stationary J-variate process x K , where, Proof. The proof is given in Appendix B. { k}k=1 7

V. SIMULATION STUDIES

First, we apply the PPMT-ESD estimator to simulated data and compare its performance to some of the existing methods. We consider two cases based on multivariate autoregressive processes:

Case 1: Estimating the spectral density matrix of a latent trivariate random process, given binary spiking observations.

Case 2: Estimating the spectral density matrix of two processes, where one is directly observable and the other is latent and observed via binary spiking.

As for comparison, we consider two existing methods for extracting spectral representations of spiking data:

State-Space ESD Estimator: This estimator closely follows the approach in [11]. Recall that the observations are n_k^{(l)} ~ Bernoulli(lambda_k), where lambda_k = logistic(x_k). Using the same notation as in [11], we model the latent process x_k as a first-order autoregressive model, x_k = x_{k-1} + eps_k, where eps_k ~ i.i.d. N(0, sigma_eps^2) for 1 <= k <= K. Following the EM algorithm developed in [11], the MAP estimate of x_k given the observed data, x_{k|K}, is obtained. Then, the direct multitaper PSD of the estimated process x_{k|K} is taken as the PSD estimate of x_k. For the multivariate non-stationary case, for each process x_{k,j}, 1 <= j <= J, we assume joint stationarity within non-overlapping consecutive windows of length W. The MAP estimate of each process x_{m,j} = [x_{(m-1)W+1,j}, x_{(m-1)W+2,j}, ..., x_{mW,j}]^T, for 1 <= m <= M and 1 <= j <= J, is obtained, and an estimate of the ESD matrix is derived using the corresponding estimators. We refer to this estimator as the State-Space ESD (SS-ESD) estimator.

Peristimulus Time Histogram ESD Estimator: This estimator is derived by directly considering the ensemble mean of the spiking observations, nbar_{k,j}, referred to as the peristimulus time histogram (PSTH), as an estimate of the random signal x_{k,j}, for 1 <= k <= K and 1 <= j <= J [13]. With a similar joint stationarity assumption in windows of length W, the non-overlapping sliding window multitaper spectral estimate of the PSTH forms the PSTH-ESD estimator.

In order to benchmark our comparison, we consider the theoretical spectra of the AR processes derived using closed-form expressions (True ESD), as well as the non-overlapping sliding window direct multitaper estimates of the processes x_{k,j} that have been used to generate the spikes. We refer to the latter benchmark as the Oracle ESD, as if an oracle could directly observe the latent processes and estimate their ESD.

A. Case 1: Latent trivariate process observed through spiking

In order to simulate the three latent processes (J = 3) with spectral couplings, we consider different linear combinations of a set of AR(6) processes {{y_k^{(i)}}_{k=1}^K, 1 <= i <= 6}, where y_k^{(i)} is tuned around the frequency f_i, with f_1 = 1.15 Hz, f_2 = 0.95 Hz, f_3 = 1.3 Hz, f_4 = 1.5 Hz, f_5 = 0.65 Hz and f_6 = 1.85 Hz. All signals have been sampled at a rate of f_s = 32 Hz, for a total duration of 2000 seconds (K = 64000). The trivariate random process is defined as:

x_{k,1} = y_k^{(1)} cos(2 pi (f_0/f_s) k) + 1.2 y_k^{(4)} + 1.2 y_k^{(5)} u_{k - 0.4K} + sigma_{x_1} nu_{1,k} + x_{1,dc},
x_{k,2} = 0.83 y_k^{(2)} + 0.83 y_{k-6}^{(4)} + 0.83 y_k^{(5)} + 0.83 y_k^{(6)} + sigma_{x_2} nu_{2,k} + x_{2,dc},
x_{k,3} = y_k^{(3)} + y_k^{(5)} + y_{k-10}^{(6)} u_{0.5K-1-k} + y_k^{(6)} u_{k - 0.5K} + sigma_{x_3} nu_{3,k} + x_{3,dc},

where x_{i,dc} are the DC components, u_k is the unit step function, nu_{i,k} is a zero-mean white Gaussian noise with unit variance, and sigma_{x_i} is a scaling standard deviation that sets the SNR of all signals at 20 dB, for i = 1, 2 and 3.

In words, x_{k,1} has been formed by combining three AR components, y_k^{(1)}, y_k^{(4)} and y_k^{(5)}. The component related to y_k^{(1)} has been amplitude modulated by a low-frequency cosine at f_0 = 0.0008 Hz. The signal initially consists of only y_k^{(1)} and y_k^{(4)}, and the third component y_k^{(5)} is added to x_{k,1} after 800 seconds. The process x_{k,2} is composed of four AR components, y_k^{(2)}, y_k^{(4)}, y_k^{(5)} and y_k^{(6)}, with the contribution from y_k^{(4)} having a lag of 6 samples. The third process x_{k,3} consists of the three AR components y_k^{(3)}, y_k^{(5)} and y_k^{(6)}. Although y_k^{(6)} appears with a lag of 10 samples initially, it becomes in phase with the other components after 1000 seconds.

We generate spike trains for L = 20 neurons per each of the latent processes, using the logistic link model of Eq. (2). All DC components in the CIF have been set to -5.5, so that the average spiking rate of the ensemble corresponding to each CIF is ~0.28 spikes/second, consistent with the low spiking rate of experimentally recorded data. A 30-second sample window of the process x_{k,1} and the corresponding raster plot of the spiking observations are shown in Fig. 1.

Fig. 1: Samples of the signal x_{k,1} (top) and the raster plot of the corresponding spikes (bottom) from t = 1600 s to t = 1630 s.

We take the window length for stationarity to be 100 seconds (W = 3200), resulting in a total of M = 20 windows. The parameter alpha has been fixed at 0.4 to have optimal dependency across time windows, and the prior parameter rho has been set to 0.2. Note that these hyper-parameters can be further fine-tuned using cross-validation, in order to achieve a higher precision. Furthermore, we set N = 800 in order to have a densely sampled spectral representation. Given that the spectral range is known to be [0, 2] Hz, we only use the first N_max = 100 frequency bins of the matrix A for the sake of computational complexity. Finally, the time-bandwidth product of the multitaper framework is chosen as 2 (xi = 2), and the first three dpss tapers are used.

Fig. 2 shows the estimation results corresponding to this case. The results have been formatted as a grid, with the columns representing (from left to right) the True ESD, Oracle ESD, PPMT-ESD, SS-ESD, and PSTH-ESD estimates. The rows represent (from top to bottom) (Psi_m)_{1,1}(omega), (Psi_m)_{2,2}(omega), (Psi_m)_{3,3}(omega), (Psi_m)_{1,2}(omega), (Psi_m)_{2,3}(omega) and (Psi_m)_{1,3}(omega), where (Psi_m)_{i,j}(omega) denotes the magnitude of the (i, j)-th block of the ESD matrix.
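A minimal sketch of this data-generation pipeline is given below. This is an illustrative reduction, not the paper's exact setup: a normalized AR(2) resonator stands in for each AR(6) component, the record is shortened, and the CIF gain of 0.5 is a hypothetical choice; only the logistic-link Bernoulli spiking of Eq. (2) and the step/modulation structure of x_{k,1} are taken from the text.

```python
import numpy as np

def ar_resonator(f_hz, fs, K, r=0.98, rng=None):
    """AR(2) process with a spectral peak near f_hz; a simplified stand-in
    for the AR(6) components y^(i) used in the simulations."""
    rng = np.random.default_rng(1) if rng is None else rng
    a1, a2 = 2 * r * np.cos(2 * np.pi * f_hz / fs), -r ** 2
    y, e = np.zeros(K), rng.normal(size=K)
    for k in range(2, K):
        y[k] = a1 * y[k - 1] + a2 * y[k - 2] + e[k]
    return y / np.std(y)                       # unit-variance component

fs, T = 32, 200                    # 32 Hz sampling; duration shortened for the sketch
K = fs * T
y1, y4, y5 = (ar_resonator(f, fs, K) for f in (1.15, 1.50, 0.65))

k = np.arange(K)
u = (k >= 0.4 * K).astype(float)   # unit step: y5 enters after 40% of the record
f0 = 0.0008                        # slow amplitude-modulating cosine (Hz)
x1 = y1 * np.cos(2 * np.pi * f0 / fs * k) + 1.2 * y4 + 1.2 * y5 * u

L, x_dc = 20, -5.5                 # ensemble size and CIF baseline from the text
lam = 1.0 / (1.0 + np.exp(-(0.5 * x1 + x_dc)))   # logistic-link CIF (0.5: illustrative gain)
spikes = np.random.default_rng(2).binomial(1, lam, size=(L, K))
print(spikes.shape, spikes.mean() * fs)          # L x K spike trains; mean rate in spikes/s
```

The resulting ensemble rate sits well under one spike per second, mirroring the low-rate regime described above.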

Fig. 2: ESD estimation results for Case 1. Each panel shows the magnitude of the spectrogram in dB scale. Columns from left to right: (A) True ESD, (B) Oracle ESD, (C) PPMT-ESD, (D) SS-ESD, and (E) PSTH-ESD. Rows from top to bottom: (Psi_m)_{1,1}(omega), (Psi_m)_{2,2}(omega), (Psi_m)_{3,3}(omega), (Psi_m)_{1,2}(omega), (Psi_m)_{2,3}(omega), and (Psi_m)_{1,3}(omega).
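The PSTH-ESD baseline described above admits a compact implementation: average the spikes across the ensemble, then take a non-overlapping-window multitaper spectrogram of that PSTH with dpss tapers. The following sketch uses illustrative window and taper settings, not the paper's exact configuration:

```python
import numpy as np
from scipy.signal.windows import dpss

def psth_esd(spikes, W, NW=2.0, n_tapers=3):
    """PSTH-ESD baseline: multitaper spectrogram of the ensemble-mean
    spike train over non-overlapping windows of length W samples."""
    psth = spikes.mean(axis=0)                     # ensemble mean (PSTH)
    M = len(psth) // W
    tapers = dpss(W, NW, Kmax=n_tapers)            # (n_tapers, W) dpss tapers
    S = np.empty((M, W // 2 + 1))
    for m in range(M):
        seg = psth[m * W:(m + 1) * W]
        seg = seg - seg.mean()                     # remove the window's DC component
        eig = np.abs(np.fft.rfft(tapers * seg, axis=1)) ** 2
        S[m] = eig.mean(axis=0)                    # average the eigenspectra
    return S                                       # M x (W//2 + 1) spectrogram

rng = np.random.default_rng(0)
spikes = rng.binomial(1, 0.05, size=(20, 6400))    # toy L x K binary observations
S = psth_esd(spikes, W=800)
print(S.shape)
```

Because the PSTH is treated as if it were the continuous signal itself, this estimator inherits the bias and background noise discussed next.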

Moreover, in order to have a closer inspection, the magnitudes of the spectra corresponding to the window from t = 700 s to t = 800 s for (Psi_m)_{1,1}(omega), (Psi_m)_{2,2}(omega) and (Psi_m)_{1,2}(omega) are depicted in Fig. 3.

Fig. 3: A snapshot of the spectrograms of Fig. 2 at the 8th window (t = 700 s-800 s). Rows from top to bottom: (Psi_m)_{1,1}(omega), (Psi_m)_{2,2}(omega), and (Psi_m)_{1,2}(omega).

It can be observed that the proposed PPMT-ESD estimator (Fig. 2(C)) results in much less background noise compared to all the others, while precisely capturing the dynamic evolution of the spectra and properly resolving the various frequency components. The latter is more evident from Fig. 3, where the PPMT-ESD (black trace) closely matches the true ESD (blue trace), on par with the Oracle ESD (red trace), while the SS-ESD (green trace) and PSTH-ESD (orange trace) show significant bias and variability. It is worth mentioning that the erroneous spectral peak near the DC component in Fig. 3 (black trace) is due to the estimation error of the DC component in the CIF model, in the low spiking regime of our setting. Nevertheless, this peak appears in the SS-ESD and PSTH-ESD estimates as well, and is mitigated by increasing the spiking rate.

Due to the time-domain smoothing model used in the SS-ESD estimator, the ESD rapidly decays with frequency (Fig. 3, green trace). As such, the SS-ESD estimate (Fig. 2(D)) heavily amplifies non-existing low-frequency components that arise from the intrinsic noise in the spiking observations, while suppressing the higher-frequency components that exist in the true ESD (Fig. 2(A)). Similarly, the PSTH-ESD estimate shown in Fig. 2(E) fails to capture most of the temporal and spectral features of the ESD, since it does not account for the binary nature of the observations.

To quantify the performance of these estimators, we repeated this numerical experiment for a total of 50 trials, generating independent realizations of the AR processes and spiking observations per trial, and computed the Mean Squared Error (MSE) with respect to the True ESD (in dB scale). The average and variance of the MSE values are presented in Table I. All MSE computations have been normalized with respect to the total power of the True ESD (in dB scale). As expected, the Oracle ESD achieves the lowest MSE. Among the three methods that use the spiking observations, our proposed PPMT-ESD estimator achieves the lowest MSE, followed by the SS-ESD estimator with a significant gap. The PSTH-ESD estimator exhibits the poorest performance, in terms of both average MSE and variance, which is also visually evident in Fig. 3. In the spirit of easing reproducibility, a MATLAB implementation that regenerates the data, results and figures outlined in this section has been made publicly available on the open source repository GitHub [40].

Table I: Comparison of Relative MSE Performance

  Estimation method   Average MSE   Variance of MSE
  Oracle ESD          0.0485        8.2612 x 10^-6
  PPMT-ESD            0.1864        7.4279 x 10^-5
  SS-ESD              0.3911        1.3991 x 10^-5
  PSTH-ESD            1.4794        1.1 x 10^-3

B. Case 2: Latent bivariate process with one directly observable component

While Case 1 was a natural choice for performance comparison, Case 2 is of particular interest in the joint analysis of neural spiking and continuous signals, such as the local field potential (LFP). The LFP signal corresponds to the electrical field potential measured at the cortical surface, and reflects the mesoscale dynamics of cortical activity.

In this case, we consider a bivariate random process, whose first component x_{k,1} is observed through spiking activity {n_{k,1}^{(l)}}_{k,l=1}^{K,L}, while its second component x_{k,2} is directly observable in i.i.d. zero-mean Gaussian noise, i.e., x_tilde_{k,2} := x_{k,2} + nu_k, with nu_k ~ N(0, sigma_nu^2). Using the same notations as before, the two processes used in this simulation are

x_{k,1} = y_k^{(1)} cos(2 pi (f_0/f_s) k) + y_k^{(4)} + y_k^{(7)} + sigma_{x_1} nu_{1,k} + x_{1,dc},
x_{k,2} = 0.83 y_k^{(2)} + 0.83 y_{k-6}^{(4)} + 0.83 y_k^{(5)} + 0.83 y_k^{(6)} + sigma_{x_2} nu_{2,k} + x_{2,dc}.

The process x_{k,2} considered in this scenario is exactly the same as that described in Case 1. Process x_{k,1} is a slightly modified version of the process x_{k,1} in Case 1: the first two components of x_{k,1} are the same as described before, i.e., the amplitude-modulated component related to y_k^{(1)} and the fixed-frequency AR component y_k^{(4)}. However, instead of the third component y_k^{(5)}, we include a frequency-modulated component, y_k^{(7)}, which is an AR process tuned around the frequency f_7. The frequency f_7 changes by decrements of 0.06 Hz every 200 seconds, starting at 0.9 Hz at t = 0 s.

The ESD matrix of this bivariate process can be estimated by Algorithm 2, with a slight modification in the Forward filtering step (step 2) of Algorithm 1: given that the second process is directly observable, the distribution f(D_1^m | {V_1^m}, theta^{(r)}) needs to be modified, and accordingly, the log-posterior in Eq. (13) changes to:

Sum_{s,w=1}^{m,W} L { nbar_{(s-1)W+w,1} (A_s V_s)_{w,1} - log(1 + exp((A_s V_s)_{w,1})) }
 - (1/(2 sigma_nu^2)) Sum_{s,w=1}^{m,W} ( x_tilde_{(s-1)W+w,2} - (A_s V_s)_{w,2} )^2
 - (1/2) Sum_{s=1}^{m} (w_s - Phi_s w_{s-1})^T (Q_s^{(r)})^{-1} (w_s - Phi_s w_{s-1}).

The parameters used in forming the estimates are the same as those chosen for Case 1. Fig. 4 shows the estimation results corresponding to this case. The results have been similarly formatted as a grid, with the columns representing (from left to right) the True ESD, Oracle ESD, PPMT-ESD, SS-ESD, and PSTH-ESD estimates. The rows represent (from top to bottom) (Psi_m)_{1,1}(omega), (Psi_m)_{2,2}(omega), and (Psi_m)_{1,2}(omega). Note that in this case, we take the SS-ESD and PSTH-ESD estimates of (Psi_m)_{2,2}(omega) to be the same as its Oracle ESD estimate, given that these methods are based on estimating the process x_{k,2} in the time domain (which is directly observable here).

Similar to Case 1, the proposed PPMT-ESD estimator (Fig. 4(C)) captures the dynamics of the spectra (Psi_m)_{1,1}(omega) and (Psi_m)_{1,2}(omega) accurately, closely matching the True ESD (Fig. 4(A)). As before, the SS-ESD estimator (Fig. 4(D)) is not able to capture the ESD dynamics, especially at high frequencies. Though some frequency components at certain time windows are detected by the PSTH-ESD estimates (Fig. 4(E)), most of the frequency content is concealed by background noise.

VI. APPLICATION TO EXPERIMENTALLY-RECORDED DATA FROM GENERAL ANESTHESIA

Finally, we apply our proposed PPMT-ESD estimator to multi-unit recordings from a human subject under Propofol-induced general anesthesia (data from [12], [14]). The data set includes the spiking activity of 41 neurons as well as the LFP, recorded from a patient undergoing intra-cranial monitoring for surgical treatment of epilepsy, using a multichannel micro-electrode array implanted in the temporal cortex [12]. Recordings were conducted during the administration of Propofol for induction of anesthesia. The experimental protocol is extensively explained in [12]. Since the original recordings have been over-sampled, we down-sample both the LFP and spike recordings to the same sampling frequency of 25 Hz [14]. From the spike recordings, we select the 25 neurons (L = 25) with the highest spiking rates for analysis. The average spiking rate of this subpopulation of neurons is 0.1732 spikes/second. Fig. 5(a) shows the LFP signal for the total duration of 800 s considered in the analysis. A zoomed-in view of the LFP and the raster plot of the neuronal ensemble corresponding to a 40 s window from t = 480 s to t = 520 s is shown in Fig. 5.

Similar to Case 2 in Section V-B, we model the neuronal ensemble and the LFP by a bivariate process, where the LFP is directly observable, but the latent process driving the neuronal activity is observed through spiking. We assume the process to be stationary within windows of length 40 s (W = 1000), resulting in M = 20 non-overlapping windows. Given that the main spectral content is known to be in the range of 0-2 Hz [12], [14], we estimate the ESDs up to 2.5 Hz (N_max = 100). Further, we choose xi = 2, and use the first three tapers in the analysis. We set alpha = 0.85 and rho = 0.02.

Fig. 6 shows the results of our analysis. The first column illustrates the ESD estimates, where the dashed vertical line marks the induction of the anesthetic. The second column shows snapshots of the normalized magnitude spectrum corresponding to a 40 s window from t = 480 s to t = 520 s. The first three rows (from top to bottom) show the estimates of (Psi_m)_{1,1}(omega), i.e., the ESD of the latent process driving the spiking activity, for PSTH-ESD, SS-ESD, and the proposed PPMT-ESD, respectively. The fourth row represents the Oracle ESD estimate of the LFP signal ((Psi_m)_{2,2}(omega)), and the fifth row depicts (Psi_m)_{1,2}(omega), the cross-spectrum between the LFP and the latent process, obtained using the proposed PPMT-ESD method.
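For intuition, the data terms of the modified log-posterior of Eq. (13) for the directly-observable case (a Bernoulli-logistic term for the spiking component plus a Gaussian term for the observed one) can be written as a small function. This is a hypothetical, stripped-down sketch: the window/state bookkeeping of Algorithm 1 and the state-transition prior term are omitted, and the arrays stand in for (A_s V_s), the ensemble spike averages, and the noisy direct observations.

```python
import numpy as np

def log_posterior_terms(v1, v2, nbar1, x2_tilde, L, sigma_nu):
    """Data terms of the modified log-posterior: a Bernoulli (logistic) term
    for the spiking component and a Gaussian term for the observed one.
    (The state-transition prior term of Eq. (13) is omitted for brevity.)"""
    # spiking component: L * sum( nbar * v - log(1 + exp(v)) )
    spiking = L * np.sum(nbar1 * v1 - np.logaddexp(0.0, v1))
    # directly observed component: -1/(2 sigma^2) * sum of squared residuals
    gaussian = -np.sum((x2_tilde - v2) ** 2) / (2.0 * sigma_nu ** 2)
    return spiking + gaussian

rng = np.random.default_rng(0)
W = 100
v1, v2 = rng.normal(size=W), rng.normal(size=W)
nbar1 = rng.binomial(20, 0.1, size=W) / 20      # toy ensemble spike averages
x2_tilde = v2 + rng.normal(0, 0.1, size=W)      # noisy direct observation
lp = log_posterior_terms(v1, v2, nbar1, x2_tilde, L=20, sigma_nu=0.1)
print(np.isfinite(lp))
```

Note how the Gaussian term penalizes any mismatch between the second latent component and its noisy observation, which is precisely what anchors the directly-observed channel during forward filtering.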

Fig. 4: ESD estimation results for Case 2. Each panel shows the magnitude of the spectrogram in dB scale. Columns from left to right: (A) True ESD, (B) Oracle ESD, (C) PPMT-ESD, (D) SS-ESD, and (E) PSTH-ESD. Rows from top to bottom: (Psi_m)_{1,1}(omega), (Psi_m)_{2,2}(omega), and (Psi_m)_{1,2}(omega).

Fig. 5: Top panel shows the LFP recording used in the analysis. The downward arrow marks the induction of the anesthetic. The bottom rows show the zoomed-in view of the LFP and the spike trains for t = 480 s to t = 520 s.

Fig. 6: ESD analysis of the real data from anesthesia. Left column shows the magnitude of the spectrograms in dB scale, and the right column shows a snapshot of the normalized ESD estimates for a 40 s window starting at t = 480 s. Rows from top to bottom: PSTH-ESD, SS-ESD, PPMT-ESD, the Oracle ESD of the LFP signal, and the PPMT-ESD estimate of the cross-spectrum between the spiking observations and the LFP.

Similar to our simulation studies, we observe that the PSTH-ESD estimate (Fig. 6, first row) captures a significant number of spurious frequency components and harmonics, which are known to be absent during Propofol general anesthesia [12], masking the relevant spectral content. Also, as before, the SS-ESD estimate amplifies the low-frequency components (Fig. 6, second row). Even though the dominant spectral content of the LFP is around 0.5 Hz and 1.1 Hz, as observed in the right panel of the fourth row of Fig. 6, neither of these peaks is present in the SS-ESD estimate (Fig. 6, second row, right panel). In addition, the rapid change in the LFP spectrogram (fourth row, left panel) marked by the dashed line is not captured by the SS-ESD estimate (second row, left panel).

The proposed PPMT-ESD estimate (third row) closely resembles the spectrum of the LFP (fourth row), as evident in both the spectrogram and the spectral snapshot. The dominant frequency components of the PPMT-ESD around 0.5 Hz and 1.1 Hz match those in the LFP spectrum. The temporal variations of the spectrum are also consistent with those of the LFP, as visible in the left panels. The estimated cross-spectrum (fifth row) further corroborates this observation, by capturing the spectral coupling between the LFP and the latent process driving the spiking activity. It is indeed hypothesized that the latent process driving the spiking activity is the LFP signal itself [12]. As such, our analysis corroborates this hypothesis by providing an accurate estimate of the ESDs of both the LFP and the latent process driving the neuronal spiking.

VII. CONCLUDING REMARKS

Brain oscillations are known to play a significant role in modulating the fine-scale dynamics of neuronal spiking. These oscillations are also known to be non-stationary, as they reflect changes in internal states or behavioral conditions. Understanding how the latent processes that drive spiking activity are related to brain oscillations is a key problem in computational neuroscience.

On one hand, the theory of point processes has been successfully utilized in recent years to capture the dynamics of neuronal spiking data. On the other hand, spectral estimation of non-stationary continuous signals has been widely studied. Existing methods either treat the spiking data as continuous and apply non-stationary spectral estimation techniques off-the-shelf, or first estimate the latent processes in the time domain, followed by forming spectrograms. Both of these approaches are known to suffer from high bias and variability. A unified methodology for inferring spectrotemporal representations of the multivariate latent non-stationary processes that drive neuronal spiking has been lacking. In this work, we proposed such a methodology for estimating the ESD matrix of a multivariate non-stationary latent process directly from binary spiking observations. To this end, we integrated techniques from state-space modeling, multitaper analysis, and point processes.

We established theoretical bounds on the bias and variance performance of the proposed estimator, and compared its performance with the aforementioned existing techniques through application to simulated and experimentally-recorded neural data. Our simulation studies confirmed our theoretical analysis and revealed the favorable performance of our proposed method over existing approaches. Our application to real data corroborated the hypothesis on the role of the local field potential in regulating spiking activity under general anesthesia, by providing a clear picture of the underlying spectral couplings. While we have developed our methodology in the context of neuronal data analysis, it can be applied to a wide range of discrete observations that are modulated by underlying oscillations, such as heart-beat data [41]. Our methodology can also be extended to infer non-stationary network-level properties such as the frequency-domain Granger-Geweke causality [42], [43].

VIII. ACKNOWLEDGMENT

We would like to thank Emery N. Brown and Patrick L. Purdon for sharing the data from [12].

APPENDIX A
PROOF OF THEOREM 1

Proof. Let S(omega) be the PSD of the process {x_k}_{k=1}^K. Then,

bias(S_hat^{mt}(omega)) := |E[S_hat^{mt}(omega)] - S(omega)|
  <=_(a) E[|S_hat^{mt}(omega) - S^{mt}(omega)|] + |bias(S^{mt}(omega))|,   (23)

where (a) follows from the triangle inequality. Further,

Var(S_hat^{mt}(omega)) := E[|S_hat^{mt}(omega) - E[S_hat^{mt}(omega)]|^2]
  <= E[|S_hat^{mt}(omega) - S^{mt}(omega)|^2] + Var(S^{mt}(omega)) + 2 E[(S_hat^{mt}(omega) - S^{mt}(omega))(S^{mt}(omega) - E[S^{mt}(omega)])]
  <=_(b) ( sqrt( E[|S_hat^{mt}(omega) - S^{mt}(omega)|^2] ) + sqrt( Var(S^{mt}(omega)) ) )^2,   (24)

where (b) follows from the Cauchy-Schwarz inequality. Thus, the desired bounds on the bias and variance can be established through bounding the first and second moments of (S_hat^{mt}(omega) - S^{mt}(omega)). The first moment can be bounded by

E[|S_hat^{mt}(omega) - S^{mt}(omega)|]
  <=_(c) (1/P) Sum_{p=1}^{P} E[ | |y_hat^{(p)}(omega)|^2 - |y^{(p)}(omega)|^2 | ]
  <=_(d) (1/P) Sum_{p=1}^{P} Sum_{k=1}^{K} Sum_{m=1}^{K} E[ |logit(nbar_k) logit(nbar_m) - x_k x_m| ] |nu_k^{(p)} nu_m^{(p)} e^{-i omega (m-k)}|
  <=_(e) (1/P) Sum_{p=1}^{P} { Sum_{k=1}^{K} E[ |(logit(nbar_k))^2 - x_k^2| ] (nu_k^{(p)})^2 + Sum_{k != m} E[ |logit(nbar_k) logit(nbar_m) - x_k x_m| ] |nu_k^{(p)} nu_m^{(p)}| },   (25)

where (c) and (d) follow from the triangle inequality and (e) follows by bounding the complex sinusoid. The main technical difficulty in further development of the bounds is due to the fact that logit(z) does not have a Taylor series expansion for z in (0, 1). We thus have to find other algebraically useful bounds. To this end, we need to establish the following technical lemma.

Lemma 1. Consider the event A = { nbar_k | nbar_k != 0, nbar_k != 1, 1 <= k <= K }. The following inequality holds true for all nbar_k in A:

eps(nbar_k) := |logit(nbar_k) - x_k| <= g(x_k, L) |nbar_k - lambda_k|,

where

g(x_k, L) = max{ 1/(lambda_k (1 - lambda_k)), |log(L-1) + x_k| / |lambda_k - 1/L|, |log(L-1) - x_k| / |1 - 1/L - lambda_k| }.

Proof of Lemma 1. First, consider the case lambda_k <= 0.5. We bound the function eps(nbar_k) in a piece-wise fashion as follows. Note that logit(nbar_k) is convex for nbar_k >= 0.5 and concave for nbar_k <= 0.5. Thus, it immediately follows that for nbar_k <= lambda_k, eps(nbar_k) is convex, and hence

eps(nbar_k) <= ( |log(L-1) + x_k| / |lambda_k - 1/L| ) (lambda_k - nbar_k).   (26)

Furthermore, for lambda_k <= nbar_k <= 0.5, eps(nbar_k) is concave, and hence is bounded by the tangent at lambda_k as follows:

eps(nbar_k) <= ( 1/(lambda_k (1 - lambda_k)) ) (nbar_k - lambda_k).   (27)

Finally, for the case of nbar_k >= 0.5, consider the line

l(nbar_k) := ( |log(L-1) - x_k| / |1 - 1/L - lambda_k| ) (nbar_k - lambda_k).   (28)

From the convexity of eps(nbar_k), l(nbar_k) upper bounds eps(nbar_k) for nbar_k >= 0.5, since l(0.5) >= eps(0.5) for lambda_k <= 0.5. Combining the piece-wise bounds in Eqs. (26), (27) and (28), we conclude the claim in Lemma 1 for lambda_k <= 0.5. Due to the symmetry of eps(nbar_k), through a similar argument, the bound can be established for lambda_k > 0.5, which concludes the proof.

Given that |x_k| <= B, and assuming that L is large enough so that L >= 2(1 + exp(B)), we can further simplify the bound of Lemma 1. We have:

g(x_k, L) <= max{ exp(B)(1 + exp(-B))^2, (log(L-1) + B) / (1/(1 + exp(B)) - 1/L) }
          <= max{ exp(B)(1 + exp(-B))^2, 4(1 + exp(B)) log L }.
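Lemma 1 is a purely deterministic inequality over the grid nbar_k in {1/L, ..., (L-1)/L}, so it can be checked exhaustively. The following brute-force check (the values of x_k, L, and B form an illustrative test grid, not the paper's) verifies both the lemma and the large-L simplification g(x, L) <= 4(1 + exp(B)) log L:

```python
import numpy as np

def g_bound(x, L):
    """The piece-wise Lipschitz-type constant g(x, L) from Lemma 1."""
    lam = 1.0 / (1.0 + np.exp(-x))
    return max(1.0 / (lam * (1.0 - lam)),
               abs(np.log(L - 1.0) + x) / abs(lam - 1.0 / L),
               abs(np.log(L - 1.0) - x) / abs(1.0 - 1.0 / L - lam))

def lemma1_holds(x, L):
    """Check |logit(nbar) - x| <= g(x, L) |nbar - lambda| on the event A."""
    lam = 1.0 / (1.0 + np.exp(-x))
    nbar = np.arange(1, L) / L                 # all values with nbar != 0, 1
    eps = np.abs(np.log(nbar / (1.0 - nbar)) - x)
    # small tolerance: the bound holds with equality at the grid endpoints
    return bool(np.all(eps <= g_bound(x, L) * np.abs(nbar - lam) + 1e-9))

B = 2.0
for x in (-1.5, -0.3, 0.0, 0.8, 2.0):          # |x| <= B, both lambda regimes
    for L in (20, 100, 1000):                  # L >= 2(1 + exp(B)) throughout
        assert lemma1_holds(x, L)
        assert g_bound(x, L) <= 4 * (1 + np.exp(B)) * np.log(L)
print("Lemma 1 verified on the test grid")
```

The endpoint cases nbar = 1/L and nbar = (L-1)/L attain the bound with equality, which is why the chord slopes in g(x, L) cannot be improved.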

Thus, for sufficiently large L, we conclude that

eps(nbar_k) <= 4(1 + exp(B)) log L |nbar_k - lambda_k|.   (29)

Now, consider the expectations in the bounds of Eq. (25). Using iterated conditioning,

E[|(logit(nbar_k))^2 - x_k^2|] = E[ E[ |(logit(nbar_k))^2 - x_k^2| | x_k ] ]
  = E[ 2 x_k E[(logit(nbar_k) - x_k) | x_k] + E[(logit(nbar_k) - x_k)^2 | x_k] ]
  <=_(f) E[ 2 |x_k| E[|logit(nbar_k) - x_k| | x_k] ] + E[ E[(logit(nbar_k) - x_k)^2 | x_k] ],   (30)

where (f) follows from the triangle and Jensen's inequalities. In order to further simplify these bounds, we invoke the result of Lemma 1. First, note that logit(nbar_k) is unbounded in the complement of the event A. Provided |x_k| <= B, P(nbar_k != 0) and P(nbar_k != 1) can be lower bounded by 1 - exp(-L log(1 + exp(-B))), which implies that

P(A) >= 1 - 2 exp(-L log(1 + exp(-B))).

Therefore, for sufficiently large L, we see that P(A) is exponentially close to 1. Thus, hereafter we condition the expectations on the highly probable event A. From Eq. (29), we get

E[|logit(nbar_k) - x_k| | x_k, A] <= 4(1 + exp(B)) log L E[|nbar_k - lambda_k| | x_k, A].

Note that the random variable n_k = L nbar_k is the sum of L independent Bernoulli random variables given x_k. Thus, given x_k, L nbar_k ~ Binomial(L, lambda_k). Accordingly,

E[|nbar_k - lambda_k| | x_k, A] = E[|nbar_k - lambda_k| 1_A | x_k] / P(A)
  <=_(g) sqrt( E[(nbar_k - lambda_k)^2 1_A | x_k] ) / P(A)
  <=_(h) sqrt( lambda_k (1 - lambda_k) ) / ( sqrt(L) P(A) ),   (31)

where (g) follows from Jensen's inequality and (h) follows from substituting the expression for the variance of a binomial random variable. Further, note that lambda_k (1 - lambda_k) <= 1/4 for lambda_k in [0, 1], and P(A) = 1 - (lambda_k^L + (1 - lambda_k)^L) >= 1/2 if (1 + exp(B))^L > 2(1 + exp(BL)), which is satisfied for large enough L. Thus, combining the bounds in Eqs. (31) and (29), we get

E[|logit(nbar_k) - x_k| | x_k, A] <= 4(1 + exp(B)) (log L / sqrt(L)).   (32)

By a similar argument we can show that

E[(logit(nbar_k) - x_k)^2 | x_k, A] <= 8 (1 + exp(B))^2 (log L / sqrt(L))^2.

Thus, the expectation in Eq. (30) is bounded as:

E[|(logit(nbar_k))^2 - x_k^2| | A] <= 8(1 + exp(B)) (log L / sqrt(L)) ( B + (1 + exp(B)) (log L / sqrt(L)) ).   (33)

Following a similar argument, one can show for k != m,

E[|logit(nbar_k) logit(nbar_m) - x_k x_m| | A] <= 8(1 + exp(B)) (log L / sqrt(L)) ( B + 2(1 + exp(B)) (log L / sqrt(L)) ).   (34)

Finally, using the bounds of Eqs. (33) and (34), and noting that Sum_{k=0}^{K-1} (nu_k^{(p)})^2 = 1 for all 0 <= p <= P, we can upper bound the expectation in Eq. (25) as

E[|S_hat^{mt}(omega) - S^{mt}(omega)| | A] <= g_1 K (log L / sqrt(L)),   (35)

where

g_1 := 8(1 + exp(B)) [ (1/K + 1) B + (1/K + 2)(1 + exp(B)) (log L / sqrt(L)) ].

This concludes the proof of the bound on the bias. Following similar bounding techniques, the second moment in Eq. (24) can be bounded by:

sqrt( E[|S_hat^{mt}(omega) - S^{mt}(omega)|^2 | A] ) <= g_2 K (log L / sqrt(L)),   (36)

where

g_2 := 4(1 + exp(B)) { sqrt(2/(3K)) [ 13(1 + exp(B)) (log L / sqrt(L)) + B ] + [ ( 4(1 + exp(B)) (log L / sqrt(L)) + B )^2 - B^2 ]^{1/2} }.

This concludes the proof of Theorem 1.

APPENDIX B
PROOF OF COROLLARIES 1 AND 2

Proof of Corollary 1. The proof of Corollary 1 follows that of Theorem 1 closely, with the natural extension to the multivariate case. Following the proof of Theorem 1, the constants g_1' and g_2' in this case are given by:

g_1' := 8(1 + exp(B)) [ B + 2(1 + exp(B)) (log L / sqrt(L)) ],

g_2' := 4(1 + exp(B)) sqrt( ( 4(1 + exp(B)) (log L / sqrt(L)) + B )^2 - B^2 ).

Proof of Corollary 2. As for Corollary 2, we work under the technical assumptions of Theorems 1 and 2 in [25]. Following [25], we assume that in Eq. (10), Q_m = Q for all m in this proof, and that the EM algorithm finds estimates of Q and alpha close to their true values (for large enough K). Then, under Assumptions (1) and (2), we identify the effective observation y_tilde_m^{(p)} corresponding to the p-th taper at window m in [25] by the concatenation of nu_w^{(p)} logit(nbar_{(m-1)W+w,j}) for w = 1, 2, ..., W and j = 1, 2, ..., J in a vector of length WJ. We also assume, without loss of generality, that W = uN for some integer u. Then, we denote by Sigma_inf the steady-state covariance of the backward smoother, and define Lambda := alpha Sigma_inf (alpha^2 Sigma_inf + Q)^{-1} and Gamma := (alpha^2 Sigma_inf + Q)[ I - uW( (alpha^2 Sigma_inf + Q)^{-1} + uW I )^{-1} ], as in [25]. Note that these matrices are NJ x NJ in our case. Under the same assumptions as [25], we consider them to be diagonal, with the i-th diagonal elements being gamma_i and eta_i, respectively. Then, following the proof of Theorem 1 and those of Theorems 1 and 2 in [25], it can be shown that the statement of the corollary holds with the constants:

g_1''(omega_n) := 8(1 + exp(B)) [ B + 2(1 + exp(B)) (log L / sqrt(L)) ] x Sum_{s,s'=1}^{M} eta_{(n-1)J+r}^{|s-m|} eta_{(n-1)J+t}^{|s'-m|} gamma_{(n-1)J+r} gamma_{(n-1)J+t},

g_2''(omega_n) := 4(1 + exp(B)) sqrt( ( 4(1 + exp(B)) (log L / sqrt(L)) + B )^2 - B^2 ) x Sum_{s,s'=1}^{M} eta_{(n-1)J+r}^{|s-m|} eta_{(n-1)J+t}^{|s'-m|} gamma_{(n-1)J+r} gamma_{(n-1)J+t},

and

kappa_m(omega_n) := Sum_{s,s'=1}^{M} eta_{(n-1)J+r}^{|s-m|} eta_{(n-1)J+t}^{|s'-m|} gamma_{(n-1)J+r} gamma_{(n-1)J+t} alpha^{|s-s'|}.

REFERENCES

[1] A. Rupasinghe and B. Babadi, "Multitaper analysis of evolutionary spectral density matrix from multivariate spiking observations," in 2019 IEEE Data Science Workshop (DSW), Minneapolis, MN, June 2-5, 2019.
[2] A. Schnitzler and J. Gross, "Normal and pathological oscillatory communication in the brain," Nature Reviews Neuroscience, vol. 6, pp. 285-296, 2005.
[3] P. J. Uhlhaas and W. Singer, "Neural synchrony in brain disorders: Relevance for cognitive dysfunctions and pathophysiology," Neuron, vol. 52, no. 1, pp. 155-168, 2006.
[4] L. M. Ward, "Synchronous neural oscillations and cognitive processes," Trends in Cognitive Sciences, vol. 7, no. 12, pp. 553-559, 2003.
[5] R. P. Vertes and R. W. Stackman, Electrophysiological Recording Techniques. Humana Press, 2011, vol. 54.
[6] J. J. Jun, N. Steinmetz, J. H. Siegle, D. J. Denman, M. Bauza, B. Barbarits, A. K. Lee, C. Anastassiou, A. Andrei, C. Aydin, M. Barbic, T. Blanche, V. Bonin, J. Couto, B. Dutta, S. Gratiy, D. Gutnisky, M. Hausser, B. Karsh, and T. D. Harris, "Fully integrated silicon probes for high-density recording of neural activity," Nature, vol. 551, pp. 232-236, 2017.
[7] J. Du, T. J. Blanche, R. R. Harrison, H. A. Lester, and S. C. Masmanidis, "Multiplexed, high density electrophysiology with nanofabricated neural probes," PLOS ONE, vol. 6, no. 10, pp. 1-11, 2011.
[8] E. N. Brown, R. Kass, and P. P. Mitra, "Multiple neural spike train data analysis: state-of-the-art and future challenges," Nature Neuroscience, vol. 7, pp. 456-461, 2004.
[9] W. Truccolo, U. T. Eden, M. R. Fellows, J. P. Donoghue, and E. N. Brown, "A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects," Journal of Neurophysiology, vol. 93, no. 2, pp. 1074-1089, 2005.
[10] L. Paninski, "Maximum likelihood estimation of cascade point-process neural encoding models," Network: Computation in Neural Systems, vol. 15, no. 4, pp. 243-262, 2004.
[11] A. C. Smith and E. N. Brown, "Estimating a state-space model from point process observations," Neural Computation, vol. 15, no. 5, pp. 965-991, 2003.
[12] L. D. Lewis, V. S. Weiner, E. A. Mukamel, J. A. Donoghue, E. N. Eskandar, J. R. Madsen, W. S. Anderson, L. R. Hochberg, S. S. Cash, E. N. Brown, and P. L. Purdon, "Rapid fragmentation of neuronal networks at the onset of propofol-induced unconsciousness," Proceedings of the National Academy of Sciences, vol. 109, no. 49, pp. 19891-19892, 2012.
[13] D. M. Halliday and J. R. Rosenberg, Time and Frequency Domain Analysis of Spike Train and Time Series Data. Berlin, Heidelberg: Springer Berlin Heidelberg, 1999, pp. 503-543.
[14] S. Miran, P. L. Purdon, E. N. Brown, and B. Babadi, "Robust estimation of sparse narrowband spectra from neuronal spiking data," IEEE Transactions on Biomedical Engineering, vol. 64, no. 10, pp. 2462-2474, 2017.
[15] P. Das and B. Babadi, "Multitaper spectral analysis of neuronal spiking activity driven by latent stationary processes," preprint, available online at: https://arxiv.org/abs/1906.08451, 2019.
[16] S. Grun, M. Diesmann, and A. Aertsen, "Unitary events in multiple single-neuron spiking activity: II. Nonstationary data," Neural Computation, vol. 14, no. 1, pp. 81-119, 2002.
[17] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N.-C. Yen, C. C. Tung, and H. H. Liu, "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, 1998.
[18] L. Cohen, Time-frequency Analysis: Theory and Applications. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1995.
[19] M. B. Priestley, Spectral Analysis and Time Series. Academic Press, London; New York, 1981.
[20] G. Matz and F. Hlawatsch, "Nonstationary spectral analysis based on time-frequency operator symbols and underspread approximations," IEEE Transactions on Information Theory, vol. 52, no. 3, pp. 1067-1086, 2006.
[21] W. Martin and P. Flandrin, "Wigner-Ville spectral analysis of nonstationary processes," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 6, pp. 1461-1470, 1985.
[22] M. B. Priestley, "Evolutionary spectra and non-stationary processes," Journal of the Royal Statistical Society. Series B (Methodological), vol. 27, no. 2, pp. 204-237, 1965.
[23] G. Matz, F. Hlawatsch, and W. Kozek, "Generalized evolutionary spectral analysis and the Weyl spectrum of nonstationary random processes," IEEE Transactions on Signal Processing, vol. 45, no. 6, pp. 1520-1534, 1997.
[24] J. Hammond and P. White, "The analysis of non-stationary signals using time-frequency methods," Journal of Sound and Vibration, vol. 190, no. 3, pp. 419-447, 1996.
[25] P. Das and B. Babadi, "Dynamic Bayesian multitaper spectral analysis," IEEE Transactions on Signal Processing, vol. 66, no. 6, pp. 1394-1409, 2018.
[26] S.-E. Kim, M. K. Behr, D. Ba, and E. N. Brown, "State-space multitaper time-frequency analysis," Proceedings of the National Academy of Sciences, vol. 115, no. 1, pp. E5-E14, 2018.
[27] D. J. Thomson, "Spectrum estimation and harmonic analysis," Proceedings of the IEEE, vol. 70, no. 9, pp. 1055-1096, 1982.
[28] D. Slepian, "Prolate spheroidal wave functions, Fourier analysis, and uncertainty V: the discrete case," The Bell System Technical Journal, vol. 57, no. 5, pp. 1371-1430, 1978.
[29] D. B. Percival and A. T. Walden, Spectral Analysis for Physical Applications. Cambridge University Press, 1993.
[30] B. Babadi and E. N. Brown, "A review of multitaper spectral analysis," IEEE Transactions on Biomedical Engineering, vol. 61, pp. 1555-1564, 2014.
[31] T. P. Bronez, "On the performance advantage of multitaper spectral analysis," IEEE Transactions on Signal Processing, vol. 40, no. 12, pp. 2941-2946, 1992.
[32] D. J. Daley and D. Vere-Jones, An Introduction to the Theory of Point Processes: Volume II: General Theory and Structure. Springer Science & Business Media, 2007, vol. 2.
[33] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods. Berlin, Heidelberg: Springer-Verlag, 1986.
[34] A. T. Walden, "A unified view of multitaper multivariate spectral estimation," Biometrika, vol. 87, no. 4, pp. 767-788, 2000.
[35] R. H. Shumway and D. S. Stoffer, "An approach to time series smoothing and forecasting using the EM algorithm," Journal of Time Series Analysis, vol. 3, no. 4, pp. 253-264, 1982.
[36] H. E. Rauch, C. T. Striebel, and F. Tung, "Maximum likelihood estimates of linear dynamic systems," AIAA Journal, vol. 3, no. 8, pp. 1445-1450, 1965.
[37] P. De Jong and M. J. Mackinnon, "Covariances for smoothed estimates in state space models," Biometrika, vol. 75, no. 3, pp. 601-602, 1988.
[38] K. Lii and M. Rosenblatt, "Prolate spheroidal spectral estimates," Statistics & Probability Letters, vol. 78, no. 11, pp. 1339-1348, 2008.
[39] C. Goutis and G. Casella, "Explaining the saddlepoint approximation," The American Statistician, vol. 53, no. 3, pp. 216-224, 1999.
[40] "Multitaper Analysis of Evolutionary Spectra from Multivariate Spiking Observations MATLAB Codes," 2019. [Online]. Available: https://github.com/Anuththara-Rupasinghe/PPMT-ESD-Estimation
[41] R. Barbieri, E. C. Matten, A. A. Alabi, and E. N. Brown, "A point-process model of human heartbeat intervals: new definitions of heart rate and heart rate variability," American Journal of Physiology-Heart and Circulatory Physiology, vol. 288, no. 1, pp. H424-H435, 2005.
[42] J. Geweke, "Measurement of linear dependence and feedback between multiple time series," Journal of the American Statistical Association, vol. 77, no. 378, pp. 304-313, 1982.
[43] L. A. Baccala and K. Sameshima, "Partial directed coherence: a new concept in neural structure determination," Biological Cybernetics, vol. 84, no. 6, pp. 463-474, 2001.