Online Bayesian Inference in Some Time-Frequency Representations of Non-Stationary Processes

School of Mathematical and Physical Sciences

Department of Mathematics and Statistics

Preprint MPS-2013-21

25 April 2014

Online Bayesian Inference in Some Time-Frequency Representations of Non-Stationary Processes

Richard G. Everitt, Christophe Andrieu and Manuel Davy

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. XX, NO. XX, JANUARY 20XX 1

Online Bayesian Inference in Some Time-Frequency Representations of Non-Stationary Processes Richard G. Everitt*, Christophe Andrieu, and Manuel Davy

Abstract—The use of Bayesian inference in the inference rely on the idea that the noise is spread all over the time- of time-frequency representations has, thus far, been limited frequency plane, thus increasing the local signal-to-noise to offline analysis of signals, using a smoothing spline based ratio. Based on the same idea, several algorithms have been model of the time-frequency plane. In this paper we introduce a new framework that allows the routine use of Bayesian proposed to estimate signal parameters directly from TFRs, inference for online estimation of the time-varying spectral most of which being devoted to linear chirps, see e.g., [9]. density of a locally stationary Gaussian process. The core of Others are devoted to more general time-varying spectrum our approach is the use of a likelihood inspired by a local estimation. In particular, many studies have focused on Whittle approximation. This choice, along with the use of estimating narrow band time-frequency trajectories (also a recursive algorithm for non-parametric estimation of the local spectral density, permits the use of a particle filter for termed components), see [10], [11] among others. Most estimating the time-varying spectral density online. We provide of these techniques implement Kalman filtering where the demonstrations of the algorithm through tracking chirps and observations are extracted from peaks of TFRs (sequential the analysis of musical data. approach), or fit curves onto the TFR (batch approach). As Index Terms—Signal processing algorithms, particle filters, shown by these many previous studies, using TFRs for es- spectrogram, Bayesian methods, frequency domain analysis. timation or decision purposes leads to powerful algorithms. EDICS Categories: DSP-TFSR, MLR-BAYL, MLR-MUSI, Many practical problems have received a convincing solution SSP-NSSP, SSP-TRAC. through the use of these methods [12], [13]. For a general overview of the time-frequency analysis, see [14]. I.INTRODUCTION In parallel to the development of time-frequency tech- A. Background niques, statistical signal processing tools have developed Time-frequency representations (TFRs) are celebrated sig- considerably over the last few years. In particular, up-tonal processing tools, for they turn time domain signals into date Bayesian approaches enable to estimate parameters in images, representing the time-frequency decomposition of situations where the model is highly non-linear and/or non- the signal, whose interpretation is intuitive. Such images are Gaussian, thanks to Monte Carlo methods. Surprisingly, few thus often used as early analysis tools, to be used when works have been devoted to apply these techniques to TFRs. faced with non-stationary signals for which the concept of Previous work can be summarised as follows: frequency is relevant. Typical applications range from audio 1) Monte Carlo Markov Chain (MCMC) for estimating signals (speech, music, animal cries) [1], biomedical time the coefficients of Gabor representations [15], [16]. series (EEG, ECG, EMG) [2], accelerometer signals [3] 2) Tracking the pole parameterisation of time-varying collected on dynamic mechanical systems, etc. autoregression (TVAR) models using sequential Monte Aside their ability to provide easy-to-understand images, Carlo [17]. TFRs can be used in automated decision applications. [4] 3) Tracking trajectories in spectrograms by sequential showed that some Cohen’s group TFRs can be used to im- Monte Carlo (SMC) [18]. plement optimal linear detection, by providing an equivalent 4) Using reversible-jump MCMC to infer the parameters implementation of the time domain matched filter in the time of a model for the time-varying spectrum of a locally frequency domain. This seminal work was followed by a stationary process [19] and [20]. The model in these number of studies about time-frequency detection / classi- papers partitions the process into segments, using a fication of signals, see e.g., [5]–[8]. All these approaches mixture of smoothing splines to model the log spectrum of each segment and time varying mixing weights Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be to introduce non-stationarity across the segments. obtained from the IEEE by sending a request to [email protected]. R. G. Everitt* is with the Department of Mathematics and Statistics, Despite these advances, there exists no principled approach University of Reading, UK e-mail: [email protected]. to Bayesian inference that uses the full TFR as a raw C. Andrieu is with the Department of Mathematics, University of Bristol, observation and relates it to some statistical model through UK. M. Davy is with the LAGIS Laboratory, CNRS, Lille, France. a likelihood function. Devising such a methodology is the Manuscript received xxx xx, xxxx; revised xxx xx, xxxx. aim of this paper. 2 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. XX, NO. XX, JANUARY 20XX

B. Overview of the approach improve interpretability, including interpretability by a com- This paper brings together TFRs and the Bayesian puter. However the choice of a likelihood in such situations paradigm for statistical inference. More precisely, we aim is not always obvious. In this section we develop such a to take advantage of the numerous conceptual benefits of likelihood which is motivated by Whittle’s approximant of TFRs in order to motivate and develop models for non- the likelihood for stationary processes and inspired by its stationary processes while at the same time exploiting the extension suggested in [21] for locally stationary processes. Bayesian paradigm for both incorporating prior information Consider a zero mean real valued stationary Gaussian and quantifying uncertainty in the inference procedure. This process {yt|t = 1,...,T } for some T ≥ 1 with nonzero leads to a natural description of the process to be analysed PSD fθ : [−π, π] → (0, ∞) dependent on a parameter in terms of a state-space model which is well suited to θ ∈ Θ. In [22] it was shown that the log-likelihood of a the sequential processing of data (often referred to as fil- real valued realisation of such a stationary Gaussian process tering) and the incorporation of prior temporal information can for a general class of processes be approximated for T (e.g. smoothness, abrupt changes, etc). Sequential processing large enough by might be desirable in scenarios where an update of the T π information contained in the signal is required as soon as Lθ(y1:T ) := C − [log(2πfθ(ω))+ IT (ω)/fθ(ω)]dω, 4π ˆ−π a new data is available, but can also present a significant (1) computational and storage advantage when large datasets are for some constant C, where IT (ω) is the periodogram of the involved. SMC methods (aka particle filters) are well estab- data, defined for ω ∈ [−π, π] as lished statistical techniques that allow one to effectively and T 2 efficiently carry out sequential inference for such models, in 1 IT (ω) := yt exp(−ıωt) . (2) the presence of nonlinearity and non-Gaussianity. T t=1 The main difficulty inherent to achieve the goals set X This approximation can easily be extended to accommo- above was to find a principled way of relating TFRs to the classical statistical parametric inference framework and in date vector valued time series and rates of convergence particular define a likelihood function for such objects. Our of this approximation can also be derived (see [22] and approach consists of exploiting Whittle type approximations [23]). The interest of this approximation is that it relates to the likelihood of Gaussian processes. These approximate the spectral properties of the data (the periodogram) to the likelihoods have the advantage that they can be shown to statistical model for the data (the PSD fθ) in a principled depend exclusively on the spectral properties of the data manner and presents the advantage of allowing modelling of and the candidate statistical parametric model. Although this class of processes directly in the spectral domain. originally developed in the context of stationary processes, There is a large literature on the use of this approximation recent theoretical advances have allowed for the rigorous for maximum likelihood inference of the parameter θ ( [23] generalisation of this framework to some types of non- is a good starting point). However, the approach of [24], stationary processes; a review of such approximations is where the Whittle approximation to the likelihood as a part provided in section II. This is the route we follow in this of a Bayesian model, is of more relevance to this paper. paper. In section III we show how the standard Bayesian In this paper a numerical approximation to the integral in state-space modelling framework, in conjunction with Whit- equation (1) is used, evaluating the integrand at the Fourier tle type likelihoods, lends itself naturally to the modelling of frequencies {ωk = 2πk/T | 0 ≤ k ≤⌊T/2⌋} to obtain non-stationary processes in the spectral domain. A generic 1 Lˆ (y )= C − [log(2πf (ω )) + I (ω )/f (ω )]. particle filter to perform sequential inference is also briefly θ 1:T 2 θ k T k θ k k reviewed. X (3) At the core of our approach to sequential estimation is to In the Bayesian model in [24] this likelihood is further use efficient recursive estimators of time-frequency images approximated by a mixture of Gaussians (for computational from data. A novel solution to this problem is described reasons), and MCMC is used to infer a smoothing spline in section III-C, which relies on an unusual interpretation representation of the log PSD. From our perspective, the of the power spectral density (PSD) of stationary processes most important characteristic of this work is that the use of and its non-stationary counterparts. In section IV we apply the Whittle approximation provides a principled approach to our methodology to tracking chirps whose signals overlap in Bayesian inference of any appropriate statistical model of the time-frequency space and in section V to a simple problem PSD, using the periodogram directly as an observation. This in the analysis of musical data. flexibility in the choice of model for the PSD is exploited in [25] and [26], amongst others. II.FROMTHE GAUSSIANLIKELIHOODTOTHE SPECTROGRAM B. Locally stationary processes A. The Whittle approximation For our analogous methodology for the Bayesian estima- When looking at TFRs, it is tempting to try to fit a tion of TFRs, a natural question is that of the existence of parametric model in order to reduce dimensionality and such approximations for Gaussian non-stationary processes EVERITT et al.: ONLINE BAYESIAN INFERENCE IN SOME TIME-FREQUENCY REPRESENTATIONS OF NON-STATIONARY PROCESSES 3

that would allow us to relate a TFR of the observed pro- of the maximum approximate likelihood estimator derived cess and a parametric model through a likelihood, or an by maximising Lθ(y1:T ), as well as a central limit theorem. approximation of such a likelihood. A significant step in Bayesian estimation of the evolutionary spectrum of a this direction was achieved by [21] who extended Whittle’s locally stationary process has previously been considered approximant to a particular class of non-stationary processes in [19] and [20]. In common with this paper, the choice called “locally stationary processes” [21], [27]–[30]. This of likelihood in these papers is also motivated by [30]. class of processes can be thought of as being a time varying However, they differ from our work in that the likelihood generalisation of stationary harmonisable processes and are used for the segmented process in their model (see section assumed to have a representation of the form (assuming for I-A) is simply the Whittle approximation for the likelihood simplicity that E(Yt,T ) = 0 for t = 1,...,T ) of the data in each segment - inference of the parameters of π their model is based on the periodogram of the data in each Y = exp(ıωt)Λ (t,ω)dξ(ω), (4) segment. In addition, our goal is to infer the evolutionary t,T ˆ T −π spectrum online, whereas these papers take a batch approach. where {ξ(ω)} is a complex valued Gaussian process on [−π, π] with additional statistical properties detailed in the C. Approach appendix and {Λ (t,ω) : {1,...,T }× [−π, π] → C,T ∈ T In the present paper we develop a methodology to perform N} is a family of complex valued functions which can be sequential Bayesian inference for processes modelled in the approximated by a function Λ : [0, 1] × [−π, π] → C, spectral domain that uses time-frequency estimates as input. Λ(u,ω)∗ = Λ(u, −ω) such that there exists K satisfying Our model for the observed data y1:T is parameterised by −1 sup |ΛT (t,ω) − Λ(t/T, ω)|≤ KT . (5) the time-varying parameter {θt}, which we treat as a random (t,ω)∈{1,...,T }×[−π,π] variable. The aim of the Bayesian approach is to infer This class of processes is especially useful, since it a sequence of posterior distributions {p(θt|Y1:t = y1:t)} defines a unique evolutionary spectrum for many commonly thus describing the uncertainty present in inferring {θt}, used processes, including time-varying ARMA or GARCH in contrast to maximum likelihood estimators which seek processes, for example. only a point estimate. The Bayesian formulation consists of In the context of parametric estimation it is natural to con- specifying: a prior p(θ1:T ) on the time-varying parameter; sider a family of potential candidates {Λθ,θ ∈ Θ} in order to and a likelihood g(Y1:T |θ1:T ), modelling how a random 2 explain the data. The quantity fθ(u,ω) := |Λθ(u,ω)| ≥ 0, process Y1:T arises given the underlying parameter θ1:T . The the evolutionary spectrum, can be thought of as being a posterior p(θ1:T |Y1:T = y1:T ) is then found via Bayes’ the- parametric TFR for the process. orem: p(θ1:T |Y1:T = y1:T ) ∝ p(θ1:T )g(Y1:T = y1:T |θ1:T ). In close analogy with Whittle’s likelihood approximation The likelihood we use is inspired by Dahlhaus’ develop- for Gaussian stationary processes, and for the purpose of ments for locally stationary processes, but differs in that we statistical inference, In [21] the following approximation of assume the evolutionary spectrum to consist of a succession the log likelihood for T observations of a Gaussian locally of local spectra fθt for the varying values {θt} and we chose stationary process is derived the prior such that p(θ1:T ) forms a Markov chain: p(θ1:T )= T −1 p(θ1) p(θt+1|θt). This flexible approach allows us T π t=1 1 to easily incorporate information about the shapes for the Lθ(y1:T ) := C + − [log(2πfθ(t,ω)) + 4π ˆ possibleQ “local” spectra as well as smoothness or jump in the t=1 −π X sequence {θ }. This, together with the sequential constraint I˜ (t,ω)/f (t,ω)]dω, (6) t T θ naturally leads us to formulate the related inference about where C is a constant and I˜T (t,ω) is the pre-periodogram the process as that of a filtering problem in a dynamical system framework. Due to the non-linearity involved, one ˜ IT (t,ω) = y⌊t+1/2+k/2⌋ needs to resort to a particle filter in in order to estimate the N {k∈ :1≤t+1/2±k/2≤T } sequence of posterior distributions {p(θ |y )}. A crucial ∗ X t 1:t y⌊t+1/2−k/2⌋ exp(−ıωk), (7) point in order to carry out this inference sequentially in time with ∗ denoting complex conjugation and ⌊⌋ the largest is the ability to recursively estimate the evolving spectrum; integer smaller than the argument (the pre-periodogram is this is detailed in section III. a particular discretisation of the Wigner-Ville distribution, and is essentially an estimator of the evolutionary spectrum). III.STATE-SPACE FORMULATION AND PARTICLE FILTER It is noted in [21] that the likelihood approximation may A. Statistical model be factorised as L = T l , where each factor can N θ t=1 θ,t We assume that the data {yt} ⊂ Y is generated by the be thought of as the “local log likelihood” at time t. By following state-space model analogy with the stationaryP case, it is suggested by [21] 1 π to maximise this approximate likelihood over values of θ, log(g(yt|θt,y1:t−1)) = C − [log(2πfθt (ω)) + approximating a maximum likelihood estimator. Theoretical 4π ˆ−π ˆ results are provided concerning the asymptotic consistency I(t,ω)/fθt (ω)]dω (8) 4 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. XX, NO. XX, JANUARY 20XX

for some constant C, a time-frequency estimate Iˆ(t,ω) of available. The empirical distribution the the evolving spectrum and a parametric model for the 1 N evolving spectra fθ : [−π, π] → [0, +∞) dependent on a N t pˆ (dθ1:t−1| y1:t−1)= δθi (dθ1:t−1) (14) N 1:t−1 parameter θt ∈ Θ. The sequence {θt} is a Markov chain with i=1 initial distribution θ ∼ µ and θ |θ ∼ f(|θ ). Note X 0 t 1:t−1 t−1 is an approximation of p (dθ | y ), where δ (dθ ) that there are numerous other possibilities for the choice of 1:t−1 1:t−1 θ0 0 π represents the delta Dirac mass function located in θ . Now the log-likelihood, such as − |f (ω) − Iˆ(t,ω)|2dω, but 0 −π θt at time t, we wish to produce N particles which will that their choice might be more´ difficult to justify statistically define an approximation pˆN (dθ | y ) of p (dθ | y ). than ours (see section III-E). 1:t 1:t 1:t 1:t The simplest method to achieve this consists of sampling i i Example 1. Assume that we are interested in tracking θt ∼ f(| θt−1). The resulting empirical distribution of the i chirps, but that we are not given enough information con- particles {θ1:t} is an approximation of the joint density cerning the evolution the instantaneous frequency of the p (dθ1:t−1| y1:t−1) f (θt| θt−1). We correct for the discrep- process. In such situations one for example may choose that ancy between this density and the target p (θ1:t| y1:t) using for θ =(a,ω0,s) ∈ Θ = [0, +∞) × [0, π) × [0, +∞) importance sampling. This yields the following approximation of p (dθ1:t| y1:t) 0 2 fθ(ω)= a exp(−(ω − ω ) /s)I(0 ≤ ω ≤ π) (9) N N i p (dθ1:t| y1:t)= W δ i (dθ1:t) , (15) or t θ1:t a i=1 fθ(ω)= I(0 ≤ ω ≤ π). (10) X 1+(ω − ω0)2/s where N Then one can suggest the following a priori evolution i i i in time of the parameter of the evolving spectrum, where Wt ∝ g(yt|θt,y1:t−1) and Wt = 1. (16) i=1 N (.; ,σ2) denotes a normal distribution with variance σ2: X To obtain an unweighted approximation of p (dθ1:t| y1:t) 0 2 I i at+1|(at,ωt ,st) ∼N (at+1; at,σa) (at+1 ≥ 0) (11) of the form (14), we resample particles {θ1:t} according 0 0 0 0 2 I 0 to probabilities proportional to their weights {W i}. The ωt+1|(at,ωt ,st) ∼N (ωt+1; ωt ,σω) (0 ≤ ωt+1 ≤ π) t (12) underlying idea is to get rid of particles with small weights and multiply particles which are in the regions with high s |(a ,ω0,s ) ∼N (s ; s ,σ2)I(s ≥ 0). (13) t+1 t t t t+1 t s t+1 probability masses. Many such resampling schemes have Using a model such as that defined above, it is possible been proposed in the literature; see [31]. The resampling to use static inference methods such as MCMC for the step is crucial for the method to work in practice. This brief description only covers the simplest particle inference of the parameters {θt}. However, our aim is to filtering algorithms since we use nothing more complex than estimate {θt} recursively in time as the data {yt} become this in our application. For more difficult problems, more available. In our framework all information about {θt} is sophisticated algorithms will be required (see [32] for a given by the so-called filtering distributions {p(θt|y1:t)} which cannot be computed analytically and recursively in recent review). time due to the intractability of the likelihoods involved. Hence we suggest here to resort to a particle filter algorithm. C. Computation of the likelihood There are two issues that require resolution in order to implement the particle filter in practice. Firstly, each evaluation B. Particle filters of the likelihood (equation (8)) requires the calculation of an Particle filters fall in the category of Monte Carlo algo- intractable integral. Thus in practice our choice of likelihood rithms, whose principle consists of replacing the difficult to is analogous to the numerical approximation used in the sta- use algebraic representation of a probability density with tionary case by [24]. In particular, we use a likelihood based a non-parametric representation in terms of (dependent) on fixed, evenly spaced grid of M frequencies ωk = πk/M samples from the underlying distribution. The concentration over [0, π]: of samples in a particular region of the space is repre- 1 M sentative of the probability distribution that we are trying log(ˆg(y |θ ,y )) = C − [log(2πf (ω )) + to approximate. This turns out to be a powerful principle t t 1:t−1 2 θt k k=1 which has the major advantage of circumventing analytical X Iˆ(t,ω )/f (ω )] (17) intractability in complex systems. We now briefly describe k θt k how such methods can be used in the present context in The grid of frequencies must be chosen such that it is fine order to perform sequential inference. enough to capture the important features of the modelled Assume that at time t − 1, a collection of N (N ≫ 1) and observed spectra. However, M should not be larger i random samples {θ1:t−1|i = 1,...,N}, called particles, than necessary since the sum in equation (17) dominates the distributed approximately according to p (dθ1:t−1| y1:t−1) is computation time of our particle filtering algorithm. Moving EVERITT et al.: ONLINE BAYESIAN INFERENCE IN SOME TIME-FREQUENCY REPRESENTATIONS OF NON-STATIONARY PROCESSES 5

further from the statistical justification for our choice of Parameter ρ will control the bias/variance tradeoff. The re- likelihood, one can envisage that for some applications it cursions are initialised with µ1 = y1, A1 = y2y1 exp(−ıω), may be useful to choose the set of frequencies in a different B1 = y1 exp(−ıω) and given for i ≥ 1 by way. For example, in the analysis of long memory processes (23) one may follow [33], omitting higher frequency terms. µt+1 = (1 − ρ)µt + ρyt+1 The second problem we face in the evaluation of the y¯t+1 = yt+1 − µt+1 (24) likelihood is the calculation of the time-frequency estimate At+1 = (1 − ρ)At + ρy¯t+1Bt (25) Iˆ(t,ω) at every iteration. For the method to be computation- Bt+1 = exp(−ıω)(Bt +y ¯t+1) (26) ally feasible, it is required to estimate the evolving spectrum Σ = (1 − ρ)Σ + ρy¯2 , (27) recursively. The method used here is as follows. Consider t+1 t t+1 ¯ E the PSD of a stationary process (with {Yt = Yt − (Yt)}) from which one can evaluate the estimate Iˆ(t,ω): +∞ ˆ 1 E ¯ 2 E ¯ ¯ I(t,ω)= (Σt + 2At) . (28) 2πf(ω) = (Yt ) + 2 YtYt−τ 2π τ=1 X Generalisations to multivariate time series, and a time- exp(−ıωτ) (18) scale variant, are also possible. A common modification is +∞ E ¯ 2 E ¯ ¯ the introduction of a “lag window” λ into the definition of = (Yt ) + 2 [Yt Yt−τ +∞ B = y¯t−τ λ(τ)exp(−ıωτ) to reduce the variance of τ=1 τ=1 X the PSD estimator (see [34], for example). In this case, only exp(−ıωτ)], (19) equationP (26) changes. For example, if λ(τ)= ντ , we have where E is the expectation with respect to the probability dis- B = ν exp(−ıω)(B +y ¯ ). (29) tribution of the process. The recursive estimation of the first t+1 t t+1 term from realisations of {Yt} is standard (see the recursion D. Overall algorithm for {Σt} below), but this is far less standard for the second term. However one can notice that the second term can be An example of the method described in this section is interpreted as being the product of two random variables, given in algorithm 1. Here the SIR particle filter [35] is ¯ +∞ ¯ Yt and B = τ=1 Yt−τ exp(−ıωτ) (whose distributions used, with exponential windows in lag and time for the time- are by assumption independent of the time index t). One frequency estimate. Note that this particular algorithm is pre- can generate randomP variables asymptotically distributed sented only for expository purposes - for many applications according to the distribution of B using the following more sophisticated algorithms may be used. recursion (assuming here for simplicity that E(Yt) = 0), initialised with B1 = Y1 exp(−ıω) and defined for t ≥ 1 as E. Discussion of the choice of likelihood

Bt+1 = exp(−ıω)(Bt + Yt+1). (20) Our favoured model for the data, in equation (17), is derived through taking several different approximations to Naturally these random variables are dependent by con- the exact likelihood of a locally stationary Gaussian process. struction, but under rather mild ergodicity assumptions on Speciﬁcally, these approximations are: one can construct the following estimator for {Yt} t ≥ 2 1) the use of a Whittle/Dahlhaus approximation, as in 1 t−1 equation (6); A = Y B . (21) 2) the use of a numerical approximation to the integral in t t j+1 j j=1 equation (6), analagous to the one in 3 in the stationary X This sum can itself be recursively updated as follows case; 3) substituting the pre-periodogram I˜T (t,ω) in equation 1 1 (6) for our own recursive estimator in equation (28). At+1 = (1 − )At + Yt+1Bt, (22) t + 1 t + 1 The effect of each of these approximations has been inves- and the PSD easily estimated by combination with an tigated in previous work, most of which focusses on the use estimator of the variance. of the approximations in the maximum likelihood setting, In the non-stationary case {At} can be thought of as thus the theoretical results about the approximations only being “instantaneous” estimates of the cross term in the concern the large data limit. In this limit it has been shown expression for the PSD. These estimates are likely to have a in [21] that when the ﬁrst two approximations are used, the high variance and hence averaging over several neighbouring approximate likelihood tends to the exact likelihood. [30] time instants might be needed, resulting in the introduction proves an equivalent result when local periodograms are used of the classical bias/variance dilemma. An example of such as an alternative to the pre-periodogram, but no equivalent a smoothing procedure is described below, where the time result yet exists when our recursive estimator is used. dependent and decreasing stepsize 1/(t + 1) in the recur- There is a further difference between the approach in [21] sion above has been replaced with a time invariant factor and our chosen likelihood, which is that we use a different ρ ∈ [0, 1], whose role is to discard estimates far in the past. parameterisation of the plane. In [21] the time-frequency 6 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. XX, NO. XX, JANUARY 20XX

Algorithm 1 The SIR particle ﬁlter applied to sequential linear and one quadratic, and the second (with a sampling inference of a TFR of a process. frequency of 104Hz) consists of three components, two of T Input: A realisation of a time series, {yt}t=1, a set of which exhibit a frequency modulation. In this section we will M frequencies {ωj}j=1, a number of particles P , parameters apply our methodology to sequentially infer the fundamental of the recursive estimator ρ, ν. frequency of the two chirps. (i) (i) P Output: A weighted sample (θt+1,wt+1)i=1 from the Spectrograms of these signals are shown in Fig. 1 (a) and posterior p(θt+1 | y1:t+1) on receipt of each data point. Fig. 3 (a) respectively. For the ﬁrst signal, since the chirps overlap in time-frequency space, separately inferring their

µ1 = y1; fundamental frequencies is not completely straightforward y¯1 = 0; directly from the spectrogram (or other traditional TFRs). Σ1 = 0; However, similar problems to this are often encountered in for j =1: M target tracking and are routinely solved through the use of ωj B1 = 0; parametric models. The framework developed in this paper ωj A1 = 0; allows us to apply the same approach here. We note that end this problem is relatively simple - our reason for including for i =1: P these analyses is expository: firstly as a simple example of (i) Simulate θ1 ∼ p(θ1); the utility of a parametric approach; and secondly since the (i) existence of a ground truth enables a quantitative analysis Let w1 = 1/P ; end of the effect of different parameters of the algorithm. for t =1: T − 1 µt+1 = (1 − ρ)yt+1 + ρµt; B. Bayesian model y¯t+1 = yt+1 − µt+1; To model the components we parameterise the time- T Σt+1 = (1 − ρ)¯yt+1(¯yt+1) + ρΣt; frequency plane so that at each time at which the signal for j =1: M is observed, frequency space is modelled using a mixture of ωj ωj T ωj At+1 = (1 − ρ)¯yt+1(Bt ) + ρAt ; K kernels plus a constant. Specifically we use the model: ωj ωj B = ν exp(−iωj)(¯yt+1 + B ); t+1 t K ˆ 1 ωj a(k) I(t + 1,ωj)= (Σt+1 + 2At+1); 2π fθ(ω)= + c (30) end (k) 2 (k) k=1 1+ ω − µ /v for i =1: P X (i) (i) The parameter µ(k) represents the positions of the two com- Simulate θt+1 ∼ p(. | θ1:t,y1:t+1); ponents. To allow us to distinguish between the components Reweight w(i) = w(i) exp − 1 M log (2π)2d t+1 t 2 j=1 when they have the same location, we also include the −1 ′(k) (i) tr (i) ˆ det fθ (t + 1,ωj) + fθ (t + 1,ωPj) I(t + 1,ωj) derivative, µ , of the component position in our model. In t+1 t+1 K ; i h io this model θ = µ(1), µ˙ (1),v(1),a(1) ,c ∈ Θ, where end k=1 µ(k) ∈ [0, π), µ˙ (k) ∈ (−∞, +∞), v(k),a(k),c∈ [0, +∞). Resample (i) (i) P ; (θt+1,wt+1)i=1 Our a priori model for the evolution of each of these end parameters is a random walk, except for the location parameters for which we use a constant velocity (CV) model. We have plane is parameterised by a single parameter θ, whereas in 2 I ct+1|θt ∼N (ct+1; ct,σc τ) (ct+1 ≥ 0) (31) our work we use a vector of parameters θ1:T , one for each data point. This is significant since [21] studies the behaviour and (k) (k) (k) 2 I (k) of the likelihood in the large data limit, with the dimension vt+1|θt ∼N (vt+1; vt ,σvτ) (vt+1 ≥ 0) (32) of the parameter unchanged in this limit. In our approach the (k) (k) (k) 2 I (k) size of the parameter increases with the size of the data, thus at+1|θt ∼N (at+1; at ,σaτ) (at+1 ≥ 0) (33) the theory of [21] does not directly apply to our likelihood. (k) (k) (k) µt+1 µt+1 µt The sense in which our chosen model approximates the exact (k) |θt ∼ MVN (k) ; A (k) ,Q , likelihood of a locally stationary Gaussian process is a topic µ˙ t+1 ! µ˙ t+1 ! µ˙ t ! ! for future work that would provide statistical justification for (34) where our approach. 1 τ A = (35) 0 1 IV. APPLICATION: TRACKINGMULTIPLECOMPONENTS and A. Data description and spectrogram τ 3/3 τ 2/2 Q = σ2 , (36) τ 2/2 τ µ In this section we apply our method to the analysis of two signals, both over the domain 0 ≤ t ≤ 2: the first (with a with τ = 0.001 for the first signal and τ = 0.0001 for sampling frequency of 103Hz) consists of two chirps, one the second. We used algorithm 1, and our default parameter EVERITT et al.: ONLINE BAYESIAN INFERENCE IN SOME TIME-FREQUENCY REPRESENTATIONS OF NON-STATIONARY PROCESSES 7

settings are to use 500 particles, to take ν = 0.97 and parameter, large deviations from the constant velocity model 2 −5 2 −3 2 −3 ρ = 0.97, and to use σc = 10 , σv = 10 , σa = 10 are possible, thus the ability of the algorithm to distinguish 2 and σµ = 1. These prior parameters are not optimal. They between the two separate components is diminished, also were determined through a consideration of how much the resulting in a large error. parameters might evolve in one time step and tuned through Finally we consider the analysis of the second signal. Fig. pilot runs (we note that more accurate results may be 3 (b) shows the estimated expected fundamental frequencies obtained by using more particles and appropriately tuned of the components from the output of the particle filter. We priors). Below we examine the sensitivity of our results to observe that the three components are successfully tracked, the choice of priors. along with the frequency modulation that is observed in the spectrogram for the two lower frequency components. C. Results We first consider the analysis of the first signal. Fig. V. APPLICATION: FLUTE 1 (b), (c) and (d) display the results of a single run of A. Pitch transcription the algorithm under the default settings. The TFR obtained TFRs are a natural tool for the analysis of musical data ˆ by the recursive non-parametric estimator, I(t,ω) given by (for example, [15], [36], [37]). Further, as described in [38], equation (28) and estimated expected reconstruction from such data is particularly amenable to a Bayesian analysis: the particle filter, i.e. E for each time are θt|y1:t [fθt (ω)] t often accurate physical models for the notes produced by shown in Fig. 1 (b) and (c) respectively. The non-parametric different instruments are known, thus the use of prior estimator gives comparableb results to that of the spectrogram knowledge in parameterising the time-frequency plane is in Fig. 1 (a), and we observe that the components in the natural. In this section we consider the problem of pitch Bayesian reconstruction of the TFR follow the chirps closely. transcription: estimating the pitch, onset time and duration However, the strengths of our approach are not fully illus- of notes in a music signal. In the frequency domain a trated through simply examining the TFRs that are inferred. note in a music signal consists of a fundamental frequency, More detailed information about the signal under study can determining the pitch of the note, and partials or harmon- be obtained through examining the posterior distribution ics at approximately integer multiples of the fundamental of θt itself. In particular, the posterior distribution on the frequency, whose amplitudes dictate the note’s timbre. In (k) µ allows us to retrieve estimates of the fundamental the polyphonic case, the resultant complex structure in the frequencies of the chirps over time. Fig. 1 (d) shows the time-frequency plane makes pitch transcription challenging (k) estimated expected fundamental frequencies, E (k) [µ ] µt |y1:t [16]. Here we consider the monophonic case as a simple of the chirps estimated from the particle filter (compared to illustration of the methodology introduced in the paper. We the true values). We observe that the Bayesianb approach has use a prior model for the time-frequency plane similar to successfully tracked the two overlapping components. that in [39] and the likelihood described in section III. We now explore the dependence of our method on the chosen prior parameters through examining the mean squared error of the posterior expectation of the fundamental frequency B. Transcription of monophonic flute data of each component over multiple runs of the particle filter. We use our methodology for the transcription of a Note that this method is not necessarily the most appropriate flute playing the opening twelve seconds of Debussy’s way to evaluate a Bayesian technique since the aim of Syrinx, available from the first authors’s webpage at such an approach is not usually to obtain estimators with http://www.personal.reading.ac.uk/ good frequentist properties, but it does provide a quantitative ~gt904211/flute.wav. In frequency space, we choose approach to describing the sensitivity of our method to the the following harmonic model for a flute note: choice of prior. Fig. 2 shows the log mean squared error over K time of several choices for the different parameters compared (k) (k) 0 2 fθ(ω)= c + a / 1+((ω − (k + δ )ω ) /S) . to the default choice. In general we observe some sensitivity k=1 to prior choice: changing the prior standard deviation by X (37) an order of magnitude for any of the parameters can have For the data we analyse taking K = 3 (so that two a large effect on the results, although finer tuning of the partials are modelled) is sufficient to account for the most parameters was not found to be necessary. These observa- important parts of the signal (although note that, as expected, tions are congruent with the situation that is encountered in higher partials are observed in the data). In this case, many other target tracking problems. The effect of altering θ = (ω0,S,a(1),a(2),a(3),δ(1),δ(2),δ(3),c) ∈ Θ, with fun- 2 the prior on the σµ parameter is particularly clear. For damental frequency ω0 ∈ [0, π), peak width S ∈ [0, +∞), very small values of the parameter the prior informs the peak amplitudes a(k) ∈ [0, +∞), detuning parameters [38] posterior more than the likelihood, and thus the posterior δ(k) ∈ (−∞, +∞) and constant c ∈ [0, +∞). very closely follows the constant velocity model with the In passing we note that, as in [38], polyphonic data can be result that it completely loses track of the quadratic chirp modelled simply by using a sum over several such models relatively quickly (by 0.3s). Whereas, for large values of the (one term per note). Our a priori model for the evolutionary 8 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. XX, NO. XX, JANUARY 20XX

(a) A spectrogram. (b) The TFR inferred by the obtained by the recursive non-parametric estimator given by equation (28).

(c) The TFR given by the posterior expectation from (d) The posterior expectations of µ(1) and µ(2) esti- the particle ﬁlter. mated from the particle ﬁlter output (black) and the true fundamental frequencies (grey).

Fig. 1: Analysis of the first signal. spectrum of a note is that the parameters in θ each evolve of magnitude of the initially specified priors was important. 2 −2 with random walks (some truncated): For example, we found that setting σω0 = 10 results in 0 0 0 2 0 the filter losing resolution of the finer features (such as the ω |θ ∼N (ω ; ω ,σ 0 )I(0 ≤ ω ≤ π) (38) t+1 t t+1 t ω t+1 frequency modulation) of the evolution of the note (although 2 I St+1|θt ∼N (St+1; St,σS) (St+1 ≥ 0) (39) we observed that a change of a similar magnitude to another (k) (k) (k) 2 I (k) of the parameters does not have such a dramatic effect). We at+1|θt ∼N (at+1; at ,σa) (at+1 ≥ 0) (40) used algorithm 1 with 100 particles, taking ν = ρ = 0.999. (k) (k) (k) 2 (41) δt+1|θt ∼N (δt+1; δt ,σδ ) The data was downsampled to 22050 Hz. 2 ct+1|θt ∼N (ct+1; ct,σc ) (42) The log of the TFR obtained by the recursive non- The parameters of these priors were chosen in such a way parametric estimator (given by equation (28)) and log of the as to allow the note to change quickly enough to describe reconstruction from the expected parameter values found by normal musical data, but not so much as to take account the particle filter are shown in Fig. 5. In both representations of unrealistic changes (in which case a large number of we observe the basic structure of the signal: the rising and particles would be needed to obtain accurate results). Specif- falling notes played by the flute; the “steps” representing 2 −4 2 −12 2 −8 the individual notes that are played; and the decomposition ically, we chose σω0 = 10 , σS = 10 , σa = 10 , 2 −16 2 −10 of the signal into a fundamental frequency and partials (in σδ = 10 and σc = 10 : these choices are sufficiently small to impose the desired smoothness on the inferred the non-parametric estimator the log scale also shows higher time-frequency plane, whilst still large enough to allow the order partials that are not included in the model). changes in note that we expect in the flute data. Pilot runs The Bayesian reconstruction of the time-frequency plane of the algorithm suggest that the sensitivity of our results is smoother than the non-parametric version (as is promoted to these priors is not dissimilar from our observations in by our choice of prior), but not so much as to obscure the section IV, in that intricate tuning is not required in order discrete changes between the notes. Again, we emphasise to obtain adequate results, but pilot runs to check the order that a more direct use of the posterior distribution of our EVERITT et al.: ONLINE BAYESIAN INFERENCE IN SOME TIME-FREQUENCY REPRESENTATIONS OF NON-STATIONARY PROCESSES 9

(1) (1) (a) log MSE of the estimate of µt for (b) log MSE of the estimate of µt for (c) log MSE of the posterior expectation of (d) log MSE of the posterior expectation of different values of σ2 . different values of σ2. (1) 2 (1) 2 µ v µt for different values of σa. µt for different values of σc .

(2) (2) (e) log MSE of the estimate of µt for (f) log MSE of the estimate of µt for (g) log MSE of the posterior expectation of (h) log MSE of the posterior expectation of different values of σ2 . different values of σ2. (2) 2 (2) 2 µ v µt for different values of σa. µt for different values of σc .

Fig. 2: The log mean-squared error of the posterior expectations of the fundamental frequency of the linear chirp (top) and the quadratic chirp (bottom) over 50 runs of the particle ﬁlter for different prior parameters. In each case the result for the default parameters uses a solid line.

(a) A spectrogram. (b) The posterior expectations of µ(1),µ(2),µ(3) estimated from the particle ﬁlter output (black) and the true fundamental frequencies (grey).

Fig. 3: Analysis of the second signal.

parametrisation of the time-frequency plane can reveal a where much richer and subtle structure underpinning the signal being analysed. Examination of the posterior distribution of 12 22050ω0 note(ω0) = 69 + log − log(440) our parameterisation of the time-frequency plane can tell log(2) 2π us more about the data. For example, Fig. 6 (a) shows the (43) estimated expected note played by the ﬂute at each time step, is the function that converts the fundamental frequency of the 0 E 0 [note(ω )], found by the sample mean of note(.) note into its MIDI pitch number. We observe that our method ωt |y1:t evaluated at the fundamental frequency of each particle, has successfully tracked the short changes in the note played b by the ﬂute, but also see the superimposed modulation that is 10 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. XX, NO. XX, JANUARY 20XX

heard in the audio file towards the end of the longer notes; and we expect parallel implementations to make a significant a feature that is not immediately obvious from the TFRs contribution here. We note that this paper only considers a themselves. Further, our Bayesian approach also allows us to relatively low-dimensional example, but we also expect the assess the uncertainty associated with such summaries of the methods introduced here to be of use in high dimensional data. For example, Fig. 6 (b) shows the posterior uncertainty settings. In such cases, since particle filters can become over the note played by the flute at each time, as represented challenging to implement for high-dimensional problems, by the transformation of the fundamental frequency of each the careful design of the particle filtering algorithm becomes particle into its pitch number. Fig. 6 (c) shows the standard more important, but the basic framework remains the same. deviation of the expected note estimated from 50 runs of We anticipate that in such cases, the simple SIR filter will not the particle filter - we see that this error is small compared be appropriate, and the users of the method introduced here to the posterior uncertainty. We find 100 particles to be will need to draw on the large literature (e.g. [32]) devoted sufficient for this application - using 1000 particles does to the design of particle filters in such settings. There is also not dramatically alter the results. a clear opportunity for the study of the theoretical properties of our approach. Whilst our local Whittle likelihood is inspired by that in [21], our approach contains several differences and thus the theoretical results in that paper do not apply. We note that in order to formalise the notion of our approach being used to infer a TFR, some constraints on the choice of prior may be necessary; for example, it may be important to represent the well documented Uncertainty Principle between time and frequency [40].

APPENDIX Definition of the evolutionary spectrum The framework is similar to that for stationary processes. Suppose that {Xt,N | t = 1,...,N} is a Dahlhaus locally stationary process with transfer function matrix Λ0 and mean Fig. 4: The standard deviation of the estimated expected note function vector µ. That is, directly from Dahlhaus [21], there found from 50 runs of the particle filter. exists a representation t π VI.CONCLUSIONS X = µ + exp{ıωt}Λ0 (ω)dξ(ν) (44) t,T T ˆ t,T This paper presents a new methodology for the sequential −π Bayesian estimation of a TFR of a time series, providing a with the following properties: connection between Bayesian estimation and the literature 1) ξ(ω)is a complex valued Gaussian vector process on on TFRs. The key underpinning ideas in the work are the [−π, π] with ξa(ω)= ξa(−ω), E[ξa(ω)] = 0 and use of a likelihood motivated by the extension of the Whittle approximation in [21] and, within this, the use of a recursive E[dξa(ω1) dξb(ω2)] = δabη(ω1 + ω2) dω1 dω2, (45) estimate of a time-frequency estimate. Our approach has the where η(ω) = ∞ δ(ω + 2πj) is the period 2π following features: j=−∞ extension of the Dirac delta function. 1) The Bayesian approach is used, which allows the use 2) There exists a constantP K and a 2π-periodic matrix of prior knowledge in the modelling of a TFR. valued function Λ : [0, 1]×R → Cd×d with Λ(u,ω)= 2) Signals are modelled directly in the time-frequency Λ(u, −ω) and domain in a flexible manner. 3) Inference of the evolutionary spectrum is performed t sup Λ0 (ω) − Λ ,ω ≤ KT −1 (46) sequentially, which is particularly useful for some t,T ab T t,ω applications. for all a,b = 1,...,d and T ∈ N . Λ(u,ω) and µ(u) The work in this paper paves the way for a wider use of are assumed to be continuous in u. Bayesian methods in time-frequency analysis. Our frame- T work permits the freedom to use the full power of Bayesian We define that f(u,ω) = Λ(u,ω)Λ(u,ω) is the time- methodology in time-frequency estimation: this should make varying spectral density matrix or evolutionary spectrum of a significant impact in applications where advanced mod- the process. els (for example, trans-dimensional and/or semi-parametric models) are appropriate. REFERENCES There are several opportunities for future research arising [1] W. J. Pielemeier, G. H. Wakefield, and M. H. Simoni, “Time- from this work. We have already mentioned that the compu- frequency analysis of musical signals,” Proceedings of the IEEE, tational cost of the algorithm is an important consideration, vol. 84, no. 9, pp. 1216–1230, 1996. EVERITT et al.: ONLINE BAYESIAN INFERENCE IN SOME TIME-FREQUENCY REPRESENTATIONS OF NON-STATIONARY PROCESSES 11

(a) The TFR inferred by the obtained by the recursive non-parametric (b) The TFR obtained by taking posterior expectations from the particle ﬁlter estimator introduced in section III-C. output.

Fig. 5: TFRs inferred from the ﬂute data.

(a) The expected note played by the flute. (b) The posterior uncertainty about the note played by the flute, as represented by the output of the particle filter.

Fig. 6: The note played by the ﬂute, estimated from the output of the particle ﬁlter.

[2] J. Dripps, “An introduction to time-frequency methods applied to [5] M. Davy, A. Gretton, A. Doucet, and P. J. W. Rayner, “Optimized biomedical signals,” in Time-Frequency Analysis of Biomedical Sig- Support Vector Machines for Nonstationary Signal Classiﬁcation,” nals (Digest No. 1997/006), IEE Colloquium on. IET, 1997, pp. IEEE Signal Processing Letters, vol. 9, no. 12, 2002. 1–1. [6] C. Richard, “Time-frequency-based detection using discrete-time [3] I. Spulber, P. Georgiou, A. Eftekhar, C. Toumazou, L. Duffell, discrete-frequency Wigner distributions,” IEEE Transactions on Sig- J. Bergmann, A. McGregor, T. Mehta, M. Hernandez, and A. Burdett, nal Processing, vol. 50, no. 9, 2002. “Frequency analysis of wireless accelerometer and emg sensors data: [7] B. W. Gillespie and L. E. Atlas, “Optimizing time-frequency kernels Towards discrimination of normal and asymmetric walking pattern,” for classiﬁcation,” IEEE Transactions on Signal Processing, vol. 49, in Circuits and Systems (ISCAS), 2012 IEEE International Symposium no. 3, 2001. on. IEEE, 2012, pp. 2645–2648. [8] E. Chassande-Mottin, P. Flandrin, and F. Auger, “On the Statistics of [4] P. Flandrin, “A time-frequency formulation of optimum detection,” Spectrogram Reassignment Vectors,” Multidimensional Systems and IEEE Transactions on Acoustics, Speech, and Signal Processing, Signal Processing, vol. 9, no. 4, 1998. vol. 36, no. 9, pp. 1377–1384, 1988. [9] S. Barbarossa, “Analysis of multicomponent LFM signals by a 12 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. XX, NO. XX, JANUARY 20XX

combined Wigner-Hough transform,” IEEE Transactions on Signal [35] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, “Novel approach to Processing, vol. 43, no. 6, 1995. nonlinear/non-Gaussian Bayesian state estimation,” IEE Proceedings [10] B. Boashash, “Estimating and interpreting the instantaneous frequency on Radar and Signal Processing, vol. 140, no. 2, pp. 107–113, 1993. of a signal,” Proceedings of the IEEE, vol. 80, no. 4, 1992. [36] A. Klapuri, “Pitch estimation using multiple independent time- [11] O. Michel, P. Flandrin, and A. O. Hero III, “Automatic extraction of frequency windows,” in Proceedings of the 1999 IEEE Workshop on time-frequency skeletons with minimal spanning trees,” in Proceed- Applications of Signal Processing to Audio and Acoustics, 1999, pp. ings of the IEEE International Conference on Acoustics, Speech, and 115–118. Signal Processing, 2000. [37] A. Klapuri and M. Davy, Signal processing methods for music [12] M. Davy and C. Doncarli, “A New Nonstationary Test Procedure transcription. Springer-Verlag, 2006. for Improved Loudspeaker Fault Detection,” Journal of the Audio [38] M. Davy and S. J. Godsill, “Bayesian harmonic models for musical Engineering Society, vol. 50, no. 6, 2002. signal analysis,” in Bayesian Statistics 7, J. M. Bernardo, M. J. [13] P. Bonato, S. H. Roy, M. Knaflitz, and C. J. De Luca, “Time- Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith, Frequency Parameters of the Surface Myoelectric Signal for Assessing and M. West, Eds. Oxford University Press, 2003, pp. 105–124. Muscle Fatigue during Cyclic Dynamic Contractions,” IEEE Trans- [39] C. Dubois and M. Davy, “Joint detection and tracking of time- actions on Biomedical Engineering, vol. 48, no. 7, 2001. varying harmonic components: a flexible Bayesian approach,” IEEE [14] F. Hlawatsch and F. Auger, Time-frequency analysis: concepts and Transactions on Audio, Speech and Language Processing, 2008. methods. Wiley, 2008. [40] P. Flandrin, Time-Frequency / Time-Scale Analysis. Academic Press, [15] P. J. Wolfe, S. J. Godsill, and W.-J. Ng, “Bayesian Variable Selection 1999, Translated from the French by Joachim Stöckler. and Regularization for Time-Frequency Surface Estimation,” Journal of the Royal Statistical Society Series B, vol. 66, no. 3, pp. 575–589, 2004. [16] M. Davy, S. J. Godsill, and J. Idier, “Bayesian analysis of polyphonic western tonal music,” Journal of the Acoustical Society of America, vol. 119, no. 4, pp. 2498–2517, 2006. [17] C. Andrieu, M. Davy, and A. Doucet, “Efficient Particle Filtering Richard G. Everitt Richard G. Everitt received for Jump Markov Systems. Application to Time-Varying Autoregres- a BSc in Mathematics from the University of sions,” IEEE Transactions on Signal Processing, vol. 51, no. 7, pp. Warwick in 2000 and a PhD in Statistics from 1762–1769, 2003. the University of Bristol in 2008. From 2008- [18] C. Dubois, M. Davy, and J. Idier, “Tracking of time-frequency com- 2010 he was a Brunel Fellow in Statistics at the ponents using particle filtering,” in Proceedings of the International University of Bristol, then from 2010-2012 was a Conference on Acoustics, Speech, and Signal Processing, 2005. postdoctoral researcher working on statistics and [19] O. Rosen, D. S. Stoffer, and S. Wood, “Local Spectral Analysis via pathogen genomics at the University of Oxford. a Bayesian Mixture of Smoothing Splines,” Journal of the American He is currently a Lecturer in Statistics at the Statistical Association, vol. 104, no. 485, pp. 249–262, 2009. University of Reading, with research interests in [20] S. Wood, O. Rosen, and D. Stoffer, “Adaptspec: Adaptive spectral Bayesian modelling, Monte Carlo methods, sta- estimation for nonstationary time series,” University of Pittsburgh, tistical genetics, pathogen genomics, epidemiology, social network analysis Tech. Rep., 2011. and signal and image processing. [21] R. Dahlhaus, “A likelihood approximation for locally stationary processes,” Annals of Statistics, vol. 28, pp. 1762–1794, 2000. [22] P. Whittle, “The analysis of multiple stationary time series,” Journal of the Royal Statistical Society, Series B, vol. 15, no. 1, pp. 125–139, 1953. [23] K. Dzhaparidze, Parameter estimation and hypothesis testing in spectral analysis of stationary time series. Springer, 1986. Christophe Andrieu Christophe Andrieu grad- [24] C. Carter and R. Kohn, “Semiparametric Bayesian inference for time uated from Telecom Paris-Sud and obtained his series with mixed spectra,” Journal of the Royal Statistical Society, PhD in statistical signal processing from Univer- Series B, vol. 59, no. 1, pp. 255–268, 1997. sity Paris XV. He was a research associate in [25] G. Petris and M. West, “Bayesian time series modelling and prediction the signal processing group at the University of with long range dependence,” Duke University, Tech. Rep., 1998. Cambridge 1998-2001. Since 2001 he has been [26] N. Choudhuri, S. Ghosal, and A. Roy, “Bayesian Estimation of the a faculty member of the Statistics group at the Spectral Density of a Time Series,” Journal of the American Statistical University of Bristol where he is now a professor Association, vol. 99, no. 468, pp. 1050–1059, 2004. in Statistics. His research interests are in the [27] R. Dahlhaus, “On the Kullback-Leibler information divergence of area of computational methods for statistics and locally stationary processes,” Stochastic Processes and their Appli- cognate areas. He has been on the editorial board cations, vol. 62, pp. 139–168, 1996. of JRSS-B and is currently on the editorial board of the Annals of Statistics. [28] ——, “Asymptotic statistical inference for nonstationary processes with evolutionary spectra,” Universität Heidelberg, Tech. Rep., 1996. [29] M. Neumann and R. von Sachs, “Wavelet thresholding in anisotropic function classes and application to adaptive estimation of evolutionary spectra,” Annals of Statistics, vol. 25, pp. 38–76, 1997. [30] R. Dahlhaus, “Fitting time series models to non-stationary processes,” Annals of Statistics, vol. 24, pp. 1–37, 1997. Manuel Davy Manuel Davy graduated in Engineering in 1996 from the [31] A. Doucet, N. De Freitas, and N. Gordon, Sequential Monte Carlo in Ecole Centrale de Nantes and he obtained his Ph.D. in Signal Processing Practice. Springer-Verlag, 2001. at the University of Nantes in 2000. He was a research associate at the [32] A. Doucet and A. M. Johansen, “A tutorial on particle filtering and University of Cambridge from 2000 to 2002, and became a full time smoothing: Fifteen years later,” in Oxford Handbook of Nonlinear researcher at the French National Research Center (CNRS) in the LAGIS Filtering, D. Crisan and B. Rozovsky, Eds. Oxford University Press, laboratory located in Lille, France, from 2002 to 2012. In 2008, he created 2009. the start-up company Vekia SAS, an editor of softwares to handle stock [33] H. R. Künsch, “Statistical aspects of self-similar processes,” in and manpower in shops by using forecast sales. The company employs 35 Proceedings of the First World Congress of the Bernoulli Society, persons. His research interests are Signal Processing, Bayesian statistics, Y. Prohorov and V. V. Sazanov, Eds., vol. 1. Utrecht: VNU Science Machine Learning, times series forecasting and audio signals. Press, 1987, pp. 67–74. [34] M. Priestley, Spectral Analysis and Time Series. Academic Press, 1981.