A&A 388, 1124–1138 (2002) Astronomy DOI: 10.1051/0004-6361:20020520 & c ESO 2002 Astrophysics

Joint Time–Frequency Analysis: A tool for exploratory analysis and filtering of non-stationary time series

R. Vio1,2 and W. Wamsteker2

1 Chip Computers Consulting s.r.l., Viale Don L. Sturzo 82, S.Liberale di Marcon, 30020 Venice, Italy 2 ESA-VILSPA, Apartado 50727, 28080 Madrid, Spain e-mail: [email protected]

Received 24 July 2001 / Accepted 3 April 2002

Abstract. It is the purpose of the paper to describe the virtues of time-frequency methods for signal process- ing applications, having astronomical time series in mind. Different methods are considered and their potential usefulness respectively drawbacks are discussed and illustrated by examples. As areas where one can hope for a successful application of joint time-frequency analysis (JTFA), we describe specifically the problem of signal denoising as well as the question of signal separation which allows to separate signals (possibly overlapping in time or frequency, but) which are living on disjoint parts of the time-frequency plane. Some recipes for practical use of the algorithms are also provided.

Key words. methods: data analysis – methods: numerical – methods: statistical

1. Introduction – The implicit assumption that it is possible to obtain the equations describing the evolution of a given sys- The analysis of time series has always been an issue of tem solely through the analysis of an experimental sig- central interest to astronomers. Indeed, the most natural nal. In reality, if one takes into account that a given way to get some insights on the characteristics of a phys- time series is only the one-dimensional output of a sys- ical system consists in studying its behavior over time. tem which very often can be described only in terms of Unfortunately, maybe because until few years ago the a set of ordinary and/or partial differential equations most interesting objects known to change their luminos- (in other words a time series is the one-dimensional ity were periodic/semiperiodic stars and multiple star sys- projection of a more complex dynamics), it is easy to tems, time series analysis became synonymous of research understand that this position is not well founded. of periodicities and power-spectrum became the tool for signal analysis. Only in the eighties people began to realize The consequence of these points is that if not carried out that various astrophysical objects, principally extragalac- in a well defined physical context, signal analysis can be tic sources, can show very complex and unpredictable time expected to be useful only for explorative purposes (e.g. evolutions (see, for example, Miller & Wiita 1991; Duschl feature detection) and/or for filtering tasks (e.g. removal et al. 1991). In spite of that, although inadequate, power- of unwanted components as, for example, noise or spurious spectrum continued to be the preferred approach by most contributions). Furtehrmore, in order to obtain meaning- of researchers. Such a predilection can be explained with ful results, the nonstationarity of signals must be explicitly the substantial failure of the methods proposed as alter- considered in the analysis. natives (Vio et al. 1992). The main limitations of such methods are two: 2. Data exploration – The implicit assumption of stationarity of the pro- cesses underlying the signals. In other words, an ob- 2.1. Limits of the classical approach served signal is assumed to be characterized by statis- tical properties that do not change with time. However, It is well known that the power-spectrum PS(ν)ofacon- real signals in their vast majority are nonstationary. tinuous and integrable (possibly complex) signal x(t)is defined as Send offprint requests to:R.Vio, e-mail: [email protected] PS(ν)=|FT(ν)|2, (1)

Article published by EDP Sciences and available at http://www.aanda.org or http://dx.doi.org/10.1051/0004-6361:20020520 R. Vio and W. Wamsteker: JTFA of non-stationary time series 1125

2 0.5 (a) 1 0.45

0 0.4 Flux

−1 0.35

−2 0.3 0 20 40 60 80 100 120 Time (s) 0.25

0.2 1 Frequency (Hz) (b) 0.8 0.15 0.6 0.1 0.4 0.2 0.05 Normalized Power

0 0 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0 20 40 60 80 100 120 Frequency (Hz) Time (s)

Fig. 1. a) Realization of the non-stationary process described Fig. 2. Grayscale image of the ideal power-spectrum IPS(t, ν) in the text. b) Corresponding normalized power-spectrum. corresponding to the signal shown in Fig. 1a. where the frequency range is the full real line, i.e. −∞ < spectrum clearly contains no information about the time ν<+∞,and intervals on which these two frequencies occur. Moreover, ∞ Z+ since PS(ν) =6 0 also at frequencies ν =6 ν1,ν2, one could FT(ν)= x(t)e−j2πνt dt (2) erroneously infer that x(t) contains more than two sinu- −∞ soidal components. The main problem is that, if a sig- nal changes its characteristic over time, a representation is the of x(t). PS(ν) can be considered limited exclusively to the frequency domain is absolutely as a measure of the similarity between signal x(t)andthe inadequate. Indeed, the function exp(−j2πνt)doesnot complex sinusoid exp(−j2πνt). Indeed, if FT(ν) is again have any temporal localization: it has neither start nor integrable, the Fourier inversion formula reads end. This is an useful characteristic only in case of sig-

Z+∞ nals with constant statistical properties (i.e. periodic or j2πνt stationary signals). Unfortunately, in nature this kind of x(t)= FT(ν)e dν, (3) signals is not so common. Therefore, for a good analysis of −∞ most “natural” signals it is necessary to use a representa- i.e., the signal can be written (uniquely) as a kind of super- tion that is able to consider also the temporal dimension. position of pure frequencies. Therefore, the larger PS(ν), A possible solution is a representation, that we call ideal the more important is the contribution of the pure fre- power-spectrum,IPS(t, ν), defined as the squared ampli- quency ν within x(t). This is the reason why PS(ν)isso tude of the sinusoids that at a given time instant t consti- useful in the search of periodic components. However, this tute the signal of interest. That is shown in Fig. 2 where is also its more serious limit in the analysis of non-periodic it is possible to realize that with IPS(t, ν), contrary to signals (Feichtinger & Strohmer 1998). Figure 1a shows a PS(ν), the characteristics of the signal in Fig. 1a are very signal constituted by two sinusoids with amplitudes dif- well defined; it is clear that such a signal consists of two ferent from zero only within a short time interval: pure sinusoids having different frequencies but amplitudes which evolve in the same way. x(t)=a1(t)sin(2πν1t)+a2(t)sin(2πν2t)(4) with ν1 =0.1Hz,ν2 =0.4Hzand 2.2. Time–frequency representation of continuous  signals 1if10≤ t ≤ 50 and 70 ≤ t ≤ 110 ; a (t),a (t)= (5) 1 2 0 otherwise, In the previous section we have seen that in case of non-stationary signals a joint time-frequency representa- with t expressed in seconds. The two key facts are: tion (JTFR) might be a good analysis tool. However, – this signal clearly is not periodic and the JTFR in Fig. 2 has been obtained directly from the – it does not maintain its characteristics constant over model with which the time series has been simulated. time (in poor words, it is non-stationary). Unfortunately, in an experimental context the dynami- cal model underlying a given time series will usually be From Fig. 1b it is evident that PS(ν) is not able to ex- unknown. Consequently, the problem consists in the way tract the salient features of x(t). In particular, the power such a JTFR can be estimated from a set of available data. 1126 R. Vio and W. Wamsteker: JTFA of non-stationary time series

The simplest solution is given by the so called Fourier principle, a possible way to mitigate this occurrence is the ,FS(t, ν), choice of a window w(t) for which the degradation of the time-frequency resolution is reduce to a minimum. In this | |2 FS(t, ν)= STFT(t, ν) , (6) regard, it has been proved (e.g., Qian & Chen 1996) that, if where −∞

it is easy to show that DFS[m, n]=|DSTFT[m, n]|2; (13) − ⊗ LXs 1 STFT(to,ν)=FT(ν) sincT (to,ν), (10) ∗ − DSTFT[m, n]= w [k − m] x[k]e j2πnk/Ls , (14) where the symbol “⊗” stands for convolution, and k=0

sin(Tπν) − j2πνto with m, n,0≤ m, n ≤ Ls − 1, representing the discrete sincT (to,ν)= e . (11) πν time and frequency indices, respectively.

This means that, in FS(to,ν), the narrowest structure Figure 3 shows the representation DFS[m, n] of a time along the frequency direction is charaterized by a fre- series obtained by sampling the signal in Fig. 1 with a time quency support equal to 2/T . Such value corresponds step of one second: in spite the loss of resolution described to the full width at zero intensity of the main lobe in above, the salient characteristics of the signal remain well 2 the graphical representation of |sincT (to,ν)| ; defined. – since FS(t, ν) is calculated by using all the points in a interval of length T centered at t,itsuffersalsoa 2.4. Some practicalities degradation of the resolution along the time dimension which is proportional to the length of w(t). Indeed, in 2.4.1. Digital implementation case of a signal constituted by an impulse at time t0, and again with a rectangular w(t), FS(t, ν) =0when6 The calculation of DFS[m, n] requires some technical ad- justments. The first one regards the edge effects, due to to − T/2 ≤ t ≤ to + T/2. In other words, the narrowest structure along the time direction is charaterized by a the finite length of the signal, that do not permit the time support equal to T . calculation of this representation for mLs − 1 − Lw/2, with Lw the length of w[k]. Usually From these two points it is easy to understand that the this problem is solved through one of the following three main limitation of FS(t, ν) lies in the impossibility to in- methods: crease simultaneously the time and the frequency resolu- tions. In other words, there is a trade-off: if w(t)ischosen 1) by assuming x[k]andw[k] to be periodic with period to have good time resolution (i.e., smaller T ), then its fre- Ls (NB. since usually Lw

Signal where 0 ≤ m

Flux ∆N) 1. In such a signal representation the coefficients −1 amn are called the Gabor coefficients and the Eq. (16) goes by the name Gabor representation (GR). Power DFS Given the above arguments, in order to avoid “un-

0.45 necessary” redundancy, one could be tempted to use a

0.4 SDFS[m, n] containing only Ls elements. In reality, numer-

0.35 ical stability of the expansion and easiness of the analysis

0.3 (Daubechies 1992, and see below), make it necessary to

0.25 work with redundant representations. The degree of re-

0.2 dundancy is expressed through the so called oversampling Frequency (Hz) Power Spectrum rate,sayα, 0.15 0.1 number of elements of SDFS[m, n] α = , (17) 0.05 length of x[k] 0 1600 800 0 20 40 60 80 100 120 Time (s) that will take values in the range [1,Ls]. When α =1, SDFS[m, n] is said to be sampled at a critical rate, whereas Fig. 3. Contour plot of the DFS[m, n] representation for the when α>1itissaidoversampled. Using a full short-time discrete time series, shown in the uppermost panel, correspond- ing to the signal in Fig. 1a when sampled at a frequency rate Fourier transform corresponds to the case α = Ls. of 1.0 Hz (the corresponding power-spectrum is shown in the leftmost panel). In this example the analysis function is an 2.4.2. Which Lw for w[k]? Hamming window with length Lw = 31. Levels are uniformed distributed between the 15% and 85% of the peak value. One important step in the construction of JTFR’s, such as SDFS[m, n], is the choice of the analysis function w[k]. Although the exact shape of this function is not so critical, 2) by zero padding one of the ends of x[k] with L data w the value of L can severely influence the final result. In and assuming the new time series to be periodic with w fact, in order to be meaningful, a SDFS[m, n] has to reflect period L + L (for an efficient implementation of this s w both the instantaneous frequency content and the (possi- method see Qian & Chen 1996); ble) non-stationarity of the signal. However, very often 3) by imposing that 0 ≤ m ≤ M , with M = L −L +1 T T s w these two requirements conflicts one with the other. The the number of possible shifts of w[k] inside the signal reason is that a large L is preferable when the frequency length (for an efficient implementation of this method w content is of interest and shows little changes, whereas see Munk 1997, 2001); the time evolution of a signal can be better tracked with If not stated otherwise, in the following the discussion will asmallLw. Again, we are forced to decide within a trade- be developed according to the first approach. off. A more serious drawback is represented by the fact Despite its importance and its long-time use in sig- that, by increasing the length of x[k], the corresponding nal processing, surprisingly this subject has found little DFS[m, n] becomes quickly an huge matrix. Indeed, al- attention in current literature (however, see Stankovic & 6 ready Ls = 1000 will result in a matrix with '10 ele- Katkovnik 1999; Stankovic 2001, and references therein). ments. A viable solution is to cut the signal in shorter A possible explanation is that this problem is not so well segments and then to analyze them separately. An alterna- defined. Indeed, the concept of “optimal” Lw is strictly tive comes out from the observation that DFS[m, n]isan linked to the problem at hand. For example, if one is in- ' 2 highly redundant representation; it is composed by Ls terested in tracking the “long term” evolution of a sig- elements that, however, have been obtained from only Ls nal rather than possible short “bursts”, then larger Lw’s data. Since Ls is the dimension of the signal space, at most represent the “best” choice. The contrary is true if the Ls elements can be linear independent. This means that, short term evolution is of interest. However, in certain ap- in principle, it is possible to recover arbitrary signals if plications (e.g. automatic signal detection) it should be samples of DFS[m, n] for at least Ls coordinates are avail- useful to have an “objective” method able to provide a able. According to this observation, Eqs. (13), (14) could “good” Lw without the intervention of an operator. Of be modified in course, that requires the adoption of an optimality crite- rion. At this regards, a possible choice is that Lw provides | |2 SDFS[m, n]= amn ; (15) the representation with the “best” global time-frequency resolution. In this case the following procedure is useful: amn =DSTFT[m∆M,n∆N] (16) – a set, say Λ, of integer values for Lw is fixed; LXs−1 ∗ − ∈ = w [k − m∆M] x[k]e j2πnk∆N/Ls – the SDFS[m, n]’s corresponding to the Lw Λ are k=0 calculated; 1128 R. Vio and W. Wamsteker: JTFA of non-stationary time series

Signal 60

1

0 Flux 50

−1

Power DFS 40

0.45

0.4 30

0.35 Item Count

0.3 20 0.25

0.2 Frequency (Hz) Power Spectrum 0.15 10

0.1

0.05 0 0 26 28 30 32 34 36 38 40 42 224 112 0 10 20 30 40 50 60 70 80 90 100 Window Length Time (s) Fig. 5. Histogram concerning the lengths L provided in Fig. 4. Grayscale image of the DFS[m, n] representation rel- w 100 simulations by the autocorrelation method described in ative to the chirp signal x(t)=sin[2πtν(t)] with ν(t)= − the text. The signal used in this simulation it is shown in the 4.0 × 10 3 × t +0.10 and 0 ≤ t ≤ 100 s, when sampled with uppermost panel of Fig. 4 and the target value is L = 31. a frequency rate of 1.0 Hz and contaminated with a discrete, w Gaussian white noise process with standard deviation equal to 0.3. Indeed, for saving computing time, this operation has to be carried out in the Fourier domain via the inverse double – for each of these SDFS[m, n] the two-dimensional au- Fourier transform of the bidimensional power-spectrum of tocorrelation function ACF[k, l] is computed: SDFS[m, n]. Although very fast, with this approach it is implicitly assumed that SDFS[m, n] is periodic with re- ACF[k, l] gards to both indices; this fact introduces edge effects. MX−1 NX−1 Fortunately, they can be avoided via zero padding; if in- = SDFS[m, n]SDFS[m + k, n + l] (18) terested in the correlation for lags as large as K along m=0 n=0 both dimensions, then both ends of SDFS[m, n]mustbe zero padded in such a way to obtain a matrix with dimen- with −M

−∞ Power Wigner−Ville Distribution Z+∞ 0.45

WVD(t, ν)dt =PS(ν). (20) 0.4

−∞ 0.35 In other words, the marginals with respect to frequency 0.3 and time have to provide, respectively, the squared signal 0.25 0.2 Frequency (Hz) and the corresponding power-spectrum. The form of this Power Spectrum function, that however is not the only one sharing proper- 0.15 ties (19) and (20), is given by (Flandrin 1999; Gro¨chenig 0.1 2001) 0.05 0 1600 800 0 20 40 60 80 100 120 WVD(t, ν) Time (s) Z+∞ Fig. 6. Grayscale image of the WVD[m, n] representation for = x(t + τ/2) x∗(t − τ/2) e−j2πντ dτ. (21) the time series shown in the uppermost panel of Fig. 3. −∞ In reality, the use of this JTFR has been rather limited last years, one of the most important subject in the field of because of two drawbacks: time-frequency analysis has been the research of methods able to reduce the importance of the interference terms –WVD(t, ν) can go negative. In fact, except very par- with the minimum deterioration of such properties. ticular cases, e.g., x(t) given by a Gaussian function, it will go negative for all signals. The non-positive prop- erty prohibits an energy interpretation of this repre- 2.5.2. The reassignment method sentation; It is possible to show (Flandrin 1999) that the spectro- – the quadratic nature of WVD(t, ν) determines the oc- gram FS(t, ν) is the 2D-convolution of the Wigner-Ville currence of non-linear effects when dealing with mul- distribution of the signal x(t), say WVDx(t, ν) with the ticomponents signals. Wigner-Ville distribution, say WVDw(t, ν) of the window This last point is particularly troublesome. In fact, if function w(t): x(t)=g(t)+r(t), it happens that FS(t, ν)

WVDx(t, ν) Z+∞ Z+∞ =WVD (t, ν)+WVD (t, ν) +2 · Re{WVD (t, ν)} − − | g {z r } | {zg,r } = WVDx(s, ξ)WVDw(t s, ν ξ)ds dξ. (24) Auto−Terms Cross−Terms −∞ −∞ (22) From this equation it is possible to see that the reason why where FS(t, ν), contrary to WVD(t, ν), does not present the in- terference terms is due to the fact that such terms are WVDg,r(t, ν) smoothed out by the convolution. Such operation, how- Z+∞ ever, deteriorates the time-frequency resolution. ∗ − − − = g(t + τ/2) r (t − τ/2) e j2πνt dτ. (23) A closer look at Eq. (24) shows that Ww(t s, ν ξ) delimits a time-frequency domain at the vicinity of the −∞ (t, ν) point, inside which a weighted average of the WVDx In general, the WVD(t, ν) corresponding to a signal con- values is performed. This is the reason of the bad time- taining N components will be characterized by N auto- frequency resolution of FS(t, ν). Indeed, the key point of terms, that constitutes the desired information, and 0.5 × the reassignment principle is that these values have no rea- N × (N − 1) real cross-terms which can be considered a son to be symmetrical distributed around (t, ν), which is sort of unwanted interference disturb. The worst point is the geometrical center of this domain; their average should that the intensity of such cross-terms is larger than those not be assigned at this point, but rather at the center of of the corresponding auto-terms. As shown in Fig. 6, that gravity of this domain, which is much more representa- can mean unreadable results. In spite of these problems, tive of the local energetic distribution of the signal. In WVD(t, ν) has very interesting properties not shared by practice, with the reassignment approach one moves each most of the other representations. For this reason, in these value of the spectrogram computed at any point (t, ν)to 1130 R. Vio and W. Wamsteker: JTFA of non-stationary time series

Signal of FS(t, ν). They start from the consideration that each 1 cross-term in WVD(t, ν) is located midway the pair of 0

Flux corresponding auto-terms and tends to highly oscillate in −1 both time and frequency directions. On the other hand, useful properties, such as the time marginal, frequency Power Reassigned Spectrogram marginal and instantaneous frequency, are obtained by av-

0.45 eraging WVD(t, ν). For example, the mean instantaneous

0.4 frequency νinst, a quantity that is commonly used for char- acterizing non-stationary signals, can be evaluated via: 0.35 Z 0.3

0.25 ν WVD(t, ν)dν Z · 0.2 νinst = (28) Frequency (Hz) Power Spectrum 0.15 WVD(t, ν)dν

0.1

0.05 These observations suggest that, if WVD(t, τ)canbe

0 decomposed as the sum of two-dimensional (time and 1600 800 0 20 40 60 80 100 120 Time (s) frequency) localized harmonic functions, then we could use the low-order harmonic terms to delineate the time- Fig. 7. Grayscale image of the FSr[m, n] representation for the dependent spectrum with reduced interference. High-order time series shown in the uppermost panel of Fig. 3. harmonics that introduce high oscillation have relatively small averages and thereby have negligible influence on ˆ another point (t, νˆ) which is the center of gravity of the the useful properties. The signal energy and useful prop- signal energy distribution around (t, ν): erties are mainly determined by a few low-order harmonic Z+∞ Z+∞ terms. On these bases, they propose the so called Gabor spectrogram GS representation s WVDw(t − s, ν − ξ)WVDx(s, ξ)dsdξ D ˆ −∞ −∞ XD t(t, ν)= ∞ ∞ Z+ Z+ GSD[k, l]= Pd[k, l] (29) d=0 WVDw(t − s, ν − ξ)WVDx(s, ξ)dsdξ −∞ −∞ where (25) X ∗ Pd[k, l]=2 amn amn Z+∞ Z+∞ ( Ad ) ξ WVDw(t − s, ν − ξ)WVDx(s, ξ)dsdξ − 0 2 − 0 2 × −[k ∆M(m + m )/2] − [l ∆N(n + n )/2] −∞ −∞ exp 2 2 νˆ(t, ν)= 2σk 2σl Z+∞ Z+∞  j2π 0 WVDw(t − s, ν − ξ)WVDx(s, ξ)dsdξ × exp [k(n − n )∆N L −∞ −∞ w    n + n0 (26) + l − ∆N ∆M(m0 − m) , (30) 2 and thus leads to a reassigned spectrogram, whose value at any point (t0,ν0) is the sum of all the spectrogram values where reassigned to this point: 0 0 0 0 Ad = {(m, m ), (n, n ); |m − m | + |n − n | = d}, (31) 0 0 FSr(t ,ν ) Z+∞ Z+∞ T√1 Lw√ 0 0 σk = ,σl = , (32) = FS(t, ν) δ(t − tˆ(t, ν)) δ(ν − νˆ(t, ν)) dt dν. 2 π T12 π −∞ −∞ 0 0 with 0 ≤ m, m ≤ Mtot − 1, −N/2 ≤ n, n ≤ N/2 − 1, (27) 0 ≤ k

−j2πmn∆M/N amn =e 2.5.3. Gabor spectrogram − LXw 1 ∗ − Recently, Qian & Chen (1996) have presented an interest- × w [k] x[k + m ∆M]e j2πkn/N , (33) ing technique for improving the time-frequency resolution k=0 R. Vio and W. Wamsteker: JTFA of non-stationary time series 1131

L =256, α=1, M=16, N=16 L =256, α=2, M=32, N=16 s s 0 ≤ m ≤ MT =[(Ls − Lw)/∆M +1],0≤ n

0.15 w[k] 0.15 0.1 0.1 3. Time-variant filtering of discrete signals Normalized g[k] 0.05 0.05 Although data exploration is an interesting application of 0 0 −100 −50 0 50 100 −100 −50 0 50 100 time-frequency analysis, the issue where such an approach k k shows all its potency is signal filtering (Matz & Hlawatsch Fig. 8. 2002; Hlawatsch & Matz 2002). Indeed, given the capa- Gaussian analysis window w[k] and the corresponding synthesis windows g[k] for different values of the oversampling bility of a JTFR in characterizing a non-stationary signal, rate α. For easyness of comparison, functions g[k] are normal- we may expect that a filtering technique, developed within ized to unit area. this framework, be able to adapt its action to the instanta- neous properties of the signal itself. Since it is much easier For oversampled GR, with α>2L /(L +∆M), the to treat this subject in the context of the linear represen- w w system of linear Eq. (36) is underdetermined. In other tations (WVD, GS and FS are bilinear representations D r words, g[k]andw[k] do not determine each other in a carrying no phase information in a form easy to use), we unique way. Hence, for a given g[k] there is a certain free- shall concentrate in our discussion on the use of the Gabor transform. dom in the choice of the corresponding dual complemen- tary w[k]. The solution suggested by Qian & Chen (1996) is to solve system (36) with the constraint, say Γ, that the 3.1. Some preliminary notes shapes of g[k]andw[k] be as close as possible (in least squares sense): We have seen that the GR of a discrete signal corresponds L −1   to map a one-dimensional array onto a two dimensional Xs w[k] g[k] 2 matrix. Of course, in order to benefit of this kind of rep- Γ=min − . (37) kwk kgk resentation for signal filtering purposes, it is necessary to k=0 be able to do the reverse job. It can be shown that the This choice turns out to correspond to the solution of the relationship between a discrete signal x[k] and the corre- system (36) via a pseudo-inverse method. For example, by sponding GR is given by (Gro¨chenig 2001) supposing g[k] fixed and rewriting system (36) in a matrix MX−1 NX−1 notation, j2πkn/N x[k]= a g[k − m∆M]e (34) ∗ mn Hw = u, (38) m=0 n=0 the standard synthesis function w, i.e., the window satis- with 0 ≤ k

− g[k] are univocally determined. However, according to the LXs 1 ∗ − ∆M Balian-Low theorem, both the functions cannot be simul- w [k] g[k + pN]e j2πkq/∆M = δ δ (36) N p q taneously well localized in time and frequency (see, for ex- k=0 ample, Mallat 1998). In the practical application, the con- with −∆N

0.36 1250 (a) 1000

750 95 99 100

0.32 ← ← ← 500 Item Count 250 0.28 0 0 0.5 1 1.5 2 2.5 | a | mn

0.24 800 (b)

STD of the True Residuals 600 95 99 100 ← ← ← 0.2 400 Item Count 200

0.16 0 0 0.25 0.5 0.75 1 1.25 1.5 0 0.5 1 1.5 2 2.5 γ | a | a mn Fig. 9. Dependence of the standard deviation (STD) of the Fig. 10. a) Histogram of the |amn| coefficients for a realiza- residuals between the true and the reconstructed signals on tion of a noise process with the same characteristics of that the value of the noise threshold γa. The signal used in this contaminating the signal of Fig. 4. b) Histogram of the |amn| example is that shown in the uppermost panel of Fig. 4. coefficients for the signal of Fig. 4. The three arrows indicate the 95%, 99%, 100% empirical percentiles relative to the his- 3.2. Time–variant signal denoising togram of the simulated noise process and provide three pos- sible choices for γa. AsshowninFig.9,thechoiceofγa is an important step in a filtering procedure. Unfortunately, without substantial a priori information (typical situation in many applica- The good reason for this is that g[k] is often chosen to tions), there are no so many ways to select the “optimal” have a good localization both in time and frequency. If value for this threshold. w[k] would not be forced to share this property it might In many practical situations the signal of interest is be badly localized along one or both of these two dimen- contaminated by a broad–band random noise (white– sions. The consequence is that the synthesis ofx ˆ[k], via noise). The key point is that such a noise tends to the coefficienta ˜mn, could provide different results from spread evenly over the entire joint time-frequency domain, what expected; whereas the signal component is usually concentrated in Here it is necessary to stress that, in general, the set a relatively small region (e.g., see Fig. 4). This fact sug- of modified coefficientsa ˜mn will not constitute a valid gests a simple procedure, similar to the hard thresholding GR representation of any signal whatever. Indeed, it hap- method introduced by Donoho for wavelets (see, for ex- pens that ample, Donoho 1995), to filter out the noise contribution: MX−1 NX−1 1) determination of those indices for which the corre- j2πkn/N xˆ[k] =6 a˜mn g[k − m∆M]e . (40) sponding coefficients amn may be considered as con- m=0 n=0 tributed by the noise. This step can be carried out in several ways according to the problem at hand. A typ- In particular the coefficientsa ˆmn, obtained fromx ˆ[k]via ical solution consists in setting a threshold value, say transform (35), can be different from zero even for those 2 | |2 γa, for the elements amn of SDFS[m, n]. The idea is indices m0,n0, for whicha ˜m0n0 = 0. The reason is that that the coefficients amn of GR, whose absolute value coefficients amn are obtained via a convolution operation is smaller than γa, can be assumed unaffected by the and therefore no physical signal can correspond to a rep- signal contribution; resentationa ˜mn characterized by the sharp cutoffs created 2) masking (zeroing) of such coefficients in order to obtain by the masking procedure. In any case, it can be shown a “cleaned” version of the coefficients, denoted bya ˜mn, (see Appendix A) that, if w[k]andg[k] satisfy the con- equal to zero in areas only occupied by noise; straint (37),x ˆ[k] is a signal whose Gabor coefficientsa ˆmn 3) synthesis of the filtered signalx ˆ[k] via Eq. (34). are as close as possible (in least squares sense) toa ˜mn. A last note regards the fact that, in certain situations, Although this approach is fairly simple, its application re- the zeroing of coefficients a can constitute a too drastic quires some care in order to avoid unpleasant side effects. mn operation. In fact, it can introduce some spurious struc- In particular, one point is important: tures in the reconstructed signals. In this case, a possible once the analysis function g[k] is fixed, it is advis- alternative is represented by the weighting of coefficients able that the synthesis function w[k] be calculated amn, according to the expected level of noise contamina- via Eq. (39). In other words, the shapes of g[k]and tion, in a way similar to that suggested by Donoho (1995) w[k] have to be as close as possible. (soft thresholding) in the context of wavelets. R. Vio and W. Wamsteker: JTFA of non-stationary time series 1133

20 3.2.1. Practical determination of an “optimal” γa (a) Apart the very common assumption of Gaussianity, the 15 typical information available about the noise is an esti- 10 mate of its standard deviation. Conversely to the case of 5 classical Fourier analysis, the statistical characterization Number of Iterations of a pure random Gaussian process in the time-frequency 0 0 0.25 0.5 0.75 1 1.25 1.5 domain is much more complex. The problem is that coef- γ a ficients a are not independent. This situation suggests mn 0.36 that the simplest (and safest) way to determine the thresh- 0.32 (b) old γa is the following: 0.28 – simulation of a noise process (not necessarily 0.24 Gaussian) via a pseudo-random generator; – calculation of the corresponding SDFS[m, n] with the 0.2

same modalities used for the calculation of the GR of STD of the True Residuals 0.16 0 0.25 0.5 0.75 1 1.25 1.5 γ the signal; a – estimation of a simple statistical index (e.g. empiri- Fig. 11. a) Number of iteration for the Qian & Chen method cal percentiles), describing the characteristics of such providing the best reconstruction of the signal of Fig. 4 as func- a representation, that provides the desired threshold tion of the noise threshold γa. b) Standard deviation (STD) for the SDFS[m, n] of the signal. of the corresponding residuals between the true and the recon- structed signals. Although very simple, in our numerical simulations this method has proved to be rather solid (see Fig. 10 to compare with Fig. 9). An alternative to the last step could be the calculation of a set of filtered time series operation for which at the n-th iteration xˆ[k] corresponding to different choices of γa. After that, 2 2 among these solutions it is chosen the time series that sat- kxn[k]k < kxn−1[k]k . (41) isfies some statistical requirements as, for example, that the standard deviation of the residuals between the orig- In other words, the total energy of the filtered signal is inal and the filtered signal is equal to the nominal value continuously reduced with the number of iterations. At of noise level. Although appealing, from our numerical ex- first sight, one could accept this fact as useful since ef- periments this approach appears to be not so stable. fectively the noise tends to increase the power of an ob- In the case one lacks also the information regarding the served time series. Unfortunately, such an increase regards standard deviation of the noise, the simplest solution con- the global characteristics of a signal whereas the procedure sists in considering directly the SDFS[m, n] of the signal suggested by Qian & Chen is applied to the coefficient amn and then in determining the threshold interactively by de- that reflect its local properties. In other words, given an limiting, via some graphical facility, the region where the observed Gabor coefficient, the only thing one is allowed signal is supposed to give its contribution. Another possi- to claim is that, because of the noise contamination, it bility is the use of some statistical indices of the values of will be different from the true one. For a correct filter- SDFS[m, n] (e.g. percentiles). ing, however, we should be able to decide at least whether the contaminated coefficient is larger or smaller than the uncontaminated one. 3.2.2. Could the SNR be further improved? The only situation where the Qian & Chen procedure The synthesis ofx ˆ[k], via the coefficientsa ˜mn, permits to can be expected to work is when γa is chosen too low. In filter out the noise contribution corresponding to the coef- situations like this one, a large fraction of the unmasked ficients amn that have been zeroed. However, what about coefficient amn could be erroneously assumed linked to the the contamination of the remaining coefficients? In their signal. In general, these coefficients are smaller, in abso- book, Qian & Chen (1996) suggest a method that, accord- lute value, than the coefficients effectively reflecting the ing to their opinion, should further improve the SNR of signal contribution. Consequently, the contraction con- the final results. Their idea is very simple and consists in nected with the Qian’s & Chen’s technique can be able an iterative application of the procedure described in the to smooth out them before the signal component be too previous section. In other words, once obtained the filtered negatively affected. However, we stress that, contrary to xˆ[k], to this signal is applied the same procedure used for what claimed by the two authors, the improvement is due x[k]andsoon. to the removal of Gabor coefficients still related to the The most serious problem of this method consists in noise rather that a better estimate of the coefficients rep- the lack of theoretical argumentation that guarantee its resenting the contribution of the signal. Better results are reliability. Indeed, the repeated application of the mask- to be expected with a more appropriate choice of γa.All ing procedure corresponds to a contraction, namely to an that is shown in the Fig. 11 to compare with Fig. 9. 1134 R. Vio and W. Wamsteker: JTFA of non-stationary time series

Signal Observed Signal 2 2 1

0 Flux 0 −1 Flux

−2 Power DFS First Component 2 0.45

0.4 0

0.35 Flux

0.3 −2 0.25 Second Component 0.2 2 Frequency (Hz) Power Spectrum 0.15

0.1 0 Flux

0.05

0 −2 457 229 0 10 20 30 40 50 60 70 80 90 100 10 20 30 40 50 60 70 80 90 100 Time (s) Time (s)

Fig. 12. Contour plot of the DFS[m, n] representation for the Fig. 13. Observed signal, first component, and second compo- signal mixture, given by two components disjoint in the time- nent corresponding to the DFS[m, n] representation of Fig. 12. frequency domain, described in the text. Levels are uniformly Continuous line indicates the original signals whereas crosses distributed between the 10% and 90% of the peak value. The indicate the corresponding reconstructions obtained via the dotted line delimits the two regions of the time-frequency do- method described in the text. main that have been used to separate the components shown in Fig. 13. x1(t)andx2(t) can be separated only if they do not in- terference and the total energy is conserved. From the 3.3. Signal separation statistical point of view this condition means that the two components are uncorrelated. In fact, if x (t)andx (t) In certain practical situations it can happen that a given 1 2 are supposed, without loss of generality, zero mean pro- signal is the mixture of two or more components. Often cesses, then the integrals in Eq.(42) provide the variance it can be of interest to separate the various contribu- of the signals. However, it is well known that tions. In this respect, it can be shown (Hopgood 2000, and Appendix B) that, within the context of linear filtering, σ2 = σ2 + σ2 +2σ (43) x(t) x1(t) x2(t) x1(t)x2(t) this operation is possible (to a given degree of accuracy) only when a representation of signal is available where where the last term provides the covariance between x1(t) and x (t). Therefore, Eq. (42) implies that σ =0. the various components are disjoint. For example, in the 2 x1(t)x2(t) Fourier domain, two components are “perfectly” separa- From the above discussion it is evident that, in the ble if their power-Spectra do not overlap. Therefore, in the separation of signals overlapping in the time-frequency do- context of the joint time-frequency representations, it can main, things have to be expected much more complex. For be expected that separability of signal mixtures is possi- example, Fig. 14 shows the SDFS[m, n] corresponding to ble if the components do not overlap in the time-frequency three signals, sampled with a frequency of one Hertz, that domain (see Figs. 12 and 13). A simple and yet effective have been obtained through the following models: procedure is the same as that presented in Sect. 3.2: each x(t)=sin[2πtνa(t)] + sin[2πtνb(t)+φi] (44) component is reconstructed by zeroing the Gabor coeffi- −3 cients corresponding to the remaining ones. with i =1, 2, 3, νa(t)=4.0 × 10 × t +0.10, νb(t)= −3 In order to understand the potential but also the limits −3.5×10 ×t+0.45, 0 ≤ t ≤ 100 s, and φ1 =0,φ2 = π/2, of such an approach, it is necessary to understand what and φ3 = π rad. In practice, the only difference lies in the does it mean the fact that two signal are disjoint in the phase φ of the second component. The key point to note time-frequency domain. To discuss this problem, let sup- is that, although the SDFS[m, n]’s are different, these sig- pose to have a signal x(t) given by the sum of two compo- nals are characterized by the same IPS(t, ν) (see Fig. 14). nents, say x1(t)andx2(t). If SDFS[m, n] is interpreted as In other words, in this specific case the equality (42) does an energy distribution, no overlap in the time-frequency not hold. Of course, that is the consequence of the interfer- plane implies that ence of the components present in x(t). Unfortunately, the Z Z Z details of such a interference will depend not only on the 2 2 2 |x(t)| dt = |x1(t)| dt + |x2(t)| dt. (42) amplitudes of the components at a given time instant, but also on the corresponding phases. Therefore, their separa- In other words, the energy of the observed signal is equal tion should require that each coefficient amn be able to to the sum of the energies corresponding to the single com- provide information about four parameters. At least of ponents. This point is very important since indicates that some a priori information (e.g. the value of the phases), R. Vio and W. Wamsteker: JTFA of non-stationary time series 1135

° ° Phase = 0 Phase = 45 Signal 0.5 0.5 1 0.4 0.4 0 Flux 0.3 0.3 −1

0.2 0.2

Frequency (Hz) Frequency (Hz) Power Scalogram 0.1 0.1 0.5

0 0 0.45 0 20 40 60 80 100 0 20 40 60 80 100 0.4

° Phase = 90 Ideal Power Spectrum 0.35 0.5 0.5 0.3

0.4 0.4 0.25

0.3 0.3 0.2 Frequency (Hz) Power Spectrum 0.15 0.2 0.2

Frequency (Hz) Frequency (Hz) 0.1 0.1 0.1 0.05 0 0 0 20 40 60 80 100 0 20 40 60 80 100 1600 800 0 20 40 60 80 100 120 Time (s) Time (s) Time (s)

Fig. 14. Grayscale image of the DFS[m, n] representation for Fig. 15. Scalogram for the time series shown in the uppermost three chirps described in the text and the corresponding ideal panel of Fig. 3. power-spectrum IPS(t, ν). this is clearly an underdetermined problem. The situation (or width) but changes its shape due to frequency modu- becomes worst in case the number of overlapping signals lation, in wavelet analysis when the scale factor a is varied is larger than two. Ψ(t, a) maintains its shape but changes its duration and One time more, these facts suggest that in order to consequently its bandwidth. The practical consequence is obtain some insights on the characteristics of a complex that, while FS(t, ν) is characterized by a time-frequency system, in general, the analysis of a signal, if not developed resolution constant over the entire time-frequency domain, within a given physical context, is absolutely insufficient. SCL(t, a) presents a better time resolution at high fre- quencies but a lower resolution at low frequencies (see Fig. 15 to compare with Fig. 3). At this point, the natural 4. JTFR better than wavelets? question arises concerning which choice, between the time- Joint time-frequency analysis does not constitute the frequency or wavelet approach, is the most appropriate only possible tool for the analysis of non-stationary sig- one in the context of the analysis of astrophysical signals. nals. Nowadays the so-called Wavelet Analysis arises as Of course, the answer strictly depends on the problem at a new alternative to JTFA. In the continuous case the hand. In particular, it is useful to have a representation (Scalogram) of a signal x(t) is given by: characterized by a non-constant time-frequency resolution when the signal of interest presents a non-stationary evolu- SCL(t, a)=|WT(t, a)|2 (45) tion on very different time scales as, for example, in case of non-stationary self similar (fractal) signals. On the other with W (t, a) being the continuous given hand, especially in problems of exploratory data analysis, by when the time behavior of a signal is dominated by one Z+∞   or more non-stationary processes which evolve on similar 1 s − t WT(t, a)=p x(s)Ψ∗ ds. (46) time scales (e.g. a periodic process that changes slowly its |a| a period), then it is by far better to use a representation −∞ with a constant time-frequency resolution since it permits HereR Ψ(t) is a function, named wavelet, with the property an easier tracking of the changing characteristics of the that Ψ(t) = 0. The idea behind such an approach is signal. In the astrophysical applications, very often this is that, if Ψ(t) is centered in the neighborhood of t =0and the situation of interest. has a Fourier transform centered at a frequency ν0 =0,6 It is necessary to mention that, especially in the field then in the time-frequency plane Ψ [(t − b)/a] will be cen- of discrete wavelets, the above indicated limitations have tered at t = b and ν = ν0/a, respectively. Consequently, been overcome in the sense that some approaches have SCL(b, a) will reflect the signal behavior in the vicinity of been devised that permit a large flexibility in the choice of such time instant and frequency. The main difference with the resolution across the time-frequency plan (e.g., wavelet FS(t, ν) is that the variable a corresponds to a scale fac- packets, local cosine bases, cf. Mallat 1998). In spite of tor, in the sense that taking |a| > 1 dilates Ψ and taking that, in the exploratory analysis of the typical astrophys- |a| < 1 compress it. Therefore, in contrast to the analysis ical signals the joint-time frequency approach maintains function w(t) in JTFR which maintains a fixed duration its supremacy since it is more intuitive and with better 1136 R. Vio and W. Wamsteker: JTFA of non-stationary time series

Time (s) Original DFS

30 30 25

20 25 15

20 Frequency (Hz) 10

5

15 0

Frequecy (Hz) 0 0.5 1 1.5 Time (s) 10 Filtered DFS

30 5 25

0 20

15

Frequency (Hz) 10

5

0 0.5 1 1.5 2 2.5 3 0 0.5 1 1.5 Time (s) Fig. 16. GrayscaleimageoftheDFSrepresentationofa3s segment extracted from the X-ray light curve of SCO-X1 de- Fig. 18. Original vs. filtered DFS representation of a 1.5sseg- scribedinthetext. ment extracted from the X-ray light curve of SCO-X1 described in the text. Filtering has been carried out via the method de- Time (s) scribed in Sect. 3.2 with γa corresponding to the 95% percentile of the DFS coefficients. 30

25 1.5 Original Filtered 20

1 15 Frequecy (Hz)

10 0.5

5 Flux

0 0

−0.5

0.5 1 1.5 2 2.5 3 Fig. 17. Ridges of the grayscale image 16. −1 0 0.5 1 1.5 Time (s) developed theoretical backgrounds as far as the interpre- tation is concerned. Indeed, while JTFA essentially makes Fig. 19. QPO component vs. original time series obtained via the filtering shown in Fig. 18. use of localized sinusoids, the shape of the wavelets is much more difficult to explain from the physical point of view. Consequently, the results provided by the time- frequency approach are more amenable for a direct phys- ical interpretation. step of 125 microsecond. As Sco X-1 shows quasi-periodic oscillations (QPO) in the X-ray light emission, its light 5. Practical applications curve is characterized a periodic component that changes its period over time. Therefore, an important experimen- To illustrate the analysis strengths of JFTA in astronomi- tal activity consists in tracking such changes. Figs. 16, 17 cal applications, we have applied some simple JFTA algo- show that even a very simple JTFR as DFS is able to rithms to a sequence of X-ray observations of low mass X- provide very interesting results, whereas Figs. 18, 19 show ray binary Sco X-1 (van der Klis, priv. comm.) made with how it is possible to extract the component of interest the Proportional Counter Array (PCA) on the Rossi-XTE from the observed signal. Although still preliminary and spacecraft (Bradt et al. 1993). The time series used, shown partial, this example represents a good illustration of the in Fig. 16 (lower panel), is a 3 s subset of the available se- possibilities of the application of the technique to astro- quence that is 1500 s long. Sampling is regular with a time physical (and other) time series. R. Vio and W. Wamsteker: JTFA of non-stationary time series 1137

6. Summary and conclusions is the pseudo-inverse matrix of Gw with dimensions Ls × (MN). In this paper we have considered the joint time-frequency Here the key point is that, if in Eq. (A.1) we suppose domain as a possible framework for the analysis of the a and G fixed and x unknown, then we are dealing with astrophysical signals. The methodologies based on such w a system of M × N equations in L unknowns. For over- a domain appear very interesting since they allow reli- s sampledGRthenumberM × N is larger than L and able analyses and filterings of non-stationary time series s therefore the system of Eq. (A.1) is overdetermined. In that are very difficult, if not impossible, to carry out in this case, it can be shown that Eq. (A.5) delivers the cor- the standard Fourier domain. These methodologies appear responding least squares solution (Bjorck 1990). The same also useful for the separation of signal mixtures. still holds also when a is not in the range of Gw.Thathap- Acknowledgements. We thank Prof. M. van der Klis for pens exactly, for example, when the set of amn does not the availability of the unpublished data on Sco X-1 and constitutes a valid GR for x[k]. In this case Eq. (A.1) has Prof. H. Feicthinger for his useful comments. to be rewritten as LS a˜=Gwxˆ (A.7) Appendix A: Masking of the Gabor Coefficients as Least Squares Filtering in order to highlight that the equality has to be intended in least squares sense. However,x ˆ[k] will be characterized Transforms (34) and (35) can be rewritten as matrix mul- by a GRa ˆmn = Gwxˆ. Consequently, Eq. (A.7) can be tiplications. In particular, the Gabor transform (35) de- rewritten as scribed as a linear mapping from the signal space to the coefficient space CMN takes the form a˜LS= aˆ. (A.8) ∗ a = Gwx (A.1) The meaning of this last expression is that, althougha ˜mn is not a valid GR,x ˆ[k] synthesized via Eq. (A.5) corre- where a =vec{a } and G is an (MN) × L matrix mn g s sponds to a signal whose GR is as close as possible toa ˜   mn  w∗  in least squares sense.    T w∗   ∆M   T w∗   2∆M  Appendix B: Signal separability  .   .   .  In this section we briefly present the conditions according  ∗   TM∆M w  to which, given an observed signal x(t)=d(t)+n(t), the  ∗   M∆N w  component of interest d(t) and the unwanted component  ∗   M∆N T∆M w  n(t) can be separated. M T w∗ Gw = ∆N 2∆M (A.2) Asignalx(t) may be represented on an arbitrary do-  .   .  main λ by the following general integral transform:  .   ∗  Z  M∆N TM∆M w    ∀ ∈  M2∆N g  X(λ)= x(t) K(t, λ)dt λ Λ(B.1)  ∗  T  M2∆N T∆M w   .   .  Z  .   ∗  ∀ ∈  MN∆N T(M−1)∆M w  x(t)= X(λ) k(λ, t)dλ t T (B.2)  ∗  Λ MN∆N TM∆M w where Λ is the region in the signal space that the represen- where Mq and Tp are, respectively, the modulation and the tation X(λ) lies in1. Here the functions K(t, λ), called di- translation operators rect transform kernel,andk(λ, t), called the inverse trans- (M w)[k]=w[k]ei2πqk/Ls form kernel, are linked by the relationships: q (A.3) Z (Tp w)[k]=w[k − p], k(λ, t) K(τ,λ)dλ = δ(t − τ)(B.3) and Λ

i2πqk/Ls Z (MqTp w)[k]=w[k − p]e . (A.4) k(λ, t) K(t, λˆ)dt = δ(λ − λˆ). (B.4) Since Gw is a full rank matrix, from Eq. (A.1) it easy to T show that the transform (34) can be expressed in the form For example in case of the Fourier transform λ is equal † − x = Gwa, (A.5) to the Fourier frequency ν, K(t, ν) = exp( i2πνt)and k(ν, t)=K∗(t, ν). where 1 In case of stochstic signals, integrals are to be interpreted † ∗ −1 ∗ Gw =(GwGw) Gw (A.6) in a mean-square limit. 1138 R. Vio and W. Wamsteker: JTFA of non-stationary time series

If signal d(t) has to be reconstructed by linear filtering –d(t)andn(t) are uncorrelated signals; of the observed signal x(t) (i.e., by making use of linear – σ2(t)=0. operators), then it has to hold (see, for example, Hopgood 2000) In other words, the perfect separation of two signals, Z via a linear filtering, requires the following condition be d(ˆ t)= h(t, τ) x(τ)dτ, (B.5) satisfied: T there exists some domain where the generalized where h(t, τ) is the response at time t given an impulse spectral representation of the desired components occurred at the filter input at time τ. A sufficient condition are disjoint. Thus, the desired components must be for the existence of h(t, τ)isthatitcanbewrittenin uncorrelated processes. the form Z All that holds also in the context of joint time-frequency h(t, τ)= k(λ, t) K(τ,λ)dλ analysis. The only different thing is that the function Z Z Λd (B.6) h(t, τ), with arguments in the time domain, has to be ˆ ˆ ˆ + H0(λ, λ) k(λ, t) K(τ,λ)dλ dλ. changed, via a Fourier transform with respect to τ,in Λ0 Λ0 H(t, ν) which has arguments in the mixed time-frequency Here Λd is the region of the Λ space over which the spectral domain. component of d(t) does not overlap the spectral compo- nent of n(t), whereas Λ0 is the spectral region where these two components overlap. H (λ, λˆ) is the solution of References Z 0 Bradt, H. V., Rotschild, R. E., & Swank, J. H. 1993, A&AS, Pdx(λ, λˆ)= H0(λ, λ˜) Pxx(λ,˜ λˆ)dλ,˜ ∀ λ, λˆ ∈ Λ0 (B.7) 97, 355 Λ0 Bjorck, A. 1990, Numerical Methods for Least Squares where Problems (SIAM, Philadelphia)  ∗ ˆ Chassande-Mottin, E. 1998, Ph.D. Thesis, University of Cergy- ˆ Y (λ)Z (λ)ify(t), z(t) deterministic; Pointoise Pyz(λ, λ)= ∗ (B.8) E[Y (λ)Z (λˆ)] if y(t), z(t) stochastic; Cohen, L. 1995, Time-Frequency Analysis (Prentice Hall: London) is the generalized cross power-spectrum concerning to the Daubechies, I. 1992, Ten Lectures on Wavelets (SIAM: processes y(t)andz(t). It can be shown (Hopgood 2000) Philadelphia) ˆ that the reconstruction d(t), as provided by Eqs. (B.5) Donoho, D. L. 1995, IEEE Transaction on , and (B.6), is characterized by a variance σ2(t)=E[|d(ˆ t)− 41, 613 d(t)|2] equal to Duschl, W. J, Wagner, S. J., & Camenzind, M. 1991, Z Z Variability of Active Galaxies (Springer-Verlag: Berlin) 2 ∗ σ (t)= Pσσ(λ, λˆ) k(λ, t) k (λ,ˆ t)dλ dλˆ (B.9) Feichtinger, H. G., & Strohmer, T. 1998, Gabor Analysis and Λ0 Λ0 Algorithms: Theory and Applications (Birkh¨auser: Boston) Flandrin, P. 1999, Time-Frequency/Time-Scale Analysis with P the generalized power-spectrum of σ2(t). σσ (Academic Press, London) Here the important points are three: Gro¨chenig, K. 2001, Foundations of Time-Frequeny Analysis 1) the first term in the rhs of Eq. (B.6) is independent (Birkh¨auser: Boston) of any signal. Moreover, it constitutes a so called ideal Hlawatsch, F., & Matz, G. 2002, in Time-Frequeny Signal filter for Λ , namely a filter which passes without dis- Analysis and Processing, ed. B. Boashash (Prentice Hall: d London) torsion all generalized frequencies components falling Hopgood, J. R. 2000, Ph.D. Thesis, University of Cambridge in Λd and rejects all others; Mallat, S. 1998, a Wavelet Tour of Signal Processing (Academic 2) the second term in the rhs of Eq. (B.6) depends, via Press, London) Eq. (B.7), on the crosscorrelation function between Matz, G., & Hlawatsch, F. 2002, in Applications in x(t) and d(t) and therefore, since x(t)=d(t)+n(t), Time-Frequency Signal Processing, ed. A. Papandreu- on the crosscorrelation between d(t)andn(t); Suppappola (CRC Press: Boca Raton) 3) the magnitude of σ2(t), given by Eq. (B.9), depends Miller, H. R., & Wiita, P. J. 1991, Variability of Active Galactic on the extension of the overlap between d(t)andn(t). Nuclei (Cambridge University Press: Cambridge) Munk, F. 1997, Ph.D. Thesis, Aalborg University This last point is particularly interesting since it indicates Munk, F. 2001, compendium for the course “Joint Time that signals d(t)andn(t) can be perfectly separated only Frequency Analysis”, SB8-2, Aalborg University when they have disjoint supports in Λ. Indeed, in such a Qian, S., & Chen, D. 1996, Joint Time-Frequency Analysis case, Λ0 is an empty set and the three points above imply, (Prentice Hall: London) respectively, that: Stankovic, L., & Katkovnik, V. 1999, IEEE Trans. of Signal Processing, 47, 1053 – Eq. (B.6) simplifies in Stankovic, L. 2001, Signal Processing, 81, 621 Z Vio, R., Cristiani, S., Lessi, O., & Provenzale, A. 1992, ApJ, h(t, τ)= k(λ, t) K(τ,λ)dλ; (B.10) 391, 518 Λd