Foundations - 2
Periodicity Detection, Time-series Correlation, Burst Detection
Temporal Information Retrieval Time Series
• An ordered sequence of values (data points) of variables at equally spaced time intervals Periodicity Detection
• How does one identify periodic values Periodicity Detection
• Time-series is in the time domain
• Method1 (DFT): Identify the underlying periodic patterns by transforming into the frequency domain
• Method 2 (Autocorrelation) Correlate the signal with itself
Find dominant frequencies is the Discrete Fourier Transform of the sequence .
We may write this equation in matrix form as:
. . . . . Fourier Transform.
where and etc. . • A signal has an amplitude amplitude (strength), DFT – example Let the continuous signal be frequency (periodicity) frequency and phase (offset) phase
dc 1Hz 2Hz
10 • Fourier Transform 8 6 converts a signal from 4 2 the time domain to the 0 −2
frequency domain −4 0 1 2 3 4 5 6 7 8 9 10
Figure 7.2: Example signal for DFT.
Let us sample at 4 times per second (ie. = 4Hz) from to . The values of the discrete samples are given by:
by putting
84 Discrete Fourier Transform (DFT)
This work targets similar applications and provides coefficients to record, we can perform a variety of tasks • A Fourier analysisto olsis thata method can significantly for ease expressing the “mining” of a useful functionsuch as compression, as a sum denoising, of periodic etc. information. Specifically, this paper makes the following components, andcontributions: for recovering the function from those components.Signal & Reconstruction 1. We present a novel automatic method for accurate • periodicity detection in time-series data. Our algorithm When both the functionis the first oneand that its exploits Fourier the information transform in both are replaced with discretized counterparts,periodogram and it is autocorrelation called the to provide discrete accurate Fourier transform (DFT). periodic estimates without upsampling. This work targets similar applications and provides coefficients to record, we can perform a variety of tasks tools that can significantly ease the “mining” of useful such as compression, denoising, etc. 2. We introduce new periodic distance measures that information. Specifically, this paper makes the following exploit the power of the dominant periods, as provided contributions: Signal & Reconstruction by the Fourier Transform. By ignoring the phase infor- Fourier Coefficients mation we can provide more compact representations, fourier transform f 1. We present a novel automatic method for accurate that also capture similarities under time-shift transfor- 4 f periodicity detection in time-series data. Our algorithm mations. 3 is the first one that exploits the information in both f 3. Finally, we present comprehensive experiments 2 periodogram and autocorrelation to provide accurate demonstrating the applicability and efficiency of the f periodic estimates without upsampling. 1 proposed methods, on a variety of real world datasets f 0 2. We introduce new periodic distance measures that (online query logs, manufacturing diagnostics, medical exploit the power of the dominant periods, as provided data, etc.). inv. fourier transform Figure 1: Reconstruction of a signal from its first 5 by the Fourier Transform. By ignoring the phase infor- Fourier Coefficients Fourier coefficients mation we can provide more compact representations, 2 Backgroundf that also capture similarities under time-shift transfor- 4 We provide a brieff introduction to harmonic analysis mations. using the discrete3 Fourier Transform, because we will 2.2 Power Spectral Density Estimation. In or- f der to discover potential periodicities of a time-series, 3. Finally, we present comprehensive experiments Advantages ofuse these DFT tools as apart2 the building blocks from of our algorithms.periodicity detection ? demonstrating the applicability and efficiency of the f one needs to examine its power spectral density (PSD 1 or power spectrum).denoising, The PSD compression essentially tells us how proposed methods, on a variety of real world datasets 2.1 Discrete Fourierf Transform. The normalized 0 much is the expected signal power at each frequency (online query logs, manufacturing diagnostics, medical Discrete Fourier Transform of a sequence x(n),n = of the signal. Since period is the inverse of frequency, data, etc.). 0, 1 ...N − 1 is a sequence of complex numbers X(f): Figure 1: Reconstruction of a signal from its first 5 by identifying the frequencies that carry most of the N 1 2 Fourier coefficients 1 − j πkn X(f )= x(n)e− N , k = 0, 1 ...N − 1 energy, we can also discover the most dominant peri- 2 Background k/N √N n=0 ods. There are two well known estimators of the PSD; We provide a brief introduction to harmonic analysis the periodogram and the circular autocorrelation. Both 2.2 Power Spectral Density Estimation.where the subscriptIn or- k/N denotes the frequency that using the discrete Fourier Transform, because we will each coefficient captures. Throughout the text we will of these methods can be computed using the DFT of der to discover potential periodicities of a time-series, use these tools as the building blocks of our algorithms. also utilize the notation F(x) to describe the Fourier a sequence (and can therefore exploit the Fast Fourier one needs to examine its power spectral density (PSD Transform. Since we are dealing with real signals, the Transform for execution in O(N log N) time). or power spectrum). The PSD essentially tells us how 2.1 Discrete Fourier Transform. The normalized Fourier coefficients are symmetric around the middle Discrete Fourier Transform of a sequence x(n),n = much is the expected signal power atone each (or frequency to be more exact, they will be the complex 2.2.1 Periodogram Suppose that X is the DFT of 0, 1 ...N − 1 is a sequence of complex numbers X(f): of the signal. Since period is the inverseconjugate of frequency, of their symmetric). The Fourier transform a sequence x. The periodogram P is provided by the by identifying the frequencies that carry most of the N 1 j2πkn represents the original signal as a linear combination of squared length of each Fourier coefficient: 1 − 2 N energy, we can also discover the most dominant peri- j πfn/N X(fk/N )= √ x(n)e− , k = 0, 1 ...N − 1 e 2 N 1 N the complex sinusoids sf (n)= . Therefore, the P(f )=∥X(f )∥ k =0, 1 ...⌈ − ⌉ n=0 ods. There are two well known estimators of the PSD; √N k/N k/N 2 Fourier coefficients record the amplitude and phase of the periodogram and the circular autocorrelation. Both Notice that we can only detect frequencies that are at where the subscript k/N denotes the frequency that these sinusoids, after signal x is projected on them. each coefficient captures. Throughout the text we will of these methods can be computed usingWe the can DFT return of from the frequency domain back to most half of the maximum signal frequency, due to the also utilize the notation F(x) to describe the Fourier a sequence (and can therefore exploitthe the time Fast domain, Fourier using the inverse Fourier transform Nyquist fundamental theorem. In order to find the k 1 Transform. Since we are dealing with real signals, the Transform for execution in O(N log NF)− time).(x) ≡ x(n): dominant periods, we need to pick the k largest values 1 Fourier coefficients are symmetric around the middle of the periodogram. N 1 2 1 − j πkn one (or to be more exact, they will be the complex 2.2.1 Periodogram Suppose that X isx(n the)= DFT ofX(fk/N )e N , k = 0, 1 ...N − 1 √N 1 conjugate of their symmetric). The Fourier transform a sequence x. The periodogram P is provided byn the=0 Due to the assumption of the Fourier Transform that the data is periodic, proper windowing of the data might be necessary for represents the original signal as a linear combination of squared length of each Fourier coefficient:Note that if during this reverse transformation we j2πfn/N achieving a more accurate harmonic analysis. In this work we will e 2 discardN some1 of the coefficients (e.g., the last k), then the complex sinusoids sf (n)= . Therefore, the P(f )=∥X(f )∥ k =0, 1 ...⌈ − ⌉ sidestep this issue, since it goes beyond the scope of this paper. √N k/N k/N the outcome2 will be an approximation of the original Fourier coefficients record the amplitude and phase of However, the interested reader is directed to [5] for an excellent Notice that we can only detect frequenciessequence that (Figure are at 1). By carefully selecting which review of data windowing techniques. these sinusoids, after signal x is projected on them. We can return from the frequency domain back to most half of the maximum signal frequency, due to the the time domain, using the inverse Fourier transform Nyquist fundamental theorem. In order to find the k 1 F − (x) ≡ x(n): dominant periods, we need to pick the k largest values of the periodogram. 1 N 1 2 1 − j πkn x(n)= X(f )e N , k = 0, 1 ...N − 1 √N k/N n=0 1Due to the assumption of the Fourier Transform that the data Note that if during this reverse transformation we is periodic, proper windowing of the data might be necessary for achieving a more accurate harmonic analysis. In this work we will discard some of the coefficients (e.g., the last k), then sidestep this issue, since it goes beyond the scope of this paper. the outcome will be an approximation of the original However, the interested reader is directed to [5] for an excellent sequence (Figure 1). By carefully selecting which review of data windowing techniques. is the Discrete Fourier Transform of the sequence .
We may write this equation in matrix form as:
...... where and etc. . i.e. , , , , DFT – example
Let the continuous signal be Therefore
Discretedc 1Hz Fourier2Hz Transform (DFT) The magnitude of the DFT coefficients is shown below in Fig. 7.3. periodogram 10 20 8
6 15 4 fourier 2 10 transform |F[n]| 0
−2 5
−4 0 1 2 3 4 5 6 7 8 9 10 0 0 1 2 3 f (Hz) Figure 7.2: Example signal for DFT. Figure 7.3: DFTsinusoidof four point sequence. Let us sample fourierat 4 times percoefficientssecond (ie. = 4Hz) from to . The values of the discrete samples are given by: Inverse Discrete Fourier Transform N 1 The inverse transform of by putting1 j2⇡kn X(f )= x(n) e N k/N p N n=0 84 X 85 • The fourier coefficients encode both the amplitude and phase This work targets similar applications and provides coefficients to record, we can perform a variety of tasks tools that can significantly ease the “mining” of useful such as compression, denoising, etc. information. Specifically, this paper makes the following contributions: Signal & Reconstruction 1. We present a novel automatic method for accurate periodicity detection in time-series data. Our algorithm is the first one that exploits the information in both periodogram and autocorrelation to provide accurate periodic estimates without upsampling. is the Discrete Fourier Transform of the sequence . 2. We introduce new periodic distance measures that exploit the power of the dominant periods,We may write as providedthis equation in matrix form as: by the Fourier Transform. By ignoring the phase infor- Fourier Coefficients mation we can provide more compact representations, f that also capture similarities under time-shift transfor- 4 f mations. . . 3 . . . . f 3. Finally, we present comprehensive experiments 2 f demonstrating the applicability and efficiency of the 1 proposed methods, on a variety of realwhere world datasets and etc. . f 0 (online query logs, manufacturing diagnostics, medical i.e. , , , , data, etc.). DFT – example Figure 1: Reconstruction of a signal from its first 5 Let the continuous signal be Fourier coefficients Therefore 2 Background We provide a brief introduction to harmonic analysis using the discrete Fourier Transform, because we will 2.2 Power Spectral Density Estimation. In or- Powerdc Spectral1Hz 2HzDensity (PSD) Estimation use these tools as the building blocks of our algorithms. der to discover potential periodicities of a time-series, one needs to examine its power spectralThe magnitude densityof the DFT coef(PSDficients is shown below in Fig. 7.3. 10 2.1 Discrete Fourier Transform. The normalized or power spectrum). The PSD essentially tells20 us how 8 Discrete Fourier Transform of a sequence x(n),n = much is the expected signal power at each frequency 6 15 0, 1 ...N − 1 is a sequence of complex numbers4 X(f): of the signal. Since period is the inverse of frequency, fourier 2 10
by identifying the frequencies that carry most|F[n]| of the N 1 j2πkn transform 1 − 0 X(f )= x(n)e− N , k = 0, 1 ...N − 1 energy, we can also discover the most dominant peri- k/N √N 5 n=0 −2 ods. There are two well known estimators of the PSD; −4 0 1 2 3 4 5 6 7 8 9 10 0 where the subscript k/N denotes the frequency that the periodogram and the circular autocorrelation. Both0 1 2 3 f (Hz) each coefficient captures. Throughout the text we willFigureof7.2: theseExample methodssignal for DFT can. be computed using the DFT of also utilize the notation F(x) to describe the Fourier a sequence (and can therefore exploit the FastFigure Fourier7.3: DFT of four point sequence. Transform. Since we are dealing withLet realus sample signals,•at the4Totimes findTransformper outsecond the(ie. for dominant= execution4Hz) from frequency in Oto(N log. TheNwe) time).need to find the values of the discrete samples are given by: Inverse Discrete Fourier Transform Fourier coefficients are symmetric around the middlepower at each frequency The inverse transform of one (or to be more exact, they will be the complex 2.2.1 Periodogramby putting Suppose that X is the DFT of conjugate of their symmetric). The Fourier transform• Periodograma sequence xencodes. The periodogram the strengthP is at provided a given by frequency the represents the original signal as a linear combination of squared length of each Fourier coefficient: j2πfn/N the complex sinusoids s (n)= e . Therefore, the 84 2 N 1 85 f √N P(fk/N )=∥X(fk/N )∥ k =0, 1 ...⌈ 2− ⌉ Fourier coefficients record the amplitude and phase of Notice that we can only detect frequencies that are at these sinusoids, after signal x is projected on them. We can return from the frequency domain back to most half of the maximum signal frequency, due to the the time domain, using the inverse Fourier transform Nyquist fundamental theorem. In order to find the k 1 F − (x) ≡ x(n): dominant periods, we need to pick the k largest values of the periodogram. 1 N 1 2 1 − j πkn x(n)= X(f )e N , k = 0, 1 ...N − 1 √N k/N n=0 1Due to the assumption of the Fourier Transform that the data Note that if during this reverse transformation we is periodic, proper windowing of the data might be necessary for achieving a more accurate harmonic analysis. In this work we will discard some of the coefficients (e.g., the last k), then sidestep this issue, since it goes beyond the scope of this paper. the outcome will be an approximation of the original However, the interested reader is directed to [5] for an excellent sequence (Figure 1). By carefully selecting which review of data windowing techniques. This work targets similar applications and provides coefficients to record, we can perform a variety of tasks tools that can significantly ease the “mining” of useful such as compression, denoising, etc. information. Specifically, this paper makes the following contributions: Signal & Reconstruction 1. We present a novel automatic method for accurate periodicity detection in time-series data. Our algorithm is the first one that exploits the information in both periodogram and autocorrelation to provide accurate periodic estimates without upsampling. 2. We introduce new periodic distance measures that exploit the power of the dominant periods, as provided by the Fourier Transform. By ignoring the phase infor- Fourier Coefficients mation we can provide more compact representations, f that also capture similarities under time-shift transfor- 4 f mations. 3 f 3. Finally, we present comprehensive experiments 2 f demonstrating the applicability and efficiency of the 1 proposed methods, on a variety of real world datasets f 0 (online query logs, manufacturing diagnostics, medical data, etc.). Figure 1: Reconstruction of a signal from its first 5 Fourier coefficients 2 Background We provide a brief introduction to harmonic analysis using the discrete Fourier Transform, because we will 2.2 Power Spectral Density Estimation. In or- use these tools as the building blocks of our algorithms. der to discover potential periodicities of a time-series, one needs to examine its power spectral density (PSD 2.1 Discrete Fourier Transform. The normalized or power spectrum). The PSD essentially tells us how Discrete Fourier Transform of a sequence x(n),n = much is the expected signal power at each frequency 0, 1 ...N − 1 is a sequence of complex numbers X(f): of the signal. Since period is the inverse of frequency, by identifying the frequencies that carry most of the Each element of the periodogramN 1 provides2 the amplitude patterns, which nonetheless appear more Each element of the periodogram1 − providesj πkn the amplitude patterns, which nonetheless appear more X(f )= x(n)e− N , k = 0, 1 ...N − 1 energy, we can also discover the most dominant peri- power at frequency k/N or,k/N equivalently,√N at period N/k. scarcelyPSD (see exampleestimation in fig. 2). using Periodogram power at frequency k/N or, equivalently,n=0 at period N/k. scarcely (see exampleods. in There fig. are 2). two well known estimators of the PSD; BeingBeing more more precise, precise, each each DFT DFT ‘bin’ ‘bin’ corresponds corresponds to to a a where the subscript k/N denotes the frequency that the periodogramSequence and the circular autocorrelation. Both range of periods (or frequencies). That is, coefficient Sequence range of periods (or frequencies).N ThatN is, coefficient of these methods can be computed using the DFT of X(f ) correspondseach to periods coefficient [ N captures.. . . N ). ThroughoutIt is easy the text we will time series data X(fk/N ) corresponds to periods [k . . .k 1 ). It is easy 0.2 k/N also utilize thek notationk− 1 F(x) to describe0.2 the Fourier a sequence (and can therefore exploit the Fast Fourier to see that the resolution of the periodogram− becomes or signal to see that the resolution of the periodogram becomes 0.1 Transform for execution in O(N log N) time). very coarse for longerTransform. periods. Since For example, we are dealing for a se- with real0.1 signals, the very coarse for longerFourier periods. coefficients For example, are symmetric for a se- around0 the middle quence of length N = 256, the DFT bin margins will be 0 quence of length N one= 256, (or the to DFT be more bin margins exact, they will will be be the complex50 1002.2.1150 Periodogram200 250 Suppose300 that350 X is the DFT of N/1,N/2,N/3,...= 256, 128, 64 etc. 50 100 150Periodogram200 250 300 350 N/1,N/2,N/3,...=conjugate 256, 128, of64 their etc. symmetric). The Fourier0.2 transform a sequencePeriodogramx. The periodogram P is provided by the 0.2 Essentially, the accuracy of the discovered periods, P1= 7 Essentially, therepresents accuracy of the the original discovered signal periods, as a linear0.15 combinationP2= 30.3333 of squaredP1= 7 length ofperiodogram each Fourier coefficient: 2 0.15 P2= 30.3333 deteriorates for large periods, due to the increasingej πfn/N deteriorates for large periods, due to the increasing 0.1 2 N 1 the complex sinusoids sf (n)= . Therefore,0.1 the P(f )=∥X(f )∥ k =0, 1 ...⌈ − ⌉ width of the DFT bins (N/k). Another related issue√ isN Power k/N k/N 2 width of the DFT binsFourier (N/k coe).ffi Anothercients record related the issue amplitude is 0.05Power and phase of spectral leakage, which causes frequencies that are not 0.05 Notice that we can only detect frequencies that are at spectral leakage, which causes frequencies that are not 0 these sinusoids, after signal x is projected0 on0 them.0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 integer multiples of the DFT bin width, to disperse over 0 6 0.05 0.1 most0.15 half0.2 of0.25 the maximum0.3 0.35 0.4 signal0.45 frequency,0.5 due to the integer multiples of the DFT bin width, to disperse over x 10 6 Circular Autocorrelation We can return from the frequency domainx 10 back to Circular Autocorrelation the entire spectrum. This can lead to ‘false alarms’ 7 Nyquist fundamental theorem. In order to find the k the entire spectrum.the This time candomain, lead using to ‘false the inverse alarms’ Fourier7 transform 1 dominant periods, we need to pick the k largest values inin the the periodogram. periodogram.F − However, However,(x) ≡ x(n the): the periodogram periodogram can can • 6 7 day 1 still provide an accurate indicator of important short 6To7 dayfind the ofdominant the periodogram. frequencies choose the top-k still provide an accurate indicatorN of1 important2 short 1 − j πkn (to medium) length periods.x(n)= Additionally,X(f through)e N , thek = 0, 15 ...N − 1 (to medium) length periods. Additionally,√N k/N through the 5dominant frequencies n=0 1 periodogram it is easy to automate the extraction of 20 40 60 Due 80to the assumption100 120 of140 the Fourier160 Transform180 that the data periodogram it is easy to automate the extraction of 20 40 60 80 100 120 140 160 180 important periods (peaks)Note that by examining if during the this statistical reverse transformation we is periodic, proper windowing of the data might be necessary for important periods (peaks) by examining the statistical Figure 2: The 7 dayachieving period a ismore latent accurate in the harmonic autocorrela- analysis. In this work we will discard some of the coefficients (e.g., theFigure last2: k), The then 7 day period is latent in the autocorrela- propertiesproperties of of the the Fourier Fourier coe coeffifficientscients (such (such as as in in [15]). [15]). tion graph, becausesidestep it has this lower issue, amplitude since it goes (even beyond though the scope of this paper. the outcome will be an approximationtion of thegraph, original because it has lower amplitude (even though it happens with higherHowever, frequency). the interested However, reader is the directed 7 day to [5] for an excellent 2.2.2 Circular Autocorrelation.sequence (FigureThe 1). second By carefully way selectingit happens which with higherreview of frequency). data windowing However, techniques. the 7 day 2.2.2 Circular Autocorrelation. The second way peakpeak is is very very obvious obvious in in the the Periodogram. Periodogram. toto estimate estimate the the dominant dominant periods periods of of a a time-series time-seriesxx,, is is to calculate the circular AutoCorrelation Function (or to calculate the circular AutoCorrelation Function (or TheThe advantages advantages and and shortcomings shortcomings of of the the peri- peri- ACF), which examines how similar a sequence is to its ACF), which examines how similar a sequence is to its odogramodogram and and the the ACF ACF are are summarized summarized in in Table Table 1. 1. previous values for different τ lags: previous values for different τ lags: FromFrom the the above above discussion discussion one one can can realize realize that that al- al- N 1 though the periodogram and the autocorrelation cannot 1 N− 1 though the periodogram and the autocorrelation cannot ACF (τ)= 1 − x(τ) · x(n + τ) ACF (τ)=N x(τ) · x(n + τ) provide sufficient spectral information separately, there Nn=0 provide sufficient spectral information separately, there n=0 is a lot of potential when both methods are combined. Therefore, the autocorrelation is formally a convo- is a lot of potential when both methods are combined. Therefore, the autocorrelation is formally a convo- WeWe delineate delineate our our approach approach in in the the following following section. section. lution,lution, and and we we can can avoid avoid the the quadratic quadratic calculation calculation in in thethe time time domain domain by by computing computing it it e effifficientlyciently as as a a dot dot 33 Our Our Approach Approach productproduct in in the the frequency frequency domain domain using using the the normalized normalized We utilize a two-tier approach, by considering the in- FourierFourier transform: transform: We utilize a two-tier approach, by considering the in- 1 formation in both the autocorrelation and the peri- ACF = F − 1
• Correlate the time series with itself
⌧ N Y Y ACF (⌧)= i=1 i · i+⌧ • Peaks get amplified N P • Fine-grained periodicity detector Each element of the periodogram provides the amplitude patterns, which nonetheless appear more power at frequency k/N or, equivalently, at period N/k. scarcely (see example in fig. 2). Being more precise, each DFT ‘bin’ corresponds to a range of periods (or frequencies). That is, coefficient Sequence N N X(fk/N ) corresponds to periods [ k . . . k 1 ). It is easy 0.2 to see that the resolution of the periodogram− becomes Each element of the periodogram provides the amplitude0.1 patterns, which nonetheless appear more very coarse for longer periods. For example, for a se- Autocorrelation power at frequency k/N or, equivalently, at period N/k. scarcely0 (see example in fig. 2). quence of length N = 256, the DFT bin margins will be Being more precise, each DFT ‘bin’ corresponds to a 50 100 150 200 250 300 350 N/1,N/2,N/3,...= 256, 128, 64 etc. PeriodogramSequence range of periods (or frequencies). That is, coefficient 0.2 X(Essentially,f ) corresponds the accuracy to periods of the [ N discovered. . . N ). It periods, is easy P1= 7 k/N k k 1 0.150.2 P2= 30.3333 time series data deterioratesto see that thefor resolution large periods, of the due periodogram to the− increasing becomes 0.10.1 width of the DFT bins (N/k). Another related issue is Power or signal very coarse for longer periods. For example, for a se- 0.05 spectral leakage, which causes frequencies that are not 0 quence of length N = 256, the DFT bin margins will be 0 0 0.05 50 0.1 1000.15 0.2150 0.25 200 0.3 2500.35 0.4300 0.45 3500.5 integer multiples of the DFT bin width, to disperse over 6 N/1,N/2,N/3,...= 256, 128, 64 etc. x 10 CircularPeriodogram Autocorrelation the entire spectrum. This can lead to ‘false alarms’ 0.2 Essentially, the accuracy of the discovered periods, 7 P1= 7 in the periodogram. However, the periodogram can 0.15 P2= 30.3333 deteriorates for large periods, due to the increasing Auto-correlation 0.16 7 day stillwidth provide of the an DFT accurate bins (N/k indicator). Another of important related issue short is Power N 0.055 Y Y (tospectral medium) leakage length, which periods. causes Additionally, frequencies through that are the not ACF (⌧)= i=1 i · i+⌧ 0 N periodogram it is easy to automate the extraction of 0 200.05 0.140 0.1560 0.280 0.25100 0.3 120 0.35 140 0.4 1600.45 1800.5 P integer multiples of the DFT bin width, to disperse over 6 important periods (peaks) by examining the statistical x 10 Circular Autocorrelation the entire spectrum. This can lead to ‘false alarms’ Figure7 2: The 7 day period is latent in the autocorrela- propertiesin the periodogram. of the Fourier However, coefficients the (such periodogram as in [15]). can tion graph, because it has lower amplitude (even though • 6To determine dominant period significance threshold needs to be specified still provide an accurate indicator of important short it happens7 day with higher frequency). However, the 7 day 2.2.2 Circular Autocorrelation. The second way 5 (to medium) length periods. Additionally, through the peak• Multiples is very of obvious the same in theperiod Periodogram. are also peaks — needs post processing toperiodogram estimate the it dominant is easy to periods automate of a the time-series extractionx, is of 20 40 60 80 100 120 140 160 180 to calculate the circular AutoCorrelation Function (or important periods (peaks) by examining the statistical FigureThe2: advantages The 7 day period and shortcomings is latent in the of autocorrela- the peri- ACF), which examines how similar a sequence is to its properties of the Fourier coefficients (such as in [15]). odogramtion graph, and because the ACF it has are lower summarized amplitude in Table(even though 1. previous values for different τ lags: it happensFrom the with above higher discussion frequency). one can However, realize the that 7 day al- 2.2.2 Circular Autocorrelation.N 1 The second way thoughpeak is the very periodogram obvious in the and Periodogram. the autocorrelation cannot ACF τ 1 − x τ x n τ to estimate the( dominant)= N periods( ) · of( a+ time-series) x, is provide sufficient spectral information separately, there n=0 to calculate the circular AutoCorrelation Function (or is a lotThe of advantages potential when and both shortcomings methods are of combined. the peri- Therefore, the autocorrelation is formally a convo- ACF), which examines how similar a sequence is to its Weodogram delineate and our the approach ACF are in summarized the following in Table section. 1. lution,previous and values we can for avoid different theτ quadraticlags: calculation in the time domain by computing it efficiently as a dot From the above discussion one can realize that al- N 1 3though Our the Approach periodogram and the autocorrelation cannot product in theACF frequencyτ 1 domain− x τ usingx n theτ normalized ( )= N ( ) · ( + ) provide sufficient spectral information separately, there Fourier transform: n=0 We utilize a two-tier approach, by considering the in- 1 formationis a lot of in potential both the when autocorrelation both methods and are combined. the peri- ACF = F −
• Auto-correlation : Good for large periods but difficult to automatically determine periods
Metho•d PeriodogramEasy to : threshold Easy to thresholdAccurate short but periodsnot accurateAccurate for largeshort periods periodsComplexit y Periodogram yes yes no O(NlogN) Autocorrelation• Idea: Get candidateno periods fromy esPeriodogram and validateyes false O(NlogN) Combinationalarms using Auto-correlationyes yes yes O(NlogN) Table 1: Concise comparison of approaches for periodicity detection.
hill
Refine Period Period Sequence Periodogram Autocorrelation Candidate
Periods valley False Alarm Dismiss Figure 3: Diagram of our methodology (AUTOPERIOD method)
MSN query request logs and represents the aggregate cient, because both the periodogram and the ACF can demand for the query ‘Easter’ for 1000 days after the be directly computed through the Fast Fourier Trans- beginning of 2002. The demand for the specific query form of the examined sequence in O(N log N) time. peaks during Easter time and we can observe one yearly peak. Our intuition is that periodicity should be 3.1 Discussion. First, we need to clarify succinctly approximately 365 (although not exactly, since Easter that the use of the combined periodogram and auto- is not celebrated at the same date every year). Indeed correlation does not carry additional information than the most dominant periodogram estimate is 333.33 = each metric separately. This perhaps surprising state- ment can be verified by noting that: (1000/3), which is located on a hill of the ACF, with a 2 peak at 357 (the correct periodicity -at least for this
0 domain, and (ii) the identification of the exact period 100 200 300 400 500 600 700 800 900 1000 in the time domain. Periodogram 0.2 P1= 333.3333 Another issue that we would like to clarify is the 0.15 reason that we are not considering a (seemingly) simpler 0.1 P2= 166.6667
Power approach for accurate periodicity estimation. 0.05 P3= 90.9091 Looking at the problem from a signal processing 0 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 perspective, one could argue that the inability to dis- Circular Autocorrelation cover the correct period is due to the ‘coarse’ sampling 0.8 0.6 Hill: Candidate Period of the series. If we would like to increase the resolution 0.4 357: correct period of the DFT, we could ‘sample’ our dataset at a finer res- 0.2 Valley: False Alarms olution (upsampling). Higher sampling rate essentially 0 50 100 150 200 250 300 350 400 450 translates into padding the time-series with zeros, and Days calculating the DFT of the longer time-series. Indeed, if Figure 4: Visual demonstration of our method. Candi- we increase the size of the example sequence from 1000 date periods from the periodogram are verified against to 16000, we will be able to discover the correct period- the autocorrelation. Valid periods are further refined icity which is 357 (instead of the incorrect 333, given in utilizing the autocorrelation information. the original estimate). However, upsampling also imposes a significant Essentially, we have leveraged the information of performance overhead. If we are interested in obtaining both metrics for providing an accurate periodicity de- online periodicity estimates from a data stream, this tector. In addition, our method is computationally effi- alternative method may result in a serious system Method Easy to threshold Accurate short periods Accurate large periods Complexity Periodogram yes yes no O(NlogN) Autocorrelation no yes yes O(NlogN) Combination yes yes yes O(NlogN) Table 1: Concise comparison of approaches for periodicity detection.
hill
Refine Period Period Sequence Periodogram Autocorrelation Candidate
Periods valley False Alarm Dismiss Figure 3: Diagram of our methodology (AUTOPERIOD method)
MSN query request logs and represents the aggregate cient, because both the periodogram and the ACF can demand for the query ‘Easter’ for 1000 days after the be directly computed through the Fast Fourier Trans- beginning of 2002. The demand for the specific query form of the examined sequence in O(N log N) time. peaks during Easter time and we can observe one yearly peak. Our intuition is that periodicity should be 3.1 Discussion. First, we need to clarify succinctly approximately 365 (although not exactly, since Easter that the use of the combined periodogram and auto- is not celebrated at the same date every year). Indeed correlation does not carry additional information than the most dominant periodogram estimate is 333.33 = each metric separately. This perhaps surprising state- ment can be verified by noting that: (1000/3), which is located on a hill of the ACF, with a 2 peak at 357 (the correct periodicity -at least for this
0 domain, and (ii) the identification of the exact period 100 200 300 400 500 600 700 800 900 1000 in the time domain. Periodogram 0.2 P1= 333.3333 Another issue that we would like to clarify is the 0.15 reason that we are not considering a (seemingly) simpler 0.1 P2= 166.6667
Power approach for accurate periodicity estimation. 0.05 P3= 90.9091 Looking at the problem from a signal processing 0 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 perspective, one could argue that the inability to dis- Circular Autocorrelation cover the correct period is due to the ‘coarse’ sampling 0.8 0.6 Hill: Candidate Period of the series. If we would like to increase the resolution 0.4 357: correct period of the DFT, we could ‘sample’ our dataset at a finer res- 0.2 Valley: False Alarms olution (upsampling). Higher sampling rate essentially 0 50 100 150 200 250 300 350 400 450 translates into padding the time-series with zeros, and Days calculating the DFT of the longer time-series. Indeed, if Figure 4: Visual demonstration of our method. Candi- we increase the size of the example sequence from 1000 date periods from the periodogram are verified against to 16000, we will be able to discover the correct period- the autocorrelation. Valid periods are further refined icity which is 357 (instead of the incorrect 333, given in utilizing the autocorrelation information. the original estimate). However, upsampling also imposes a significant Essentially, we have leveraged the information of performance overhead. If we are interested in obtaining both metrics for providing an accurate periodicity de- online periodicity estimates from a data stream, this tector. In addition, our method is computationally effi- alternative method may result in a serious system Matching Time Series
• Similar time series suggest similar things Matching Time Series
• Correlating time series used for clustering, classification, anomaly detection, speech recognition etc. Matching Time Series
What measure would you use to match two time series ?
d = y x | t t| t EuclideanX Distance
Why is Euclidean matching not good enough ? Dynamic Time Warping
Time series might be shifted
Time series might be compressed at some point in time
Random noise at some points
Dynamic time warping measures the distance between two sequences under certain restrictions.
Not a metric. Triangle inequality doesn't hold Detour - Edit Distance
• Edit distance measures how many steps it takes to convert a string to another based on restrictions
• Restrictions define cost function — insertion, deletion, replacement
f o x f o x
f 0 1 2 f 0 1 2
a 1 2 3 a 1 1 2
x 2 3 2 x 2 2 1
insertions and deletions insertions, deletions and replacements Edit Distance
f o x f o x
f 0 1 2 f 0 1 2
a 1 2 3 a 1 1 2
x 2 3 2 x 2 2 1
insertions and deletions insertions, deletions and replacements
di 1,j 1 aj = bi di 1,j + wdel(bi) 1 dij = 8 ,for1 i m, 1 j n. >min 8di,j 1 + wins(aj) 1 aj = bi <> 6 <>di 1,j 1 + wsub(aj,bi) 1 or 2 > :> :> Dynamic Time Warping
• DTW aligns two sequences of feature vectors by warping the time axis iteratively until an optimal match (according to a suitable metrics) between the two sequences is found.
x1,x2,...xn y1,y2,...yn
d(i 1,j), d(i, j)=c(i, j)+min d(i 1,j 1), 8 <>d(i, j 1) :> Burst Detection
• Bursts are rare but extremely beneficial in time-series
• Used in number of applications
• Twitter: Trending topics
• Stock markets: Trending Stocks
• Text Mining: finding important time periods
• Elastic burst detection:
• Stream of data
• Quadratic computations not allowed Burst Detection
• Global Average
• Damped Average 2.5 Elastic Burst Detection and Shifted Binary
Tree
The elastic burst detection problem [84] is to detect bursts across multiple win- dow sizes. Formally: Elastic Burst Detection Problem 1. Given a data source producing non-negative data elements x1,x2,...,
asetofwindowsizesW = w1,w2,...,wm ,amonotonic,associativeag- • Given a time-series {xi} { } gregation function A (such as “sum” or “maximum”) that maps a consecu- • A set of window sizes W tive sequence of data elements to a number (it is monotonic in the sense that • A monotonic, associative aggregation function A which maps a sequence of values to a number. E.g. Average, Max A[xt xt+w−1] A[xt xt+w],forallw), and thresholds associated with each • and Thresholds··· associated≤ with each··· window size w, f(w) window size, f(wj),forj =1, 2,...,m,theelasticburstdetectionistheproblem • Find all pairs (t,w) such that t time a time point and w is a window size inof W finding all pairs (t, w) such that t is a time point and w is a window size in
W and A[xt xt w− ] f(w). ··· + 1 ≥
Anaivealgorithmistocheckeachwindowsizeofinterestoneat a time. To detect bursts over m window sizes in a sequence of length N naively requires Θ(mN)time.Thisisunacceptableinahigh-speeddatastreamenvironment. In [84], the authors show that a simple data structure called the Shifted Binary Tree could be the basis of a filter that would detect all bursts, and perform in time independent of the number of windows when the probability of bursts is very low. AShiftedBinaryTreeisahierarchicaldatastructureinspired by the Haar wavelet tree. The leaf nodes of this tree (denoted level 0) correspond to the time points of the incoming data; a node at level 1 aggregates two adjacent nodes at level 0. In general, a node at level i+1 aggregatestwonodesatlevel i,
i+1 thus includes 2 time points. There are only log2 N +1levelswhereN is the
14 Burst Detection - Shifted Binary Tree
Level 4
Level 3
Level 2 base level shifted level Level 1
Level 0
Figure 2.1: An example of a Shifted Binary Tree. The two shadedsequencesin Whenever more than f(2 + 2i−1) events are found in a window of size 2i+1, levelthen a detailed search must be performed to check if some subwindow of 0 are included in the shaded nodes in level 4 and level 3 respectively. size w, 2+2i−1 ≤ w ≤ 1+2i, has f(w) events. maximum window size. The Shifted Binary Tree includes a shifted sublevel for each level above level 0. In shifted sublevel i,thecorrespondingwindowsare still of length 2i but those windows are shifted by 2i−1 from the base sublevel. Figure 2.1 shows an example of a Shifted Binary Tree. The overlap between the base sublevels and the shifted sublevels guarantees that all the windows of length w, w 1+2i,areincludedinoneofthewindows ≤ at level i +1.BecausetheaggregationfunctionA is monotonically increasing, if A[xt xt w c] f(w), then surely A[xt xt w− ] f(w). The Shifted ··· + + ≤ ··· + 1 ≤ Binary Tree takes advantage of this monotonic property as follows: the threshold value f(2 + 2i−1)isassociatedwithleveli +1. Whenevermorethan f(2 + 2i−1)eventsarefoundinawindowofsize2i+1,thenadetailedsearchmustbe performed to check if some subwindow of size w,2+2i−1 w 1+2i,hasf(w) ≤ ≤ events. All bursts are guaranteed to be reported and many non-burst windows are filtered away without requiring a detailed check when the burst probability is very low. However, some detailed searches will turn out to be fruitless(i.e.thereisno burst at all). For example, assume the threshold for window size 4 is 100, for 5 is 120, and for 8 is 150. Because each node at level 8 covers window size 4 and
15 Summary
• Periodicity of Events
• Auto-correlation, Periodograms and their combinations
• Burst Detection Techniques and elastic detection
• Matching of time series
• Euclidean matching
• Dynamic Time Warping References
• “On Periodicity Detection and Structural Periodic Similarity”, 2005.
• Michail Vlachos, Philip Yu, Vittorio Castelli
• Burst Detection in Hierarchical Streams.
• Jon Kleinberg
• Everything you know about Dynamic Time Warping is wrong
• Eamonn Keogh Projects http://www.l3s.de/~anand/tir14/projects.html • Temporal and Phrase-based Indexing - Avishek([email protected])
• Temporal Retrieval Models - Jaspreet ([email protected])
• Temporal Query Autocompletion- Avishek
• Crawling for Temporal Collections - Gerhard ([email protected])
• Temporal Query Suggestions - Helge ([email protected])