Introduction to Statistical Analysis of Time Series, by Richard A. Davis

Total Pages: 16

File Type: PDF, Size: 1020 KB

Introduction to Statistical Analysis of Time Series
Richard A. Davis, Department of Statistics

Outline
• Modeling objectives in time series
• General features of ecological/environmental time series
• Components of a time series
• Frequency domain analysis: the spectrum
• Estimating and removing seasonal components
• Other cyclical components
• Putting it all together

Time Series: a collection of observations x_t, each one recorded at time t. (Time can be discrete, t = 1, 2, 3, ..., or continuous, t > 0.)

Objectives of Time Series Analysis
• Data compression: provide a compact description of the data.
• Explanation: seasonal factors; relationships with other variables (temperature, humidity, pollution, etc.).
• Signal processing: extracting a signal in the presence of noise.
• Prediction: use the model to predict future values of the time series.

General features of ecological/environmental time series

Example 1. Mauna Loa (CO2, Oct '58 to Sept '90). [Figure: monthly CO2 readings, 1960-1990.] Features: an increasing trend (linear? quadratic?) and a seasonal (monthly) effect.

Example 2. Average-maximum monthly temperature (vegetation = tundra, 1895-1993). [Figure: monthly temperature series; go to ITSM demo.] Features: a seasonal (monthly) effect, with more variability in January than in July:

  Jan:  mean = -0.486, var = 2.637
  July: mean = 21.95,  var = 0.6305
  Sept: mean = 17.25,  var = 1.466

[Figure: September temperatures with fitted line 16.83 + 0.00845 t.]

Components of a time series

Classical decomposition: X_t = m_t + s_t + Y_t, where
• m_t = trend component (slowly changing in time);
• s_t = seasonal component (known period d: d = 24 for hourly data, d = 12 for monthly data);
• Y_t = random noise component (might contain irregular cyclical components of unknown frequency, plus other structure).
(Go to ITSM demo.)

Estimation of the components of X_t = m_t + s_t + Y_t.

Trend m_t:
• filtering; e.g., for monthly data,

  m̂_t = (0.5 x_{t-6} + x_{t-5} + ... + x_{t+5} + 0.5 x_{t+6}) / 12;

• polynomial fitting,

  m̂_t = a_0 + a_1 t + ... + a_k t^k.

Seasonal s_t:
• seasonal (monthly) averages after detrending, standardized so that s_t sums to 0 across the year:

  ŝ_t = (x_t + x_{t+12} + x_{t+24} + ...) / N,  N = number of years;

• harmonic components fit to the time series by least squares:

  ŝ_t = A cos(2πt/12) + B sin(2πt/12).

Irregular component Y_t:

  Ŷ_t = X_t − m̂_t − ŝ_t.

The spectrum and frequency domain analysis

Toy example (n = 6). Define the vectors

  c_0 = (cos(2π·0·1/6), ..., cos(2π·0·6/6))' / √6,
  c_1 = (cos(2π·1·1/6), ..., cos(2π·1·6/6))' / √3,
  s_1 = (sin(2π·1·1/6), ..., sin(2π·1·6/6))' / √3,
  c_2 = (cos(2π·2·1/6), ..., cos(2π·2·6/6))' / √3,
  s_2 = (sin(2π·2·1/6), ..., sin(2π·2·6/6))' / √3,
  c_3 = (cos(π·1), ..., cos(π·6))' / √6.

Then the vector x = (4.24, 3.26, −3.14, −3.24, 0.739, 3.04)' can be written as

  x = 2 c_0 + 5 (c_1 + s_1) − 1.5 (c_2 + s_2) + 0.5 c_3.

Fact: any vector of 6 numbers, x = (x_1, ..., x_6)', can be written as a linear combination of the vectors c_0, c_1, c_2, s_1, s_2, c_3. More generally, any time series x = (x_1, ..., x_n)' of length n (assume n is odd) can be written as a linear combination of the orthonormal basis vectors c_0, c_1, ..., c_[n/2], s_1, ..., s_[n/2]:

  x = a_0 c_0 + a_1 c_1 + b_1 s_1 + ... + a_m c_m + b_m s_m,  m = [n/2],

where, with ω_j = 2πj/n,

  c_0 = n^{−1/2} (1, 1, ..., 1)',
  c_j = (2/n)^{1/2} (cos ω_j, cos 2ω_j, ..., cos nω_j)',
  s_j = (2/n)^{1/2} (sin ω_j, sin 2ω_j, ..., sin nω_j)'.
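The toy example can be checked numerically. Below is a short Python/numpy sketch (not part of the original slides): it builds the six basis vectors, confirms they are orthonormal, reassembles x from the stated coefficients, and recovers those coefficients as inner products.

    import numpy as np

    n = 6
    t = np.arange(1, n + 1)

    # Orthonormal basis for R^6 built from sinusoids at the Fourier frequencies.
    c0 = np.cos(2 * np.pi * 0 * t / n) / np.sqrt(n)      # constant vector
    c1 = np.cos(2 * np.pi * 1 * t / n) / np.sqrt(n / 2)
    s1 = np.sin(2 * np.pi * 1 * t / n) / np.sqrt(n / 2)
    c2 = np.cos(2 * np.pi * 2 * t / n) / np.sqrt(n / 2)
    s2 = np.sin(2 * np.pi * 2 * t / n) / np.sqrt(n / 2)
    c3 = np.cos(2 * np.pi * 3 * t / n) / np.sqrt(n)      # alternating -1, +1

    B = np.column_stack([c0, c1, s1, c2, s2, c3])
    assert np.allclose(B.T @ B, np.eye(n))               # orthonormality

    # The series from the slides, rebuilt from the stated coefficients.
    x = 2 * c0 + 5 * (c1 + s1) - 1.5 * (c2 + s2) + 0.5 * c3
    print(np.round(x, 2))      # [ 4.24  3.26 -3.14 -3.24  0.74  3.04]

    # The DFT coefficients are inner products with the basis vectors.
    coef = B.T @ x
    print(np.round(coef, 2))   # [ 2.    5.    5.   -1.5  -1.5   0.5 ]

    # Sum-of-squares identity: sum x_t^2 = a_0^2 + sum_j (a_j^2 + b_j^2).
    print(np.sum(x**2), np.sum(coef**2))                 # both 58.75

The recovered coefficients (2, 5, 5, -1.5, -1.5, 0.5) and the matching sums of squares (58.75) anticipate the properties and the ANOVA table that follow.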
Properties.

1. The set of coefficients {a_0, a_1, b_1, ...} is called the discrete Fourier transform (DFT) of x:

  a_0 = (x, c_0) = n^{−1/2} Σ_{t=1}^{n} x_t,
  a_j = (x, c_j) = (2/n)^{1/2} Σ_{t=1}^{n} x_t cos(ω_j t),
  b_j = (x, s_j) = (2/n)^{1/2} Σ_{t=1}^{n} x_t sin(ω_j t).

2. Sum of squares:

  Σ_{t=1}^{n} x_t² = a_0² + Σ_{j=1}^{m} (a_j² + b_j²).

3. ANOVA (analysis of variance) table:

  Source       DF  Sum of squares  Periodogram
  ω_0 = 0      1   a_0²            I(ω_0)
  ω_1 = 2π/n   2   a_1² + b_1²     2 I(ω_1)
  ...
  ω_m = 2πm/n  2   a_m² + b_m²     2 I(ω_m)
  Total        n   Σ_t x_t²

Applied to the toy example:

  Source                     DF  Sum of squares
  ω_0 = 0       (period 0)   1   a_0² = 4.0
  ω_1 = 2π/6    (period 6)   2   a_1² + b_1² = 50.0
  ω_2 = 2π·2/6  (period 3)   2   a_2² + b_2² = 4.5
  ω_3 = 2π·3/6  (period 2)   1   a_3² = 0.25
  Total                      6   Σ_t x_t² = 58.75

Test that the period-6 component is significant:

  H_0: X_t = μ + ε_t, {ε_t} ~ independent noise;
  H_1: X_t = μ + A cos(2πt/6) + B sin(2πt/6) + ε_t.

Test statistic: (n−3) I(ω_1) / (Σ_t x_t² − I(0) − 2 I(ω_1)) ~ F(2, n−3) under H_0. Here (6−3)(50/2) / (58.75 − 4 − 50) = 15.79 ⇒ p-value = .003.

The spectrum and frequency domain analysis: examples

Ex. Sinusoid with period 12: x_t = 5 cos(2πt/12) + 3 sin(2πt/12), t = 1, 2, ..., 120.
Ex. Sinusoid with periods 4 and 12.
Ex. Mauna Loa. (ITSM demo.)

Differencing at lag 12

Sometimes a seasonal component with period 12 in the time series can be removed by differencing at lag 12; that is, the differenced series is

  y_t = x_t − x_{t−12}.

Now suppose x_t is the sinusoid with period 12 plus noise,

  x_t = 5 cos(2πt/12) + 3 sin(2πt/12) + ε_t,  t = 1, 2, ..., 120.

Then

  y_t = x_t − x_{t−12} = ε_t − ε_{t−12},

which has correlation at lag 12.

Other cyclical components; searching for hidden cycles

Ex. Sunspots: period ≈ 2π/0.62684 = 10.02 years. Fisher's test ⇒ significance. What model should we use? (ITSM demo.)

Noise

The time series {X_t} is white or independent noise if the sequence of random variables is independent and identically distributed. [Figure: a realization of white noise.] There is a battery of tests for checking whiteness; in ITSM, choose Statistics => Residual Analysis => Tests of Randomness.

Residuals from the Mauna Loa data

[Figure: lagged scatterplots of the residuals r_t, whose first few values are r_1 = −.19, r_2 = −.14, r_3 = −.25, r_4 = −.13, ...] The residuals are strongly correlated at short lags and nearly uncorrelated far apart:

  Cor(X_t, X_{t+1}) = .824
  Cor(X_t, X_{t+2}) = .736
  Cor(X_t, X_{t+3}) = .654
  Cor(X_t, X_{t+25}) = .074

Autocorrelation function (ACF)

[Figure: ACF of the Mauna Loa residuals versus the ACF of white noise, with confidence bounds ±1.96/√n.]

Example: Mauna Loa

[Figure: decomposition of the CO2 series into trend, seasonal, and irregular parts.]

Putting it all together

Strategies for modeling the irregular part {Y_t}:
• fit an autoregressive (AR) process;
• fit a moving average (MA) process;
• fit an ARMA (autoregressive-moving average) process.
In ITSM, choose the best-fitting AR or ARMA model using the menu option Model => Estimation => Preliminary => AR estimation, or Model => Estimation => Autofit. A minimal sketch of this kind of preliminary AR estimation outside ITSM follows.
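Preliminary AR estimation can be sketched with the Yule-Walker equations, which match the model's autocovariances to the sample autocovariances. A minimal Python sketch on simulated data (the AR(2) coefficients 0.7 and -0.2 are illustrative, not from the lecture):

    import numpy as np
    from scipy.linalg import toeplitz

    rng = np.random.default_rng(0)

    # Simulate an AR(2) irregular part: Y_t = 0.7 Y_{t-1} - 0.2 Y_{t-2} + e_t.
    n = 500
    e = rng.normal(size=n)
    y = np.zeros(n)
    for t in range(2, n):
        y[t] = 0.7 * y[t - 1] - 0.2 * y[t - 2] + e[t]

    def yule_walker(y, p):
        """Estimate AR(p) coefficients from the sample autocovariances."""
        y = y - y.mean()
        n = len(y)
        gamma = np.array([y[: n - h] @ y[h:] / n for h in range(p + 1)])
        phi = np.linalg.solve(toeplitz(gamma[:p]), gamma[1:])
        sigma2 = gamma[0] - phi @ gamma[1:]   # innovation variance estimate
        return phi, sigma2

    phi_hat, sigma2_hat = yule_walker(y, p=2)
    print(phi_hat.round(3), round(sigma2_hat, 3))   # near [0.7, -0.2] and 1.0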
How well does the model fit the data?

1. Inspection of the residuals. Are they compatible with white (independent) noise:
• no discernible trend;
• no seasonal component;
• variability that does not change in time;
• no correlation in the residuals or in the squares of the residuals?
Are they normally distributed?

2. How well does the model predict:
• values within the series (in-sample forecasting);
• future values?

3. How well do simulated values from the model capture the characteristics of the observed data? (ITSM demo with Mauna Loa.)

Model refinement and simulation

• Residual analysis can often lead to model refinement.
• Do simulated realizations reflect the key features present in the original data?

Two examples: sunspots and NEE (net ecosystem exchange).

Limitations of the existing models: seasonal components are fixed from year to year, and the models are stationary through the seasons. A refinement is to add intervention components (forest fires, volcanic eruptions, etc.).

Other directions

Structural model formulation for the trend and seasonal components:
• local level model, m_t = m_{t−1} + noise_t;
• seasonal component with noise, s_t = −s_{t−1} − s_{t−2} − ... − s_{t−11} + noise_t;
• observation equation, X_t = m_t + s_t + Y_t + ε_t.
It is easy to add intervention terms in this formulation. Periodic models allow more flexibility in modeling transitions from one season to the next. A simulation sketch of the structural formulation is given below.
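These structural equations are straightforward to simulate. A minimal Python sketch with illustrative noise scales (none of these values come from the lecture; the ARMA irregular part Y_t is omitted for brevity):

    import numpy as np

    rng = np.random.default_rng(1)
    T = 240   # 20 years of hypothetical monthly data

    # Local level trend: m_t = m_{t-1} + noise_t.
    m = 330.0 + np.cumsum(rng.normal(scale=0.05, size=T))

    # Stochastic seasonal: s_t = -(s_{t-1} + ... + s_{t-11}) + noise_t,
    # so any 12 consecutive seasonal terms sum to roughly zero.
    s = np.zeros(T)
    s[:11] = rng.normal(scale=1.0, size=11)   # initial seasonal pattern
    for t in range(11, T):
        s[t] = -s[t - 11:t].sum() + rng.normal(scale=0.05)

    # Observation: X_t = m_t + s_t + eps_t.
    x = m + s + rng.normal(scale=0.1, size=T)
    print(x[:12].round(2))

Because the trend and seasonal components carry their own noise terms, the seasonal pattern can drift from year to year, which is exactly the flexibility the fixed classical decomposition lacks.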
Recommended publications
  • Kalman and Particle Filtering
Abstract: The Kalman and Particle filters are algorithms that recursively update an estimate of the state and find the innovations driving a stochastic process given a sequence of observations. The Kalman filter accomplishes this goal by linear projections, while the Particle filter does so by a sequential Monte Carlo method. With the state estimates, we can forecast and smooth the stochastic process. With the innovations, we can estimate the parameters of the model. The article discusses how to set a dynamic model in a state-space form, derives the Kalman and Particle filters, and explains how to use them for estimation. Since both filters start with a state-space representation of the stochastic processes of interest, section 1 presents the state-space form of a dynamic model; then section 2 introduces the Kalman filter and section 3 develops the Particle filter. For extended expositions of this material, see Doucet, de Freitas, and Gordon (2001), Durbin and Koopman (2001), and Ljungqvist and Sargent (2004). 1. The state-space representation of a dynamic model. A large class of dynamic models can be represented by a state-space form:

  X_{t+1} = φ(X_t, W_{t+1}; γ)   (1)
  Y_t = g(X_t, V_t; γ)           (2)

This representation handles a stochastic process by finding three objects: a vector that describes the position of the system (a state, X_t ∈ X ⊂ R^l) and two functions, one mapping the state today into the state tomorrow (the transition equation, (1)) and one mapping the state into observables, Y_t (the measurement equation, (2)).
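For the linear-Gaussian special case of the state-space form (1)-(2), the Kalman recursions collapse to a few lines. A minimal scalar Python sketch for a local level model (the variances q and r are illustrative, not from the article):

    import numpy as np

    def kalman_filter_local_level(y, q=0.1, r=1.0, m0=0.0, p0=10.0):
        """Scalar Kalman filter for X_{t+1} = X_t + W_{t+1}, Y_t = X_t + V_t,
        with Var(W) = q and Var(V) = r. Returns filtered means, innovations."""
        m, p = m0, p0
        means, innovations = [], []
        for yt in y:
            # Update: project the state estimate onto the new observation.
            v = yt - m                # innovation
            k = p / (p + r)           # Kalman gain
            m, p = m + k * v, (1 - k) * p
            means.append(m)
            innovations.append(v)
            # Predict: propagate through the random-walk transition equation.
            p = p + q
        return np.array(means), np.array(innovations)

    rng = np.random.default_rng(2)
    state = np.cumsum(rng.normal(scale=np.sqrt(0.1), size=200))
    y = state + rng.normal(size=200)
    m_filt, innov = kalman_filter_local_level(y)
    print(np.mean((m_filt - state) ** 2))   # well below the raw noise var of 1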
  • Lecture 19: Wavelet Compression of Time Series and Images
Lecture 19: Wavelet compression of time series and images. © Christopher S. Bretherton, Winter 2014. Ref: Matlab Wavelet Toolbox help. 19.1 Wavelet compression of a time series. The last section of wavelet leleccum notoolbox.m demonstrates the use of wavelet compression on a time series. The idea is to keep the wavelet coefficients of largest amplitude and zero out the small ones. 19.2 Wavelet analysis/compression of an image. Wavelet analysis is easily extended to two-dimensional images or datasets (data matrices), by first doing a wavelet transform of each column of the matrix, then transforming each row of the result (see wavelet image). The wavelet coefficient matrix has the highest-level (largest-scale) averages in the first rows/columns, then successively smaller detail scales further down the rows/columns. The example also shows fine results with 50-fold data compression. 19.3 Continuous Wavelet Transform (CWT). Given a continuous signal u(t) and an analyzing wavelet ψ(x), the CWT has the form

  W(λ, t) = λ^{−1/2} ∫_{−∞}^{∞} ψ((s − t)/λ) u(s) ds.   (19.3.1)

Here λ, the scale, is a continuous variable. We insist that ψ have mean zero and that its square integrates to 1. The continuous Haar wavelet is defined:

  ψ(t) = 1 for 0 < t < 1/2,  −1 for 1/2 < t < 1,  0 otherwise.   (19.3.2)

W(λ, t) is proportional to the difference of running means of u over successive intervals of length λ/2. In practice, for a discrete time series, the integral is evaluated as a Riemann sum using the Matlab wavelet toolbox function cwt.
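The compression idea (transform, keep the largest coefficients, zero the rest, invert) can be sketched without the Matlab toolbox. A Python version using a hand-rolled orthonormal Haar transform on a hypothetical noisy sinusoid:

    import numpy as np

    def haar_dwt(u):
        """Multi-level orthonormal Haar transform of a length-2^k signal."""
        approx, coeffs = u.astype(float).copy(), []
        while len(approx) > 1:
            a = (approx[0::2] + approx[1::2]) / np.sqrt(2)   # running means
            d = (approx[0::2] - approx[1::2]) / np.sqrt(2)   # details
            coeffs.append(d)
            approx = a
        return approx, coeffs

    def haar_idwt(approx, coeffs):
        for d in reversed(coeffs):
            a = np.empty(2 * len(d))
            a[0::2] = (approx + d) / np.sqrt(2)
            a[1::2] = (approx - d) / np.sqrt(2)
            approx = a
        return approx

    rng = np.random.default_rng(3)
    t = np.arange(1024)
    u = np.sin(2 * np.pi * t / 128) + 0.1 * rng.normal(size=1024)

    approx, coeffs = haar_dwt(u)
    # Compression: zero all detail coefficients below an amplitude threshold.
    thresh = np.quantile(np.abs(np.concatenate(coeffs)), 0.98)  # keep ~2%
    coeffs = [np.where(np.abs(d) >= thresh, d, 0.0) for d in coeffs]
    u_hat = haar_idwt(approx, coeffs)
    print(np.sqrt(np.mean((u - u_hat) ** 2)))   # error near the noise level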
  • Alternative Tests for Time Series Dependence Based on Autocorrelation Coefficients
Alternative Tests for Time Series Dependence Based on Autocorrelation Coefficients. Richard M. Levich and Rosario C. Rizzo*. Current Draft: December 1998. Abstract: When autocorrelation is small, existing statistical techniques may not be powerful enough to reject the hypothesis that a series is free of autocorrelation. We propose two new and simple statistical tests (RHO and PHI) based on the unweighted sum of autocorrelation and partial autocorrelation coefficients. We analyze a set of simulated data to show the higher power of RHO and PHI in comparison to conventional tests for autocorrelation, especially in the presence of small but persistent autocorrelation. We show an application of our tests to data on currency futures to demonstrate their practical use. Finally, we indicate how our methodology could be used for a new class of time series models (the Generalized Autoregressive, or GAR, models) that take into account the presence of small but persistent autocorrelation. An earlier version of this paper was presented at the Symposium on Global Integration and Competition, sponsored by the Center for Japan-U.S. Business and Economic Studies, Stern School of Business, New York University, March 27-28, 1997. We thank Paul Samuelson and the other participants at that conference for useful comments. * Stern School of Business, New York University and Research Department, Bank of Italy, respectively. 1. Introduction. Economic time series are often characterized by positive
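The common ingredient of RHO and PHI, the unweighted sum of the first k sample autocorrelations, is easy to compute. The Python sketch below uses a naive N(0, k/n) reference distribution for that sum under independence; the exact standardization used by RHO and PHI should be taken from the paper itself:

    import numpy as np

    def sample_acf(x, max_lag):
        """Sample autocorrelations r_1, ..., r_{max_lag}."""
        x = x - x.mean()
        denom = x @ x
        return np.array([x[:-h] @ x[h:] / denom for h in range(1, max_lag + 1)])

    rng = np.random.default_rng(4)
    # Series with small but persistent autocorrelation (AR(1), phi = 0.05).
    e = rng.normal(size=2000)
    x = e.copy()
    for t in range(1, len(x)):
        x[t] = 0.05 * x[t - 1] + e[t]

    k = 20
    rho_raw = sample_acf(x, k).sum()   # unweighted sum of autocorrelations
    # Under independence each r_h is roughly N(0, 1/n) and nearly independent,
    # so the sum is approximately N(0, k/n).
    z = rho_raw / np.sqrt(k / len(x))
    print(rho_raw, z)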
  • Time Series: Co-Integration
Time Series: Co-integration. See also: Economic Forecasting; Time Series: General; Time Series: Nonstationary Distributions and Unit Roots; Time Series: Seasonal Adjustment. N. H. Chan

Bibliography
Banerjee A, Dolado J J, Galbraith J W, Hendry D F 1993 Co-Integration, Error Correction, and the Econometric Analysis of Non-stationary Data. Oxford University Press, Oxford, UK
Box G E P, Tiao G C 1977 A canonical analysis of multiple time series. Biometrika 64: 355-65
Chan N H, Tsay R S 1996 On the use of canonical correlation analysis in testing common trends. In: Lee J C, Johnson W O, Zellner A (eds.) Modelling and Prediction: Honoring S. Geisser. Springer-Verlag, New York, pp. 364-77
Chan N H, Wei C Z 1988 Limiting distributions of least squares estimates of unstable autoregressive processes. Annals of Statistics 16: 367-401
Engle R F, Granger C W J 1987 Cointegration and error correction: Representation, estimation, and testing. Econometrica 55: 251-76
Watson M W 1994 Vector autoregressions and co-integration. In: Engle R F, McFadden D L (eds.) Handbook of Econometrics Vol. IV. Elsevier, The Netherlands

Time Series: Cycles. Time series data in economics and other fields of social science often exhibit cyclical behavior. For example, aggregate retail sales are high in November and December and follow a seasonal cycle; voter registrations are high before each presidential election and follow an election cycle; and aggregate macroeconomic activity falls into recession every six to eight years and follows a business cycle. In spite of this cyclicality, these series are not perfectly predictable, and the cycles are not strictly periodic.
  • Generating Time Series with Diverse and Controllable Characteristics
GRATIS: GeneRAting TIme Series with diverse and controllable characteristics. Yanfei Kang,* Rob J Hyndman,† and Feng Li‡. Abstract: The explosion of time series data in recent years has brought a flourish of new time series analysis methods, for forecasting, clustering, classification and other tasks. The evaluation of these new methods requires either collecting or simulating a diverse set of time series benchmarking data to enable reliable comparisons against alternative approaches. We propose GeneRAting TIme Series with diverse and controllable characteristics, named GRATIS, with the use of mixture autoregressive (MAR) models. We simulate sets of time series using MAR models and investigate the diversity and coverage of the generated time series in a time series feature space. By tuning the parameters of the MAR models, GRATIS is also able to efficiently generate new time series with controllable features. In general, as a costless surrogate to the traditional data collection approach, GRATIS can be used as an evaluation tool for tasks such as time series forecasting and classification. We illustrate the usefulness of our time series generation process through a time series forecasting application. Keywords: Time series features; Time series generation; Mixture autoregressive models; Time series forecasting; Simulation. 1 Introduction. With the widespread collection of time series data via scanners, monitors and other automated data collection devices, there has been an explosion of time series analysis methods developed in the past decade or two. Paradoxically, the large datasets are often also relatively homogeneous in the industry domain, which limits their use for evaluation of general time series analysis methods (Keogh and Kasetty, 2003; Muñoz et al., 2018; Kang et al., 2017).
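The MAR generating mechanism itself is only a few lines: at each time step a mixture component is drawn, and the next value is generated from that component's AR recursion. A two-component Python sketch with hand-picked (not GRATIS-tuned) parameters:

    import numpy as np

    rng = np.random.default_rng(5)

    # Two-component mixture autoregressive (MAR) model.
    weights = [0.6, 0.4]    # mixing weights (illustrative)
    phi     = [0.9, -0.5]   # AR(1) coefficient of each component
    sigma   = [1.0, 2.0]    # innovation standard deviations

    T = 500
    x = np.zeros(T)
    for t in range(1, T):
        k = rng.choice(2, p=weights)                   # pick a regime
        x[t] = phi[k] * x[t - 1] + rng.normal(scale=sigma[k])
    print(x[:10].round(2))

Varying the weights, coefficients, and variances moves the simulated series around the feature space, which is the tuning idea behind GRATIS.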
  • Monte Carlo Smoothing for Nonlinear Time Series
Monte Carlo Smoothing for Nonlinear Time Series. Simon J. GODSILL, Arnaud DOUCET, and Mike WEST. We develop methods for performing smoothing computations in general state-space models. The methods rely on a particle representation of the filtering distributions, and their evolution through time using sequential importance sampling and resampling ideas. In particular, novel techniques are presented for generation of sample realizations of historical state sequences. This is carried out in a forward-filtering backward-smoothing procedure that can be viewed as the nonlinear, non-Gaussian counterpart of standard Kalman filter-based simulation smoothers in the linear Gaussian case. Convergence in the mean squared error sense of the smoothed trajectories is proved, showing the validity of our proposed method. The methods are tested in a substantial application for the processing of speech signals represented by a time-varying autoregression and parameterized in terms of time-varying partial correlation coefficients, comparing the results of our algorithm with those from a simple smoother based on the filtered trajectories. KEY WORDS: Bayesian inference; Non-Gaussian time series; Nonlinear time series; Particle filter; Sequential Monte Carlo; State-space model. 1. INTRODUCTION. In this article we develop Monte Carlo methods for smoothing in general state-space models. To fix notation, consider the standard Markovian state-space model (West and Harrison 1997):

  x_{t+1} ~ f(x_{t+1} | x_t)   (state evolution density),

with filtering performed recursively via

  p(x_{t+1} | y_{1:t+1}) = g(y_{t+1} | x_{t+1}) p(x_{t+1} | y_{1:t}) / p(y_{t+1} | y_{1:t}).

Similarly, smoothing can be performed recursively backward in time using the smoothing formula

  p(x_t | y_{1:T}) = p(x_t | y_{1:t}) ∫ [ f(x_{t+1} | x_t) p(x_{t+1} | y_{1:T}) / p(x_{t+1} | y_{1:t}) ] dx_{t+1}.
  • Measuring Complexity and Predictability of Time Series with Flexible Multiscale Entropy for Sensor Networks
Article. Measuring Complexity and Predictability of Time Series with Flexible Multiscale Entropy for Sensor Networks. Renjie Zhou 1,2, Chen Yang 1,2, Jian Wan 1,2,*, Wei Zhang 1,2, Bo Guan 3,*, Naixue Xiong 4,*. 1 School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China; [email protected] (R.Z.); [email protected] (C.Y.); [email protected] (W.Z.). 2 Key Laboratory of Complex Systems Modeling and Simulation of Ministry of Education, Hangzhou Dianzi University, Hangzhou 310018, China. 3 School of Electronic and Information Engineering, Ningbo University of Technology, Ningbo 315211, China. 4 Department of Mathematics and Computer Science, Northeastern State University, Tahlequah, OK 74464, USA. * Correspondence: [email protected] (J.W.); [email protected] (B.G.); [email protected] (N.X.); Tel.: +1-404-645-4067 (N.X.). Academic Editor: Mohamed F. Younis. Received: 15 November 2016; Accepted: 24 March 2017; Published: 6 April 2017. Abstract: Measurement of time series complexity and predictability is sometimes the cornerstone for proposing solutions to topology and congestion control problems in sensor networks. As a method of measuring time series complexity and predictability, multiscale entropy (MSE) has been widely applied in many fields. However, sample entropy, the fundamental component of MSE, scores the similarity of two subsequences of a time series as either zero or one, with no in-between values, which causes sudden changes of entropy values even when the time series changes only slightly. This problem becomes especially severe when the time series is short. To solve this problem, we propose flexible multiscale entropy (FMSE), which introduces a novel similarity function that measures the similarity of two subsequences with full-range values from zero to one, and thus increases the reliability and stability of measuring time series complexity.
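For contrast with the smooth similarity function proposed in the article, the rigid zero-or-one matching of standard sample entropy, the building block of MSE, looks as follows in a direct O(n²) Python sketch (the choices m = 2 and r = 0.2·std are conventional, not taken from the paper):

    import numpy as np

    def sample_entropy(x, m=2, r_frac=0.2):
        """Standard SampEn: -ln(A/B), where B counts template pairs of length
        m within tolerance r (a hard 0/1 decision) and A does the same for
        length m + 1."""
        x = np.asarray(x, dtype=float)
        r = r_frac * x.std()
        n_templates = len(x) - m   # same number of templates for both lengths

        def count_pairs(length):
            templates = np.array([x[i:i + length] for i in range(n_templates)])
            count = 0
            for i in range(n_templates - 1):
                # Chebyshev distance from template i to all later templates.
                d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
                count += np.sum(d <= r)   # the rigid zero-or-one similarity
            return count

        B, A = count_pairs(m), count_pairs(m + 1)
        return -np.log(A / B)

    rng = np.random.default_rng(6)
    print(sample_entropy(rng.normal(size=500)))        # irregular: higher
    print(sample_entropy(np.sin(np.arange(500) / 5)))  # regular: lower

FMSE replaces the hard `d <= r` decision with a graded similarity in [0, 1], which is what stabilizes the entropy estimate for short series.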
  • Time Series Analysis 1
Time Series Analysis. 1. Stationary ARMA models. Andrew Lesniewski, Baruch College, New York, Fall 2019. Outline: 1. Basic concepts; 2. Autoregressive models; 3. Moving average models. A time series is a sequence of data points X_t indexed by a discrete set of (ordered) dates t, where −∞ < t < ∞. Each X_t can be a simple number or a complex multi-dimensional object (vector, matrix, higher-dimensional array, or more general structure). We will be assuming that the times t are equally spaced throughout, and denote the time increment by h (e.g. second, day, month). Unless specified otherwise, we will be choosing the units of time so that h = 1. Typically, time series exhibit significant irregularities, which may have their origin either in the nature of the underlying quantity or in imprecision of observation (or both). Examples of time series commonly encountered in finance include: (i) prices, (ii) returns, (iii) index levels, (iv) trading volumes, (v) open interests, (vi) macroeconomic data (inflation, new payrolls, unemployment, GDP, housing prices, ...). For modeling purposes, we assume that the elements of a time series are random variables on some underlying probability space. Time series analysis is a set of mathematical methodologies for analyzing observed time series, whose purpose is to extract useful characteristics of the data. These methodologies fall into two broad categories: (i) non-parametric, where the stochastic law of the time series is not explicitly specified; (ii) parametric, where the stochastic law of the time series is assumed to be given by a model with a finite (and preferably tractable) number of parameters.
  • Visualisation of Financial Time Series by Linear Principal Component Analysis and Nonlinear Principal Component Analysis
Visualisation of financial time series by linear principal component analysis and nonlinear principal component analysis. Thesis submitted at the University of Leicester in partial fulfilment of the requirements for the MSc degree in Financial Mathematics and Computation by HAO-CHE CHEN, Department of Mathematics, University of Leicester, September 2014. Supervisor: Professor Alexander Gorban.

Contents
Declaration iii
Abstract iv
Introduction 1
1. Technical analysis review 5
  1.1 Introduction 5
  1.2 Technical analysis 5
  1.3 The Critics 7
    1.3.1 Market efficiency 8
  1.4 Concepts of Technical analysis 10
    1.4.1 Market price trends 10
    1.4.2 Support and Resistance 12
    1.4.3 Volume 12
    1.4.4 Stock chart patterns 13
      1.4.4.1 Reversal and Continuation 13
      1.4.4.2 Head and shoulders 16
      1.4.4.3 Cup and handle 17
      1.4.4.4 Rounding bottom 18
    1.4.5 Moving Averages (MAs) 19
  1.5 Strategies 20
    1.5.1 Strategy 1 - Check trend day-by-day 22
    1.5.2 Strategy 2 - Check trend with counters 25
    1.5.3 Strategy 3 - Volume and MAs 28
    1.5.4 Conclusion of Strategies 31
2. Linear and nonlinear principal components for visualisation of log-return time series 32
  2.1 Dataset 32
  2.2 The problem of visualisation 33
  2.3 Principal components for visualisation of log-return 33
  2.4 Nonlinear principal components methods 40
    2.4.1 Kernel PCA 41
    2.4.2 Elastic map 41
  2.5 Nonlinear principal components analysis for visualisation of log-return time series 43
  2.6 Conclusion 47
3.
  • Discrete Wavelet Transform-Based Time Series Analysis and Mining
6. Discrete Wavelet Transform-Based Time Series Analysis and Mining. PIMWADEE CHAOVALIT, National Science and Technology Development Agency; ARYYA GANGOPADHYAY, GEORGE KARABATIS, and ZHIYUAN CHEN, University of Maryland, Baltimore County. Time series are recorded values of an interesting phenomenon such as stock prices, household incomes, or patient heart rates over a period of time. Time series data mining focuses on discovering interesting patterns in such data. This article introduces a wavelet-based time series data analysis to interested readers. It provides a systematic survey of various analysis techniques that use discrete wavelet transformation (DWT) in time series data mining, and outlines the benefits of this approach demonstrated by previous studies performed on diverse application domains, including image classification, multimedia retrieval, and computer network anomaly detection. Categories and Subject Descriptors: A.1 [Introductory and Survey]; G.3 [Probability and Statistics]: Time series analysis; H.2.8 [Database Management]: Database Applications - Data mining; I.5.4 [Pattern Recognition]: Applications - Signal processing, waveform analysis. General Terms: Algorithms, Experimentation, Measurement, Performance. Additional Key Words and Phrases: Classification, clustering, anomaly detection, similarity search, prediction, data transformation, dimensionality reduction, noise filtering, data compression. ACM Reference Format: Chaovalit, P., Gangopadhyay, A., Karabatis, G., and Chen, Z. 2011. Discrete wavelet transform-based time series analysis and mining. ACM Comput. Surv. 43, 2, Article 6 (January 2011), 37 pages. DOI = 10.1145/1883612.1883613 http://doi.acm.org/10.1145/1883612.1883613. 1. INTRODUCTION. A time series is a sequence of data that represent recorded values of a phenomenon over time. Time series data constitutes a large portion of the data stored in real world databases [Agrawal et al.
  • Introduction to Time Series Analysis. Lecture 19
Introduction to Time Series Analysis. Lecture 19. 1. Review: Spectral density estimation, sample autocovariance. 2. The periodogram and sample autocovariance. 3. Asymptotics of the periodogram.

Estimating the Spectrum: Outline
• We have seen that the spectral density gives an alternative view of stationary time series.
• Given a realization x_1, ..., x_n of a time series, how can we estimate the spectral density?
• One approach: replace γ(·) in the definition

  f(ν) = Σ_{h=−∞}^{∞} γ(h) e^{−2πiνh}

  with the sample autocovariance γ̂(·).
• Another approach, called the periodogram: compute I(ν), the squared modulus of the discrete Fourier transform (at frequencies ν = k/n).
• These two approaches are identical at the Fourier frequencies ν = k/n.
• The asymptotic expectation of the periodogram I(ν) is f(ν). We can derive some asymptotic properties, and hence do hypothesis testing.
• Unfortunately, the asymptotic variance of I(ν) is constant: it is not a consistent estimator of f(ν).

Review: Spectral density estimation. If a time series {X_t} has autocovariance γ satisfying Σ_{h=−∞}^{∞} |γ(h)| < ∞, then we define its spectral density as

  f(ν) = Σ_{h=−∞}^{∞} γ(h) e^{−2πiνh}   for −∞ < ν < ∞.

Review: Sample autocovariance. Idea: use the sample autocovariance γ̂(·), defined by

  γ̂(h) = (1/n) Σ_{t=1}^{n−|h|} (x_{t+|h|} − x̄)(x_t − x̄),   for −n < h < n,

as an estimate of the autocovariance γ(·), and then use

  f̂(ν) = Σ_{h=−n+1}^{n−1} γ̂(h) e^{−2πiνh}   for −1/2 ≤ ν ≤ 1/2.

Discrete Fourier transform. For a sequence (x_1, ..., x_n), define the discrete Fourier transform (DFT) as (X(ν_0), X(ν_1), ..., X(ν_{n−1})), where

  X(ν_k) = (1/√n) Σ_{t=1}^{n} x_t e^{−2πiν_k t},

and ν_k = k/n (for k = 0, 1, ..., n−1) are called the Fourier frequencies.
  • Principal Component Analysis for Second-Order Stationary Vector Time Series
Submitted to the Annals of Statistics. PRINCIPAL COMPONENT ANALYSIS FOR SECOND-ORDER STATIONARY VECTOR TIME SERIES. By Jinyuan Chang1,*, Bin Guo1,† and Qiwei Yao2,‡. Southwestern University of Finance and Economics1, and London School of Economics and Political Science2. We extend the principal component analysis (PCA) to second-order stationary vector time series in the sense that we seek for a contemporaneous linear transformation for a p-variate time series such that the transformed series is segmented into several lower-dimensional subseries, and those subseries are uncorrelated with each other both contemporaneously and serially. Therefore those lower-dimensional series can be analysed separately as far as the linear dynamic structure is concerned. Technically it boils down to an eigenanalysis for a positive definite matrix. When p is large, an additional step is required to perform a permutation in terms of either maximum cross-correlations or FDR based on multiple tests. The asymptotic theory is established for both fixed p and diverging p when the sample size n tends to infinity. Numerical experiments with both simulated and real data sets indicate that the proposed method is an effective initial step in analysing multiple time series data, which leads to substantial dimension reduction in modelling and forecasting high-dimensional linear dynamical structures. Unlike PCA for independent data, there is no guarantee that the required linear transformation exists. When it does not, the proposed method provides an approximate segmentation which leads to the advantages in, for example, forecasting for future values. The method can also be adapted to segment multiple volatility processes.