<<

Appendix A Large Sample Theory

A.1 Convergence Modes

The study of the optimality properties of various estimators (such as the sample function) depends, in part, on being able to assess the large-sample behavior of these estimators. We summarize briefly here the kinds of convergence useful in this setting, namely, square convergence, convergence in probability, and convergence in distribution. We consider first a particular class of random variables that plays an important role in the study of second-order , namely, the class of random variables belonging to the space L2, satisfying E|x|2 < ∞. In proving certain properties of the class L2 we will often use, for random variables x, y ∈ L2,theCauchy–Schwarz inequality, |E(xy)|2 ≤ E(|x|2)E(|y|2), (A.1) and the Tchebycheff inequality, E(|x|2) Pr{|x|≥a}≤ , (A.2) a2 for a > 0. Next, we investigate the properties of mean square convergence of random vari- ables in L2.

2 Definition A.1 A sequence of L random variables {xn} is said to converge in mean square to a x ∈ L2, denoted by

ms xn → x, (A.3) if and only if 2 E|xn − x| → 0(A.4) as n →∞.

© Springer International Publishing AG 2017 473 R.H. Shumway, D.S. Stoffer, Time Series Analysis and Its Applications, Springer Texts in , DOI 10.1007/978-3-319-52452-8 474 Appendix A: Large Sample Theory

Example A.1 Mean Square Convergence of the Sample Mean Consider the sequence wt and the plus noise series

xt = μ + wt .

Then, because σ2 E|x¯ − μ|2 = w → 0 n n  →∞ = −1 n →ms μ as n , wherex ¯n n t=1 xt is the sample mean, we havex ¯n . We summarize some of the properties of mean square convergence as follows. If ms ms xn → x, and yn → y, then, as n →∞,

E(xn)→E(x);(A.5) 2 2 E(|xn | )→E(|x| );(A.6)

E(xn yn)→E(xy). (A.7)

We also note the L2 completeness theorem known as the Riesz–Fischer Theorem.

2 2 Theorem A.1 Let {xn} be a sequence in L . Then, there exists a x in L such that ms xn → x if and only if lim sup E|x − x |2 = 0 . (A.8) →∞ n m m n≥m Often the condition of Theorem A.1 is easier to verify to establish that a mean square limit x exists without knowing what it is. Sequences that satisfy (A.8) are said to be Cauchy sequences in L2 and (A.8) is also known as the Cauchy criterion for L2.

Example A.2 Time Invariant Linear Filter As an important example of the use of the Riesz–Fisher Theorem and the properties of mean square convergent series given in (A.5)–(A.7), a time-invariant linear filter is defined as a of the form ∞ yt = aj xt−j (A.9) j=−∞

for each t = 0, ±1, ±2,..., where xt is a weakly stationary input series with mean μx and function γx(h), and aj ,forj = 0, ±1, ±2,...are constants satisfying ∞ |aj | < ∞. (A.10) j=−∞

The output series yt defines a filtering or smoothing of the input series that changes the character of the time series in a predictable way. We need to know the conditions under which the outputs yt in (A.9) and the linear process (1.31) exist. A.1 Convergence Modes 475

Considering the sequence

n n = , yt aj xt−j (A.11) j=−n = , ,... n n 1 2 , we need to show first that yt has a mean square limit. By Theorem A.1, it is enough to show that    n − m2 → E yt yt 0 as m, n →∞.Forn > m > 0,       2  n − m2 = E yt yt E aj xt−j  <| |≤ m j n  = aj ak E(xt−j xt−k ) <| |≤ ≤| |≤ m j n m k n ≤ |aj ||ak ||E(xt−j xt−k )| <| |≤ ≤| |≤ m j n m k n 2 1/2 2 1/2 ≤ |aj ||ak |(E|xt−j | ) (E|xt−k | ) m<| j |≤n m≤|k |≤n 2     = γ ( ) + μ2 | | → x 0 x  aj 0 m≤|j |≤n

as m, n →∞, because γx(0) is a constant and {aj } is absolutely summable (the second inequality follows from the Cauchy–Schwarz inequality). { n} Although we know that the sequence yt given by (A.11) converges in mean square, we have not established its mean square limit. If S denotes the mean square n | − |2 = | − n |2 ≤ limit of yt , then using Fatou’s Lemma, E S yt E lim infn→∞ S yt | − n |2 = n lim infn→∞ E S yt 0, which establishes that yt is the mean square limit of yt .

Finally, we may use (A.5) and (A.7) to establish the mean, μy and autocovariance function, γy(h) of yt . In particular we have, ∞ μy = μx aj, (A.12) j=−∞ and ∞ ∞ γy(h) = E aj (xt+h−j − μx)aj (xt−k − μx) j=−∞ k=−∞ ∞ ∞ = aj γx(h − j + k)ak . (A.13) j=−∞ k=−∞ A second important kind of convergence is convergence in probability. 476 Appendix A: Large Sample Theory

Definition A.2 The sequence {xn},forn = 1, 2,..., converges in probability to a random variable x, denoted by p xn → x, (A.14) if and only if Pr{|xn − x| >}→0 (A.15) for all >0,asn →∞.

An immediate consequence of the Tchebycheff inequality, (A.2), is that

E(|x − x|2) Pr{|x − x|≥}≤ n , n 2 so convergence in mean square implies convergence in probability, i.e.,

ms p xn → x ⇒ xn → x. (A.16)

This result implies, for example, that the filter (A.9) exists as a limit in probability because it converges in mean square [it is also easily established that (A.9) exists with probability one]. We mention, at this point, the useful Weak Law of Large Numbers which states that, for an independent identically distributed sequence xn of random variables with mean μ,wehave p x¯n → μ (A.17)  →∞ = −1 n as n , wherex ¯n n t=1 xt is the usual sample mean. We also will make use of the following concepts.

Definition A.3 For order in probability we write

xn = op(an) (A.18) if and only if x p n → 0. (A.19) an

The term boundedness in probability, written xn = Op(an), that for every >0, there exists a δ() > 0 such that       xn  Pr   >δ() ≤  (A.20) an for all n.

p Under this convention, e.g., the notation for xn → x becomes xn − x = op(1). The definitions can be compared with their nonrandom counterparts, namely, for a fixed sequence xn = o(1) if xn → 0 and xn = O(1) if xn,forn = 1, 2,...is bounded. Some handy properties of op(·) and Op(·) are as follows. A.1 Convergence Modes 477

(i) If xn = op(an) and yn = op(bn), then xn yn = op(anbn) and xn + yn = op(max(an, bn)). (ii) If xn = op(an) and yn = Op(bn), then xn yn = op(anbn). (iii) Statement (i) is true if Op(·) replaces op(·). Example A.3 Convergence and Order in Probability for the Sample Mean 2 For the sample mean,x ¯n, of iid random variables with mean μ and σ ,by the Tchebycheff inequality,

E[(x¯ − μ)2] Pr{|x¯ − μ| >}≤ n n 2 σ2 = → 0, n2 p as n →∞. It follows thatx ¯n → μ,or¯xn − μ = op(1). To find the rate, it follows that, for δ() > 0,

√  σ2/n σ2 Pr n |x¯ − μ| >δ() ≤ = n δ2()/n δ2() √ by Tchebycheff’s inequality, so taking  = σ2/δ2() shows that δ() = σ/  does the job and −1/2 x¯n − μ = Op(n ). p For k × 1 random vectors xn, convergence in probability, written xn → x or xn − x = op(1) is defined as element-by-element convergence in probability, or equivalently, as convergence in terms of the Euclidean distance

p xn − x → 0, (A.21)  p = 2 → where a j aj for any vector a. In this context, we note the result that if xn x and g(xn) is a continuous mapping, p g(xn) → g(x). (A.22)

Furthermore, if xn − a = Op(δn) with δn → 0 and g(·) is a function with continuous first derivatives continuous in a neighborhood of a = (a1, a2,...,ak ) ,we have the Taylor series expansion in probability  ∂ ( ) ( ) = ( ) + g x  ( − ) + (δ ), g xn g a ∂  xn a Op n (A.23) x x=a where      ∂ ( ) ∂ ( ) ∂ ( ) g x  = g x  ,..., g x  ∂  ∂  ∂  x x=a x1 x=a xk x=a denotes the vector of partial derivatives with respect to x1, x2,...,xk , evaluated at a. This result remains true if Op(δn) is replaced everywhere by op(δn). 478 Appendix A: Large Sample Theory

Example A.4 Expansion for the of the Sample Mean

With the same conditions as Example A.3, consider g(x¯n) = logx ¯n, which has a −1/2 derivative at μ,forμ>0. Then, becausex ¯n − μ = Op(n ) from Example A.3, the conditions for the Taylor expansion in probability, (A.23), are satisfied and we have −1 −1/2 logx ¯n = log μ + μ (x¯n − μ) + Op(n ).

The large sample distributions of sample mean and sample autocorrelation func- tions defined earlier can be developed using the notion of convergence in distribution.

Definition A.4 A sequence of k × 1 random vectors {xn} is said to converge in distribution, written d xn → x (A.24) if and only if Fn(x)→F(x) (A.25) at the continuity points of distribution function F(·).

Example A.5 Convergence in Distribution Consider a sequence {xn} of iid normal random variables∫ with mean zero and z   variance 1/n. Using the standard normal cdf, Φ(z) = √1 exp − 1 u2 du,we √ 2π −∞ 2 have Fn(z) = Φ( nz),so

⎪⎧ < , ⎪⎨0 z 0 ( )→ / = Fn z ⎪1 2 z 0 ⎪ ⎩1 z > 0

and we may take  0 z < 0, F(z) = 1 z ≥ 0, because the point where the two functions differ is not a continuity point of F(z).

The distribution function relates uniquely to the characteristic function through the , defined as a function with vector argument λ = (λ1,λ2,...,λk ) , say ∫ φ(λ) = E(exp{iλ x}) = exp{iλ x} dF(x). (A.26)

Hence, for a sequence {xn} we may characterize convergence in distribution of Fn(·) in terms of convergence of the sequence of characteristic functions φn(·), i.e.,

d φn(λ)→φ(λ)⇔Fn(x) → F(x), (A.27) where ⇔ means that the implication goes both directions. In this connection, we have A.1 Convergence Modes 479

Proposition A.1 The Cramér–Wold device.Let{xn} be a sequence of k ×1 random k vectors. Then, for every c = (c1, c2,...,ck ) ∈ R

d d c xn → c x ⇔ xn → x. (A.28)

Proposition A.1 can be useful because sometimes it easier to show the convergence in distribution of c xn than xn directly. Convergence in probability implies convergence in distribution, namely,

p d xn → x ⇒ xn → x, (A.29)

d d but the converse is only true when xn → c, where c is a constant vector. If xn → x d and yn → c are two sequences of random vectors and c is a constant vector,

+ →d + →d . xn yn x c and yn xn c x (A.30)

For a continuous mapping h(x),

d d xn → x ⇒ h(xn) → h(x). (A.31)

A number of results in time series depend on making a series of approximations d to prove convergence in distribution. For example, we have that if xn → x can be approximated by the sequence yn in the sense that

yn − xn = op(1), (A.32)

d then we have that yn → x, so the approximating sequence yn has the same limiting distribution as x. We present the following Basic Approximation Theorem (BAT) that will be used later to derive asymptotic distributions for the sample mean and ACF.

Theorem A.2 [Basic Approximation Theorem (BAT)] Let xn for n = 1, 2,...,and ymn for m = 1, 2,...,be random k × 1 vectors such that d (i) ymn → ym as n →∞for each m; d (ii) ym → y as m →∞; {| − | >} = > (iii) limm→∞ lim supn→∞ Pr xn ymn 0 for every 0. d Then, xn → y.

As a practical matter, the BAT condition (iii) is implied by the Tchebycheff inequal- ity if

2 (iii ) E{|xn − ymn| }→0 (A.33) as m, n →∞, and (iii ) is often much easier to establish than (iii). 480 Appendix A: Large Sample Theory

The theorem allows approximation of the underlying sequence in two steps, through the intermediary sequence ymn, depending on two arguments. In the time series case, n is generally the sample length and m is generally the number of terms in an approximation to the linear process of the form (A.11). Proof: The proof of the theorem is a simple exercise in using the characteristic functions and appealing to (A.27). We need to show |φ − φ |→ , xn y 0 where we use the shorthand notation φ ≡ φ(λ) for ease. First, |φ − φ |≤|φ − φ | + |φ − φ | + |φ − φ |. xn y xn ymn ymn ym ym y (A.34) By the condition (ii) and (A.27), the last term converges to zero, and by condition (i) and (A.27), the second term converges to zero and we only need consider the first term in (A.34). Now, write      λ λ  φ − φ  = E(ei xn − ei ymn ) xn ymn    λ  λ ( − ) ≤ E ei xn 1 − ei ymn xn     λ ( − ) = E 1 − ei ymn xn       iλ (ymn−xn ) = E 1 − e  I{|ymn − xn | <δ}      iλ (ymn−xn ) + E 1 − e  I{|ymn − xn | ≥ δ} , where δ>0 and I{A} denotes the indicator function of the set A. Then, given λ and >0, choose δ() > 0 such that    λ ( − ) 1 − ei ymn xn  < if |ymn − xn | <δ, and the first term is less than , an arbitrarily small constant. For the second term, note that    λ ( − ) 1 − ei ymn xn  ≤ 2 and we have        iλ (ymn−xn ) E 1 − e  I{|ymn − xn |≥δ} ≤ 2Pr |ymn − xn |≥δ , which converges to zero as n →∞by property (iii). 

A.2 Central Limit Theorems

We will generally be concerned with the large-sample properties of estimators that turn out to be normally distributed as n →∞. A.2 Central Limit Theorems 481

Definition A.5 A sequence of random variables {xn} is said to be asymptotically μ σ2 →∞ normal with mean n and variance n if, as n ,

σ−1( − μ ) →d , n xn n z where z has the standard normal distribution. We shall abbreviate this as ∼ (μ ,σ2), xn AN n n (A.35) where ∼ will denote is distributed as.

We state the important , as follows.

Theorem A.3 Let x1,...,xn be independent and identically distributed with mean μ 2 and variance σ .If x¯n = (x1 + ···+ xn)/n denotes the sample mean, then

2 x¯n ∼ AN(μ, σ /n). (A.36)

Often, we will be concerned with a sequence of k × 1 vectors {xn}. The following property is motivated by the Cramér–Wold device, Proposition A.1.

Proposition A.2 A sequence of random vectors is asymptotically normal, i.e.,

xn ∼ AN(μn,Σn), (A.37) if and only if

c xn ∼ AN(c μn, c Σnc) (A.38) k for all c ∈ R and Σn is positive definite. In order to begin to consider what happens for dependent in the limiting case, it is necessary to define, first of all, a particular kind of dependence known as M-dependence. We say that a time series xt is M-dependent if the set of values xs, s ≤ t is independent of the set of values xs, s ≥ t + M + 1, so time points separated by more than M units are independent. A central limit theorem for such dependent processes, used in conjunction with the Basic Approximation Theorem, will allow us to develop large-sample distributional results for the sample meanx ¯ and the sample ACF ρx(h) in the stationary case. In the arguments that follow, we often make use of the formula for the variance ofx ¯n in the stationary case, namely,

(n−1)   |u| varx ¯ = n−1 1 − γ(u), (A.39) n n u=−(n−1) which was established in (1.35) on page 27. We shall also use the fact that, for ∞ |γ(u)| < ∞, u=−∞ 482 Appendix A: Large Sample Theory we would have, by dominated convergence,1 ∞ n varx ¯n → γ(u), (A.40) u=−∞ because |(1 −|u|/n)γ(u)| ≤ |γ(u)| and (1 −|u|/n)γ(u)→γ(u). We may now state the M-Dependent Central Limit Theorem as follows.

Theorem A.4 If xt is a strictly stationary M-dependent sequence of random variables with mean zero and autocovariance function γ(·) and if

M VM = γ(u), (A.41) u=−M where VM  0, x¯n ∼ AN(0, VM /n). (A.42)

Proof: To prove the theorem, using Theorem A.2, the Basic Approximation Theorem, we may construct a sequence of variables ymn approximating n 1/2 −1/2 n x¯n = n xt t=1 in the dependent case and then simply verify conditions (i), (ii), and (iii) of Theo- rem A.2.Form > 2M, we may first consider the approximation

−1/2 ymn = n [(x1 + ···+ xm−M ) + (xm+1 + ···+ x2m−M )

+ (x2m+1 + ···+ x3m−M ) + ···+ (x(r−1)m+1 + ···+ xrm−M )] −1/2 = n (z1 + z2 + ···+ zr ), where r = [n/m], with [n/m] denoting the greatest integer less than or equal to 1/2 n/m. This approximation contains only part of n x¯n, but the random variables z1, z2,...,zr are independent because they are separated by more than M time points, e.g., m + 1 −(m − M) = M + 1 points separate z1 and z2. Because of strict stationarity, z1, z2,...,zr are identically distributed with zero means and  Sm−M = (m − M −|u|)γ(u) |u |≤M by a computation similar to that producing (A.39). We now verify the conditions of the Basic Approximation Theorem hold.

1 Dominated convergence technically relates to convergent sequences (with respect to∫ a sigma-additive measure μ) of measurable functions fn → f bounded by an integrable function g, gdμ<∞.For such a sequence, ∫ ∫

fn dμ → fdμ.

For the case in point, take fn(u) = (1 −|u |/n)γ(u) for |u | < n and as zero for |u |≥n.Take μ(u) = 1, u = ±1, ±2,...to be counting measure. A.2 Central Limit Theorems 483

(i) Applying the Central Limit Theorem to the sum ymn gives r r −1/2 −1/2 −1/2 ymn = n zi = (n/r) r zi . i=1 i=1

Because (n/r)−1/2 → m1/2 and

r −1/2 d r zi → N(0, Sm−M ), i=1 it follows from (A.30) that

d ymn → ym ∼ N(0, Sm−M /m).

as n →∞,forafixedm. (ii) Note that as m →∞, Sm−M /m → VM using dominated convergence, where VM is defined in (A.41). Hence, the characteristic function of ym,say,   1 S − 1 φ (λ) = exp − λ2 m M → exp − λ2 V , m 2 m 2 M as m →∞, which is the characteristic function of a random variable y ∼ N(0, VM ) and the result follows because of (A.27). (iii) To verify the last condition of the BAT theorem,

1/2 −1/2 n x¯n − ymn = n [(xm−M+1 + ···+ xm)

+ (x2m−M+1 + ···+ x2m)

+ (x(r−1)m−M+1 + ···+ x(r−1)m) . .

+ (xrm−M+1 + ···+ xn)] −1/2 = n (w1 + w2 + ···+ wr ),

so the error is expressed as a scaled sum of iid variables with variance SM for the first r − 1 variables and    ( ) = −[ / ] + −| | γ( ) var wr |u |≤m−M n n m m M u u  ≤ ( + −| |)γ( ). |u |≤m−M m M u u Hence, 1/2 −1 var [n x¯ − ymn] = n [(r − 1)SM + var wr ], −1 −1 which converges to m SM as n →∞. Because m SM → 0asm →∞,the condition of (iii) holds by the Tchebycheff inequality.  484 Appendix A: Large Sample Theory

A.3 The Mean and Autocorrelation Functions

The background material in the previous two sections can be used to develop the asymptotic properties of the sample mean and ACF used to evaluate statistical sig- nificance. In particular, we are interested in verifying Property 1.2. We begin with the distribution of the sample meanx ¯n, noting that (A.40) suggests a form for the limiting variance. In all of the asymptotics, we will use the assumption that xt is a linear process, as defined in Definition 1.12, but with the added condition that {wt } is iid. That is, throughout this section we assume ∞ xt = μx + ψj wt−j (A.43) j=−∞

∼ ( ,σ2 ) where wt iid 0 w , and the coefficients satisfy ∞ |ψj | < ∞. (A.44) j=−∞

Before proceeding further, we should note that the exact distribution ofx ¯n is available if the distribution of the underlying vector x = (x1, x2,...,xn) is multivariate normal. Then,x ¯n is just a linear combination of jointly normal variables that will have the normal distribution     |u| x¯ ∼ N μ , n−1 1 − γ (u) , (A.45) n x n x |u |

−1 x¯n ∼ AN(μx, n V), (A.46) where   ∞ ∞ 2 = γ ( ) = σ2 ψ V x h w j (A.47) h=−∞ j=−∞ and γx(·) is the autocovariance function of xt .

Proof: To prove the above, we can again use the Basic Approximation Theorem, Theorem A.2, by first defining the strictly stationary 2m-dependent linear process with finite limits m m = ψ xt j wt−j j=−m A.3 The Mean and Autocorrelation Functions 485 as an approximation to xt to use in the approximating mean

n = −1 m. x¯n,m n xt t=1 Then, take 1/2 ymn = n (x¯n,m − μx) 1/2 as an approximation to n (x¯n − μx). (i) Applying Theorem A.4,wehave

d ymn → ym ∼ N(0, Vm),

as n →∞, where   2m m 2 = γ ( ) = σ2 ψ . Vm x h w j h=−2m j=−m

To verify the above, we note that for the general linear process with infinite limits, (1.32) implies that   ∞ ∞ ∞ ∞ 2 γ ( ) = σ2 ψ ψ = σ2 ψ , x h w j+h j w j h=−∞ h=−∞ j=−∞ j=−∞

so taking the special case ψj = 0, for | j| > m, we obtain Vm. (ii) Because Vm → V in (A.47)asm →∞, we may use the same characteristic function argument as under (ii) in the proof of Theorem A.4 to note that

d ym → y ∼ N(0, V),

where V is given by (A.47). (iii) Finally, ⎡ ⎤ ⎢ n  ⎥ 1/2 ⎢ −1 ⎥ var n (x¯n − μx)−ymn = n var ⎢n ψj wt−j ⎥ ⎢ ⎥ ⎣ t=1 | j |>m ⎦ 2   = σ2 ψ → w  j 0 | j |>m as m →∞.  In order to develop the of the sample autocovariance func- tion, γx(h), and the sample autocorrelation function, ρx(h), we need to develop some 486 Appendix A: Large Sample Theory idea as to the mean and variance of γx(h) under some reasonable assumptions. These computations for γx(h) are messy, and we consider a comparable quantity n −1 'γx(h) = n (xt+h − μx)(xt − μx) (A.48) t=1 as an approximation. By Problem 1.30, 1/2 n ['γx(h)−γx(h)] = op(1), 1/2 1/2 so that limiting distributional results proved for n 'γx(h) will hold for n γx(h) by (A.32). We begin by proving formulas for the variance and for the limiting variance of 'γx(h) under the assumptions that xt is a linear process of the form (A.43), satisfy- σ2 ing (A.44) with the white noise variates wt having variance w as before, but also required to have fourth moments satisfying ( 4) = ησ4 < ∞, E wt w (A.49) where η is some constant. We seek results comparable with (A.39) and (A.40)for 'γx(h). To ease the notation, we will henceforth drop the subscript x from the notation. Using (A.48), E['γ(h)] = γ(h). Under the above assumptions, we show now that, for p, q = 0, 1, 2,..., (n−1)   |u| cov ['γ(p), 'γ(q)] = n−1 1 − V , (A.50) n u u=−(n−1) where = γ( )γ( + − ) + γ( + )γ( − ) Vu u u p q u p u q + (η − )σ4 ψ ψ ψ ψ . 3 w i+u+q i+u i+p i (A.51) i

The absolute summability of the ψj can then be shown to imply the absolute summa- bility of the Vu.2 Thus, the dominated convergence theorem implies ∞ n cov ['γ(p), 'γ(q)] → Vu u=−∞ = (η − 3)γ(p)γ(q) (A.52) ∞ ( ) + γ(u)γ(u + p − q) + γ(u + p)γ(u − q) . u=−∞ To verify (A.50) is somewhat tedious, so we only go partially through the calcu- lations, leaving the repetitive details to the reader. First, rewrite (A.43)as ∞ xt = μ + ψt−iwi, i=−∞    2 ∞ | | < ∞ ∞ | | < ∞ ∞ | | < ∞ Note: j=−∞ aj and j=−∞ bj implies j=−∞ aj bj . A.3 The Mean and Autocorrelation Functions 487 so that   −2 E['γ(p)'γ(q)] = n ψs+p−iψs−j ψt+q−k ψt−E(wiwj wk w). s,t i, j,k,

Then, evaluate, using the easily verified properties of the wt series

⎪⎧ησ4 = = =  ⎪⎨ w if i j k ( ) = σ4 =  =  E wiwj wk w ⎪ w if i j k ⎪ ⎩0ifi  j, i  k and i  .

To apply the rules, we break the sum over the subscripts i, j, k, into four terms, namely,      = + + + = S1 + S2 + S3 + S4. i, j,k, i=j=k= i=jk= i=kj= i=j=k

Now,   = ησ4 ψ ψ ψ ψ = ησ4 ψ ψ ψ ψ , S1 w s+p−i s−i t+q−i t−i w i+s−t+p i+s−t i+q i i i wherewehaveleti = t − i to get the final form. For the second term,  S2 = ψs+p−iψs−j ψt+q−k ψt−E(wiwj wk w) i=jk= = ψ ψ ψ ψ ( 2) ( 2). s+p−i s−i t+q−k t−k E wi E wk ik

Then, using the fact that    = − , ik i,k i=k we have   = σ4 ψ ψ ψ ψ − σ4 ψ ψ ψ ψ S2 w s+p−i s−i t+q−k t−k w s+p−i s−i t+q−i t−i i,k  i = γ( )γ( )−σ4 ψ ψ ψ ψ , p q w i+s−t+p i+s−t i+q i i letting i = s −i, k = t − k in the first term and i = s −i in the second term. Repeating the argument for S3 and S4 and substituting into the expression yields ( E['γ(p)'γ(q)] = n−2 γ(p)γ(q) + γ(s − t)γ(s − t + p − q) s,t + γ(s − t + p)γ(s − t − q)  ) + (η − )σ4 ψ ψ ψ ψ . 3 w i+s−t+p i+s−t i+q i i 488 Appendix A: Large Sample Theory

Then, letting u = s − t and subtracting E[γ˜(p)]E[γ˜(q)] = γ(p)γ(q) from the sum- mation leads to the result (A.51). Summing (A.51) over u and applying dominated convergence leads to (A.52). The above results for the variances and of the approximating statis- tics 'γ(·) enable proving the following central limit theorem for the autocovariance functions γ(·).

Theorem A.6 If xt is a stationary linear process of the form (A.43) satisfying the fourth condition (A.49), then, for fixed K, ⎡ ⎤ γ(0) ⎢ γ(0) ⎥  ⎢ ⎥  γ(1) ⎢ γ(1) ⎥  . ∼ ⎢ . , −1 ⎥ ,  . AN ⎢ . n V⎥  . ⎢ . ⎥ ⎢ ⎥ γ(K) ⎣ γ(K) ⎦ where V is the matrix with elements given by

vpq = (η − 3)γ(p)γ(q) ∞ ( ) + γ(u)γ(u − p + q) + γ(u + q)γ(u − p) . (A.53) u=−∞

Proof: It suffices to show the result for the approximate autocovariance (A.48)for 'γ(·) by the remark given below it (see also Problem 1.30). First, define the strictly stationary (2m + K)-dependent (K + 1)×1 vector

( m − μ)2  xt  (xm − μ)(xm − μ) m  t+1 t y =  . , t  . ( m − μ)( m − μ) xt+K xt where m m = μ + ψ xt j wt−j j=−m is the usual approximation. The sample mean of the above vector is

'γmn( )  0 n  'γmn(1) −1 m  y¯mn = n y =  . , t  . t=1 γ˜mn(K) where n 'γmn( ) = −1 ( m − μ)( m − μ) h n xt+h xt t=1 A.3 The Mean and Autocorrelation Functions 489 denotes the sample autocovariance of the approximating series. Also, γm( )  0  γm(1) m  Ey =  . , t  . γm(K) γm( ) m where h is the theoretical of the series xt . Then, consider the vector 1/2 ymn = n [y¯mn − E(y¯mn)] as an approximation to ⎡ ⎤ ⎢ γ˜(0) γ(0) ⎥ ⎢  ⎥ ⎢ γ˜(1)  γ(1) ⎥ = 1/2 ⎢ . −  . ⎥ , yn n ⎢ .  . ⎥ ⎢ .  . ⎥ ⎢ ⎥ ⎣ γ˜(K) γ(K) ⎦ ( ) ( m) where E y¯mn is the same as E yt given above. The elements of the vector approxi- 1/2 mn m mation ymn are clearly n (γ˜ (h)−γ˜ (h)). Note that the elements of yn are based on the linear process xt , whereas the elements of ymn are based on the m-dependent m linear process xt . To obtain a limiting distribution for yn, we apply the Basic Ap- proximation Theorem, Theorem A.2,usingymn as our approximation. We now verify (i), (ii), and (iii) of Theorem A.2. (i) First, let c be a (K +1)×1 vector of constants, and apply the central limit theorem

to the (2m + K)-dependent series c ymn using the Cramér–Wold device (A.28). We obtain

1/2 d c ymn = n c [y¯mn − E(y¯mn)] → c ym ∼ N(0, c Vmc),

as n →∞, where Vm is a matrix containing the finite analogs of the elements vpq defined in (A.53). (ii) Note that, since Vm → V as m →∞, it follows that

d c ym → c y ∼ N(0, c Vc), so, by the Cramér–Wold device, the limiting (K + 1)×1 multivariate normal variable is N(0, V). (iii) For this condition, we can focus on the element-by-element components of   Pr |yn − ymn| > . For example, using the Tchebycheff inequality, the h-th element of the proba- bility statement can be bounded by

n−2var (γ˜(h)−γ˜m(h)) = −2 {n varγ ˜(h) + n varγ ˜ m(h)−2n cov[γ˜(h), γ˜m(h)]} . 490 Appendix A: Large Sample Theory

Using the results that led to (A.52), we see that the preceding expression ap- proaches 2 (vhh + vhh − 2vhh)/ = 0, as m, n →∞.  To obtain a result comparable to Theorem A.6 for the autocorrelation function ACF, we note the following theorem.

Theorem A.7 If xt is a stationary linear process of the form (1.31) satisfying the fourth moment condition (A.49), then for fixed K, ⎡ ⎤ ρ(1) ⎢ ρ(1) ⎥  . ⎢ . ⎥  . ∼ ⎢ . , −1 ⎥ ,  . AN ⎢ . n W⎥ ⎢ ⎥ ρ(K) ⎣ ρ(K) ⎦ where W is the matrix with elements given by ∞ ( 2 wpq = ρ(u + p)ρ(u + q) + ρ(u − p)ρ(u + q) + 2ρ(p)ρ(q)ρ (u) u=−∞ ) − 2ρ(p)ρ(u)ρ(u + q)−2ρ(q)ρ(u)ρ(u + p) ∞ = [ρ(u + p) + ρ(u − p)−2ρ(p)ρ(u)] u=1 ×[ρ(u + q) + ρ(u − q)−2ρ(q)ρ(u)], (A.54) where the last form is more convenient.

Proof: To prove the theorem, we use the delta method3 for the limiting distribution of a function of the form

g(x0, x1,...,xK ) = (x1/x0,...,xK /x0) , where xh = γ(h),forh = 0, 1,...,K. Hence, using the delta method and Theorem A.6,

g (γ(0), γ(1),...,γ(K)) = (ρ(1),...,ρ(K)) is asymptotically normal with mean vector (ρ(1),...,ρ(K)) and

n−1W = n−1DV D ,

3 ∼ (μ, 2 Σ) → The delta method states that if a k-dimensional vector sequence xn AN an , with an 0, and ( ) × ( )∼ ( (μ), 2 Σ ) g x is an r 1 continuously differentiable vector function of x,theng xn AN g an D D ∂gi (x)  where D is the r × k matrix with elements dij = . ∂x j μ A.3 The Mean and Autocorrelation Functions 491

where V is defined by (A.53) and D is the (K +1)×K matrix of partial derivatives − ...  x1 x0 0 0  − ... 1  x2 0 x0 0 D =  . . . . . 2  ...... x0 −xK 00... x0,

Substituting γ(h) for xh, we note that D can be written as the patterned matrix 1   D = −ρ I , γ(0) K where ρ = (ρ(1),ρ(2),...,ρ(K)) is the K ×1 matrix of and IK is the K × K identity matrix. Then, it follows from writing the matrix V in the partitioned form   v v V = 00 1 v1 V22 that   = γ−2( ) ρρ − ρ − ρ + , W 0 v00 v1 v1 V22 where v1 = (v10, v20,...,vK0) and V22 = {vpq; p, q = 1,...,K}. Hence,   −2 wpq = γ (0) vpq − ρ(p)v0q − ρ(q)vp0 + ρ(p)ρ(q)v00 ∞ ( = ρ(u)ρ(u − p + q) + ρ(u − p)ρ(u + q) + 2ρ(p)ρ(q)ρ2(u) u=−∞ ) − 2ρ(p)ρ(u)ρ(u + q)−2ρ(q)ρ(u)ρ(u − p) .

Interchanging the summations, we get the wpq specified in the statement of the theorem, finishing the proof. 

Specializing the theorem to the case of interest in this chapter, we note that if {xt } is iid with finite fourth moment, then wpq = 1forp = q and is zero otherwise. In this case, for h = 1,...,K,theρ(h) are asymptotically independent and jointly normal with ρ(h)∼AN(0, n−1). (A.55) This justifies the use of (1.38) and the discussion below it as a method for testing whether a series is white noise. For the cross-correlation, it has been noted that the same kind of approximation holds and we quote the following theorem for the bivariate case, which can be proved using similar arguments (see Brockwell and Davis, 1991[36], p. 410).

Theorem A.8 If ∞ xt = αj wt−j,1 j=−∞ 492 Appendix A: Large Sample Theory and ∞ yt = βj wt−j,2 j=−∞ are two linear processes with absolutely summable coefficients and the two white σ2 σ2 noise sequences are iid and independent of each other with variances 1 and 2 , then for h ≥ 0,    −1 ρxy(h)∼AN ρxy(h), n ρx(j)ρy(j) (A.56) j and the joint distribution of (ρxy(h), ρxy(k)) is asymptotically normal with mean vector zero and    −1 cov ρxy(h), ρxy(k) = n ρx(j)ρy(j + k − h). (A.57) j

Again, specializing to the case of interest in this chapter, as long as at least one of the two series is white (iid) noise, we obtain   −1 ρxy(h)∼AN 0, n , (A.58) which justifies Property 1.3. Appendix B Theory

B.1 Hilbert Spaces and the Projection Theorem

Most of the material on mean square estimation and regression can be embedded in a more general setting involving an inner product space that is also complete (that is, satisfies the Cauchy condition). Two examples of inner products are E(xy∗), where  ∗ the elements are random variables, and xi yi , where the elements are sequences. These examples include the possibility of complex elements, in which case, ∗ denotes the conjugation. We denote an inner product, in general, by the notation x, y.Now, define an inner product space by its properties, namely, ∗ (i) x, y = y, x (ii) x + y, z = x, z + y, z (iii) αx, y = α x, y (iv) x, x = x 2 ≥ 0 (v) x, x = 0iffx = 0. We introduced the notation · for the norm or distance in property (iv). The norm satisfies the triangle inequality

x + y ≤ x + y (B.1) and the Cauchy–Schwarz inequality

| x, y |2 ≤ x 2 y 2, (B.2) which we have seen before for random variables in (A.35). Now, a Hilbert space, H, is defined as an inner product space with the Cauchy property. In other words, H is a complete inner product space. This means that every Cauchy sequence converges in norm; that is, xn → x ∈ H if an only if xn − xm →0asm, n →∞. This is just the L2 completeness Theorem A.1 for random variables.

© Springer International Publishing AG 2017 493 R.H. Shumway, D.S. Stoffer, Time Series Analysis and Its Applications, Springer Texts in Statistics, DOI 10.1007/978-3-319-52452-8 494 Appendix B: Time Domain Theory

For a broad overview of Hilbert space techniques that are useful in and in probability, see Small and McLeish [187]. Also, Brockwell and Davis (1991, Chapter 2)[36] is a nice summary of Hilbert space techniques that are useful in time series analysis. In our discussions, we mainly use the projection theorem (Theorem B.1) and the associated orthogonality principle as a means for solving various kinds of linear estimation problems.

Theorem B.1 (Projection Theorem) Let M be a closed subspace of the Hilbert space H and let y be an element in H. Then, y can be uniquely represented as

y = yˆ + z, (B.3) where yˆ belongs to M and z is orthogonal to M; that is, z, w = 0 for all w in M. Furthermore, the point yˆ is closest to y in the sense that, for any w in M, y − w ≥ y − yˆ , where equality holds if and only if w = yˆ.

We note that (B.3) and the statement following it yield the orthogonality property

y − yˆ, w = 0(B.4) for any w belonging to M, which can sometimes be used easily to find an expression for the projection. The norm of the error can be written as

y − yˆ 2 = y − yˆ, y − yˆ = y − yˆ, y − y − yˆ, yˆ = y − yˆ, y (B.5) because of orthogonality. Using the notation of Theorem B.1, we call the mapping PM y = yˆ,fory ∈ H, the projection mapping of H onto M. In addition, the closed span of a finite set {x1,...,xn} of elements in a Hilbert space, H, is defined to be the set of all linear combinations w = a1 x1 + ···+ an xn, where a1,...,an are scalars. This subspace of H is denoted by M = sp{x1,...,xn}. By the projection theorem, the projection of y ∈ H onto M is unique and given by

PM y = a1 x1 + ···+ an xn, where {a1,...,an} are found using the orthogonality principle * + y − PM y, xj = 0 j = 1,...,n.

Evidently, {a1,...,an} can be obtained by solving n * + * + ai xi, xj = y, xj j = 1,...,n. (B.6) i=1 When the elements of H are vectors, this problem is the problem. B.1 Hilbert Spaces and the Projection Theorem 495

Example B.1 Linear For the regression model introduced in Sect. 2.1, we want to find the regression coefficients βi that minimize the residual sum of squares. Consider the vectors

y = (y1,...,yn) and zi = (z1i,...,zni) ,fori = 1,...,q and the inner product

n  ,  = = . zi y ztiyt zi y t=1 We solve the problem of finding a projection of the observed y on the linear space spanned by β1z1+···+βq zq, that is, linear combinations of the zi. The orthogonality principle gives , q - y − βi zi, zj = 0 i=1 for j = 1,...,q. Writing the orthogonality condition, as in (B.6), in vector form gives q = β = ,..., , y zj i zi zj j 1 q (B.7) i=1

which can be written in the usual matrix form by letting Z = (z1,...,zq),whichis assumed to be full rank. That is, (B.7) can be written as

y Z = β (Z Z), (B.8)

where β = (β1,...,βq) . Transposing both sides of (B.8) provides the solution for the coefficients, βˆ = (Z Z)−1 Z y. The mean-square error in this case would be

. q . , q - q . .2 .y − βˆi zi. = y − βˆi zi , y = y, y − βˆi zi , y = y y − βˆ Z y, i=1 i=1 i=1 which is in agreement with Sect. 2.1.

The extra generality in the above approach hardly seems necessary in the finite dimensional case, where differentiation works perfectly well. It is convenient, how- ever, in many cases to regard the elements of H as infinite dimensional, so that the orthogonality principle becomes of use. For example, the projection of the process {xt ; t = 0 ± 1, ±2,...} on the linear manifold spanned by all filtered of the form ∞ xˆt = ak xt−k k=−∞ would be in this form. There are some useful results, which we state without proof, pertaining to pro- jection mappings. 496 Appendix B: Time Domain Theory

Theorem B.2 Under established notation and conditions:

(i) PM(ax + by) = aPM x + bPM y, for x, y ∈ H, where a and b are scalars. (ii) If ||yn − y|| → 0, then PM yn → PM y,asn →∞. (iii) w ∈ M if and only if PMw = w. Consequently, a projection mapping can be 2 characterized by the property that PM = PM, in the sense that, for any y ∈ H, PM(PM y) = PM y. (iv) Let M1 and M2 be closed subspaces of H. Then, M1 ⊆ M2 if and only if ( ) = ∈ H PM1 PM2 y PM1 y for all y . (v) Let M be a closed subspace of H and let M⊥ denote the orthogonal complement of M. Then, M⊥ is also a closed subspace of H, and for any y ∈ H, y =

PM y + PM⊥ y.

Part (iii) of Theorem B.2 leads to the well-known result, often used in linear models, that a square matrix M is a projection matrix if and only if it is symmetric and idempotent (that is, M2 = M). For example, using the notation of Example B.1 for linear regression, the projection of y onto sp{z1,...,zq }, the space generated by −1 −1 the columns of Z,isPZ (y) = Z βˆ = Z(Z Z) Z y. The matrix M = Z(Z Z) Z is an n × n, symmetric and idempotent matrix of rank q (which is the of the space that M projects y onto). Parts (iv) and (v) of Theorem B.2 are useful for establishing recursive solutions for estimation and prediction. By imposing extra structure, conditional expectation can be defined as a projection mapping for random variables in L2 with the equivalence relation that, for x, y ∈ L2, x = y if Pr(x = y) = 1. In particular, for y ∈ L2,ifM is a closed subspace of L2 containing 1, the conditional expectation of y given M is defined to be the projection of y onto M, namely, EM y = PM y. This means that conditional expectation, EM, must satisfy the orthogonality principle of the Projection Theorem and that the results of Theorem B.2 remain valid (the most ly used tool in this case is item (iv) of the theorem). If we let M(x) denote the closed subspace of all random variables in L2 that can be written as a (measurable) function of x, then we may define, for x, y ∈ L2, the conditional expectation of y given x as E(y | x) = EM(x) y. This idea may be generalized in an obvious way to define the conditional expectation of y given x1:n = (x1,...,xn); that is E(y | x) = EM(x) y. Of particular interest to us is the following result which states that, in the Gaussian case, conditional expectation and are equivalent.

Theorem B.3 Under established notation and conditions, if (y, x1,...,xn) is multi- variate normal, then ( | ) = . E y x1:n Psp{1,x1,...,xn } y

Proof: First, by the projection theorem, the conditional expectation of y given x1:n is the unique element EM(x) y that satisfies the orthogonality principle,    E y − EM(x) y w = 0 for all w ∈ M(x). B.2 Causal Conditions for ARMA Models 497

= We will show thaty ˆ Psp{1,x1,...,xn } y is that element. In fact, by the projection theorem,y ˆ satisfies y − yˆ, xi = 0fori = 0, 1,...,n, where we have set x0 = 1. But y − yˆ, xi = cov(y − yˆ, xi) = 0, implying that y − yˆ and (x1,...,xn) are independent because the vector (y − yˆ, x1,...,xn) is multivariate normal. Thus, if w ∈ M(x), then w and y−yˆ are independent and, hence, y − yˆ, w = E{(y−yˆ)w} = E(y−yˆ)E(w) = 0, recalling that 0 = y − yˆ, 1 = E(y−yˆ).  In the Gaussian case, conditional expectation has an explicit form. Let y = (y ,...,y ) , x = (x ,...,x ) , and suppose the x and y are jointly normal: 1 m 1  n (   ) y μy Σyy Σyx ∼ Nm+n , , x μx Σxy Σxx then y | x is normal with μ = μ + Σ Σ −1( − μ ) y |x y yx xx x x (B.9) Σ = Σ − Σ Σ −1 Σ , y |x yy yx xx xy (B.10) where Σxx is assumed to be nonsingular.

B.2 Causal Conditions for ARMA Models

In this section, we prove Property 3.1 of Sect. 3.1 pertaining to the causality of ARMA models. The proof of Property 3.2, which pertains to invertibility of ARMA models, is similar.

Proof of Property 3.1. Suppose first that the roots of φ(z),say,z1,...,zp, lie outside the unit circle. We write the roots in the following order, 1 < |z1|≤|z2| ≤ ··· ≤ |zp |, noting that z1,...,zp are not necessarily unique, and put |z1| = 1 + ,forsome>0. −1 Thus, φ(z)  0 as long as |z| < |z1| = 1 +  and, hence, φ (z) exists and has a power series expansion, ∞ 1 = a z j, |z| < 1 + . φ(z) j j=0 Now, choose a value δ such that 0 <δ<, and set z = 1 + δ, which is inside the radius of convergence. It then follows that ∞ −1 j φ (1 + δ) = aj (1 + δ) < ∞. (B.11) j=0 Thus, we can bound each of the terms in the sum in (B.11) by a constant, say, j −j |aj (1 + δ) | < c,forc > 0. In turn, |aj | < c(1 + δ) , from which it follows that ∞ |aj | < ∞. (B.12) j=0 498 Appendix B: Time Domain Theory

−1 Hence, φ (B) exists and we may apply it to both sides of the ARMA model, φ(B)xt = θ(B)wt , to obtain −1 −1 xt = φ (B)φ(B)xt = φ (B)θ(B)wt . Thus, putting ψ(B) = φ−1(B)θ(B),wehave ∞ xt = ψ(B)wt = ψj wt−j, j=0 where the ψ-weights, which are absolutely summable, can be evaluated by ψ(z) = φ−1(z)θ(z),for|z|≤1.

Now, suppose xt is a causal process; that is, it has the representation ∞ ∞ xt = ψj wt−j, |ψj | < ∞. j=0 j=0 In this case, we write xt = ψ(B)wt, and premultiplying by φ(B) yields

φ(B)xt = φ(B)ψ(B)wt . (B.13)

In addition to (B.13), the model is ARMA, and can be written as

φ(B)xt = θ(B)wt . (B.14) From (B.13) and (B.14), we see that

φ(B)ψ(B)wt = θ(B)wt . (B.15)

Now, let ∞ j a(z) = φ(z)ψ(z) = aj z |z|≤1 j=0 and, hence, we can write (B.15)as ∞ q aj wt−j = θ j wt−j . (B.16) j=0 j=0

Next, multiply both sides of (B.16)bywt−h, for h = 0, 1, 2,...,and take expectation. In doing this, we obtain

ah = θh, h = 0, 1,...,q

ah = 0, h > q. (B.17)

From (B.17), we conclude that

φ(z)ψ(z) = a(z) = θ(z), |z|≤1. (B.18) B.3 Large Sample Distribution of the AR Conditional Estimators 499

If there is a complex number in the unit circle, say z0,forwhichφ(z0) = 0, then by (B.18), θ(z0) = 0. But, if there is such a z0, then φ(z) and θ(z) have a common factor which is not allowed. Thus, we may write ψ(z) = θ(z)/φ(z). In addition, by hypothesis, we have that |ψ(z)| < ∞ for |z|≤1, and hence      θ(z)  |ψ(z)| =   < ∞, for |z|≤1. (B.19) φ(z)

Finally, (B.19) implies φ(z)  0for|z|≤1; that is, the roots of φ(z) lie outside the unit circle. 

B.3 Large Sample Distribution of the AR Conditional Least Squares Estimators

In Sect. 3.5 we discussed the conditional least squares procedure for estimating the φ ,φ ,...,φ σ2 parameters 1 2 p and w in the AR(p) model p xt = φk xt−k + wt, k=1 whereweassumeμ = 0, for convenience. Write the model as

xt = φ xt−1 + wt, (B.20)

where xt−1 = (xt−1, xt−2,...,xt−p) is a p × 1 vector of lagged values, and φ =

(φ1,φ2,...,φp) is the p × 1 vector of regression coefficients. Assuming observations are available at x1,...,xn, the conditional least squares procedure is to minimize

n 2 Sc(φ) = (xt − φ xt−1) t=p+1 with respect to φ. The solution is

−1  n n φˆ =  xt−1 xt−1 xt−1 xt (B.21) t=p+1 t=p+1

φ σ2 for the regression vector ; the conditional least squares estimate of w is n   2 1 2 σˆ = x − φˆ x − . (B.22) w n − p t t 1 t=p+1

As pointed out following (3.116), Yule–Walker estimators and least squares esti- mators are approximately the same in that the estimators differ only by inclusion or exclusion of terms involving the endpoints of the data. Hence, it is easy to show the 500 Appendix B: Time Domain Theory asymptotic equivalence of the two estimators; this is why, for AR(p) models, (3.103) and (3.132), are equivalent. Details on the asymptotic equivalence can be found in Brockwell and Davis (1991, Chapter 8)[36]. Here, we use the same approach as in Appendix A, replacing the lower limits of thesumsin(B.21) and (B.22) by one and noting the asymptotic equivalence of the estimators / 0− n 1 n φ˜ = xt−1 xt−1 xt−1 xt (B.23) t=1 t=1 and n   2 1 2 σ˜ = x − φ˜ x − (B.24) w n t t 1 t=1 to those two estimators. In (B.23) and (B.24), we are acting as if we are able to observe x1−p,...,x0 in addition to x1,...,xn. The asymptotic equivalence is then seen by arguing that for n sufficiently large, it makes no difference whether or not we observe x1−p,...,x0. In the case of (B.23) and (B.24), we obtain the following theorem.

Theorem B.4 Let xt be a causal AR(p) series with white (iid) noise wt satisfying ( 4) = ησ4 E wt w. Then,   φ ∼ φ, −1σ2 Γ−1 , ˜ AN n w p (B.25)

p Γ = {γ( − )} × − where p i j i, j=1 is the p p autocovariance matrix of the vector xt 1.We also have, as n →∞, n −1 →p Γ σ2 →p σ2 . n xt−1 xt−1 p and ˜ w w (B.26) t=1

( − ) = Γ Proof: First, (B.26) follows from the fact that E xt 1 xt−1 p, recalling that from Theorem A.6, second-order sample moments converge in probability to their population moments for linear processes in which wt has a finite fourth moment. To show (B.25), we can write / 0− n 1 n φ˜ = ( φ + ) xt−1 xt−1 xt−1 xt−1 wt t=1 t=1 / 0− n 1 n = φ + , xt−1 xt−1 xt−1wt t=1 t=1 so that / 0− n 1 n 1/2(φ˜ − φ) = −1 −1/2 n n xt−1 xt−1 n xt−1wt t=1 t=1 / 0− n 1 n = −1 −1/2 , n xt−1 xt−1 n ut t=1 t=1 B.3 Large Sample Distribution of the AR Conditional Least Squares Estimators 501 where ut = xt−1wt . We use the fact that wt and xt−1 are independent to write Eut = E(xt−1)E(wt ) = 0, because the errors have zero means. Also, = = 2 = σ2 Γ . Eut ut Ext−1wt wt xt−1 Ext−1 xt−1Ewt w p In addition, we have, for h > 0, = = = . Eut+hut Ext+h−1wt+hwt xt−1 Ext+h−1wt xt−1Ewt+h 0 A similar computation works for h < 0. Next, consider the mean square convergent approximation

m m = ψ xt j wt−j j=0 , ( + ) m = ( m , m ,..., m ) for xt and define the m p -dependent process ut wt xt−1 xt−2 xt−p .Note that we need only look at a central limit theorem for the sum

n = −1/2 λ m, ynm n ut t=1 for arbitrary vectors λ = (λ1,...,λp) , where ynm is used as an approximation to

n −1/2 Sn = n λ ut . t=1

First, apply the m-dependent central limit theorem to ynm as n →∞for fixed m to d establish (i) of Theorem A.2. This result shows ynm → ym, where ym is asymptotically λ Γ(m)λ Γ(m) m normal with covariance p , where p is the covariance matrix of ut . Then, (m) we have Γp → Γp, so that ym converges in distribution to a normal random variable with mean zero and variance λ Γpλ and we have verified part (ii) of Theorem A.2. We verify part (iii) of Theorem A.2 by noting that

n [( − )2] = −1 λ [( − m)( − m) ]λ E Sn ynm n E ut ut ut ut t=1 clearly converges to zero as n, m →∞because ∞ − m = ψ xt xt j wt−j j=m+1 − m form the components of√ ut ut . Now, the form for n(φ˜ − φ) contains the premultiplying matrix / 0− n 1 −1 →p Γ−1, n xt−1 xt−1 p t=1 502 Appendix B: Time Domain Theory because (A.22) can be applied to the function that defines the inverse of the matrix. Then, applying (A.30), shows that   1 2 1/2 φ − φ →d ,σ2 Γ−1Γ Γ−1 , n ˜ N 0 w p p p so we may regard it as being multivariate normal with mean zero and covariance σ2 Γ−1 matrix w p . σ2 To investigate ˜ w, note n   σ2 = −1 − φ 2 ˜ w n xt ˜ xt−1 t=1 / 0− n n n 1 n = −1 2 − −1 −1 −1 n xt n xt−1 xt n xt−1 xt−1 n xt−1 xt t=1 t=1 t=1 t=1 →p γ( )−γ Γ−1γ 0 p p p = σ2 , w σ2 and we have that the sample estimator converges in probability to w, which is written in the form of (3.66).  The arguments above imply that, for sufficiently large n, we may consider the estimator φˆ in (B.21) as being approximately multivariate normal with mean φ and σ2 Γ−1/ φ variance–covariance matrix w p n. Inferences about the parameter are obtained σ2 Γ by replacing the w and p by their estimates given by (B.22) and

n Γˆ = −1 , p n xt−1 xt−1 t=p+1 respectively. In the case of a nonzero mean, the data xt are replaced by xt − x¯ in the estimates and the results of Theorem A.2 remain valid.

B.4 The Wold Decomposition

The ARMA approach to modeling time series is generally implied by the assumption that the dependence between adjacent values in time is best explained in terms of a regression of the current values on the past values. This assumption is partially justified, in theory, by the Wold decomposition. In this section we assume that {xt ; t = 0, ±1, ±2,...} is a stationary, mean-zero process. Using the notation of Sect. B.1, we define 3∞ Mx = { , −∞ < ≤ }, Mx = Mx, n sp xt t n with −∞ n n=−∞ B.4 The Wold Decomposition 503 and   2 σ2 = − x . x E xn+1 PMn xn+1 σ2 = We say that xt is a deterministic process if and only if x 0. That is, a deterministic process is one in which its future is perfectly predictable from its past; a simple example is the process given in (4.1). We are now ready to present the decomposition.

Theorem B.5 (The Wold Decomposition) Under the conditions and notation of this σ2 > section, if x 0, then xt can be expressed as ∞ xt = ψj wt−j + vt j=0 where  ∞ ψ2 < ∞ ψ = (i) j=0 j ( 0 1) { } σ2 (ii) wt is white noise with variance w ∈ Mx (iii) wt t (iv) cov(ws, vt ) = 0 for all s, t = 0, ±1, ±2,... . x (v) vt ∈ M−∞ (vi) {vt } is deterministic.

The proof of the decomposition follows from the theory of Sect. B.1 by defining the unique sequences:

wt = xt − PMx xt, * t−1 + ψ = σ−2 , = σ−2 ( ), j w xt wt−j w E xt wt−j ∞ vt = xt − ψj wt−j . j=0

Although every can be represented by the Wold decomposition, it does not mean that the decomposition is the best way to describe the process. In addition, there may be some dependence structure among the {wt }; we are only guar- anteed that the sequence is an uncorrelated sequence. The theorem, in its generality, falls short of our needs because we would prefer the noise process, {wt }, to be white independent noise. But, the decomposition does give us the confidence that we will not be completely off the mark by fitting ARMA models to time series data. Appendix C Spectral Domain Theory

C.1 Spectral Representation Theorems

In this section, we present a spectral representation for the process xt itself, which allows us to think of a stationary process as a random sum of sines and cosines as described in (4.4). In addition, we present results that justify representing the auto- covariance function of a weakly stationary process in terms of a spectral distribution function. First, we consider developing a representation for the autocovariance function of a stationary, possibly complex, series xt with zero mean and autocovariance function γ ( ) = ( ∗) γ( ) x h E xt+h xt . An autocovariance function, h , is non-negative definite in that, for any set of complex constants, {at ∈ C; t = 1,...,n}, and any integer n > 0, n n ∗γ( − ) ≥ . as s t at 0 s=1 t=1 Likewise, any non-negative definite function, say γ(h), on the integers is an au- Γ = {γ( − )}n tocovariance of some stationary process. To see this, let n ti tj i, j=1 be the n × n matrix with i, jth equal to γ(ti − tj ). Then choose {xt } such that ( ,..., )∼ ( ,Γ ) xt1 xtn Nn 0 n . We now establish the relationship of such functions to a spectral distribution function; Riemann-Stieljes integration is explained in Sect. C.4.1. Theorem C.1 A function γ(h),forh = 0, ±1, ±2,..., is non-negative definite if and only if it can be expressed as

∫ 1 2 γ(h) = exp{2πiωh}dF(ω), (C.1) − 1 2 where F(·) is nondecreasing. The function F(·) is right continuous, bounded and uniquely determined by the conditions F(ω) = F(−1/2) = 0 for ω ≤−1/2 and F(ω) = F(1/2) = γ(0) for ω ≥ 1/2.

© Springer International Publishing AG 2017 505 R.H. Shumway, D.S. Stoffer, Time Series Analysis and Its Applications, Springer Texts in Statistics, DOI 10.1007/978-3-319-52452-8 506 Appendix C: Spectral Domain Theory

Proof: If γ(h) has the representation (C.1), then

n n ∫ 1 n n ∗γ( − ) = 2 ∗ 2πiω(s−t) (ω) as s t at as at e dF = = − 1 = = s 1 t 1 2 s 1 t 1  ∫ 1 n 2 2  −2πiωt  =  at e  dF(ω)≥0 − 1   2 t=1 and γ(h) is non-negative definite. Conversely, suppose γ(h) is a non-negative definite function. Define the non- negative function

n n −1 −2πiωs 2πiωt fn(ω) = n e γ(s − t)e s=1 t=1 (n−1) (C.2) = n−1 (n −|h|)e−2πiωhγ(h)≥0 h=−(n−1)

Now, let Fn(ω) be the distribution function corresponding to fn(ω)I(−1/2,1/2], where I(·) denotes the indicator function of the interval in the subscript. Note that Fn(ω) = 0,ω ≤−1/2 and Fn(ω) = Fn(1/2) for ω ≥ 1/2. Then,

∫ 1 ∫ 1 2 2πiωh 2 2πiωh e dFn(ω) = e fn(ω) dω − 1 − 1 2 2 ( −| |/ )γ( ), | | < = 1 h n h h n 0, elsewhere.

We also have

∫ 1 2 Fn(1/2) = fn(ω) dω − 1 2 ∫ 1  2 = (1 −|h|/n)γ(h)e−2πiωh dω = γ(0). − 1 2 |h |

(1 −|h|/nk )γ(h)→γ(h) as nk →∞, and the required result follows.  C.1 Spectral Representation Theorems 507

Next, we present the version of the spectral representation theorem of a mean- zero, stationary process, xt in terms of an orthogonal increment process. This version allows us to think of a stationary process as being generated (approximately) by a random sum of sines and cosines such as described in (4.4). We refer the reader to Hannan (1970, §2.3)[86] for details.

Theorem C.2 If xt is a mean-zero stationary process, with spectral distribution F(ω) as given in Theorem C.1, then there exists a complex-valued Z(ω), on the interval ω ∈[−1/2, 1/2], having stationary uncorrelated increments, such that xt can be written as the stochastic integral ∫ 1 2 2πiωt xt = e dZ(ω), − 1 2 where, for −1/2 ≤ ω1 ≤ ω2 ≤ 1/2,

var {Z(ω2)−Z(ω1)} = F(ω2)−F(ω1). The theorem uses stochastic integration and orthogonal increment processes, which are described in further detail in Sect. C.4.2. In general, the spectral distribution function can be a mixture of discrete and continuous distributions. The special case of greatest interest is the absolutely contin- uous case, namely, when dF(ω) = f (ω)dω, and the resulting function is the considered in Sect. 4.2. What made the proof of Theorem C.1 difficult was that, after we defined (n−1)   |h| f (ω) = 1 − γ(h)e−2πiωh n n h=−(n−1) in (C.2), we could not simply allow n →∞because γ(h) may not be abso- lutely summable. If, however, γ(h) is absolutely summable we may define f (ω) = limn→∞ fn(ω), and we have the following result.

Theorem C.3 If γ(h) is the autocovariance function of a stationary process, xt ,with ∞ |γ(h)| < ∞, (C.3) h=−∞ then the spectral density of xt is given by ∞ f (ω) = γ(h)e−2πiωh. (C.4) h=−∞

We may extend the representation to the vector case xt = (xt1,...,xtp) by considering linear combinations of the form p = ∗ , yt aj xtj j=1 508 Appendix C: Spectral Domain Theory which will be stationary with autocovariance functions of the form

p p γ ( ) = ∗γ ( ) , y h aj jk h ak j=1 k=1 where γjk(h) is the usual cross-covariance function between xtj and xtk. To develop the spectral representation of γjk(h) from the representations of the univariate series, consider the linear combinations

yt1 = xtj + xtk and yt2 = xtj + ixtk, which are both stationary series with respective covariance functions

γ1(h) = γjj(h) + γjk(h) + γkj(h) + γkk(h) ∫ 1 2 2πiωh = e dG1(ω), − 1 2

γ2(h) = γjj(h) + iγkj(h)−iγjk(h) + γkk(h) ∫ 1 2 2πiωh = e dG2(ω). − 1 2

Introducing the spectral representations for γjj(h) and γkk(h) yields

∫ 1 2 2πiωh γjk(h) = e dFjk(ω), − 1 2 with 1 2 1 F (ω) = G (ω) + iG (ω)−(1 + i) F (ω) + F (ω) . jk 2 1 2 jj kk Now, under the summability condition ∞ |γjk(h)| < ∞, h=−∞ we have the representation

∫ 1 2 2πiωh γjk(h) = e fjk(ω)dω, − 1 2 where the cross-spectral density function has the inverse Fourier representation ∞ −2πiωh fjk(ω) = γjk(h)e . h=−∞ C.2 Large Sample Distribution of the Smoothed 509

The cross-covariance function satisfies γjk(h) = γkj(−h), which implies fjk(ω) = fkj(−ω) using the above representation. Then, defining the autocovariance function of the general vector process xt as the p × p matrix

Γ(h) = E[(xt+h − μx)(xt − μx) ], and the p × p spectral matrix as f (ω) = { fjk(ω); j, k = 1,...,p},wehavethe representation in matrix form, written as

∫ 1 2 Γ(h) = e2πiωh f (ω) dω, (C.5) − 1 2 and the inverse result ∞ f (ω) = Γ(h)e−2πiωh. (C.6) h=−∞ which appears as Property 4.8 in Sect. 4.5. Theorem C.2 can also be extended to the multivariate case.

C.2 Large Sample Distribution of the Smoothed Periodogram

We have previously introduced the DFT, for the stationary zero-mean process xt , observed at t = 1,...,n as n −1/2 −2πiωt d(ω) = n xt e , (C.7) t=1 as the result of sines and cosines of ω against the series xt .We will suppose now that xt has an absolutely continuous spectrum f (ω) corresponding to the absolutely summable autocovariance function γ(h). Our purpose in this section is to examine the statistical properties of the complex random variables d(ωk ),for ωk = k/n, k = 0, 1,...,n − 1 in providing a basis for the estimation of f (ω).To develop the statistical properties, we examine the behavior of

(n n ) 2 −1 −2πiωs 2πiωt Sn(ω, ω) = E |d(ω)| = n E xs e xt e s=1 t=1 n n = n−1 e−2πiωse2πiωt γ(s − t) s=1 t=1 n−1 = (1 −|h|/n)γ(h)e−2πiωh, (C.8) h=−(n−1) wherewehaveleth = s − t. Using dominated convergence, ∞ −2πiωh Sn(ω, ω)→ γ(h)e = f (ω), h=−∞ 510 Appendix C: Spectral Domain Theory as n →∞, making the large sample variance of the Fourier transform equal to the spectrum evaluated at ω. We have already seen this result in Theorem C.3.For exact bounds it is also convenient to add an absolute summability assumption for the autocovariance function, namely, ∞ θ = |h||γ(h)| < ∞. (C.9) h=−∞ Example C.1 Condition (C.9) Verified for ARMA Models For pure MA(q) models [ARMA(0, q)], γ(h) = 0for|h| > q, so the condition holds trivially. In Sect. 3.3, we showed that when p > 0, the autocovariance function γ(h) behaves like the inverse of the roots of the AR polynomial to the power h. Recalling (3.50), we can write

γ(h)∼|h|k ξh ,

for large h, where ξ = |z|−1 ∈(0, 1), z is a root of the AR polynomial, and 0 ≤ k ≤ p − 1 is some integer depending on the multiplicity of the root. ξh We show that h≥0 h is finite, the other cases follow in a similar manner. ξh = /( − ξ) Note the h≥0 1 1 because it is a geometric sum. Taking derivatives, we ξh−1 = /( − ξ)2 ξ ξh = have h≥0 h 1 1 and multiplying through by ,wehave h≥0 h ξ/(1 − ξ)2. For other values of k, follow the recipe but take kth derivatives.

To elaborate further, we derive two approximation lemmas.

Lemma C.1 For Sn(ω, ω) as defined in (C.8) and θ in (C.9) finite, we have θ |S (ω, ω)− f (ω)| ≤ (C.10) n n or −1 Sn(ω, ω) = f (ω) + O(n ). (C.11)

Proof: To prove the lemma, write       ∞   −2πiωu −2πiωu n|Sn(ω, ω)− fx(ω)| =  (n −|u|)γ(u)e − n γ(u)e  | |< u=−∞   u n         − πiωu − πiωu = −n γ(u)e 2 − |u|γ(u)e 2   | |≥ | |<   u n  u n ≤ |u||γ(u)| + |u||γ(u)| |u |≥n |u |

Lemma C.2 For ωk = k/n, ω = /n, ωk − ω  0, ±1, ±2, ±3,..., and θ in (C.9), we have θ −1 |S (ω ,ω)| ≤ = O(n ), (C.12) n k n where ∗ Sn(ωk,ω) = E{d(ωk )d (ω)}. (C.13)

Proof: Write

−1 n −2πi(ωk −ω )v −2πiωk u n|Sn(ωk,ω)| = γ(u) e e u=−(n−1) v=−(u−1) n−1 n−u − π (ω −ω ) − π ω + γ(u) e 2 i k  ve 2 i k u . u=0 v=1 Now, for the first term, with u < 0,

n n −u  − π (ω −ω ) − π (ω −ω ) e 2 i k  v = − e 2 i k  v v=−(u−1) v=1 v=1 −u − π (ω −ω ) = 0 − e 2 i k  v . v=1 For the second term with u ≥ 0,

n−u n n  − π (ω −ω ) − π (ω −ω ) e 2 i k  v = − e 2 i k  v v=1 v=1 v=n−u+1 n − π (ω −ω ) = 0 − e 2 i k  v . v=n−u+1 Consequently,    −1 −u  −2πi(ωk −ω )v −2πiωk u n|Sn(ωk,ω)| = − γ(u) e e  u=−(n−1) v=1  n−1 n  − π (ω −ω ) − π ω  − γ( ) 2 i k  v 2 i k u u e e  u=1 v=n−u+1 0 n−1 ≤ (−u)|γ(u)| + u|γ(u)| u=−(n−1) u=1 (n−1) = |u||γ(u)|. u=−(n−1) 512 Appendix C: Spectral Domain Theory

Hence, we have

|S_n(ω_k, ω_ℓ)| ≤ θ/n,

and the asserted relations of the lemma follow.
Because the DFTs are approximately uncorrelated, say, of order 1/n, when the frequencies are of the form ω_k = k/n, we shall compute the estimators at those frequencies. The behavior of f(ω) at neighboring frequencies will often be of interest and we shall use Lemma C.3 below to handle such cases.

Lemma C.3 For |ω_k − ω| ≤ L/(2n) and θ in (C.9), we have

|f(ω_k) − f(ω)| ≤ πθL/n   (C.14)

or

f(ω_k) − f(ω) = O(L/n).   (C.15)

Proof: Write the difference

|f(ω_k) − f(ω)| = | Σ_{h=-∞}^{∞} γ(h) ( e^{-2πiω_k h} − e^{-2πiωh} ) |
≤ Σ_{h=-∞}^{∞} |γ(h)| | e^{-πi(ω_k−ω)h} − e^{πi(ω_k−ω)h} |
= 2 Σ_{h=-∞}^{∞} |γ(h)| | sin[π(ω_k − ω)h] |
≤ 2π |ω_k − ω| Σ_{h=-∞}^{∞} |h| |γ(h)|
≤ πθL/n,

because |sin x| ≤ |x|.
The main use of the properties described by Lemma C.1 and Lemma C.2 is in identifying the covariance structure of the DFT, say,

d(ω_k) = n^{-1/2} Σ_{t=1}^{n} x_t e^{-2πiω_k t} = d_c(ω_k) − i d_s(ω_k),

where

d_c(ω_k) = n^{-1/2} Σ_{t=1}^{n} x_t cos(2πω_k t)

and

d_s(ω_k) = n^{-1/2} Σ_{t=1}^{n} x_t sin(2πω_k t)

are the cosine and sine transforms, respectively, of the observed series, defined previously in (4.31) and (4.32). For example, assuming zero means for convenience, we will have

E[d_c(ω_k) d_c(ω_ℓ)] = (1/4) n^{-1} Σ_{s=1}^{n} Σ_{t=1}^{n} γ(s − t) ( e^{2πiω_k s} + e^{-2πiω_k s} ) ( e^{2πiω_ℓ t} + e^{-2πiω_ℓ t} )
= (1/4) [ S_n(−ω_k, ω_ℓ) + S_n(ω_k, ω_ℓ) + S_n(ω_ℓ, ω_k) + S_n(ω_k, −ω_ℓ) ].

Lemma C.1 and Lemma C.2 imply, for k = ℓ,

E[d_c(ω_k) d_c(ω_ℓ)] = (1/4) [ O(n^{-1}) + f(ω_k) + O(n^{-1}) + f(ω_k) + O(n^{-1}) + O(n^{-1}) ]
= (1/2) f(ω_k) + O(n^{-1}).   (C.16)

For k ≠ ℓ, all terms are O(n^{-1}). Hence, we have

E[d_c(ω_k) d_c(ω_ℓ)] =
  (1/2) f(ω_k) + O(n^{-1}),   k = ℓ,
  O(n^{-1}),                  k ≠ ℓ.   (C.17)

A similar argument gives

E[d_s(ω_k) d_s(ω_ℓ)] =
  (1/2) f(ω_k) + O(n^{-1}),   k = ℓ,
  O(n^{-1}),                  k ≠ ℓ,   (C.18)

and we also have E[d_s(ω_k) d_c(ω_ℓ)] = O(n^{-1}) for all k, ℓ. We may summarize the results of Lemma C.1–Lemma C.3 as follows.

Theorem C.4 For a stationary mean zero process with autocovariance function sat- isfying (C.9) and frequencies ωk:n, such that |ωk:n −ω| < 1/n, are close to some target frequency ω, the cosine and sine transforms (4.31) and (4.32) are approximately un- correlated with variances equal to (1/2) f (ω), and the error in the approximation can be uniformly bounded by πθL/n.

Now, consider estimating the spectrum in a neighborhood of some target fre- quency ω, using the periodogram estimator

I(ω_{k:n}) = |d(ω_{k:n})|² = d_c²(ω_{k:n}) + d_s²(ω_{k:n}),

where we take |ω_{k:n} − ω| ≤ n^{-1} for each n. In case the series x_t is Gaussian with zero mean,

( d_c(ω_{k:n}), d_s(ω_{k:n}) )′  →^{d}  N( 0, (1/2) [ f(ω)  0 ; 0  f(ω) ] ),

and we have that

2 I(ω_{k:n}) / f(ω)  →^{d}  χ²₂,

where χ²_ν denotes a chi-squared random variable with ν degrees of freedom, as usual. Unfortunately, the distribution does not become more concentrated as n → ∞, because the variance of the periodogram estimator does not go to zero. We develop a fix for the deficiencies mentioned above by considering the average of the periodogram over a set of frequencies in the neighborhood of ω. For example, we can always find a set of L = 2m + 1 frequencies of the form {ω_{j:n} + k/n; k = 0, ±1, ±2, ..., ±m}, for which

f(ω_{j:n} + k/n) = f(ω) + O(L n^{-1})

by Lemma C.3. As n increases, the values of the separate frequencies change. Now, we can consider the smoothed periodogram estimator, f̂(ω), given in (4.64); this case includes the averaged periodogram, f̄(ω). First, we note that (C.9), θ = Σ_{h=-∞}^{∞} |h| |γ(h)| < ∞, is a crucial condition in the estimation of spectra. In investigating local averages of the periodogram, we will require a condition on the rate of (C.9), namely

n^{-1} Σ_{h=-n}^{n} |h| |γ(h)| = O(n^{-1/2}).   (C.19)

One can show that a sufficient condition for (C.19) is that the time series is the linear process given by

x_t = Σ_{j=-∞}^{∞} ψ_j w_{t-j},   Σ_{j=-∞}^{∞} |j|^{1/2} |ψ_j| < ∞,   (C.20)

where w_t ∼ iid(0, σ_w²) and w_t has finite fourth moment, E(w_t⁴) = ησ_w⁴ < ∞.

We leave it to the reader (see Problem 4.40 for more details) to show that (C.20) implies (C.19). If w_t ∼ wn(0, σ_w²), then (C.20) implies (C.19), but we will require the noise to be iid in the following lemma.
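Before stating the lemma, a small simulation sketch may help fix why local averaging is needed at all. The sample size, seed, and smoothing span below are arbitrary; base R's spec.pgram is used for convenience, and its modified Daniell smoother only roughly corresponds to a simple average of L ordinates. The raw periodogram of Gaussian white noise has variance near f²(ω) = σ_w⁴ regardless of n, while the averaged version is far less variable.

set.seed(1)
x   <- rnorm(1024)                                       # white noise, f(omega) = sigma_w^2 = 1
raw <- spec.pgram(x, taper = 0, plot = FALSE)            # raw periodogram ordinates
ave <- spec.pgram(x, spans = 9, taper = 0, plot = FALSE) # averaged over roughly L = 9 frequencies
var(raw$spec)                        # near f(omega)^2 = 1; does not shrink with n
var(ave$spec)                        # roughly f(omega)^2 / L, much smaller
mean(2 * raw$spec > qchisq(.95, 2))  # about .05, consistent with 2 I(omega)/f(omega) ~ chi^2_2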

Lemma C.4 Suppose xt is the linear process given by (C.20), and let I(ωj ) be the periodogram of the data {x1,...,xn}. Then

cov( I(ω_j), I(ω_k) ) =
  2 f²(ω_j) + o(1),   ω_j = ω_k = 0, 1/2,
  f²(ω_j) + o(1),     ω_j = ω_k ≠ 0, 1/2,
  O(n^{-1}),           ω_j ≠ ω_k.

The proof of Lemma C.4 is straightforward but tedious, and details may be found in Fuller (1976, Theorem 7.2.1) [65] or in Brockwell and Davis (1991, Theorem 10.3.2) [36]. For demonstration purposes, we present the proof of the lemma for the pure white

noise case; i.e., x_t = w_t, in which case f(ω) ≡ σ_w². By definition, the periodogram in this case is

I(ω_j) = n^{-1} Σ_{s=1}^{n} Σ_{t=1}^{n} w_s w_t e^{2πiω_j(t−s)},

where ω_j = j/n, and hence

E{ I(ω_j) I(ω_k) } = n^{-2} Σ_{s=1}^{n} Σ_{t=1}^{n} Σ_{u=1}^{n} Σ_{v=1}^{n} E(w_s w_t w_u w_v) e^{2πiω_j(t−s)} e^{2πiω_k(u−v)}.

Now, when all the subscripts match, E(w_s w_t w_u w_v) = ησ_w⁴; when the subscripts match in pairs (e.g., s = t ≠ u = v), E(w_s w_t w_u w_v) = σ_w⁴; otherwise, E(w_s w_t w_u w_v) = 0. Thus,

E{ I(ω_j) I(ω_k) } = n^{-1} (η − 3) σ_w⁴ + σ_w⁴ { 1 + n^{-2} [ A(ω_j + ω_k) + A(ω_k − ω_j) ] },

where

A(λ) = | Σ_{t=1}^{n} e^{2πiλt} |².

Noting that E I(ω_j) = n^{-1} Σ_{t=1}^{n} E(w_t²) = σ_w², we have

cov{ I(ω_j), I(ω_k) } = E{ I(ω_j) I(ω_k) } − σ_w⁴
= n^{-1} (η − 3) σ_w⁴ + n^{-2} σ_w⁴ [ A(ω_j + ω_k) + A(ω_k − ω_j) ].

Thus we conclude that

var{ I(ω_j) } = n^{-1} (η − 3) σ_w⁴ + σ_w⁴    for ω_j ≠ 0, 1/2,
var{ I(ω_j) } = n^{-1} (η − 3) σ_w⁴ + 2σ_w⁴   for ω_j = 0, 1/2,
cov{ I(ω_j), I(ω_k) } = n^{-1} (η − 3) σ_w⁴   for ω_j ≠ ω_k,

which establishes the result in this case. We also note that if w_t is Gaussian, then η = 3 and the periodogram ordinates are independent. Using Lemma C.4, we may establish the following fundamental result.
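Before turning to that result, the white-noise calculation above is easy to check by simulation; in the sketch below (the sample size, number of replicates, and the two frequency indices are arbitrary choices), the periodogram is computed directly from the DFT, and for Gaussian noise (η = 3) the variance at a frequency away from 0 and 1/2 should be near σ_w⁴ while distinct ordinates are nearly uncorrelated.

set.seed(101)
n <- 256; nrep <- 2000; sigw <- 2
I1 <- I2 <- numeric(nrep)
for (r in 1:nrep) {
  w <- rnorm(n, 0, sigw)
  d <- fft(w) / sqrt(n)          # DFT values d(omega_k), omega_k = k/n
  P <- Mod(d)^2                  # periodogram ordinates I(omega_k)
  I1[r] <- P[11]                 # frequency 10/n
  I2[r] <- P[21]                 # frequency 20/n
}
mean(I1)      # approximately sigma_w^2 = 4
var(I1)       # approximately sigma_w^4 = 16 (Gaussian case, eta = 3)
cor(I1, I2)   # approximately 0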

Theorem C.5 Suppose x_t is the linear process given by (C.20). Then, with f̂(ω) defined in (4.64) and corresponding conditions on the weights h_k, we have, as n → ∞,
(i) E[ f̂(ω) ] → f(ω);
(ii) ( Σ_{k=-m}^{m} h_k² )^{-1} cov( f̂(ω), f̂(λ) ) → f²(ω) for ω = λ ≠ 0, 1/2.
In (ii), replace f²(ω) by 0 if ω ≠ λ, and by 2 f²(ω) if ω = λ = 0 or 1/2.

Proof: (i): First, recall (4.36),

E[ I(ω_{j:n}) ] = Σ_{h=-(n-1)}^{n-1} ((n − |h|)/n) γ(h) e^{-2πiω_{j:n} h}  =:  f_n(ω_{j:n}).

But since f_n(ω_{j:n}) → f(ω) uniformly, and |f(ω_{j:n}) − f(ω_{j:n} + k/n)| → 0 by the continuity of f, we have

E[ f̂(ω) ] = Σ_{k=-m}^{m} h_k E[ I(ω_{j:n} + k/n) ] = Σ_{k=-m}^{m} h_k f_n(ω_{j:n} + k/n)
= Σ_{k=-m}^{m} h_k [ f(ω) + o(1) ] → f(ω),

because Σ_{k=-m}^{m} h_k = 1.

(ii): First, suppose we have ωj:n → ω1 and ω:n → ω2, and ω1  ω2. Then, for n large enough to separate the bands, using Lemma C.4,wehave      1 2          cov fˆ(ω1), fˆ(ω2)  =  hk hr cov I(ωj:n +k/n), I(ω:n +r/n)  | |≤ | |≤   k m r m         −1  =  hk hr O(n )   |k |≤m |r |≤m  2 c  ≤  h (where c is a constant) n k |k |≤m  cL  ≤  h2 , n k |k |≤m which establishes (ii) for the case of different frequencies. The case of the same frequencies, i.e., ω = λ, is established in a similar manner to the above arguments.  Theorem C.5 justifies the distributional properties used throughout Sect. 4.4 and Chap. 7. We may extend the results of this section to vector series of the form xt = (xt1,...,xtp) , when the cross-spectrum is given by ∞ −2πiωh fij(ω) = γij(h)e = cij(ω)−iqij(ω), (C.21) h=−∞ where ∞ cij(ω) = γij(h) cos(2πωh) (C.22) h=−∞ C.2 Large Sample Distribution of the Smoothed Periodogram 517 and ∞ qij(ω) = γij(h) sin(2πωh) (C.23) h=−∞ denote the cospectrum and quadspectrum, respectively. We denote the DFT of the series xtj by

n −1/2 −2πiωk t dj (ωk ) = n xtj e t=1 = dcj(ωk )−idsj(ωk ), where dcj and dsj are the cosine and sine transforms of xtj,forj = 1, 2,...,p.We bound the covariance structure as before and summarize the results as follows.

Theorem C.6 The covariance structure of the multivariate cosine and sine transforms, subject to

θ_{ij} = Σ_{h=-∞}^{∞} |h| |γ_{ij}(h)| < ∞,   (C.24)

is given by

E[ d_{ci}(ω_k) d_{cj}(ω_ℓ) ] =
  (1/2) c_{ij}(ω_k) + O(n^{-1}),   k = ℓ,
  O(n^{-1}),                       k ≠ ℓ.   (C.25)

E[ d_{ci}(ω_k) d_{sj}(ω_ℓ) ] =
  −(1/2) q_{ij}(ω_k) + O(n^{-1}),   k = ℓ,
  O(n^{-1}),                        k ≠ ℓ.   (C.26)

E[ d_{si}(ω_k) d_{cj}(ω_ℓ) ] =
  (1/2) q_{ij}(ω_k) + O(n^{-1}),   k = ℓ,
  O(n^{-1}),                       k ≠ ℓ.   (C.27)

E[ d_{si}(ω_k) d_{sj}(ω_ℓ) ] =
  (1/2) c_{ij}(ω_k) + O(n^{-1}),   k = ℓ,
  O(n^{-1}),                       k ≠ ℓ.   (C.28)

Proof: We define

S_n^{ij}(ω_k, ω_ℓ) = n^{-1} Σ_{s=1}^{n} Σ_{t=1}^{n} γ_{ij}(s − t) e^{-2πiω_k s} e^{2πiω_ℓ t}.   (C.29)

Then, we may verify the theorem with manipulations like

E[ d_{ci}(ω_k) d_{sj}(ω_k) ]
= (1/(4i)) n^{-1} Σ_{s=1}^{n} Σ_{t=1}^{n} γ_{ij}(s − t) ( e^{2πiω_k s} + e^{-2πiω_k s} ) ( e^{2πiω_k t} − e^{-2πiω_k t} )
= (1/(4i)) [ S_n^{ij}(−ω_k, ω_k) + S_n^{ij}(ω_k, ω_k) − S_n^{ij}(−ω_k, −ω_k) − S_n^{ij}(ω_k, −ω_k) ]
= (1/(4i)) [ c_{ij}(ω_k) − i q_{ij}(ω_k) − ( c_{ij}(ω_k) + i q_{ij}(ω_k) ) ] + O(n^{-1})
= −(1/2) q_{ij}(ω_k) + O(n^{-1}),

where we have used the fact that the properties given in Lemma C.1–Lemma C.3 can be verified for the cross-spectral density functions f_{ij}(ω), i, j = 1, ..., p.
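A quick way to see the cospectrum and quadspectrum numerically is to form the cross-periodogram from the DFTs of two series. In the sketch below (the simulated series, the two-lag shift, and the noise level are made up for illustration), the real part of the cross-periodogram is a rough estimate of c_xy and the negative of its imaginary part a rough estimate of q_xy, so for a pure delay the phase scatters around a line in frequency.

set.seed(7)
n  <- 512
e  <- rnorm(n + 2)
x  <- e[1:n]                              # input series
y  <- e[3:(n + 2)] + rnorm(n, sd = .5)    # roughly x shifted by two lags, plus noise
dx <- fft(x) / sqrt(n)
dy <- fft(y) / sqrt(n)
Ixy   <- dx * Conj(dy)                    # cross-periodogram d_x(omega_k) d_y*(omega_k)
co    <- Re(Ixy)                          # raw cospectrum estimate
quad  <- -Im(Ixy)                         # raw quadspectrum estimate
phase <- Arg(Ixy)                         # scatters around -2*pi*omega_k*2 at low frequencies
head(cbind(freq = (1:20)/n, co = co[2:21], quad = quad[2:21], phase = phase[2:21]))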

Now, if the underlying multivariate time series xt is a normal process, it is clear that the DFTs will be jointly normal and we may define the vector DFT, d(ωk ) =

(d1(ωk ),...,dp(ωk )) as

n −1/2 −2πiωk t d(ωk ) = n xt e = dc(ωk )−ids(ωk ), (C.30) t=1 where n −1/2 dc(ωk ) = n xt cos(2πωk t) (C.31) t=1 and n −1/2 ds(ωk ) = n xt sin(2πωk t) (C.32) t=1

are the cosine and sine transforms, respectively, of the observed vector series xt . ( (ω ), (ω )) Then, constructing the vector of real and imaginary parts dc k ds k ,wemay note it has mean zero and 2p × 2p covariance matrix   1 C(ωk )−Q(ωk ) Σ(ωk ) = (C.33) 2 Q(ωk ) C(ωk )

−1 −1 to order n as long as ωk − ω = O(n ). We have introduced the p × p matrices C(ωk ) = {cij(ωk )} and Q = {qij(ωk )}. The complex random variable d(ωk ) has covariance (ω ) = [ (ω ) ∗(ω )] S k E(d k d k )   ∗ = E dc(ωk )−ids(ωk ) dc(ωk )−ids(ωk )

= E[dc(ωk )dc(ωk ) ] + E[ds(ωk )ds(ωk ) ]   −i E[ds(ωk )dc(ωk ) ]−E[dc(ωk )ds(ωk ) ]

= C(ωk )−iQ(ωk ). (C.34) C.3 The Complex Multivariate Normal Distribution 519

If the process xt has a multivariate normal distribution, the complex vector d(ωk ) has approximately the complex multivariate normal distribution with mean zero and covariance matrix S(ωk ) = C(ωk )−iQ(ωk ) if the real and imaginary parts have the covariance structure as specified above. In the next section, we work further with this distribution and show how it adapts to the real case. If we wish to estimate the spectral matrix S(ω), it is natural to take a band of frequencies of the form ωk:n +/n, for  = −m,...,m as before, so that the estimator becomes (4.98) of Sect. 4.5. A discussion of further properties of the multivariate complex normal distribution is deferred. It is also of interest to develop a large sample theory for cases in which the under- lying distribution is not necessarily normal. If xt is not necessarily a normal process, some additional conditions are needed to get asymptotic normality. In particular, introduce the notion of a generalized linear process ∞ yt = Ar wt−r, (C.35) r=−∞ × × [ ] = where wt is a p 1 vector white noise process with p p covariance E wt wt G and the p × p matrices of filter coefficients At satisfy ∞ ∞ { } = 2 < ∞. tr At At At (C.36) t=−∞ t=−∞ In particular, stable vector ARMA processes satisfy these conditions. For generalized linear processes, we state the following general result from Hannan [86, p.224].

Theorem C.7 If xt is generated by a generalized linear process with a continuous spectrum that is not zero at ω and ωk:n +/n are a set of frequencies within L/n of ω, the joint density of the cosine and sine transforms (C.31) and (C.32) converges to that of L independent 2p × 1 normal vectors with covariance matrix Σ(ω) with structure given by (C.33).Atω = 0 or ω = 1/2, the distribution is real with covariance matrix 2Σ(ω).

The above result provides the basis for inference involving the Fourier transforms of stationary series because it justifies approximations to the likelihood based on multivariate normal theory. We make extensive use of this result in Chap. 7, but will still need a simple form to justify the distributional result for the sample coherence given in (4.104). The next section gives an elementary introduction to the complex normal distribution.

C.3 The Complex Multivariate Normal Distribution

The multivariate normal distribution will be the fundamental tool for expressing the likelihood function and determining approximate maximum likelihood estimators and their large sample probability distributions. A detailed treatment of the multivariate 520 Appendix C: Spectral Domain Theory normal distribution can be found in standard texts such as Anderson [7]. We will use the multivariate normal distribution of the p × 1 vector x = (x1, x2,...,xp) ,as defined by its density function   ( ) = ( π)−p/2|Σ |−1/2 − 1 ( − μ) Σ −1( − μ) , p x 2 exp 2 x x (C.37)

which has mean vector E[x] = μ = (μ1,...,μp) and covariance matrix

Σ = E[(x − μ)(x − μ) ]. (C.38)

We use the notation x ∼ Np(μ, Σ ) for densities of the form (C.37) and note that linearly transformed multivariate normal variables of the form y = Ax, with A a q × p matrix q ≤ p, will also be multivariate normal with distribution

y ∼ Nq(Aμ, AΣ A ). (C.39) = ( , ) Often, the partitioned multivariate normal, based on the vector x x1 x2 , split into two p1 × 1 and p2 × 1 components x1 and x2, respectively, will be used where = + μ = (μ ,μ ) p p1 p2. If the mean vector 1 2 and covariance matrices   Σ Σ Σ = 11 12 (C.40) Σ21 Σ22 are also compatibly partitioned, the marginal distribution of any subset of components is multivariate normal, say, ∼ {μ ,Σ }, x1 Np1 1 11 and that the conditional distribution x2 given x1 is normal with mean [ | ] = μ + Σ Σ −1( − μ ) E x2 x1 2 21 11 x1 1 (C.41) and conditional covariance [ | ] = Σ − Σ Σ −1 Σ . cov x2 x1 22 21 11 12 (C.42) In the previous section, the real and imaginary parts of the DFT had a partitioned covariance matrix as given in (C.33), and we use this result to say the complex p × 1 vector z = x1 − ix2 (C.43) has a complex multivariate normal distribution, with mean vector μz = μ1 − iμ2 and p × p covariance matrix Σz = C − iQ (C.44) × = ( , ) if the real multivariate 2p 1 normal vector x x1 x2 has a real multivariate μ = (μ ,μ ) normal distribution with mean vector 1 2 and covariance matrix   1 C −Q Σ = . (C.45) 2 QC C.3 The Complex Multivariate Normal Distribution 521

The restrictions C = C and Q = −Q are necessary for the matrix Σ to be a covariance Σ = Σ ∗ matrix, and these conditions then imply z z is Hermitian. The probability density function of the complex multivariate normal vector z can be expressed in the concise form ( ) = π−p |Σ |−1 {−( − μ )∗ Σ −1( − μ )}, pz z z exp z z z z z (C.46) and this is the form that we will often use in the likelihood. The result follows from showing that px(x1, x2) = pz(z) exactly, using the fact that the quadratic and Hermitian 2 forms in the exponent are equal and that |Σx | = |Σz | . The second assertion follows directly from the fact that the matrix Σx has repeated eigenvalues, λ1,λ2,...,λp (α ,α ) λ ,λ ,...,λ corresponding to eigenvectors 1 2 and the same set, 1 2 p corresponding (α , −α ) to 2 1 . Hence 5p |Σ | = λ2 = |Σ |2. x i z i=1 For further material relating to the complex multivariate normal distribution, see Goodman [75], Giri [73], or Khatri [117].

Example C.2 A Complex Normal Random Variable To fix ideas, consider a very simple complex random variable

z = ℜ(z) − i ℑ(z) = z₁ − i z₂,

where z₁ ∼ N(0, σ²/2) independent of z₂ ∼ N(0, σ²/2). Then the joint density of (z₁, z₂) is

p(z₁, z₂) ∝ σ^{-1} exp{ −z₁²/σ² } × σ^{-1} exp{ −z₂²/σ² } = σ^{-2} exp{ −(z₁² + z₂²)/σ² }.

More succinctly, we write z ∼ N_c(0, σ²), and

p(z) ∝ σ^{-2} exp{ −z^{*} z / σ² }.

In the present context, z₁ would be the cosine transform of the data at a fundamental frequency (excluding the end points) and z₂ the corresponding sine transform. If the process is Gaussian, z₁ and z₂ are independent normals with zero means and variances that are half of the spectral density at the particular frequency. Consequently, the definition of the complex normal distribution is natural in the context of spectral analysis.
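This is easy to see by simulation; in the sketch below (the sample size, replicate count, and frequency index are arbitrary), the cosine and sine transforms of Gaussian white noise at a Fourier frequency away from 0 and 1/2 come out approximately independent N(0, f(ω)/2), which is exactly the structure of the complex normal variable described above.

set.seed(11)
n <- 128; nrep <- 5000; k <- 16            # omega_k = 16/128 = 1/8
z1 <- z2 <- numeric(nrep)
tt <- 1:n
for (r in 1:nrep) {
  x <- rnorm(n)                            # white noise: f(omega) = 1 for all omega
  z1[r] <- sum(x * cos(2*pi*k*tt/n)) / sqrt(n)   # cosine transform d_c(omega_k)
  z2[r] <- sum(x * sin(2*pi*k*tt/n)) / sqrt(n)   # sine transform   d_s(omega_k)
}
c(var(z1), var(z2))   # each approximately f(omega)/2 = 1/2
cor(z1, z2)           # approximately 0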

Example C.3 A Bivariate Complex Normal Distribution

Consider the joint distribution of the complex random variables u1 = x1 − ix2 and

u2 = y1 − iy2, where the partitioned vector (x1, x2, y1, y2) has a real multivariate normal distribution with mean (0, 0, 0, 0) and covariance matrix 522 Appendix C: Spectral Domain Theory

c 0 c −q  xx xy xy  0 c q c Σ = 1  xx xy xy .  (C.47) 2 cxy qxy cyy 0 −qxy cyx 0 cyy

Now, consider the conditional distribution of y = (y1, y2) ,givenx = (x1, x2) . Using (C.41), we obtain     x −x b E(y  x) = 1 2 1 , (C.48) x2 x1 b2 where   cyx qyx (b1, b2) = , . (C.49) cxx cxx It is natural to identify the cross-spectrum

fxy = cxy − iqxy, (C.50)

so that the complex variable identified with the pair is just

cyx − iqyx fyx b = b1 − ib2 = = , cxx fxx and we identify it as the complex regression coefficient. The conditional covariance follows from (C.42) and simplifies to  (  ) = 1 , cov y x 2 fy·x I2 (C.51)

where I2 denotes the 2 × 2 identity matrix and

2 + 2 2 cxy qxy | fxy| fy·x = cyy − = fyy − (C.52) cxx fxx Example C.3 leads to an approach for justifying the distributional results for the function coherence given in (4.104). That equation suggests that the result can be derived using the regression results that lead to the F-statistics in Sect. 2.1. Suppose that we consider L values of the sine and cosine transforms of the input xt and output yt , which we will denote by dx,c(ωk + /n), dx,s(ωk + /n), dy,c(ωk + /n), dy,s(ωk + /n), sampled at L = 2m + 1 frequencies,  = −m,...,m,inthe neighborhood of some target frequency ω. Suppose these cosine and sine transforms are re-indexed and denoted by dx,cj, dx,sj, dy,cj, dy,sj,forj = 1, 2,...,L, producing 2L real random variables with a large sample normal distribution that have limiting covariance matrices of the form (C.47) for each j. Then, the conditional normal distribution of the 2 × 1 vector dy,cj, dy,sj given dx,cj, dx,sj, given in Example C.3, shows that we may write, approximately, the regression model        − dy,cj = dx,cj dx,sj b1 + Vcj , dy,sj dx,sj dx,cj b2 Vsj C.3 The Complex Multivariate Normal Distribution 523 where Vcj, Vsj are approximately uncorrelated with approximate variances [ 2 ] = [ 2 ] = ( / ) . E Vcj E Vsj 1 2 fy·x

Now, construct, by stacking, the 2L × 1 vectors yc = (dy,c1,...,dy,cL) , ys =

(dy,s1,...,dy,sL) , xc = (dx,c1,...,dx,cL) and xs = (dx,s1,...,dx,sL) , and rewrite the regression model as        − yc = xc xs b1 + vc ys xs xc b2 vs where vc and vs are the error stacks. Finally, write the overall model as the regression model in Chap. 2, namely, y = Zb+ v, making the obvious identifications in the previous equation. Conditional on Z,the model becomes exactly the regression model considered in Chap. 2 where there are q = 2 regression coefficients and 2L observations in the observation vector y.Totest the hypothesis of no regression for that model, we use an F- that depends on the difference between the residual sum of squares for the full model, say,

SSE = y y − y Z(Z Z)−1 Z y (C.53) and the residual sum of squares for the reduced model, SSE0 = y y. Then,

F_{2, 2L−2} = (L − 1) (SSE₀ − SSE) / SSE   (C.54)

has the F-distribution with 2 and 2L − 2 degrees of freedom. Also, it follows by substitution for y that

SSE₀ = y′y = y_c′ y_c + y_s′ y_s = Σ_{j=1}^{L} ( d_{y,cj}² + d_{y,sj}² ) = L f̂_y(ω),

which is just the sample spectrum of the output series. Similarly,

Z′Z = [ L f̂_x   0 ; 0   L f̂_x ]

and

Z′y = [ x_c′ y_c + x_s′ y_s ; x_c′ y_s − x_s′ y_c ]
    = [ Σ_{j=1}^{L} ( d_{x,cj} d_{y,cj} + d_{x,sj} d_{y,sj} ) ; Σ_{j=1}^{L} ( d_{x,cj} d_{y,sj} − d_{x,sj} d_{y,cj} ) ]
    = [ L ĉ_{yx} ; L q̂_{yx} ]

together imply that

y′Z (Z′Z)^{-1} Z′y = L | f̂_{xy} |² / f̂_x.

Substituting into (C.54) gives

F_{2, 2L−2} = (L − 1) ( | f̂_{xy} |² / f̂_x ) / ( f̂_y − | f̂_{xy} |² / f̂_x ),

which converts directly into the F-statistic (4.104), using the sample coherence defined in (4.103).
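A rough numerical version of this construction can be pieced together from base R's smoothed cross-spectral estimates. In the sketch below, the simulated series, the lag relation, and the span L = 9 are arbitrary choices, and the modified Daniell smoother used by spec.pgram only approximates a simple average of L ordinates; the squared coherence returned in the coh component is converted to the F-statistic of (C.54).

set.seed(90)
n <- 1024
x <- rnorm(n)
y <- 2 * c(0, x[-n]) + rnorm(n, sd = .5)     # output strongly related to lagged input
L <- 9                                       # roughly L = 2m + 1 frequencies averaged
sr <- spec.pgram(cbind(x, y), spans = L, taper = 0, plot = FALSE)
Fstat <- (L - 1) * sr$coh / (1 - sr$coh)     # F-statistic at each frequency
crit  <- qf(.999, 2, 2*L - 2)                # F(2, 2L - 2) reference value
mean(Fstat > crit)                           # nearly all frequencies flagged, as they should be here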

C.4 Integration

In Chap. 4 and in this appendix, we use Riemann–Stieltjes integration and stochas- tic integration. We now give a cursory introduction to these concepts for readers unfamiliar with the techniques.

C.4.1 Riemann–Stieltjes Integration

Rather than work in complete generality, we focus on the meaning of (4.14),

∫ 1 2 γ(h) = e2πiωh dF(ω). − 1 2 Here, we are concerned with the integration of a bounded, continuous (complex- valued) function g(ω) = e2πiωh with respect to a monotonically increasing, right continuous (real-valued) function F(ω). Ω = {−1 = ω ,ω ,...,ω = 1 } Let 2 0 1 n 2 be a partition of the interval, and define the sum n SΩ (g, F) = g(uj )[F(ωj )−F(ωj−1)] (C.55) j=1 where uj ∈[ωj−1,ωj ]. In our case, there is a unique number, say I(g, F) such that for any >0, there is a δ>0forwhich

|SΩ (g, F)−I(g, F)| < for any partition Ω with maxj |ωj −ωj−1| <δand any uj ∈[ωj−1,ωj ] for j = 1,...,n. In this case, we define ∫ 1 2 I(g, F) = g(ω)dF(ω) . (C.56) − 1 2 In the absolutely continuous case, such as in Property 4.2, dF(ω) = f (ω)dω and, as stated in the property, C.4 Integration 525

∫ 1 ∫ 1 2 2 γ(h) = e2πiωh dF(ω) = e2πiωh f (ω)dω. − 1 − 1 2 2 Another case that we discussed was the discrete case such as in Example 4.4 where the spectral distribution F(ω) makes jumps at specific values of ω. First, consider the case where F(ω) has only one jump of size c > 0atω∗ ∈(−1, 1 ), so that F(ω) = 0 ∗ ∗ 2 2 if ω<ω and F(ω) = c if ω ≥ ω . Then considering SΩ (g, F) in (C.55), note that ∗ F(ωj )−F(ωj−1) = 0 for all intervals that do not include ω . Now suppose in some ∗ kth interval of the partition, ω ∈(ωk−1,ωk ] for a k ∈{1,...,n}. Then n SΩ (g, F) = g(uj )[F(ωj )−F(ωj−1)] = g(uk ) c, j=1 where uk ∈[ωk−1,ωk ]. Thus, ∗ ∗ |SΩ (g, F)−g(ω ) c| = c |g(uk )−g(ω )|. ∗ Since g is continuous, given >0, there is a δ>0 such that |g(uk )−g(ω )| </c ∗ when |uk − ω | <δ. Hence, for any partition Ω with maxj |ωj − ωj−1| <δ,wehave ∗ |SΩ (g, F)−g(ω ) c| <, and consequently,

∫ 1 2 g(ω) dF(ω) = g(ω∗) c . − 1 2 This result may be extended in an obvious way to the case where F makes jumps at more than one value as was the case in Example 4.4. Example C.4 Complex Process Recall (4.4) where we considered a mix of periodic components. In that example, the process was real, but it is possible to consider a complex-valued process in a similar way. In this case, we define q π ω = 2 it j , − 1 <ω < ···<ω < 1 , xt Zj e 2 1 q 2 (C.57) j=1

where the Zj are uncorrelated complex-valued random variables such that E[Zj ] = 0 2 and E[|Zj | ] = σj > 0. As discussed in Example 4.9, the case where xt is real- valued is a special case of (C.57). Extending Example 4.4 to the case of (C.57), we have ⎧ ⎪ 0 − 1 ≤ ω<ω , ⎪ 2 1 ⎪ 2 ⎪ σ ω1 ≤ ω<ω2 , ⎪ 1 ⎪ σ2 + σ2 ω ≤ ω<ω , ⎪⎨ 1 2 2 3 F(ω) = 2 2 2 (C.58) ⎪ σ + σ + σ ω3 ≤ ω<ω4 , ⎪ 1 2 3 ⎪ . . ⎪ . . ⎪ ⎪ ⎪ σ2 + σ2 + ···+ σ2 ω ≤ ω ≤ 1 . ⎩ 1 2 q q 2 526 Appendix C: Spectral Domain Theory

Thus, for the process in this example,

γ_x(h) = ∫_{-1/2}^{1/2} e^{2πiωh} dF(ω) = Σ_{j=1}^{q} σ_j² e^{2πihω_j}.

Note that γ_x(h) is complex, but satisfies the properties of an autocovariance function: (i) γ_x(h) is a Hermitian function, γ_x(h) = γ_x^{*}(−h); (ii) 0 ≤ |γ_x(h)| ≤ γ_x(0); and (iii) γ_x(h) is non-negative definite. As in the real case, the total variance of the process is the sum of the variances of the individual components, var(x_t) = γ_x(0) = Σ_{j=1}^{q} σ_j².
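Since the integral against a purely discrete F reduces to a finite sum over the jump points, the autocovariance in this example can be evaluated directly; the short R sketch below uses two made-up frequencies and jump sizes and checks the Hermitian property numerically.

omg  <- c(-0.10, 0.25)                    # jump locations omega_j (illustrative)
sig2 <- c(2, 3)                           # jump sizes sigma_j^2 (illustrative)
gamma_x <- function(h) sum(sig2 * exp(2i * pi * h * omg))
gamma_x(0)                                # total variance, sum(sig2) = 5
gamma_x(1)                                # complex in general
Conj(gamma_x(-1))                         # equals gamma_x(1): the Hermitian property
Mod(gamma_x(3)) <= Re(gamma_x(0))         # |gamma(h)| <= gamma(0)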

C.4.2 Stochastic Integration

We first used stochastic integration in Example 4.9, although it was not necessary for that particular example. There is an analogy of stochastic integration to Riemann- Stieltjes integration defined in the previous subsection, however, we will have to deal with convergence of random processes rather than convergence of numbers. We focus on the case of interest to us; namely the stochastic integral in Theorem C.2,

∫ 1 2 xt = g(ω)dZ(ω), − 1 2 where Z(ω) is a complex-valued orthogonal increment process and g(ω) = e2πiωt . { (ω) ω ∈[−1, 1 ]} − 1 ≤ ω <ω <ω <ω ≤ 1 For Z ; 2 2 and 2 1 2 3 4 2 ,wehave (− 1 ) = • Z 2 0, •E[Z(ω)] = 0, •var[Z(ω)] = E[|Z(ω)|2] = E[Z(ω) Z∗(ω)] < ∞, ∗ •E{[Z(ω4)−Z(ω3)] [Z(ω2)−Z(ω1)] } = 0. As an example, recall Brownian motion in Definition 5.1. We say {Z(ω)} is mean square (m.s.) right continuous if E|Z(ω+δ)−Z(ω)|2 → 0 as δ ↓ 0. An important result is that such a process admits a spectral distribution. { (ω) ω ∈[−1, 1 ]} Theorem C.8 If Z ; 2 2 is an orthogonal increment process that is m.s. right continuous, then there is a unique spectral distribution function F such that (ω) = ω ≤−1 (1) F 0 if 2 . (ω) = ( 1 ) ω ≥ 1 (2) F F 2 if 2 . (ω )− (ω ) = | (ω )− (ω )|2 − 1 ≤ ω ≤ ω ≤ 1 (3) F 2 F 1 E Z 2 Z 1 if 2 1 2 2 .

(ω) = | (ω)|2 ω ∈[−1, 1 ] (ω) = ω ≤−1 Proof: Define F E Z for 2 2 , with F 0for 2 and (ω) = ( 1 ) ω ≥ 1 F F 2 for 2 . It is immediate from the assumptions that F is right continuous and satisfies (1)–(3). To show that F is monotonically increasing, note that for ω2 ≥ ω1, C.4 Integration 527

(ω ) = | (ω )− (ω ) + (ω )− (− 1 )|2 F 2 E Z 2 Z 1 Z 1 Z 2 2 2 = E|Z(ω2)−Z(ω1)| + E|Z(ω1)|

≥ F(ω1), [− 1,ω ] [ω ,ω ]  since 2 1 and 1 2 are non-overlapping intervals. Ω = {−1 = ω ,ω ,...,ω = 1 } Similar to the previous subsection, let 2 0 1 n 2 be a partition of the interval, and define the random sum n SΩ (g, Z) = g(uj )[Z(ωj )−Z(ωj−1)] (C.59) j=1 where uj ∈[ωj−1,ωj ]. We emphasize the fact that SΩ (g, Z) is a complex-valued random variable with mean and variance given by n 2 E[SΩ (g, Z)] = 0 and E[|SΩ (g, Z)| ] = g(uj )[F(ωj )−F(ωj−1)] j=1 where F is defined in Theorem C.8. In our case, there is a unique (except on a set of probability zero) complex-valued random variable, say I(g, Z) such that for any >0, there is a δ>0forwhich 2 E |SΩ (g, Z)−I(g, Z)| < for any partition Ω with ΔΩ = maxj |ωj − ωj−1| <δand any uj ∈[ωj−1,ωj ] for j = 1,...,n. In this case, define

I(g, Z) = ∫_{-1/2}^{1/2} g(ω) dZ(ω).   (C.60)

We see that the stochastic integral is the mean-square limit of the random sum as n → ∞ (Δ_Ω → 0). Recalling Example 4.9, as in the deterministic case, it is easy to show that, if Z(ω) is an orthogonal increment process that makes uncorrelated jumps at −ω₀ and ω₀ with mean zero and variance σ²/2, then

x_t = ∫_{-1/2}^{1/2} e^{2πiωt} dZ(ω) = Z(−ω₀) e^{-2πiω₀ t} + Z(ω₀) e^{2πiω₀ t}.

In this case, the spectral distribution is (recall Example 4.4)

F(ω) =
  0,      ω < −ω₀,
  σ²/2,   −ω₀ ≤ ω < ω₀,
  σ²,     ω ≥ ω₀,

and the autocovariance function is

γ_x(h) = ∫_{-1/2}^{1/2} e^{2πiωh} dF(ω) = (σ²/2) e^{-2πiω₀ h} + (σ²/2) e^{2πiω₀ h} = σ² cos(2πω₀ h).
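The covariance calculation can be checked by simulating a real-valued version of this two-jump process, x_t = U₁ cos(2πω₀t) + U₂ sin(2πω₀t) with U₁, U₂ independent mean-zero variables of variance σ² (the particular ω₀, σ², lag, and replicate count below are arbitrary). Averaging x_{t+h} x_t over independent realizations recovers σ² cos(2πω₀h); a single realization would not, since the process is not ergodic for its covariance.

set.seed(5)
omega0 <- 0.1; sig2 <- 4
nrep <- 5000; h <- 3; t0 <- 10
prods <- numeric(nrep)
for (r in 1:nrep) {
  u1 <- rnorm(1, 0, sqrt(sig2)); u2 <- rnorm(1, 0, sqrt(sig2))
  x  <- function(t) u1 * cos(2*pi*omega0*t) + u2 * sin(2*pi*omega0*t)
  prods[r] <- x(t0 + h) * x(t0)
}
mean(prods)                        # Monte Carlo estimate of E[x_{t+h} x_t]
sig2 * cos(2*pi*omega0*h)          # theoretical value sigma^2 cos(2 pi omega_0 h)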

C.5 Spectral Analysis as Principal Component Analysis

In Chap. 4, we presented many different ways to view the spectral density. In this section, we show that the spectral density may be though of as the approximate eigenvalues of the covariance matrix of a stationary process. Suppose X = (x1,...,xn) are n values of a real, mean-zero, time series, xt with spectral density fx(ω). Then ⎡ ⎤ ⎢ γ(0) γ(1)···γ(n − 1)⎥ ⎢ ⎥ ⎢ γ(1) γ(0)···γ(n − 2)⎥ ( ) = Γ = ⎢ . . . ⎥ cov X n ⎢ . . . . . ⎥ ⎢ . . . . ⎥ ⎢ ⎥ ⎣γ(n − 1) γ(n − 2)··· γ(0) ⎦ is a non-negative definite, symmetric . Hence, there is an n × n orthogonal matrix M, such that M Γn M = diag(λ0,...,λn−1), where λj ≥ 0for j = 0,...,n − 1 are the latent roots of Γn. In this section, we will show that, for n sufficiently large, λj ≈ fx(ωj ), j = 0, 1,...,n − 1, where ωj = j/n are the Fourier frequencies. To start the approximation, we introduce a circulant matrix defined as ⎡ ⎤ ⎢ c(0) c(1)··· c(n − 2) c(n − 1)⎥ ⎢ ⎥ ⎢c(n − 1) c(0)··· c(n − 3) c(n − 2)⎥ ⎢ . . . . ⎥ Γ = ⎢ ...... ⎥ c ⎢ . . . . . ⎥ ; ⎢ ⎥ ⎢ c(2) c(3)··· c(0) c(1) ⎥ ⎢ ⎥ ⎣ c(1) c(2)··· c(n − 1) c(0) ⎦ the matrix has c(0) on the diagonal, then continue to the right c(1), c(2),...,and wrap the sequence around to the first column after the last column is reached. Using direct substitution, it can be shown that the latent roots and vectors of Γc are

n−1 −2πihj/n λj = c(h) e , h=0 and 1 2 ∗ 1 −2πi0 j −2πi1 j −2πi(n−1) j g = √ e n , e n ,...,e n , j n for j = 0, 1,...,n − 1. If Γc is symmetric [c(j) = c(n − j)], call it Γs and let c(h) = c(−h). Noting that e−2πihj/n = e−2πi(n−h)j/n,wehaveforn odd,   −2πihj/n λj = c(h) e = c(h) cos(2πhj/n) | |≤ n−1 | |≤ n−1 h 2 h 2 for j = 0, 1,...,n−1. If n is even, the sum would include one extra term for j/n = 1/2. C.5 Spectral Analysis as Principal Component Analysis 529

We see that λ0 is a distinct root, and λj = λn−j are repeated roots for j = ,..., n−1 1 2 . For each repeated root, we can find a pair of eigenvectors corresponding to λj , namely √ 1 2 1 ∗ ∗ 2 v = √ (g + g − ) = √ 1, cos(2π j/n),...,cos(2π(n − 1)j/n) ; j 2 j n j n √ 1 2 1 ∗ ∗ 2 u = √ i(g − g − ) = √ 0, sin(2π j/n),...,sin(2π(n − 1)j/n) . j 2 j n j n √ ∗ √1 √2 √1 √1 For λ0, the corresponding eigenvector is v = g = (1, 1,...,1) = ( ,..., ). 0 0 n n 2 2 Now define the matrix Q as ⎡ ⎤ ⎡ v ⎤ ⎢ √1 √1 ··· √1 ⎥ ⎢ 0 ⎥ 2 2 2 ⎢ ⎥ ⎢ − ⎥ ⎢ v ⎥ ⎢ 1 cos(2π 1 )··· cos(2π n 1 ) ⎥ 1 √ ⎢ n n ⎥ ⎢ u ⎥ ⎢ 1 n−1 ⎥ ⎢ 1 ⎥ 2 ⎢ 0sin(2π )··· sin(2π ) ⎥ = ⎢ . ⎥ = √ ⎢ n n ⎥ . Q ⎢ . ⎥ ⎢ . . . ⎥ (C.61) ⎢ ⎥ n ⎢ . . ··· . ⎥ ⎢v ⎥ ⎢ ⎥ ⎢ n−1 ⎥ ⎢ ( π n−1 1 )··· ( π n−1 n−1 )⎥ ⎢ 2 ⎥ ⎢ 1 cos 2 2 n cos 2 2 n ⎥ ⎢u ⎥ − − − ⎣ n−1 ⎦ ⎢ ( π n 1 1 )··· ( π n 1 n 1 )⎥ 2 ⎣ 0sin2 2 n sin 2 2 n ⎦ = n−1 Thus, with m 2 ,

QΓsQ = diag(λ0,λ1,λ1,λ2,λ2,...,λm,λm)  λ = ( ) ( π / ) = , ,..., where j |h |≤m c h cos 2 hj n for j 0 1 m.

Theorem C.9 Let Γ_n be the covariance matrix of n (odd) realizations from a stationary process {x_t} with spectral density f_x(ω). Let Q be as defined in (C.61) and let D_n = diag{d₀, d₁, ..., d_{n−1}} be the diagonal matrix with entries d₀ = f_x(0) = Σ_{h=-∞}^{∞} γ(h) and

d_{2j−1} = d_{2j} = f_x(ω_j) = Σ_{h=-∞}^{∞} γ(h) e^{-2πihj/n},

for j = 1, ..., (n − 1)/2 and ω_j = j/n. Then

Q Γ_n Q′ − D_n → 0 uniformly as n → ∞.

Proof: Although Γn is symmetric, it is not circulant (or the proof would be done). Let Γ ( ) = γ( ) n,s be the symmetric circulant matrix with elements c h h , and latent roots, −2πihj/n λj = | |≤ n−1 γ(h)e . Note that h 2  |λj − fx(ωj )| ≤ |γ(h)| → 0 | |> n−1 h 2 as n →∞. Hence, we must show that QΓn,sQ − QΓnQ → 0asn →∞. The ijth element of the difference of the two matrices is 530 Appendix C: Spectral Domain Theory  0if|i − j|≤ n−1 {Γ , − Γ } = 2 . n s n ij γ( −| − |) − γ(| − |) | − | > n−1 n i j i j if i j 2

Put n − m = |i − j|, so that the second case is

γ( )−γ( − ) ≤ ≤ n−1 . m n m for 1 m 2

Let qj be the jth column of Q, then | (Γ − Γ ) | qi n,s n qj  n−1  2 m  =  qik[γ(m) + γ(n − m)]qj,n−m+k + qi,n−m+k [γ(m) + γ(n − m)]qjk m=1 k=1  n−1  2 m  =  [γ(m) + γ(n − m)] + qikqj,n−m+k + qi,n−m+k qjk m=1 k=1 n−1 n−1 (1) 2 2 ≤ 4 |γ( )| + 4 |γ( − )| n m m n m n m m=1 m=1 n−1  n (2) 2 ≤ 4 |γ( )| + 4 n−1 |γ( )| n m m n 2 k m=1 = n−1 + k 2 1 →∞ n−→ + . 789:0 789:0 (3) (4)

2 Inequality (1) follows because |qij| ≤ 2/n. In the second summation of inequality = − ≤ n−1 (2), put k n m and use the fact that m 2 in the sum. Result (3) follows from Kronecker’s Lemma4 and (4) follows from the fact that we are summing the tail end of an absolutely summable sequence [and (n − 1)/n ∼ 1].  The results of this section may be summarized as follows. If we transform the data vector, say X = (x1,...,xn) by Y = QX, the components of Y are nearly uncorrelated with cov(Y)≈Dn. The components of Y are

n n 2 2 √ xt cos(2πtj/n) and √ xt sin(2πtj/n) n t=1 n t=1

= , ,..., n−1 for j 0 1 2 .IfweletG be the complex matrix with columns gj , then the complex transform Y = G∗ X has elements that are the DFTs,

y_j = n^{-1/2} Σ_{t=1}^{n} x_t e^{-2πitj/n}

for j = 0, 1, ..., n − 1. In this case, the elements of Y are asymptotically uncorrelated complex random variables, with mean zero and variance f(ω_j). Also, X may be recovered as X = G Y, so that x_t = n^{-1/2} Σ_{j=0}^{n-1} y_j e^{2πitj/n}.
In this section, we focused on the case where n is odd. For the n even case, everything follows through as in the odd case, but with the addition of one more term when (n − 1)/2 becomes n/2 − 1, and with the addition of one more row in Q or G, and all in a manner that is so obvious, it would be too simple to be a good homework question.
4 Kronecker's Lemma: If Σ_{j=0}^{∞} |a_j| < ∞, then Σ_{j=0}^{n} (j/n) |a_j| → 0 as n → ∞.
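The approximation described in this section is easy to see numerically for a simple model; in the R sketch below (an AR(1) with arbitrary φ and σ_w² = 1, and a modest n), the eigenvalues of the n × n autocovariance matrix are compared with the AR(1) spectral density evaluated at the Fourier frequencies, and the two sets of values cover essentially the same range.

n <- 101; phi <- 0.6                           # illustrative choices
gam <- phi^(0:(n - 1)) / (1 - phi^2)           # gamma(0), ..., gamma(n-1) for AR(1), sigma_w^2 = 1
Gam <- toeplitz(gam)                           # covariance matrix Gamma_n
ev  <- eigen(Gam, symmetric = TRUE, only.values = TRUE)$values   # latent roots
omg <- (0:(n - 1)) / n
fx  <- 1 / (1 - 2*phi*cos(2*pi*omg) + phi^2)   # AR(1) spectral density f_x(omega_j)
range(ev)                                      # compare with
range(fx)
summary(sort(ev) - sort(fx))                   # discrepancies shrink as n grows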

C.6 Parametric Spectral Estimation

In this section we prove Property 4.7. The basic idea of the result is that a spectral density can be approximated arbitrarily closely by the spectrum of an AR(p) process.
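To see the idea in practice (a sketch only; the MA(1) model, sample size, and orders below are arbitrary, and estimation error enters as well as approximation error), one can fit autoregressions of increasing order with base R's spec.ar and compare the resulting spectra with the true MA(1) spectrum f(ω) = 1 + 2θ cos(2πω) + θ²; the fit typically improves as p grows, up to sampling variability.

set.seed(42)
theta <- 0.9
x <- arima.sim(list(ma = theta), n = 2048)              # MA(1) with sigma_w^2 = 1
for (p in c(1, 4, 16)) {
  sp <- spec.ar(x, order = p, plot = FALSE)             # AR(p) spectral estimate
  f_true <- 1 + 2*theta*cos(2*pi*sp$freq) + theta^2     # true MA(1) spectrum
  cat("AR order", p, ": max abs error =", round(max(abs(sp$spec - f_true)), 3), "\n")
}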

Proof of Property 4.7. If g(ω)≡0, then put p = 0 and σw = 0. When g(ω) > 0 over ω ∈[−1, 1 ] > some 2 2 ,let 0 and define  g−1(ω) if g(ω) >/2, d(ω) = 2/ if g(ω)≤/2,

− so that d 1(ω) = max{g(ω),/2}. Define G = maxω {g(ω)} and let 0 <δ< [G(2G + ]−1. Define the sum  Sn[d(ω)] = d, ej ej (ω) | j |≤n

∫ 1 (ω) = 2πijω  ,  = 2 (ω) −2πijω ω where ej e and d ej − 1 d e d . Now define the Cesaro sum 2

m−1 1 C (ω) = S [d(ω)], m m n n=0  [·] (ω) = −2πijω which is a cumulative average of Sn . In this case, Cm | j |≤m cj e where = ( − | j | ) ,  [− 1, 1 ] ∈ 2 cj 1 m d ej . The Cesaro sum converges uniformly on 2 2 for d L , consequently there is a finite p such that       −2πijω − (ω) <δ ω ∈[−1, 1 ] . cj e d for all 2 2 | j |≤p

Note that Cp(ω) is a spectral density. In fact, it is the spectral density of an MA(p) process with γ(h) = ch for |h|≤p and γ(h) = 0for|h| > p; it is easy to check that γ(h) defined this way is non-negative definite. Hence, the is an invertible MA(p) process, say yt = ut + α1ut−1 + ···+ αput−p 532 Appendix C: Spectral Domain Theory

∼ ( ,σ2) α( ) where ut wn 0 u and z has roots outside the unit circle. Thus,  (ω) = −2πijω = σ2|α( −2πiω)|2 , Cp cj e u e | j |≤p   and   σ2|α( −2πiω)|2 − (ω) <δ<[ ( + )]−1 def= ∗ .  u e d  G 2G  − (ω) = σ2|α( −2πiω)|2 1 | (ω)− (ω)| < Now define fx u e . We will show that fx g ,in which case the result follows with α1,...,αp being the required AR(p) coefficients, σ2 = σ−2 and w u being the noise variance. Consider that

−1 −1 −1 | fx(ω)−g(ω)| ≤ | fx(ω)−d (ω)| + |d (ω)−g(ω)| < | fx(ω)−d (ω)| + /2 .

Also,     | (ω)− −1(ω)| = σ2 |α( −2πiω)|−2 − −1(ω) fx d  w e d      = σ−2|α( −2πiω)|2 − (ω) · σ2 |α( −2πiω)|−2 −1(ω)  w e d  w e d <δσ2 |α( −2πiω)|−2 . w e G

∗ − (ω) <σ−2|α( −2πiω)|2 <∗ + (ω) But d w e d , so that 1 1 1 σ2 |α(e−2πiω)|−2 < < = = G + /2 . w ∗ − d(ω) ∗ − G−1 [G(2G + )]−1 − G−1 We now have that

−1 −1 | fx(ω)−d (ω)| <[G(2G + )] · G + /2 · G = /2 .

Finally, |f_x(ω) − g(ω)| < ε/2 + ε/2 = ε, as was to be shown.
It should be obvious from the proof of the result that the property holds if AR(p) is replaced by MA(q) or even ARMA(p, q). As a practical point, it is easier to fit autoregressions of successively increasing order to data, and this is why the property is stated for an AR, even though the MA case is easier to establish.

Appendix R
R Supplement

R.1 First Things First

If you do not already have R, point your browser to the Comprehensive R Archive Network (CRAN), http://cran.r-project.org/ and download and install it. The installation includes help files and some user manuals. You can find helpful tutorials by following CRAN’s link to Contributed Documentation. If you are a novice, then RStudio (https://www.rstudio.com/) will make using R easier.

R.2 astsa

There is an R package for the text called astsa (Applied Statistical Time Series Analysis), which was the name of the software distributed with the first and second editions of Shumway and Stoffer (2000), and the original version, Shumway [183]. The package can be obtained from CRAN and its mirrors in the usual way. To download and install astsa, start R and type install.packages("astsa") You will be asked to choose the closest CRAN mirror to you. As with all packages, you have to load astsa before you use it by issuing the command library(astsa) All the data are loaded when the package is loaded. If you create a .First function as follows, .First <- function(){library(astsa)} and save the workspace when you quit, astsa will be loaded at every start until you change .First. R is not consistent with help files across different operating systems. The best help system is the html help, which can be started issuing the command help.start()

and then following the Packages link to astsa. A useful command to see all the data files available to you, including those loaded with astsa, is data()

R.3 Getting Started

The convention throughout the text is that R code is in blue, output is purple and comments are # green. Get comfortable, then start her up and try some simple tasks. 2+2 # addition [1] 5 5*5 + 2 # multiplication and addition [1] 27 5/5 - 3 # division and subtraction [1] -2 log(exp(pi)) # log, exponential, pi [1] 3.141593 sin(pi/2) # sinusoids [1] 1 exp(1)^(-2) # power [1] 0.1353353 sqrt(8) # square root [1] 2.828427 1:5 # sequences [1]12345 seq(1, 10, by=2) # sequences [1]13579 rep(2, 3) # repeat 2 three times [1] 2 2 2 Next, we’ll use assignment to make some objects: x <- 1 + 2 # put 1 + 2 in object x x = 1 + 2 # same as above with fewer keystrokes 1 + 2 -> x # same x # view object x [1] 3 (y = 9 * 3) # put 9 times 3 in y and view the result [1] 27 (z = rnorm(5)) # put 5 standard normals into z and print z [1] 0.96607946 1.98135811 -0.06064527 0.31028473 0.02046853 In general, <- and = are not the same; <- can be used anywhere, whereas the use of = is restricted. But when they are the same, we prefer to code using the least number of keystrokes. It is worth pointing out R’s recycling rule for doing arithmetic. In the code below, c() [concatenation] is used to create a vector. Note the use of the semicolon for multiple commands on one line. x = c(1, 2, 3, 4); y = 2*x; z = c(10, 20); w = c(8, 3, 2) x * y #1*2, 2*4, 3*6, 4*8 [1] 2 8 18 32 x + z # 1+10, 2+20, 3+10, 4+20 [1] 11 22 13 24 x + w # what happened here? R.3 Getting Started 535

[1] 9 5 5 12 Warning message: In y + w : longer object length is not a multiple of shorter object length To work your objects, use the following commands: ls() # list all objects "dummy""mydata""x""y""z" ls(pattern ="my") # list every object that contains "my" "dummy""mydata" rm(dummy) # remove object "dummy" rm(list=ls()) # remove almost everything (use with caution) help.start() # html help and documentation data() # list of available data sets help(exp) # specific help (?exp is the same) getwd() # get working directory setwd() # change working directory q() # end the session (keep reading) When you quit, R will prompt you to save an image of your current workspace. Answering yes will save the work you have done so far, and load it when you next start R. We have never regretted selecting yes, but we have regretted answering no. To create your own data set inside R, you can make a data vector as follows: mydata = c(1,2,3,2,1) Now you have an object called mydata that contains five elements. R calls these objects vectors even though they have no (no rows, no columns); they do have order and length: mydata # display the data [1]12321 mydata[3] # the third element [1] 3 mydata[3:5] # elements three through five [1] 3 2 1 mydata[-(1:2)] # everything except the first two elements [1] 3 2 1 length(mydata) # number of elements [1] 5 dim(mydata) # no dimensions NULL mydata = as.matrix(mydata) # make it a matrix dim(mydata) # now it has dimensions [1] 5 1 If you have an external data set, you can use scan or read.table (or some variant) to input the data. For example, suppose you have an ascii (text) data file called dummy.txt in your working directory, and the file looks like this: 12321 90210 (dummy = scan("dummy.txt") ) # scan and view it Read 10 items [1]1232190210 (dummy = read.table("dummy.txt") ) # read and view it V1 V2 V3 V4 V5 536 Appendix R: R Supplement

12321 90210 There is a difference between scan and read.table. The former produced a data vector of 10 items while the latter produced a data frame with names V1 to V5 and two observations per variate. In this case, if you want to list (or use) the second variate, V2, you would use dummy$V2 [1] 2 0 and so on. You might want to look at the help files ?scan and ?read.table now. Data frames (?data.frame) are “used as the fundamental data structure by most of R’s modeling software.” Notice that R gave the columns of dummy generic names, V1, ..., V5. You can provide your own names and then use the names to access the data without the use of $ as above. colnames(dummy)=c("Dog", "Cat", "Rat", "Pig", "Man") attach(dummy) Cat [1] 2 0 Rat*(Pig - Man) # animal arithmetic [1] 3 2 head(dummy) # view the first few lines of a data file detach(dummy) # clean up (if desired) R is case sensitive, thus cat and Cat are different. Also, cat is a reserved name (?cat)inR,sousing"cat" instead of "Cat" may cause problems later. You may also include a header in the data file to avoid colnames(). For example, if you have a comma separated values file dummy.csv that looks like this, Dog,Cat,Rat,Pig,Man 1,2,3,2,1 9,0,2,1,0 then use the following command to read the data. (dummy = read.csv("dummy.csv")) Dog Cat Rat Pig Man 112321 290210 The default for .csv files is header=TRUE; type ?read.table for further information on similar types of files. Some commands that are used frequently to manipulate data are c() for concate- nation, cbind() for column binding, and rbind() for row binding. x = 1:3; y = 4:6 (u = c(x, y)) # an R vector [1]123456 (u1 = cbind(x, y)) #a3by2matrix xy [1,] 1 4 [2,] 2 5 [3,] 3 6 (u2 = rbind(x ,y)) #a2by3matrix [,1] [,2] [,3] x123 y456 R.3 Getting Started 537 sample mean −100 0 100 0 50 100 150 200 sample size

Fig. R.1. Crazy example

For example, u1[,2] is the second column of the matrix u1, whereas u2[1,] is the first row of u2. are fairly easy to obtain. We will simulate 25 normals with μ = 10 and σ = 4 and then perform some basic analyses. The first line of the code is set.seed, which fixes the seed for the generation of pseudorandom numbers. Using the same seed yields the same results; to expect anything else would be insanity. set.seed(90210) # so you can reproduce these results x = rnorm(25, 10, 4) # generate the data c( mean(x), (x), var(x), sd(x)) # guess [1] 9.473883 9.448511 13.926701 3.731850 c( min(x), max(x)) # smallest and largest values [1] 2.678173 17.326089 which.max(x) # index of the max (x[25] in this case) [1] 25 summary(x) # a five number summary with six numbers Min. 1st Qu. Median Mean 3rd Qu. Max. 2.678 7.824 9.449 9.474 11.180 17.330 boxplot(x); hist(x); stem(x) # visual summaries (not shown) It can’t hurt to learn a little about programming in R because you will see some of it along the way. Consider a simple program that we will call crazy to produce a graph of a sequence of sample means of increasing sample sizes from a Cauchy distribution with zero.

1 crazy <- function(num){
2   x <- c()
3   for (n in 1:num){ x[n] <- mean(rcauchy(n)) }
4   plot(x, type="l", xlab="sample size", ylab="sample mean")
5 }
The first line creates the function crazy and gives it one argument, num, that is the sample size that will end the sequence. Line 2 makes an empty vector, x, that will be used to store the sample means. Line 3 generates n random Cauchy variates [rcauchy(n)], finds the mean of those values, and puts the result into x[n], the n-th value of x. The process is repeated in a "do loop" num times so that x[1] is the sample mean from a sample of size one, x[2] is the sample mean from a sample of size two, and so on, until finally, x[num] is the sample mean from a sample of size num. After the do loop is complete, the fourth line generates a graphic (see Fig. R.1). The fifth line closes the function. To use crazy ending with a sample of size 200, type crazy(200) and you will get a graphic that looks like Fig. R.1.

Finally, a word of caution: TRUE and FALSE are reserved words, whereas T and F are initially set to these. Get in the habit of using the words rather than the letters T or F because you may get into trouble if you do something like F = qf(p=.01, df1=3, df2=9) so that F is no longer FALSE, but a quantile of the specified F-distribution.

R.4 Time Series Primer

In this section, we give a brief introduction on using R for time series. We assume that astsa has been loaded. To create a time series object, use the command ts. Related commands are as.ts to coerce an object to a time series and is.ts to test whether an object is a time series. First, make a small data set: (mydata = c(1,2,3,2,1))# make it and view it [1]12321 Now make it a time series: (mydata = as.ts(mydata)) Time Series: Start = 1 End = 5 Frequency = 1 [1]12321 Make it an annual time series that starts in 1950: (mydata = ts(mydata, start=1950)) Time Series: Start = 1950 End = 1954 Frequency = 1 [1]12321 Now make it a quarterly time series that starts in 1950-III: (mydata = ts(mydata, start=c(1950,3), frequency=4)) Qtr1 Qtr2 Qtr3 Qtr4 1950 1 2 1951 3 2 1 time(mydata) # view the sampled times Qtr1 Qtr2 Qtr3 Qtr4 1950 1950.50 1950.75 1951 1951.00 1951.25 1951.50 To use part of a time series object, use window(): (x = window(mydata, start=c(1951,1), end=c(1951,3))) Qtr1 Qtr2 Qtr3 1951 3 2 1

Next, we’ll look at lagging and differencing. First make a simple series, xt : x = ts(1:5)

Now, column bind (cbind) lagged values of xt and you will notice that lag(x) is forward lag, whereas lag(x, -1) is backward lag. R.4 Time Series Primer 539 cbind(x, lag(x), lag(x,-1)) x lag(x) lag(x, -1) 0NA1 NA 11 2 NA 22 3 1 33 4 2<- in this row, for example, x is 3, 4 4 5 3 lag(x) is ahead at 4, and 5 5 NA 4 lag(x,-1) is behind at 2 6NANA 5 Compare cbind and ts.intersect: ts.intersect(x, lag(x,1), lag(x,-1)) Time Series: Start = 2 End = 4 Frequency = 1 x lag(x, 1) lag(x, -1) 22 3 1 33 4 2 44 5 3

To difference a series, ∇xt = xt − xt−1,use diff(x) but note that diff(x, 2) is not second order differencing, it is xt − xt−2. For second order differencing, that is, 2 ∇ xt , do one of these: diff(diff(x)) diff(x, diff=2) # same thing and so on for higher order differencing. We will also make use of regression via lm(). First, suppose we want to fit a , y = α + βx + . In R, the formula is written as y~x: set.seed(1999) x = rnorm(10) y = x + rnorm(10) summary(fit <- lm(y~x)) Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.2576 0.1892 1.362 0.2104 x 0.4577 0.2016 2.270 0.0529 -- Residual : 0.58 on 8 degrees of freedom Multiple R-squared: 0.3918, Adjusted R-squared: 0.3157 F-statistic: 5.153 on 1 and 8 DF, p-value: 0.05289 plot(x, y) # draw a scatterplot of the data (not shown) abline(fit) # add the fitted line to the plot (not shown) All sorts of information can be extracted from the lm object, which we called fit. For example, resid(fit) # will display the residuals (not shown) fitted(fit) # will display the fitted values (not shown) lm(y~0+ x) # will exclude the intercept (not shown) You have to be careful if you use lm() for lagged values of a time series. If you use lm(), then what you have to do is align the series using ts.intersect. Please read the warning Using time series in the lm() help file [help(lm)]. Here is an example regressing astsa data, weekly cardiovascular mortality (cmort)on 540 Appendix R: R Supplement particulate pollution (part) at the present value and lagged four weeks (part4). First, we create ded, which consists of the intersection of the three series: ded = ts.intersect(cmort, part, part4=lag(part,-4)) Now the series are all aligned and the regression will work. summary(fit <- lm(cmort~part+part4, data=ded, na.action=NULL)) Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 69.01020 1.37498 50.190 < 2e-16 part 0.15140 0.02898 5.225 2.56e-07 part4 0.26297 0.02899 9.071 < 2e-16 --- Residual standard error: 8.323 on 501 degrees of freedom Multiple R-squared: 0.3091, Adjusted R-squared: 0.3063 F-statistic: 112.1 on 2 and 501 DF, p-value: < 2.2e-16 There was no need to rename lag(part,-4) to part4, it’s just an example of what you can do. An alternative to the above is the package dynlm, which has to be installed. After the package is installed, the previous example may be run as follows: library(dynlm) # load the package fit = dynlm(cmort~part + L(part,4)) # no new data file needed summary(fit) The output is identical to the lm output. To fit another model, for example, add the temperature series tempr, the advantage of dynlm is that a new data file does not have to be created. We could just run summary(dynlm(cmort~ tempr + part + L(part,4)) )

In Problem 2.1, you are asked to fit a regression model

xt = βt + α1Q1(t) + α2Q2(t) + α3Q3(t) + α4Q4(t) + wt where xt is logged Johnson & Johnson quarterly earnings (n = 84), and Qi(t) is the indicator of quarter i = 1, 2, 3, 4. The indicators can be made using factor. trend = time(jj)-1970 # helps to 'center' time Q = factor(cycle(jj)) # make (Q)uarter factors reg = lm(log(jj)~0 + trend + Q, na.action=NULL) # no intercept model.matrix(reg) # view the model design matrix trend Q1 Q2 Q3 Q4 1 -10.00 1 0 0 0 2 -9.75 0 1 0 0 3 -9.50 0 0 1 0 4 -9.25 0 0 0 1 ...... summary(reg) # view the results (not shown) The workhorse for ARIMA simulations is arima.sim. Here are some examples; no output is shown here so you’re on your own. x = arima.sim(list(order=c(1,0,0), ar=.9), n=100)+50 # AR(1) w/mean 50 x = arima.sim(list(order=c(2,0,0), ar=c(1,-.9)), n=100) # AR(2) x = arima.sim(list(order=c(1,1,1), ar=.9 ,ma=-.5), n=200) # ARIMA(1,1,1) R.4 Time Series Primer 541

An easy way to fit ARIMA models is to use sarima from astsa. The script is used in Chap. 3 and is introduced in Sect. 3.7.
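As a minimal illustration (assuming astsa has been loaded as described in Sect. R.2, and using a simulated series rather than data from the text):

set.seed(1)
x <- arima.sim(list(order = c(1,0,0), ar = .9), n = 200)
sarima(x, 1, 0, 0)     # fit an AR(1) and display parameter estimates and diagnostics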

R.4.1 Graphics

We introduced some graphics without saying much about it. Many people use the graphics package ggplot2, but for quick and easy graphing of time series, the R base graphics does fine and is what we discuss here. As seen in Chap. 1, a time series may be plotted in a few lines, such as plot(speech) in Example 1.3, or the multifigure plot plot.ts(cbind(soi, rec)) which we made little fancier in Example 1.5: par(mfrow = c(2,1)) plot(soi, ylab='', xlab='', main='Southern Oscillation Index') plot(rec, ylab='', xlab='', main='Recruitment') But, if you compare the results of the above to what is displayed in the text, there is a slight difference because we improved the aesthetics by adding a grid and cutting down on the margins. This is how we actually produced Fig. 1.3:

1 dev.new(width=7, height=4) # default is7x7inches 2 par(mar=c(3,3,1,1), mgp=c(1.6,.6,0)) # change the margins (?par) 3 plot(speech, type='n') 4 grid(lty=1, col=gray(.9)); lines(speech) In line 1, the dimensions are in inches. Line 2 adjusts the margins; see help(par) for a complete list of settings. In line 3, the type=’n’ means to set up the graph, but don’t actually plot anything yet. Line 4 adds a grid and then plots the lines. The reason for using type=’n’ is to avoid having the grid lines on top of the data plot. You can print the graphic directly to a pdf, for example, by replacing line 1 with something like pdf(file="speech.pdf", width=7, height=4) but you have to turn the device off to complete the file save: dev.off() Here is the code we used to plot two series individually in Fig. 1.5: dev.new(width=7, height=6) par(mfrow = c(2,1), mar=c(2,2,1,0)+.5, mgp=c(1.6,.6,0)) plot(soi, ylab='', xlab='', main='Southern Oscillation Index', type='n') grid(lty=1, col=gray(.9)); lines(soi) plot(rec, ylab='', main='Recruitment', type='n') grid(lty=1, col=gray(.9)); lines(rec) For plotting many time series, plot.ts and ts.plot are available. If the series are all on the same scale, it might be useful to do the following: ts.plot(cmort, tempr, part, col=1:3) legend('topright', legend=c('M','T','P'), lty=1, col=1:3) This produces a plot of all three series on the same axes with different colors, and then adds a legend. We are not restricted to using basic colors; an internet search on ‘R colors’ is helpful. The following code gives separate plots of each different series (with a limit of 10): 542 Appendix R: R Supplement


Fig. R.2. The sunspot numbers plotted in different-sized boxes, demonstrating that the dimen- sions of the graphic matters when displaying time series data plot.ts(cbind(cmort, tempr, part)) plot.ts(eqexp) # you will get a warning plot.ts(eqexp[,9:16], main='Explosions') # but this works Finally, we mention that size matters when plotting time series. Figure R.2 shows the sunspot numbers discussed in Problem 4.9 plotted with varying dimension size as follows. layout(matrix(c(1:2, 1:2), ncol=2), height=c(.2,.8)) par(mar=c(.2,3.5,0,.5), oma=c(3.5,0,.5,0), mgp=c(2,.6,0), tcl=-.3, las=1) plot(sunspotz, type='n', xaxt='no', ylab='') grid(lty=1, col=gray(.9)) lines(sunspotz) plot(sunspotz, type='n', ylab='') grid(lty=1, col=gray(.9)) lines(sunspotz) title(xlab="Time", outer=TRUE, cex.lab=1.2) mtext(side=2,"Sunspot Numbers", line=2, las=0, adj=.75) R.4 Time Series Primer 543

The result is shown in Fig. R.2. The top plot is wide and narrow, revealing the fact that the series rises quickly and falls slowly. The bottom plot, which is more square, obscures this fact. You will notice that in the main part of the text, we never plotted a series in a square box. The ideal shape for plotting time series, in most instances, is when the time axis is much wider than the value axis.

References

[1] Akaike H (1969) Fitting autoregressive models for prediction. Ann Inst Stat Math 21:243–247 [2] Akaike H (1973) Information theory and an extension of the maximum like- lihood principal. In: Petrov BN, Csake F (eds) 2nd Int Symp Inform Theory. Akademia Kiado, Budapest, pp 267–281 [3] Akaike H (1974) A new look at identification. IEEE Trans Automat Contr AC-19:716–723 [4] Alagón J (1989) Spectral discrimination for two groups of time series. J Time Ser Anal 10:203–214 [5] Anderson BDO, Moore JB (1979) Optimal filtering. Prentice-Hall, Englewood Cliffs [6] Anderson TW (1978) Estimation for autoregressive models in the time and . Ann Stat 5:842–865 [7] Anderson TW (1984) An introduction to multivariate statistical analysis, 2nd edn. Wiley, New York [8] Ansley CF, Newbold P (1980) Finite sample properties of estimators for au- toregressive moving average processes. J Econ 13:159–183 [9] Ansley CF, Kohn R (1982) A geometrical derivation of the fixed interval smoothing algorithm. Biometrika 69:486–487 [10] Antognini JF, Buonocore MH, Disbrow EA, Carstens E (1997) Isoflurane anesthesia blunts cerebral responses to noxious and innocuous stimuli: a fMRI study. Life Sci 61:PL349–PL354 [11] Bandettini A, Jesmanowicz A, Wong EC, Hyde JS (1993) Processing strategies for time-course data sets in functional MRI of the human brain. Magnetic Res Med 30:161–173 [12] Bar-Shalom Y (1978) Tracking methods in a multi-target environment. IEEE Trans Automat Contr AC-23:618–626 [13] Bar-Shalom Y, Tse E (1975) Tracking in a cluttered environment with proba- bilistic data association. Automatica 11:4451–4460


[14] Baum LE, Petrie T, Soules G, Weiss N (1970) A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann Math Stat 41:164–171 [15] Bazza M, Shumway RH, Nielsen DR (1988) Two-dimensional spectral analysis of soil surface temperatures. Hilgardia 56:1–28 [16] Bedrick EJ, Tsai C-L (1994) for multivariate regression in small samples. Biometrics 50:226–231 [17] Benjamini Y, Hochberg Y (1995) Controlling the : a prac- tical and powerful approach to multiple testing. J Roy Stat Soc Ser B 289–300 [18] Beran J (1994) Statistics for long memory processes. Chapman and Hall, New York [19] Berk KN (1974) Consistent autoregressive spectral estimates. Ann Stat 2:489– 502 [20] Bhat RR (1985) Modern probability theory, 2nd edn. Wiley, New York [21] Bhattacharya A (1943) On a measure of between two statistical populations. Bull Calcutta Math Soc 35:99–109 [22] Billingsley P (1999) Convergence of probability measures, 2nd edn. Wiley, New York [23] Blackman RB, Tukey JW (1959) The measurement of power spectra from the point of view of communications engineering. Dover, New York [24] Blight BJN (1974) Recursive solutions for the estimation of a stochastic pa- rameter. J Am Stat Assoc 69:477–481 [25] Bloomfield P (2000) Fourier analysis of time series: an introduction, 2nd edn. Wiley, New York [26] Bloomfield P, Davis JM (1994) Orthogonal rotation of complex principal components. Int J Climatol 14:759–775 [27] Bogart BP, Healy MJR, Tukey JW (1962) The quefrency analysis of time series for echoes: , pseudo-autocovariance, cross-cepstrum and saphe cracking. In: Proc. of the symposium on time series analysis, pp 209–243, Brown University, Providence, USA [28] Bollerslev T (1986) Generalized autoregressive conditional . J Econ 31:307–327 [29] Box GEP, Pierce DA (1970) Distributions of residual autocorrelations in au- toregressive integrated moving average models. J Am Stat Assoc 72:397–402 [30] Box GEP, Jenkins GM (1970) Time series analysis, forecasting, and control. Holden-Day, Oakland [31] Box GEP, Jenkins GM, Reinsel GC (1994) Time series analysis, forecasting, and control, 3rd edn. Prentice Hall, Englewood Cliffs [32] Brillinger DR (1973) The analysis of time series collected in an experimental design. In: Krishnaiah PR (ed) Multivariate analysis-III, pp 241–256. Aca- demic Press, New York [33] Brillinger DR (1975) Time series: data analysis and theory. Holt, Rinehart & Winston Inc., New York References 547

[34] Brillinger DR (1980) and problems under time series models. In: Krishnaiah PR, Brillinger DR (eds) Handbook of statistics, Vol I, pp 237–278. North Holland, Amsterdam [35] Brillinger DR (1981, 2001) Time series: data analysis and theory, 2nd edn. Holden-Day, San Francisco. Republished in 2001 by the Society for Industrial and Applied Mathematics, Philadelphia [36] Brockwell PJ, Davis RA (1991) Time series: theory and methods, 2nd edn. Springer, New York [37] Cappé O, Moulines E, Rydén T (2009) Inference in hidden Markov models. Springer, New York [38] Caines PE (1988) Linear stochastic systems. Wiley, New York [39] Carter CK, Kohn R (1994) On for state space models. Biometrika 81:541–553 [40] Chan NH (2002) Time series: applications to finance. Wiley, New York [41] Chernoff H (1952) A measure of asymptotic efficiency for tests of a hypothesis based on the sum of the observations. Ann Math Stat 25:573–578 [42] Cleveland WS (1979) Robust locally weighted regression and smoothing scat- terplots. J Am Stat Assoc 74:829–836 [43] Cochrane D, Orcutt GH (1949) Applications of least squares regression to relationships containing autocorrelated errors. J Am Stat Assoc 44:32–61 [44] Cooley JW, Tukey JW (1965) An algorithm for the machine computation of complex Fourier series. Math Comput 19:297–301 [45] Cressie NAC (1993) Statistics for spatial data. Wiley, New York [46] Dahlhaus R (1989) Efficient parameter estimation for self-similar processes. Ann Stat 17:1749–1766 [47] Dargahi-Noubary GR, Laycock PJ (1981) Spectral ratio discriminants and information theory. J Time Ser Anal 16:201–219 [48] Danielson J (1994) Stochastic volatility in asset prices: Estimation with simu- lated maximum likelihood. J 61:375–400 [49] Davies N, Triggs CM, Newbold P (1977) Significance levels of the Box-Pierce portmanteau statistic in finite samples. Biometrika 64:517–522 [50] Dent W, Min A-S (1978) A Monte Carlo study of autoregressive-integrated- moving average processes. J Econ 7:23–55 [51] Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incom- plete data via the EM algorithm. J R Stat Soc B 39:1–38 [52] Ding Z, Granger CWJ, Engle RF (1993) A long memory property of stock market returns and a new model. J Empirical Finance 1:83–106 [53] Douc R, Moulines E, Stoffer DS (2014) Nonlinear time series: theory, methods, and applications with R examples. CRC Press, Boca Raton [54] Durbin J (1960) Estimation of parameters in time series regression models. J R Stat Soc B 22:139–153 [55] Durbin J, Koopman SJ (2001) Time series analysis by state space methods. Oxford University Press, Oxford [56] Efron B, Tibshirani R (1994) An introduction to the bootstrap. Chapman and Hall, New York 548 References

[57] Engle RF (1982) Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50:987–1007 [58] Engle RF, Nelson D, Bollerslev T (1994) ARCH models. In: Engle R, McFad- den D (eds) Handbook of econometrics, Vol IV, pp 2959–3038. North Holland, Amsterdam [59] Evans GBA, Savin NE (1981) The calculation of the limiting distribution of the least squares estimator of the parameter in a random walk model. Ann Stat 1114–1118. http://projecteuclid.org/euclid.aos/1176345591 [60] Eubank RL (1999) and spline smoothing, vol 157. Chapman & Hall, New York [61] Fan J, Kreutzberger E (1998) Automatic local smoothing for spectral . Scand J Stat 25:359–369 [62] Fox R, Taqqu MS (1986) Large sample properties of parameter estimates for strongly dependent stationary Gaussian time series. Ann Stat 14:517–532 [63] Friedman JH, Stuetzle W (1981) Projection pursuit regression. J Am Stat Assoc 76:817–823 [64] Frühwirth-Schnatter S (1994) Data augmentation and dynamic linear models. J Time Ser Anal 15:183–202 [65] Fuller WA (1976) Introduction to statistical time series. Wiley, New York [66] Fuller WA (1996) Introduction to statistical time series, 2nd edn. Wiley, New York [67] Gelfand AE, Smith AFM (1990) Sampling-based approaches to calculating marginal densities. J Am Stat Assoc 85:398–409 [68] Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Machine Intell 6:721–741 [69] Gerlach R, Carter C, Kohn R (2000) Efficient for dynamic mixture models. J Am Stat Assoc 95:819–828 [70] Geweke JF (1977) The dynamic of economic time series models. In: Aigner D, Goldberger A (eds) Latent variables in socio-economic models, pp 365–383. North Holland, Amsterdam [71] Geweke JF, Singleton KJ (1981) Latent variable models for time series: A frequency domain approach with an application to the Permanent Income Hypothesis. J Econ 17:287–304 [72] Geweke JF, Porter-Hudak S (1983) The estimation and application of long- memory time series models. J Time Ser Anal 4:221–238 [73] Giri N (1965) On complex analogues of T 2 and R2 tests. Ann Math Stat 36:664–670 [74] Goldfeld SM, Quandt RE (1973) A Markov model for switching regressions. J Econ 1:3–16 [75] Goodman NR (1963) Statistical analysis based on a certain multivariate com- plex Gaussian distribution. Ann Math Stat 34:152–177 [76] Gordon K, Smith AFM (1988) Modeling and monitoring discontinuous changes in time series. In: Bayesian analysis of time series and dynamic mod- els, pp 359–392 References 549

[77] Gordon K, Smith AFM (1990) Modeling and monitoring biomedical time series. J Am Stat Assoc 85:328–337 [78] Gouriéroux C (1997) ARCH models and financial applications. Springer, New York [79] Granger CW, Joyeux R (1980) An introduction to long-memory time series models and fractional differencing. J Time Ser Anal 1:15–29 [80] Grenander U (1951) On empirical spectral analysis of stochastic processes. Arkiv for Mathematik 1:503–531 [81] Green PJ, Silverman BW (1993) Nonparametric regression and generalized linear models: a roughness penalty approach, vol 58. Chapman & Hall, New York [82] Grenander U, Rosenblatt M (1957) Statistical analysis of stationary time series. Wiley, New York [83] Grether DM, Nerlove M (1970) Some properties of optimal seasonal adjust- ment. Econometrica 38:682–703 [84] Gupta NK, Mehra RK (1974) Computational aspects of maximum likelihood estimation and reduction in sensitivity function calculations. IEEE Trans Au- tomat Contr AC-19:774–783 [85] Hamilton JD (1989) A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57:357–384 [86] Hannan EJ (1970) Multiple time series. Wiley, New York [87] Hannan EJ, Quinn BG (1979) The determination of the order of an autoregres- sion. J R Stat Soc B 41:190–195 [88] Hannan EJ, Deistler M (1988) The of linear systems. Wiley, New York [89] Hansen J, Sato M, Ruedy R, Lo K, Lea DW, Medina-Elizade M (2006) Global temperature change. Proc Natl Acad Sci 103:14288–14293 [90] Harrison PJ, Stevens CF (1976) Bayesian forecasting (with discussion). J R Stat Soc B 38:205–247 [91] Harvey AC, Todd PHJ (1983) Forecasting economic time series with structural and Box-Jenkins models: A case study. J Bus Econ Stat 1:299–307 [92] Harvey AC, Pierse RG (1984) Estimating missing observations in economic time series. J Am Stat Assoc 79:125–131 [93] Harvey AC (1991) Forecasting, structural time series models and the Kalman filter. Cambridge University Press, Cambridge [94] Harvey AC, Ruiz E, Shephard N (1994) Multivariate stochastic volatility mod- els. Rev Econ Stud 61:247–264 [95] Haslett J, Raftery AE (1989) Space-time modelling with long-memory depen- dence: Assessing Ireland’s wind power resource (C/R: 89V38 p21–50). Appl Stat 38:1–21 [96] Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97–109 [97] Hosking JRM (1981) Fractional differencing. Biometrika 68:165–176 [98] Hurst H (1951) Long term storage capacity of reservoirs. Trans Am Soc Civil Eng 116:778–808 550 References

[99] Hurvich CM, Zeger S (1987) Frequency domain bootstrap methods for time series. Tech. Report 87–115, Department of Statistics and Operations Research, Stern School of Business, New York University [100] Hurvich CM, Tsai C-L (1989) Regression and time series model selection in small samples. Biometrika 76:297–307 [101] Hurvich CM, Beltrao KI (1993) Asymptotics for the low-requency oridnates of the periodogram for a long-memory time series. J Time Ser Anal 14:455–472 [102] Hurvich CM, Deo RS, Brodsky J (1998) The mean squared error of Geweke and Porter-Hudak’s estimator of the memory parameter of a long-memory time series. J Time Ser Anal 19:19–46 [103] Hurvich CM, Deo RS (1999) Plug-in selection of the number of frequencies in regression estimates of the memory parameter of a long-memory time series. J Time Ser Anal 20:331–341 [104] Jacquier E, Polson NG, Rossi PE (1994) Bayesian analysis of stochastic volatil- ity models. J Bus. Econ Stat 12:371–417 [105] Jazwinski AH (1970) Stochastic processes and filtering theory. Academic Press, New York [106] Johnson RA, Wichern DW (1992) Applied multivariate statistical analysis, 3rd edn. Prentice-Hall, Englewood Cliffs [107] Jones RH (1980) Maximum likelihood fitting of ARMA models to time series with missing observations. Technometrics 22:389–395 [108] Jones RH (1984) Fitting multivariate models to unequally spaced data. In: Parzen E (ed) Time series analysis of irregularly observed data, pp 158–188. Lecture notes in statistics, 25. Springer, New York [109] Journel AG, Huijbregts CH (1978) Mining . Academic Press, New York [110] Juang BH, Rabiner LR (1985) Mixture autoregressive hidden Markov models for speech . IEEE Trans Acoust Speech Signal Process ASSP-33:1404– 1413 [111] Kakizawa Y, Shumway RH, Taniguchi M (1998) Discrimination and clustering for multivariate time series. J Am Stat Assoc 93:328–340 [112] Kalman RE (1960) A new approach to linear filtering and prediction problems. Trans ASME J Basic Eng 82:35–45 [113] Kalman RE, Bucy RS (1961) New results in filtering and prediction theory. Trans ASME J Basic Eng 83:95–108 [114] Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to . Wiley, New York [115] Kay SM (1988) Modern spectral analysis: theory and applications. Prentice- Hall, Englewood Cliffs [116] Kazakos D, Papantoni-Kazakos P (1980) Spectral distance measuring between Gaussian processes. IEEE Trans Automat Contr AC-25:950–959 [117] Khatri CG (1965) Classical statistical analysis based on a certain multivariate complex Gaussian distribution. Ann Math Stat 36:115–119 [118] Kim S, Shephard N, Chib S (1998) Stochastic volatility: likelihood inference and comparison with ARCH models. Rev Econ Stud 65:361–393 References 551

[119] Kitagawa G, Gersch W (1984) A smoothness priors modeling of time series with trend and seasonality. J Am Stat Assoc 79:378–389 [120] Kolmogorov AN (1941) Interpolation und extrapolation von stationären zufäl- ligen Folgen. Bull Acad Sci URSS 5:3–14 [121] Krishnaiah PR, Lee JC, Chang TC (1976) The distribution of likelihood ratio statistics for tests of certain covariance structures of complex multivariate normal populations. Biometrika 63:543–549 [122] Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86 [123] Kullback S (1958) Information theory and statistics. Peter Smith, Gloucester [124] Lachenbruch PA, Mickey MR (1968) Estimation of error rates in discriminant analysis. Technometrices 10:1–11 [125] Lam PS (1990) The Hamilton model with a general autoregressive component: Estimation and comparison with other models of economic time series. J Monetary Econ 26:409–432 [126] Lay T (1997) Research required to support comprehensive nuclear test ban treaty monitoring. National Research Council Report, National Academy Press, 2101 Constitution Ave., Washington, DC 20055 [127] Levinson N (1947) The Wiener (root mean square) error criterion in filter design and prediction. J Math Phys 25:262–278 [128] Lindgren G (1978) Markov regime models for mixed distributions and switch- ing regressions. Scand J Stat 5:81–91 [129] Ljung GM, Box GEP (1978) On a measure of lack of fit in time series models. Biometrika 65:297–303 [130] Lütkepohl H (1985) Comparison of criteria for estimating the order of a vector autoregressive process. J Time Ser Anal 6:35–52 [131] Lütkepohl H (1993) Introduction to multiple time series analysis, 2nd edn. Springer, Berlin [132] MacQueen JB (1967) Some methods for classification and analysis of multi- variate observations. In: Proceedings of 5-th Berkeley symposium on math- ematical statistics and probability. University of California Press, Berkeley, 1:281–297 [133] Mallows CL (1973) Some comments on Cp. Technometrics 15:661–675 [134] McBratney AB, Webster R (1981) Detection of ridge and furrow pattern by spectral analysis of crop yield. Int Stat Rev 49:45–52 [135] McCulloch RE, Tsay RS (1993) Bayesian inference and prediction for mean and variance shifts in autoregressive time series. J Am Stat Assoc 88:968–978 [136] McDougall AJ, Stoffer DS, Tyler DE (1997) Optimal transformations and the spectral envelope for real-valued time series. J Stat Plan Infer 57:195–214 [137] McLeod AI (1978) On the distribution of residual autocorrelations in Box- Jenkins models. J R Stat Soc B 40:296–302 [138] McQuarrie ADR, Tsai C-L (1998) Regression and time series model selection World Scientific, Singapore [139] Meinhold RJ, Singpurwalla ND (1983) Understanding the Kalman filter. Am Stat 37:123–127 552 References

[140] Meng XL, Rubin DB (1991) Using EM to obtain asymptotic variance– covariance matrices: The SEM algorithm. J Am Stat Assoc 86:899–909 [141] Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equations of state calculations by fast computing machines. J Chem Phys 21:1087–1091 [142] Mickens RE (1990) Difference equations: theory and applicatons, 2nd edn. Springer, New York [143] Newbold P, Bos T (1985) Stochastic parameter regression models. Sage, Bev- erly Hills [144] Ogawa S, Lee TM, Nayak A, Glynn P (1990) Oxygenation-sensititive contrast in magnetic resonance image of rodent brain at high magnetic fields. Magn Reson Med 14:68–78 [145] Palma W (2007) Long-memory time series: theory and methods. Wiley, New York [146] Palma W, Chan NH (1997) Estimation and forecasting of long-memory time series with missing values. J Forecast 16:395–410 [147] Paparoditis E, Politis DN (1999) The local bootstrap for periodogram statistics. J Time Ser Anal 20:193–222 [148] Parzen E (1962) On estimation of a probability density and . Ann Math Stat 35:1065–1076 [149] Parzen E (1983) Autoregressive spectral estimation. In: Brillinger DR, Krish- naiah PR (eds) Time series in the frequency domain, handbook of statistics, Vol 3, pp 211–243. North Holland, Amsterdam [150] Pawitan Y, Shumway RH (1989) Spectral estimation and deconvolution for a linear time series model. J Time Ser Anal 10:115–129 [151] Peña D, Guttman I (1988) A Bayesian approach to robustifying the Kalman filter. In: Spall JC (ed) Bayesian analysis of time series and dynamic linear models, pp 227–254. Marcel Dekker, New York [152] Percival DB, Walden AT (1993) Spectral analysis for physical applications: multitaper and conventional univariate techniques. Cambridge University Press, Cambridge [153] Petris G, Petrone S, Campagnoli P (2009) Dynamic linear models with R. Springer, New York [154] Phillips PCB (1987) Time series regression with a . Econometrica 55:227–301 [155] Pinsker MS (1964) Information and information stability of random variables and processes. Holden Day, San Francisco [156] Press WH, Teukolsky SA, Vetterling WT, Flannery BP (1993) Numerical recipes in C: the art of scientific computing, 2nd edn. Cambridge University Press, Cambridge [157] Priestley MB, Subba-Rao T, Tong H (1974) Applications of principal compo- nents analysis and factor analysis in the identification of multi-variable systems. IEEE Trans Automat Contr AC-19:730–734 [158] Priestley MB, Subba-Rao T (1975) The estimation of factor scores and Kalman filtering for discrete parameter stationary processes. Int J Contr 21:971–975 References 553

[159] Priestley MB (1988) Nonlinear and nonstationary time series analysis. Aca- demic Press, London [160] Quandt RE (1972) A new approach to estimating switching regressions. J Am Stat Assoc 67:306–310 [161] Rabiner LR, Juang BH (1986) An introduction to hidden Markov models. IEEE Acoust. Speech Signal Process ASSP-34:4–16 [162] Rao MM (1978) Asymptotic distribution of an estimator of the boundary parameter of an unstable process. Ann Stat 185–190 [163] Reinsel GC (1997) Elements of multivariate time series analysis, 2nd edn. Springer, New York [164] Remillard B (2011) Validity of the parametric bootstrap for goodness-of-fit testing in dynamic models. Available at SSRN 1966476 [165] Renyi A (1961) On measures of entropy and information. In: Proceedings of 4th Berkeley symp. math. stat. and probability, pp 547–561. Univ. of California Press, Berkeley [166] Rissanen J (1978) Modeling by shortest data description. Automatica 14:465– 471 [167] Robinson PM (1995) Gaussian semiparametric estimation of long de- pendence. Ann Stat 23:1630–1661 [168] Robinson PM (2003) Time series with long memory. Oxford University Press, Oxford [169] Rosenblatt M (1956a) A central limit theorem and a strong mixing condition. Proc Natl Acad Sci 42:43–47 [170] Rosenblatt M (1956b) Remarks on some nonparametric estimates of a density functions. Ann Math Stat 27:642–669 [171] Sandmann G, Koopman SJ (1998) Estimation of stochastic volatility models via Monte Carlo maximum likelihood. J Econometrics 87:271–301 [172] Scheffé H (1959) The analysis of variance. Wiley, New York [173] Schuster A (1898) On the investigation of hidden periodicities with applica- tion to a supposed 26 day period of meteorological phenomena. Terrestrial magnetism, III, pp 11–41 [174] Schuster A (1906) On the periodicities of sunspots. Phil Trans R Soc Ser A 206:69–100 [175] Schwarz F (1978) Estimating the dimension of a model. Ann Stat 6:461–464 [176] Schweppe FC (1965) Evaluation of likelihood functions for Gaussian signals. IEEE Trans Inform Theory IT-4:294–305 [177] Shephard N (1996) Statistical aspects of ARCH and stochastic volatility. In: Cox DR, Hinkley DV, Barndorff-Nielson OE (eds) Time series models in econometrics, finance and other fields, pp 1–100. Chapman and Hall, London [178] Shumway RH, Dean WC (1968) Best linear unbiased estimation for multivari- ate stationary processes. Technometrics 10:523–534 [179] Shumway RH, Unger AN (1974) Linear discriminant functions for stationary time series. J Am Stat Assoc 69:948–956 554 References

[180] Shumway RH (1982) Discriminant analysis for time series. In: Krishnaiah PR, Kanal LN (eds) Classification, pattern recognition and reduction of dimen- sionality, handbook of statistics, vol 2, pp 1–46. North Holland, Amsterdam [181] Shumway RH, Stoffer DS (1982) An approach to time series smoothing and forecasting using the EM algorithm. J Time Ser Anal 3:253–264 [182] Shumway RH (1983) Replicated time series regression: An approach to signal estimation and detection. In: Brillinger DR, Krishnaiah PR (eds) time series in the frequency domain, handbook of statistics, vol 3, pp 383–408. North Holland, Amsterdam [183] Shumway RH (1988) Applied statistical time series analysis. Prentice-Hall, Englewood Cliffs [184] Shumway RH, Stoffer DS (1991) Dynamic linear models with switching. J Am Stat Assoc 86:763–769 (Correction: V87 p. 913) [185] Shumway RH, Verosub KL (1992) State space modeling of paleoclimatic time series. In: Pro. 5th int. meeting stat. climatol., Toronto, pp. 22–26, June, 1992 [186] Shumway RH, Kim SE, Blandford RR (1999) Nonlinear estimation for time series observed on arrays. In: Ghosh S (ed) Asymptotics, nonparametrics and time series, Chapter 7, pp 227–258. Marcel Dekker, New York [187] Small CG, McLeish DL (1994) Hilbert space methods in probability and statistical inference. Wiley, New York [188] Smith AFM, West M (1983) Monitoring renal transplants: An application of the multiprocess Kalman filter. Biometrics 39:867–878 [189] Spliid H (1983) A fast estimation method for the vector autoregressive moving average model with exogenous variables. J Am Stat Assoc 78:843–849 [190] Stoffer DS (1982) Estimation of parameters in a linear dynamic system with missing observations. Ph.D. Dissertation. Univ. California, Davis [191] Stoffer DS, Scher M, Richardson G, Day N, Coble P (1988) A Walsh-Fourier analysis of the effects of moderate maternal alcohol consumption on neonatal sleep-state cycling. J Am Stat Assoc 83:954–963 [192] Stoffer DS, Wall KD (1991) Bootstrapping state space models: Gaussian max- imum likelihood estimation and the Kalman filter. J Am Stat Assoc 86:1024– 1033 [193] Stoffer DS, Tyler DE, McDougall AJ (1993) Spectral analysis for categorical time series: Scaling and the spectral envelope. Biometrika 80:611–622. [194] Stoffer DS (1999) Detecting common signals in multiple time series using the spectral envelope. J Am Stat Assoc 94:1341–1356 [195] Stoffer DS, Wall KD (2004) in state space models. In: Harvey A, Koopman SJ, Shephard N (eds) State space and unobserved component models theory and applications, Chapter 9, pp 227–258. Cambridge University Press, Cambridge [196] Sugiura N (1978) Further analysis of the data by Akaike’s information criterion and the finite corrections. Commun. Stat A Theory Methods 7:13–26 [197] Taniguchi M, Puri ML, Kondo M (1994) Nonparametric approach for non- Gaussian vector stationary processes. J Mult Anal 56:259–283 References 555

[198] Tanner M, Wong WH (1987) The calculation of posterior distributions by data augmentation (with discussion). J Am Stat Assoc 82:528–554 [199] Taylor SJ (1982) Financial returns modelled by the product of two stochastic processes – A study of daily sugar prices, 1961–79. In: Anderson OD (ed) Time series analysis: theory and practice, Vol 1, pp 203–226. Elsevier/North- Holland, New York [200] Tiao GC, Tsay RS (1989) Model specification in multivariate time series (with discussion). J Roy Stat Soc B 51:157–213 [201] Tiao GC, Tsay RS, Wang T (1993) Usefulness of linear transformations in multivariate time series analysis. Empir Econ 18:567–593 [202] Tong H (1983) Threshold models in nonlinear time series analysis. Springer lecture notes in statistics, vol 21. Springer, New York [203] Tong H (1990) Nonlinear time series: a dynamical system approach. Oxford Univ. Press, Oxford [204] Tsay RS (2002) Analysis of financial time series. Wiley, New York [205] Wahba G (1980) Automatic smoothing of the log periodogram. J Am Stat Assoc 75:122–132 [206] Wahba G (1990) Spline models for observational data, vol 59. Society for Industrial Mathematics, Philadelphia [207] Watson GS (1966) Smooth regression analysis. Sankhya 26:359–378 [208] Weiss AA (1984) ARMA models with ARCH errors. J Time Ser Anal 5:129– 143 [209] Whitle P (1951) Hypothesis testing in time series analysis. Almqvist & Wik- sells, Uppsala [210] Whittle P (1961) Gaussian estimation in stationary time series. Bull Int Stat Inst 33:1–26 [211] Wiener N (1949) The extrapolation, interpolation and smoothing of stationary time series with engineering applications. Wiley, New York [212] Wu CF (1983) On the convergence properties of the EM algorithm. Ann Stat 11:95–103 [213] YoungPC, Pedregal DJ (1998) Macro-economic relativity: Government spend- ing, private investment and unemployment in the USA. Centre for Research on Environmental Systems and Statistics, Lancaster University, U.K. [214] Zucchini W, MacDonald IL (2009) Hidden Markov models for time series: An introduction using R. CRC Press, Boca Raton Index
