Nonlinear system theory: Another look at dependence

Wei Biao Wu†

Department of Statistics, University of Chicago, 5734 South University Avenue, Chicago, IL 60637

Communicated by Murray Rosenblatt, University of California at San Diego, La Jolla, CA, August 4, 2005 (received for review April 29, 2005)

Based on the nonlinear system theory, we introduce previously undescribed dependence measures for stationary causal processes. Our physical and predictive dependence measures quantify the degree of dependence of outputs on inputs in physical systems. The proposed dependence measures provide a natural framework for a limit theory for stationary processes. In particular, under conditions with quite simple forms, we present limit theorems for partial sums, empirical processes, and kernel density estimates. The conditions are mild and easily verifiable because they are directly related to the data-generating mechanisms.

nonlinear system theory | limit theory | kernel estimation | weak convergence

Abbreviation: iid, independent and identically distributed.

†E-mail: [email protected].

© 2005 by The National Academy of Sciences of the USA

Dependence Measures

Let $\varepsilon_i$, $i \in \mathbb{Z}$, be independent and identically distributed (iid) random variables and let $g$ be a measurable function such that

$$X_i = g(\ldots, \varepsilon_{i-1}, \varepsilon_i) \qquad [1]$$

is a properly defined random variable. Then $(X_i)$ is a stationary process, and it is causal or nonanticipative in the sense that $X_i$ does not depend on the future innovations $\varepsilon_j$, $j > i$. The causality assumption is quite reasonable in the study of time series. Wiener (1) considered the fundamental coding and decoding problem of representing stationary and ergodic processes in terms of the form Eq. 1. In particular, Wiener studied the construction of $\varepsilon_i$ based on $X_k$, $k \le i$. The class of processes that Eq. 1 represents is huge, and it includes linear processes, Volterra processes, and many time series models. In certain situations, Eq. 1 is also called the nonlinear Wold representation. See refs. 2-4 for other deep contributions on representing stationary and ergodic processes by Eq. 1. To conduct statistical inference for such processes, it is necessary to consider the asymptotic properties of the partial sum $S_n = \sum_{i=1}^n X_i$ and the empirical distribution function $F_n(x) = n^{-1} \sum_{i=1}^n \mathbf{1}_{X_i \le x}$.

In probability theory, many limit theorems have been established for independent random variables. Those limit theorems play an important role in the related statistical inference. In the study of stochastic processes, however, independence usually does not hold, and the dependence is an intrinsic feature. In an influential paper, Rosenblatt (5) introduced the strong mixing condition. For a stationary process $(X_i)$, let the sigma algebra $\mathcal{A}_m^n = \sigma(X_m, \ldots, X_n)$, $m \le n$, and define the strong mixing coefficients

$$\alpha_n = \sup\{|\mathbb{P}(A \cap B) - \mathbb{P}(A)\mathbb{P}(B)| : A \in \mathcal{A}_{-\infty}^0, B \in \mathcal{A}_n^\infty\}. \qquad [2]$$

If $\alpha_n \to 0$, then we say that $(X_i)$ is strong mixing. Variants of the strong mixing condition include $\rho$-, $\psi$-, and $\beta$-mixing conditions, among others (6). A central limit theorem (CLT) based on the strong mixing condition is proved in ref. 5. Since then, as basic assumptions on the dependence structures, the strong mixing condition and its variants have been widely used and various limit theorems have been obtained; see the extensive treatment in ref. 6.

Since the quantity $|\mathbb{P}(A \cap B) - \mathbb{P}(A)\mathbb{P}(B)|$ in Eq. 2 measures the dependence between events $A$ and $B$ and it is zero if $A$ and $B$ are independent, it is sensible to call $\alpha_n$ and its variants "probabilistic dependence measures." For stationary causal processes, the calculation of probabilistic dependence measures is generally not easy because it involves the complicated manipulation of taking the supremum over two sigma algebras (7-9). Additionally, many well-known processes are not strong mixing. A prominent example is the Bernoulli shift process. Consider the simple AR(1) process $X_n = (X_{n-1} + \varepsilon_n)/2$, where the $\varepsilon_i$ are iid Bernoulli random variables with success probability $1/2$ (see refs. 10 and 11). Then $X_n$ is a causal process with the representation $X_n = \sum_{i=0}^\infty 2^{-i-1} \varepsilon_{n-i}$, and the innovations $\varepsilon_n, \varepsilon_{n-1}, \ldots$ correspond to the dyadic expansion of $X_n$. The process $X_n$ is not strong mixing since $\alpha_n \equiv 1/4$ for all $n$ (12). Some alternative ways have been proposed to overcome the disadvantages of strong mixing conditions (8, 9).

In this work, we shall provide another look at the fundamental issue of dependence. Our primary goal is to introduce "physical or functional" and "predictive" dependence measures, a previously undescribed type of dependence measures that are quite different from strong mixing conditions. In particular, following refs. 1 and 13, we shall interpret Eq. 1 as an input/output system and then introduce dependence coefficients by measuring the degree of dependence of outputs on inputs. Specifically, we view Eq. 1 as a physical system

$$x_i = g(\ldots, e_{i-1}, e_i), \qquad [3]$$

where $e_i, e_{i-1}, \ldots$ are inputs, $g$ is a filter or a transform, and $x_i$ is the output. Then, the process $X_i$ is the output of the physical system 3 with random inputs. It is clearly not a good way to assess the dependence just by taking the partial derivatives $\partial g / \partial e_j$, which may not exist if $g$ is not well-behaved.
Nonetheless, because the inputs are random and iid, the dependence of the output on the inputs can be simply measured by applying the idea of coupling. Let $(\varepsilon_i')$ be an iid copy of $(\varepsilon_i)$; let the shift process $\xi_i = (\ldots, \varepsilon_{i-1}, \varepsilon_i)$ and $\xi_i' = (\ldots, \varepsilon_{i-1}', \varepsilon_i')$. For a set $I \subset \mathbb{Z}$, let $\varepsilon_{j,I} = \varepsilon_j'$ if $j \in I$ and $\varepsilon_{j,I} = \varepsilon_j$ if $j \notin I$; let $\xi_{i,I} = (\ldots, \varepsilon_{i-1,I}, \varepsilon_{i,I})$ and $\xi_i^* = \xi_{i,\{0\}}$. Then $\xi_{i,I}$ is a coupled version of $\xi_i$ with $\varepsilon_j$ replaced by $\varepsilon_j'$ if $j \in I$. For $p > 0$ write $X \in L^p$ if $\|X\|_p := [\mathbb{E}(|X|^p)]^{1/p} < \infty$, and write $\|X\| = \|X\|_2$.

Definition 1 (Functional or physical dependence measure): For $p > 0$ and $I \subset \mathbb{Z}$ let $\delta_p(I, n) = \|g(\xi_n) - g(\xi_{n,I})\|_p$ and $\delta_p(n) = \|g(\xi_n) - g(\xi_n^*)\|_p$. Write $\delta(n) = \delta_2(n)$.

Definition 2 (Predictive dependence measure): Let $p \ge 1$ and let $g_n : \mathbb{R} \times \mathbb{R} \times \cdots \to \mathbb{R}$ be a Borel function such that $g_n(\xi_0) = \mathbb{E}(X_n \mid \xi_0)$, $n \ge 0$. Let $\omega_p(I, n) = \|g_n(\xi_0) - g_n(\xi_{0,I})\|_p$ and $\omega_p(n) = \|g_n(\xi_0) - g_n(\xi_0^*)\|_p$. Write $\omega(n) = \omega_2(n)$.

Definition 3 ($p$-stability): Let $p \ge 1$. The process $(X_n)$ is said to be $p$-stable if $\Omega_p := \sum_{n=0}^\infty \omega_p(n) < \infty$, and $p$-strong stable if $\Delta_p := \sum_{n=0}^\infty \delta_p(n) < \infty$. If $\Omega = \Omega_2 < \infty$, we say that $(X_n)$ is stable.

By the causal representation in Eq. 1, if $\min\{i : i \in I\} > n$, then $\delta_p(I, n) = 0$. Apparently, $\delta_p(I, n)$ quantifies the dependence of $X_n = g(\xi_n)$ on $\{\varepsilon_i, i \in I\}$ by measuring the distance between $g(\xi_n)$ and its coupled version $g(\xi_{n,I})$. In Definition 2, $\mathbb{E}(X_n \mid \xi_0)$ is the $n$-step-ahead predicted mean, and $\omega_p(n)$ measures the contribution of $\varepsilon_0$ in predicting future expected values.
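To make the coupling construction concrete, here is a minimal Monte Carlo sketch (not part of the original article) that estimates the physical dependence measure $\delta_2(n)$ of the Bernoulli shift process from the Introduction, $X_n = \sum_{i=0}^\infty 2^{-i-1}\varepsilon_{n-i}$: replace $\varepsilon_0$ by an iid copy and measure how far the output moves. For this process the exact value $\delta_2(n) = 2^{-n-1}\|\varepsilon_0 - \varepsilon_0'\| = 2^{-n-1}/\sqrt{2}$ is available for comparison; the truncation and replication constants below are arbitrary tuning choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def bernoulli_shift(eps):
    # X_n = sum_{i>=0} 2^{-(i+1)} eps_{n-i}; eps holds the inputs with the
    # newest one last, so reverse before applying the dyadic weights.
    w = 0.5 ** np.arange(1, len(eps) + 1)
    return np.dot(w, eps[::-1])

def delta2(n, burn=60, reps=20000):
    """Monte Carlo estimate of delta_2(n) = ||g(xi_n) - g(xi_n*)||."""
    d = np.empty(reps)
    for r in range(reps):
        eps = rng.integers(0, 2, size=burn + n + 1).astype(float)
        eps_star = eps.copy()
        eps_star[burn] = rng.integers(0, 2)  # couple: replace eps_0 only
        d[r] = bernoulli_shift(eps) - bernoulli_shift(eps_star)
    return np.sqrt(np.mean(d ** 2))

for n in range(5):
    # Estimate vs. the exact value 2^{-(n+1)} / sqrt(2).
    print(n, delta2(n), 2.0 ** (-(n + 1)) / np.sqrt(2.0))
```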

In the classical prediction theory (14), the conditional expectation of the form $\mathbb{E}(X_n \mid X_0, X_{-1}, \ldots)$ is studied. The one $\mathbb{E}(X_n \mid \xi_0)$ used in Definition 2 has a different form. It turns out that, in studying asymptotic properties and moment inequalities of $S_n$, it is convenient to use $\mathbb{E}(X_n \mid \xi_0)$ and the predictive dependence measure (cf. Theorems 2 and 3), while the other version $\mathbb{E}(X_n \mid X_0, X_{-1}, \ldots)$ is generally difficult to work with. In the special case in which the $X_n$ are martingale differences with respect to the filtration $\sigma(\xi_n)$, $g_n = 0$ almost surely and consequently $\omega(n) = 0$, $n \ge 1$.

Roughly speaking, since $g_n(\xi_0) = \mathbb{E}(X_n \mid \xi_0)$, the $p$-stability in Definition 3 indicates that the cumulative contribution of $\varepsilon_0$ in predicting future expected values $\{\mathbb{E}(X_n \mid \xi_0)\}_{n \ge 0}$ is finite. Interestingly, the stability condition $\Omega < \infty$ implies invariance principles with $\sqrt{n}$-norming in a natural way (Theorem 3). By (i) of Theorem 1, $p$-strong stability implies $p$-stability since $\delta_p(n) \ge \omega_p(n)$.

Our dependence measures provide a very convenient and simple way toward a large-sample theory for stationary causal processes (see Theorems 2-5 below). In many applications, functional and predictive dependence measures are easy to use because they are directly related to data-generating mechanisms and because the construction of the coupled process $g(\xi_{n,I})$ is simple and explicit. Additionally, limit theorems with those dependence measures have easily verifiable conditions and are often optimal or nearly optimal. On the other hand, however, our dependence measures rely on the representation 1, whereas the strong mixing coefficients can be defined in more general situations (6).

Theorem 1. (i) Let $p \ge 1$ and $n \ge 0$. Then $\delta_p(n) \ge \omega_p(n)$. (ii) Let $p \ge 1$ and define the projection operator $\mathcal{P}_k Z = \mathbb{E}(Z \mid \xi_k) - \mathbb{E}(Z \mid \xi_{k-1})$, $Z \in L^p$. Then for $n \ge 0$,

$$\|\mathcal{P}_0 X_n\|_p \le \omega_p(n) \le 2\|\mathcal{P}_0 X_n\|_p. \qquad [4]$$

(iii) Let $p > 1$, $C_p = 18 p^{3/2} (p-1)^{-1/2}$ if $1 < p < 2$ and $C_p = \sqrt{2p}$ if $p \ge 2$; let $I \subset \mathbb{Z}$. Then

$$\delta_p^{p'}(I, n) \le 2^{p'} C_p^{p'} \sum_{i \in I} \delta_p^{p'}(n - i), \quad \text{where } p' = \min(p, 2). \qquad [5]$$

Proof: (i) Since $\xi_n^* = (\xi_{-1}, \varepsilon_0', \varepsilon_1, \ldots, \varepsilon_n)$, then

$$\mathbb{E}[g(\xi_n) - g(\xi_n^*) \mid \xi_{-1}, \varepsilon_0', \varepsilon_0] = \mathbb{E}[g(\xi_n) \mid \xi_{-1}, \varepsilon_0] - \mathbb{E}[g(\xi_n^*) \mid \xi_{-1}, \varepsilon_0'] = g_n(\xi_0) - g_n(\xi_0^*),$$

which by Jensen's inequality implies $\delta_p(n) \ge \omega_p(n)$. (ii) Since $\mathbb{E}[g(\xi_n) \mid \xi_{-1}] = \mathbb{E}[g_n(\xi_0) \mid \xi_{-1}]$ and $\varepsilon_0'$ and $(\varepsilon_i)$ are independent, we have $\mathbb{E}[g_n(\xi_0) \mid \xi_{-1}] = \mathbb{E}[g_n(\xi_0^*) \mid \xi_0]$, and inequality 4 follows from

$$\|\mathcal{P}_0 X_n\|_p = \|\mathbb{E}[g_n(\xi_0) - g_n(\xi_0^*) \mid \xi_0]\|_p \le \|g_n(\xi_0) - g_n(\xi_0^*)\|_p \le \|g_n(\xi_0) - \mathbb{E}[g_n(\xi_0) \mid \xi_{-1}]\|_p + \|\mathbb{E}[g_n(\xi_0) \mid \xi_{-1}] - g_n(\xi_0^*)\|_p = 2\|\mathcal{P}_0 X_n\|_p.$$

(iii) For presentational clarity, let $I = \{\ldots, -1, 0\}$. For $i \le 0$ let

$$D_i = D_{i,n} = \mathbb{E}(X_n \mid \varepsilon_{i+1}, \varepsilon_{i+2}, \ldots, \varepsilon_n) - \mathbb{E}(X_n \mid \varepsilon_i, \ldots, \varepsilon_n) = \mathbb{E}[g(\xi_{n,\{i\}}) - g(\xi_n) \mid \varepsilon_i, \ldots, \varepsilon_n].$$

Then $D_0, D_{-1}, \ldots$ are martingale differences with respect to the sigma algebras $\sigma(\varepsilon_i, \ldots, \varepsilon_n)$, $i = 0, -1, \ldots$. By Jensen's inequality, $\|D_i\|_p \le \delta_p(n - i)$. Let $V = \sum_{i=-\infty}^0 D_i^2$, $M = \sum_{i=-\infty}^0 D_i$ and $\tilde{X}_n = \mathbb{E}(X_n \mid \varepsilon_1, \ldots, \varepsilon_n)$. Then $X_n - \tilde{X}_n = -M$ and

$$\delta_p(I, n) \le \|X_n - \tilde{X}_n\|_p + \|\tilde{X}_n - g(\xi_{n,I})\|_p = 2\|M\|_p.$$

To show Eq. 5, we shall deal with the two cases $1 < p < 2$ and $p \ge 2$ separately. If $1 < p < 2$, then $V^{p/2} \le \sum_{i=-\infty}^0 |D_i|^p$. By Burkholder's inequality (15),

$$\|M\|_p^p \le C_p^p \|V^{1/2}\|_p^p \le C_p^p \sum_{i=-\infty}^0 \delta_p^p(n - i).$$

If $p \ge 2$, by proposition 4 in ref. 16, $\|M\|_p^2 \le 2p \sum_{i=-\infty}^0 \|D_i\|_p^2$. So Eq. 5 follows.

Inequality 5 suggests the interesting reduction property: the degree of dependence of $X_n$ on $\{\varepsilon_i, i \in I\}$ can be bounded in an element-wise manner, and it suffices to consider the dependence of $X_n$ on individual $\varepsilon_i$. Indeed, our limit theorems and moment inequalities in Theorems 2-5 involve conditions only on $\delta_p(n)$ and $\omega_p(n)$.

Linear Processes. Let $\varepsilon_i$ be iid random variables with $\varepsilon_i \in L^p$, $p \ge 1$; let $(a_i)$ be real coefficients such that

$$X_t = \sum_{i=0}^\infty a_i \varepsilon_{t-i} \qquad [6]$$

is a proper random variable. The existence of $X_t$ can be checked by Kolmogorov's three-series theorem. The linear process $(X_t)$ can be viewed as the output from a linear filter, and the input $(\ldots, \varepsilon_{t-1}, \varepsilon_t)$ is a series of shocks that drive the system (ref. 17, pp. 8-9). Clearly, $\omega_p(n) = \delta_p(n) = |a_n| c_0$, where $c_0 = \|\varepsilon_0 - \varepsilon_0'\|_p < \infty$. Let $p = 2$. If

$$\sum_{i=0}^\infty |a_i| < \infty, \qquad [7]$$

then the filter is said to be stable (17), and the preceding inequality implies short-range dependence since the covariances are absolutely summable. Definition 3 extends the notion of stability to nonlinear processes.
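As an illustration (not from the original article), the identity $\omega_p(n) = \delta_p(n) = |a_n| c_0$ can be checked numerically. The sketch below assumes the hypothetical filter $a_i = 0.8^i$ and standard normal innovations, for which $c_0 = \|\varepsilon_0 - \varepsilon_0'\| = \sqrt{2}$; the filter truncation and Monte Carlo size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Truncated causal linear process X_t = sum_i a_i eps_{t-i} (Eq. 6) with the
# illustrative coefficients a_i = 0.8^i, which are summable, so Eq. 7 holds.
a = 0.8 ** np.arange(80)

def delta2_mc(n, reps=20000):
    """Monte Carlo estimate of delta_2(n) by coupling in eps_0."""
    # Column k holds eps_{n-k}: column 0 is the newest input eps_n and
    # column n is eps_0, the coordinate replaced by an iid copy.
    eps = rng.standard_normal((reps, len(a)))
    eps_star = eps.copy()
    eps_star[:, n] = rng.standard_normal(reps)
    diff = (eps - eps_star) @ a        # X_n - X_n* = a_n (eps_0 - eps_0')
    return np.sqrt(np.mean(diff ** 2))

c0 = np.sqrt(2.0)                      # ||eps_0 - eps_0'|| for N(0,1) inputs
for n in [0, 1, 5, 10]:
    print(n, delta2_mc(n), abs(a[n]) * c0)  # the two columns should agree
```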

Volterra Series. Analysis of nonlinear systems is a notoriously difficult problem, and the available tools are very limited (18). Oftentimes it would be unsatisfactory to linearize or approximate nonlinear systems by linear ones. The Volterra representation provides a reasonably simple and general way. The idea is to represent Eq. 3 as a power series of inputs. In particular, suppose that $g$ in Eq. 3 is sufficiently well-behaved so that it has the stationary and causal representation

$$g(\ldots, e_{n-1}, e_n) = \sum_{k=1}^\infty \sum_{u_1, \ldots, u_k = 0}^\infty g_k(u_1, \ldots, u_k)\, e_{n-u_1} \cdots e_{n-u_k}, \qquad [8]$$

where the functions $g_k$ are called the Volterra kernels. The right-hand side of Eq. 8 is generically called the Volterra expansion, and it plays an important role in the nonlinear system theory (13, 18-22). There is a continuous-time version of Eq. 8 with summations replaced by integrals. Because the series involved has infinitely many terms, to guarantee the meaningfulness of the representation there is a convergence issue that is often difficult to deal with, and the imposed conditions can be quite restrictive (18). Fortunately, in our setting, the difficulty can be circumvented because we are dealing with iid random inputs. Indeed, assume that the $e_t$ are iid with mean 0 and variance 1, that $g_k(u_1, \ldots, u_k)$ is symmetric in $u_1, \ldots, u_k$ and equals zero if $u_i = u_j$ for some $1 \le i < j \le k$, and that

$$\sum_{k=1}^\infty \sum_{u_1, \ldots, u_k = 0}^\infty g_k^2(u_1, \ldots, u_k) < \infty.$$

Then $X_n$ exists and is in $L^2$. Simple calculations show that

$$\omega^2(n) = \sum_{k=1}^\infty \sum_{\min(u_1, \ldots, u_k) = n} g_k^2(u_1, \ldots, u_k) = \sum_{k=1}^\infty k \sum_{u_2, \ldots, u_k = n+1}^\infty g_k^2(n, u_2, \ldots, u_k),$$

and

$$\delta^2(n) = \sum_{k=1}^\infty k \sum_{u_2, \ldots, u_k = 0}^\infty g_k^2(n, u_2, \ldots, u_k).$$

The Volterra process is stable if $\sum_{i=1}^\infty \omega(i) < \infty$.

Nonlinear Transforms of Linear Processes. Let $(X_t)$ be the linear process defined in Eq. 6 and consider the transformed process $Y_t = K(X_t)$, where $K$ is a possibly nonlinear filter. Let $\omega(n, Y)$ be the predictive dependence measure of $(Y_t)$. Assume that the $\varepsilon_i$ have mean 0 and finite variance. Under mild conditions on $K$, we have $\|\mathcal{P}_0 Y_n\| = O(|a_n|)$ (cf. theorem 2 in ref. 23). By Theorem 1, $\omega(n, Y) = O(|a_n|)$. In this case, if $(X_t)$ is stable, namely Eq. 7 holds, then $(Y_t)$ is also stable.

Quite interesting phenomena happen if $(X_n)$ is unstable. Under appropriate conditions on $K$, $(Y_n)$ could possibly be stable. With a nonlinear transform, the dependence structure of $(Y_t)$ can be quite different from that of $(X_n)$ (24-27). The asymptotic problem of $S_n(K) = \sum_{t=1}^n K(X_t)$ has a long history (see refs. 23 and 27 and references therein). Let $K_\infty(w) = \mathbb{E}[K(w + X_t)]$ and assume $K_\infty \in C^\tau(\mathbb{R})$ for some $\tau \in \mathbb{N}$. Consider the remainder of the $\tau$-th order Volterra expansion of $Y_n$,

$$L^{(\tau)}(\xi_n) = Y_n - \sum_{r=0}^\tau \kappa_r U_{n,r}, \qquad [9]$$

where $\kappa_r = K_\infty^{(r)}(0)$, $r = 0, \ldots, \tau$, and

$$U_{n,r} = \sum_{0 \le j_1 < \ldots < j_r < \infty} \prod_{s=1}^r a_{j_s} \varepsilon_{n - j_s}.$$

Let $\theta_n = |a_{n-1}|[|a_{n-1}| + A_n^{1/2}(4) + A_n^{\tau/2}(2)]$, where $A_n(j) = \sum_{t=n}^\infty |a_t|^j$. Under mild regularity conditions on $K$ and $\varepsilon_n$, by theorem 5 in ref. 23, $\|\mathcal{P}_0 L^{(\tau)}(\xi_n)\| = O(\theta_{n+1})$. By Theorem 1, the predictive dependence measure $\omega^{(\tau)}(n)$ of the remainder $L^{(\tau)}(\xi_n)$ satisfies

$$\omega^{(\tau)}(n) = O(\theta_{n+1}). \qquad [10]$$

It is possible that $\sum_{n=1}^\infty \theta_n < \infty$ while $\sum_{n=1}^\infty |a_n| = \infty$. Consider the special case $a_n = n^{-\beta} l(n)$, where $1/2 < \beta < 1$ and $l$ is a slowly varying function, namely, for any $c > 0$, $l(cn)/l(n) \to 1$ as $n \to \infty$. By Karamata's theorem (28), for $j \ge 2$, $A_n(j) = O[n^{1 - \beta j} l^j(n)]$. If $\tau > (2\beta - 1)^{-1} - 1$, then $\theta_n = O[n^{-\beta + \tau(1/2 - \beta)} l^{\tau+1}(n)]$ is summable. Therefore, if the function $K$ satisfies $\kappa_r = 0$ for $r = 0, \ldots, \tau$ and $(\tau + 1)(2\beta - 1) > 1$, then $Y_t = K(X_t)$ is stable even though $X_t$ is not. Appell polynomials (29) satisfy such conditions. For example, let $K(x) = x^2 - \mathbb{E}(X_n^2)$; then $K_\infty(w) = w^2$ and $\kappa_1 = 0$, $\kappa_2 = 2$. If $\beta \in (3/4, 1)$, then the process $X_t^2 - \mathbb{E}(X_t^2)$ is stable. If $1/2 < \beta < 3/4$, then $S_n(K)/\|S_n(K)\|$ converges to the Rosenblatt distribution.

Uniform Volterra expansions for $F_n(x)$ over $x \in \mathbb{R}$ are established in refs. 30 and 31. Wu (32) considered nonlinear transforms of linear processes with infinite variance innovations.

Nonlinear Time Series. Let $\varepsilon_t$ be iid random variables and consider the recursion

$$X_t = R(X_{t-1}, \varepsilon_t), \qquad [11]$$

where $R$ is a measurable function. The framework 11 is quite general, and it includes many popular nonlinear time series models, such as threshold autoregressive models (33), exponential autoregressive models (34), bilinear autoregressive models, and autoregressive models with conditional heteroscedasticity (35), among others. If there exist $\alpha > 0$ and $x_0$ such that

$$\mathbb{E}(\log L_\varepsilon) < 0 \quad \text{and} \quad L_\varepsilon + |R(x_0, \varepsilon_0)| \in L^\alpha, \qquad [12]$$

where

$$L_\varepsilon = \sup_{x \ne x'} \frac{|R(x, \varepsilon) - R(x', \varepsilon)|}{|x - x'|},$$

then Eq. 11 admits a unique stationary distribution (36), and iterations of Eq. 11 give rise to Eq. 1. By theorem 2 in ref. 37, Eq. 12 implies that there exist $p > 0$ and $r \in (0, 1)$ such that

$$\|X_n - g(\xi_{n,I})\|_p = O(r^n), \qquad [13]$$

where $I = \{\ldots, -1, 0\}$. Recall $\xi_n^* = \xi_{n,\{0\}}$. By stationarity, $\|g(\xi_n^*) - g(\xi_{n,I})\|_p = \|g(\xi_{n+1}) - g(\xi_{n+1,I})\|_p$. So Eq. 13 implies $\delta_p(n) = \|g(\xi_n^*) - X_n\|_p = O(r^n)$. On the other hand, by Theorem 1(iii), if $\delta_p(n) = O(r^n)$ holds for some $p > 1$ and some $r \in (0, 1)$, then Eq. 13 also holds. So they are equivalent if $p > 1$. In refs. 37 and 38, the property 13 is called geometric-moment contraction, and it is very useful in studying asymptotic properties of nonlinear time series.
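As a concrete check of condition 12 and the geometric-moment contraction property 13 (an illustrative sketch, not from the original article), consider the hypothetical threshold AR(1) model $R(x, \varepsilon) = 0.6\max(x, 0) - 0.4\min(x, 0) + \varepsilon$, for which $L_\varepsilon \equiv 0.6$, so $\mathbb{E}(\log L_\varepsilon) < 0$. Running two copies of the recursion 11 from different initial values with the same innovations, the $L^2$ distance between them should decay geometrically:

```python
import numpy as np

rng = np.random.default_rng(2)

def R(x, e):
    # Threshold AR(1): R(x, e) = 0.6*max(x, 0) - 0.4*min(x, 0) + e, so the
    # Lipschitz coefficient L_eps is 0.6 and E(log L_eps) = log 0.6 < 0.
    return 0.6 * np.maximum(x, 0.0) - 0.4 * np.minimum(x, 0.0) + e

reps, nmax = 20000, 30
x = np.full(reps, 5.0)             # one chain started at x_0 = 5
y = rng.standard_normal(reps)      # a second chain with random starts
gap = np.empty(nmax)
for n in range(nmax):
    e = rng.standard_normal(reps)  # same innovations: a coupling
    x, y = R(x, e), R(y, e)
    gap[n] = np.sqrt(np.mean((x - y) ** 2))

# ||X_n - Y_n|| should decay like r^n with r <= 0.6 (cf. Eq. 13); the
# successive ratios printed below should stay at or below about 0.6.
for n in [5, 10, 20, 29]:
    print(n + 1, gap[n], gap[n] / gap[n - 1])
```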

Inequalities and Limit Theorems

For $(X_i)$ defined in Eq. 1, let $S_u = S_n + (u - n)X_{n+1}$, $n \le u \le n + 1$, $n = 0, 1, \ldots$, be the partial sum process. Let $R_n(s) = \sqrt{n}[F_n(s) - F(s)]$, where $F(s) = \mathbb{P}(X_0 \le s)$ is the distribution function of $X_0$. Primary goals in the limit theory of stationary processes include obtaining asymptotic properties of $\{S_u, 0 \le u \le n\}$ and $\{R_n(s), s \in \mathbb{R}\}$. Such results are needed in the related statistical inference. The physical and predictive dependence measures provide a natural vehicle for an asymptotic theory for $S_n$ and $R_n$.

Partial Sums. Let $S_n^* = \max_{i \le n} |S_i|$, $Z_n = S_n^*/\sqrt{n}$ and $B_p = p\sqrt{2p}/(p - 1)$, $p > 1$. Recall $\Omega_p = \sum_{k=0}^\infty \omega_p(k)$ and let

$$\Theta_p = \sum_{k=0}^\infty \|\mathcal{P}_0 X_k\|_p.$$

By Theorem 1, $\Theta_p \le \Omega_p \le 2\Theta_p$. Moment inequalities and limit theorems for $S_n$ are given in Theorems 2 and 3, respectively. Denote by $\mathrm{IB}$ the standard Brownian motion. An interesting feature of the large deviation result in Theorem 2(ii) is that $\Omega_p$ and $X_k$ do not need to be bounded.

Theorem 2. Let $p \ge 2$. (i) We have $\|Z_n\|_p \le B_p \Theta_p \le B_p \Omega_p$. (ii) Let $0 < \alpha \le 2$ and assume

$$\gamma := \limsup_{p \to \infty} p^{1/2 - 1/\alpha}\, \Omega_p < \infty. \qquad [14]$$

Then $m(t) := \sup_{n \in \mathbb{N}} \mathbb{E}[\exp(t Z_n^\alpha)] < \infty$ for $0 \le t < t_0$, where $t_0 = (e \alpha \gamma^\alpha)^{-1} 2^{-\alpha/2}$. Consequently, for $u > 0$, $\mathbb{P}(Z_n > u) \le \exp(-t u^\alpha) m(t)$.

Proof: (i) It follows from W.B.W. (unpublished results) and theorem 2.5 in ref. 39. For completeness we present the proof here. Let $M_{k,j} = \sum_{i=1}^j \mathcal{P}_{i-k} X_i$, $k, j \ge 0$, and $M_{k,n}^* = \max_{j \le n} |M_{k,j}|$. Then $S_n = \sum_{k=0}^\infty M_{k,n}$. By Doob's maximal inequality and theorem 2.5 in ref. 39 (or proposition 4 in ref. 16),

$$\|M_{k,n}^*\|_p \le p(p - 1)^{-1} \|M_{k,n}\|_p \le B_p \sqrt{n}\, \|M_{k,1}\|_p.$$

Since $S_n^* \le \sum_{k=0}^\infty M_{k,n}^*$, (i) follows. (ii) Let $Z = Z_n$ and $p_0 = [2/\alpha] + 1$. By Stirling's formula and Eq. 14,

$$\limsup_{p \to \infty} \frac{t B_{\alpha p}^\alpha \Omega_{\alpha p}^\alpha}{(p!)^{1/p}} = \limsup_{p \to \infty} \frac{t B_{\alpha p}^\alpha \Omega_{\alpha p}^\alpha}{(2\pi p)^{1/(2p)}\, p/e} = t e \alpha \gamma^\alpha 2^{\alpha/2} < 1.$$

By (i), since $e^v = \sum_{p=0}^\infty v^p/p!$, (ii) follows from

$$\sum_{p = p_0}^\infty \frac{\mathbb{E}[(t Z^\alpha)^p]}{p!} \le \sum_{p = p_0}^\infty \frac{t^p (B_{\alpha p} \Omega_{\alpha p})^{\alpha p}}{p!} < \infty.$$

Example 1: For the linear process 6, assume that

$$\#\{i : |a_i| > \eta\} = O(\eta^{-1/2}) \quad \text{as } \eta \downarrow 0, \qquad [15]$$

and $A := \mathbb{E}(e^{|\varepsilon_0|}) < \infty$. We now apply (ii) of Theorem 2 to the sum $n[F_n(u) - F(u)] = \sum_{i=1}^n \tilde{g}(\xi_i)$, where $\tilde{g}(\xi_i) = \mathbf{1}_{X_i \le u} - F(u)$. To this end, we need to calculate the predictive dependence measure $\omega_p(n, \tilde{g})$ (say) of the process $\tilde{g}(\xi_n)$. Without loss of generality let $a_0 = 1$. Let $F_\varepsilon$ and $f_\varepsilon$ be the distribution and density functions of $\varepsilon_0$, and assume $c := \sup_u f_\varepsilon(u) < \infty$. Then Eq. 14 holds with $\alpha = 1$. To see this, let $Y_{n-1} = X_n - \varepsilon_n$, $Z_{n-1} = Y_{n-1} - a_n \varepsilon_0$ and $Y_{n-1}^* = Z_{n-1} + a_n \varepsilon_0'$. Let $n \ge 1$. Then $\mathbb{E}(\mathbf{1}_{X_n \le u} \mid \xi_0) = \mathbb{E}[F_\varepsilon(u - Y_{n-1}) \mid \xi_0]$ and $\mathbb{E}[F_\varepsilon(u - Z_{n-1}) \mid \xi_0^*] = \mathbb{E}[F_\varepsilon(u - Z_{n-1}) \mid \xi_0]$. By the triangle inequality,

$$Q_n := |\mathbb{E}[F_\varepsilon(u - Y_{n-1}) \mid \xi_0] - \mathbb{E}[F_\varepsilon(u - Y_{n-1}^*) \mid \xi_0^*]|$$
$$\le |\mathbb{E}[F_\varepsilon(u - Y_{n-1}) \mid \xi_0] - \mathbb{E}[F_\varepsilon(u - Z_{n-1}) \mid \xi_0]| + |\mathbb{E}[F_\varepsilon(u - Z_{n-1}) \mid \xi_0^*] - \mathbb{E}[F_\varepsilon(u - Y_{n-1}^*) \mid \xi_0^*]|$$
$$\le c\, \mathbb{E}[|Y_{n-1} - Z_{n-1}| \mid \xi_0] + c\, \mathbb{E}[|Z_{n-1} - Y_{n-1}^*| \mid \xi_0^*] = c |a_n| (|\varepsilon_0| + |\varepsilon_0'|).$$

Hence, $\omega_p(n, \tilde{g}) = \|Q_n\|_p \le 2 c |a_n| \|\varepsilon_0\|_p$. Since $A = \mathbb{E}(e^{|\varepsilon_0|})$, we have $\mathbb{E}(|\varepsilon_0|^p) \le p! A$ and $\|\varepsilon_0\|_p \le p A^{1/p}$. Clearly, $0 \le Q_n \le 1$. So $\omega_p(n, \tilde{g}) \le \min(1, C |a_n| p)$, where $C = 2cA$. For $\eta > 0$ let the set $J(\eta) = \{i \ge 0 : \eta/2 \le |a_i| < \eta\}$. By Eq. 15,

$$\Omega_p \le \sum_{i=0}^\infty \min(1, C|a_i|p) = \sum_{i : |a_i| \ge p^{-1}} \min(1, C|a_i|p) + \sum_{k=0}^\infty \sum_{i \in J((p 2^k)^{-1})} \min(1, C|a_i|p)$$
$$= O(\sqrt{p}) + \sum_{k=0}^\infty O[(p 2^{k+1})^{1/2} (p 2^k)^{-1} C p] = O(\sqrt{p}).$$

Condition 15 holds if $a_i = O(i^{-2})$.

Theorem 3. (i) Assume that $\Omega_2 < \infty$. Then

$$\{S_{nt}/\sqrt{n},\ 0 \le t \le 1\} \Rightarrow \{\sigma \mathrm{IB}(t),\ 0 \le t \le 1\}, \qquad [16]$$

where $\sigma = \|\sum_{i=0}^\infty \mathcal{P}_0 X_i\| \le \Omega_2$. (ii) Let $2 < p \le 4$ and assume that $\sum_{i=0}^\infty i\, \delta_p(i) < \infty$. Then on a possibly richer probability space, there exists a Brownian motion $\mathrm{IB}$ such that

$$\sup_{u \in [0, n]} |S_u - \sigma \mathrm{IB}(u)| = O[n^{1/p} l(n)] \quad \text{almost surely}, \qquad [17]$$

where $l(n) = (\log n)^{1/2 + 1/p} (\log\log n)^{2/p}$.

The proof of the strong invariance principle (ii) is given by W.B.W. (unpublished results). Theorem 3(i) follows from corollary 3 in ref. 40, and the expression $\sigma = \|\sum_{i=0}^\infty \mathcal{P}_0 X_i\|$ is a consequence of the martingale approximation: let $D_k = \sum_{i=k}^\infty \mathcal{P}_k X_i$ and $M_n = D_1 + \ldots + D_n$; then $\|S_n - M_n\| = o(\sqrt{n})$ and $\|S_n\|/\sqrt{n} = \sigma + o(1)$ (see theorem 6 in ref. 41). Theorem 3(i) also can be proved by using the argument in ref. 42. The invariance principle in the latter paper has a slightly different form. We omit the details. See refs. 43 and 44 for some related works.

Empirical Distribution Functions. Let $H_i(u \mid \xi_0) = \mathbb{P}(X_i \le u \mid \xi_0)$, $u \in \mathbb{R}$, be the conditional distribution function of $X_i$ given $\xi_0$. By Definition 2, the predictive dependence measure for $\tilde{g}(\xi_i) = \mathbf{1}_{X_i \le u} - F(u)$, at a fixed $u$, is $\|H_i(u \mid \xi_0) - H_i(u \mid \xi_0^*)\|_p$. To study the asymptotic properties of $R_n$, it is certainly necessary to consider the whole range $u \in (-\infty, \infty)$. To this end, we introduce the integrated predictive dependence measure

$$\phi_p^{(j)}(i) = \left[\int_{\mathbb{R}} \|H_i^{(j)}(u \mid \xi_0) - H_i^{(j)}(u \mid \xi_0^*)\|_p^p\, du\right]^{1/p} \qquad [18]$$

and the uniform predictive dependence measure

$$\varphi_p^{(j)}(i) = \sup_u \|H_i^{(j)}(u \mid \xi_0) - H_i^{(j)}(u \mid \xi_0^*)\|_p, \qquad [19]$$

where $H_i^{(j)}(u \mid \xi_0) = \partial^j H_i(u \mid \xi_0)/\partial u^j$, $j = 0, 1, \ldots$; $i \ge 1$. Let $h_i(t \mid \xi_0) = H_i^{(1)}(t \mid \xi_0)$. Theorem 4 below concerns the weak convergence of $R_n$ based on $\phi_2^{(j)}(i)$. It follows from corollary 1 by W.B.W. (unpublished results).

Theorem 4. Assume that $X_1 \in L^\tau$ and $\sup_u h_1(u \mid \xi_0) \le c_0$ for some positive constants $\tau, c_0 < \infty$. Further assume that

$$\sum_{i=1}^\infty [\phi_2^{(0)}(i) + \phi_2^{(1)}(i) + \phi_2^{(2)}(i)] < \infty. \qquad [20]$$

Then $R_n \Rightarrow \{W(s), s \in \mathbb{R}\}$, where $W$ is a centered Gaussian process.
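The following Monte Carlo sketch (not from the original article) illustrates Theorem 4 at a single point only, which checks just the one-dimensional marginal of the limit rather than full weak convergence. It assumes the same hypothetical filter $a_i = 0.8^i$ with standard normal innovations, so the marginal distribution $F$ is normal with variance $\sum_i a_i^2$ and $R_n(s) = \sqrt{n}[F_n(s) - F(s)]$ can be simulated exactly; sample sizes and replication counts are arbitrary.

```python
import math
import numpy as np

rng = np.random.default_rng(3)

a = 0.8 ** np.arange(80)                   # same illustrative filter as above
sigma0 = math.sqrt(float(np.sum(a ** 2)))  # X_0 ~ N(0, sigma0^2)

def F(s):                                  # true marginal distribution function
    return 0.5 * (1.0 + math.erf(s / (sigma0 * math.sqrt(2.0))))

def sample_path(n):
    # Generate X_1, ..., X_n by causal convolution of the innovations.
    eps = rng.standard_normal(n + len(a) - 1)
    return np.convolve(eps, a, mode="valid")

n, s, reps = 500, 0.0, 2000
r = np.empty(reps)
for k in range(reps):
    x = sample_path(n)
    r[k] = math.sqrt(n) * (np.mean(x <= s) - F(s))

# Theorem 4 gives a centered Gaussian limit; the sample mean should be near
# 0 and the histogram of r approximately normal.
print("mean:", r.mean(), "std:", r.std())
```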

Kernel Density Estimation. An important problem in the nonparametric inference of stochastic processes is to estimate the marginal density function $f$ (say) given the data $X_1, \ldots, X_n$. A popular method is the kernel density estimation (45, 46). Let $K$ be a bounded kernel function for which $\int_{\mathbb{R}} K(u)\, du = 1$, and let $b_n > 0$ be a sequence of bandwidths satisfying

$$b_n \to 0 \quad \text{and} \quad n b_n \to \infty. \qquad [21]$$

Let $K_b(x) = K(x/b)$. Then $f$ can be estimated by

$$f_n(x) = \frac{1}{n b_n} \sum_{i=1}^n K_{b_n}(x - X_i). \qquad [22]$$

If the $X_i$ are iid, Parzen (46) proved a central limit theorem for $f_n(x) - \mathbb{E}[f_n(x)]$ under the natural condition 21. There has been a substantial literature on generalizing Parzen's result to time series (47, 48). Wu and Mielniczuk (49) solved the open problem that, for short-range dependent linear processes, Parzen's central limit theorem holds under Eq. 21. See references therein for historical developments. Here, we shall generalize the result in ref. 49 to nonlinear processes. To this end, we shall adopt the uniform predictive dependence measure 19. The asymptotic normality of $f_n$ requires a summability condition on $\varphi^{(1)}(k) = \sup_t \|h_k(t \mid \xi_0) - h_k(t \mid \xi_0^*)\|$.

Theorem 5. Assume that $\sup_u h_1(u \mid \xi_0) \le c_0$ for some constant $c_0 < \infty$ and that $f = F'$ is continuous. Let $\kappa := \int_{\mathbb{R}} K^2(u)\, du < \infty$. Then under Eq. 21 and

$$\sum_{k=1}^\infty \varphi^{(1)}(k) < \infty, \qquad [23]$$

we have $\sqrt{n b_n}\{f_n(x) - \mathbb{E}[f_n(x)]\} \Rightarrow N[0, f(x)\kappa]$ for every $x \in \mathbb{R}$.

Proof: Let $m$ be a nonnegative integer. By the identity $\mathbb{E}[\mathbb{P}(X_{m+1} \le u \mid \xi_m) \mid \xi_0] = \mathbb{P}(X_{m+1} \le u \mid \xi_0)$ and the Lebesgue dominated convergence theorem, we have $\mathbb{E}[h_1(u \mid \xi_m) \mid \xi_0] = h_{m+1}(u \mid \xi_0)$, and $h_{m+1}$ is also bounded by $c_0$. By Theorem 1(ii), $\|\mathcal{P}_0 h_1(u \mid \xi_m)\| \le \varphi^{(1)}(m + 1)$. Let $A_n(u) = \sum_{i=1}^n h_1(u \mid \xi_{i-1}) - n f(u)$. By Theorem 2(i) and Eq. 23,

$$\sup_u \frac{\|A_n(u)\|}{B_2 \sqrt{n}} \le \sum_{m=0}^\infty \sup_u \|\mathcal{P}_0 h_1(u \mid \xi_m)\| < \infty.$$

Let $M_n = \sum_{i=1}^n \mathcal{P}_i K_{b_n}(x - X_i)$ and $N_n = \int_{\mathbb{R}} K(v) A_n(x - b_n v)\, dv$. Observe that

$$\mathbb{E}[K_{b_n}(x - X_i) \mid \xi_{i-1}] = b_n \int_{\mathbb{R}} K(v)\, h_1(x - b_n v \mid \xi_{i-1})\, dv.$$

Then $n b_n \{f_n(x) - \mathbb{E}[f_n(x)]\} = M_n + b_n N_n$. Following the argument of lemma 2 in ref. 49, $M_n/\sqrt{n b_n} \Rightarrow N[0, f(x)\kappa]$, which finishes the proof since $\mathbb{E}|N_n| = O(n^{1/2})$ and $b_n \to 0$.

I thank J. Mielniczuk, M. Pourahmadi, and X. Shao for useful comments. I am very grateful for the extremely helpful suggestions of two reviewers. This work was supported by National Science Foundation Grant DMS-0448704.
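To make Eq. 22 and the normalization in Theorem 5 concrete, here is a small Python sketch (not part of the original article) that computes $f_n(x)$ for the same hypothetical linear process with a Gaussian kernel, for which $\kappa = \int K^2 = 1/(2\sqrt{\pi})$, and compares it with the true marginal density. The bandwidth $b_n = n^{-1/5}$ is one arbitrary choice satisfying Eq. 21, and the two-standard-error band below ignores smoothing bias.

```python
import math
import numpy as np

rng = np.random.default_rng(4)

a = 0.8 ** np.arange(80)                   # same illustrative filter as above
sigma0 = math.sqrt(float(np.sum(a ** 2)))

def sample_path(n):
    eps = rng.standard_normal(n + len(a) - 1)
    return np.convolve(eps, a, mode="valid")

def K(u):                                  # Gaussian kernel, integral 1
    return np.exp(-0.5 * u ** 2) / math.sqrt(2.0 * math.pi)

kappa = 1.0 / (2.0 * math.sqrt(math.pi))   # kappa = integral of K^2

n, x = 5000, 0.0
b = n ** (-0.2)                            # b_n -> 0 and n*b_n -> infinity
data = sample_path(n)
f_n = np.mean(K((x - data) / b)) / b       # kernel estimate, Eq. 22
f_true = math.exp(-x ** 2 / (2 * sigma0 ** 2)) / (sigma0 * math.sqrt(2 * math.pi))
se = math.sqrt(f_true * kappa / (n * b))   # scale suggested by Theorem 5
print(f_n, "vs", f_true, "+/-", 2 * se)
```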

1. Wiener, N. (1958) Nonlinear Problems in Random Theory (MIT Press, Cambridge, MA).
2. Rosenblatt, M. (1959) J. Math. Mech. 8, 665-681.
3. Rosenblatt, M. (1971) Markov Processes. Structure and Asymptotic Behavior (Springer, New York).
4. Kallianpur, G. (1981) in Norbert Wiener, Collected Works with Commentaries, eds. Wiener, N. & Masani, P. (MIT Press, Cambridge, MA), pp. 402-424.
5. Rosenblatt, M. (1956) Proc. Natl. Acad. Sci. USA 42, 43-47.
6. Bradley, R. C. (2005) Introduction to Strong Mixing Conditions (Indiana Univ. Press, Bloomington, IN).
7. Blum, J. R. & Rosenblatt, M. (1956) Proc. Natl. Acad. Sci. USA 42, 412-413.
8. Doukhan, P. & Louhichi, S. (1999) Stochastic Process. Appl. 84, 313-342.
9. Dedecker, J. & Prieur, C. (2005) Probab. Theor. Rel. Fields 132, 203-236.
10. Rosenblatt, M. (1964) J. Res. Natl. Bureau Standards Sect. D 68D, 933-936.
11. Rosenblatt, M. (1980) J. Appl. Prob. 17, 265-270.
12. Andrews, D. W. K. (1984) J. Appl. Prob. 21, 930-934.
13. Priestley, M. B. (1988) Nonlinear and Nonstationary Time Series Analysis (Academic, London).
14. Pourahmadi, M. (2001) Foundations of Time Series Analysis and Prediction Theory (Wiley, New York).
15. Chow, Y. S. & Teicher, H. (1988) Probability Theory (Springer, New York).
16. Dedecker, J. & Doukhan, P. (2003) Stochastic Process. Appl. 106, 63-80.
17. Box, G. E. P., Jenkins, G. M. & Reinsel, G. C. (1994) Time Series Analysis: Forecasting and Control (Prentice-Hall, Englewood Cliffs, NJ).
18. Rugh, W. J. (1981) Nonlinear System Theory: The Volterra/Wiener Approach (Johns Hopkins Univ. Press, Baltimore).
19. Schetzen, M. (1980) The Volterra and Wiener Theories of Nonlinear Systems (Wiley, New York).
20. Casti, J. L. (1985) Nonlinear System Theory (Academic, Orlando, FL).
21. Bendat, J. S. (1990) Nonlinear System Analysis and Identification from Random Data (Wiley, New York).
22. Mathews, V. J. & Sicuranza, G. L. (2000) Polynomial Signal Processing (Wiley, New York).
23. Wu, W. B. (2006) Econometric Theory 22, in press.
24. Dittmann, I. & Granger, C. W. J. (2002) J. Econometrics 110, 113-133.
25. Sun, T. C. (1963) J. Math. Mech. 12, 945-978.
26. Taqqu, M. S. (1975) Z. Wahrscheinlichkeitstheorie Verw. Geb. 31, 287-302.
27. Ho, H. C. & Hsing, T. (1997) Ann. Prob. 25, 1636-1669.
28. Feller, W. (1971) An Introduction to Probability Theory and Its Applications (Wiley, New York), Vol. II.
29. Avram, F. & Taqqu, M. (1987) Ann. Prob. 15, 767-775.
30. Ho, H. C. & Hsing, T. (1996) Ann. Stat. 24, 992-1024.
31. Wu, W. B. (2003) Bernoulli 9, 809-831.
32. Wu, W. B. (2003) Statistica Sinica 13, 1259-1267.
33. Tong, H. (1990) Non-linear Time Series: A Dynamical System Approach (Oxford Univ. Press, Oxford).
34. Haggan, V. & Ozaki, T. (1981) Biometrika 68, 189-196.
35. Engle, R. F. (1982) Econometrica 50, 987-1007.
36. Diaconis, P. & Freedman, D. (1999) SIAM Rev. 41, 41-76.
37. Wu, W. B. & Shao, X. (2004) J. Appl. Prob. 41, 425-436.
38. Hsing, T. & Wu, W. B. (2004) Ann. Prob. 32, 1600-1631.
39. Rio, E. (2000) Théorie Asymptotique des Processus Aléatoires Faiblement Dépendants (Springer, Berlin).
40. Dedecker, J. & Merlevède, F. (2003) Stochastic Process. Appl. 108, 229-262.
41. Volný, D. (1993) Stochastic Process. Appl. 44, 41-74.
42. Hannan, E. J. (1979) Stochastic Process. Appl. 9, 281-289.
43. Hannan, E. J. (1973) Z. Wahrscheinlichkeitstheorie Verw. Geb. 26, 157-170.
44. Heyde, C. C. (1974) Z. Wahrscheinlichkeitstheorie Verw. Geb. 30, 315-320.
45. Rosenblatt, M. (1956) Ann. Math. Stat. 27, 832-837.
46. Parzen, E. (1962) Ann. Math. Stat. 33, 1065-1076.
47. Robinson, P. M. (1983) J. Time Ser. Anal. 4, 185-207.
48. Bosq, D. (1996) Nonparametric Statistics for Stochastic Processes: Estimation and Prediction (Springer, New York).
49. Wu, W. B. & Mielniczuk, J. (2002) Ann. Stat. 30, 1441-1459.
