<<

ON THE TESTABILITY OF THE ERGODIC HYPOTHESIS

ISAAC LOH

Abstract. We show that stationary, Polish space-valued stochastic processes can be uniformly approximated by weakly mixing processes, non-ergodic processes, and non-mean-ergodic processes. Therefore, the ergodic hypothesis—that time averages will converge to their statistical counterparts— is not testable in the general case. We also show that the set of weakly mixing processes is generic and the set of strongly mixing processes is meagre in a rich space of stationary stochastic processes.

1. Introduction The ergodic hypothesis, that time averages asymptotically converge to statistical averages, is fundamental in many economic and statistical settings. In this paper, we consider the testability of the ergodic hypothesis under minimal assumptions apart from stationarity. Namely, we ask if there exists a statistical test with nontrivial power that distinguishes between and nonergodicity. We provide an answer in Theorem 1 which states that any stationary taking on values in a Polish space can be approximated in a very strong sense by stationary ergodic stochastic processes. This theorem is proved by applying a result on uniform approximation in . The approximation result also holds very generally with non-ergodic processes, as well as mean-ergodic and non-mean-ergodic processes. This implies that in our very general setting, the power of any test for ergodicity (or alternately non-ergodicity, mean-ergodicity, non- mean-ergodicity) cannot exceed its size, despite the number of time periods or the number of observations available. Our results contrast with the tests for ergodicity which are proposed in [8] and [9] for processes that are Markovian in nature, and demonstrate that the consistency results achieved therein depend on the Markovian assumptions. Our approximation theorem complements several well known results on the existence of station- ary processes having a certain specified marginal distribution. [12] and [19] show the existence of stationary processes with a finite state space arising from measure preserving, ergodic, and ape- riodic transformations that have a certain marginal distribution. Similarly, [18] and [2] study the problem of coding a stationary process onto another process with a given marginal distribution, when the coded process takes on values in a finite set and a countable set, respectively. Our findings particularly parallel those of [17], which shows that marginal distributions for stationary processes on a finite state space can be matched with periodic measures and closely approximated with ergodic measures. In contrast with [17], we allow for stochastic processes taking on values in an arbitrary Polish space, and prove that those processes can be approximated in a uniform sense with other processes satisfying ergodicity, non-ergodicity, and related conditions. The implication of our main result for the testability of the ergodic hypothesis supplements existing work on testability when ergodicity is a maintained assumption in place of the usual i.i.d. assumption for data. [20] and [24] investigate the discernability of families of stationary ergodic processes, whereas we study the discernability of stationary processes with mixing characteristics from those without mixing. Finally, we apply our result to studying the genericity of various kinds of mixing conditions in a space of stationary stochastic processes taking on values in a given Polish space. We show that, within this large space of processes, weak mixing is a generic property but strong mixing is meagre. [21] showed that the set of ergodic measures for stationary stochastic processes was a generic set 1 2 ISAAC LOH in the set of stationary measures endowed with the weak topology, whereas the set of strongly mixing measures was meagre. Our observation corroborates these findings in a space of stationary stochastic processes, as opposed to their push-forward measures. It also parallels, and makes use of, results on the approximation and genericity of measure preserving transformations possessing certain mixing properties (e.g. [13] and [11]).

2. Main Result Throughout, we consider a Polish space X and say that a X -valued stochastic process X = (X (˜ω)) :(Ω˜, F˜, P)˜ → Q X is stationary if its finite dimensional distributions are shift t t∈Z t∈Z k invariant, i.e. for each k ≥ 1 and measurable H ∈ X , P ((Xt+1,...,Xt+k) ∈ H) is invariant with respect to t. Let X Z denote Q X , and let T : X Z → X Z be the left shift map. We say that a t∈Z stationary process is ergodic if for all k ∈ N and cylinder sets A ⊂ X , one has t 1 X a.s. lim I(T jX ∈ A) = P (X ∈ A) t→∞ t j=1 Alternately, we say that a stationary process is weakly mixing if, for all Borel measurable A, B ⊂ X Z,  t  (1) lim E 1A(T X)1B(X = P (A)P(B) , t→∞ t6∈I0 1 where I0 is some zero density subset of N . Note that if A is a shift invariant set, putting B = A in (1) implies that P (A) = P A2, and thus P (A) ∈ {0, 1}. Thus, Theorem 9.5.6 of [16] implies that weak mixing implies ergodicity. For some equivalent characterizations of weak mixing, see [13] or [26]. Define a process as strongly mixing if it satisfies (1) with I0 = ∅. In the particular case where X = R, we call a stationary stochastic process mean ergodic if t 1 X a.s. lim Xj = E [X0] , t→∞ t j=1 provided that the integral on the right exists. A stochastic process X, is said to be aperiodic if the probability that ...,X−1,X0,X1,... is a periodic sequence is zero. Our result applies approximation results from ergodic theory (see the canonical result of [22] and also [3] for an overview). Some relevant mathematical terminology is introduced further along in Section A. Let us formalize a notion of statistical testing in our setting. Let X be a X -valued, stationary and measurable stochastic process with law P ∈ P (where the set P is to be specified later) on the space X Z. As in [6], we are interested in hypothesis testing problems of the form

H0 :P ∈ P0

H1 :P ∈ P1, where P0 ⊂ P is the set of probability measures on Ω for which the null hypothesis is deemed to hold and P1 = P \ P0 is its complement. A possibly randomized statistical test will be denoted by t a map ϕt : X → [0, 1], where t indicates the time dimension of the test. The corresponding size of the test is given by Z sup E[ϕt(X1,...,Xt)] = ϕt(x1, . . . , xt) dP(x). P∈P0 Ω

1 1 Pt Recall that I0 ⊂ N is said to have zero density if limt→∞ t j=1 1j∈I0 = 0. ON THE TESTABILITY OF THE ERGODIC HYPOTHESIS 3

We will show that when P0, P1 are chosen to test ergodicity (or non-ergodicity) of the stochastic process X, one has under mild conditions

sup EP (ϕt) ≤ sup EP (ϕt)(2) P∈P1 P∈P0 for any sequence {ϕt}t∈N of tests and any sample size t. In other words, the power of the test ϕt cannot exceed its size, for any t. For this reason, one can say that the null hypothesis H0 is non-testable. A simple argument (see [23], Remark 2) then shows that the nontestability result i i n holds even if ϕt is allowed to depend not only on X1,...,Xt, but an iid sample (X1,...,Xt )i=1 of size n, where n is arbitrarily large. The definition of P and its constituent sets P0 and P1 clearly play a significant role in establishing (2). We let P denote the set of standard Borel measures on (X Z, B∞) corresponding to distributions of stationary X -valued stochastic processes X. Lemma 5 below implies that P contains all of the pushforward measures induced on X Z by Borel measurable stochastic processes. We then let PE ⊂ P denote the subset of distributions which correspond to X -valued stochastic processes which are additionally ergodic, and PWM those which are weakly mixing. In the special case where X = R and P contains only distributions induced by stochastic processes X which are integrable in the sense that E [|X0|] < ∞, we let PME ⊂ P denote the subset of distributions which correspond to mean ergodic stochastic processes. By Corollary 6 below, PE ⊂ PME. In the analysis to follow, we consider several null hypotheses. The first sets P0 = PE. This is to say that the stochastic process in question is ergodic, under the null hypothesis. The other scenarios c c we consider are that P0 = PE, PME, and PME. We will show that (2) holds in all scenarios. In fact, this is implied by our main result Theorem 1, which is stronger. The theorem states that, given a stationary X -valued stochastic process X and law P, we can find a sequence (Xk) of ergodic (or non-ergodic) stationary stochastic processes that uniformly approximate X. The proof of our main theorem uses uniform approximation results from the theory of measure preserving transformations. The specific results results that we use depend on aperiodicity of the underlying transformation, which by analogy would suggest that our main theorem would be valid only for aperiodic processes. However, Theorem 1 shows not only that periodicity is not a hindrance to ergodic approximation, but that the approximation can always be done with aperiodic transformations. This has the added benefit of implying that all of our non-testability results still apply when P is further restricted to contain only the set of aperiodic stochastic processes. As aperiodicity is a light assumption, this negates what would otherwise be a straightforward workaround to the finding of non-testability. Recall that a probability space is called a standard probability space if it is isomorphic to a 1 regular probability measure on R (c.f. [14], §2.4 or [5], §9.4). All standard probability spaces that are considered in this paper will be nonatomic. If X is supported on a standard probability space (Ω, F, P) then the sequence (Xk) constructed in Theorem 1 is actually constructed on an enriched sample space (Ω × Ω0, F ⊗ F0, P × P0), where (Ω0, F0, P0) is again standard. Whenever this is the case, we make the fairly innocuous identification of

(3) X ∼ (Xt ◦ π1)t∈Z with support on Ω1 × Ω2, where π1 is the projection onto the first coordinate, and with some abuse of notation refer to the product measure P × P0 by P. This minor technical imposition ensures that P indeed describes the joint distribution of X and (Xk), and affords some economy of notation.

Theorem 1. Let X be a Polish space satisfying |X | ≥ 2. Then for any stationary X -valued stochastic process X on (Ω, F, P), and any standard probability space (Ω0, F0, P0), there is a sequence (Xk) of stationary, aperiodic, and weakly mixing X -valued stochastic processes on (Ω × Ω0, F ⊗ 4 ISAAC LOH

0 0 F , P × P ) such that for all t ∈ N,  k k lim P X−t = X−t,...,Xt = Xt = 1, k→∞

d k and moreover, if X is non-deterministic, X0 = X0 for all k ∈ N. The same is true if instead k of weakly mixing, (X ) is specified to be non-ergodic. If X is in addition an integrable R-valued stochastic process, the same is true if instead of weakly mixing, (Xk) is specified to be mean-ergodic or non-mean-ergodic. The proof of Theorem 1 is in the appendix below. As a corollary to the theorem, we have our main testability result:

Corollary 2. Let X be a Polish space satisfying |X | ≥ 2. Then for any t ∈ N, (2) holds with c P0 = PE, PWM, and PE for all tests ϕt. If X ⊂ R and P contains only integrable distributions (corresponding to stochastic process X satisfying E[|X0|] < ∞), then (2) holds with P0 = PME and c P0 = PME. The same is true if P, P0, P1 are additionally restricted to the set of aperiodic distributions.

Proof. We deal with the case P0 = PE; the other cases are similar. Let ϕt be a test func- ton. By Theorem 1 there exists a sequence (Xk) of stationary and ergodic processes such that k k k k limk→∞ P X1 = X1 ,...,Xt = Xt = 1. Let P denote the specific probability law of (X ) and P the law of X. Then, by boundedness of ϕt, h k k i sup EP (ϕt) ≥ lim EPk (ϕt) = lim E ϕt(X1 ,...,Xt ) = E [ϕt(X1,...,Xt)] t→∞ t→∞ P∈PE = EP (ϕt) . k As ϕt and (X ), whence P, were arbitrary, (2) is proved.  2.1. The Category of Mixing Processes. In this section we consider a rich set X of stationary processes and demonstrate that, in a topology to be made precise later on, the set of weakly mixing processes constitutes a residual set. This result is closely tied to finding of [13], that the set of weakly mixing and measure preserving transformations on a standard probability space is generic in the group of all measure preserving transformations. We also find that the set of strongly mixing processes is meagre, which augments the finding of [21] that the set of strongly mixing and stationary measures is meagre in the set of stationary measures endowed with the weak topology. To summarize, [21] shows that when the set P of stationary measures on X ∞ is endowed with the weak topology, we have the relation iid =⇒ ϕ, β, ρ, α-mixing =⇒ strong mixing =⇒ weak mixing =⇒ ergodic =⇒ mean ergodic, | {z } | {z } meagre generic whereas we show that in our metric space X of stochastic processes, the following is true: iid =⇒ ϕ, β, ρ, α-mixing =⇒ strong mixing =⇒ weak mixing =⇒ ergodic =⇒ mean ergodic . | {z } | {z } meagre: Proposition 2 generic: Proposition 1 To make our discussion a little more rigorous, it will be necessary to define a structure for the stochastic processes under consideration so that we may properly define e.g. their joint distributions. The space of processes that we define is quite large, consisting of maps from countable products of standard probability spaces into a state space. Importantly, the space is large enough so that, given any of its constituent process X, it contains another stochastic process independent of X. As before, let X be a Polish space with its Borel σ-algebra, and for all z ∈ R let (Ωz, Fz, Pz) be a standard probability space (e.g. the unit interval—the exact choice is inconsequential). Let X˜ denote Q Q Q Z the set of pairs (X,V ) where V ⊂ R is a countable set and X :( z∈V Ωz, z∈V Fz, z∈V Pz) → X 0 is a stationary random variable. Given two distinct elements (X,V ) and (Y,V ) of X˜ and any ` ∈ N, ON THE TESTABILITY OF THE ERGODIC HYPOTHESIS 5

V Q Q define πV ∪V 0 : z∈V ∪V 0 Ωz → z∈V Ωz to be projection defined in the obvious way, and define V 0 ˜ πV ∪V 0 similarly. We may define a pseudometric d` on the space X by 0  V V 0  d`((X,V ), (Y,V )) = P (X−`,...,X`) ◦ πV ∪V 0 6= (Y−`,...,Y`) ◦ πV ∪V 0 , Q Q where P is used as shorthand for the product measure z∈V ∪V 0 Pz on z∈V ∪V 0 Ωz. These psuedo- metrics may be combined to form a psuedometric d on the space X˜: ∞ 0 X −` 0 d((X,V ), (Y,V )) = 2 d`((X,V ), (Y,V )). `=1 Finally let ∼ denote the equivalence relation induced by the d psuedometric, and let X = X˜/ ∼ denote the resulting quotient space. This is the quotient space obtained by imposing the relation indicated in (3). As the indexing set V is inherent in the definition of X, we refer to equivalence classes [(X,V )] ∈ X by omitting dependence on V and writing simply X. The resulting space is a complete metric space, and in particular a Baire space, as the next lemma demonstrates.

Lemma 3. (X, d) is a complete metric space. Proof. The only difficulty in verifying that d is a psuedometric on X˜ lies in checking the triangle 0 00 inequality. Pick (X,V ), (Y,V ), (Z,V ) ∈ X˜ arbitrarily and note 0  V V 0  d`((X,V ), (Y,V )) =P (X−`,...,X`) ◦ πV ∪V 0∪V 00 6= (Y−`,...,Y`) ◦ πV ∪V 0∪V 00

 V V 00  ≤P (X−`,...,X`) ◦ πV ∪V 0∪V 00 6= (Z−`,...,Z`) ◦ πV ∪V 0∪V 00

 V 0 V 00  + P (Y−`,...,Y`) ◦ πV ∪V 0∪V 00 6= (Z−`,...,Z`) ◦ πV ∪V 0∪V 00 00 00 =d`((X,V ), (Z,V )) + d`((Y,V ), (Z,V )). It remains to show completeness. k k Let (X ,V ) be a d-Cauchy, and hence d`-Cauchy for all ` ∈ N, sequence of equivalence classes P∞ kn+1 kn+1 kn kn in X. Choose a subsequence (kn)n∈N such that, for all `, n=1 d`((X ,V )(X ,V )) < ∞. Then the Borel-Cantelli lemma implies that, for each ` ∈ N,  V  kn+1 kn+1 kn+1 kn kn Vkn P lim sup{(X−` ,...,X` ) ◦ π∪ V 6= (X−`,...,X` ) ◦ π∪ V } = 0, n→∞ i∈N i i∈N i

kn kn Vkn so that almost surely, limn→∞(X ,...,X ) ◦ π exists as a random variable defined on −` ` ∪i∈NVi k Q kn V n Ωz. Define Xt = limn→∞ Xt ◦ π for all t ∈ Z and let X denote the resulting z∈∪i∈NVi ∪i∈NVi stochastic process defined on V = S V , so that in the convergence (Xk,V k) → (X,V ) holds. i∈N i X Because the convergence (Xkn ) → X holds in the product topology of X Z, and X Z is Polish with this topology, it follows that X is measurable (see [25], which applies to metrizable spaces). Finally, k stationarity of X follows from stationarity of (Xt ). 

Now let Xw denote the subset of X consisting of only weakly mixing transformations. Our first claim is that Xw is a countable intersection of open sets. The following lemma adapts the strategy of [13] to stationary processes.

Lemma 4. Xw is a Gδ subset of (X, d). 2`+1 Proof. For each ` ∈ N let C` = X with its product Borel σ-algebra, and let π` : X Z → C` denote the projection (. . . , x−`, . . . , x`,...) 7→ (x−`, . . . , x`). C` is a separable metric space, so it is ` −1 ` second countable with a basis which we can denote {Bi }i∈N. The collection {π` (Bi )}i,`∈N then constitutes a (countable) basis for the product topology on X Z because every open cylinder set can be written as a union of its constituent sets. Let A denote the algebra which is generated by 6 ISAAC LOH

−1 ` the family of sets {π` (Bi )}i,`∈N, which is again countable (as it the family of sets obtained from the countable basis sets by iterating complementation and finite unions finitely many times). Note that the σ-algebra generated by A contains all of the open sets of the product topology in X Z, so it is the Borel σ-algebra B on X Z. Furthermore, by the density of algebras in the σ-algebras they generate (see e.g. the appendix of [10]), for any Borel set D ⊂ X Z, constant ε > 0, and finite Borel measure µ on X Z, there exists a set B ∈ A satisfying µ(A∆B) = µ(A \ B) + µ(B \ A) < ε. Hence, letting Λ denote the countable set of finite linear combinations of functions {1B}B∈A with 2 coefficients in Q + iQ, Λ is a dense collection of bounded complex functions in L (X Z, µ) (density follows because every simple function can be approximated with elements of Λ). Let f1, f2,... be an enumeration of Λ. Now, using the notation of [13], for all i, j, m, t, define n h i h i o n E(i, j, m, n) = X ∈ X : E fi(S (X))fj(X) − E[fi(X)] E fj(X) < 1/m , where Sn : X Z → X Z is the left shift map. First, we argue that E(i, j, m, n) is open in (X, d). Define the function γi,j,n : X → C by

h n i h i γi,j,n : X 7→ E fi(S (X))fj(X) − E[fi(X)] E fj(X) .

By construction of Λ, the function fi only depends on the finite dimensional distribution of k π`X = (X−`,...,X`) for some ` sufficiently large. Let X converge to X in the d-metric; then k k  k P (X−`,...,X`) = (X−`,...,X` ) → 1 and the distribution of π`(X ) converges to that of π`(X)  k  in the total variation norm. Because fi is bounded, this implies the convergence E fi(X ) → E[fi(X)], so E [fi(·)] : X → C is d-continuous. In the same way it can be shown that γi,j,n is −1 d-continuous, so E(i, j, m, n) = γi,j,n{z ∈ C : |z| < 1/m} is open. We claim that = T S E(i, j, m, n). By expressing f and f as sums of indicator Xw i,j,m∈N n∈N i j −1 ` functions over cylinder sets of the form π` (Bi ), it is clear that (1) implies containment in the forward direction. Conversely, suppose that X is not weakly mixing, so that (1) does not hold. Let P˜ denote the pushforward measure induced on X Z by X so that, by applying Proposition 6.2.2 of [6] with the roles of A and B reversed in conjunction with stationarity of X, the left shift S is not weakly mixing on (X Z, B∞, P).˜ Because Λ is dense in L2(X Z, P),˜ it follows as in [13] that there exists an fi ∈ Λ such that Z Z Z n ˜ ˜ ˜ γi,i,n(X) = fi(S ω˜)fi(˜ω) dP − fi(˜ω) dP fi(˜ω) dP > 1/2 X Z for all n, which concludes. 

As Theorem 1 implies the density of Xw in X, we immediately have the following result.

Proposition 1. Xw is a residual subset of (X, d).

Now let Xs denote the set of strongly mixing processes in X. Let M denote the set of stationary measures on X Z, and let Ms ⊂ M be the subset of measures which are strongly mixing, i.e. the set of measures induced by stochastic processes which are strongly mixing. M is a separable metric space under this topology, and Theorem 3.4 of [21] establishes that Ms is of the first Baire category. Q Note that the mapping ψ : X → M given by ψ :(X,V ) 7→ X∗ z∈V Pv is continuous with respect −1 c −1 c c to the d-metric, and that Xs = ψ (Ms), Xs = ψ (Ms) (by definition). It follows that Xs is a Gδ subset of X, and Theorem 1 implies that it is dense (as are, in fact, the set of non-ergodic processes), whence:

Proposition 2. Xs is meagre in (X, d). ON THE TESTABILITY OF THE ERGODIC HYPOTHESIS 7

Appendix A. Proof of Theorem 1

Let B be the Borel σ-algebra on a Polish space X , where d ≥ 1 is fixed. For t ∈ Z let Xt : (Ω, F, P) → X be a F/B-measurable random variable. Borel measurability of Xt will be maintained throughout. Define the product space: X Z ≡ Q X with the corresponding product Borel σ- t∈Z algebra N B. Note that this σ-algebra, which we denote B∞, is also the Borel σ-algebra on X Z t∈Z (c.f. [15], Lemma 1.2) equipped with its product topology. Under the aforementioned conditions, we have the following straightforward result:

Lemma 5. Under Borel measurability of Xt, t ∈ Z, there is a unique probability measure P on ∞ (X Z, B ) such that P agrees with Q on all finite dimensional cylinder sets, i.e. for all t ∈ N and Borel sets A ⊂ X t,   Y (4) P A × St = Q ((X1,...,Xt) ∈ A) . t6∈I

Moreover, (X Z, B∞, P) is a standard measure space, and if Q(X = x) = 0 for any element x ∈ X Z, (X Z, B∞, P) is isomorphic to Lebesgue measure on the unit interval [0, 1].

Proof. Note that X Z is a countable product of Polish spaces, and thus it is itself Polish (which is to say, separable and completely metrizable, see [27] §2.2). Now a straightforward extension of Theorem 36.1 of [4] to Polish-space valued stochastic processes, and uniqueness guaranteed by Theorem 3.1 of the same, supplies the unique probability measure P on B∞ satisfying (4). By Theorem 7.1.7 of [5], P is regular. Hence Theorem 2.4.1 of [14] implies that it is also a standard probability space, using the fact that X Z is Polish with its product topology. As (X Z, B∞, P) is standard, Theorem 9.4.7 of [5] implies that it is isomorphic to the unit interval equipped with its Borel σ-algebra, Lebesgue measure, and a countable number of atoms. Let f : X Z → [0, 1] be an isomorphism of these measure spaces. If the pushforward f∗P has an atom A ∈ [0, 1], then A is atom of a Borel measure on [0, 1] and hence must be a singleton, A = {x} ⊂ [0, 1]. Then f −1(x) is a singleton in X Z because f is injective, and Q(X = s) = P f −1(x) > 0. So if P assigns no singletons positive measure then X Z is isomorphic to Lebesgue measure on the unit interval.  We now discuss some measure theoretical preliminaries, and particularly some rudiments of ergodic theory, following [3]. Let (Ω, F, P) be a probability space. Whenever two sets A and B are equal up to a set of measure zero, i.e. P (A ∆ B) = 0, we write A = B (mod P). A transformation (or automorphism) is an invertible map T : (Ω, F, P) → (Ω, F, P) such that T and T −1 are measurable. T is nonsingular if P (A) = 0 if and only if P T −1(A) = 0. We say that T is measure-preserving if P(T −1(A)) = P (A) for all measurable A. T is ergodic if for all A with P(A) > 0 such that T −1(A) = A (mod P), one has µ(A) = 1 or µ(Ac) = 0 (see [26], Lemma 3.7.1). A measure preserving transformation is weakly mixing if it satisfies the stronger condition:

t−1 1 X lim |P T −j(A) ∩ B − P(A)P(B) | = 0 t→∞ t j=0 for each pair of measurable sets A and B. Evidently, if T is weak mixing, then it is also ergodic. Say that a point ω ∈ Ω is a periodic point of period t for T if T tx = x for and T jx 6= x for j = 1, . . . , t − 1. A transformation T is called aperiodic if the set of its periodic points has P-measure 0. We use Ap to denote the set of aperiodic transformations in M(Ω, P). Using Lemma 5, we have the following equivalence, which is essentially stated in the case X = R in [16]: 8 ISAAC LOH

˜ ˜ ˜ Proposition 3. Let (Xt(˜ω))t∈Z be an X -valued stochastic process on (Ω, F, P) which is nonatomic in the sense of Lemma 5. Then Xt is stationary if and only if there exists a standard nonatomic probability space (Ω, F, P) = (X Z, B∞, P), measure-preserving map ξ :(Ω˜, F˜, P)˜ → (Ω, F, P), trans- formation T :Ω → Ω, and measurable map χ :Ω → X such that t (5) Xt(˜ω) = χ(T ◦ ξ(˜ω)). Moreover, under stationarity of X, (Ω, P,T ) satisfies: (1) The process is ergodic if and only if T is an ergodic transformation of Ω (2) The process is aperiodic if and only if T ∈ Ap. Proof. We summarize the proof, which is straightforward along the lines of [16] and [4]. Let Ω be the space X Z supplied by Lemma 5 and F = B∞. Let ξ : Ω˜ → Ω be defined by ξ(˜ω) = (...,X−1(˜ω),X0(˜ω),X1(˜ω),...). Let T be the left shift map T :(. . . , x−1, x0, x1,...) 7→ (. . . , x0, x1, x2,...). Note that P is the pushforward of P˜ under ξ, denoted ξ∗P.˜ Let χ be the projection of a point x ∈ X Z onto its zeroth coordinate x0, i.e. χ(. . . , x−1, x0, x1,...) = x0. Then the relation (5) follows. To see that stationarity of X implies that T is measure preserving, one may apply [4], p. 311 Lemma 1. The converse follows by noting that k  P((˜ X1,...,Xk) ∈ H) = P ω :(χ(T ω), . . . , χ(T ω)) ∈ H = P T −tω :(χ(T t+1T −tω), . . . , χ(T t+kT −tω)) ∈ H = P ω :(χ(T t+1ω), . . . , χ(T t+kω)) ∈ H

= P˜ ((Xt+1,...,Xt+k) ∈ H) , for any t ∈ Z. Now suppose that our stochastic process X is also ergodic. Then it is immediate that for every Q Q cylinder set R = n∈I At × m∈Ic Sm, |I| < ∞, one has t 1 X a.s. lim I(T j(ω) ∈ R) = P (R) . t→∞ t j=1 Let D indicate the collection of all subsets of Ω = X Z which have this convergence, noting that the collection of cylinders R is contained in D. A straightforward argument shows that D is a Dynkin 1 Pt j a.s. ∞ system, so it follows that limn→∞ t j=1 I(T (ω) ∈ A) = P (A) for all A ∈ F = B . By the π − λ 1 Pt j a.s. theorem, t j=1 I(T (ω) ∈ A) → P(A) for all A ∈ F. Thus, for any such set A, the dominated convergence theorem implies t t 1 X 1 X Z lim P T j(Ac) ∩ A = lim I(T j(ω) ∈ A) dP n→∞ t n→∞ t c j=1 j=1 A t Z 1 X = lim I(T j(ω) ∈ A) dP c n→∞ t A j=1 = P (A)P(Ac) . So if 0 < P(A) < 1, there exists some j such that P T j(Ac) ∩ A > 0, which is to say that A is not T -invariant, modulo P. Hence T is ergodic. The converse follows by noting  j k j   I((Xj+1,...,Xj+k ∈ H) = I ω : χ(T (T ω)), . . . , χ(T (T ω)) ∈ H , which is an integrable function of T jω, and then applying the well-known Birkhoff Ergodic Theorem (c.f. [26], Theorem 5.1.1). Finally, if X is aperiodic, it is clear that T ∈ Ap. On the other hand, if the stochastic process is periodic with some positive probability, then the set of periodic sequences in Ω = X Z is assigned positive P-measure, and as T is the shift map on Ω, it is straightforward to verify that T 6∈ Ap.  ON THE TESTABILITY OF THE ERGODIC HYPOTHESIS 9

Corollary 6. In the setting of Proposition 3, a stationary stochastic process (Xt(˜ω))t∈Z is ergodic if and only if for all g ∈ L1(X Z, P), one has

t 1 X a.s. (6) lim g((...,X−1+j,Xj,X1+j,...)) = E˜ (g((...,X−1(˜ω),X0(˜ω),X1(˜ω),...))) n→∞ t P j=1

= EP (g) .

Also, if X is ergodic, it is mean ergodic.

Proof. Sufficiency of (6) is clear. On the other hand, if the process X is ergodic,

t t 1 X 1 X g((...,X (˜ω),X (˜ω),X (˜ω),...)) = g(T jξ(˜ω)), t −1+j j 1+j t j=1 j=1

j and by ergodicity of T and the relation P = ξ∗P,˜

 t  ˜ 1 X j P  lim g(T ξ(˜ω)) 6= E˜ (g((...,X−1,X0,X1,...))) n→∞ t P j=1

 t  1 X j = P  lim g(T ω) 6= EP (g) = 0. n→∞ t j=1

The second claim on mean ergodicity similarly follows from (5) and the Birkhoff ergodic theorem. 

For any stationary nonatomic stochastic process X, the implication of Proposition 3 is that we t may assume without loss of generality that Xt = χ(T ω) on some standard nonatomic probability space (Ω, F, P) and measurable map χ :Ω → X . This will prove useful when we discuss the density of ergodic transformations. In the following discussion we track closely the introductions of [7] and [1]. Fix a nonatomic standard probability space (Ω, F, P) and let G(Ω, P) denote the group of all nonsingular auto- morphisms on it; furthermore let M(Ω, P) ⊂ G(Ω, P) denote the subgroup of measure preserv- ing automorphisms on Ω. The uniform topology on G(Ω, P) is induced by the metric d(T,S) = P(ω : T (ω) 6= S(ω)). It can readily be seen that this topology is complete and includes M(Ω, P) as a closed subset of G(Ω, P), whence G(Ω, P) and M(Ω, P) are complete metric spaces and hence Baire spaces. The following lemma demonstrates that convergence in the uniform topology implies convergence of a certain type of integral with bounded integrand:

Lemma 7. Let Tk → T in G(Ω, P) equipped with the uniform topology. Then

−t t −t t  (7) P (Tk ω, . . . , Tkω) = (T ω, . . . , T ω) → 1. Proof. We claim that

t−1 \ j j j T (Ak) ⊂ {ω : Tk ω = T ω for all j = 1, . . . , t} j=0 10 ISAAC LOH

Tt−1 −j j where Ak = {ω : Tk(ω) = T (ω}. Indeed, if ω ∈ j=0 T (Ak) then one has T ∈ Ak for all j = 0, . . . , t − 1, which is to say that

Tk(ω) = T ω 2 Tk(T ω) = T ω . . t−1 t Tk(T ω) = T ω, j j j  which by substitution implies Tk (ω) = T ω for j = 1, . . . , t. Now note that P T Ak = P (Ak) → 1 t t  for all j, so that P (Tkω, . . . , Tkω) = (T ω, . . . , T ω) → 1. To finish, note that, because T and Tk are invertible and measure preserving, −1  −1  0 −1 0 −1 0 P(T ω = Tkω) = P ω = T Tkω = P Tkω : ω = T Tkω = P ω : Tk ω = T ω , −1 −1 so that also Tk → T uniformly; now apply the same argument for j = −1,..., −t and apply the union bound.  It is also useful to have the following lemma in hand, which assures us that we may replace the assumption that X follows a nonatomic distribution with the slightly stronger assumption that it is aperiodic: Lemma 8. An aperiodic X -valued stochastic process X is also nonatomic in the sense of Lemma 5. Proof. We prove the contrapositive. Let P be the measure induced by X on X Z. Suppose that P has an atom x ∈ X Z so that P(x) = P (Xt = xt for all t ∈ Z) > 0. As Xt is station- s ary, it is straightforward to verify that for all s ∈ Z, P(T x) = P (Xt = xt+s for all t ∈ Z) = s s0 0 P(Xt = xt for all t ∈ Z) > 0. If T x 6= T x whenever s 6= s then this contradicts that P is a probability (finite) measure, so that for some s 6= s0 one has T s−s0 x = x. Assuming without loss of 0 0 generality that s > s , x is periodic with period s − s and X is not aperiodic.  Proposition 4. If X is a Polish space, then for any t ∈ N and stationary, aperiodic X -valued stochastic process X on (Ω˜, F˜, P)˜ , there is a sequence (Xk) of stationary, aperiodic, and weakly ˜ ˜ ˜ k mixing X -valued stochastic processes supported on (Ω, F, P) such that X0 = X0 for all k ∈ N and  k k lim P X−t = X−t,...,Xt = Xt = 1. k→∞ The same is true if instead of weakly mixing, (Xk) is specified to be non-ergodic. If X is in addition an integrable R-valued stochastic process, the same is true if instead of weakly mixing, (Xk) is specified to be mean-ergodic or non-mean-ergodic.

Proof. Fix t ∈ N and a test ϕt. We break into cases below:

Case 1: Approximation with ergodic processes. Fix the law P corresponding to the X - valued stochastic process X which is stationary and aperiodic, as well as nonatomic by Lemma th 8. Proposition 3 with ω ∈ X Z, Xt the projection map onto the t coordinate, Ω = X Z, ξ :ω ˜ 7→ t (...,X−1,X0,X1,...) and ξ = id holds, and so Xt = χ(T ◦ ξ(˜ω)) with T the left shift map and χ the projection onto the 0th coordinate. For brevity, we will typically write ω in place of ξ(˜ω). Recall ∞ that P is the pushforward ξ∗P,˜ and that (X Z, B , P) is a standard nonatomic measure space, and Proposition 3 guarantees that T ∈ Ap ⊂ M(Ω, P). This allows us to apply some approximation results from ergodic theory. Now we show that T may be approximated by a sequence of weakly mixing transformations

(Tk)k∈N ⊂ M(Ω, P). In Theorem 7.14 of [11], it is shown that the conjugacy class of a weak mixing ON THE TESTABILITY OF THE ERGODIC HYPOTHESIS 11 transformation S ∈ Ap ∩ M(Ω, P) (which exists, see [3]) is dense in Ap, which is to say that −1 there is a sequence of P-measure preserving transformations (ηk)k∈N such that Tk ≡ ηk Sηk → T (uniform topology). It is thus sufficient to show that Tk is weakly mixing (whence ergodic). By Proposition 6.2.2 of [26], a transformation S is weakly mixing if and only if for each pair of measurable A, B ⊂ Ω one has lim P S−j(A) ∩ B = P (A)P(B) j→∞ j6∈D for some zero density set D = D(A, B) ⊂ N. By assumption, S is weakly mixing, so that there is a zero density set D = D(η(A), η(B)) satisfying   (8) lim P T −j(A) ∩ B = lim P η−1S−jη(A) ∩ B = lim P S−j(η(A)) ∩ η(B) j→∞ k j→∞ j→∞ j6∈D j6∈D j6∈D = P (η(A)) P (η(B)) = P (A)P(B) , and Tk is weakly mixing. k t Consider for each k the stochastic process defined by Xt ≡ χ(Tk ◦ ξ(˜ω)) for t ∈ Z. Immediately k Z Z Z one has X0 = X0 for all k. Because X : X → X and Tk : X → X are both Borel maps, k Z k Xt measurably maps X = Ω to X (the pushforward of P). Hence, Lemma 5 implies that (X ) Z ∞ k induces a standard probability distribution Pk on (X , B ). Moreover, (Xt ) is clearly a stationary stochastic process by Proposition 3, and the process is weakly mixing by application of (8) to the definition of weak mixing given in (1). k We now wish to show that (X ) is aperiodic. To this end, we use the fact that Tk is weakly mixing. Suppose for the sake for the sake of contradiction that the stochastic process is not aperiodic, so k t that on some positive P-measure set A ⊂ Ω, one has ω ∈ A implies that Xt (ω) = χ(Tkω) is periodic. By partitioning A into a countable number of subsets, there must exist a positive measure subset k Pm ⊂ A on which Xt is periodic with period m ∈ N, i.e. k k ω ∈ Pm =⇒ Xt (ω) = Xt+m(ω) for all t ∈ Z k 0 Note that X0 (ω) = χ(T ω) = χ(ω) has the same distribution as X0. We claim that this implies the −1  −1  existence of two disjoint sets B1,B2 ⊂ X such that P χ (B1) ∩ Pm > 0 and P χ (B2) > 0. In fact, this follows straightforwardly from separability of X ; for all ε > 0 there is a pairwise-disjoint cover (Cε) of X such that (Cε) is contained in an ε-ball of X and X = F Cε. Letting ε tend ` `∈N ` `∈N ` ε ε −1 ε  −1 ε  towards 0, there must eventually exist two sets C` ,C`0 satisfying P χ (C` ) , P χ (C`0 ) > 0, or else because X is Polish, the pushforward measure χ∗P is a point mass at some x0 ∈ X . This would then imply that P (X0 = x0) = P (ω : χ(ω) = x0) = 1, and then by stationarity of X that P(Xt = x0 for all t ∈ Z) = 1, contradicting that P is nonatomic (as it is aperiodic, see Lemma ε −1 ε  ε 8). As we may choose C` satisfying P χ (C` ) ∩ Pm > 0, it suffices to take B1 = C` and ε −1 k B2 = C`0 . Now for ω ∈ χ (B1) ∩ Pm, it must be the case that X0 (ω) ∈ B1 and hence also that `m k k `m −1 χ(Tk ω) = X`m(ω) = X0 (ω) ∈ B1 for all ` ∈ N. In particular, Tk ω ∈ χ (B1) for all ` ∈ N. −1 −1 However, χ (B1) ∩ χ (B2) = ∅, so  `m −1  −1  −1  −1  P Tk χ (B1) ∩ Pm ∩ χ (B2) = 0 < P χ (B1) ∩ Pm P χ (B2) .

As Tk is weakly mixing, and {`m : ` ∈ N} is a set of strictly positive upper density, this is a contradiction and indeed one has (Xk) is aperiodic. Lemma 7 concludes, as  k k ˜ −t −t t t  P X−t = X−t,...,Xt = Xt = P χ(Tk ξ(˜ω)) = χ(T ξ(˜ω)), . . . , χ(Tkξ(˜ω)) = χ(T ξ(˜ω)) ˜ −t −t t t  ≥ P Tk ξ(˜ω) = T ξ(˜ω),...,T ξ(˜ω) = T ξ(˜ω) −t t t t  = P Tk ω = T ω, . . . , Tkω = T ω . 12 ISAAC LOH

Case 2: Approximation with nonergodic processes. Let X be a ergodic and stationary aperiodic process with law P = ξ∗P.˜ Let T :(X Z, P) → (X Z, P) be the usual left shift map associated with X, which is ergodic by hypothesis. As in Case 1, we may approximate T by a 0 0 sequence of weakly mixing transformations (Tk)k∈N satisfying Tk → T (uniform topology). We take 0 the further step of approximating each Tk with a transformation Tk which is not ergodic. As we have seen, the fact that P is nonatomic (Lemma 8) implies that there exist two disjoint sets B1,B2 ⊂ X −1  −1  −1 −1 such that P χ (B1) , P χ (B2) > 0. Hence, we may pick a subset Ak ⊂ χ (B1) t χ (B2) −1  −1  c −1  c −1  such that P Ak ∩ χ (B1) , P Ak ∩ χ (B2) , P Ak ∩ χ (B1) , P Ak ∩ χ (B2) > 0 and 0 < 1 P(Ak) < k , but also Z  −1  −1 −1 −1 EP 1ω∈ξ (B1)|ω ∈ Ak ≡ P(Ak) 1ω∈ξ (B1) dP < EP 1ω∈ξ (B1) < 1. Ak 0 c Now note that Tk induces a transformation on both Ak and Ak (c.f. [26], §3.11). In particular, Z ∞ 0 n for any ω ∈ X and B ∈ B let tB,k(ω) = min{t > 0 : (Tk) (ω) ∈ B}. Then, the induced c c transformations T : A → A and T c : A → A defined by Ak k k Ak k k

t (ω) 0 Ak,k TAk (ω) = (Tk) (ω)

0 tAc ,k(ω) T c (ω) = (T ) k (ω) Ak k c are ergodic measure preserving transformations on Ak and Ak, respectively. Define  TAk (ω) if ω ∈ Ak Tk(ω) ≡ c T c (ω) if ω ∈ A Ak k ∞ Then Tk is invertible, and for all B ∈ B

−1   −1 −1 c  c P T (B) = P T (B ∩ Ak) ∪ T c (B ∩ Ak) = P (B ∩ Ak) + P (B ∩ Ak) = P (B) , k Ak Ak c so Tk ∈ M(Ω, P). Note that Tk is not ergodic because it fixes the sets Ak and Ak. However, we c claim that Tk is weakly mixing on its restrictions to Ak and Ak, with respect to the conditional probabilities P| and P| c . For by Theorem 6.4.2 of [26], T is weakly mixing on A if and only Ak Ak Ak k if TAk ×S is ergodic for any ergodic, finite measure perserving transformation S on a measure space

(Y, Y, Q) (in the sense that TAk ×S fixes only sets in X ×Y which have zero or full P×Q-measure). 0 By design, Tk is weakly mixing so that Tk ×S is ergodic on X ×Y for any ergodic measure preserving

S. Moreover, it is straightforward to verify that TAk × S is the induced transformation of Tk × S on Ak × Y , which is ergodic by Lemma 3.11.2 of [26]. Hence, TAk × S is ergodic for every ergodic and measure preserving S, and T is weakly mixing as well. The same proof establishes that T c Ak Ak c is weakly mixing on Ak. k t By measurability of Tk, we may again consider the stationary stochastic process Xt ≡ χ(Tk◦ξ(˜ω)) ∞ for t ∈ Z, which induces a standard probability distribution Pk on (X Z, B ). As in the first case k we must show that (Xt ) is aperiodic. We again appeal to a proof by contradiction. Suppose that k (X ) is periodic with some positive probability, so that there is some m ∈ N and positive measure t t+m c set Pm = {ω : χ(Tkω) = χ(Tk ω) for all t ∈ Z}; then Pm ∩ Ak or Pm ∩ Ak has positive measure. Suppose without loss of generality that the former is true (the same proof applies to both cases). Then, as in the preceding part, for all ` ∈ N one has `m −1 −1 Tk (χ (B1) ∩ Pm ∩ Ak) ∩ (χ (B2) ∩ Ak) = ∅, which is a contradiction to the fact that the restriction Tk|Ak = TAk is weakly mixing on Ak. Therefore (Xk) is aperiodic. ON THE TESTABILITY OF THE ERGODIC HYPOTHESIS 13

k Now we argue that the stochastic process Xt is not ergodic. By ergodicity of TAk and construction of Ak, t t 1 X k 1 X j −1 lim I(X ∈ B1) = lim I(T ω ∈ χ (B1)) n→∞ t j n→∞ t Ak j=1 j=1 a.s.   −1 = E 1ω∈χ (B1)|ω ∈ Ak   −1 < E 1ω∈χ (B1) = P (X0 ∈ B1) . In fact, almost nowhere in Ω is it the case that the limit on the left hand side of the preceding k display is P (X0 ∈ B1), which implies that (Xt ) is not ergodic. 0 Finally we argue that Tk → T (uniform topology). By the definition of Tk we have d(Tk,Tk) = 0 c  −1 c −1 c  P(T 6= T ) ≤ P A ∩ {t c = 1} → 1. But note that {t c = 1} = T (A ) and P T (A ) = k k k Ak Ak k k c −1 P(Ak) > 1 − k . So c  P A ∩ {t c = 1} > 1 − 2/k, k Ak and we have 0 0 lim inf d(Tk,T ) ≤ lim inf d(Tk,Tk) + lim inf d(Tk,T ) = 0. k→∞ k→∞ k→∞ Lemma 7 then implies the desired convergence result, as in Case 1.

Case 3: Approximation with mean-ergodic processes. This follows directly from Case 1 and the observation that every ergodic process is also mean-ergodic, by Corollary 6.

Case 4: Approximation with non-mean-ergodic processes. Note that by Case 1 it suffices to consider the case where X is ergodic, so fix such a process and its law P on X Z. Let g = χ, the projection of ω ∈ X Z onto its 0th coordinate. By assumption, g ∈ L1(X Z, P). We have seen that the pushforward measure χ∗P is not a point mass, and as such there exist disjoint sets B1,B2 ⊂ R −1  −1  c such that P χ (B1) , P χ (B2) > 0, B2 = B1, and

y ∈ B1 =⇒ y > EP (X0) . ∞ As in the proof of Case 2, there exists a positive measure set Ak ∈ B and a P-preserving trans- c formation Tk such that Ak, Tk → T (uniform topology), and Tk fixes Ak and Ak and is measure- preserving and weakly mixing on each piece. Moreover, by restricting the measure P (Ak ∩ B2) to be sufficiently small relative to P (Ak ∩ B1), we may clearly guarantee that E [χ(ω)|ω ∈ Ak] > E[χ(ω)]. k t k Letting as usual Xt (˜ω) = X(Tk ◦ ξ(˜ω)), one quickly verifies (X ) is stationary, aperiodic, and ∞ integrable by the method of Case 2 and the fact that Tk is measure preserving on (X Z, B , P) so  k  that for all t,E |Xt | = E [|X0|] < ∞. However, for ω ∈ Ak, ergodicity of Tk on Ak implies t t 1 X k 1 X j a.s. lim X (ω) = lim χ(T (ω)) = EP (χ(ω)|ω ∈ Ak) dP > EP (X0(ω)) , n→∞ t j n→∞ t k j=1 j=1 k which implies also that (Xt )t∈Z is not mean-ergodic. As in Case 1, Lemma 7 concludes.  In the following lemma, we use the convention adopted in (3) of enriching sample spaces. In particular, given a random variable X with law P supported (Ω, F, P), we identify X with the 0 0 0 random variable (Xt ◦ π1)t∈Z supported on the product space (Ω × Ω , F × F , P × P ) and use P to denote the product measure P × P0. Lemma 9. If X is a Polish space satisfying |X | ≥ 2 then for any stationary X -valued stochastic process X on (Ω, F, P) and any standard probability space (Ω0, F0, P0), there is a sequence (Xk) of stationary and aperiodic X -valued stochastic processes on (Ω × Ω0, F ⊗ F0, P × P0) satisfying k k k limk→∞ P X−t = X−t,...,Xt = Xt = 1, for all t. Moreover, if X is non-deterministic, (X ) can d k be chosen to satisfy X0 = X0 for all k ∈ N. 14 ISAAC LOH

Proof. For all t ∈ Z let Yt be iid copies of the r.v. X0 which are independent of X if X is nondegn- k erate, and otherwise iid copies of some non-deterministic X -valued random variable. Let Ct be iid  0 w.p. 1 − 1/k random variables satisfying Ck = chosen independently of all other random t 1 w.p. 1/k variables. It is straightforward to see that the random vector (Ck,Y ) can be constructed over the (standard) product probability space Ω˜ 0 = Ω˜ Z × (X Z)Z by invoking Lemma 5, where Ω˜ is just the unit interval with Lebesgue measure and X Z is equipped with the pushforward measure induced by X. By isomorphism, (Ck,Y ) can be constructed on Ω0 =∼ Ω˜ 0. Make this construction, so that (X,Ck,Y ) has domain Ω × Ω0 with appropriate product σ-algebra and measure P × P0. According to our convention, let P indicate this extension of our original measure to the product space. For all t ∈ Z, k ∈ N let  k Xt if Ct = 0 Xt ≡ , Yt if Ct = 1 which is a measurable mapping of Ω × {0, 1}Z × X Z into X Z.(Xk) can be written in the form 1 2 γ(T × S × S ), where T is the measure preserving shift of (5) and S1 and S2 are measure pre- serving left shifts on {0, 1}Z and X Z, respectively. As products of measure preserving transforma- tions are measure preserving, Proposition 3 implies that it is a stationary process. Moreover, k d k k k (X ) satisfies X0 = X0 in the non-deterministic case. Also, P X−t = X−t,...,Xt = Xt ≥ k P(C−t = ··· = Ct = 0) ≥ 1 − (2t + 1)/k → 1. It remains only to show that (X ) is aperiodic. k k  This follows if we can show that for all m ∈ N,P X0 = X`m for all ` ∈ Z = 0. Fix such an m. Let the random subset S ⊂ Z be given by S ≡ {t : Ct = 1} and let A denote the subset of Ω × {0, 1}Z × X Z on which |S ∩ mZ| = ∞. Note that by independence P (A) = 1, and  k k   k k  P X0 = X`m for all ` ∈ Z = P X0 = X`m for all ` ∈ Z|A

 k k  ≤ P X0 = Xt for all t ∈ S ∩ mZ|A

 k  = P X0 = Yt for all t ∈ S ∩ mZ|A h   i k k = E P X0 = Yt for all t ∈ S ∩ mZ | X0 , S, A A = 0, where the last line follows by independence of Y from the other variables and nondegeneracy of Yt for all t.  In conjunction with Proposition 4, Lemma 9 implies the approximation result of Theorem 1 k (note that in Lemma 9, if X0 is taken to be integrable, then so is X0 for all k).

References [1] Oleg N. Ageev and Cesar E. Silva. Genericity of rigid and multiply recurrent infinite measure-preserving and nonsingular transformations. Topology Proceedings, 26:357–365, 2002. [2] S. Alpern and V.S. Prasad. Coding a stationary process to one with prescribed marginals. The Annals of Prob- ability, 17:1658–1663, 1989. [3] S. Bezuglyi, J. Kwiatkowski, and K. Medynets. Approximation in ergodic theory, Borel and Cantor dynamics. Contemporary Mathematics, 385:39–64, 2005. [4] P. Billingsley. Probability and Measure. Wiley Series in Probability and Statistics. Wiley, 1995. [5] V.I. Bogachev. Measure Theory. Number v. 2 in Measure Theory. Springer Berlin Heidelberg, 2007. [6] Ivan A. Canay, Andres Santos, and Azeem M. Shaikh. On the testability of identification in some nonparametric models with endogeneity. Econometrica, 81:2535–2559, 2013. [7] J. R. Choksi and Shizuo Kakutani. Residuality of ergodic measurable transformations and of ergodic transfor- mations which preserve an infinite measure. Indiana University Mathematics Journal, 28(3):453–469, 1979. [8] Ian Domowitz and Mahmoud A. El-Gamal. A consistent test of stationary-ergodicity. Econometric Theory, 9:589–601, 1993. ON THE TESTABILITY OF THE ERGODIC HYPOTHESIS 15

[9] Ian Domowitz and Mahmoud A. El-Gamal. A consistent nonparametric test of ergodicity for time series with applications. Journal of Econometrics, 102:365–398, 2001. [10] Rick Durrett. Probability: Theory and Examples. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 4 edition, 2010. [11] N.A. Friedman. Introduction to ergodic theory. Van Nostrand Reinhold mathematical studies. Van Nostrand Reinhold, 1970. [12] Christian Grillenberger and Ulrich Krengel. On marginal distributions and isomorphisms of stationary processes. Mathematische Zeitschrift, 149:131–154, 1976. [13] Paul R. Halmos. In general a measure preserving transformation is mixing. Annals of Mathematics, 45:786–792, 1944. [14] K. Itˆo. An Introduction to Probability Theory. Cambridge University Press, 1984. [15] O. Kallenberg. Foundations of Modern Probability. Probability and Its Applications. Springer New York, 2002. [16] S. Karlin and H.E. Taylor. A First Course in Stochastic Processes. Elsevier Science, 2012. [17] John C. Kieffer. On the approximation of stationary measures by periodic and ergodic measures. Annals of Probability, 2(3):530 – 534, 1974. [18] John C. Kieffer. On coding a stationary process to achieve a given marginal distribution. The Annals of Proba- bility, 8:131–141, 1980. [19] John C. Kieffer. On obtaining a stationary process isomorphic to a given process with a desired distribution. Monatshefte fur Mathematik, 96:183–193, 1983. [20] Andrew B. Noble. Hypothesis testing for families of ergodic processes. Bernoulli, 12:251–269, 2006. [21] K.R. Parthasarathy. On the category of ergodic measures. Illinois Journal of Mathematics, 5:648–656, 1961. [22] V.A. Rokhlin. Selected topics from the metric theory of dynamical systems. Uspekhi Mat. Nauk, 4:57–128, 1949. [23] Joseph P. Romano. On non-parametric testing, the uniform behaviour of the t-test, and related problems. Scandinavian Journal of Statistics, 31(4):567–584, 2004. [24] Daniil Ryabko. A criterion for hypothesis testing for stationary processes. arXiv e-prints, page arXiv:0905.4937, May 2009. [25] Shalop. Limit of measurable functions is measurable? Mathematics Stack Exchange. URL:https://math.stackexchange.com/questions/1343860/limit-of-measurable-functions-is-measurable. [26] Cesar E. Silva. Invitation to ergodic theory, volume 42 of Student Mathematical Library. American Mathematical Society, 2007. [27] S.M. Srivastava. A Course on Borel Sets. Graduate Texts in Mathematics. Springer New York, 1998.