Bayesian definition of random sequences with respect to conditional probabilities

Hayato Takahashi ∗

April 3, 2020

Abstract. We consider computable joint distributions on sample and parameter spaces (Bayes models). For each random sequence $y^\infty$, we consider the following four sets: (i) $\bigcap_{y\to y^\infty} R^{P(\cdot|y)}$, the limit of the ML-random set with respect to (w.r.t.) $P(\cdot|y)$ as $y \to y^\infty$; (ii) $R^P_{y^\infty}$, the section at $y^\infty$ of the Martin-Löf (ML-)random set $R^P$ w.r.t. the joint distribution; (iii) $R^{P(\cdot|y^\infty)}$, the ML-random set w.r.t. $P(\cdot|y^\infty)$; and (iv) $R^{P(\cdot|y^\infty),y^\infty}$, the ML-random set w.r.t. $P(\cdot|y^\infty)$ with oracle $y^\infty$. The equation $R^{P(\cdot|y^\infty)} = R^{P(\cdot|y^\infty),y^\infty}$ for all $y^\infty$ was first proved by Kjos-Hanssen (2010) for the Bernoulli model and by Bienvenu et al. (2011) for effectively compact, effectively orthogonal models. Effectively orthogonal models imply consistent Bayes models, but the converse is not true. We show that for all random parameters $y^\infty$ the following properties (i)–(iii) are satisfied in Bayes models (Takahashi 2017): (i) $\bigcap_{y\to y^\infty} R^{P(\cdot|y)} \supseteq R^P_{y^\infty} \supseteq R^{P(\cdot|y^\infty),y^\infty}$ and $\bigcap_{y\to y^\infty} R^{P(\cdot|y)} \supseteq R^{P(\cdot|y^\infty)} \supseteq R^{P(\cdot|y^\infty),y^\infty}$; (ii) $\bigcap_{y\to y^\infty} R^{P(\cdot|y)} = R^P_{y^\infty} \supseteq R^{P(\cdot|y^\infty)} \supseteq R^{P(\cdot|y^\infty),y^\infty}$ if the model is consistent; and (iii) $\bigcap_{y\to y^\infty} R^{P(\cdot|y)} = R^P_{y^\infty} = R^{P(\cdot|y^\infty)} = R^{P(\cdot|y^\infty),y^\infty}$ if the model is consistent and the conditional distribution is computable with the parameter as oracle. By modifying the example presented in Bauwens (2017), we demonstrate a consistent Bayes model and a random $y^\infty$ such that $R^P_{y^\infty} \setminus R^{P(\cdot|y^\infty)} \neq \emptyset$. We show that the posterior distribution converges weakly to the unit mass at the ML-random parameters when the model is consistent, and we define the conditional random set as $R^P_{y^\infty}$ (the inverse image of the Bayes estimator) to validate the consistency theorem in Bayes models for individual ML-random parameters and conditional random sequences.

Keywords: Martin-Löf random, generalized van Lambalgen's theorem, conditional probability, consistency theorem, Bayes models, collective

∗Random Lab., Tokyo 121-0062, Japan. Email: [email protected].

1 Introduction

A point is called random with respect to (w.r.t.) a probability measure if it is not rejected by a set of statistical tests under the measure. A random set depends on the class of tests and on the measure. Kolmogorov introduced the idea of recursion theory into probability theory and defined a random set as the complement of the effective null sets [13–15, 17]. An effective null set is defined as the limit of uniformly recursively enumerable sets whose probability tends to zero effectively. Many standard statistical tests have been proven to be effective null sets [16, 20]. The random sets thus defined are called Martin-Löf random (ML-random) sets [17], and the effective null sets are called Martin-Löf tests (ML-tests).

A parametrized family of distributions is called a parametric model. Conditional distributions are parametric models with priors (probability measures on a parameter space). Parametric models without priors are called frequentist models. Statisticians infer the true model from observed data (samples). A true model is a probability measure for which the observed data are random. A measurable function from the sample space to the parameter space is called an estimator.

Definition 1 An estimator $\hat{f}$ is called consistent for the frequentist model $P_\theta$, $\theta \in I$, if $P_\theta(\hat{f}^{-1}(\theta)) = 1$ for all $\theta \in I$. A frequentist model $P_\theta$, $\theta \in I$, is called consistent if there exists a consistent estimator.

For the necessary and sufficient conditions for the consistency of frequentist models, refer to Breiman et al. [5] and Section 4. In a frequentist model $P_\theta$, $\theta \in I$, we often infer the model with the maximum likelihood estimator (MLE) $\hat{f}(x) := \arg\max_{\theta \in I} P_\theta(x)$ for a sample $x$ [30]. von Mises [28] defined the notion of collectives for the Bernoulli model (a frequentist model). A sequence is called a collective (random) if the MLE (the sample mean) converges to the true model and the subsequences obtained by the admissible selection functions are random w.r.t. the true model.

Definition 2 (von Mises collective [28]; pp. 155–156 of [16]) Let $P_\theta$, $\theta \in [0, 1]$, be the Bernoulli model with parameter $\theta$. A sequence $x$ is called a collective if, for the sample mean $f$ (the MLE for the Bernoulli model),
(A1) the true model $P_\theta$ is defined from the sequence $x$, i.e., $f(x) = \theta$,
(B1) the subsequences obtained by the admissible selection functions are random w.r.t. the model, and
(C1) $P_\theta(f^{-1}(\theta)) = 1$ for all $\theta \in [0, 1]$.
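As a concrete illustration of conditions (A1) and (C1) (not part of the original development), the following Python sketch draws a pseudo-random sequence from the Bernoulli model and checks that the sample mean, i.e., the MLE, approaches the true parameter; the function names here are ours.

```python
import random

def bernoulli_prefix(theta, n, seed=0):
    """First n bits of a sequence drawn from the Bernoulli model P_theta."""
    rng = random.Random(seed)
    return [1 if rng.random() < theta else 0 for _ in range(n)]

def sample_mean(bits):
    """The MLE of theta for the Bernoulli model is the sample mean."""
    return sum(bits) / len(bits)

theta = 0.3
for n in (10, 100, 1000, 100000):
    print(n, sample_mean(bernoulli_prefix(theta, n)))
# The printed means approach theta = 0.3: condition (A1) holds along the sequence,
# and it holds with P_theta-probability one, which is condition (C1).
```

A pseudo-random generator only emulates a collective; condition (B1) on admissible selection functions is what the algorithmic theory makes precise.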

In Theorem 8.15 of Bienvenu et al. [4], the notion of collectives was generalized in terms of uniform randomness.

In this study, we investigate random sets w.r.t. conditional distributions. We start with the joint probability measures on sample and parameter spaces, which are called Bayes models. The marginal distributions on the sample and parameter spaces are defined from the joint distribution. The conditional probabilities are defined under the Bayes models as the limit of the conditional probability as the probability of the conditioning event tends to zero, i.e., as the limit of martingale convergence sequences (see [12] or Theorem 2). The conditional probabilities for given sample sequences are called posterior distributions. We do not assume uniform computability of the conditional distributions (see Remark 2).

In the Bayes model, we infer the true model with the posterior distribution. The Bayes model is called consistent when the posterior distribution converges weakly to the true model for almost all parameters. From Theorem 3, when the model is consistent, there exists a surjective measurable function $f$ (Bayes estimator) from the ML-random set on the sample space to the ML-random set on the parameter space¹. We call $f$ an ML-consistent Bayes estimator (Definition 4). For consistent Bayes models, we define the conditional random set w.r.t. the conditional distribution given an ML-random parameter $y^\infty$ as the inverse image $f^{-1}(y^\infty)$ of the ML-consistent estimator. With this definition, the Bayes consistency theorem holds for individual ML-random parameters and the conditional random sequences given the parameter, as follows.

Theorem 1 Assume that the Bayes model is computable and consistent. For every ML-random $y^\infty$ and every random sequence $x^\infty$ w.r.t. the conditional distribution given $y^\infty$, the Bayes estimator converges to $y^\infty$ as $x \to x^\infty$.

1.1 Conditional randomness

Let $\mathbb{N}$ be the set of natural numbers, $\Omega$ the set of infinite binary sequences, and $S$ the set of finite binary strings. Let $\Delta(x) := \{x\omega \mid \omega \in \Omega\}$ be the cylinder set for $x \in S$, where $x\omega$ is the concatenation of $x$ and $\omega \in \Omega$. To clarify the difference between strings and sequences, we use symbols such as $x, y$ for strings and $x^\infty, y^\infty$ for sequences. Let $\lambda$ be the empty word. We write $x \sqsubseteq y$ or $x \sqsubset y$ if $x$ is a prefix of $y$ for $x, y \in S$, including the case $x = y$. Let $\mathcal{B}_1$ and $\mathcal{B}_2$ be the $\sigma$-algebras generated by $\{\Delta(x) \mid x \in S\}$ and $\{\Delta(x) \times \Delta(y) \mid x, y \in S\}$, respectively.

¹This gives an algorithmic solution to a classical statistical problem; see Remark 1.

We study random sets w.r.t. probability measures on $(X \times Y, \mathcal{B}_2)$ and $(\Omega, \mathcal{B}_1)$, where $X = Y = \Omega$. Let $P$ be a probability measure on $(X \times Y, \mathcal{B}_2)$, and let $P_X$ and $P_Y$ be the marginal distributions on $X$ and $Y$, respectively. Let $P(x, y) := P(\Delta(x) \times \Delta(y))$, $P_X(x) := P(\Delta(x) \times \Omega)$, and $P_Y(y) := P(\Omega \times \Delta(y))$. Set $P(x|y) := P(x, y)/P_Y(y)$ if $P_Y(y) \neq 0$, and $P_{Y|X}(y|x) := P(x, y)/P_X(x)$ (the posterior distribution) if $P_X(x) \neq 0$, for all $x, y \in S$. Let $B^c$ be the complement of $B$. For two sets $A$ and $B$, set $A \setminus B := A \cap B^c$. Two probability measures $P$ and $Q$ are called orthogonal if there is a Borel set $A$ such that $P(A) = 1$ and $Q(A) = 0$. For a set $A \subseteq \Omega \times \Omega$, let $A_{y^\infty} := \{x^\infty \mid (x^\infty, y^\infty) \in A\}$ be the section of $A$ at $y^\infty$.

We write $\tilde{A} := \bigcup_{x \in A} \Delta(x)$ for $A \subseteq S$. Let $U \subseteq S \times \mathbb{N}$. $U$ is called a test (effective null set) or an ML-test w.r.t. $(\Omega, \mathcal{B}_1, P)$ if $U$ is recursively enumerable (r.e.), $\tilde{U}_n \supseteq \tilde{U}_{n+1}$ for all $n$, and $P(\tilde{U}_n) < 2^{-n}$, where $U_n = \{x \mid (x, n) \in U\}$. The ML-random set w.r.t. $P$ is defined as the complement of the effective null sets w.r.t. $P$. We denote it by $R^P$, i.e., $R^P := (\bigcup_{U:\mathrm{test}} \bigcap_n \tilde{U}_n)^c$. Let $R^{P,A} := (\bigcup_{U:\mathrm{test}} \bigcap_n \tilde{U}_n)^c$, where $U$ is a test with oracle $A$, i.e., $U$ is r.e. with oracle $A$, $\tilde{U}_n \supseteq \tilde{U}_{n+1}$ for all $n$, and $P(\tilde{U}_n) < 2^{-n}$. Similarly, $R^P$ and $R^{P,A}$ are defined w.r.t. $(\Omega \times \Omega, \mathcal{B}_2, P)$. In [17], Martin-Löf random sets were defined for computable probability measures; however, for simplicity, $R^P$ and $R^{P,A}$ are called ML-random sets.
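To make the test condition concrete, here is a minimal Python sketch (our own toy example, with the uniform measure standing in for a computable $P$): the levels are nested, trivially r.e., and their measures shrink as required, so the single point $000\cdots$ they isolate is an effective null set.

```python
from fractions import Fraction

def measure(prefixes):
    """Uniform measure of the union of cylinders Delta(x) over a prefix-free set."""
    return sum((Fraction(1, 2 ** len(x)) for x in prefixes), Fraction(0))

def U(n):
    """Level n of a toy ML-test: the single cylinder of n+1 leading zeros.
    The levels are nested and their intersection is the singleton {000...}."""
    return {"0" * (n + 1)}

for n in range(1, 6):
    assert measure(U(n)) < Fraction(1, 2 ** n)  # the test condition P(U_n) < 2^{-n}
    print(n, measure(U(n)))                     # 1/4, 1/8, 1/16, ...
```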

We obtain the following theorem from the martingale convergence theorem for individual ML-random sequences [21, 22].

Theorem 2 (Takahashi [21–23]) Let $P$ be a computable probability measure on $(X \times Y, \mathcal{B}_2)$, and

$\forall x \in S,\ y^\infty \in R^{P_Y} \quad P(x|y^\infty) := \lim_{y \to y^\infty} P(x|y)$, if the right-hand side exists. Then

for each $y^\infty \in R^{P_Y}$, $P(\cdot|y^\infty)$ is a probability measure on $(\Omega, \mathcal{B}_1)$. (1)
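The limit in Theorem 2 can be watched numerically in a toy computable joint measure (our own example, not from the paper): let the parameter sequence be uniform and let each sample bit equal the corresponding parameter bit with probability $1 - \varepsilon$. Here the martingale $P(x|y)$ is eventually constant in $|y|$, so the limit defining $P(x|y^\infty)$ plainly exists.

```python
def cond(x, y, eps=0.1):
    """P(x | y) in a toy channel model: the parameter sequence is uniform and
    each sample bit x_i equals y_i with probability 1 - eps, independently."""
    p = 1.0
    for xi, yi in zip(x, y):                   # bits where the parameter is revealed
        p *= (1.0 - eps) if xi == yi else eps
    return p * 0.5 ** max(0, len(x) - len(y))  # unrevealed parameter bits are fair coins

x = "0110"
y_inf = "0110100110010110"                     # a prefix of a parameter sequence
for k in range(1, 9):
    print(k, cond(x, y_inf[:k]))               # stabilizes at 0.9**4 once |y| >= |x|
```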

Definition 3 The family of probability measures $P(\cdot|y^\infty)$, $y^\infty \in R^{P_Y}$, is called a standard conditional distribution.

Two versions of the conditional probability are equivalent if they coincide for almost all $y^\infty$. By the Radon-Nikodým theorem, the standard conditional distribution is a version of the conditional probability [31]. A version of the conditional probability is called regular if it satisfies (1). By Theorem 2, the standard conditional distribution is regular. To argue the theorems in Bayes models for individual ML-random points, e.g., the consistency theorem

of the posterior distributions (Theorem 3), we fix the version of the conditional probability to be the standard conditional distribution. The joint measure $P$ determines the standard conditional distribution and $P_Y$, and vice versa. We now give the definition of consistent Bayes models.

Definition 4 The Bayes model $(X \times Y, \mathcal{B}_2, P)$ is called consistent if there is a measurable $f : X \to Y$ (consistent Bayes estimator) such that $P(f^{-1}(y^\infty) \mid y^\infty) = 1$ for almost all $y^\infty$ w.r.t. $P_Y$. A measurable $f : R^{P_X} \to R^{P_Y}$ is called an ML-consistent Bayes estimator if $P(f^{-1}(y^\infty) \mid y^\infty) = 1$ for all $y^\infty \in R^{P_Y}$.

Let $\delta_{y^\infty}$ be the probability measure such that $\delta_{y^\infty}(\{y^\infty\}) = 1$, i.e., the unit mass at $y^\infty$. The posterior distribution converges weakly to $\delta_{y^\infty}$ as $x \to x^\infty$ (written $P_{Y|X}(\cdot|x) \Rightarrow \delta_{y^\infty}$ for short) if and only if $\lim_{x \to x^\infty} P_{Y|X}(A \mid x) = 1$ for every neighborhood $A$ of $y^\infty$.
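For the Bernoulli model with a uniform prior (a standard textbook case; the code and names are ours), the defining condition $\lim_{x \to x^\infty} P_{Y|X}(A \mid x) = 1$ can be checked numerically: the posterior is Beta-shaped, and the mass of a fixed neighborhood of the true parameter tends to one along a typical sample.

```python
import math
import random

def posterior_mass(bits, lo, hi, grid=2000):
    """P_{Y|X}([lo, hi] | x) for the Bernoulli model with a uniform prior on theta,
    i.e. a Beta(k+1, n-k+1) posterior, integrated on a grid in log space."""
    k, n = sum(bits), len(bits)
    pts = [i / grid for i in range(1, grid)]   # skip the endpoints 0 and 1
    logs = [k * math.log(t) + (n - k) * math.log(1.0 - t) for t in pts]
    m = max(logs)                              # shift by the max to avoid underflow
    dens = [math.exp(v - m) for v in logs]
    return sum(d for t, d in zip(pts, dens) if lo <= t <= hi) / sum(dens)

rng = random.Random(1)
theta = 0.3
bits = [1 if rng.random() < theta else 0 for _ in range(10000)]
for n in (10, 100, 1000, 10000):
    print(n, posterior_mass(bits[:n], theta - 0.05, theta + 0.05))
# The mass of the neighborhood [0.25, 0.35] tends to 1: P_{Y|X}(.|x) => delta_theta.
```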

The following theorem gives equivalent conditions for consistent Bayes models.

Theorem 3 (Takahashi [21, 23, 24]) Let $P$ be a computable probability measure on $(X \times Y, \mathcal{B}_2)$, where $X = Y = \Omega$. The following statements are equivalent:
(i) $P(\cdot \mid y) \perp P(\cdot \mid z)$ if $\Delta(y) \cap \Delta(z) = \emptyset$ for $y, z \in S$.
(ii) $R^{P(\cdot|y)} \cap R^{P(\cdot|z)} = \emptyset$ if $\Delta(y) \cap \Delta(z) = \emptyset$ for $y, z \in S$.
(iii) For $(x^\infty, y^\infty) \in R^P$, $P_{Y|X}(\cdot|x) \Rightarrow \delta_{y^\infty}$ as $x \to x^\infty$.
(iv) $R^P_{y^\infty} \cap R^P_{z^\infty} = \emptyset$ if $y^\infty \neq z^\infty$, for all $y^\infty, z^\infty$.
(v) There is a measurable $f : R^{P_X} \to R^{P_Y}$ (ML-consistent Bayes estimator) such that $P(f^{-1}(y^\infty) \mid y^\infty) = 1$ for all $y^\infty \in R^{P_Y}$.
(vi) There is a measurable $f : X \to Y$ (consistent Bayes estimator) such that $P(f^{-1}(y^\infty) \mid y^\infty) = 1$ for almost all $y^\infty$ w.r.t. $P_Y$.

The proof of Theorem 3 is based on the likelihood ratio test. The equivalence of (i) and (vi) is proved in Breiman et al. [5]. The equivalence of (i) and (ii) is due to Martin-Löf [18] (see also Theorem 4.1 in [24] and Theorem 8.6 in [4]). An ML-random set in a consistent Bayes model is illustrated in Figure 1. By relativizing the random sets with oracle $P$, Theorem 3 holds without assuming the computability of $P$.

Remark 1 The frequentist models investigate the estimators that are consistent for all parameters, while the Bayes models study those that are consistent for almost all parameters (von Mises [28], Doob [7]). The identification of the points at which the posterior distributions weakly converge constitutes the problem (see p. 4 of Diaconis and Freedman [6]).

Theorem 3 (v) shows that the posterior distribution weakly converges to the ML-random parameters if and only if the Bayes model is consistent. Even though Theorem 3 holds for joint probability measures on $\Omega \times \Omega$, Theorem 3 (v) is novel because we do not assume any condition other than the consistency of the model. As for relevant studies, Freedman [8] and Schwartz [19] showed that the posterior distributions weakly converge to the support of the prior under a smoothness condition on the parameters, for parametric models with finite alphabets. Here the complement of the support of a measure is the largest open null set. An ML-random set is a subset of the support of the measure, since open null sets are ML-tests. Freedman [9] showed that the posterior distributions weakly converge to sets of the first category for i.i.d. processes with infinite alphabets. Recall that the ML-random sets are sets of the first category.

[Figure 1 here: the random set $R^P$ inside the square $X \times Y$, with a sample $x^\infty$ marked on the $X$-axis and parameters $y^\infty$, $z^\infty$ marked on the $Y$-axis.]

Figure 1: Random set w.r.t. a consistent Bayes model. The vertical lines show $R^P_{y^\infty}$ and $R^P_{z^\infty}$, i.e., the sections of $R^P$ at the parameters $y^\infty$ and $z^\infty$. $P(R^P_{y^\infty} \mid y^\infty) = 1$ if $y^\infty \in R^{P_Y}$; otherwise $R^P_{y^\infty} = \emptyset$. $R^{P_X}$ is the projection of $R^P$ to $X$ and $R^{P_X} = \bigcup_{y^\infty \in R^{P_Y}} R^P_{y^\infty}$. If $R^P_{y^\infty} \cap R^P_{z^\infty} = \emptyset$ for all $y^\infty, z^\infty \in R^{P_Y}$ with $y^\infty \neq z^\infty$, then there is a function $f : R^{P_X} \to R^{P_Y}$ (ML-consistent Bayes estimator) such that $f(x^\infty) = y^\infty$ for all $(x^\infty, y^\infty) \in R^P$. The points on the vertical lines form strong zero-one sets; see Section 4.

We define conditional randomness for which Theorem 1 is valid.

Definition 5 Suppose that the Bayes model is consistent and $f$ is an ML-consistent Bayes estimator. Then $f^{-1}(y^\infty)$, $y^\infty \in R^{P_Y}$, is the random set w.r.t. the standard conditional distribution $P(\cdot|y^\infty)$, $y^\infty \in R^{P_Y}$. $P(\cdot|y^\infty)$ is called the true model of $x^\infty$ if $x^\infty \in f^{-1}(y^\infty)$.

Definition 5 implies that (A) for all $x^\infty \in R^{P_X}$, there is a unique true model $P(\cdot|f(x^\infty))$; (B) $x^\infty \in f^{-1}(y^\infty)$, i.e., $x^\infty$ is random w.r.t. $P(\cdot|y^\infty)$; and (C) for all $y^\infty \in R^{P_Y}$, $P(f^{-1}(y^\infty) \mid y^\infty) = 1$. Compare statements (A1), (B1), and (C1) in Definition 2 with statements (A), (B), and (C): the collectives, the parameter set $[0, 1]$, and the MLE (sample mean) in Definition 2 are replaced with $R^{P_X}$, $R^{P_Y}$, and Bayes estimators in Definition 5, respectively.

Consider the following sets for $y^\infty \in R^{P_Y}$:
(i) $\bigcap_{y \to y^\infty} R^{P(\cdot|y)}$, the limit of the ML-random set w.r.t. $P(\cdot|y)$ as $y \to y^\infty$;
(ii) $R^P_{y^\infty}$, the section of the global ML-random set $R^P$ at $y^\infty$;
(iii) $R^{P(\cdot|y^\infty)}$, the ML-random set w.r.t. $P(\cdot|y^\infty)$; and
(iv) $R^{P(\cdot|y^\infty),y^\infty}$, the ML-random set w.r.t. $P(\cdot|y^\infty)$ with oracle $y^\infty$.
The relations between the sets (i)–(iv) are shown in Fig. 2. The set $R^{P(\cdot|y^\infty)}$ is also called a Hippocratic [11] (blind [4]) random set. We examine whether $f$ is an ML-consistent Bayes estimator when the conditional random set $f^{-1}(y^\infty)$ is taken to be each of the sets above.

Theorem 4 (i) Suppose that the Bayes model is consistent. Let $f(x^\infty) := y^\infty$ if $(x^\infty, y^\infty) \in R^P$. The function $f$ is an ML-consistent Bayes estimator and $f^{-1}(y^\infty) = \bigcap_{y \to y^\infty} R^{P(\cdot|y)} = R^P_{y^\infty}$ for all $y^\infty \in R^{P_Y}$.
(ii) Let $f$ be a function defined by $f^{-1}(y^\infty) = R^{P(\cdot|y^\infty)}$ for all $y^\infty \in R^{P_Y}$, or by $f^{-1}(y^\infty) = R^{P(\cdot|y^\infty),y^\infty}$ for all $y^\infty \in R^{P_Y}$. There is a consistent Bayes model in which $f$ is not an ML-consistent Bayes estimator.

Proof of (i). From Theorem 3, $f$ is an ML-consistent Bayes estimator. From Theorem 7, $f^{-1}(y^\infty) = R^P_{y^\infty} = \bigcap_{y \to y^\infty} R^{P(\cdot|y)}$ for all $y^\infty \in R^{P_Y}$, and we have statement (i).

Proof of (ii). Theorem 8 shows a consistent Bayes model such that $R^{P_X} \setminus \bigcup_{y^\infty \in R^{P_Y}} R^{P(\cdot|y^\infty),y^\infty} \supseteq R^{P_X} \setminus \bigcup_{y^\infty \in R^{P_Y}} R^{P(\cdot|y^\infty)} \neq \emptyset$, and we have (ii).

From Theorem 7, we have $\bigcap_{y \to y^\infty} R^{P(\cdot|y)} \supseteq R^P_{y^\infty}$ for all $y^\infty \in R^{P_Y}$. When the model is not consistent, by Lemma 2, there is an example such that $\bigcap_{y \to y^\infty} R^{P(\cdot|y)} \setminus R^P_{y^\infty} \neq \emptyset$ for all $y^\infty \in R^{P_Y}$.

By choosing the smaller set from $\bigcap_{y \to y^\infty} R^{P(\cdot|y)}$ and $R^P_{y^\infty}$, as in [22], we propose the following definition.

Definition 6 Let $P$ be a computable joint probability measure on $(X \times Y, \mathcal{B}_2)$. Let $R^P_{y^\infty}$, $y^\infty \in R^{P_Y}$, be the set of random sequences w.r.t. the standard conditional probability.

The conditional random sets defined in Definition 6 satisfy Theorem 1. However, if, for example, $R^{P(\cdot|y^\infty)}$ is taken as the conditional random set w.r.t. $P(\cdot|y^\infty)$, then, by Theorem 4, there is a consistent Bayes model that does not satisfy Theorem 1. The conditional random sets defined in Definition 6 satisfy most of the theorems in [22, 24].

In Section 2, we study ML-random sets in Bayes models. In Section 3, we prove the relations between the sets (i)–(iv) shown in the diagram of Figure 2. In Section 4, we compare our definition of random sets in Bayes models with uniform randomness [4, 10, 11] in frequentist models.

2 ML-random sets and Bayes models

We apply randomness theory to Bayes models. The van Lambalgen theorem [27] states that a pair of sequences $(x^\infty, y^\infty) \in \Omega^2$ is ML-random w.r.t. the product of uniform measures if and only if $x^\infty$ is ML-random and $y^\infty$ is ML-random with oracle $x^\infty$. The following theorem is a generalized van Lambalgen theorem.

Theorem 5 (Takahashi [21–24]) Let $P$ be a computable probability measure on $(X \times Y, \mathcal{B}_2)$. Then,

$R^P_{y^\infty} \supseteq R^{P(\cdot|y^\infty),y^\infty}$ for all $y^\infty \in R^{P_Y}$. (2)

Fix $y^\infty \in R^{P_Y}$ and suppose that $P(\cdot|y^\infty)$ is computable with oracle $y^\infty$. Then,

$R^P_{y^\infty} = R^{P(\cdot|y^\infty),y^\infty}$. (3)

Remark 2 1. Vovk and V'yugin [29] proved (3) for uniformly computable conditional distributions. If the conditional distribution is uniformly computable, then the uniform random set w.r.t. $P(\cdot|y^\infty)$ is equal to $R^{P(\cdot|y^\infty),y^\infty}$ (refer to Theorem 5.3 in [22] and Lemma 4.22 in [4] for the Bernoulli model).
2. Theorem 5.2 in [22] demonstrates equation (2) when $P(\cdot|y^\infty)$ is computable with oracle $y^\infty$. However, the same proof of (2) holds true even when

[Figure 2: two inclusion diagrams, reconstructed in text form.]
(a) General case: (i) $R^P_{y^\infty} \subseteq \bigcap_{y \to y^\infty} R^{P(\cdot|y)}$; (ii) $R^{P(\cdot|y^\infty)} \subseteq \bigcap_{y \to y^\infty} R^{P(\cdot|y)}$; (iii) $R^{P(\cdot|y^\infty),y^\infty} \subseteq R^P_{y^\infty}$; (iv) $R^{P(\cdot|y^\infty),y^\infty} \subseteq R^{P(\cdot|y^\infty)}$.
(b) Consistent case: (v) $R^{P(\cdot|y^\infty)} \subseteq R^P_{y^\infty} = \bigcap_{y \to y^\infty} R^{P(\cdot|y)}$; (vi) $R^{P(\cdot|y^\infty),y^\infty} \subseteq R^{P(\cdot|y^\infty)}$.

Figure 2: Relations between conditional random sets. Panel (a) shows the relations between conditional ML-random sets in general; panel (b) shows the relations when the Bayes model is consistent. If the conditional probability is computable with an ML-random oracle $y^\infty$ and the Bayes model is consistent, then the four sets in panel (b) are equal. Theorem 8 gives a counterexample to the equalities in (ii) and (v). Lemma 2 gives a counterexample to the equalities in (i) and (iv). Lemma 2 and Theorem 8 together show that $R^P_{y^\infty}$ and $R^{P(\cdot|y^\infty)}$ are incomparable in the general case. The counterexample to equality in (iii) is due to Bauwens [2, 3]. The equality in (vi) for consistent frequentist models was presented in Kjos-Hanssen [11] and Bienvenu et al. [4]; see Section 4. An example of the equality in (vi) for non-uniformly-computable orthogonal conditional distributions is presented in Theorem 8. It remains open whether the equality in (vi) always holds for all ML-random $y^\infty$.

$P(\cdot|y^\infty)$ is not computable with oracle $y^\infty$. Equation (3) holds without assuming uniform computability of the conditional distributions.
3. Non-computable conditional distributions are presented in the work of Ackerman et al. [1]. Bauwens [2] showed an example that violates the equality in (3), in which $R^{P(\cdot|y^\infty),y^\infty}$ is a proper subset of $R^P_{y^\infty}$ in (2), when the conditional distribution is not computable with oracle $y^\infty$. In [25], an example is shown in which the conditional distributions are not computable with oracle $y^\infty$ for all $y^\infty$, yet (3) holds. For more details on the generalized van Lambalgen theorem, see the survey [3].

Theorem 6 (Takahashi [21–23]) Let $P$ be a computable probability measure on $(X \times Y, \mathcal{B}_2)$. Then

$P(R^P_{y^\infty} \mid y^\infty) = 1$ if $y^\infty \in R^{P_Y}$, and $R^P_{y^\infty} = \emptyset$ otherwise.

$R^{P_X} = \bigcup_{y^\infty \in R^{P_Y}} R^P_{y^\infty}$.

3 Variants of conditional random sets

We show the diagram in Fig. 2.

Lemma 1 (Takahashi [26]) Let $P$ be a computable probability measure on $X \times Y$, where $X = Y = \Omega$, and let $y^\infty \in R^{P_Y}$. Then

$\forall y \sqsubset y^\infty \quad R^{P(\cdot|y^\infty)} \subseteq R^{P(\cdot|y)}$.

Proof) First we prove the lemma for $y = \lambda$. Let $y = \lambda$; then $P(\cdot|y) = P_X$. Let $U^X$ be a test w.r.t. $P_X$ and set $U^{X \times \lambda} := \{(x, \lambda, n) \mid (x, n) \in U^X\}$. Then $U^{X \times \lambda}$ is a test w.r.t. $P$. Since $(\tilde{U}^{X \times \lambda}_n)_{y^\infty} = \tilde{U}^X_n$, from Corollary 4.1 in [22], for all $y^\infty \in R^{P_Y}$ there is an integer $M$ such that

$\forall k \quad P\left(\sum_n I_{\tilde{U}^X_n} > k \,\middle|\, y^\infty\right) < \frac{M}{k},$

where $I$ is the characteristic function, i.e., $I_{\tilde{U}^X_n}(x^\infty) = 1$ if $x^\infty \in \tilde{U}^X_n$ and $0$ otherwise. Thus the sets $V_k := \{\sum_n I_{\tilde{U}^X_n} > M 2^k\}$ satisfy $P(V_k \mid y^\infty) < M/(M 2^k) = 2^{-k}$, so $V$ is a test w.r.t. $P(\cdot|y^\infty)$, and the lemma is proved for $y = \lambda$. We can show the lemma for any finite prefix $y \sqsubset y^\infty$ similarly.

Theorem 7 (Takahashi [26]) Let $P$ be a computable probability measure on $(X \times Y, \mathcal{B}_2)$. Then

$\bigcap_{y \to y^\infty} R^{P(\cdot|y)} \supseteq R^{P(\cdot|y^\infty)}$ and $\bigcap_{y \to y^\infty} R^{P(\cdot|y)} \supseteq R^P_{y^\infty}$ for all $y^\infty \in R^{P_Y}$. (4)

If $P$ is consistent, we have

$\bigcap_{y \to y^\infty} R^{P(\cdot|y)} = R^P_{y^\infty} \supseteq R^{P(\cdot|y^\infty)} \supseteq R^{P(\cdot|y^\infty),y^\infty}$ for all $y^\infty \in R^{P_Y}$. (5)

Fix $y^\infty \in R^{P_Y}$ and suppose that $P(\cdot|y^\infty)$ is computable with oracle $y^\infty$. If $P$ is consistent, we have

$\bigcap_{y \to y^\infty} R^{P(\cdot|y)} = R^P_{y^\infty} = R^{P(\cdot|y^\infty)} = R^{P(\cdot|y^\infty),y^\infty}$. (6)

Proof) As in the proof of Theorem 6, it can be shown that

$\forall y \quad R^{P(\cdot|y)} = \bigcup_{y^\infty \in R^{P_Y} \cap \Delta(y)} R^P_{y^\infty}$. (7)

From Lemma 1, we obtain (4). Suppose that $R^P_{y^\infty} \cap R^P_{z^\infty} = \emptyset$ if $y^\infty \neq z^\infty$. Then from (7), we have

$\bigcap_{y \to y^\infty} R^{P(\cdot|y)} = R^P_{y^\infty}$.

From Lemma 1 and Theorem 3, we obtain (5). The third part follows from Theorem 5 and (5).

The following lemma gives a counterexample to the equalities in (i) and (iv) of Fig. 2.

Lemma 2 Let $P := U \times U$, where $U$ is the uniform probability measure on $\Omega$. Then $\bigcap_{y \to y^\infty} R^{P(\cdot|y)} \setminus R^P_{y^\infty} \neq \emptyset$ and $R^{P(\cdot|y^\infty)} \setminus R^{P(\cdot|y^\infty),y^\infty} \neq \emptyset$ for $y^\infty \in R^U$.

Proof) The diagonal set is covered by a test, i.e., $\{(x^\infty, x^\infty) \mid x^\infty \in \Omega\} \subseteq (R^P)^c$. Let $y^\infty \in R^U$. From van Lambalgen's theorem, we have $y^\infty \notin R^{U,y^\infty} = R^P_{y^\infty} \subseteq R^U = \bigcap_{y \to y^\infty} R^{P(\cdot|y)}$.
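The first step of the proof, that the diagonal is an effective null set, amounts to the following computation (a sketch in our notation): the sets $V_n := \bigcup_{|s|=n} \Delta(s) \times \Delta(s)$ are uniformly r.e., nested, contain the diagonal, and have $P$-measure $2^{-n}$.

```python
from fractions import Fraction

def diagonal_cover_measure(n):
    """P(V_n) under P = U x U, where V_n is the union over all strings s of
    length n of Delta(s) x Delta(s): 2^n squares, each of measure 4^{-n}."""
    return Fraction(2 ** n, 4 ** n)

for n in range(1, 6):
    print(n, diagonal_cover_measure(n))  # 1/2, 1/4, 1/8, ...: a Martin-Loef test
```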

To show an example in which $R^P_{y^\infty} \setminus R^{P(\cdot|y^\infty)} \neq \emptyset$ in (5), we construct a consistent Bayes model with the properties (i)–(iv) listed in Theorem 8 by modifying the Bauwens example [2, 3]. Note that the example in [2, 3] does not imply (iv) in Theorem 8.

[Figure 3 here: the square $X \times Y$, with $\alpha$ marked on the $X$-axis, the parameters $10^\infty$, $110^\infty$, $1^\infty$ marked on the $Y$-axis, and the blocks $B_1$, $B_2$ indicated along the support.]

Figure 3: Construction of the counterexample. The joint measure $P$ concentrates on the thick lines. $P$ is consistent. The conditional distribution $P(\cdot|y^\infty)$ is not continuous at the parameter $y^\infty = 1^\infty$.

Theorem 8 There is a probability measure $P$ on $(X \times Y, \mathcal{B}_2)$ with the properties that
(i) $P$ is computable,
(ii) $1^\infty \in R^{P_Y}$ and $P(\cdot|1^\infty)$ is not computable,
(iii) $P(\cdot|y) \perp P(\cdot|z)$ for all $\Delta(y) \cap \Delta(z) = \emptyset$, and
(iv) $R^P_{1^\infty} \setminus R^{P(\cdot|1^\infty)} \neq \emptyset$ and $R^{P_X} \setminus \bigcup_{y^\infty \in R^{P_Y}} R^{P(\cdot|y^\infty)} \neq \emptyset$.
Here, $1^\infty$ is the sequence consisting of 1s. Except in clearly stated cases such as this, symbols like $x^\infty$ denote sequences and should not be confused with the repetition of a string.

Proof of Theorem 8) First, we prove (i). Let $P_X$ be the uniform distribution on $X$. Let $\alpha' = \sum_{s \in S} 2^{-K(s)}$, where $K$ is the prefix complexity. Let $\alpha \in \Omega$ be the binary expansion of $\alpha'$; then $\alpha \in R^{P_X}$ [16]. Let $g(s) := \sum_{1 \le i \le k} 2^{-i} s_i$ and $|s| := k$ for $s = s_1 \cdots s_k$, $s_i \in \{0, 1\}$ for $1 \le i \le k$. Consider a computable sequence of strings $a_1, a_2, \ldots$ such that $g(a_1) < g(a_2) < \cdots < \alpha'$ and $\lim_i g(a_i) = \alpha'$. For all $i \in \mathbb{N}$, set

$A_i := \{s \in S \mid |s| = |a_i|,\ g(s) < g(a_i)\}$,

$B_i := \tilde{A}_i \setminus \tilde{A}_{i-1}$ for $i \ge 1$, and $A_0 = \emptyset$.

We define the measure $P$ on $X \times Y$ as follows. For all $x \in S$ and $i \in \mathbb{N}$,

define

$P(\lambda, 0) := 0$,
$P((B_i \cap \Delta(x)) \times \{1^i 0^\infty\}) := P_X(B_i \cap \Delta(x))$,
$P(B_i^c \times \{1^i 0^\infty\}) := 0$,
$P(((\tilde{A}_{i-1})^c \cap \Delta(x)) \times \Delta(1^i)) := P_X((\tilde{A}_{i-1})^c \cap \Delta(x))$, and
$P(\tilde{A}_{i-1} \times \Delta(1^i)) := 0$,

where $1^i$ is the string consisting of $i$ 1s and $0^\infty$ is the sequence consisting of 0s. For the construction of the measure, see Fig. 3. By construction, the total measure of $P$ is one. Let $C := \bigcup_i B_i \times \Delta(1^i 0)$. For all $x, y \in S$, we have

$P(x, y) = P((\Delta(x) \times \Delta(y)) \cap C)$ if $y \neq 1^{|y|}$, and $P(x, y) = P_X(\Delta(x) \cap (\tilde{A}_{|y|-1})^c)$ if $y = 1^{|y|}$. Since the right-hand sides are computable, $P$ is computable.

Next we show (ii). Since $P(\Omega \times \Delta(1^k)) = P_X((\tilde{A}_{k-1})^c)$, we have $P_Y(\{1^\infty\}) = \lim_k P(\Omega \times \Delta(1^k)) = P_X(\bigcap_k (\tilde{A}_k)^c) = 1 - \alpha' > 0$, and hence $1^\infty \in R^{P_Y}$. We have $P(x|1^\infty) = 0$ if $\Delta(x) \subseteq \bigcup_i \tilde{A}_i$, and $P(x|1^\infty) = P_X(x)/(1 - \alpha')$ if $\Delta(x) \subseteq \bigcap_i (\tilde{A}_i)^c$. Since $\alpha'$ is not computable, $P(\cdot|1^\infty)$ is not computable.

By construction, we have $P_Y(\{1^i 0^\infty\}) > 0$ for all $i$ and $P_Y(\{1^\infty\}) > 0$. Since $P_Y((\bigcup_i \{1^i 0^\infty\} \cup \{1^\infty\})^c) = 0$, this complement is covered by a test, and we have $R^{P_Y} = \bigcup_i \{1^i 0^\infty\} \cup \{1^\infty\}$. By construction, the measures $P(\cdot|y^\infty)$ are mutually orthogonal for different $y^\infty \in R^{P_Y}$; we obtain statement (iii) of the theorem.

Similarly to [2, 3], set $U_n := \{s \mid \exists i\ g(s) < g(a_i) + \frac{1}{n}\}$. Then $U := \{(s, n) \mid s \in U_n\}$ is r.e. and $P(\tilde{U}_n | 1^\infty) \le 1/n$ for all $n$, so $U$ yields a test that covers $\alpha$, i.e., $\alpha \notin R^{P(\cdot|1^\infty)}$.

Let $b_1 \sqsubseteq b_2 \sqsubseteq \cdots$ be an increasing sequence of prefixes of $\alpha$ such that $b_n \to \alpha$ as $n \to \infty$. By construction, $\forall k\ \exists M\ \forall n \ge M\quad P_{Y|X}(1^k | b_n) = 1$. Hence $P_{Y|X}(1^k | \alpha) = 1$ for all $k$, $P_{Y|X}(\{1^\infty\} | \alpha) = 1$, and $1^\infty \in R^{P_{Y|X}(\cdot|\alpha),\alpha}$. Since $P_{Y|X}(\cdot|\alpha)$ is computable with oracle $\alpha \in R^{P_X}$, from Theorem 5 we have $(\alpha, 1^\infty) \in R^P$ and $\alpha \in R^P_{1^\infty}$. Since $\alpha \notin R^{P(\cdot|1^\infty)}$, from Theorem 7 we obtain the first part of statement (iv) of the theorem.

Since $\alpha \in R^P_{1^\infty}$, $\alpha \notin R^{P(\cdot|1^\infty)}$, the sections $R^P_{y^\infty}$ are disjoint for different $y^\infty \in R^{P_Y}$ (Theorem 3), and $R^P_{y^\infty} \supseteq R^{P(\cdot|y^\infty)}$ for all $y^\infty \in R^{P_Y}$ (Theorem 7), we obtain the latter part of statement (iv) of the theorem.

The conditional distribution $P(\cdot|y^\infty)$ in Theorem 8 is not continuous at $1^\infty$, and the model is not effectively compact. $P(\cdot|1^\infty)$ is not computable with

oracle $1^\infty$; however, $R^{P(\cdot|1^\infty)} = R^{P(\cdot|1^\infty),1^\infty}$. The author does not know whether the statement $R^{P(\cdot|y^\infty)} = R^{P(\cdot|y^\infty),y^\infty}$ for all $y^\infty \in R^{P_Y}$ is always true for consistent Bayes models. Lemma 2 and Theorem 8 show that $R^P_{y^\infty}$ and $R^{P(\cdot|y^\infty)}$ are incomparable.
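Although $K$ and $\alpha'$ are not computable, the combinatorics of the sets $A_i$ and $B_i$ in the proof of Theorem 8 can be made concrete. In the sketch below (ours), a short hand-picked sequence with increasing $g$-values stands in for the computable approximations $a_i$; the printed measures confirm that $\tilde{A}_i$ has uniform measure $g(a_i)$ and that the blocks $B_i$ partition the growing union.

```python
from fractions import Fraction

def g(s):
    """g(s) = sum_i s_i 2^{-i}, the dyadic value of the binary string s."""
    return sum((Fraction(int(b), 2 ** i) for i, b in enumerate(s, start=1)), Fraction(0))

def A(a):
    """A_i = { s : |s| = |a_i|, g(s) < g(a_i) }; the union of the cylinders
    Delta(s), s in A_i, has uniform measure exactly g(a_i)."""
    n = len(a)
    return {format(m, "0%db" % n) for m in range(2 ** n) if Fraction(m, 2 ** n) < g(a)}

# Toy stand-in for a_1, a_2, ... (of increasing length and increasing g-value);
# the real a_i computably approximate the non-computable alpha' = sum_s 2^{-K(s)}.
a_seq = ["01", "0110", "011011"]
prev = None
for a in a_seq:
    Ai = A(a)
    Bi = Ai if prev is None else {s for s in Ai if s[:len(prev)] not in A(prev)}
    print(a, "g =", g(a), "measure of B_i =", sum(Fraction(1, 2 ** len(a)) for _ in Bi))
    prev = a
```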

4 Randomness in frequentist and Bayes models

We study randomness in consistent frequentist models and consistent Bayes models. We compare (6) in Theorem 7 with the uniform randomness in effectively compact, effectively orthogonal models (Theorem 10) in the framework of Bayes and frequentist models. Equation (6) in Theorem 7 is valid for consistent Bayes models. Corollary 2 shows that effectively orthogonal models imply consistent Bayes models, but the converse is not true. A set $A \in \mathcal{B}_1$ is called a strong zero-one set if, for all $\theta$, $P_\theta(A) = 1$ or $0$. The following is a special case of a result shown in [5].

Theorem 9 (Breiman et al. [5]) A frequentist model is consistent if and only if the $\sigma$-algebra generated by the collection $\{\{\theta \mid P_\theta(A) = 1\} \mid A \text{ is a strong zero-one set}\}$ equals $\mathcal{B}_1$.

Effectively compact models [4] are frequentist models with infinite-dimensional parameters. In this paper, for simplicity, we consider effectively compact models with one-dimensional parameters. Let $R^{P_\theta,\theta}$, $\theta \in \Omega$, be the uniform random sets for an effectively compact model. In [4], an effectively compact model is called effectively orthogonal if $R^{P_\theta,\theta} \cap R^{P_{\theta'},\theta'} = \emptyset$ for $\theta \neq \theta'$.

Corollary 1 A frequentist model is consistent if it is effectively orthogonal.

Proof) Let $C := \{\{\theta \mid P_\theta(A) = 1\} \mid A \text{ is a strong zero-one set}\}$ for an effectively orthogonal model. For an open set $O$, $\bigcup_{\theta \in O} R^{P_\theta,\theta}$ is a strong zero-one set and $O = \{\theta \mid P_\theta(\bigcup_{\theta' \in O} R^{P_{\theta'},\theta'}) = 1\}$; hence $C$ contains every open set and the $\sigma$-algebra generated by $C$ is $\mathcal{B}_1$. By Theorem 9, we have the proof.

Kjos-Hanssen [11] showed (8) for the Bernoulli model. The following is a generalization of the Kjos-Hanssen theorem.

Theorem 10 (Theorem 5.41 in Bienvenu et al. [4]) Let $P_\theta$ be an effectively compact, effectively orthogonal model for $\theta \in \Omega$. Then

$R^{P_\theta} = R^{P_\theta,\theta}$ for all $\theta$. (8)

Theorem 11 (Doob [7]) A Bayes model is consistent if the frequentist model is consistent.

There are counterexamples to the converse of Theorem 11.

Example 1 1. Let $P_\theta$, $\theta \in \Omega$, be any parametric model, and let the prior $m$ be the unit mass at $0^\infty$, i.e., $m(\{0^\infty\}) = 1$. Then the Bayes model $P_\theta$, $\theta \in \Omega$, with prior $m$ consists of the single model $P_{0^\infty}$, which is consistent. In particular, if the frequentist model is not consistent, this example shows that the converse of Theorem 11 is not true.
2. Let $P_\theta$, $\theta \in [0, 1]$, be the Bernoulli model, i.e., $P_\theta(x) := \theta^{\sum_i x_i}(1 - \theta)^{n - \sum_i x_i}$ for $x = x_1 x_2 \cdots x_n \in S$. Let $Q_0(x) := \frac{1}{2}(P_0(x) + P_1(x))$ and $Q_\theta(x) := P_\theta(x)$ if $\theta \in (0, 1]$, for all $x$. The model $Q_\theta$, $\theta \in [0, 1]$, is uniformly computable. Suppose that there is a measurable $\hat{f} : \Omega \to [0, 1]$ such that $Q_\theta(A_\theta) = 1$ for all $\theta$, where $A_\theta := \{x^\infty \mid \hat{f}(x^\infty) = \theta\}$. Since $Q_0(R^{P_1}) = \frac{1}{2}$ and $Q_1(R^{P_1}) = 1$, we have $A_0 \cap A_1 \neq \emptyset$ (both sets must contain $1^\infty$); this contradicts the disjointness of the fibers $A_\theta$, and $Q_\theta$, $\theta \in [0, 1]$, is not consistent (cf. Theorem 9). Let $U$ be the uniform measure on $[0, 1]$; then $0$ and $1$ are not random w.r.t. $U$. The Bayes model $Q_\theta$, $\theta \in R^U$, with prior $U$ is consistent, since it is equivalent to $P_\theta$, $\theta \in R^U$, with prior $U$. Since the model $Q_\theta$, $\theta \in [0, 1]$, is not continuous at $\theta = 0$, it is not closed.
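A two-line computation (ours) makes the failure in part 2 visible: both $\theta = 0$ and $\theta = 1$ put positive mass on the single sequence $1^\infty$, so no estimator can assign it to both fibers.

```python
def P(theta, x):
    """Bernoulli model P_theta(x) for a binary string x (note 0.0 ** 0 == 1.0)."""
    k = x.count("1")
    return theta ** k * (1.0 - theta) ** (len(x) - k)

def Q(theta, x):
    """The modified model of Example 1.2: Q_0 = (P_0 + P_1) / 2, else Q_theta = P_theta."""
    return 0.5 * (P(0.0, x) + P(1.0, x)) if theta == 0.0 else P(theta, x)

x = "1" * 20
print(Q(0.0, x), Q(1.0, x))  # 0.5 and 1.0 on the cylinder of 1^20, for every length:
                             # theta = 0 and theta = 1 both charge the point 1^infty
```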

Corollary 2 A Bayes model is consistent if it is effectively orthogonal, but the converse is not true.

Proof) By Theorem 11 and Example 1, consistent frequentist models imply consistent Bayes models, but the converse is not true. By Corollary 1, we have the proof.

If a frequentist model is uniformly computable and consistent, both (6) in Theorem 7 and Theorem 10 are valid. Thus, Theorem 10 is valid for all parameters, while (6) in Theorem 7 is valid for the ML-random parameters. Schwartz [19] presented Bayes models that are not consistent at some parameters. Equation (6) in Theorem 7 cannot be extended to all parameters: there is a point that is not ML-random w.r.t. any computable prior, since the ML-random sets are sets of the first category and, by the Baire category theorem, the complement of the countable union of the ML-random sets is not empty. See [4] and Gács [10] for more details on uniform random sets.

Acknowledgment

The author wishes to thank the anonymous referees for insightful comments, which helped in highlighting the relevant results and classical statistical problems, and Editage (www.editage.com) for English language editing. A part of this work was supported by KAKENHI (24540153) and has been presented at ISIT2017; MSJ2017 at Tokyo Metropolitan Univ.; MSJ2017 at Yamagata Univ.; MSJ2020 (online presentation) at Nihon Univ.; SITA2017, Niigata; CCR 2017, Mysore; the Ergodic Theory Seminar at Tsukuba Univ. (2016); and the Probability Seminar at Kyoto Univ. (2017).

References

[1] N. L. Ackerman, C. E. Freer, and D. M. Roy, On the computability of conditional probability, IEEE 26th Annual Symposium on Logic in Computer Science, 2011, arXiv:1005.3014, pp. 107–116.
[2] B. Bauwens, Conditional measure and the violation of van Lambalgen's theorem for Martin-Löf randomness, Theory Comput. Syst. 60 (2017), no. 2, 314–323, arXiv:1103.1529.
[3] B. Bauwens, A. Shen, and H. Takahashi, Conditional probabilities and van Lambalgen's theorem revisited, Theory Comput. Syst. 61 (2017), no. 4, 1315–1336.
[4] L. Bienvenu, P. Gács, M. Hoyrup, C. Rojas, and A. Shen, Algorithmic tests and randomness with respect to a class of measures, Proc. of the Steklov Institute of Mathematics 274 (2011), 41–102, arXiv:1103.1529v2.
[5] L. Breiman, L. LeCam, and L. Schwartz, Consistent estimates and zero-one sets, Ann. Math. Statist. 35 (1964), no. 1, 157–161.
[6] P. Diaconis and D. Freedman, On the consistency of Bayes estimates, Ann. Statist. 14 (1986), no. 1, 1–26.
[7] J. L. Doob, Application of the theory of martingales, Le Calcul des Probabilités et ses Applications, Colloq. Intern. du C.N.R.S. (Paris), 1948, pp. 22–28.
[8] D. A. Freedman, On the asymptotic behavior of Bayes' estimates in the discrete case, Ann. Math. Statist. 34 (1963), 1386–1403.
[9] D. A. Freedman, On the asymptotic behavior of Bayes' estimates in the discrete case II, Ann. Math. Statist. 36 (1965), 454–456.
[10] P. Gács, Uniform test of algorithmic randomness over a general space, Theoret. Comput. Sci. 341 (2005), 91–137.
[11] B. Kjos-Hanssen, The probability distribution as a computational resource for randomness testing, Journal of Logic and Analysis 2 (2010), no. 10, 1–13.
[12] A. N. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung, Ergebnisse der Mathematik, vol. 2, no. 3, Springer, Berlin, 1933.

[13] A. N. Kolmogorov, On tables of random numbers, Sankhyā 25 (1963), 369–376.
[14] A. N. Kolmogorov, Three approaches to the quantitative definition of information, Probl. Inf. Transm. 1 (1965), no. 1, 1–7.
[15] A. N. Kolmogorov, Logical basis for information theory and probability theory, IEEE Trans. Inform. Theory 14 (1968), no. 5, 662–664.
[16] M. Li and P. Vitányi, An introduction to Kolmogorov complexity and its applications, third ed., Springer, New York, 2008.
[17] P. Martin-Löf, The definition of random sequences, Information and Control 9 (1966), 602–619.
[18] P. Martin-Löf, Notes on constructive mathematics, Almqvist & Wiksell, Stockholm, 1968.
[19] L. Schwartz, On Bayes procedures, Z. Wahrscheinlichkeitstheorie 4 (1965), 10–26.
[20] A. Shen, V. A. Uspensky, and N. K. Vereshchagin, Kolmogorov complexity and algorithmic randomness, AMS, 2017.
[21] H. Takahashi, Bayesian approach to a definition of random sequences and its applications to statistical inference, 2006 IEEE International Symposium on Information Theory, July 2006, pp. 2180–2184.
[22] H. Takahashi, On a definition of random sequences with respect to conditional probability, Inform. and Comput. 206 (2008), 1375–1382.
[23] H. Takahashi, Some problems of algorithmic randomness on product space, The 8th Workshop on Stochastic Numerics, RIMS Kôkyûroku, no. 1620, Kyoto University, 2009, pp. 175–196.
[24] H. Takahashi, Algorithmic randomness and monotone complexity on product space, Inform. and Comput. 209 (2011), 183–197.
[25] H. Takahashi, Generalization of van Lambalgen's theorem and blind randomness for conditional probabilities, 2014, arXiv:1310.0709v3.
[26] H. Takahashi, Bayesian definition of random sequences with respect to conditional probabilities, 2017 IEEE International Symposium on Information Theory, June 2017, pp. 1296–1300.
[27] M. van Lambalgen, Random sequences, Ph.D. thesis, Universiteit van Amsterdam, 1987.
[28] R. von Mises, Probability, statistics and truth, Dover, 1981.
[29] V. G. Vovk and V. V. V'yugin, On the empirical validity of the Bayesian method, J. R. Stat. Soc. B 55 (1993), no. 1, 253–266.
[30] A. Wald, Note on the consistency of the maximum likelihood estimate, Ann. Math. Statist. 20 (1949), 595–601.

[31] D. Williams, Probability with martingales, Cambridge University Press, Cambridge, 1991.
