<<

Condorcet Jury Theorem: The dependent case Bezalel Peleg and Shmuel Zamir1 Center for the Study of Rationality The Hebrew University of Jerusalem.

Abstract We provide an extension of the Condorcet Theorem. Our model includes both the Nitzan-Paroush framework of “unequal competencies” and Ladha’s model of “corre- lated by the jurors.” We assume that the jurors behave “informatively”; that is, they do not make a strategic use of their information in voting. Formally, we consider a of binary random variables X = (X1,X2,...,Xn,...) with range in 0,1 and a joint probability distribution P. The pair (X,P) is said to satisfy the Con- { } Σn n dorcet Jury Theorem (CJT) if limn ∞ P i=1Xi > 2 = 1. For a general (dependent) distribution P we provide necessary→ as well as sufficient conditions for the CJT that  establishes the validity of the CJT for a domain that strictly (and naturally) includes the domain of independent jurors. In particular we provide a full characterization of the exchangeable distributions that satisfy the CJT. We exhibit a large family of dis- 1 ∞ Σn Σ tributions P with liminfn n(n 1) i=1 j=iCov(Xi,Xj) > 0 that satisfy the CJT. We → − 6 do that by “interlacing” carefully selected pairs (X,P) and (X ′,P′). We then proceed to project the distributions P on the planes (p,y), and (p,y∗) where p = liminfn ∞ pn, 2 → y = liminfn ∞ E(Xn pn) , y∗ = liminfn ∞ E Xn pn , pn = (p1 + p2,... + pn)/n, → − → | − | and Xn = (X1 + X2,... + Xn)/n. We determine all feasible points in each of these planes. Quite surprisingly, many important results on the possibility of the CJT are obtained by analyzing various regions of the feasible set in these planes. P ∞ P In the space of all probability distributions on Sp = 0,1 , let 1 be the set P { }P P P of all probability distributions in that satisfy the CJT and let 2 = 1 be the set of all probability distributions in P that do not satisfy the CJT. We\ prove that P P P P both 1 and 2 are convex sets and that 2 is dense in (in the weak topology). Using an appropriate separation theorem we then provide an affine functional that separates these two sets.

JEL classifcation: D71, D72

Keywords: Condorcet jury theorem, majority voting games, correlated voting, exchangeable variables

1We thank Marco Scarsini and Yosi Rinott for drawing our attention to de Finetti’s theorem.

1 Introduction

The simplest way to present our problem is by quoting Condorcet’s classic result (see Young (1997)):

Theorem 1. (CJT–Condorcet 1785) Let n voters (n odd) choose between two alternatives that have equal likelihood of being correct a priori. Assume that voters make their judgements independently and that each has the same 1 probability p of being correct ( 2 < p < 1). Then, the probability that the group makes the correct judgement using simple is

n h n h ∑ [n!/h!(n h)!]p (1 p) − h=(n+1)/2 − −

which approaches 1 as n becomes large.

We generalize Condorcet’s model by presenting it as a game with incomplete infor- mation in the following way: Let I = 1,2,...,n be a set of jurors and let D be the defendant. There are two states of nature{: g – in which} D is guilty, and z – in which D is innocent. Thus the set of states of nature is S = g,z . Each juror has an action set A with { } two actions: A = c,a . The action c is to convict D. The action a is to acquit D. Before the voting, each juror{ i}gets a private random signal ti T i := ti ,ti . In the terminology ∈ { g z} of games with incomplete information, T i is the type set of juror i. The interpretation is i i that juror i of type tg thinks that D is guilty while juror i of type tz thinks that D is innocent. The signals of the jurors may be dependent and may also depend on the the state of na- ture. In our model the jurors act “informatively” (not “strategically”); that is, the strategy σ i i σ i i σ i i of juror i is : T A given by (tg) = c and (tz) = a. The definition of informative voting is due to Austen-Smith→ and Banks (1996), who question the validity of the CJT in a strategic framework. Informative voting was, and is still, assumed in the vast majority of the literature on the CJT , mainly because it is implied by the original Condorcet assump- tions. More precisely, assume, as Condorcet did, that P(g) = P(z) = 1/2 and that each i i juror is more likely to receive the “correct” signal (that is, P(tg g) = P(tz z) = p > 1/2); then the strategy of voting informatively maximizes the probability| of voting| correctly, among all four pure voting strategies. Following Austen-Smith and Banks, strategic vot- ing and Nash Equilibrium were studied by Wit (1998), Myerson (1998), and recently by Laslier and Weibull (2008), who discuss the assumption on preferences and beliefs under which sincere voting is a Nash equilibrium in a general deterministic majoritarian voting rule. As we said before, in this work we do assume informative voting and leave strategic considerations and equilibrium concepts for the next phase of our research. The action taken by a finite society of jurors 1,...,n (i.e. the jury verdict) is determined by a sim- ple majority (with some tie-breaking{ rule,} e.g., by coin tossing). We are interested in the probability that the (finite) jury will reach the correct decision. Again in the style of games

2 1 n with incomplete information let Ωn = S T ,..., T be the set of states of the world. A state of the world consists of the state× of× nature× and a list of the types of all jurors. (n) Denote by p the probability distribution on Ωn. This is a joint probability distribution on the state of nature and the signals of the jurors. For each juror i let the i i i Xi : S T 0,1 be the indicator of his correct voting, i.e., Xi(g,tg) = Xi(z,tz) = 1 and ×i →{ i } (n) Ω Xi(g,tz) = Xi(z,tg) = 0. The probability distribution p on n induces a joint probability (n) distribution on the the vector X =(X1,...,Xn), which we denote also by p . If n is odd, then the probability that the jury reaches a correct decision is

n (n) n p ∑ Xi > i=1 2! . Figure 1 illustrates our construction in the case n = 2. In this example, according to p(2) the state of nature is chosen with unequal probabilities for the two states: P(g) = 1/4 and P(z) = 3/4 and then the types of the two jurors are chosen according to a joint probability distribution that depends on the state of nature.

Nature @ 1/4 @ 3/4 @ g @ z @2 2 2 @2 @2 2 1 @ tg tz 1 @ tg @ tz t1 1 1 t1 g 4 4 g 0 0

t1 1 1 t1 z 4 4 z 0 1

Figure 1 The probability distribution p(2).

Guided by Condorcet, we are looking for theorems as the the size of the jury increases. Formally, as n goes to infinity we obtain an increasing sequence of “worlds”, Ω ∞ Ω Ω Ω ( n)n=1, such that for all n, the projection of n+1 on n is the whole n. The corre- (n) ∞ sponding sequence of probability distributions is (p )n=1 and we assume that for every (n+1) (n) n, the marginal distribution of p on Ωn is p . It follows from the Kolmogorov ex- tension theorem (see Loeve (1963), p. 93) that this defines a unique probability P on the (projective, or inverse) limit

1 n Ω = lim Ωn = S T ... T ... ∞ n ← × × × (n) such that, for all n, the marginal distribution of P on Ωn is p .

3 In this paper we address the the following problem: Which probability measures P de- rived in this manner satisfy the Condorcet Jury Theorem (CJT ); that is, Which probability measures P satisfy n n lim P Σ Xi > = 1. n ∞ i=1 2 → As far as we know, the only existing result on this general problem is that of Berend and Paroush (1998), which deals only with independent jurors. Rather than working with the space Ω and its probability measure P, it will be more convenient to work with the infinite sequence of binary random variables X =(X1,X2,...,Xn,...) (the indicators of “correct voting”) and the induced probability measure on it, which we shall denote also by P. Since the pair (X,P) is uniquely deter- mined by (Ω,P) , in considering all pairs (X,P) we cover all pairs (Ω,P). We provide a full characterization of the exchangeable X =(X1,X2,...,Xn,...) that satisfy the CJT. For a general (dependent) distribution P, not necessarily exchange- able, we provide necessary as well as sufficient conditions for the CJT . We exhibit a large 1 Σn Σ family of distributions P with liminfn ∞ i=1 j=iCov(Xi,Xj) > 0 that satisfy the → n(n 1) 6 CJT. − P ∞ P In the space of all probability distributions on Sp = 0,1 , let 1 be the set of all P {P } P P probability distributions in that satisfy the CJT and let 2 = 1 be the set of all P \ P P probability distributions in that do not satisfy the CJT . We prove that both 1 and 2 P P are convex sets and that 2 is dense in (in the weak topology). Using an appropriate separation theorem we then provide an affine functional that separates these two sets.

1 Sufficient conditions

Let X =(X1,X2,...,Xn,...) be a sequence of binary random variables with range in 0,1 and with joint probability distribution P. The sequence X is said to satisfy the Condorcet{ } Jury Theorem (CJT) if n n lim P Σ Xi > = 1 (1) n ∞ i=1 2 → We shall investigate necessary as well as sufficient conditions for CJT. Given a sequence of random binary variables X =(X1,X2,...,Xn,...) with joint distri- 2 bution P denote pi = E(Xi), Var(Xi) = E(Xi pi) and Cov(Xi,Xj) = E[(Xi pi)(Xj − − − p j)], for i = j, where E denotes, as usual, the expectation operator. Also let p = 6 n (p1 + p2,... + pn)/n and X n =(X1 + X2,... + Xn)/n. Our first result provides a sufficient condition for CJT : Σn n Theorem 2. Assume that i=1 pi > 2 for all n > N0 and

2 E(Xn p ) lim − n = 0, (2) n ∞ (p 1 )2 → n − 2

4 or equivalently assume that

p 1 lim n − 2 = ∞; (3) n ∞ 2 E(Xn p ) → − n then the CJT is satisfied. p Proof.

n n n n P Σ Xi = P Σ Xi i=1 ≤ 2 − i=1 ≥ −2    n n  n n = P Σ pi Σ Xi Σ pi i=1 − i=1 ≥ i=1 − 2  n n n n P Σ pi Σ Xi Σ pi ≤ | i=1 − i=1 | ≥ i=1 − 2   Σn n By Chebyshev’s inequality (assuming i=1 pi > 2 ) we have

Σn Σn 2 2 n n n n E i=1Xi i=1 pi E(Xn pn) P Σ pi Σ Xi Σ pi − = − | i=1 − i=1 | ≥ i=1 − 2 ≤ Σn n 2 1 2 pi  (pn 2 )   i=1 − 2 − As this last term tends to zero by (2), the CJT (1) then follows.

Σn Σ Corollary 3. If i=1 j=iCov(Xi,Xj) 0 for n > N0 (in particular if Cov(Xi,Xj) 0 for 6 1 ∞≤ ≤ all i = j) and limn ∞ √n(pn ) = , then theCJT is satisfied. 6 → − 2 Proof. Since the variance of a binary random variable X with mean p is p(1 p) 1/4 we have for n > N0, − ≤ 2 1 n 2 0 E(Xn p ) = E (Σ (Xi pi)) ≤ − n n2 i=1 − 1 Σn Σn Σ 1 = i=1Var(Xi) + i=1 j=iCov(Xi,Xj) n2 6 ≤ 4n 1 ∞  Therefore if limn ∞ √n(pn ) = , then → − 2 2 E(Xn p ) 1 0 lim − n lim = 0 ≤ n ∞ (p 1 )2 ≤ n ∞ 4n(p 1 )2 → n − 2 → n − 2

Remark 4. Note that under the condition of corollary 3, namely, for bounded random variable with all covariances being non-positive, the (weak) (LLN) holds for arbitrarily dependent variables (see, e.g., Feller (1957) volume I, exercise 9, p. 262). This is not implied by corollary 3 since, as we shall see later, the CJT, strictly speaking, is not a law of large numbers. In particular,CJT does not imply LLN and LLN does not imply CJT.

5 Remark 5. When X1,X2,...,Xn,... are independent, then under mild conditions 1 ∞ limn ∞ √n(pn 2 ) = is a necessary and sufficient condition for CJT (see→ Berend and− Paroush (1998)).

Remark 6. The sequence X =(X1,X2,...,Xn,...) of i.i.d. variables with P(Xi = 1) = P(Xi = 0) = 1/2 satisfies the LLN but does not satisfy the CJT, since it does not satisfy Berend and Paroush’s necessary and sufficient condition; therefore LLN does not imply CJT.

Given a sequence X =(X1,X2,...,Xn,...) of binary random variables with a joint prob- ability distribution P, we define the following parameters of (X,P): p := liminf p (4) n ∞ n → p := limsup pn (5) n ∞ → 2 y := liminfE(Xn p ) (6) n ∞ n → − 2 y := limsupE(Xn pn) (7) n ∞ − → y∗ := liminfE Xn p (8) n ∞ n → | − | y∗ := limsupE Xn pn (9) n ∞ | − | → We first observe the following: Remark 7. If p > 1/2 and y = 0 then the CJT is satisfied.

2 2 Proof. As E(Xn pn) 0, if y = 0 then limn ∞ E(Xn pn) = 0. Since p > 1/2, there − ≥ → − exists n0 such that pn > (1/2+ p)/2 for all n > n0. The result then follows by Theorem 2.

2 Necessary conditions using the L1-norm

Given a sequence X =(X1,X2,...,Xn,...) of binary random variables with a joint probabil- ity distribution P, if y > 0, then we cannot use Theorem 2 to conclude CJT . To derive necessary conditions for the CJT, we first have: Proposition 8. If theCJT holds then p 1 . ≥ 2 ∞ ω ω Proof. Define a sequence of events (Bn)n=1 by Bn = Xn( ) 1/2 0 . Since the Σn n { | − ≥ } CJT holds, limn ∞ P i=1Xi > = 1 and hence limn ∞ P(Bn) = 1. Since → 2 → 1  1 1 p = E(Xn )) P(Ω Bn), n − 2 − 2 ≥ −2 \ taking the liminf, the right-hand side tends to zero and we obtain that 1 liminfn ∞ pn = p . → ≥ 2 6 We shall first consider a stronger violation of Theorem 2 than y > 0; namely, assume that y > 0. We shall prove that in this case, there is a range of distributions P for which the CJT is false. 2 2 First we notice that for 1 x 1, x x . Hence E Xn p E(Xn p ) for − ≤ ≤ | | ≥ | − n| ≥ − n all n and thus y > 0 implies that y∗ > 0 We are now ready to state our first impossibility theorem, which can be readily trans- lated into a necessary condition.

Theorem 9. Given a sequence X =(X1,X2,...,Xn,...) of binary random variables with 1 y∗ joint probability distribution P, if p < 2 + 2 , then the (X,P) violates the CJT.

Proof. If y∗ = 0, then the CJT is violated by Proposition 8. Assume then that y∗ > 0 and choosey ˜ such that 0 < y˜ < y and 2t := y˜ + 1 p > 0. First we notice that, since ∗ 2 2 − E(Xn pn) = 0, we have E max(0, pn X n) = E max(0,Xn pn), and thus, since y∗ > 0, we have− − − y˜ E max(0, p Xn) > for n > n. (10) n − 2 If (Ω,P) is the probability space on which the sequence X is defined, for n > n define the events y˜ Bn = ω p X n(ω) max(0, t) (11) { | n − ≥ 2 − } By (10) and (11), P(Bn) > q > 0 for some q and y˜ p Xn(ω) t for ω Bn, n > n. (12) n − ≥ 2 − ∈ ∞ Choose now a subsequence (nk)k=1 such that y˜ 1 p < + t = p +t, k = 1,2,... (13) nk 2 2 −

By (12) and (13), for all ω Bn we have, ∈ k y˜ 1 Xn (ω) p +t < , k ≤ nk − 2 2 1 and thus P(Xn > ) 1 q < 1, which implies that P violates the CJT. k 2 ≤ − 1 Corollary 10. If liminfn ∞ pn 2 and liminfn ∞ E Xn pn > 0, then P violates the CJT. → ≤ → | − |

7 3 Necessary conditions using the L2-norm

Let X =(X1,X2,...,Xn,...) be a sequence of binary random variables with a joint proba- bility distribution P. In this section we take a closer look at the relationship between the parameters y and y∗(see (7) and (9)). We first notice that y > 0 if and only if y∗ > 0. Next 1 1 we notice that pn 2 for n > n implies that Xn pn 2 for n > n. Thus, by corollary 10, ≥ − ≤ 1 if y > 0 and the CJT is satisfied then max(0,Xn pn) 2 for n > n. Finally we observe the following lemma, whose proof is straightforward:− ≤ ω ω Lemma 11. If limsupn ∞ P pn X n( ) pn/2 > 0, then theCJT is violated. → { | − ≥ } We now use the previous discussion to prove the following theorem:

1 Theorem 12. If (i) liminfn ∞ pn > and (ii) liminfn ∞ P(Xn > pn/2) = 1, then y∗ 2y. → 2 → ≥ 1 Proof. Condition (i) implies that pn > 2 for all n > N0 for some N0. This implies that 1 1 limn ∞ P(Xn pn < ) = 1. Also, (ii) implies that limn ∞ P(pn X n ) = 1, and thus → − 2 → − ≤ 2 1 1 lim P( Xn p ) = 1 (14) n ∞ 2 n 2 → − ≤ − ≤ 1 1 Define the events Bn = ω Xn(ω) p ; then by (14) { | − 2 ≤ − n ≤ 2 } 2 liminf (Xn pn) dP = y (15) n ∞ B − → Z n and

liminf Xn pn dP = y∗. (16) n ∞ B | − | → Z n Since any u [ 1 , 1 ] satisfies u 2u2, it follows from (15) and (16) that y 2y. ∈ − 2 2 | | ≥ ∗ ≥

2 1 Corollary 13. Let p = liminfn ∞ pn and y = liminfn ∞ E(X n pn) . Then if p < 2 + y then P does not satisfy theCJT.→ → −

1 Proof. Assume that p < 2 +y. If y = 0, then CJT is not satisfied by Proposition 8. Hence 1 we may thus assume that y > 0, which also implies that y∗ > 0. Thus, if liminfn ∞ pn 2 1 → ≤ then CJT fails by Corollary 10. Assume then that liminfn ∞ pn > . By Lemma 11 we → 2 may also assume that liminfn ∞ P X n > pn/2 = 1 and thus by Theorem 12 we have → y y 2y and hence p < 1 + y 1 + ∗ and the CJT fails by Theorem 9. ∗ ≥ 2 ≤ 2 2 

8 4 Dual Conditions

A careful reading of Sections 2 and 3 reveals that it is possible to obtain “dual” results to Theorems 9 and 12 and Corollary 13 by replacing “liminf” by “limsup”. More precisely, for a sequence X =(X1,X2,...,Xn,...) of binary random variables with joint probability distribution P, we let p = limsupn ∞ pn and y∗ = limsupn ∞ E Xn pn , and we have: → → | − | 1 y∗ Theorem 14. If p < 2 + 2 , then the (X,P) violates the CJT. Proof. As we saw in the proof of Corollary 13, we may assume that 1 liminfn ∞ pn and hence also → ≥ 2 1 p limsup p liminf p = n ∞ n , n ∞ ≥ n ≥ 2 → → y˜ 1 ∞ and hence y∗ > 0. Choosey ˜ such that 0 < y˜ < y∗ and 2t = 2 + 2 p > 0. Let (Xnk )k=1 be a subsequence of X such that −

lim E Xn pn = y∗. k ∞ | k − k | → As in (10) we get y˜ E max(0, p X n ) > for k > k. (17) nk − k 2 ∞ Define the events (Bnk )k=1 by y˜ Bn = ω p Xn (ω) t . (18) k { | nk − k ≥ 2 − }

By (17) and (18), P(Bnk ) > q for some q > 0 and y˜ ¯ p X n (ω) t for ω Bn and k > k¯. (19) nk − k ≥ 2 − ∈ k Now y˜ 1 limsup pn = p < + t. (20) n ∞ 2 2 − → y˜ 1 Thus, for n sufficiently large p < + t. Hence, for k sufficiently large and all ω Bn , n 2 2 − ∈ k y˜ 1 Xn (ω) p +t < . (21) k ≤ nk − 2 2 1 Therefore P(Xn > ) 1 q < 1 for sufficiently large k in violation of the CJT . k 2 ≤ − Similarly we have the “dual” results to those of Theorem 12 and Corollary 13: 1 Theorem 15. If (i) liminfn ∞ pn > and → 2 (ii) liminfn ∞ P(Xn > pn/2) = 1, then y∗ 2y. → ≥ 1 Corollary 16. If p < 2 + y then P does not satisfy theCJT. The proofs, which are similar respectively to the proofs of Theorem 12 and Corol- lary 13, are omitted.

9 5 Existence of distributions satisfying the CJT

In this section we address the issue of the existence of distributions that satisfy the CJT . For that, let us first clarify the relationship between the CJT and law of large numbers which, at first sight, look rather similar. Recall that an infinite sequence of binary random variables X =(X1,X2,...,Xn,...) with a joint probability distribution P satisfies the (weak) law of large numbers (LLN) if (in our notations):

ε > 0, lim P Xn p < ε = 1 (22) n ∞ n ∀ → | − | while it satisfies the Condorcet Jury Theorem (CJT ) if: 

1 lim P Xn > = 1 (23) n ∞ 2 →   1 Since by Proposition 8, the condition p 2 is necessary for the validity of the CJT, let us check the relationship between the LLN≥and the CJT in this region. Our first observation is:

Proposition 17. For a sequence X =(X1,X2,...,Xn,...) with probability distribution P 1 satisfying p > 2 , if the LLN holds then theCJT also holds. δ δ δ Proof. Let p = 1/2 + 3 for some > 0 and let N0 be such that pn > 1/2 + 2 for all n > N0; then for all n > N0 we have 1 1 P Xn > P Xn + δ P Xn p < δ 2 ≥ ≥ 2 ≥ | − n|     Since the last expression tends to 1 as n ∞, the first expression does too, and hence the CJT holds. →

1 Remark 18. The statement of Proposition 17 does not hold for p = 2 . Indeed, the se- quence X =(X1,X2,...,Xn,...) of i.i.d. variables with P(Xi = 1) = P(Xi = 0) = 1/2 satis- fies the LLN but does not satisfy theCJT (see Remark 6). Unfortunately, Proposition 17 is of little use to us. This is due to the following fact:

Proposition 19. If the random variables of the sequence X =(X1,X2,...,Xn,...) are uni- formly bounded then the condition 2 lim E Xn p = 0 n ∞ n → − is a necessary condition for LLN to hold.  The proof is elementary and can be found, e.g., in Uspensky (1937), page 185. For the sake of completeness it is provided here in the Appendix. It follows thus from Proposition 19 that LLN cannot hold when y > 0 and thus we cannot use Proposition 17 to establish distributions in this region that satisfy the CJT . Nevertheless, we shall exhibit a rather large family of distributions P with y > 0 (and p > 1/2) for which the CJT holds. Our main result is the following:

10 1 Theorem 20. Let t [0, 2 ]. If F is a distribution with parameters (p,y∗), then there exists a distribution H with∈ parameters p˜ = 1 t +tp and y˜ = ty that satisfy the CJT. − ∗ ∗ Proof. To illustrate the idea of the proof we first prove (somewhat informally) the case t = 1/2. Let X =(X1,X2,...,Xn,...) be a sequence of binary random variables with a joint probability distribution F. Let G be the distribution of the sequence Y =(Y1,Y2,...,Yn,...), where EYn = 1 for all n (that is, Y1 = Y2 = ...Yn = ... and P(Yi = 1) = 1 i). Consider now the following “interlacing” of the two sequences X and Y : ∀

Z =(Y1,Y2,X1,Y3,X2,Y4,X3,...,Yn,Xn 1,Yn+1,Xn...), − and let the probability distribution H of Z be the product distribution H = F G. Itis verified by straightforward computation that the parameters of the distribution×H are in 1 1 1 1 accordance with the theorem for t = 2 , namely,p ˜ = 2 + 2 p andy ˜∗ = 2 y∗. Finally, as each initial segment of voters in Z contains a majority of Yi’s (thus with all values 1), the 1 distribution H satisfies the CJT , completing the proof for t = 2 . The proof for a general t [0,1/2) follows the same lines: We construct the sequence Z so that any finite initial segment∈ of n variables, includes “about, but not more than” the initial tn segment of the X sequence, and the rest is filled with the constant Yi variables. This will imply that the CJT is satisfied. Formally, for any real x 0 let x be the largest integer less than or equal to x and let x be smallest integer greater≥ than⌊ or⌋ equal to x. Note that for any n and any 0 t 1 we⌈ ⌉ have tn + (1 t)n = n; thus, one and only one of the following holds: ≤ ≤ ⌊ ⌋ ⌈ − ⌉ (i) tn < t(n + 1) or ⌊ ⌋ ⌊ ⌋ (ii) (1 t)n < (1 t)(n + 1) ⌈ − ⌉ ⌈ − ⌉ From the given sequence X and the above-defined sequence Y (of constant 1 variables) we define now the sequence Z =(Z1,Z2,...,Zn,...) as follows: Z1 = Y1 and for any n 2, let ≥ Zn = X t(n+1) if (i) holds and Zn = Y (1 t)(n+1) if (ii) holds. This inductive construction ⌊ ⌋ ⌈ − ⌉ guarantees that for all n, the sequence contains tn Xi coordinates and (1 t)n Yi coordinates. The probability distribution H is the⌊ product⌋ distribution F ⌈ G.− The⌉ fact × that (Z,H) satisfies the CJT follows from:

(1 t)n (1 t)n > tn tn , ⌈ − ⌉ ≥ − ≥ ⌊ ⌋ and finallyp ˜ = 1 t +tp andy ˜ = ty is verified by straightforward computation. − ∗ ∗ Remark 21. The “interlacing” of the two sequences X and Y described in the proof of Theorem 20 may be defined for anyt [0,1]. We were specifically interested in t [0,1/2] since this guarantees the CJT. ∈ ∈

11 6 Feasibility considerations

The conditions developed so far for a sequence X =(X1,X2,...,Xn,...) with joint proba- bility distribution P to satisfy the CJT involved only the parameters p, p,y,y,y∗, and y∗. In this section we pursue our characterization in the space of these parameters. We shall look at the distributions in two different spaces: the space of points (p,y∗), which we call the L1 space, and the space (p,y), which we call the L2 space.

6.1 Feasibility and characterization in L1 2 With the pair (X,P) we associate the point (p,y∗) in the Euclidian plane R . It follows immediately that 0 p 1. We claim that y 2p(1 p) holds for all distributions P. ≤ ≤ ∗ ≤ − To see that, we first observe that E Xi pi = 2pi(1 pi); hence | − | − 1 n 1 n 2 n E Xn pn = E ∑(Xi pi) E ∑ Xi pi ∑ pi(1 pi). | − | n | i=1 − | ≤ n i=1 | − |! ≤ n i=1 −

n The function ∑ pi(1 pi) is (strictly) concave. Hence i=1 − n 1 E Xn pn 2 ∑ pi(1 pi) 2pn(1 pn). (24) | − | ≤ i=1 n − ≤ −

Finally, let p = limk ∞ pn ; then → k

y∗ = liminfE Xn pn liminfE Xn pn 2 lim pn (1 pn ) = 2p(1 p). n ∞ | − | ≤ k ∞ | k − k | ≤ k ∞ k − k − → → → The second inequality is due to (24). 2 Thus, if (u,w) denote a point in R , then any feasible pair (p,y∗) is in the region

FE1 = (u,w) 0 u 1, 0 w 2u(1 u) . (25) { | ≤ ≤ ≤ ≤ − }

We shall now prove that all points in this region are feasible; that is, any point in FE1 is attainable as a pair (p,y∗) of some distribution P. Then we shall indicate the sub-region of FE1 where the CJT may hold. We first observe that any point (u0,w0) FE1 on the ∈ parabola w = 2u(1 u), for 0 u 1, is feasible. In fact such (u0,w0) is attainable by − ≤ ≤ the sequence X =(X1,X2,...,Xn,...) with identical variables Xi, X1 = X2 = ... = Xn..., and EX1 = u0 (clearly p = u0, and y = 2u0(1 u0) follows from the dependence and from ∗ − E Xi pi = 2pi(1 pi) = 2u0(1 u0)). | − | − − Let again (u0,w0) be a point on the parabola, which is thus attainable. Assume that they are the parameters (p,y∗) of the pair (X,F). Let (Y,G) be the pair (of constant variables) described in the proof of Theorem 20 and let t [0,1]. By Remark 21 the t- interlacing of (X,F) and (Y,G) can be constructed to yield∈ a distribution with parameters p˜ = tp +(1 t) andy ˜ = ty (see the proof of Theorem 20). Thus, the line segment − ∗ ∗ defined byu ˜ = tu0 +(1 t) andw ˜ = tw0 for 0 t 1, connecting (u0,w0) to (1,0), − ≤ ≤ 12 w = y*

1/2 w = 2 u (1 − u )

(1 − u ) w 0 ( u 0 , w 0 ) w = 1 − u0

FE1 M 0 u = p 0 1/2 1

Figure 2 The feasible set FE1.

consists of attainable pairs contained in FE1. Since any point (u,w) in FE1 lies on such a line segment, we conclude that every point in FE1 is attainable. We shall refer to FE1 as the feasible set, which is shown in Figure 2. We now attempt to characterize the points of the feasible set according to whether the CJT is satisfied or not. For that we first define: Definition 22.

The strong CJT set, denoted by sCJT, is the set of all points (u,w) FE1 such that • ∈ any pair (X,P) with parameters p = u andy∗ = w satisfies the CJT.

The weak CJT set, denoted by wCJT, is the set of all points (u,w) FE1 for which • ∈ there exists a pair (X,P) with parameters p = u andy∗ = w that satisfies the CJT.

We denote sCJT = FE1 sCJT and wCJT = FE1 wCJT. − \ − \ For example, as we shall see later (see Proposition 24), (1,0) sCJT and (1/2,0) wCJT . ∈ ∈ By Theorem 9, if u < 1/2 + 1/2w, then (u,w) wCJT . Next we observe that if ∈ − (u0,w0) is on the parabola w = 2u(1 u) and M is the midpointof the segment [(u0,w0),(1,0)], then by the proof of Theorem 20,− the segment [M,(1,0)] wCJT (see Figure 2). To find the upper boundary of the union of all these segments,⊆ that is, the locus of the mid points M in Figure 2, we eliminate (u0,w0) from the equations w0 = 2u0(1 u0), and − (u,w) = 1/2(u0,w0) + 1/2(1,0), and obtain w = 2(2u 1)(1 u) (26) − − This is a parabola with maximum 1/4 at u = 3/4. The slope of the tangent at u = 1/2 is 2; that is, the tangent of the parabola at that point is the line w = 2u 1 defining the − 13 region wCJT . Finally, a careful examination of the proof of Theorem 20, reveals that for − every (u0,w0) on the parabola w = 2u(1 u), the line segment [(u0,w0),M] is in sCJT (see Figure 2). − − Our analysis so far leads to the conclusions summarized in Figure 3 describing the feasibility and and regions of CJT possibility for all pairs (X,P).

w = y* w= 2 u (1 − u ) w = 2 u − 1 1/2

−sCJT w = 2(2 u − 1)(1 − u ) 1/4 −wCJT

wCJT 0 u = p 0 1/2 3/4 1

Figure 3 Regions of possibility of CJT in L1.

Figure 3 is not complete in the sense that the regions wCJT and sCJT are not dis- joint, as it may mistakenly appear in the figure. More precisely, we complete− Definition 22 by defining:

Definition 23. The mixed CJT set, denoted by mCJT, is the set of all points (u,w) FE1 ∈ for which there exists a pair (X,P) with parameters p = u and y∗ = w that satisfies the CJT, and a pair (Xˆ ,Pˆ) with parameters pˆ = u and yˆ∗ = w for which theCJT is violated. Then the regions sCJT, wCJT , and mCJT are disjoint and form a partition of the − feasible set of all distributions FE1

FE1 = wCJT sCJT mCJT (27) − ∪ ∪ To complete the characterization we have to find the regions of this partition, and for that it remains to identify the region mCJT since, by definition, wCJT mCJT sCJT and sCJT mCJT wCJT . \ ⊂ − \ ⊂ − Proposition 24. All three regions sCJT, wCJT, and mCJT are not empty. − Proof. As can be seen from Figure 3, the region wCJT is clearly not empty; it contains for example the points (0,0) and (1/2,1/2). As− we remarked already (following Defini- tion 22), the region sCJT contains the point (1,0). This point corresponds to a unique pair (X,P), in which Xi = 1 for all i with probability 1, that trivially satisfies the CJT). The

14 region mCJT contains the point (1/2,0). To see this recall that the Berend and Paroush’s necessary and sufficient condition for CJT in the independent case (see Remark 5) is 1 lim √n(p ) = ∞ (28) n ∞ n 2 → − ˜ ˜ ˜ ∞ ˜ First consider the pair (X,P) in which (Xi)i=1 are i.i.d with P(Xi = 1) = 1/2 and ˜ 1 P(Xi = 0) = 1/2. Clearly √n(pn 2 ) = 0for all n and hence condition (28) is not satisfied, implying that the CJT is not satisfied.− Now consider (X,P) in which X =(1,1,0,1,0,1 ) with probability 1. This pair corresponds to the point (1/2,0) since ···

1 + 1 if n is even X = p = 2 n , n n 1 + 1 if n is odd  2 2n 1 and hence p = 1/2 and y∗ = 0. Finally this sequence satisfies the CJT as Xn > 2 with probability one for all n.

6.2 Feasibility and characterization in L2 2 Replacing y∗ = liminfn ∞ E Xn pn by the parameter y = liminfn ∞ E(Xn pn) , we → | − | → − obtain results in the space of points (p,y) similar to those obtained in the previous section in the space (p,y∗). Given a sequence of binary random variables X with its joint distribution P, we first observe that for any i = j, 6 Cov(Xi,Xj) = E(XiXj) pi p j min(pi, p j) pi p j. − ≤ − Therefore,

n n 2 1 E(Xn pn) = 2 ∑ ∑Cov(Xi,Xj) + ∑ pi(1 pi) (29) − n (i=1 j=i i=1 − ) 6 1 n n 2 ∑ ∑[min(pi, p j) pi p j] + ∑ pi(1 pi) . (30) ≤ n (i=1 j=i − i=1 − ) 6 ∑n We claim that the maximum of the last expression (30), under the condition i=1 pi = pn, is p (1 p ). This is attained when p1 = = pn = p . To see that this is indeed the n − n ··· n maximum, assume to the contrary that the maximum is attained atp ˜ =(p˜1, , p˜n) with ··· p˜i = p˜ j for some i and j. Without loss of generality assume that:p ˜1 p˜2 p˜n with 6 ≤ ≤···≤ p˜1 < p˜ j andp ˜1 = p˜ for ℓ < j. Let0 < ε < (p˜ j p˜1)/2 and define p =(p , , p ) by ℓ − ∗ 1∗ ··· n∗ p = p˜1 + ε, p = p˜ j ε, and p = p˜ for ℓ / 1, j . A tedious, but straightforward, 1∗ ∗j − ℓ∗ ℓ ∈{ } computation shows that the expression (30) is higher for p∗ than forp ˜, in contradiction to the assumption that it is maximized atp ˜. We conclude that

2 E(Xn p ) p (1 p ). − n ≤ n − n 15 ∞ Let now (pnk )k=1 be a subsequence converging to p; then

2 2 y = liminf E(Xn pn) liminf E(Xn pn ) n ∞ − ≤ k ∞ k − k → → liminf pn (1 pn ) = p(1 p). ≤ k ∞ k − k − → We state this as a theorem:

Theorem 25. For every pair (X,P), the corresponding parameters (p,y) satisfy y p(1 p). ≤ − Next we have the analogue of Theorem 20, proved in the same way.

Theorem 26. Let t [0, 1 ]. If F is a distribution with parameters (p,y), then there exists ∈ 2 a distribution H with parameters p˜ = 1 t +tp and y˜ = t2y that satisfy the CJT. − We can now construct Figure 4, which is the analogue of Figure 2 in the L2 space (p,y). The feasible set in this space is

FE2 = (u,w) 0 u 1, 0 w u(1 u) (31) { | ≤ ≤ ≤ ≤ − } w = y

1/4 w = u (1 − u ) 2 1 − u w ( u , w 0 ) w = 0 0 1 −u0

FE2 M 0 u = p 0 1/2 1 +u0 1 2

Figure 4 The feasible set FE2.

The geometric locus of the midpoints M in Figure 4 is derived from 1 1 1 (1) u = u0 + , (2) w = w0, and (3) w0 = u0(1 u0) and is given by 2 2 4 − w = 1 (2u 1)(1 u). This yields Figure 5, which is the analogue of Figure 3. Note, 2 − − however, that unlike in Figure 3, the straight line w = u 1 is not tangential to the small − 2 parabola w =(u 1 )(1 u) at ( 1 ,0). − 2 − 2

16 w = y w = u (1 − u ) w = u − 1/2 1/4

w = (u − 1/2)(1 − u ) 1/8 −wCJT −sCJT 1/16 wCJT 0 u = p 0 1/2 3/4 1

Figure 5 Regions of possibility of CJT in L2.

The next step toward determining the region mCJT in the L2 space (Figure 5) is the following: Proposition 27. For any (u,w) (u,w) 0 < u < 1; 0 w u(1 u) , there is a pair (Z,H) such that:∈{ | ≤ ≤ − }

(i) E(Zi) = u, i. ∀ 2 (ii) liminfn ∞ E(Zn u) = w. → − (iii) The distribution H does not satisfy the CJT. Proof. For 0 < u < 1,

let (X,F0) be given by X1 = X2 = ... = Xn = ... and E(Xi) = u; • ∞ let (Y,F1) be a sequence of of i.i.d. random variables (Yi) with expectation u. • i=1 t t t For 0 < t 1 let (Z ,H ) be the pair in which Zi = tXi +(1 t)Yi for i = 1,2,... • t ≤ t − and H is the product distribution H = F0 F1 (that is, the X and the Y sequences are independent). ×

t Note first that E(Zi ) = u for all i and

t 2 u(1 u) lim E(Z n u) = lim (1 t) − +tu(1 u) = tu(1 u), n ∞ − n ∞ − n − − → →   t t and therefore the pair (Z ,H ) corresponds to the point (u,w) in the L2 space, where w = tu(1 u) ranges in (0,u(1 u)) as 0 < t 1. − − ≤ Finally, (Zt,Ht) does not satisfy the CJT since for all n,

t 1 t t Pr(Z n > ) 1 Pr(Z = Z = ... = 0) = 1 t(1 u) < 1. 2 ≤ − 1 2 − − 17 As this argument does not apply for t = 0 it remains to prove that, except for (1,0), to any point (u,0) on the x axis corresponds a distribution that does not satisfy the CJT. For ∞ 0 u 1/2, the sequence (Y,F1) of of i.i.d. random variables (Yi)i=1 with expectation u does≤ not≤ satisfy the CJT , as follows from the result of Berend and Paroush (1998). For 1/2 < u < 1 such a sequence of i.i.d. random variables does satisfy the CJT and we need the following more subtle construction: Given the two sequences (X,F0) and (Y,F1) defined above we construct a sequence ∞ Z =(Zi)i=1 consisting of alternating blocks of Xi-s and Yi-s, with the probability distribu- tion on Z being that induced by the product probability H = F0 F1. Clearly E(Zi) = u × for all i, in particular pn = u for all n and p = u. We denote by Bℓ the set of indices of the Σℓ ℓ-th block and its cardinality by bℓ. Thus n(ℓ) = j=1b j is the index of Zi at the end of the ℓ-th block. Therefore

B 1 = n(ℓ) + 1,...,n(ℓ) + b 1) and n(ℓ + 1) = n(ℓ) + b 1. ℓ+ { ℓ+ } ℓ+ Define the block size bℓ inductively by:

1. b1 = 1, and for k = 1,2,..., Σk 2. b2k = k j=1b2 j 1 and b2k+1 = b2k. − ∞ Finally we define the sequence Z =(Zi)i=1 to consist of Xi-s in the odd blocks and Yi-s in the even blocks, that is,

Xi if i B2k 1 for some k = 1,2,... Zi = ∈ − Yi if i B2k for some k = 1,2,...  ∈

Denote by nx(ℓ) and ny(ℓ) the number of X coordinates and Y coordinates respectively in the sequence Z at the end of the ℓ-th block and by n(ℓ) = nx(ℓ) + ny(ℓ) the number of coordinates at the end of the ℓ-th block of Z. It follows from 1 and 2 (in the definition of bℓ) that for k = 1,2,...,

nx(2k 1) = ny(2k 1) + 1 (32) − − n (2k) 1 n (2k) 1 x and hence also x (33) ny(2k) ≤ k n(2k) ≤ k It follows from (32) that at the end of each odd-numbered block 2k 1, there is a − majority of Xi coordinates that with probability (1 u) will all have the value 0. Therefore, − 1 Pr Zn(2k 1) < (1 u) for k = 1,2,..., − 2 ≥ −   and hence 1 liminfPr Zn > u < 1; n ∞ 2 ≤ →   18 that is, (Z,H) does not satisfy the CJT. It remains to show that

2 y = liminfE(Zn p ) = 0. n ∞ n → − 2 ∞ To do so, we show that the subsequence of E((Zn pn) ) n=1 corresponding to the end of the even-numbered blocks converges to 0, namely,−  2 lim E(Zn(2k) pn(2k)) = 0. k ∞ − → Indeed,

2 2 nx(2k) 1 ny(2k) E(Z p ) = E (X1 u) + Σ (Yi u) . n(2k) − n(2k) n(2k) − n(2k) i=1 −   Since the Yi-s are i.i.d. and independent of X1 we have

2 n (2k) ny(2k) E(Z p )2 = x u(1 u) + u(1 u), n(2k) − n(2k) n2(2k) − n2(2k) − and by property (33) we get finally:

2 1 1 lim E(Zn(2k) pn(2k)) lim u(1 u) + u(1 u) = 0, k ∞ − ≤ k ∞ k2 − n(2k) − → →   concluding the proof of the proposition. Proposition 27 asserts that for every point (u,w) in Figure 5, except for the point (1,0), there is a distribution with these parameters that does not satisfy the CJT . This was known from our previous results in the regions wCJT and sCJT . As for the region wCJT , Proposition 27 and Theorem 26 yield the− following conclusio− ns, presented in Figure 6.

Corollary 28. 1. The region below the small parabola in Figure 5, with the exception of the point (1,0), is in mCJT, that is,

1 1 (p,y) p < 1; and y (2p 1)(1 p) mCJT. |2 ≤ ≤ 2 − − ⊆   2. The point (p,y)=(1,0) is the only point in sCJT. It corresponds to a single se- quence with X1 = = Xn = with F(Xi = 1) = 1. ··· ···

19 w = y w = u (1 − u ) w = u − 1/2 1/4

w = (u − 1/2)(1 − u ) 1/8 −wCJT −sCJT 1/16 sCJT mCJT 0 u = p 0 1/2 3/4 1

Figure 6 mCJT and sCJT in the L2 space.

7 Exchangeable variables

In this section we fully characterize the distributions of sequences X =(X1,X2,...,Xn,...) of exchangeable variables that satisfy the CJT. We first recall:

Definition 29. A sequence of random variables X =(X1,X2,...,Xn,...) is exchangeable if for every n and every permutation (k1,...,kn) of (1,...,n), the finite sequence (Xk1,...,Xkn ) has same n-dimensional probability distribution as (X1,...,Xn). We shall make use of the following characterization theorem due to de Finetti (see, e.g., Feller (1966), Vol. II, page 225).

Theorem 30. A sequence of binary random variables X =(X1,X2,...,Xn,...) is exchange- able if and only if there is a probability distribution F on [0,1] such that for every n:

1 k n k Pr(X1 = = Xk = 1, Xk+1 = ... = Xn = 0) = θ (1 θ) − dF (34) ··· Z0 − 1 n k n k Pr(X1 + + Xn = k) = θ (1 θ) − dF (35) ··· k 0 −  Z The usefulness of de Finetti for our purposes is that it enables an easy projection of the distribution into our L2 space:

Theorem 31. Let X =(X1,X2,...,Xn,...) be a sequence of exchangeable binary random variables and let F be the corresponding distribution function in de Finetti’s theorem. Then,

2 y = lim E(Xn u) = V(F), (36) n ∞ → −

20 where 1 1 u = θdF and V(F) = (θ u)2dF. Z0 Z0 − Proof. We have

1 u = E(Xi) = Pr(Xi = 1) = xdF ; V(Xi) = u(1 u) Z0 − and for i = j, 6 1 2 2 2 Cov(Xi,Xj) == Pr(Xi = Xj = 1) u = x dF u = V (F). − 0 − Z So,

2 2 1 n E(Xn u) = E Σ (Xi u) − n 1 −   1 Σn 1 Σ = 1V(Xi) + i= jCov(Xi,Xj) n2 n2 6 nu(1 u) n(n 1) = − + − V (F), n2 n2 which implies equation (36). We can now state the characterization theorem:

Theorem 32. A sequence X =(X1,X2,...,Xn,...) of binary exchangeable random vari- ables with a corresponding distribution F(θ) satisfies theCJT if and only if

1 Pr < θ 1 = 1, (37) 2 ≤   that is, if and only if the support of F is in the open semi-interval (1/2,1].

Proof. The “only if” part follows from the fact that any sequence X =(X1,X2,...,Xn,...) of binary i.i.d. random variables with expectation E(Xi) = θ 1/2, violates the CJT (by the Berend and Paroush’s necessary condition). ≤ To prove that a sequence satisfying condition (37) also satisfies the CJT , note that for 0 < ε < 1/2,

1 1 1 1 Pr Xn > Pr θ + ε Pr Xn > θ + ε . (38) 2 ≥ ≥ 2 2| ≥ 2      

21 For the second term in (38) we have: 1 1 1 Pr Xn > θ + ε = Σk n Pr X1 + + Xk = k θ + ε (39) 2| ≥ 2 > 2 ··· | ≥ 2     1 Σ n θ k θ n k = k> n (1 ) − dF (40) 2 k 1 ε −  Z 2 + 1 Σ n θ k θ n k = k> n (1 ) − dF (41) 1 ε 2 k − Z 2 +     1 := Sn(θ)dF (42) 1 ε Z 2 + Now, using Chebyshev’s inequality we have: 1 1 Sn(θ) = Pr Xn > θ Pr Xn > + ε θ (43) 2| ≥ 2 |     V(Xn θ) θ(1 θ) 1 | = 1 − (44) ≥ − (θ 1 ε)2 − n(θ 1 ε)2 − 2 − − 2 − Since the last expression in (44) converges to 1 uniformly on [1/2+ε,1] as n ∞, taking the limit n ∞ of (42) and using (44) we have: → → 1 1 1 1 lim Pr Xn > θ + ε dF = Pr θ + ε . (45) n ∞ 2| ≥ 2 ≥ 1 ε ≥ 2 →   Z 2 +   From (38) and (45) we have that for and fixed ε > 0,

1 1 2 lim Pr Xn > Pr θ + ε . (46) n ∞ 2 ≥ ≥ 2 →      ε 1 θ Since (46) must hold for all 1/2 > > 0, and since Pr 2 < 1 = 1, we conclude that ≤  1 lim Pr Xn > = 1, (47) n ∞ 2 →   i.e., the sequence X =(X1,X2,...,Xn,...) satisfies the CJT . To draw the consequences of Theorem 32 we prove first the following: Proposition 33. Any distribution F of a variable θ in [1/2,1] satisfies 1 V (F) (u )(1 u), (48) ≤ − 2 − where u = E(F), and equality holds in (48) only for F in which 1 Pr(θ = ) = 2(1 u) and Pr(θ = 1) = 2u 1. (49) 2 − −

22 Proof. We want to show that

1 1 θ 2dF(θ) u2 (u )(1 u), (50) 1 2 − ≤ − 2 − Z / or, equivalently, 1 3 1 θ 2dF(θ) u + 0. (51) 2 2 Z1/2 − ≤ 1 θ θ 1 1 1 θ Replacing u = 1/2 dF( ) and 2 = 1/2 2 dF( ), inequality (50) is equivalent to R R 1 3 1 1 (θ 2 θ + )dF(θ) := g(θ)dF(θ) 0. (52) 2 2 Z1/2 − Z1/2 ≤ The parabola g(θ) is convex and satisfies g(1/2) = g(1) = 0 and g(θ) < 0 for all 1/2 < θ < 1, which proves (52). Furthermore, equality to 0 in (52) is obtained only when F is such that Pr(1/2 < θ < 1) = 0, and combined with u = E(F) this implies (49). The next proposition provides a sort of an inverse to proposition 33.

Proposition 34. For any pair (u,w) where 1/2 < u 1 and 0 w < (u 1/2)(1 u), there is a distribution F(θ) on (1/2,1] such that E(F≤) = u andV≤(F) = w.− −

Proof. For u = 1, the only point in this region is when w = 0 and for this point (1,0) the claim is trivially true (with the distribution Pr(θ = 1) = 1), and so it suffices to con- sider only the case u < 1. Given (u,w), for any y satisfying 1/2 < y u < 1 define the ≤ distribution Fy for which

Pr(θ = y)=(1 u)/(1 y) and Pr(θ = 1)=(u y)/(1 y). − − − −

This distribution satisfies E(Fy) = u and it remains to show that we can choose y so that V(Fy) = w. Indeed, 1 u 2 u y 2 V(Fy) = − y + − u . 1 y 1 y − − − For a given u < 1 this is a continuous function of y satisfying: limy u V(Fy) = 0 and → limy 1/2 V(Fy)=(u 1/2)(1 u). Therefore, for 0 w < (u 1/2)(1 u), there is a → − − ≤ − − value y∗ for which V (Fy∗ ) = w. The geometric expression of Theorem 32 can now be stated as follows: In the L2 plane of (p,y) let

1 1 A = (p,y) < p 1; and y < (p )(1 p) (1,0) |2 ≤ − 2 − { }  [ This is the region strictly below the small parabola in Figure 6, excluding (1/2,0) and adding (1,0).

23 Theorem 35. 1. Any exchangeable sequence of binary random variables that satisfy the CJT corresponds to (p,y) A. ∈ 2. Toany (p,y) A there exists an exchangeable sequence of binary random variables with parameters∈ (p,y) that satisfy the CJT.

Proof. The statements of the theorems are trivially true for the point (1,0), as it corre- sponds to the unique distribution: Pr(X1 = ... = Xn ...) = 1, which is both exchangeable and satisfies the CJT . For all other points in A,

Part 1. follows de Finetti’s Theorem 30, Theorem 32 and Proposition 33. • Part 2. follows de Finetti’s Theorem 30, Theorem 32 and Proposition 34. •

Remark 36. Note that Theorem 26 and part (ii) of Theorem 35 each establish the ex- istence of distributions satisfying the CJT in the interior of the region below the small parabola in Figure 6. The two theorems exhibit examples of such a distribution: while the proof of part (ii) of Theorem (35) provides to each point (p,y) in this region a corre- sponding exchangeable distribution that satisfies the CJT, Theorem (26), provides other distributions satisfying the CJT that are clearly not exchangeable.

8 General interlacing

We now generalize the main construction of the proof of Theorem 20. This may be useful in advancing our investigations.

Definition 37. Let X =(X1,X2,...,Xn,...) be a sequence of binary random variables with joint probabilitydistribution F and letY =(Y1,Y2,...,Yn,...) be another sequence of binary random variables with joint distribution G. For t [0,1], the t-interlacing of (X,F) and ∈ (Y,G) is the pair (Z,H) :=(X,F) t (Y,G) where for n = 1,2,..., ∗

X tn if tn > t(n 1) Zn = ⌊ ⌋ ⌊ ⌋ ⌊ − ⌋ , (53) Y (1 t)n if (1 t)n > (1 t)(n 1)  ⌈ − ⌉ ⌈ − ⌉ ⌈ − − ⌉ and H = F G is the product probability distribution of F and G. × The following lemma is a direct consequence of Definition 37.

Lemma 38. If (X,F) and (Y,G) satisfy the CJT then for any t [0,1] the pair (Z,H) = ∈ (X,F) t (Y,G) also satisfies the CJT. ∗

24 Proof. We may assume that t (0,1). Note that ∈ ω ω 1 ω ω 1 ω ω 1 Zn( ) > X tn ( ) > Y (1 t)n ( ) > | 2 ⊇ | ⌊ ⌋ 2}∩{ | ⌈ − ⌉ 2     By our construction and the fact that both (X,F) and (Y,G) satisfy the CJT,

1 1 lim F X tn > = 1 and lim G Y (1 t)n > = 1. n ∞ ⌊ ⌋ 2 n ∞ ⌈ − ⌉ 2 →   →   As 1 1 1 H Zn > F X tn > G Y (1 t)n > , 2 ≥ ⌊ ⌋ 2 · ⌈ − ⌉ 2       the proof follows.

Corollary 39. The region wCJT is star-convex in the L1 space . Hence, in particular, it is path-connected in this space.

Proof. Let (u,w) be a point in wCJT in the L1 space. Then, there exists a pair (X,F) which satisfies CJT, where X is the sequence of binary random variables with joint prob- ability distribution F satisfying p = u and y∗ = w. By Remark 21, Lemma 38, and the proof of Theorem 20, the line segment [(u,w),(1,0)] is contained in wCJT , proving that wCJT is star-convex.

Corollary 40. The region wCJT is path-connected in the L2 space.

Proof. In the L2 space a point (u,w) corresponds to p = u and y = w. By the same 2 arguments as before, the arc of the parabola w = ((1 u)/(1 u0)) w0 connecting (u,w) to (1,0) (see Figure 4) is contained in wCJT , and thus−wCJT−is path-connected.

9 CJT in the space of all probability distributions.

∞ We look at the space Sp = 0,1 already introduced in the proof of Proposition 27 on { } page 17. We consider Sp both as a measurable product space and as a topological product space. Let P be the space of all probability distributions on Sp. P is a compact metric P P P space in the weak topology. Let 1 be the set of all probability distributions in P P⊆P P that satisfy the CJT and let 2 = 1 be the set of all probability distributions in that do not satisfy the CJT . \

Lemma 41. If P1 and P2 are two distributions in P,andifP2 P then for any 0 < t < 1, ∈ 2 the distribution P3 = tP1 +(1 t)P2 is also in P . − 2

25 Proof. For n = 1,2, , let ··· 1 n 1 Bn = x =(x1,x2, ) Sp ∑ xi > (54) ( ··· ∈ |n i=1 2) ∞ Since P2 P , there exists a subsequence (Bn ) and ε > 0 such that P2(Bn ) 1 ε ∈ 2 k k=1 k ≤ − for k = 1,2, . Then ···

P3(Bn ) = tP1(Bn )+(1 t)P2(Bn ) t +(1 t)(1 ε) = 1 ε(1 t), k k − k ≤ − − − − implying that P3 P . ∈ 2 P P Corollary 42. The set 2 is dense in (in the weak topology) and is convex. P P P P We proceed now to separate 1 from 2 in the space . We first observe that 1 is P P P convex by its definition and 2 is convex by Lemma 41. In order to separate 1 from 2 by some linear functional we first define the mapping T : P RN where N = 1,2,... in the following way: For P P let → { } ∈ T P χ dP χ dP χ dP ( ) = B1 , B2 ,..., Bn ,... Z Z Z  χ where the sets Bn are defined in (54) and Bn is the indicator function of the set Bn, that is, for x Sp and n = 1,2,..., ∈ χ 1 if x Bn B (x) = ∈ . n 0 if x Bn  6∈ The mapping T is affine and continuous when RN is endowed with the product topol- P P P P ogy. Clearly, T ( ) ℓ∞. Also T ( 1 ) and T ( 2 ) are convex and if z1 T ( 1 ) and P ⊂ ∈ z2 T ( 2) then liminfk ∞(z1,k z2,k) 0, where zi =(zi,1,zi,2 ...) for i = 1,2. ∈ → − ≥ Let B : ℓ∞ R be any Banach limit on ℓ∞ (see Dunford and Schwartz (1958), p. 73); → then B is a continuous linear functional on ℓ∞ and as B(z) liminfk ∞ zk for every z ℓ∞, P ≥ P → ∈ we have B(z1) B(z2) whenever z1 T ( 1 ) and z2 T ( 2 ). Thus the composition B T, which is also≥ an affine function,∈ satisfies ∈ ◦

P1 P and P2 P = B T(P1) B T(P2). ∈ 1 ∈ 2 ⇒ ◦ ≥ ◦ P P We can now improve upon the foregoing separation result between T ( 1) and T ( 2 ) as follows:

Proposition 43. For any y T (P ) there exists ψ ℓ∞ such that ∈ 2 ∈ ∗ (i) ψ(z1) ψ(z2) for every z1 T (P ) and z2 T (P ); ≥ ∈ 1 ∈ 2 (ii) ψ(z) > ψ(y) for all z T (P ). ∈ 1

26 Proof. Given y T (P ), let C = y T (P ). Then, C is a convex subset of ℓ∞ and there ∈ 2 − 1 exists an ε > 0 such that liminfk ∞ zk < ε for every z C. Let → − ∈

D = z ℓ∞ liminf zk ε ; ∈ | k ∞ ≥ −  →  then D is convex and C D = φ. Furthermore, 0 D is an interior point. Hence by ∩ ∈ Dunford and Schwartz (1958) p. 417, there exists ψ ℓ∗∞, ψ = 0 such that ψ(d) ψ(c) + ∈ 6 ≥ for all d D and c C. Since ℓ∞ = x ℓ∞ x 0 is a cone contained in D, and ψ is ∈ ∈ { ∈ + | ≥ } bounded from below on D (and hence on ℓ∞) it follows that ψ 0 (that is ψ(x) 0 for all x 0). Also, as 0 C and 0 Int(D), there exists δ > 0 such≥ that ψ(d) δ≥for all ≥ 6∈ ∈ ≥ − d D and ψ(D) ( δ,∞). Hence ψ(c) δ for all c C, that is, ψ(y x) δ for all∈x T (P ). ⊇ − ≤ − ∈ − ≤ − ∈ 1 Finally, the cone C∗ = z ℓ∞ liminfk ∞ zk 0 is contained in D; hence ψ is non- P { ∈ P| → ≥ } ψ negative on C∗. As T ( 1 ) T ( 2 ) C∗, the functional separates weakly between P P − ⊆ T ( 1 ) and T ( 2 ).

Remark 44. The mapping ψ T, which is defined on P, may not be continuous in the P ◦ P weak topology on . Nevertheless, it may be useful in distinguishing the elements of 1 P from those of 2 .

References

Austen-Smith, D. and J.S. Banks (1996), “Information aggregation, rationality, and the Condorcet Jury Theorem.” The American Political Science Review 90: 34–45. Berend, D. and J. Paroush (1998), “When is Condorcet Jury Theorem valid ?” Social Choice and Welfare 15: 481–488. Condorcet, Marquis de (1785), “Essai sur l’application de l’analyse `ala probabilit´edes d´ecisions rendues `ala pluralit´edes voix”. De l’Imprimerie Royale, Paris. Feller, W. (1957, third edition), “An Introduction to Probability Theory and Its Applications” Vol- ume I, John Wiley & Sons, New York. Feller, W. (1966, second corrected printing), “An Introduction to Probability Theory and Its Ap- plications”, Volume II, John Wiley & Sons, New York. Dunford, N. and J.T. Schwartz (1958), “Linear Operators”, Part I, Interscience Publishers, New York. Ladha, K.K. (1995), “Information pooling through majority rule: Condorcet’s Jury Theorem with correlated votes.” Journal of Economic Behavior and Organization 26: 353–372. Laslier, J.-F. and J. Weibull (2008), “Committee decisions: optimality and equilibrium.” Mimeo. Lo`eve, M. (1963, third edition), “Probability Theory”, D. Van Norstrand Company, Inc., Princeton, New Jersey.

27 Myerson, R.B. (1998), “Extended Poisson games and the Condorcet Jury Theorem.” Games and Economic Behavior 25: 111–131. Myerson, R.B. (1998), “Population uncertainty and Poisson games”. International Journal of Game Theory 27: 375–392. Nitzan, S. and J. Paroush (1985), “Collective Decision Making: An Economic Outlook.” Cam- bridge University Press, Cambridge. Shapley, L.S. and B. Grofman (1984), “Optimizing group judgemental accuracy in the presence of interdependencies.” Public Choice 43: 329–343. Uspensky, J.V. (1937), “Introduction to Mathematical Probability”, McGraw-Hill, New York. Wit, J. (1998), “Rational Choice and the Condorcet Jury Theorem.” Games and Economic Behav- ior 22: 364–376. Young, P.H. (1997), “Group choice and individual judgements.” In ‘Perspectives on Public Choice’, Mueller, D.C. (ed.), Cambridge University Press, Cambridge, pp. 181-200.

Appendix

9.1 Proof of Proposition 19

Let C be the uniform bound of the variables, i.e., Xi C for all i (in our case C = 1). Let | |≤

Qn = P Xn p ε) and hence P X n p > ε)= 1 Qn. | − n |≤ | − n | −   Then 2 2 2 2 2 E Xn p C (1 Qn)+ ε Qn C (1 Qn)+ ε − n ≤ − ≤ − If LLN holds, then limn ∞ Q n = 1 and thus the limit of the left-hand side is less than or equal → to ε. Since this holds for all ε > 0, the limit of the left-hand side is zero.

28