THE PROBLEM OF MOMENTS AND EXCHANGEABILITY

Chun-Kai Huang December, 2009 THE PROBLEM OF MOMENTS AND EXCHANGEABILITY

by

Chun-Kai Huang

Thesis submitted to the University of KwaZulu-Natal in fulfilment of the re- quirements for the degree of Master of Science in the School of Statistics and Actuarial Science.

Degree Assessment Board

Thesis advisor Prof. A. I. Dale Thesis examiners To be announced

As the candidate’s supervisor, I have/have not approved this thesis/dissertation for submission.

Signed and Dated:

SCHOOLOFSTATISTICSANDACTUARIALSCIENCE

WESTVILLECAMPUS, DURBAN, SOUTHAFRICA

UNIVERSITY OF KWAZULU-NATAL Declaration

I, Chun-Kai Huang, declare that

1. The research reported in this thesis, except where otherwise indicated, is my original research.

2. This thesis has not been submitted for any degree or examination at any other university.

3. This thesis does not contain other persons’ data, pictures, graphs or other information, unless specifically acknowlegded as being sourced from other persons.

4. This thesis does not contain other persons’ writing, unless specifically ac- knowledged as being sourced from other researchers. Where other written sources have been quoted, then

(a) their words have been re-written but the general information at- tributed to them has been referenced, or (b) where their exact words have been used, then their writing has been placed in italics and referenced.

5. This thesis does not contain text, graphics or tables copied and pasted from the internet, unless specifically acknowledged, and the source being detailed in the thesis and in the reference sections.

Signed: Disclaimer

This document describes work undertaken as part of a masters programme of study at the University of KwaZulu-Natal (UKZN). All views and opinions expressed therein remain the sole responsibility of the author, and do not nec- essarily represent those of the institute. Abstract

In the second volume of “An introduction to Probability Theory and Its Applications”, Feller (1966) proved de Finetti’s Representation Theorem for infinite sequences of exchangeable {0, 1}-random variables via the solu- tion to the Hausdorff Problem over the unit line. It can be shown that the above problems are actually equivalent, i.e. an infinite sequence of {0, 1}-random variables is exchangeable if and only if the corresponding Hausdorff over the unit line has a solution.

The Hausdorff Moment Problem has long been studied and the theory ex- tends to multiple variables over various domains, while the concepts of Exchangeability are also vastly extended. This gave rise to the idea of the current thesis, aiming to draw some connections between the different cases of Problems of Moments and the concepts of Exchangeability.

Keywords Hausdorff moment problems, moment problems, exchangeability, partial exchangeability, de Finetti’s theorem, exchangeable sequences, completely monotonic sequences.

i Contents

Abstract i

Acknowledgements iii

1 Preliminaries 1 1.1 Introduction and Terminology ...... 1 1.2 A Brief History of the Moment Problems ...... 3 1.3 A Brief History of Exchangeability ...... 8

2 Hausdorff Moment Problem 11 2.1 A Basic Model ...... 11 2.2 HMP over the unit square ...... 14 2.3 HMP over a triangle ...... 18 2.4 Some Remarks on HMP ...... 24 2.5 Further Results for HMP ...... 27

3 Exchangeability and Partial Exchangeability 31 3.1 Introduction ...... 31 3.2 Exchangeable and Partially exchangeable events ...... 32 3.3 Exchangeable Random Variables ...... 33 3.4 De Finetti’s Theorems on Exchangeability ...... 38 3.5 De Finetti’s Theorems on Partial exchangeability ...... 46

4 Connections between Moment Problem and Exchangeability 51 4.1 Introduction ...... 51 4.2 HMP and Exchangeability on the unit square and triangle . . . 53 4.3 HMP and Exchangeability on several k-dim. simplexes . . . . . 56 4.4 Some final remarks ...... 59

Appendix 62

Bibliography 63

ii Acknowledgements

First, and foremost, I wish to thank my supervisor, Professor A. I. Dale for his contin- ued interest, invaluable advice and tremendous patience throughout the preparation of this thesis.

I also wish to thank Professor D. E. North, and fellow colleagues of the School of Statis- tics and Actuarial Science, for the use of facilities in the department and their contin- ued support.

I am also indebted to the South African National Research Foundation and the Univer- sity of KwaZulu-Natal for the awards of postgraduate scholarships and bursaries.

Finally, I would like to thank my wife, Shy-Yin, and my family for their constant sup- port and encouragement.

iii Chapter 1

Preliminaries

In this opening chapter, a general introduction to the motivation of this the- sis (i.e. historical links between Moment problems and Exchangeability), short summaries of the chapters and some terminology issues are discussed. Some historical reviews of both topics are also provided for interest and reference.

1.1 Introduction and Terminology

It is not necessary in this thesis to discuss the pros and cons between subjective and objective views of probability. In any case, most mathematical statisticians nowadays are (or should be) equipped with enough knowledge of both to make judgements of their own. The purpose of this research is merely to discuss some relations between the concept of exchangeability and moment problems, from a view point that is unbiased and understandable by both Bayesians and non- Bayesians.

The first motivation of this research was obtained from Feller (1966), on page 229 of volume II, where he proved de Finetti’s theorem for 2-valued sequences via the solution of the Hausdorff moment problem over the unit interval. He also suggested that this method can easily extend to a sequence that assumes a finite number of values.

It turns out that one may also construct an exchangeable sequence of 2-valued random variables from a given sequence of Hausdorff moments over the unit interval, i.e. the two theorems are clearly equivalent. This leads to the obvi- ous questions: what happens if one move onto moment problems over other domains? And, how do other extensions of de Finetti’s theorem link up with the general moment problems?

Some links between these two theories may also be found in a few other pa-

1 1.1. Introduction and Terminology pers. Dale (1983) gave an probabilistic proof of Hausdorff’s theorem for double sequences using de Finetti’s result for partially exchangeable events. Diaconis & Freedman (2004a,b) discussed the connections between the Markov moment problem and de Finetti’s theorem. Bassetti & Regazzini (2008) discussed how de Finetti proved a proposition regarding the characteristic function of an infinite exchangeable sequence of events via the solution to the Hamburger moment problem. And, most recently, Peng et al. (2009) showed that partial exchange- ability can be characterized by rectangular complete monotonicity.

The current thesis will explore both topics with the aim to discuss their rela- tions. In particular, the equivalence between de Finetti’s theorems for infinite exchangeable sequences that take on a finite or infinite number of values and Hausdorff’s theorem over various domains.

The content assumes some basic knowledge of probability measure theory and real analysis. Readers may also obtain detailed background information on mo- ment problems and exchangeability in Shohat & Tamarkin (1943) and Bernardo & Smith (1994), respectively.

CHAPTER SUMMARIES:

The remainder of this chapter will be devoted to some historical background of the two topics.

Chapter 2 formally introduces the Hausdorff Moment problems. The solutions are also extended to various domains, which are useful in a later Chapter. Ex- amples and discussions are also provided for further understanding of the prob- lems.

Exchangeability and Partial Exchangeability are introduced in Chapter 3. De- tailed proofs of de Finetti’s representation theorems in several cases are pro- vided. Interesting counterexamples on exchangeability are also discussed here.

Chapter 4 provides some results on the links between the Hausdorff moment problems and Exchangeability. I have shown here that de Finetti’s representa- tion theorem for a partially exchangeable sequence over a finite set of values is completely characterized by the Hausdorff moment problem over several k- dimensional simplexes (various simpler cases are also given). Difficulties in expanding to infinite and continuous cases are also discussed in the final con- clusion.

2 1.2. A Brief History of the Moment Problems

TERMINOLOGY AND NOTATION:

The following terminology and notations will be used throughout this thesis:

R = the set of real numbers N = {1, 2, 3,...} N0 = {0, 1, 2,...} [k] = {1, 2, . . . , k} [a, b] = the set of all real numbers between a and b P = a probability measure Ef(X) = the expected value of f(X) B(R) = the class of all Borel sets in R Cov(X,Y ) = the covariance between X and Y V ar(X) = the variance of X

We shall also use “,” to mean “by definition, equals to”. Vectors will be denoted with bold fonts, e.g. x1 = (x11, x12, . . . , x1k) is a k-dimensional vector in some specified space. A g-IID sequence shall mean a g-fold infinite sequence of inde- pendent random variables that are identically distributed within each of the g types.

Other terminology and notations in the thesis either will be commonly used or will be introduced in the context.

1.2 A Brief History of the Moment Problems

In the celebrated paper Recherches sur les fractions continues (1894 and 1895), which introduced our modern concept of Stieltjes Integrals, Stieltjes also pro- posed and solved the problem which he called “The Problem of Moments”.

The problem Stieltjes proposed is to determine a bounded non-decreasing func- tion F (x) in the interval [0, ∞) such that its moments have a prescribed set of values, i.e. Z ∞ n x dF (x) = µn , n = 0, 1, 2,.... (1.1) 0

The terminology “Problem of Moments” was taken by Stieltjes from Mechanics, as he often used the concepts of mass, stability, etc. in solving analytical prob- lems. He refers to the left-hand side of equation (1.1) as the n-th moment, about zero, of the mass distribution characterized by the function F (x). This would correspond to our modern definition of the n-th moment about the origin of the distribution function F (x) over the positive real-line.

3 1.2. A Brief History of the Moment Problems

Stieltjes related the solutions of Moment Problems to their “corresponding” continued fractions and a Moment Problem would be determined (i.e. unique solution) or indeterminate (i.e. more than one solution, in which case there would be, of necessity, infinitely many solutions) depending on whether the corresponding continued fraction is convergent or divergent, respectively (see Shohat & Tamarkin (1943, pp. vii–viii) and Stieltjes (1895)).

Markov, a student of Chebyshev, continued the work in applying it to prob- ability theory and further generalized the Moment Problem by requiring the solution p(x) to be bounded, i.e. Z ∞ n x p(x)dx = µn , n = 0, 1, 2,..., with 0 ≤ p(x) ≤ L < ∞ (1.2) −∞ See Markov (1896) for details.

The progress on the Problem of Moments then paused for more than 20 years until it was considered again by Hamburger (1921).

Hamburger extended the Stieltjes moment problem to the real line. He again made use of continued fractions and, further, extensive use of Helly’s theorem (see Appendix). It is interesting to note that Moment Problem (1.1) may be determined while the corresponding Hamburger moment problem Z ∞ n x dF (x) = µn , n = 0, 1, 2,..., (1.3) −∞ with the same µn, is indeterminate (see Hamburger (1921)).

In 1922, Nevanlinna, by making use of the theory of functions, exhibited the solutions and properties of the Moment Problem (1.3) in terms of functions sim- ilar to those considered by Chebyshev (see Nevanlinna (1922)). And about the same time Riesz solved the Hamburger moment problem by the use of “quasi- orthogonal ” (see Riesz (1923)).

In 1923 and 1926, Carleman showed the connection between the Problem of Moments (1.3) and the theory of quasi-analytic functions and quadratic forms in infinitely many variables (see Carleman (1923, 1926)).

Hausdorff provided the necessary conditions for the existence of a solution (nec- essarily unique) over a finite interval. He also gave an effective construction of the solution and stated criteria for the solution to be continuous, differ- entiable . . . etc. (see Hausdorff (1923)). This result was extended to a multi- dimensional case (the unit square) by Hildebrandt and Schoenberg (see Hilde- brandt & Schoenberg (1933)).

4 1.2. A Brief History of the Moment Problems

Schoenberg also gave an alternative method in proving the conditions for the Hausdorff moment problem (see Schoenberg (1932)). He proceeded in the op- posite direction to Hausdorff, by considering the conditions as a set of linear inequalities in infinitely many variables and showed that its most general so- lution can be represented parametrically in the form Z 1 n x dF (x) = µn , n = 0, 1, 2,.... (1.4) 0 He referred to a sequence which satisfies the conditions of the Hausdorff mo- ment problem as a “completely monotonic sequence”.

Achyeser and Krein made use of the tools of quadratic forms in generalizing the work of Markov and extended the theory to the “Trigonometric moment problem” Z 2π inx e dF (x) = µn , n = 0, 1, 2,.... (1.5) 0 (see Achyeser & Krein (1934) for details)

Haviland extended Riesz’ theory to the case of several dimensions (see Haviland (1936)). He also gave a general approach to Moment Problems using spectral theory and derived the conditions for the one-dimensional problem on [−1, 1] using Legendre Polynomials. For further historical references up to this point, see Shohat & Tamarkin (1943, pp. vii–xi).

Hedge extended the solution of the Hausdorff moment problem to any bounded region in n-dimensional Euclidean space. He used methods that were similar to those of Hausdorff (see Hedge (1941)).

Shohat & Tamarkin (1943) also discussed what they referred to as the “Reduced moment problem” (often called the “Finite moment problem” in some other con- text). It arises when only a finite number of moments are given. Shohat and Tamarkin provided conditions for the uniqueness of its solution and a least up- per bound for the absolute difference of any two solutions. This bound was further improved by Khamis (see Khamis (1954)). He also provided a method for constructing an infinite number of solutions, from one given solution.

Taking a finite number of the inequalities expressed by the conditions of the Hausdorff moment problem, in the d-dimensional case, to partially describe the feasible set of finite dimensional linear equations, would lead to the study on the geometry of the d-dimensional Hausdorff moment space. This is particu- larly useful in the identification and approximation of moment sequences. The first to consider the one-dimensional Hausdorff moment spaces were Karlin and Shapley (see Karlin & Shapley (1953)).

5 1.2. A Brief History of the Moment Problems

Devinatz also derived the conditions for the two parameter Stieltjes and Ham- burger moment problems using operator theory (see Devinatz (1957)).

Feller gave a proof of the Hausdorff one-dimensional theorem using simple probabilistic techniques and Bernstein polynomials (see Feller (1966)). In the same year, Cobb and Harris provided the solutions for the generalized finite moment problems of Hausdorff and Stieltjes, in which the integrands of (1.1) and (1.4) are changed to general functions fn(x) depending on n (see Cobb & Harris (1966)).

In 1975, Atzmon derived necessary and sufficient conditions for the “Complex moment problem”, i.e. Z m n z z dF (z) = µmn , m, n = 0, 1, 2,..., (1.6) D where z is a and D is the closed unit disc. He also applied the result in solving the Trigonometric, Hausdorff and the [−1, 1] moment problems (see Atzmon (1975)).

In 1982, Askey, Schoenberg and Sharma provided a further new proof of the conditions to Hausdorff moment problem by means of Legendre polynomials and their discrete extensions found by Chebyshev. They further applied this method to obtain the corresponding result for the Weighted Hausdorff Moment problem, where Z 1 n+α β f(x)x (1 − x) dx = µn , n = 0, 1, 2, . . . , α, β > −1 (1.7) 0 (see Askey et al. (1982) for further details).

Petersen, in 1982, showed that the n-dimensional Hamburger moment problem will have a unique solution if the corresponding projected n one-dimensional Hamburger moment problems all have unique solutions. He also gave an ex- ample to show that the converse is not necessarily true (see Petersen (1982)).

In 1983, Fuglede gave a more general approach to the multi-dimensional mo- ment problems using an operator theoretic approach (see Fuglede (1983)).

Talenti provided a stability theorem for the finite Hausdorff moment problem and gave an algorithm to obtain approximate solutions (see Talenti (1987)). At the same time, Dale extended Feller’s methods to the two-dimensional case, solving the Hausdorff moment problem for a triangle and provided a new proof of the result over the unit square (see Dale (1987)).

6 1.2. A Brief History of the Moment Problems

In 1993, Scalas and Viano derived the solution to the Hausdorff moment prob- lem using Pollaczek polynomials (see Scalas & Viano (1993)). They also pro- vided a numerical method to approximate the solution when only a finite num- ber of moments are given (and, or, affected by noise).

In 1999, Gupta also provided the necessary and sufficient conditions for the k k-dimensional Moment Problems over the standard simplex in R (see Gupta (1999)). In 2000, Knill also proved the conditions over the unit cube via Bern- stein’s polynomials and generalized the results of Hausdorff, Hildebrandt and Schoenberg in obtaining conditions for absolute continuity (see Knill (2000)).

In 2001, Stochel provided the proof (in a general setting) that the finite (which he referred to as “truncated”) Moment Problem is more general than the full Moment problem, i.e. solving the finite Moment Problem implies solving the full Moment Problem (see Stochel (2001)).

In the same year, Romera, Angulo and Dehesa applied the Hausdorff conditions to solve the Hausdorff entropic moment problem,

Z 1 n [f(x)] dx = ωn , n = 0, 1, 2,..., (1.8) 0 and also provided the method of reconstructing the density function f(x) (see Romera et al. (2001) for further details).

In 2003, Helmes and Stockbridge extended Dale’s result over the unit sim- plex to the general multi-dimensional case (see Helmes & Stockbridge (2003)). Stockbridge (2003) also provided conditions for the Moment Problems over poly- n topes and other bounded regions specified piecewise by curves in R of the form α1 αn c1x1 + ··· + cnxn = c0 (see Stockbridge (2003)).

About the same time, Inverardi, Pontuale, Petri and Tagliani provided a more efficient way of recovering a probability distribution through a low number of fractional moments (which are derived from ordinary moments) using the max- imum entropy technique (see Inverardi et al. (2003)).

In 2004, Diaconis and Freedman gave a new proof for the Markov Moment Problem over the unit interval (see Diaconis & Freedman (2004a,b)).

There are also various other generalization, extensions and applications to the Moment Problems. These, however, are not discussed in this thesis.

7 1.3. A Brief History of Exchangeability

It is also interesting to note that Shohat & Tamarkin (1943) suggested, al- though Stieltjes might have been the first to use the terminology “Problem of Moments”, there are other related works that preceded his paper, in particular, those of Chebyshev and Heines.

Chebyshev, in a series of papers started in 1855, studied a related type of inte- grals and sums Z ∞ p(y) dy (1.9) −∞ x − y where p(x) is non-negative on the real line, and

∞ X θ2 i (1.10) x − x −∞ i where θi 6= 0, in the aim to answer the question of “How far does a given se- quence of moments determine a function?” and to study certain properties of their corresponding continued fraction. Elsewhere, Heine (1861, 1878, 1881) studied similar integrals over a finite interval.

1.3 A Brief History of Exchangeability

One of the differences between Bayesian and frequentist methods in statisti- cal inference is that frequentists often treat observations as independent that Bayesians treat as exchangeable (or permutable or symmmetric). A Bayesian statistician will often seek the conditional probability distribution of an unob- servable quantity given the observable data. This gave rise to the idea of a ”learning process”.

It may be argued that the root of the idea of finite exchangeability of events track back to 1718, where de Moivre discussed the game “Rencontre (Matches)”. In his book The Doctrine of Chances, he enthusiastically described the ideas in Problems XXXV and XXXVI as a new sort of “Algebra” (see de Moivre (1718)).

However, Haag (1924) seems to be the first to formally discuss the notion ex- changeabiity of events, in a communication at the International Mathematical Congress, held in Toronto. In the paper, Haag hinted at a representation the- orem but did not rigorously state or prove it. Khintchine (1932) also had some discussions on this topic in the early 1930’s.

In 1930, exchangeable random variables were independently introduced by de Finetti (he referred to them as being equivalent and the word exchangeable, pro- posed by Polya´ (see de Finetti (1938a, 1939)) or Frechet´ (see de Finetti (1976))

8 1.3. A Brief History of Exchangeability is used more often nowadays) in the context of personalistic probability speci- fication (see de Finetti (1930, 1974)). He used it to describe a sense in which random quantities in such a specification are judged to be similar. In the papers, de Finetti also gave his famous representation theorem for the 2-valued case, which showed that the joint distribution of any 2-valued exchangeable sequence may be represented as a mixture of IID sequence of random variables. This case has also been treated by Hincin,ˇ in much the same manner (see Hincinˇ (1932, 1952)).

Soon after, de Finetti extended his theory to real-valued random variables (see de Finetti (1937), a translation of this paper also appears in Kyburg & Smokler (1964)). Dynkin independently proved the real-valued case, but the technique he used also covers more general spaces that are separable in some sense (see Dynkin (1953)).

De Finetti then generalized the concept of exchangeability to “partial exchange- ability”, which considers sequences of several types being exchangeable within each type (see de Finetti (1938b)). And, in another paper by de Finetti (1959), he suggested the possibility of subjective treatment of Markov chains using par- tial exchangeability. The idea is that the type of the i-th observable depends on the outcome of the previous observation. A simple setup may be followed in the paper by Freedman (1962a), where he showed that a stationary partially exchangeable process is a mixture of Markov chains.

One aim in the early research, was to extend de Finetti’s results to more gen- eral state spaces. The most satisfactory result was obtained by Hewitt & Sav- age (1955); they provided the representation theorem over compact Hausdorff spaces, which may immediately be extended to Borel subsets of compact spaces. The latter extension may also be found in Aldous (1985).

Another type of question was also of interest: what are the necessary and suffi- cient conditions on a given sequence of random variables in order that it turns out to be a mixture of IID sequences of a particular type, e.g. normally, expo- nentially or Poisson distributed random variables. Interesting results for these may be found in Freedman (1962b), Ressel (1985) and Bernardo & Smith (1994).

It is also of obvious interest to know when de Finetti’s theorem would fail. In the paper by Dubins & Freedman (1979), they demonstrated the existence of a separable metric space for which the representation theorem for exchangeable stochastic processes fails. In the opposite direction, an example is given of a nonstandard space for which the representation necessarily holds.

It is also well known that de Finetti’s theorem does not hold in general for fi- nite sequences of exchangeable random variables; for an easy example showing

9 1.3. A Brief History of Exchangeability the failure in the 2-valued case, see Diaconis (1977) or Diaconis & Freedman (1980b).

Several versions and modifications of the theorem were also developed for the finite case. Kendall (1967) showed that every finite system of exchangeable events is equivalent to a random sampling scheme without replacement, where the number of items in the sampling has an arbitrary distribution. Diaconis (1977) and Diaconis & Freedman (1980b) used this result to find the total vari- ation distance to the closest mixture IID random variables, which actually im- plies de Finetti’s theorem in the limit.

Jaynes (1986) also showed that, by dropping the non-negativity condition in the problem, one can still obtain the usual de Finetti’s representation theorem for all finite sequences. Gnedin (1996) then found a criterion for extendability to an infinite sequence.

Another paper worth noting is by Heath & Sudderth (1976), where they pro- vided an alternative proof of the theorem by showing that de Finetti’s represen- tation theorem emerges as a limit of a set of “Urn model” distributions for finite sequences. The method they used was very simple and gave a much deeper un- derstanding of the result. Such proof was also suggested by de Finetti (1974), but was not actually carried out.

On the other hand, Diaconis & Freedman (1980a) showed that a model of sev- eral sequences that is partially exchangeable and recurrent may be represented as a mixture of Markov chains. By introducing recurrence, they were able to eliminate the previous stationary assumption.

Recently, Draper et al. (1993) generalized the concept of exchangeability to data analysis and Kerns & Szekely´ (2006) also introduced an extended notion of “mixture”, which is able retain de Finetti’s convenient representation for finite exchangeable sequences.

Various other extensions and applications of the de Finetti-type representation theorems has been carried out by many authors, which are not mentioned here. For a review on some recent researches, see Aldous (2009). Also, for interest, a detailed summary of de Finetti’s contribution to probability and statistics is given in Cifarelli & Regazzini (1996).

10 Chapter 2

Hausdorff Moment Problem

In this chapter, results for Hausdorff moment problems over various domains are provided. Some remarks are also discussed to give better understanding of the concepts.

2.1 A Basic Model

Suppose {µn : n = 0, 1, 2,...} is a sequence of real numbers. The basic Hausdorff Moment Problem is concerned with the existence of a distribution function F such that Z 1 n x dF (x) = µn , n = 0, 1, 2,..., (2.1) 0 i.e. {µn} is the sequence of moments of some X with distribu- tion function F . Hausdorff (1923) has shown a distribution function F exists with the above properties if and only if the sequence {µn} satisfies some condi- tions. First, let us look at some useful definitions and results.

Definition 2.1. The Bernstein of degree t is defined as

t X k t B (x, f) = f xk(1 − x)t−k , (2.2) t t k k=0 for any function f.

Definition 2.2. Let us define a sequence of increments by

∆µn = µn − µn+1 , (2.3) for any given sequence {µn : n = 0, 1, 2,...}.

11 2.1. A Basic Model

Theorem 2.1. Consider an i.i.d. sequence of real valued random variables {Xn} 2 1 P with mean α and variance σ ≤ K < ∞ . Let St = t n≤t Xn and let f : R → R be a uniformly continuous and bounded function. Then Ef(St) → f(α). Proof. For any ε > 0,

Ef(St) − f(α) ≤ E f(St) − f(α)  = E f(St) − f(α) I[|St−α|≤ε] + I[|St−α|>ε] Z Z

= f(St) − f(α) dP + f(St) − f(α) dP |St−α|≤ε |St−α|>ε  ≤ max f(x) − f(α) . 1 + 2 max f(x) P St − α > ε |x−α|≤ε x 1 ≤ δ(ε) + 2K (S − α)2 ε2 E t 2K = δ(ε) + σ2 , tε2 2 where δ(ε) is a modulus of continuity. Letting ε = εt → 0 so that tεt → ∞ fin- ishes the proof. 

Let Xn have a Bernoulli distribution with probability of success α ∈ [0, 1] and let f : [0, 1] → R be continuous. Then by the above theorem, we get

t X k t B (x, f) = f xk(1 − x)t−k f(S ) → f(α) t t k , E t k=0 uniformly over [0, 1] as t tends to infinity. This will be used in the following theorem, which adopted the approach by Feller (1966).

Theorem 2.2. For every sequence {µn : n = 0, 1, 2,...} of real numbers, there exists a random variable X ∈ [0, 1] such that

Z 1 n x dF (x) = µn , n = 0, 1, 2,..., 0 if, and only if,

r µn ≥ 0 , µ0 = 1 and ∆ µn ≥ 0 for all n, r . (2.4)

n Proof. (⇒) Suppose X is a random variable in [0, 1] and let µn = EX be its n-th moment. Then

n n+1 n ∆µn = µn − µn+1 = E(X − X ) = EX (1 − X) ,

2 n n+1 n 2 ∆ µn = ∆∆µn = EX (1 − X) − EX (1 − X) = EX (1 − X)

12 2.1. A Basic Model and by induction we will get

r n r ∆ µn = EX (1 − X) .

r It is clear that µ0 = 1, ∆ µn ≥ 0 and µn ≥ 0, since X ∈ [0, 1]. Thus we do have the above conditions satisfied.

(⇐) Assume the conditions (2.4) are true. Since ∆µn = µn − µn+1, we get the following inversion formula

µn = µn+1 + ∆µn = (µn+2 + ∆µn+1) + (µn − µn+1)

= (µn+2 + ∆µn+1) + (µn+1 + ∆µn − (µn+2 + ∆µn+1))

= (µn+2 + ∆µn+1) + (µn+1 − µn+2) + (∆µn − ∆µn+1) 2 = µn+2 + 2∆µn+1 + ∆ µn r X r = ∆r−jµ j n+j j=0 by induction. If we now take r = t − n and define

t p(t) = ∆t−nµ n n n then we will get

t−n t−n   t−n t−n X j t X j (t) µ = ∆t−(n+j)µ = p . n t  n + j n+j t  n+j j=0 n+j j=0 n+j

We also have t−n (t − n)! (n + j)!(t − n − j)! n+j j = = n t  j!(t − n − j)! t! t  n+j n so that t−n n+j t m X n (t) X n (t) µn = t  pn+j = t  pm . j=0 n m=n n (t) P (t) By (2.4), it is obvious that pm ≥ 0 and µ0 = m≤t pm = 1, so we may think of (t) pn as the distribution of a random variable Xt such that  m X = = p(t) for m ∈ {0, 1, 2, . . . , t}. P t t m

13 2.2. HMP over the unit square

Now consider t m t X X m(m − 1)(m − 2) ··· (m − n + 1) µ = n p(t) = p(t) n t  m t(t − 1)(t − 2) ··· (t − n + 1) m m=n n m=n t m m 1 m 2 m n−1 X t ( t − t )( t − t ) ··· ( t − t ) (t) = 1 2 n−1 pm m=n 1(1 − t )(1 − t ) ··· (1 − t ) t Xmn 1 = p(t) + o t m t m=0 1 = (X )n + o E t t n where limt→∞ E(Xt) exists because any continuous bounded function can be 1 approximated by Bernstein polynomials and o( t ) tends to 0. Now, by Helly’s (ti) (t) Theorem (see Appendix), we can choose a subsequence pm of pm , which con- verges to some distribution function pm. So we get

n −1 µn = E(Xti ) + o(ti )

(ti) where Xti are the random variables that correspond to pm . Now, by the Helly- Bray Theorem (see Appendix), if we take ti → ∞, then

n µn = E(X) where X is the random variable of pm, i.e. {µn} is a moment sequence of pm. 

2.2 HMP over the unit square

The approach used in the previous section can be easily extended to the two- dimensional cases. The first obvious extension is over the unit square, i.e. S = [0, 1] × [0, 1]. We shall follow an approach that is similar to Dale (1987).

For two random variables X and Y , with joint distribution function F over S = [0, 1] × [0, 1], we define the (r, s)-th moment µrs by Z r s r s µrs = EF (X Y ) = x y dF (x, y) , S where EF (or simply E) indicates expectation with respect to F .

Definition 2.3. Given a sequence {µrs : r, s = 0, 1, 2,...}, let us define

∆1∆2 µrs = µrs − µr+1,s − µr,s+1 + µr+1,s+1 .

14 2.2. HMP over the unit square

Definition 2.4. The two-dimensional Bernstein polynomial (Lorentz (1953)) is given by

n m X X  i j nm Bf (x, y) = f , xi(1 − x)n−iyj(1 − y)m−j , n,m n m i j i=0 j=0 where f is any function defined and bounded on the unit square.

Theorem 2.3. If f is bounded on S, then, at each continuity point (x, y) of f

lim Bf (x, y) = f(x, y) n,m→∞ n,m

Proof. Let ε > 0 be given and let (x, y) be a continuity point of f. Then

0 0 (∃δ1, δ2) 3 |x − x | < δ1 and |y − y | < δ2

⇒ |f(x, y) − f(x0, y0)| < ε Let nm q(n,m) = xi(1 − x)n−iyj(1 − y)m−j , i,j i j then clearly n m X X (n,m) qi,j = 1 . i=0 j=0 Now

X Xh  i j i (n,m) f(x, y) − Bf (x, y) = f(x, y) − f , q n,m n m i,j i j ≤ S1 + S2 + S3 + S4 where S1, S2, S3 and S4 are the double sums of  i j  f(x, y) − f , q(n,m) n m i,j over the respective regions n i j o R = (i, j): − x < δ , − y < δ 1 n 1 m 2 n i j o R = (i, j): − x < δ , − y ≥ δ 2 n 1 m 2 n i j o R = (i, j): − x ≥ δ , − y < δ 3 n 1 m 2 n i j o R = (i, j): − x ≥ δ , − y ≥ δ 4 n 1 m 2

15 2.2. HMP over the unit square

Notice next that X  i j  (n,m) X (n,m) S = f(x, y) − f , q ≤ ε q = ε . 1 n m i,j i,j R1 Furthermore, since f is bounded on S, say |f(x, y)| ≤ M, and (see Lorentz (1953), p. 6) X n 1 xi(1 − x)n−i ≤ , i 4nδ2 |i/n−x|≥δ we get

X  i j  (n,m) S = f(x, y) − f , q 2 n m i,j R2 X  i j  ≤ f(x, y) − f , n m R2 X n X m × xi(1 − x)n−i yj(1 − y)m−j i j |i/n−x|<δ1 |j/m−y|≥δ2 1 ≤ 2M. 1 . 2 4mδ2 2M = 2 . 4mδ2 Similarly, we can also get 2M S3 ≤ 2 4nδ1 2M S4 ≤ 2 2 (4nδ1)(4mδ2) and hence, we get the required result by taking n, m → ∞. 

From this theorem, we hence have n m X X  i j nm (Bf ) = f , ∆n−i∆m−jµ → (f) EF n,m n m i j 1 2 ij EF i=0 j=0 as m and n tend to infinity.

Theorem 2.4. For every sequence {µrs : r, s = 0, 1, 2,...} of real numbers, there exists a distribution function F (x, y) on S = [0, 1] × [0, 1] such that Z r s µrs = x y dF (x, y) , r, s = 0, 1, 2,..., S if, and only if, u v µ00 = 1 and ∆1 ∆2µrs ≥ 0 for all r, s, u, v . (2.5)

16 2.2. HMP over the unit square

Proof. (⇒) Since Z r s µrs = x y dF (x, y) , S we can get

∆1∆2 µrs = µrs − µr+1,s − µr,s+1 + µr+1,s+1 Z = x(1 − x)y(1 − y)dF (x, y) S = EX(1 − X)Y (1 − Y ) , and by induction we get

u v r u s v ∆1 ∆2µrs = EX (1 − X) Y (1 − Y ) .

u v It is then clear that µ00 = 1 and ∆1 ∆2µrs ≥ 0, since X,Y ∈ [0, 1].

(⇐) Suppose the conditions in (2.5) are true and let us define nm p(n,m) = ∆n−r∆m−sµ . r,s r s 1 2 rs

Now we have n m X X uv p(n,m) r s u,v u=0 v=0 n m X X uvnm = ∆n−u∆m−vµ r s u v 1 2 uv u=0 v=0 n m      X X n m n − r m − s (n−r)−(u−r) (m−s)−(v−s) = ∆ ∆ µ r s u − r v − s 1 2 uv u=0 v=0    n m    n m X X n − r m − s (n−r)−(u−r) (m−s)−(v−s) = ∆ ∆ µ r s u − r v − s 1 2 uv u=r v=s (let k = u − r and h = v − s)    n−r m−s    n m X X n − r m − s (n−r)−k (m−s)−h = ∆ ∆ µ r s k h 1 2 r+k,s+h k=0 h=0 nm = µ r s rs (see Dale (1983) for details of the above final step)

Now, for r = s = 0 we have

n m X X (n,m) pu,v = µ00 = 1 u=0 v=0

17 2.3. HMP over a triangle

(n,m) (n,m) and clearly pu,v ≥ 0. Thus we may think of pu,v as the atomic distribution of random variables Xn and Ym, such that  u v  X = ,Y = = p(n,m) P n n m m u,v where u ∈ {0, 1, 2, . . . , n} and v ∈ {0, 1, 2, . . . , m}.

Now consider n m uv X X r s (n,m) µrs = nmpu,v u=0 v=0 r s n m X X u(u − 1) ··· (u − r + 1)v(v − 1) ··· (v − s + 1) = p(n,m) n(n − 1) ··· (n − r + 1)m(m − 1) ··· (m − s + 1) u,v u=0 v=0 n m u u 1 u r−1 v v 1 v s−1 X X ( − ) ··· ( − ) ( − ) ··· ( − ) = n n n n n m m m m m p(n,m) 1(1 − 1 ) ··· (1 − r−1 )1(1 − 1 ) ··· (1 − s−1 ) u,v u=0 v=0 n n m m n m X Xur v s = p(n,m) + R n m u,v u=0 v=0 r s = E(XnYm) + R r s where R tends to 0 as n and m tend to ∞. Moreover, limn,m→∞ E(XnYm) exists because any continuous bounded function can be approximated by Bernstein polynomials as proved in the previous theorem.

(ni,mj ) Now, by Helly’s Theorem (see Appendix), we can choose a subsequence pu,v (n,m) of pu,v , which converges to some distribution pu,v. So we get

r s µrs = E(Xni Ymj ) + R

(ni,mj ) where Xni and Ymj are random variables that correspond to pu,v . Now, by the Helly-Bray Theorem (see Appendix), if we take ni, mj → ∞, then

r s µrs = E(X Y ) where X and Y are random variables that correspond to pu,v, i.e. {µrs} is a mo- ment sequence of pu,v. 

2.3 HMP over a triangle

The approach can be easily extended to distributions over the triangle (Dale (1987)) T = {(x, y): x ≥ 0, y ≥ 0, x + y ≤ 1}.

18 2.3. HMP over a triangle

For random variables X and Y , with joint distribution G over T , we define the (r, s)-th moment ωrs by Z r s r s ωrs = EG(X Y ) = x y dG(x, y) , T where EG (or simply E) indicates expectation with respect to G.

Definition 2.5. Given a sequence {ωrs : r, s = 0, 1, 2,...}, let us define

δ ωrs = ωrs − ωr+1,s − ωr,s+1 .

Definition 2.6. The two-dimensional Bernstein polynomial is given by

X  i j  n  Bf (x, y) = f , xiyj(1 − x − y)n−i−j , n n n i j Λ where  n  n! = , i j i!j!(n − i − j)!

Λ = {(i, j): i ≥ 0, j ≥ 0, i + j ≤ n} and f is any function defined and bounded over T .

Theorem 2.5. If f is bounded on T , then, at each continuity point (x, y) of f

lim Bf (x, y) = f(x, y) . n→∞ n Proof. Let ε > 0 be given and let (x, y) be a continuity point of f. Then

0 0 (∃δ1, δ2) 3 |x − x | < δ1 and |y − y | < δ2

⇒ |f(x, y) − f(x0, y0)| < ε Let  n  q(n) = xiyj(1 − x − y)n−i−j , i,j i j then clearly X (n) qi,j = 1 . Λ Now

Xh  i j i (n) f(x, y) − Bf (x, y) = f(x, y) − f , q n n n i,j Λ ≤ S1 + S2 + S3 + S4

19 2.3. HMP over a triangle

where S1, S2, S3 and S4 are the sums of  i j  f(x, y) − f , q(n) n n i,j over the respective regions n i j o R = (i, j): − x < δ , − y < δ 1 n 1 n 2 n i j o R = (i, j): − x < δ , − y ≥ δ 2 n 1 n 2 n i j o R = (i, j): − x ≥ δ , − y < δ 3 n 1 n 2 n i j o R = (i, j): − x ≥ δ , − y ≥ δ . 4 n 1 n 2

As in Theorem 2.3, we have S1 ≤ ε. By noting

X (n) X n! j q = j xiyj(1 − x − y)n−i−j ij i!j!(n − i − j)! Λ Λ X (n − 1)! = ny xiyj−1(1 − x − y)n−i−j i!(j − 1)!(n − i − j)! Λ = ny , similarly,

X (n) 2 j(j − 1)qij = n(n − 1)y Λ and also that in R , |j/n−y| ≥ 1 and y(1 − y) ≤ 1 , we may then derive 2 δ2 4

X (n) 2 X 2 (n) qi,j ≤ (1/δ2) (j/n − y) qi,j R2 Λ 1 X (n) = (j − ny)2q n2δ2 ij 2 Λ 1 X (n) = {j(j − 1) − (2ny − 1)j + n2y2}q n2δ2 ij 2 Λ 1 2 2 2 = 2 2 {n(n − 1)y − (2ny − 1)ny + n y } n δ2 1 = 2 2 ny(1 − y) n δ2 y(1 − y) = 2 nδ2 1 ≤ 2 . 4nδ2

20 2.3. HMP over a triangle

P (n) 1 and similarly R qi,j ≤ 2 . 3 4nδ1 Then, if |f(x, y)| ≤ M, we get

X  i j  (n) S = f(x, y) − f , q 2 n n i,j R2 X  i j  X (n) ≤ f(x, y) − f , q n n i,j R2 R2 1 ≤ 2M. 2 4nδ2 M = 2 2nδ2 2 and similarly S3 ≤ M/(2nδ1).

Finally, using the Schwarz inequality and the facts that |i/n−x| ≥ 1, |j/n−y| ≥ 1 δ1 δ2 in R4, we get

X  i j  (n) S = f(x, y) − f , q 4 n n i,j R4 2M i j X (n) ≤ − x − y qi,j δ1δ2 n n Λ 2M i   1 j   1 X (n) 2 (n) 2 ≤ − x qi,j × − y qi,j δ1δ2 n n Λ 1 1 2M hX 2 (n)i 2 hX 2 (n)i 2 ≤ 2 (i − nx) qi,j (j − ny) qi,j n δ1δ2 Λ Λ M ≤ 2nδ1δ2 and hence the required result by taking n → ∞. 

From this theorem, we hence have

f EG(Bn) → EG(f) as n → ∞.

21 2.3. HMP over a triangle

Theorem 2.6. For every sequence {ωrs : r, s = 0, 1, 2,...} of real numbers, there exists a distribution function G(x, y) on T such that Z r s ωrs = x y dG(x, y) , r, s = 0, 1, 2,..., T if, and only if, t ω00 = 1 and δ ωrs ≥ 0 for all r, s, t . (2.6) Proof. (⇒) Since Z r s ωrs = x y dG(x, y) , T we can get

δωrs = ωrs − ωr+1,s − ωr,s+1 Z = xrys(1 − x − y)dG(x, y) T r s = EX Y (1 − X − Y ) , and by induction we get

t r s t δ ωrs = EX Y (1 − X − Y ) .

t It is then clear that ω00 = 1 and δ ωrs ≥ 0, since X,Y ∈ [0, 1].

(⇐) Suppose the conditions in (2.6) are true and let us define

 n  p(n) = δn−r−sω . r,s r s rs

As in the unit square case, we will follow the approach similar to Dale (1983) by introducing the operators E1 and E2 such that

E1ωrs = ωr+1,s

E2ωrs = ωr,s+1

(δ + E1 + E2)ωrs = ωrs and also noting that n X n (x + y)n = xiyn−i , i i=0

22 2.3. HMP over a triangle then we have for Ω = {(u, v): u ≥ 0, v ≥ 0, u + v ≤ n}, X uv p(n) r s u,v Ω X uv n  = δn−u−vω r s u v uv Ω n−s n−u  n  X X n − r − sn − u − s = δn−u−vω r s u − r v − s uv u=r v=s n−r−s n−r−s−k  n  X X n − r − sn − r − s − k = δ(n−r−s)−k−hω r s k h s+k,r+h k=0 h=0 n−r−s n−r−s−k  n  X X n − r − sn − r − s − k = δ(n−r−s)−k−hEkEhω r s k h 1 2 rs k=0 h=0 n−r−s n−r−s−k  n  X n − r − s X n − r − s − k = Ek δ(n−r−s)−k−hEhω r s k 1 h 2 rs k=0 h=0 n−r−s  n  X n − r − s = Ek(δ + E )(n−r−s)−kω r s k 1 2 rs k=0  n  = (E + δ + E )n−r−sω r s 1 2 rs  n  = ω r s rs where k = u − r and h = v − s.

Now, for r = s = 0 we have X (n) pu,v = ω00 = 1 Ω (n) (n) and clearly pu,v ≥ 0, thus we may think of pu,v as the atomic distribution of random variables Xn and Yn, such that  u v  X = ,Y = = p(n) P n n n n u,v where u ≥ 0, v ≥ 0 and u + v ≤ n.

Now consider uv X r s (n) ωrs = n  pu,v Ω r s X u(u − 1) ··· (u − r + 1)v(v − 1) ··· (v − s + 1) = p(n) n(n − 1) ··· (n − r + 1)(n − r)(n − r − 1) ··· (n − r − s + 1) u,v Ω

23 2.4. Some Remarks on HMP

u u 1 u r−1 v v 1 v s−1 X ( − ) ··· ( − ) ( − ) ··· ( − ) = n n n n n n n n n n p(n) 1(1 − 1 ) ··· (1 − r−1 )(1 − r )(1 − r+1 ) ··· (1 − r+s−1 ) u,v Ω n n n n n Xur v s = p(n) + R n n u,v Ω r s = E(XnYn ) + R r s where R tends to 0 as n tends to ∞. Moreover, limn→∞ E(XnYn ) exists because any continuous bounded function can be approximated by Bernstein polynomi- als as proved in the previous theorem.

(nq) Now, by Helly’s Theorem (see Appendix), we can choose a subsequence pu,v (n) of pu,v, which converges to some distribution pu,v. So we get

r s ωrs = E(Xnq Ynq ) + R

(nq) where Xnq and Ynq are random variables that correspond to pu,v . Now, by the Helly-Bray Theorem (see Appendix), if we take nq → ∞, then

r s ωrs = E(X Y ) where X and Y are random variables that correspond to pu,v, i.e. {ωrs} is a mo- ment sequence of pu,v. 

2.4 Some Remarks on HMP

Remark 2.1. (Uniqueness). The uniqueness of the solution of HMP is easily seen from the proof. In the three cases that we have discussed, we defined

t p(t) = ∆t−nµ n n n

nm p(n,m) = ∆n−r∆m−sµ r,s r s 1 2 rs  n  p(n) = δn−r−sω r,s r s rs which tend to the three solutions respectively. They, in fact, represent the in- version formulae expressing the distribution functions in terms of their cor- responding moments. For a detailed proof of the uniqueness, see Shohat & Tamarkin (1943).

24 2.4. Some Remarks on HMP

In general, when there is a substantially unique solution1, we say that the moment problem is determined; or if there is more than one solution, in which case there are necessarily infinitely many solutions; the moment problem is then indeterminate.

It should also be pointed out that the situation is very different for distribu- tions that are not concentrated on some finite interval. In fact, generally, a distribution function is not determined by its moments. The following example (due to C. C. Heyde, see Feller (1966)) illustrates this.

Example 2.1 (The log-normal distribution). The log-normal distribution is not determined by its moments. The positive variable X is said to have a log-normal distribution if log X is normally dis- tributed. For the standard normal distribution, the density of X is defined by

1 −1 − 1 (log x)2 f(x) = √ x e 2 , x > 0 , 2π and f(x) = 0 for x ≤ 0. For −1 ≤ a ≤ 1, define

fa(x) = f(x)[1 + a sin(2π log x)] .

We claim that fa is a probability density with exactly the same moments as f. Since fa ≥ 0, it suffices to show that Z ∞ xkf(x) sin(2π log x)dx = 0 , k = 0, 1, 2,... 0 The substitutions log x = t = y + k reduce the integral to Z ∞ Z ∞ 1 − 1 t2+kt 1 1 k2 − 1 y2 √ e 2 sin(2πt)dt = √ e 2 e 2 sin(2πy)dy 2π −∞ 2π −∞ and the last integral vanishes since the integrand is an odd function. 

In fact, we may also have the extreme case such that a whole class of distri- butions having the same moments. An example of such cases (which is due to Feller and may be found in Romano & Siegel (1986)) is given below.

1Two distribution functions are substantially equal if they have the same points of continuity and if their values at points of discontinuity only differ by a constant, i.e. Φ1 and Φ2 are substan- R R tially equal in some k-dimensional Euclidean space Rk if and only if f(t)dΦ1 = f(t)dΦ2 Rk Rk for any continuous function f(t) which vanishes for all sufficiently large values of |t|.

25 2.4. Some Remarks on HMP

Example 2.2 (A class of distributions with common moments). Consider the class of probability densities indexed by α in [0, 1] defined by 1 f (x) = exp(−x1/4)[1 − α sin(x1/4)] (x) . α 24 I[0,∞) With some tedious integration by parts, one may obtain Z ∞ xk exp(−x1/4) sin(x1/4)dx = 0 , 0 for k = 0, 1,..., which means that the k-th moment of a random variable with density fα(x) is given by Z ∞ 1 xk exp(−x1/4)dx , 0 24 which is independent of the choice of α, i.e. all densities fα(x) have an identical moment sequence.

Remark 2.2. (Completely monotonic sequences). Firstly, it must be noted that in solving the HMP over the unit interval, the increments may alternatively be defined as

∆µn = µn+1 − µn .

Thus, the conditions for HMP over the unit interval can be written as

r r µn ≥ 0 , µ0 = 1 and (−1) ∆ µn ≥ 0 for all n, r .

In general, any sequence of numbers {µn} that satisfies the original HMP con- ditions in (2.4), or equivalently the above conditions, is called completely mono- tone. Thus we may restate the HMP as follows: Any sequence of numbers {µn} is a moment sequence if and only if it is completely monotone.

Remark 2.3. (Finite moment problems). When only a finite sequence is given in a moment problem, then it is referred to as a finite (or reduced, or truncated) moment problem.

It is quite true that some distributions only have a finite number of moments. For example, the Student’s t-distribution, with n degrees of freedom, has mo- ments of order 0, 1, . . . , n−1 but no higher moments exist (See Mood et al. (1974), p. 542). However, a finite sequence of numbers does not necessarily define a density (let alone a unique one), and gives far less information than an infinite sequence, i.e. the finite moment problem is more general than the full moment problem (shown by Stochel (2001)).

26 2.5. Further Results for HMP

It is not in the scope of this thesis to study the finite moment problems and readers are referred to Shohat & Tamarkin (1943) and Talenti (1987) for more details.

2.5 Further Results for HMP

For bounded regions in higher dimension, we may also obtain similar results to those in Sections 2.1 to 2.3. We shall only state a few of them that will be useful later. As the proofs are straightforward generalizations, albeit alge- braically cumbersome, of the earlier ones, I will only provide sketched proofs. First, we will look at the obvious extension of HMP over the square to a multi- dimensional cube.

Theorem 2.7. (HMP over k-dimensional unit cube)

For every sequence {µn1,...,nk : ni ∈ N0} of real numbers, there exists a distribution k function F (x1, . . . , xk) on S = [0, 1] such that Z n1 nk µn1,...,nk = x1 ··· xk dF (x1, . . . , xk) , ni = 0, 1, 2,..., S if, and only if,

r1 rk µ0,...,0 = 1 and ∆1 ··· ∆k µn1,...,nk ≥ 0 for all ni, ri, i ∈ [k] , (2.7) with ∆1 ··· ∆k defined in the obvious way, as in Section 2.2.

Proof. (Sketch only) (⇒) Clear. (⇐) Define the k-dimensional Bernstein polynomial by

n1 nk k   X X  i1 ik  Y nj i Bf (x , . . . , x ) = ··· f ,..., x j (1 − x )nj −ij , n1,...,nk 1 k j j n1 nk ij i1=0 ik=0 j=1 where f is any function defined and bounded on the k-dimensional unit square and show that Bf (x , . . . , x ) → f(x , . . . , x ) . E n1,...,nk 1 k E 1 k Then consider     m1 mk pm1,...,mk = ··· ∆m1−n1 ··· ∆mk−nk µ , n1,...,nk 1 k n1,...,nk n1 nk

27 2.5. Further Results for HMP and show that m m X1 Xk ··· pm1,...,mk = 1 . u1,...,uk u1=0 uk=0

n1,...,nk Finally, define random variables X1,X2,..., with pu1,...,uk as the atomic joint distribution, i.e.

 u1 uk  X = ,...,X = = pm1,...,mk P m1 mk u1,...,uk m1 mk and apply the above result for k-dimensional Bernstein polynomial, Helly’s The- orem and Helly-Bray Theorem to obtain

n1 nk µn1,...,nk = E(X1 ··· Xk ) , where X1,X2,...,Xk are random variables that correspond to the limit distri- bution pu1,...,uk . 

The HMP over the triangle (Section 2.3) may also be generalized in two obvious ways. Firstly, we can extend it to a multi-dimensional simplex, then secondly, to the Cartesian product of several simplexes.

Theorem 2.8. (HMP over k-dimensional simplex)

For every sequence {ωn1,...,nk : ni ∈ N0} of real numbers, there exists a distribution Pk function F (x1, . . . , xk) on T = {(x1, . . . , xk): xi ≥ 0 and i=1 xi ≤ 1} such that Z n1 nk ωn1,...,nk = x1 ··· xk dF (x1, . . . , xk) , ni = 0, 1, 2,..., T if, and only if, r ω0,...,0 = 1 and δ ωn1,...,nk ≥ 0 for all ni, r, (2.8) with δ defined by

δωn1,...,nk = ωn1,...,nk − ωn1+1,...,nk − ωn1,n2+1,...,nk − · · · − ωn1,...,nk+1 and δr is simply r repetitions of δ.

Proof. (Sketch only) (⇒) Clear. (⇐) Define the Bernstein polynomial over a k-dimensional simplex by i i  n  f X 1 k i1 ik n−i1−···−ik Bn(x1, . . . , xk) = f ,..., x1 ··· xk (1−x1 −· · ·−xk) , n n i1 ··· ik Λ Pk where Λ = {(i1, . . . , ik): ij ≥ 0 , j=1 ij ≤ n} and f is any function defined and bounded on T . Then we may show that

f EBn(x1, . . . , xk) → Ef(x1, . . . , xk) .

28 2.5. Further Results for HMP

Then consider  m  pm = δm−n1−···−nk ω , n1,...,nk n1,...,nk n1 ··· nk and show that X pm = 1 , u1,...,uk Ω Pk where Ω = {(u1, . . . , uk): uj ≥ 0 , j=1 uj ≤ m}. Lastly, define random vari- ables X ,X ,... , with pm as the atomic joint distribution, i.e. 1,m 2,m u1,...,uk

 u1 uk  m X1,m = ,...,X = = p P m k,m m u1,...,uk and apply the above result for k-dimensional Bernstein polynomial, Helly’s The- orem and Helly-Bray Theorem to obtain

n1 nk ωn1,...,nk = E(X1 ··· Xk ) , where X1,X2,...,Xk are random variables that correspond to the limit distri- bution pu1,...,uk . 

Now, by defining the following vector notations

xi = (xi1, . . . , xik) , ni = (ni1, . . . , nik) and ni ni1 nik xi = xi1 ··· xik , we may also state the following theorem:

Theorem 2.9. (HMP over several k-dimensional simplex) k For every sequence {ωn1,...,ng : ni ∈ [N0] } of real numbers, there exists a distri- Pk bution function F (x1,..., xg) on G = {(x1,..., xg): xij ≥ 0 and j=1 xij ≤ 1} such that Z n1 ng ωn1,...,ng = x1 ··· xg dF (x1,..., xg) , nij = 0, 1, 2,..., T if, and only if,

r1 rg ω0,...,0 = 1 and δ1 ··· δg ωn1,...,ng ≥ 0 for all ni, ri, (2.9)

ri with δi defined as in Theorem 2.8, but only applying on ni, and δi are simply ri repetitions of δi.

29 2.5. Further Results for HMP

Proof. (Sketch only) (⇒) Clear. f (⇐) Define the Bernstein polynomial Bn1,...,ng (x1,..., xg) over g identical k-dim- ensional simplex by

g e e   n  X 1 g Y i ei1 eik ni−ei1−···−eik f ,..., xi1 ··· xik (1 − xi1 − · · · − xik) , n1 ng ei1 ··· eik Γ i=1

Pk where Γ = {(e1,..., eg): eij ≥ 0 , j=1 eij ≤ ni} and f is any function de- fined and bounded on G. Then proceed similarly as the previous theorem and eventually derive n1 ng ωn1,...,ng = E(X1 ··· Xg ) , where X1, X2,..., Xg are k-dimensional random vectors that correspond to a limit distribution pu1,...,ug . 

Remark 2.4. (HMP over general bounded regions). The HMP over a k-dimensional unit cube and HMP over a k-dimensional sim- plex were solved by Knill (2000) and Gupta (1999), respectively, in the similar manner as those above.

From this point, it is not hard to imaging that HMP may also be extended to more general bounded regions. In fact, Hausdorff (1923) has already shown the solutions for HMP over any finite interval. And, HMP over more general bounded regions defined by polytopes and polynomial expansions, were solved by Hedge (1941) and Stockbridge (2003).

Although there are many versions of HMP, corresponding to different bounded regions, I have only selected a few to include in this chapter. The reason for this will become clear later.

30 Chapter 3

Exchangeability and Partial Exchangeability

Exchangeability and partial exchangeability are the fundamental ideas in the subjective approach to probability of sequences that replace the independence concept of the objective theory. This chapter introduces some concepts which shall be used in the next chapter.

3.1 Introduction

Consider a sequence of random variables X1,X2,..., and suppose that, for all n, the joint probability density may be written in the form

n Y p(x1, x2, . . . , xn) = p(xi) , i=1 i.e. the Xi are independent random variables. Hence, we get, for any 1 ≤ m < n,

p(xm+1, . . . , xn|x1, . . . , xm) = p(xm+1, . . . , xn) , i.e. past data provide no additional information about the possible future out- comes in the sequence.

As learning from experience is of prime importance in the subjective approach to probability, independence (which is of fundamental importance to an objec- tivist), as illustrated above, is of no value to the subjectivist. Clearly, the subjec- tivist needs some way to “relax” the concept of independence, i.e. the structure of p(x1, . . . , xn) must encapsulate some form of dependence among the individ- ual random variables.

In general, there are a vast number of possible subjective assumptions about

31 3.2. Exchangeable and Partially exchangeable events such dependencies. One of the simplest ways is to continue regarding the order of the random variables as irrelevant, i.e. exchangeability.

3.2 Exchangeable and Partially exchangeable events

We shall first look at the idea of exchangeable (also called symmetric, per- mutable) events, which was formally introduced by Haag (1924).

Suppose we toss a single coin several times and record the results. Let H de- note the event of obtaining a “head” and T denote the event of obtaining a “tail”. And say, after 9 tosses, we obtain the result HHTTHTHTH. What is the prob- ability of obtaining “head” in the next toss?

This probability (“obtaining head in the 10th toss”) is likely to be affected by the frequency of heads and tails in the previous 9 tosses and not by the par- ticular order of them (excluding extreme cases for large samples, e.g. 10 heads followed by 90 tails in 100 tosses, which may well cause a suspicion of fraud). This is saying that all the different sequences of 5 heads and 4 tails have the same probability and will all result in the same influence on the next toss. This leads to the definition of exchangeable events.

Definition 3.1. (Exchangeable Events) A sequence of events is said to be exchangeable if the probability that any n of these events occur depends only on n and not on the particular events chosen.

Now, instead of a single coin, suppose we toss g coins. Three different cases may arise:

1. The coins being perfectly equal. This means we will generate an exchange- able sequence of events (as in the above definition).

2. The coins being completely different. This means each of the g coins will generate a sequence of exchangeable events, with complete independence between the sequences.

3. Some coins being related, i.e. the outcomes of tosses with one coin will influence the probability with respect to tosses with other coins, but in a less direct manner than in case 1.

In other words, case 3 produces g exchangeable sequences as in case 2, but with some interdependence between the sequences. This leads to the definition of a partially exchangeable sequence of events.

32 3.3. Exchangeable Random Variables

Definition 3.2. (Partially Exchangeable Events) A sequence of events is said to be g-fold partially exchangeable if the events split into g types and events of the same type are exchangeable, i.e. the probability that any n1, n2, . . . , ng events of types 1, 2, . . . , g respectively, occur depends only on n1, n2, . . . , ng and not on the particular events chosen.

Remark 3.1. It is important to note that the definition of partially exchangeable events also covers the two extreme cases above, since in both cases the events are exchangeable within types. This means that the assumption of partial ex- changeability is far weaker (i.e. more general) than the one of exchangeability.

Remark 3.2. It is obvious from Definition 3.1 that the probability of any singu- lar event, in a sequence of exchangeable events, occurring is the same. As for a sequence of partially exchangeable events, all singular events of the same type have equal probability of occurring.

3.3 Exchangeable Random Variables

The concepts of exchangeability and partial exchangeability can easily be ex- tended to random variables. Let us first look at some definitions and results for exchangeable random variables.

Definition 3.3. (Finite Exchangeability) A sequence of random variables X1,X2 ...,Xn is said to be exchangeable if their joint distribution function F satisfies

F (x1, x2, . . . , xn) = F (xπ(1), xπ(2), . . . , xπ(n)) , (3.1) for all permutations π defined on the set {1, 2, . . . , n}. In terms of the corre- sponding density or mass function, the condition reduces to

p(x1, x2, . . . , xn) = p(xπ(1), xπ(2), . . . , xπ(n)) , where p(·) is the joint density or mass function.

Definition 3.4. (Infinite Exchangeability) An infinite sequence of random variables {Xn} is said to be exchangeable if all its finite subsequences are exchangeable in the sense of Definition 3.3. 

From these definitions, one might be tempted to ask whether every finite se- quence of exchangeable random variables can be embedded in or extended to an infinitely exchangeable sequence. However, as we shall see in the following example, this is not the case.


Example 3.1 (Non-extendable exchangeability). Suppose we have three random variables X1, X2, X3 defined over {0, 1}, with

$$P(X_1=0, X_2=1, X_3=1) = P(X_1=1, X_2=0, X_3=1) = P(X_1=1, X_2=1, X_3=0) = 1/3$$

and all other combinations of X1, X2, X3 have probability 0. These three random variables are clearly exchangeable. If we now want to introduce a fourth random variable X4, also in {0, 1}, such that X1, X2, X3, X4 are exchangeable, then we require, for example,

$$P(X_1=0, X_2=1, X_3=1, X_4=0) = P(X_1=0, X_2=0, X_3=1, X_4=1) \,.$$

But we have

$$\begin{aligned}
P(X_1=0, X_2=1, X_3=1, X_4=0) &= P(X_1=0, X_2=1, X_3=1) - P(X_1=0, X_2=1, X_3=1, X_4=1) \\
&= 1/3 - P(X_1=0, X_2=1, X_3=1, X_4=1) \\
&= 1/3 - P(X_1=1, X_2=1, X_3=1, X_4=0) \,,
\end{aligned}$$

where the last step uses the assumed exchangeability of X1, ..., X4, and

$$P(X_1=1, X_2=1, X_3=1, X_4=0) \le P(X_1=1, X_2=1, X_3=1) = 0 \,,$$

so that P(X1=0, X2=1, X3=1, X4=0) = 1/3. However, we also have

$$P(X_1=0, X_2=0, X_3=1, X_4=1) \le P(X_1=0, X_2=0, X_3=1) = 0$$

and so

$$P(X_1=0, X_2=1, X_3=1, X_4=0) \ne P(X_1=0, X_2=0, X_3=1, X_4=1) \,.$$

As a result, we have a finitely exchangeable sequence that cannot even be embedded in a larger finitely exchangeable sequence, let alone an infinitely exchangeable one. Gnedin (1996) did, however, find a criterion for extending a finite exchangeable sequence to an infinite one, which is not discussed in this thesis.

In general, it is very easy to see that any collection of independent and identi- cally distributed random variables is exchangeable. However, the converse is not true, as the next example will show.


Example 3.2 (Exchangeability does not imply independence). It should also be noted that an exchangeable sequence need not be independent. From the previous example,

$$P(X_1=0, X_2=1) = P(X_1=0, X_2=1, X_3=0) + P(X_1=0, X_2=1, X_3=1) = 0 + 1/3 = 1/3 \,.$$

However, we also have

$$\begin{aligned}
P(X_1=0) &= P(X_1=0, X_2=1, X_3=0) + P(X_1=0, X_2=1, X_3=1) \\
&\quad + P(X_1=0, X_2=0, X_3=0) + P(X_1=0, X_2=0, X_3=1) \\
&= 0 + 1/3 + 0 + 0 = 1/3 \,,
\end{aligned}$$

and similarly,

$$\begin{aligned}
P(X_2=1) &= P(X_1=0, X_2=1, X_3=0) + P(X_1=0, X_2=1, X_3=1) \\
&\quad + P(X_1=1, X_2=1, X_3=0) + P(X_1=1, X_2=1, X_3=1) \\
&= 0 + 1/3 + 1/3 + 0 = 2/3 \,.
\end{aligned}$$

Therefore,

$$P(X_1=0)\,P(X_2=1) = 2/9 \ne 1/3 = P(X_1=0, X_2=1) \,,$$

i.e. X1 and X2 are not independent.
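The calculations in Examples 3.1 and 3.2 can be verified mechanically. The following Python sketch (illustrative only) encodes the joint mass function of Example 3.1, checks its invariance under all permutations of the coordinates, and recovers the marginal and joint probabilities used above.

```python
from fractions import Fraction
from itertools import permutations, product

# Joint pmf of Example 3.1: mass 1/3 on each {0,1}-triple with exactly two 1's.
p = {x: (Fraction(1, 3) if sum(x) == 2 else Fraction(0))
     for x in product((0, 1), repeat=3)}

# Exchangeability: the pmf is invariant under every permutation of coordinates.
assert all(p[x] == p[tuple(x[i] for i in perm)]
           for x in p for perm in permutations(range(3)))

# Marginals of X1 and X2, and the joint probability P(X1=0, X2=1).
pX1_0 = sum(v for x, v in p.items() if x[0] == 0)                  # = 1/3
pX2_1 = sum(v for x, v in p.items() if x[1] == 1)                  # = 2/3
pJoint = sum(v for x, v in p.items() if x[0] == 0 and x[1] == 1)   # = 1/3

print(pX1_0 * pX2_1, pJoint)   # 2/9 vs 1/3: X1 and X2 are not independent
```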

At this point one may think of exchangeability as a "weaker" condition than "independent and identically distributed". If, however, we remove the "identically distributed" condition, the situation changes: exchangeability is neither weaker nor stronger than independence alone. The next two examples by Romano & Siegel (1986), together with the previous one, give more insight into the relations between the concepts of exchangeability, independence and identical distribution.

Example 3.3 (Independence alone does not imply exchangeability). Let X and Y be independent random variables, where X follows a Poisson distribution with mean two and Y follows a standard normal (Gaussian) distribution. Then X and Y are independent but not exchangeable, since their marginal distributions differ.

Example 3.4 (Identically distributed does not imply exchangeability). Let X and Y be independently distributed, each with the standard normal (Gaussian) distribution with mean zero and variance one, and define Z = Y. Then X, Y and Z are identically distributed. They are, however, not exchangeable, because the 3-tuple (X, Y, Z) has a correlation of zero between its first two components, whereas (Y, Z, X) has a correlation of one between its first two components. This implies that the two 3-tuples have different joint distribution functions.

We also know that independent random variables are uncorrelated, and Example 3.2 shows that exchangeability does not imply independence. So one may suspect that two exchangeable random variables are not necessarily uncorrelated. This is confirmed in the next example, again from Romano & Siegel (1986).

Example 3.5 (Exchangeability does not imply uncorrelated). Let (X, Y) be uniformly distributed over A = {(x, y) : x ≥ 0, y ≥ 0, x + y ≤ 1}. By symmetry, X and Y are obviously exchangeable. To see that X and Y are not independent, note that

$$P(X \le 1/2,\, Y \le 1/2) = \int_{A \cap [0,1/2]^2} 2\, dx\, dy = 1/2 \,.$$

However,

$$P(X \le 1/2) = \int_0^{1/2} \int_0^{1-x} 2\, dy\, dx = 3/4 \,.$$

Therefore, by exchangeability,

$$P(X \le 1/2)\, P(Y \le 1/2) = 9/16 \ne 1/2 = P(X \le 1/2,\, Y \le 1/2) \,,$$

so we have X and Y being exchangeable but not independent. Further, the marginal density of X is given by

$$\int_0^{1-x} 2\, dy = 2(1-x)\, I_{[0,1]}(x) \,.$$

Thus,

$$E(X) = \int_0^1 x \cdot 2(1-x)\, dx = 1/3 \,.$$

Also,

$$E(XY) = \int_0^1 \int_0^{1-x} 2xy\, dy\, dx = 1/12 \,.$$

Thus, by exchangeability, we have

$$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = 1/12 - 1/9 = -1/36 \ne 0 \,,$$

so X and Y are correlated.
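A Monte Carlo check of these values is straightforward. The following Python sketch (an illustration only; the rejection-sampling scheme and sample size are arbitrary choices) estimates E(X), E(XY) and Cov(X, Y) for the uniform distribution on A.

```python
import random

rng = random.Random(1)
n = 200_000
xs, ys = [], []
while len(xs) < n:
    # rejection sampling: uniform points in the unit square, kept if inside A
    x, y = rng.random(), rng.random()
    if x + y <= 1:
        xs.append(x)
        ys.append(y)

ex = sum(xs) / n                              # close to E(X)  = 1/3
exy = sum(u * v for u, v in zip(xs, ys)) / n  # close to E(XY) = 1/12
print(ex, exy, exy - ex * ex)                 # covariance close to -1/36
```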


Remark 3.3. (Exchangeable Events). From now on we may restrict our attention to exchangeable sequences of random variables rather than exchangeable events, since exchangeable events arise as a special case of exchangeable random variables through the use of indicator functions. Similarly, all results for partially exchangeable random variables (which we discuss later) apply to partially exchangeable events.

Remark 3.4. (Reducing finitely exchangeable sequences). Suppose X1, X2, ..., Xn is a finitely exchangeable sequence of random variables. Then, for any permutation π of {1, ..., n} with π(n) = n,

$$p(x_1, \ldots, x_{n-1}) = \int p(x_1, \ldots, x_{n-1}, x_n)\, dx_n = \int p(x_{\pi(1)}, \ldots, x_{\pi(n-1)}, x_n)\, dx_n \quad \text{(by exchangeability)} = p(x_{\pi(1)}, \ldots, x_{\pi(n-1)}) \,,$$

i.e. X1, X2, ..., Xn−1 is an exchangeable sequence. Similarly, any subset of a finitely exchangeable sequence is again finitely exchangeable. Note that, by definition, such a reduction is trivial for an infinitely exchangeable sequence.

We can also easily see that exchangeable random variables are identically distributed: if random variables X and Y are exchangeable with joint distribution function F_{X,Y}, then the marginal distributions satisfy

$$F_X(a) = \lim_{b \to \infty} F_{X,Y}(a, b) = \lim_{b \to \infty} F_{Y,X}(a, b) = F_Y(a) \,.$$

In fact, all k-dimensional joint distributions of an exchangeable sequence coincide. This is shown in the following theorem from Taylor et al. (1985).

Theorem 3.1. (Equal joint distributions) Let X1, ..., Xn be a sequence of exchangeable random variables. Then for any integer k, 1 ≤ k ≤ n, and any subset {i1, ..., ik} of k distinct elements from {1, ..., n}, we have

$$P(X_1 \in B_1, \ldots, X_k \in B_k) = P(X_{i_1} \in B_1, \ldots, X_{i_k} \in B_k)$$

for all B1, ..., Bk ∈ B(ℝ).

Proof. Let π be a permutation on {1, 2, ..., n} such that π(j) = i_j for 1 ≤ j ≤ k. Given B1, ..., Bk ∈ B(ℝ), let B_m = ℝ for m = k + 1, ..., n. By exchangeability and the choice of π, it follows that

$$P(X_1 \in B_1, \ldots, X_k \in B_k) = P\Big(\bigcap_{i=1}^{n} [X_i \in B_i]\Big) = P\Big(\bigcap_{i=1}^{n} [X_{\pi(i)} \in B_i]\Big) = P\Big(\bigcap_{i=1}^{k} [X_{\pi(i)} \in B_i]\Big) = P(X_{i_1} \in B_1, \ldots, X_{i_k} \in B_k) \,. \qquad \square$$

3.4 De Finetti’s Theorems on Exchangeability

After introducing the concept of exchangeability, one may be tempted to find a representation theorem for such sequences. Haag (1924) did give a hint about this in his paper, but it was de Finetti (1930) who formally provided a represen- tation theorem for infinitely exchangeable sequences that take values in {0, 1}. It can be stated as follows:

Theorem 3.2. (Representation theorem for {0, 1}-random variables) An infinite sequence of random variables X1, X2, ..., taking values in {0, 1}, is exchangeable iff there exists a distribution function F such that, ∀n ∈ ℕ,

$$p(x_1, x_2, \ldots, x_n) = \int_0^1 \prod_{i=1}^n \theta^{x_i} (1-\theta)^{1-x_i}\, dF(\theta) \,, \qquad (3.2)$$

where θ is the probability of obtaining 1.

Proof. (We will follow closely the proof given by Heath & Sudderth (1976).)

(⇐) Clear.

(⇒) Suppose x1 + x2 + ··· + xn = yn. Then, by exchangeability, we get

$$p(x_1 + \cdots + x_n = y_n) = \binom{n}{y_n}\, p(x_{\pi(1)}, \ldots, x_{\pi(n)}) \qquad (3.3)$$

for any 0 ≤ yn ≤ n and any permutation π such that x_{π(1)} + ··· + x_{π(n)} = yn. Moreover, for any N ≥ n ≥ yn ≥ 0, the left side of equation (3.3) becomes

$$\sum_{y_N = y_n}^{N-(n-y_n)} p(x_1 + \cdots + x_n = y_n \mid x_1 + \cdots + x_N = y_N)\, p(x_1 + \cdots + x_N = y_N) \,.$$

Now note that, by exchangeability, all possible arrangements of the y_N 1's among the N places are equally likely; conditioning thus amounts to sampling n items without replacement from an urn of N items containing y_N 1's and N − y_N 0's. The above then becomes

$$\sum_{y_N = y_n}^{N-(n-y_n)} \binom{y_N}{y_n} \binom{N-y_N}{n-y_n} \Big/ \binom{N}{n}\; p(x_1 + \cdots + x_N = y_N) \,,$$

which simplifies to

$$\binom{n}{y_n} \sum_{y_N = y_n}^{N-(n-y_n)} \frac{(y_N)_{y_n}\, (N-y_N)_{n-y_n}}{(N)_n}\; p(x_1 + \cdots + x_N = y_N) \,,$$

where $(y_N)_{y_n} = y_N (y_N - 1) \cdots [y_N - (y_n - 1)]$, etc.

Let us now define a distribution function F_N(θ) on ℝ, which is 0 for θ ≤ 0 and has jumps of p(x1 + ··· + xN = yN) at θ = yN/N, yN = 0, ..., N. Then we have

$$p(x_1 + \cdots + x_n = y_n) = \binom{n}{y_n} \int_0^1 \frac{(\theta N)_{y_n}\, [(1-\theta)N]_{n-y_n}}{(N)_n}\, dF_N(\theta) \,.$$

Note also that, as N → ∞,

$$\frac{(\theta N)_{y_n}\, [(1-\theta)N]_{n-y_n}}{(N)_n} \to \theta^{y_n} (1-\theta)^{n-y_n}$$

uniformly in θ. Now, by Helly's Theorem (see Appendix), there exists a subsequence F_{N_1}, F_{N_2}, ... such that

$$\lim_{k \to \infty} F_{N_k} = F \,,$$

where F is also a distribution function. Hence the result. □

Remark 3.5. (Bernoulli random variables) One very useful and significant interpretation of the above theorem is that, given θ, the Xi's are judged to be independent Bernoulli random variables. In more conventional notation: conditional on θ, X1, ..., Xn is a random sample from a Bernoulli distribution with parameter θ, generating the parameterized joint sampling distribution

$$p(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{i=1}^n p(x_i \mid \theta) = \prod_{i=1}^n \theta^{x_i} (1-\theta)^{1-x_i} \,,$$

where the parameter θ is assigned a prior distribution F(θ).
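This conditional-independence reading suggests a direct way to simulate an infinitely exchangeable {0, 1}-sequence. The following Python sketch (illustrative only; the Beta(2, 5) prior is an arbitrary choice of F) draws θ from the prior and then conditionally i.i.d. Bernoulli(θ) values, and checks empirically that sequences with the same number of 1's occur with approximately equal frequency, as exchangeability requires.

```python
import random
from collections import Counter

rng = random.Random(2)

def sample_sequence(n=3):
    # draw theta from the prior F (here Beta(2, 5), an illustrative choice),
    # then n conditionally i.i.d. Bernoulli(theta) values
    theta = rng.betavariate(2, 5)
    return tuple(int(rng.random() < theta) for _ in range(n))

counts = Counter(sample_sequence() for _ in range(200_000))
# All orderings with the same number of 1's should have nearly equal counts.
for k in range(4):
    print(k, [v for seq, v in counts.items() if sum(seq) == k])
```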


Quite often one is more interested in the sum of the random variables than in particular ones, and the representation for the sum of a collection of exchangeable variables is obtained immediately as a result of Theorem 3.2.

Corollary 3.1. (Representation for the sum of {0, 1}-random variables) Let X1, X2, ... be an infinite sequence of exchangeable random variables taking values in {0, 1}. Then

$$P\Big(\sum_{i=1}^n X_i = k\Big) = \int_0^1 \binom{n}{k} \theta^k (1-\theta)^{n-k}\, dF(\theta) \,, \qquad (3.4)$$

with F and θ defined as in Theorem 3.2.

Proof. Immediate from Theorem 3.2.

Remark 3.6. (Probability of {0, 1}-random variables) Equation (3.4) in the above corollary may also be thought of as the probability of getting k 1's out of n {0, 1}-random variables.

Remark 3.7. (Exchangeable sequences taking on any two values) It is easy to see that the results in Theorem 3.2 and Corollary 3.1 apply, in general, to infinitely exchangeable sequences taking on any two values. Formally, an infinite sequence of random variables X1, X2, ..., taking values in {a, b}, is exchangeable if and only if there exists a distribution function F such that, ∀n ∈ ℕ,

$$p(x_1, x_2, \ldots, x_n) = \int_0^1 \theta^k (1-\theta)^{n-k}\, dF(\theta) \,,$$

where θ is the probability of obtaining a (or b) and k is the number of xi's that are equal to a (or b, correspondingly).

Remark 3.8. (De Finetti's theorem may fail for finite sequences). For example, let

$$P(X_1 = 0, X_2 = 1) = P(X_1 = 1, X_2 = 0) = 1/2 \,,$$

$$P(X_1 = 0, X_2 = 0) = P(X_1 = 1, X_2 = 1) = 0 \,.$$

Clearly X1 and X2 are exchangeable. If the representation theorem held, we would have

$$\int_0^1 \theta^2\, dF = \int_0^1 (1-\theta)^2\, dF = 0$$

for some F. This means that F assigns mass one at both 0 and 1, which is impossible. In fact, it is not difficult to see the problem from the theory: suppose that X1, ..., XN are exchangeable given some F(θ). Then for 1 ≤ i, j ≤ N with i ≠ j,

$$E X_i X_j = E X_1 X_2 \quad \text{(by exchangeability)} = E\{E[X_1 X_2 \mid \theta]\} = E\{E[X_1 \mid \theta]\, E[X_2 \mid \theta]\} \quad \text{(conditional independence)} = E\{E[X_1 \mid \theta]^2\} \quad \text{(conditionally identically distributed)} \,.$$

And of course E X_i = E{E[X_1 | θ]} = E X_j, so that

$$\mathrm{Cov}(X_i, X_j) = E X_i X_j - E X_i\, E X_j = E\{E[X_1 \mid \theta]^2\} - \{E\, E[X_1 \mid \theta]\}^2 = \mathrm{Var}(E[X_1 \mid \theta]) \ge 0 \,.$$

This immediately shows that de Finetti's representation can hold for a finite sequence only if the variables are non-negatively correlated, a condition that is not always satisfied in practice. For example, hypergeometric sequences are negatively correlated. To see this, let X_i ∼ H(m_i, N, n) and X_j ∼ H(m_j, N, n) be the counts of two distinct types in the same draw of n items without replacement from a population of N items; then it is easy to show that

$$\mathrm{Cov}(X_i, X_j) = -\frac{n\, m_i\, m_j}{N^2} \cdot \frac{N-n}{N-1} \,,$$

which is clearly negative.

We now want to investigate the situation where an infinitely exchangeable sequence of random variables can take on finitely many values, as opposed to just 0 and 1 (or a and b in general). To simplify the proof, it is easier to consider k-dimensional random vectors in which at most one of the k components takes the value 1. Such a vector can take on k + 1 values (including the all-zero case). We shall refer to such vectors as "{0, 1}-random vectors" and extend the result of Theorem 3.2 in the obvious way.

Theorem 3.3. (Representation theorem for {0, 1}-random vectors) An infinite sequence of {0, 1}-random vectors X1, X2, ... is exchangeable iff there exists a distribution function F such that, ∀n ∈ ℕ,

$$p(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n) = \int_T \prod_{i=1}^n \theta_1^{x_{i1}} \cdots \theta_k^{x_{ik}} \Big(1 - \sum_{j=1}^k \theta_j\Big)^{1 - \sum_j x_{ij}}\, dF(\theta_1, \ldots, \theta_k) \,, \qquad (3.5)$$

where θi is the probability of a random vector having 1 in the i-th component only and

$$T = \Big\{(\theta_1, \ldots, \theta_k) : \theta_i \ge 0 \text{ and } \sum_i \theta_i \le 1\Big\} \,.$$


Proof.

(⇐) Clear.

(⇒) Suppose x1 + x2 + ··· + xn = yn. Then, by exchangeability, we get

$$p(\mathbf{x}_1 + \cdots + \mathbf{x}_n = \mathbf{y}_n) = \binom{n}{y_{n1}\, y_{n2} \cdots y_{nk}}\, p(\mathbf{x}_{\pi(1)}, \ldots, \mathbf{x}_{\pi(n)}) \qquad (3.6)$$

for any yn such that 0 ≤ yni ≤ n and Σi yni ≤ n, and any permutation π such that x_{π(1)} + ··· + x_{π(n)} = yn. Moreover, for any vector yN such that N ≥ yNi ≥ yni for all i, the left side of equation (3.6) becomes

$$\sum_{\mathbf{y}_N} p(\mathbf{x}_1 + \cdots + \mathbf{x}_n = \mathbf{y}_n \mid \mathbf{x}_1 + \cdots + \mathbf{x}_N = \mathbf{y}_N)\, p(\mathbf{x}_1 + \cdots + \mathbf{x}_N = \mathbf{y}_N) \,.$$

Now note that, by exchangeability, all possible arrangements of the yNi 1's among the N places of the i-th component of the N vectors are equally likely, for all i; conditioning thus amounts to sampling n items without replacement from an urn of N items containing y_{N1} of the first type, y_{N2} of the second type, ..., and N − Σi y_{Ni} of the (k+1)-th type. The above then becomes

$$\sum_{\mathbf{y}_N} \binom{y_{N1}}{y_{n1}} \cdots \binom{y_{Nk}}{y_{nk}} \binom{N - \sum_i y_{Ni}}{n - \sum_i y_{ni}} \Big/ \binom{N}{n}\; p(\mathbf{x}_1 + \cdots + \mathbf{x}_N = \mathbf{y}_N) \,,$$

which simplifies to

$$\binom{n}{y_{n1} \cdots y_{nk}} \sum_{\mathbf{y}_N} \frac{(y_{N1})_{y_{n1}} \cdots (y_{Nk})_{y_{nk}} \big(N - \sum_i y_{Ni}\big)_{n - \sum_i y_{ni}}}{(N)_n}\; p(\mathbf{x}_1 + \cdots + \mathbf{x}_N = \mathbf{y}_N) \,,$$

where $(y_{Ni})_{y_{ni}} = y_{Ni}(y_{Ni} - 1) \cdots [y_{Ni} - (y_{ni} - 1)]$, etc.

Let us now define a distribution function F_N(θ1, θ2, ..., θk) on ℝ^k, which is 0 when any θi ≤ 0 and has jumps of p(x1 + ··· + xN = yN) at (θ1, ..., θk) = (y_{N1}/N, ..., y_{Nk}/N) with θ1 + ··· + θk ≤ 1, y_{Ni} = 0, ..., N. Then we have

$$\binom{n}{y_{n1} \cdots y_{nk}} \int_T \frac{(\theta_1 N)_{y_{n1}} \cdots (\theta_k N)_{y_{nk}} \big[(1 - \sum_i \theta_i) N\big]_{n - \sum_i y_{ni}}}{(N)_n}\, dF_N(\theta_1, \theta_2, \ldots, \theta_k) \,.$$

Note also that, as N → ∞,

$$\frac{(\theta_1 N)_{y_{n1}} \cdots (\theta_k N)_{y_{nk}} \big[(1 - \sum_i \theta_i) N\big]_{n - \sum_i y_{ni}}}{(N)_n} \to \theta_1^{y_{n1}} \cdots \theta_k^{y_{nk}} \Big(1 - \sum_i \theta_i\Big)^{n - \sum_i y_{ni}}$$

uniformly in all θi. Now, by Helly's Theorem (see Appendix), there exists a subsequence F_{N_1}, F_{N_2}, ... such that

$$\lim_{j \to \infty} F_{N_j} = F \,,$$

where F is also a distribution function. Hence the result. □


Corollary 3.2. (Representation for random variables taking a finite number of values) Let X1, X2, ... be an infinite sequence of exchangeable random variables taking values in {0, 1, ..., k}. Then the probability of having n1 1's, n2 2's, ..., and nk k's among X1, ..., Xn is

$$\int_T \binom{n}{n_1 \cdots n_k}\, \theta_1^{n_1} \cdots \theta_k^{n_k} \Big(1 - \sum_i \theta_i\Big)^{n - \sum_i n_i}\, dF(\theta_1, \ldots, \theta_k) \,,$$

with F, θi and T defined as in Theorem 3.3.

Proof. Immediate from Theorem 3.3. □

Remark 3.9. (Multinomial random variables) We can see, from Theorem 3.3 and Corollary 3.2, that it is as if we have a joint sampling distribution of random variables X1,..., Xn, where each Xi comes from a multinomial distribution with parameters θ1, . . . , θk and F is the prior distri- bution for θ1, . . . , θk.

Remark 3.10. (Exchangeable sequences taking on any finite number of values) As Remark 3.7 suggested for the two value case, it is also easy to see that the result in Theorem 3.3 and Corollary 3.2 can be applied, in general, to infinitely exchangeable sequences taking on any finite set of values (or equivalently, in- finite sequences of exchangeable random vectors, taking on a finite number of values, in any finite dimension). 

It is now quite clear that the more values a sequence of exchangeable random variables can take on, the more parameters we need. In particular, for a sequence of exchangeable random variables taking on an infinite number of values, or values in a continuous set (e.g. ℝ), we need infinitely (or even uncountably) many parameters. In the following theorem we prove the case of random variables taking values on the real line. Similar results easily apply to other cases.

Theorem 3.4. (Representation theorem for random variables over ℝ) An infinite sequence of random variables X1, X2, ..., taking values in ℝ, is exchangeable iff there exists a probability measure Q over τ such that, ∀n ∈ ℕ,

$$F(x_1, x_2, \ldots, x_n) = \int_\tau \prod_{i=1}^n G(x_i)\, dQ(G) \,, \qquad (3.7)$$

where τ is the set of all distribution functions on ℝ and Q(G) = lim_{n→∞} P(G_n), with G_n being the empirical distribution function defined by X1, ..., Xn.


Proof. (We will follow the proof given by Chow & Teicher (1988).)

(⇐) Clear.

(⇒) First, define

$$G_n(x) = \frac{1}{n} \sum_{i=1}^n I_{[X_i \le x]}$$

and note that $I_{[X_i \le x]}^2 = I_{[X_i \le x]}$. Writing $I_i = I_{[X_i \le x]}$, we have, for m > n,

$$[G_n(x) - G_m(x)]^2 = \Big(\frac{1}{n} \sum_{i=1}^n I_i\Big)^2 + \Big(\frac{1}{m} \sum_{i=1}^m I_i\Big)^2 - 2\Big(\frac{1}{n} \sum_{i=1}^n I_i\Big)\Big(\frac{1}{m} \sum_{i=1}^m I_i\Big) = \frac{1}{n^2}\Big(\sum_{i=1}^n I_i + 2\sum_{i<j\le n} I_i I_j\Big) + \frac{1}{m^2}\Big(\sum_{i=1}^m I_i + 2\sum_{i<j\le m} I_i I_j\Big) - \frac{2}{nm}\Big(\sum_{i=1}^n I_i\Big)\Big(\sum_{i=1}^m I_i\Big) \,.$$

Now, by exchangeability, we also have $E(I_i) = P(X_i \le x) = P(X_1 \le x)$ and $E(I_i I_j) = P(X_i \le x, X_j \le x) = P(X_1 \le x, X_2 \le x)$ for all i ≠ j. So, by counting the number of terms in each sum, we obtain, for m > n,

$$E[G_n(x) - G_m(x)]^2 = \Big(\frac{1}{n} - \frac{1}{m}\Big) P(X_1 \le x) + \Big(\frac{n-1}{n} - \frac{m-1}{m}\Big) P(X_1 \le x, X_2 \le x) = \frac{m-n}{nm}\, \{P(X_1 \le x) - P(X_1 \le x, X_2 \le x)\} \,.$$

Similarly, for m < n, we obtain the same expression with n − m in place of m − n, so in general

$$E[G_n(x) - G_m(x)]^2 = \frac{|n-m|}{nm}\, \{P(X_1 \le x) - P(X_1 \le x, X_2 \le x)\} \,.$$

The right-hand side tends to zero as m, n → ∞; therefore G_m(x) tends in probability to some (random) distribution function G(x), which means that

$$\prod_{j=1}^n G_m(x_j) \to \prod_{j=1}^n G(x_j) \qquad (3.8)$$

as m → ∞, for fixed n. Now, let α1, ..., αn denote positive integers and set

$$A = \{\alpha = (\alpha_1, \ldots, \alpha_n) : 1 \le \alpha_i \le m \text{ for } 1 \le i \le n\} \,,$$
$$A^* = \{\alpha \in A : \alpha_i \ne \alpha_j \text{ for } i \ne j,\ 1 \le i, j \le n\} \,,$$
$$I(\alpha) = I_{[X_{\alpha_1} \le x_1, \ldots, X_{\alpha_n} \le x_n]} \,.$$

For m > n, it then follows that

$$\prod_{j=1}^n G_m(x_j) = \frac{1}{m^n} \prod_{j=1}^n \sum_{i=1}^m I_{[X_i \le x_j]} = \frac{1}{m^n} \sum_{\alpha \in A} I(\alpha) = \frac{1}{m^n} \Big(\sum_{\alpha \in A - A^*} + \sum_{\alpha \in A^*}\Big) I(\alpha) \,.$$

However,

$$\frac{1}{m^n} \sum_{\alpha \in A - A^*} I(\alpha) \le \frac{1}{m^n} \sum_{\alpha \in A - A^*} 1 = \frac{m^n - m(m-1) \cdots (m-n+1)}{m^n} \,,$$

which tends to zero as m → ∞. This means that, for large m,

$$\prod_{j=1}^n G_m(x_j) \approx \frac{1}{m^n} \sum_{\alpha \in A^*} I(\alpha) \,.$$

But, by exchangeability, for every α ∈ A*,

$$\int I(\alpha)\, dF = \int I_{[X_1 \le x_1, \ldots, X_n \le x_n]}\, dF = P(X_1 \le x_1, \ldots, X_n \le x_n) = F(x_1, \ldots, x_n)$$

and so

$$\int \prod_{j=1}^n G_m(x_j)\, dF \approx \frac{m(m-1) \cdots (m-n+1)}{m^n}\, F(x_1, \ldots, x_n) \,.$$

Finally, letting m → ∞ and recalling (3.8), with Q(G) = lim_{m→∞} P(G_m), we obtain

$$\int \prod_{j=1}^n G(x_j)\, dQ(G) = F(x_1, \ldots, x_n) \,.$$

Hence the result. □
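The role of the random limiting distribution G in the above proof can be illustrated by simulation. In the following Python sketch (an illustration only; the normal location mixture is an arbitrary choice), an exchangeable sequence is generated as conditionally i.i.d. given a random mean M, and the empirical distribution function G_n is seen to stabilize to a limit that depends on M (here the N(M, 1) distribution function), rather than to the unconditional N(0, 2) one.

```python
import random

rng = random.Random(3)

# One exchangeable sequence: a random mean M ~ N(0,1), then X_i = M + noise.
M = rng.gauss(0, 1)
xs = [M + rng.gauss(0, 1) for _ in range(100_000)]

def G(n, t):
    """Empirical distribution function G_n(t) of the first n observations."""
    return sum(x <= t for x in xs[:n]) / n

# G_n(0) settles down as n grows; its limit depends on the realized M.
for n in (100, 1_000, 10_000, 100_000):
    print(n, G(n, 0.0))
```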

We will leave the remarks on this theorem for the final discussions in Chap- ter 4.


3.5 De Finetti’s Theorems on Partial exchangeability

In 1938, de Finetti introduced the concept of partial exchangeability, which is often more appropriate than exchangeability. Quoting from de Finetti (1974):

"We cannot consider the case of exchangeability as the normal one but rather an oversimplified case."

The case of exchangeability is one where there is a complete analogy between all the events under consideration. However, in practice, one will often find this not to be the case. Thus, exchangeability can only be considered as a limiting case and a more general concept must be introduced, i.e. partial exchangeabil- ity.

Definition 3.5. (Finite Partially Exchangeable Random Variables) A g-fold sequence of random variables, {Xij : i ∈ [g], j ∈ [ni]}, is said to be g-fold partially exchangeable if the joint distribution of the n1, n2, . . . , ng random vari- ables of types 1, 2, . . . , g respectively, depends only on n1, n2, . . . , ng and not on the order of the random variables within each type.

Definition 3.6. (Infinite Partially Exchangeable Random Variables) An infinite g-fold sequence of random variables, {Xij : i ∈ [g], j ∈ N}, is said to be g-fold partially exchangeable if all its finite g-fold subsequences of random variables, i.e. {Xij : i ∈ [g], j ∈ [ni]} where ni’s are finite, are partially exchange- able in the sense of Definition 3.5.

Remark 3.11. (Partial exchangeability as drawing from urns) We have already seen in Section 3.4 that a sequence of exchangeable random variables can be seen as the result of an experiment in drawing balls from an urn, i.e. an urn model. This concept can easily be extended to partially ex- changeable random variables.

Suppose we have g urns labeled 1 to g, where the i-th urn contains ni balls, for all i ∈ [g]. Let Xij denote the result of the j-th draw from urn i. Then

(i) an infinite partially exchangeable sequence of random variables is formed by {Xij : i ∈ [g], j ∈ ℕ} when drawing with replacement from the urns;

(ii) a finite partially exchangeable sequence of random variables is formed by {Xij : i ∈ [g], j ∈ [ni]} when drawing without replacement from the urns.


Remark 3.12. (Identically distributed within types) It is obvious that, since each of the g within-type sequences is exchangeable within itself (exchangeable within types), the sequence of partially exchangeable random variables {Xij : i ∈ [g], j ∈ ℕ} is identically distributed within types. Without loss of generality, let us check this for a 2-fold sequence. Let {Xij : i ∈ [2], j ∈ ℕ} be an infinite partially exchangeable sequence. Then, for each i ∈ [2], {Xij : j ∈ ℕ} is an exchangeable sequence, and hence, for all a ∈ ℝ and all j ∈ ℕ,

$$P(X_{i1} \le a) = \lim_{b \to \infty} P(X_{i1} \le a, X_{ij} \le b) = \lim_{b \to \infty} P(X_{ij} \le a, X_{i1} \le b) = P(X_{ij} \le a) \,,$$

i.e. {Xij : j ∈ ℕ} is identically distributed.

Example 3.6 (g-fold IID sequences are g-fold partially exchangeable). It follows immediately from the definitions that a g-fold IID sequence is g-fold partially exchangeable; the converse, however, is not true in general, as the exchangeable case has already shown. Identical distribution within types follows from Remark 3.12, but independence of the random variables cannot be obtained in general.

We will now generalize de Finetti’s representation theorem to partially ex- changeable sequences. First, let us consider the {0, 1} case.

Theorem 3.5. (Representation theorem for several {0, 1}-sequences) An infinite sequence of random variables {Xij : i ∈ [g], j ∈ ℕ}, taking values in {0, 1}, is g-fold partially exchangeable iff there exists a distribution function F such that, ∀ni ∈ ℕ, i ∈ [g],

$$p(\mathbf{x}_1(n_1), \ldots, \mathbf{x}_g(n_g)) = \int_{[0,1]^g} \prod_{i=1}^g \prod_{j=1}^{n_i} \theta_i^{x_{ij}} (1-\theta_i)^{1-x_{ij}}\, dF(\theta_1, \ldots, \theta_g) \,, \qquad (3.9)$$

where θi is the probability of obtaining 1 in the sequence {Xij : j ∈ ℕ}.

Proof. (We will follow closely the proof given by Bernardo & Smith (1994).)

(⇐) Clear.

(⇒) Let $y_i(n_i) = x_{i1} + x_{i2} + \cdots + x_{in_i}$ for all i ∈ [g]. Then, by exchangeability within types, we get

$$p(y_1(n_1), \ldots, y_g(n_g)) = \binom{n_1}{y_1(n_1)} \cdots \binom{n_g}{y_g(n_g)}\, p(\mathbf{x}_1(n_1), \ldots, \mathbf{x}_g(n_g)) \,. \qquad (3.10)$$


Now, for any Ni ≥ ni, i ∈ [g], we may also express the left side of equation (3.10) as

$$\sum p(y_1(n_1), \ldots, y_g(n_g) \mid y_1(N_1), \ldots, y_g(N_g))\, p(y_1(N_1), \ldots, y_g(N_g)) \,,$$

with the sums ranging from $y_i(N_i) = y_i(n_i)$ to $y_i(N_i) = N_i$, for all i ∈ [g]. Now note that, by exchangeability, all possible arrangements of the $y_i(N_i)$ 1's among the Ni places are equally likely, i.e. we are sampling from g urns, where ni items are drawn without replacement from the i-th urn of Ni items containing $y_i(N_i)$ 1's and $N_i - y_i(N_i)$ 0's. Then $p(y_1(n_1), \ldots, y_g(n_g) \mid y_1(N_1), \ldots, y_g(N_g))$ becomes

$$\prod_{i=1}^g \binom{y_i(N_i)}{y_i(n_i)} \binom{N_i - y_i(N_i)}{n_i - y_i(n_i)} \Big/ \binom{N_i}{n_i} \,.$$

Now let $(y_N)_{y_n} = y_N (y_N - 1) \cdots (y_N - (y_n - 1))$, etc., and define the function $F_{N_1,\ldots,N_g}(\theta_1, \ldots, \theta_g)$ on $[0,1]^g$ as the g-dimensional "step" function with "jumps" of $p(y_1(N_1), \ldots, y_g(N_g))$ at

$$(\theta_1, \ldots, \theta_g) = \Big(\frac{y_1(N_1)}{N_1}, \ldots, \frac{y_g(N_g)}{N_g}\Big) \,,$$

where $y_i(N_i) = 0, 1, \ldots, N_i$ for all i ∈ [g]. We can now write $p(y_1(n_1), \ldots, y_g(n_g))$ as

$$\int_{[0,1]^g} \prod_{i=1}^g \binom{n_i}{y_i(n_i)} \frac{(\theta_i N_i)_{y_i(n_i)} [(1-\theta_i) N_i]_{n_i - y_i(n_i)}}{(N_i)_{n_i}}\, dF_{N_1,\ldots,N_g}(\theta_1, \ldots, \theta_g) \,.$$

Note also that, as $N_1, \ldots, N_g \to \infty$,

$$\prod_{i=1}^g \frac{(\theta_i N_i)_{y_i(n_i)} [(1-\theta_i) N_i]_{n_i - y_i(n_i)}}{(N_i)_{n_i}} \to \prod_{i=1}^g \theta_i^{y_i(n_i)} (1-\theta_i)^{n_i - y_i(n_i)} \,,$$

uniformly in $\theta_1, \ldots, \theta_g$. Now, by Helly's Theorem (see Appendix), there exists a subsequence $F_{N_1(j),\ldots,N_g(j)}$ such that

$$\lim_{j \to \infty} F_{N_1(j),\ldots,N_g(j)} = F \,,$$

where F is also a distribution function on $[0,1]^g$. Hence the result. □

Remark 3.13. (Bernoulli Distributions and Independence) The above theorem asserts that each of the g within-type sequences is conditionally Bernoulli distributed, i.e. for fixed i, {Xij : j ∈ ℕ} can be judged to be independent Bernoulli random variables conditioned on some random variable θi (compare Remark 3.5), where the θi's are assigned a joint distribution F.

Also, it is clear that if the g within-type sequences are independent, then we may write

$$dF(\theta_1, \ldots, \theta_g) = dF_1(\theta_1)\, dF_2(\theta_2) \cdots dF_g(\theta_g) \,,$$

i.e. independent prior distributions.
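To illustrate Theorem 3.5 and the above remark (a sketch only; the particular coupling of θ1 and θ2 below is an arbitrary assumption), the following Python code simulates a 2-fold partially exchangeable {0, 1}-array with dependent within-type parameters and confirms empirically that the two within-type sequences are not independent.

```python
import random

rng = random.Random(4)

def two_fold_sample(n1=5, n2=5):
    # dependent priors: theta2 is pulled toward theta1 (an assumed coupling,
    # for illustration only), creating interdependence between the two types
    theta1 = rng.betavariate(2, 2)
    theta2 = min(max(theta1 + rng.uniform(-0.1, 0.1), 0.0), 1.0)
    row1 = [int(rng.random() < theta1) for _ in range(n1)]
    row2 = [int(rng.random() < theta2) for _ in range(n2)]
    return row1, row2

# the within-type sample means are positively correlated across repetitions
reps = [two_fold_sample() for _ in range(20_000)]
m1 = [sum(r1) / 5 for r1, _ in reps]
m2 = [sum(r2) / 5 for _, r2 in reps]
a1, a2 = sum(m1) / len(m1), sum(m2) / len(m2)
cov = sum((u - a1) * (v - a2) for u, v in zip(m1, m2)) / len(reps)
print(cov)   # clearly positive: the two sequences are not independent
```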

Corollary 3.3. (Representation for {0, 1}-partially exchangeable sequences) An infinite sequence of random variables {Xij : i ∈ [g], j ∈ ℕ}, taking values in {0, 1}, is g-fold partially exchangeable iff there exists a distribution function F such that the probability of obtaining mi 1's from ni variables in the i-th within-type sequence, for all i ∈ [g], is given by

$$\int_{[0,1]^g} \prod_{i=1}^g \binom{n_i}{m_i} \theta_i^{m_i} (1-\theta_i)^{n_i - m_i}\, dF(\theta_1, \ldots, \theta_g) \,, \qquad (3.11)$$

where θi is the probability of obtaining 1 in the sequence {Xij : j ∈ ℕ}.

Clearly now, if we want to extend the result to an infinite g-fold partially exchangeable sequence taking a finite number of values, say k + 1, then we will need k parameters for each of the g within-type sequences. To avoid notational complications we state the result in the following form, a direct extension of the above corollary.

Theorem 3.6. (Representation for several finitely-valued sequences) An infinite sequence of random variables {Xij : i ∈ [g], j ∈ ℕ}, taking values in {0, 1, ..., k}, is g-fold partially exchangeable iff there exists a distribution function F such that the probability of obtaining mi1 1's, mi2 2's, ..., and mik k's out of ni variables in the i-th within-type sequence, for all i ∈ [g], is given by

$$\int_{T_1} \cdots \int_{T_g} \prod_{i=1}^g \binom{n_i}{m_{i1} \cdots m_{ik}}\, \theta_{i1}^{m_{i1}} \cdots \theta_{ik}^{m_{ik}} \Big(1 - \sum_j \theta_{ij}\Big)^{n_i - \sum_j m_{ij}}\, dF(\theta_{11}, \ldots, \theta_{gk}) \,,$$

where θij is the probability of obtaining j in the i-th within-type sequence, denoted by {Xij : j ∈ ℕ}, and, for all i ∈ [g],

$$T_i = \Big\{(\theta_{i1}, \ldots, \theta_{ik}) : \theta_{ij} \ge 0 \text{ and } \sum_j \theta_{ij} \le 1\Big\} \,.$$

Proof. The proof is a straightforward generalization of that of Theorem 3.5, obtained by considering g urns, each containing k + 1 types of items among its ni items; one then obtains a product of multinomial expressions over the g urns. □


Remark 3.14. (Multinomial Distributions) The above theorem implies that an infinite g-fold partially exchangeable se- quence, taking on finitely many values, arises as g sequences of independent random variables with multinomial distributions, conditioned on the parame- ters θi1, . . . , θik for the i-th within-type sequence.

Lastly, we state the result for infinite partially exchangeable real-valued sequences; this comes, again, as a generalization of a previous result, namely Theorem 3.4.

Theorem 3.7. (Representation for several sequences in ℝ) A set of infinite sequences of random variables {Xij : i ∈ [g], j ∈ ℕ}, taking values in ℝ, is partially exchangeable iff there exists a probability measure Q over τ^g such that, for all ni ∈ ℕ, i ∈ [g],

$$F(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_g) = \int_{\tau^g} \prod_{i=1}^g \prod_{j=1}^{n_i} G_i(x_{ij})\, dQ(G_1, \ldots, G_g) \,, \qquad (3.12)$$

where $\mathbf{x}_i = (x_{i1}, \ldots, x_{in_i})$, $Q(G_1, \ldots, G_g) = \lim_{n_1, \ldots, n_g \to \infty} P(G_{n_1}, \ldots, G_{n_g})$ and τ is the set of all distribution functions on ℝ, with $G_{n_i}$ being the empirical distribution function of {Xij : j ∈ [ni]} for each i = 1, ..., g.

Proof. A direct generalization of Theorem 3.4. □

Again, we will leave the discussions on this last representation for the next Chapter.

Chapter 4

Connections between Moment Problem and Exchangeability

4.1 Introduction

In the second volume of “An introduction to Probability Theory and Its Appli- cations”, Feller proved de Finetti’s representation theorem for 0-1 random vari- ables via the solution to the Hausdorff moment problem (see Feller (1966, pp. 228–229)). This gave rise to the idea of the current chapter, aiming to draw connections between various moment problems and exchangeability. Let us first look at Feller’s solution.

Theorem 4.1. (An alternative proof of Theorem 3.2) To every infinite sequence of exchangeable random variables X1, X2, ..., taking values in {0, 1}, there corresponds a probability distribution F such that

$$p(x_1, x_2, \ldots, x_n) = \int_0^1 \prod_{i=1}^n \theta^{x_i} (1-\theta)^{1-x_i}\, dF(\theta) \,,$$

where p(x1, ..., xn) = P(X1 = x1, ..., Xn = xn) and θ is the probability of obtaining 1.

Proof. For brevity, let

$$p_{k,n} = P(X_1 = 1, \ldots, X_k = 1, X_{k+1} = 0, \ldots, X_n = 0) \,.$$

Let us also define $\mu_n = p_{n,n} = P(X_1 = 1, \ldots, X_n = 1)$ for n = 1, 2, ... and put $\mu_0 = 1$. Then probabilistically we get

$$p_{n-1,n} = p_{n-1,n-1} - p_{n,n} = \Delta \mu_{n-1}$$

and hence

$$p_{n-2,n} = p_{n-2,n-1} - p_{n-1,n} = \Delta^2 \mu_{n-2} \,.$$

Therefore, by induction, we have for all k < n

$$p_{k,n} = p_{k,n-1} - p_{k+1,n} = \Delta^{n-k} \mu_k \,.$$

All these quantities are non-negative, and hence the sequence {μn} satisfies the conditions of Theorem 2.2. It follows that μn is the n-th moment of a unique probability distribution F, i.e.

$$\mu_n = \int_0^1 \theta^n\, dF(\theta)$$

and

$$p_{k,n} = \int_0^1 \theta^k (1-\theta)^{n-k}\, dF(\theta) \,.$$

Thus, by exchangeability, any sequence with k 1's and n − k 0's must have probability $p_{k,n}$. Moreover, $k = x_1 + x_2 + \cdots + x_n$, therefore

$$p(x_1, x_2, \ldots, x_n) = \int_0^1 \theta^{\sum x_i} (1-\theta)^{n - \sum x_i}\, dF(\theta) \,.$$

Hence the result. □
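Feller's construction can be checked numerically for a concrete prior. In the following Python sketch (illustrative only), F is taken to be the uniform distribution on [0, 1], so that μn = 1/(n+1), and the differences Δ^{n−k}μk are verified in exact arithmetic to equal the Beta integrals ∫θ^k(1−θ)^{n−k}dθ = k!(n−k)!/(n+1)!.

```python
from fractions import Fraction
from math import comb, factorial

def mu(n):
    # moments of the uniform prior F on [0, 1]
    return Fraction(1, n + 1)

def delta_power(k, r):
    # r-fold forward difference: Delta^r mu_k = sum_j (-1)^j C(r,j) mu_{k+j}
    return sum((-1) ** j * comb(r, j) * mu(k + j) for j in range(r + 1))

n = 6
for k in range(n + 1):
    lhs = delta_power(k, n - k)                       # p_{k,n} via differences
    rhs = Fraction(factorial(k) * factorial(n - k),   # Beta(k+1, n-k+1) integral
                   factorial(n + 1))
    assert lhs == rhs
print("p_{k,n} = Delta^{n-k} mu_k matches the Beta integral for n =", n)
```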

In the above theorem, a link is made between the representation theorem for {0, 1}-random variables and the HMP over the unit interval. We now formalize this correspondence in the next theorem, and provide extensions to higher dimensions in Sections 4.2 and 4.3.

Theorem 4.2. An infinite sequence of random variables X1, X2, ..., taking values in {0, 1}, is exchangeable if and only if the corresponding HMP has a solution, i.e. there exists a distribution function F such that

$$\mu_n = P(X_1 = 1, \ldots, X_n = 1) = \int_0^1 \theta^n\, dF(\theta)$$

for all n = 0, 1, 2, ...

Proof.

(⇒) Suppose that a sequence of random variables X1, X2, ..., taking values in {0, 1}, is exchangeable. Then Theorem 4.1 gives

$$P(k \text{ 1's out of } n) = \int_0^1 \binom{n}{k} \theta^k (1-\theta)^{n-k}\, dF(\theta) \,.$$

So clearly

$$\mu_n = P(n \text{ 1's out of } n) = \int_0^1 \theta^n\, dF(\theta)$$

is a moment sequence.

(⇐) Suppose that {μn} is a moment sequence. Then we have

$$\Delta^{n-k} \mu_k \ge 0 \qquad \text{(by Theorem 2.2)}$$

and obviously

$$\sum_{k=0}^n \binom{n}{k} \Delta^{n-k} \mu_k = \sum_{k=0}^n \binom{n}{k} \int_0^1 \theta^k (1-\theta)^{n-k}\, dF(\theta) = 1 \,.$$

So we may define a sequence of random variables X1, X2, ..., taking values in {0, 1}, such that

$$P(\text{a particular choice of } k \text{ of } n \text{ of the } X_i\text{'s equal to 1}) = \Delta^{n-k} \mu_k$$

and thus

$$P(\text{any } k \text{ of } n \text{ of the } X_i\text{'s equal to 1}) = \binom{n}{k} \Delta^{n-k} \mu_k \,.$$

Then, clearly, X1, X2, ... is an exchangeable sequence. □
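The equivalence of Theorem 4.2 suggests a simple computational test: a {0, 1}-exchangeable sequence exists for a given sequence {μn} precisely when all the differences Δ^{n−k}μk are non-negative. The following Python sketch (illustrative only; the two test sequences are arbitrary, and the check is necessarily truncated at a finite order) applies this criterion.

```python
from fractions import Fraction
from math import comb

def is_completely_monotone(mu, n_max):
    """Check Delta^{n-k} mu_k >= 0 for all 0 <= k <= n <= n_max."""
    def diff(k, r):
        # Delta^r mu_k = sum_j (-1)^j C(r,j) mu_{k+j}
        return sum((-1) ** j * comb(r, j) * mu(k + j) for j in range(r + 1))
    return all(diff(k, n - k) >= 0 for n in range(n_max + 1) for k in range(n + 1))

mu_good = lambda n: Fraction(1, n + 1)   # uniform prior: a genuine moment sequence
mu_bad = lambda n: Fraction(1) if n == 0 else Fraction(2)  # mu_1 > mu_0: impossible

print(is_completely_monotone(mu_good, 8))   # True: an exchangeable sequence exists
print(is_completely_monotone(mu_bad, 8))    # False: no solution to the HMP
```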

4.2 HMP and Exchangeability on the unit square and triangle

The result of Theorem 4.2 can now easily be extended to HMP over the unit square and HMP over a triangle (refer to Sections 2.2 and 2.3, respectively), with the former involving a more general concept of exchangeability, i.e. partial exchangeability (see Section 3.5).

Theorem 4.3. A 2-fold sequence of random variables {Xij : i ∈ [2], j ∈ ℕ}, taking values in {0, 1}, is partially exchangeable if and only if the corresponding HMP over the unit square has a solution, i.e. there exists a distribution function F such that

$$\mu_{rs} = P(X_{1,1} = 1, \ldots, X_{1,r} = 1, X_{2,1} = 1, \ldots, X_{2,s} = 1) = \int_0^1 \int_0^1 \theta^r \gamma^s\, dF(\theta, \gamma)$$

for all r, s = 0, 1, 2, ...


Proof.

(⇒) Suppose {Xij} is a 2-fold partially exchangeable sequence of random variables. Let us define

$$p_{k,l}^{r,s} = P(X_{1,1} = 1, \ldots, X_{1,k} = 1, X_{1,k+1} = 0, \ldots, X_{1,r} = 0,\ X_{2,1} = 1, \ldots, X_{2,l} = 1, X_{2,l+1} = 0, \ldots, X_{2,s} = 0)$$

for all k ≤ r and l ≤ s, i.e. the probability of k 1's followed by r − k 0's in the first sequence, and l 1's followed by s − l 0's in the second. Also define

$$\mu_{rs} = p_{r,s}^{r,s} = P(X_{1,1} = 1, \ldots, X_{1,r} = 1, X_{2,1} = 1, \ldots, X_{2,s} = 1)$$

for r, s = 1, 2, 3, ... and set $\mu_{00} = 1$. Then we get

$$p_{r-1,s-1}^{r,s} = p_{r-1,s-1}^{r-1,s-1} - p_{r,s-1}^{r,s-1} - p_{r-1,s}^{r-1,s} + p_{r,s}^{r,s} = \mu_{r-1,s-1} - \mu_{r,s-1} - \mu_{r-1,s} + \mu_{r,s} = \Delta_1 \Delta_2 \mu_{r-1,s-1}$$

and, by induction, we have, for all r, s = 0, 1, 2, ..., k < r and l < s,

$$p_{k,l}^{r,s} = \Delta_1^{r-k} \Delta_2^{s-l} \mu_{kl} \,,$$

which is obviously non-negative. This means that {μrs} is a moment sequence (see Theorem 2.4). Thus, there exists a distribution function F such that

$$\mu_{rs} = \int_0^1 \int_0^1 \theta^r \gamma^s\, dF(\theta, \gamma) \,.$$

(⇐) Suppose {μrs} is a 2-dimensional moment sequence. Then we have, for all k ≤ r, l ≤ s and r, s = 0, 1, 2, ...,

$$\Delta_1^{r-k} \Delta_2^{s-l} \mu_{kl} \ge 0$$

and there exists a distribution function F such that

$$\sum_{k=0}^r \sum_{l=0}^s \binom{r}{k} \binom{s}{l} \Delta_1^{r-k} \Delta_2^{s-l} \mu_{kl} = \sum_{k=0}^r \sum_{l=0}^s \binom{r}{k} \binom{s}{l} \int_0^1 \int_0^1 \theta^k (1-\theta)^{r-k} \gamma^l (1-\gamma)^{s-l}\, dF(\theta, \gamma) = 1 \,.$$

So we may define a sequence of random variables {Xij}, taking values in {0, 1}, such that

$$P\left(\begin{array}{c} \text{a particular choice of } k \text{ of } r \text{ of the } X_{1j}\text{'s} \\ \text{and } l \text{ of } s \text{ of the } X_{2j}\text{'s equal to 1} \end{array}\right) = \Delta_1^{r-k} \Delta_2^{s-l} \mu_{kl}$$

and thus

$$P\left(\begin{array}{c} \text{any } k \text{ of } r \text{ of the } X_{1j}\text{'s and} \\ l \text{ of } s \text{ of the } X_{2j}\text{'s equal to 1} \end{array}\right) = \binom{r}{k} \binom{s}{l} \Delta_1^{r-k} \Delta_2^{s-l} \mu_{kl} \,.$$

Then, clearly, {Xij} is a 2-fold partially exchangeable sequence. □

Theorem 4.4. An infinite sequence of random variables X1, X2, ..., taking values in {0, 1, 2}, is exchangeable if and only if the corresponding HMP over a triangle (see Section 2.3) has a solution, i.e. there exists a distribution function F such that

$$\omega_{r,s} = P(X_1 = 0, \ldots, X_r = 0, X_{r+1} = 1, \ldots, X_{r+s} = 1) = \int_T \theta^r \gamma^s\, dF(\theta, \gamma)$$

for all r, s = 0, 1, 2, ..., where T = {(θ, γ) : θ ≥ 0, γ ≥ 0, θ + γ ≤ 1}.

Proof.

(⇒) Suppose that the sequence of random variables X1, X2, ..., taking values in {0, 1, 2}, is exchangeable. Now, let us define

$$p_{k,l,n} = P(X_1 = 0, \ldots, X_k = 0, X_{k+1} = 1, \ldots, X_{k+l} = 1, X_{k+l+1} = 2, \ldots, X_n = 2)$$

for k + l ≤ n, and

$$\omega_{r,s} = p_{r,s,r+s} = P(X_1 = 0, \ldots, X_r = 0, X_{r+1} = 1, \ldots, X_{r+s} = 1)$$

for r, s = 1, 2, 3, ..., and put $\omega_{00} = 1$.

Then we get (see Section 2.3 for the definition of δ)

$$p_{r,s,r+s+1} = p_{r,s,r+s} - p_{r+1,s,r+s+1} - p_{r,s+1,r+s+1} = \omega_{r,s} - \omega_{r+1,s} - \omega_{r,s+1} = \delta \omega_{r,s}$$

and, similarly,

$$p_{r,s,r+s+2} = p_{r,s,r+s+1} - p_{r+1,s,r+s+2} - p_{r,s+1,r+s+2} = \delta \omega_{r,s} - \delta \omega_{r+1,s} - \delta \omega_{r,s+1} = \delta^2 \omega_{r,s} \,.$$

So, by induction, we get

$$p_{r,s,r+s+k} = \delta^k \omega_{r,s} \ge 0$$

for any k ≥ 0, i.e. {ω_{r,s}} is a moment sequence (by Theorem 2.6). Thus, there exists a distribution function F such that

$$\omega_{r,s} = \int_T \theta^r \gamma^s\, dF(\theta, \gamma) \,.$$

(⇐) Suppose that {ω_{r,s}} is a moment sequence. Then we have, for all r + s ≤ k,

$$\delta^{k-r-s} \omega_{r,s} \ge 0$$

and there exists a distribution function F such that

$$\sum_{(r,s) \in Q} \binom{k}{r\ s} \delta^{k-r-s} \omega_{r,s} = \sum_{(r,s) \in Q} \binom{k}{r\ s} \int_T \theta^r \gamma^s (1 - \theta - \gamma)^{k-r-s}\, dF(\theta, \gamma) = 1 \,,$$

for Q = {(r, s) : r ≥ 0, s ≥ 0, r + s ≤ k}, where $\binom{k}{r\ s} = \frac{k!}{r!\, s!\, (k-r-s)!}$.

So we may define a sequence of random variables X1, X2, ..., taking values in {0, 1, 2}, such that

$$P\left(\begin{array}{c} \text{a particular choice of } r\ X_i\text{'s equal to 0 and} \\ s\ X_i\text{'s equal to 1, from } k \text{ of the } X_i\text{'s} \end{array}\right) = \delta^{k-r-s} \omega_{r,s}$$

and thus

$$P\left(\begin{array}{c} \text{any } r\ X_i\text{'s equal to 0 and} \\ s\ X_i\text{'s equal to 1, from } k \text{ of the } X_i\text{'s} \end{array}\right) = \binom{k}{r\ s} \delta^{k-r-s} \omega_{r,s} \,.$$

Then, clearly, X1, X2, ... is an exchangeable sequence. □
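As with the one-dimensional case, the triangle conditions can be verified numerically for a concrete F. The following Python sketch (illustrative only) takes F uniform on T (density 2), for which ω_{r,s} = 2 r! s!/(r+s+2)!, and verifies in exact arithmetic that δ^k ω_{r,s} equals the Dirichlet integral 2 r! s! k!/(r+s+k+2)! and is, in particular, non-negative.

```python
from fractions import Fraction
from math import comb, factorial

def omega(r, s):
    # moments of the uniform distribution (density 2) on the triangle T
    return Fraction(2 * factorial(r) * factorial(s), factorial(r + s + 2))

def delta_power(r, s, k):
    # k-fold application of delta w_{r,s} = w_{r,s} - w_{r+1,s} - w_{r,s+1};
    # expanding (1 - theta - gamma)^k gives a trinomial sum of shifted moments
    return sum((-1) ** (i + j) * comb(k, i) * comb(k - i, j) * omega(r + i, s + j)
               for i in range(k + 1) for j in range(k + 1 - i))

# delta^k omega_{r,s} equals the Dirichlet integral 2 r! s! k! / (r+s+k+2)!
for r in range(4):
    for s in range(4):
        for k in range(4):
            expected = Fraction(2 * factorial(r) * factorial(s) * factorial(k),
                                factorial(r + s + k + 2))
            assert delta_power(r, s, k) == expected
print("delta^k omega_{r,s} >= 0 verified for the uniform triangle moments")
```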

4.3 HMP and Exchangeability on several k-dim. simplexes

We are now ready to extend these results to the cases where an exchangeable or partially exchangeable sequence of random variables can take on finitely many values. Without loss of generality, we will only consider the situation over the finite sets {0, 1, ..., k}. It will turn out that these sequences correspond to Hausdorff moment problems over a k-dimensional simplex and over several k-dimensional simplexes, respectively (see also Section 2.5).


Theorem 4.5. An infinite sequence of random variables X1, X2, ..., taking values in {0, 1, ..., k}, is exchangeable if and only if the corresponding HMP over a k-dimensional simplex (see Theorem 2.8) has a solution, i.e. there exists a distribution function F such that, for

$$T = \Big\{(\theta_1, \ldots, \theta_k) : \theta_i \ge 0,\ \sum_i \theta_i \le 1\Big\}$$

and all n1, ..., nk = 0, 1, 2, ..., we have

$$\omega_{n_1, \ldots, n_k} = \int_T \theta_1^{n_1} \cdots \theta_k^{n_k}\, dF(\theta_1, \ldots, \theta_k) \,,$$

which is equal to the probability of having the first n1 Xi's being 1's, the next n2 Xi's being 2's, ..., and the last nk Xi's being k's.

Proof.

(⇒) Let X1, X2, ... be an exchangeable sequence of random variables taking values in {0, 1, ..., k} and define $p_{n_1, n_2, \ldots, n_k, n}$ as the probability of having the first n1 Xi's equal to 1, followed by n2 2's, ..., etc., with the last n − n1 − ··· − nk Xi's equal to 0, for all n1 + ··· + nk ≤ n. Also define

$$\omega_{n_1, \ldots, n_k} = p_{n_1, n_2, \ldots, n_k,\, n_1 + \cdots + n_k}$$

for all n1, ..., nk = 0, 1, 2, ..., and put $\omega_{0, \ldots, 0} = 1$. Then we get, for δ defined as in Theorem 2.8,

$$\begin{aligned}
p_{n_1, \ldots, n_k,\, n_1 + \cdots + n_k + 1} &= p_{n_1, \ldots, n_k,\, n_1 + \cdots + n_k} - p_{n_1+1, n_2, \ldots, n_k,\, n_1 + \cdots + n_k + 1} - \cdots - p_{n_1, \ldots, n_k+1,\, n_1 + \cdots + n_k + 1} \\
&= \omega_{n_1, \ldots, n_k} - \omega_{n_1+1, \ldots, n_k} - \cdots - \omega_{n_1, \ldots, n_k+1} \\
&= \delta\, \omega_{n_1, \ldots, n_k}
\end{aligned}$$

and, similarly,

$$\begin{aligned}
p_{n_1, \ldots, n_k,\, n_1 + \cdots + n_k + 2} &= p_{n_1, \ldots, n_k,\, n_1 + \cdots + n_k + 1} - p_{n_1+1, n_2, \ldots, n_k,\, n_1 + \cdots + n_k + 2} - \cdots - p_{n_1, \ldots, n_k+1,\, n_1 + \cdots + n_k + 2} \\
&= \delta\, \omega_{n_1, \ldots, n_k} - \delta\, \omega_{n_1+1, \ldots, n_k} - \cdots - \delta\, \omega_{n_1, \ldots, n_k+1} \\
&= \delta^2 \omega_{n_1, \ldots, n_k} \,.
\end{aligned}$$

So, by induction, we get

$$p_{n_1, n_2, \ldots, n_k,\, n_1 + \cdots + n_k + r} = \delta^r \omega_{n_1, \ldots, n_k} \ge 0$$

for any r ≥ 0, i.e. $\{\omega_{n_1, \ldots, n_k}\}$ is a moment sequence (by Theorem 2.8), defined by a distribution F over T, as in the theorem statement, such that

$$\omega_{n_1, \ldots, n_k} = \int_T \theta_1^{n_1} \cdots \theta_k^{n_k}\, dF(\theta_1, \ldots, \theta_k) \,.$$


(⇐) Suppose that $\{\omega_{n_1, \ldots, n_k}\}$ is a moment sequence. Then we have, for all n1 + ··· + nk ≤ n,

$$\delta^{n - n_1 - \cdots - n_k} \omega_{n_1, \ldots, n_k} \ge 0$$

and there exists a distribution function F such that

$$\sum_{(n_1, \ldots, n_k) \in Q} \binom{n}{n_1\, n_2 \cdots n_k} \delta^{n - n_1 - \cdots - n_k} \omega_{n_1, \ldots, n_k} = \sum_{(n_1, \ldots, n_k) \in Q} \binom{n}{n_1\, n_2 \cdots n_k} \int_T \theta_1^{n_1} \cdots \theta_k^{n_k} \Big(1 - \sum_i \theta_i\Big)^{n - \sum_i n_i}\, dF(\theta_1, \ldots, \theta_k) = 1$$

for Q = {(n1, ..., nk) : ni ≥ 0, Σi ni ≤ n}. So we may now define a sequence of random variables X1, X2, ..., taking values in {0, 1, ..., k}, such that

$$P\left(\begin{array}{c} \text{a particular choice of } n_i \text{ of the } X_j\text{'s equal} \\ \text{to } i, \text{ for each } i, \text{ from } n \text{ of the } X_j\text{'s} \end{array}\right) = \delta^{n - n_1 - \cdots - n_k} \omega_{n_1, \ldots, n_k}$$

and thus

$$P\left(\begin{array}{c} \text{any } n_i \text{ of the } X_j\text{'s equal to } i, \\ \text{for each } i, \text{ from } n \text{ of the } X_j\text{'s} \end{array}\right) = \binom{n}{n_1\, n_2 \cdots n_k} \delta^{n - n_1 - \cdots - n_k} \omega_{n_1, \ldots, n_k} \,.$$

Then, clearly, X1, X2, ... is an exchangeable sequence. □

The next theorem is a straightforward generalization of the previous one, apart from more complicated notation, and we state it without proof.

Theorem 4.6. An infinite sequence of random variables {Xij : i ∈ [g], j ∈ ℕ}, taking values in {0, 1, ..., k}, is g-fold partially exchangeable if and only if the corresponding HMP over several k-dimensional simplexes (see Theorem 2.9) has a solution, i.e. there exists a distribution function F such that, for

$$T = \Big\{(\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_g) : \theta_{ij} \ge 0\ \forall\, i, j \text{ and } \sum_j \theta_{ij} \le 1\ \forall\, i\Big\} \,, \qquad \boldsymbol{\theta}_i = (\theta_{i1}, \ldots, \theta_{ik}) \,,$$

and all $\mathbf{n}_i = (n_{i1}, n_{i2}, \ldots, n_{ik})$ with $n_{ij} \in \mathbb{N}_0$, we have

$$\omega_{\mathbf{n}_1, \ldots, \mathbf{n}_g} = \int_T \boldsymbol{\theta}_1^{\mathbf{n}_1} \cdots \boldsymbol{\theta}_g^{\mathbf{n}_g}\, dF(\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_g) \,, \qquad \boldsymbol{\theta}_i^{\mathbf{n}_i} = \prod_{j=1}^k \theta_{ij}^{n_{ij}} \,,$$

which is equal to the probability of having the first ni1 of {Xij : j ∈ ℕ} being 1's, the next ni2 of {Xij : j ∈ ℕ} being 2's, ..., and the last nik of {Xij : j ∈ ℕ} being k's, for all i ∈ [g].


Remark 4.1. (Partial exchangeability and HMP over the g-dim. unit cube) The result for infinite partially exchangeable sequences over {0, 1} and HMP can be seen as a special case of the above theorem, by choosing k = 1. Then we get that “an infinite sequence of random variables {Xij : i ∈ [g], j ∈ N}, taking values in {0, 1} is g-fold partially exchangeable if, and only if, the corresponding HMP over the g-dimensional unit cube has a solution (see Theorem 2.7)”.

4.4 Some final remarks

In this section we discuss several issues regarding the connections between the HMP and exchangeability, which will serve as the concluding remarks for this thesis.

Remark 4.2. (Alternative proofs of de Finetti's Representation Theorems) It is easy to see that, as with Theorems 4.1 and 4.2, the "only if" parts of the proofs of Theorems 4.3, 4.4, 4.5 and 4.6 can serve as alternative proofs of the corresponding de Finetti representation theorems over the different cases (compare the results in Chapter 3). It also means that these "only if" proofs can be greatly simplified by applying the de Finetti representation theorems.

Remark 4.3. (Exchangeability and other Hausdorff-style moment problems) Stockbridge (2003) provided the solutions to the HMP over polytopes

$$P = \{(\theta_1, \ldots, \theta_k) : a_{h1}\theta_1 + \cdots + a_{hk}\theta_k \le a_{h0},\ h = 1, \ldots, g\}$$

and over more general bounded regions defined by

$$G = \{(\theta_1, \ldots, \theta_k) : a_{h1}\theta_1^{b_{h1}} + \cdots + a_{hk}\theta_k^{b_{hk}} \le a_{h0},\ h = 1, \ldots, g\} \,.$$

Essentially, by defining the increments and the p terms differently, we may obtain the solutions in a way similar to the several HMP discussed in Chapter 2 (see Stockbridge (2003)). For example, in the basic case of the HMP over [0, b], the increments are defined by

$$\Delta \mu_n = b \mu_n - \mu_{n+1} \,,$$

which eventually leads to

$$\sum_{k=0}^n \binom{n}{k} \frac{1}{b^n} \Delta^{n-k} \mu_k = 1 \,.$$

So we may still define an exchangeable sequence X1, X2, ... by

$$P(k \text{ out of } n\ X_i\text{'s are equal to 1}) = \binom{n}{k} \frac{1}{b^n} \Delta^{n-k} \mu_k \,.$$


And, conversely, define

$$p_{k,n} = b^n\, P(X_1 = 1, \ldots, X_k = 1, X_{k+1} = 0, \ldots, X_n = 0)$$

and

$$\mu_n = p_{n,n} = b^n\, P(X_1 = 1, \ldots, X_n = 1) \,;$$

then we get

$$p_{n-1,n} = b p_{n-1,n-1} - p_{n,n} = \Delta \mu_{n-1}$$

and hence

$$p_{n-2,n} = b p_{n-2,n-1} - p_{n-1,n} = \Delta^2 \mu_{n-2} \,.$$

Therefore, by induction, we have for all k < n

$$p_{k,n} = b p_{k,n-1} - p_{k+1,n} = \Delta^{n-k} \mu_k \,.$$

So we obtain no new results for the HMP over [0, b] beyond those for the HMP over [0, 1]. A similar approach can be applied to the HMP over the polytopes and the more general bounded regions defined above.

Remark 4.4. (Exchangeability and general moment problems) It is natural that we have been dealing with the HMP over bounded regions since, in essence, we are looking for distribution functions and probability measures, which are bounded. In theory, however, one may consider general measures that need not be bounded, and this relates to more general types of moment problems over unbounded domains, for example the Stieltjes moment problem over [0, ∞) and the Hamburger moment problem over (−∞, ∞). These, however, are outside the scope of this thesis.

Remark 4.5. (Real exchangeable sequences and moment problems) In Chapter 3, we provided the results for random variables that assume values on the real line. Without loss of generality, let us consider Theorem 3.4, but reduced to the bounded region [0, 1].

An infinite sequence of random variables X1, X2, ..., taking values in [0, 1], is exchangeable iff there exists a probability measure Q over τ such that, ∀n ∈ ℕ,

$$F(x_1, x_2, \ldots, x_n) = \int_\tau \prod_{i=1}^n G(x_i)\, dQ(G) \,,$$

where τ is the set of all distribution functions on [0, 1] and Q(G) = lim_{n→∞} P(G_n), with G_n being the empirical distribution function defined by X1, ..., Xn.

This theorem implies that it is as if we have a random sample from an unknown distribution G, with Q representing our belief about "what the empirical distribution function would look like for large n". Now, the task of representing such a belief function Q over the set of all possible distribution functions is by no means straightforward, since G is, in effect, an infinite-dimensional parameter. There have been various results exploring additional features of belief which can restrict τ to a "nicer" space (see, for example, Freedman (1962b) and Bernardo & Smith (1994)); these are not covered in this thesis. Nevertheless, let us try the methods we have already developed.

If we now proceed as in the previous theorems of this chapter, essentially picking some fixed values for the xi's, then, by making x1 = x2 = ··· = xn = e for some constant e in [0, 1], we obtain

$$F(e, e, \ldots, e) = \int_\tau [G(e)]^n\, dQ(G) \,.$$

This is a more general form than the usual HMP; in fact, through particular choices of G, we may deduce the HMP over the unit interval. Similarly, if we choose some of the xi's to be equal, say ni of them equal to ei, with ei ∈ [0, 1] for all i ∈ [k] and Σ ni = n, then we get

$$\int_\tau [G(e_1)]^{n_1} [G(e_2)]^{n_2} \cdots [G(e_k)]^{n_k}\, dQ(G) \,.$$

Again, through particular choices of G, we may deduce the HMP over a triangle or a k-dimensional simplex. A similar idea can be used for real partially exchangeable sequences.

The situation is no better when one tries to go in the opposite direction, i.e. to deduce an infinite exchangeable sequence in [0, 1] from the HMP: one would need infinitely (indeed uncountably) many moment sequences. Clearly, this shows that the representation theorem for the real-valued case is a much more general concept than the HMP.

Remark 4.6. (Finite exchangeability and finite moment problems) Finite exchangeable sequences and finite moment problems are areas untouched in this thesis. We saw in Remark 3.8 that de Finetti's theorem may fail for a finite exchangeable sequence in {0, 1}. Also, various approximation procedures have already been produced for finite moment problems (see, for example, Talenti (1987) and Inglese (1995)). These suggest that finite exchangeability and finite moment problems are clearly "weaker" than their corresponding infinite problems. It would, however, be interesting to see how the two concepts of exchangeability and moment problems may connect or disconnect at this point. This is left, perhaps, for future research.

Appendix

Theorem A.1 (Helly's Theorem) Every sequence {F_n : n = 1, 2, 3, ...} of probability distributions in ℝ^k possesses a subsequence $F_{n_1}, F_{n_2}, \ldots$ that converges to a probability distribution F.

Proof. See Helly (1912), Gnedenko (1962) or Feller (1966).

Theorem A.2 (The Helly-Bray Theorem) If {F_n} is a sequence of probability distributions over ℝ^k with F_n → F, then

$$\int_I f(x)\, dF_n(x) \to \int_I f(x)\, dF(x) \quad \text{as } n \to \infty \,,$$

where I = [a, b] is any bounded, closed interval in ℝ^k whose boundaries contain no discontinuities of F, and f is any continuous function over I.

Proof. See Tucker (1967).

Bibliography

Achyeser, N. I. & Krein, M. (1934). Über Fouriersche Reihen beschränkter summierbarer Funktionen und ein neues Extremumproblem. Communications de la Société Mathématique de Kharkoff, (4)9:3–28, (4)10:3–32.

Aldous, D. (1985). Exchangeability and Related Topics. Springer, Berlin.

Aldous, D. J. (2009). More uses of exchangeability: representations of complex random structures. Preprint.

Askey, R., Schoenberg, I. J., & Sharma, A. (1982). Hausdorff's moment problem and expansion in Legendre polynomials. Journal of Mathematical Analysis and Applications, 86, 237–245.

Atzmon, A. (1975). A moment problem for positive measures on the unit disc. Pacific Journal of Mathematics, 59(2), 317–325.

Bassetti, F. & Regazzini, E. (2008). The unsung de Finetti's first paper about exchangeability. Rendiconti di Matematica, Serie VII, 28, 1–17.

Bernardo, J. M. & Smith, A. F. M. (1994). Bayesian Theory. Wiley, Chichester.

Carleman, T. (1923). Sur les équations intégrales singulières à noyau réel et symétrique. Uppsala Universitets Årsskrift, 1–228.

Carleman, T. (1926). Les fonctions quasi-analytiques. Gauthier-Villars, Paris.

Chow, Y. S. & Teicher, H. (1988). Probability Theory: Independence, Interchangeability, Martingales. Springer-Verlag, New York.

Cifarelli, D. M. & Regazzini, E. (1996). De Finetti’s contribution to probability and statistics. Statistical Science, 11(4), 253–282.

Cobb, E. B. & Harris, B. (1966). The characterization of the solution sets for generalized reduced moment problems and its application to numerical integration. SIAM Review, 8, 86–99.

Dale, A. I. (1983). A probabilistic proof of Hausdorff's theorem for double sequences. Sankhyā, the Indian Journal of Statistics, 45 A Pt.3, 391–394.

Dale, A. I. (1987). Two-dimensional moment problem. Math. Scientist, 12, 21–29.

de Finetti, B. (1930). Funzione caratteristica di un fenomeno aleatorio. Mem. R. Acad. Linc., 4, 86–133.

de Finetti, B. (1937). La prévision: ses lois logiques, ses sources subjectives. Annales de l'Institut Henri Poincaré, 7, 1–68.

de Finetti, B. (1938a). Resoconto critico del colloquio di Ginevra intorno alla teoria delle probabilità. Giorn. Istit. Ital. Attuari, 9, 1–40.

de Finetti, B. (1938b). Sur la condition d'équivalence partielle. Actualités Scientifiques et Industrielles, 739, 5–18.

de Finetti, B. (1939). Compte rendu critique du colloque de Genève sur la théorie des probabilités. Actualités Scientifiques et Industrielles, 766, 1938–1939.

de Finetti, B. (1959). La probabilità e la statistica nei rapporti con l'induzione, secondo i diversi punti di vista. Atti corso CIME su Induzione e Statistica Varenna, 1–115.

de Finetti, B. (1974). Theory of Probability, volume II. New York, Wiley.

de Finetti, B. (1976). La probabilità: guardarsi dalle contraffazioni. Scientia, 111, 255–281.

de Moivre, A. (1718). The Doctrine of Chances. London.

Devinatz, A. (1957). Two parameter moment problem. Duke Mathematical Journal, 24, 481–498.

Diaconis, P. (1977). Finite forms of de Finetti’s theorem on exchangeability. Synthese, 36, 271–281.

Diaconis, P. & Freedman, D. (1980a). De Finetti’s theorem for Markov chains. The Annals of Probability, 8, 115–130.

Diaconis, P. & Freedman, D. (1980b). Finite exchangeable sequences. The Annals of Probability, 8, 745–764.

Diaconis, P. & Freedman, D. (2004a). The Markov moment problem and de Finetti’s theorem: Part 1. Mathematische Zeitschrift, 247, 183–199.

Diaconis, P. & Freedman, D. (2004b). The Markov moment problem and de Finetti’s theorem: Part 2. Mathematische Zeitschrift, 247, 201–212.

Draper, D., Hodges, J. S., Mallows, C. L., & Pregibon, D. (1993). Exchangeability and data analysis. Journal of the Royal Statistical Society. Series A, 156(1), 9–37.

Dubins, L. E. & Freedman, D. (1979). Exchangeable processes need not be mixtures of independent identically distributed random variables. Z. Wahrsch. Verw. Gebiete, 48, 115–132.

Dynkin, E. B. (1953). Klassy ekvivalentnyh slučaĭnyh veličin. Uspehi Matematičeskih Nauk 8, 54, 125–134.

Feller, W. (1966). An Introduction to Probability Theory and Its Applications, volume II. Wiley, New York.

Freedman, D. (1962a). Mixtures of Markov processes. The Annals of Mathematical Statistics, 33, 114–118.


Freedman, D. A. (1962b). Invariants under mixing which generalize de Finetti’s theo- rem. The Annals of Mathematical Statistics, 33(3), 916–923.

Fuglede, B. (1983). The multidimensional moment problem. Expositiones Mathematicae, 1, 47–65.

Gnedenko, B. V. (1962). The Theory of Probability. Chelsea, New York.

Gnedin, A. V. (1996). On a class of exchangeable sequences. Statist. Probab. Lett., 28, 159–164.

Gupta, J. C. (1999). The moment problem for the standard k-dimensional simplex. Sankhyā, the Indian Journal of Statistics, 61A Pt.2, 286–291.

Haag, J. (1924). Sur un problème général de probabilités et ses diverses applications. Proceedings of the International Congress of Mathematicians, 659–674.

Hamburger, H. (1921). Über die Konvergenz eines mit einer Potenzreihe assoziierten Kettenbruchs. Mathematische Annalen, 81:235–319, 82:120–164, 168–187.

Hausdorff, F. (1923). Momentprobleme für ein endliches Intervall. Mathematische Zeitschrift, 16, 220–248.

Haviland, E. K. (1935–1936). On the momentum problem for distribution functions in more than one dimension. American Journal of Mathematics, 57:562–568, 58:164–168.

Heath, D. & Sudderth, W. (1976). De Finetti’s theorem on exchangeable variables. The American Statistician, 30(4), 188–189.

Hedge, L. B. (1941). Moment problem for a bounded region. Bulletin of the American Mathematical Society, 47, 282–285.

Heine, H. E. (1861). Handbuch der Kugelfunctionen. G. Reimer, Berlin.

Heine, H. E. (1878). Handbuch der Kugelfunctionen, Theorie und Anwendungen, vol- ume 1. G. Reimer, Berlin.

Heine, H. E. (1881). Handbuch der Kugelfunctionen, Theorie und Anwendungen, vol- ume 2. G. Reimer, Berlin.

Helly, E. (1912). Über lineare Funktionaloperationen. Sitzungsberichte der Mathematisch-Naturwissenschaftlichen Klasse der Akademie der Wissenschaften, Vienna, 121, 265–297.

Helmes, K. & Stockbridge, R. H. (2003). Extension of Dale’s moment conditions with application to the Wright-Fisher model. Stochastic Models, 19(2), 255–267.

Hewitt, E. & Savage, L. J. (1955). Symmetric measures on Cartesian products. Transactions of the American Mathematical Society, 80(2), 470–501.

Hildebrandt, T. H. & Schoenberg, I. J. (1933). On linear functional operations and the moment problem for a finite interval in one or several dimensions. Annals of Mathematics, 34, 317–328.


Hinčin, A. Y. (1932). Sur les classes d'événements équivalents. Mat. Sbornik, 39, 40–43.

Hinčin, A. Y. (1952). O klassah ekvivalentnyh sobytiĭ. Doklady Akad. Nauk SSSR, 85, 713–714.

Inglese, G. (1995). Christoffel functions and finite moment problems. Inverse Problems, 11, 949–960.

Inverardi, P. N., Pontuale, G., Petri, A., & Tagliani, A. (2003). Hausdorff moment problem via fractional moments. Applied Mathematics and Computation, 144, 61–74.

Jaynes, E. T. (1986). Some applications and extensions of the de Finetti representation theorem. Bayesian Inference and Decision Techniques, 31–42. Elsevier Science Publishers B. V., Canada.

Karlin, S. & Shapley, L. (1953). Geometry of moment spaces. Mem. Amer. Math. Soc., 12, 673–677.

Kendall, D. G. (1967). On finite and infinite sequences of exchangeable events. Studia Sci. Math. Hung., 2, 319–327.

Kerns, G. J. & Székely, G. J. (2006). De Finetti's theorem for abstract finite exchangeable sequences. Journal of Theoretical Probability, 19(3), 589–608.

Khamis, S. H. (1954). On the reduced moment problem. The Annals of Mathematical Statistics, 25, 113–122.

Khintchine, A. I. (1932). Sur les classes d'événements équivalents. Mat. Sbornik, 39, 40–43.

Knill, O. (2000). On Hausdorff’s moment problem in higher dimensions. University of Arizona, Tucson (Preprint).

Kyburg, H. E. & Smokler, H. E. (1964). Studies in Subjective Probability. Wiley, New York.

Lorentz, G. G. (1953). Bernstein Polynomials. University of Toronto Press, Toronto.

Markov, A. (1896). Nouvelles applications des fractions continues. Mathematische Annalen, 47, 579–597.

Mood, A., Graybill, F., & Boes, D. (1974). Introduction to the Theory of Statistics. McGraw-Hill, New York.

Nevanlinna, R. (1922). Asymptotische Entwickelungen beschränkter Functionen und das Stieltjessche Momentenproblem. Annales Academiae Scientiarum Fennicae, (A)18(5), 1–52.

Peng, H., Dang, X., & Wang, X. (2009). The distribution of partially exchangeable random variables. Preprint.

Petersen, L. C. (1982). On the relation between the multidimensional moment problem and the one-dimensional moment problem. Math. Scand., 51, 361–366.


Ressel, P. (1985). De Finetti-type theorems: An analytical approach. The Annals of Probability, 13(3), 898–922.

Riesz, M. (1921–1923). Sur le problème des moments. Arkiv för matematik, astronomi och fysik, 16(12):1–23, 16(19):1–21, 17(16):1–52.

Romano, J. P. & Siegel, A. F. (1986). Counterexamples in Probability and Statistics. Wadsworth and Brooks, California.

Romera, E., Angulo, J. C., & Dehesa, J. S. (2001). The Hausdorff entropic moment problem. Journal of Mathematical Physics, 42(5), 2309–2314.

Scalas, E. & Viano, G. A. (1993). The Hausdorff moments in statistical mechanics. Journal of Mathematical Physics, 34(12), 5781–5800.

Schoenberg, I. J. (1932). On finite and infinite completely monotonic sequences. Bulletin of the American Mathematical Society, 38, 72–76.

Shohat, J. A. & Tamarkin, J. D. (1943). The Problem of Moments. American Mathematical Society, New York.

Stieltjes, T. J. (1894–1895). Recherches sur les fractions continues. Annales de la Faculté des Sciences de Toulouse, 1(8):T1–122, 1(9):A5–47.

Stochel, J. (2001). Solving the truncated moment problem solves the full moment problem. Glasgow Mathematical Journal, 43, 335–441.

Stockbridge, R. H. (2003). The problem of moments on polytopes and other bounded regions. Journal of Mathematical Analysis and Applications, 285, 356–375.

Talenti, G. (1987). Recovering a function from a finite number of moments. Inverse Problems, 3, 501–517.

Taylor, R. L., Daffer, P. Z., & Patterson, R. F. (1985). Limit Theorems for Sums of Exchangeable Random Variables. Rowman and Allanheld, New Jersey.

Tucker, H. G. (1967). A Graduate Course in Probability. Academic Press, New York.
