
THE UNIVERSITY OF CHICAGO

INFINITELY EXCHANGEABLE PARTITION, TREE AND GRAPH-VALUED STOCHASTIC PROCESSES

A DISSERTATION SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

DEPARTMENT OF

BY HARRY CRANE

CHICAGO, ILLINOIS
APRIL 2012

To my parents, Harry and Regina Crane

ABSTRACT

The theory of infinitely exchangeable random partitions began with the work of Ewens [53] as a model for species sampling in population biology, known as the Ewens sampling formula. Kingman [66, 67] established a correspondence between infinitely exchangeable partitions and probability measures on partitions of the unit interval, called the paintbox representation. Later, Kingman [66, 68] introduced the coalescent, an exchangeable Markov process on the space of set partitions, in the field of population genetics. In this thesis, we build on Kingman's theory to construct an infinitely exchangeable Markov process on the space of partitions whose sample paths differ from previously studied coalescent- and fragmentation-type processes; we call this process the cut-and-paste process. The cut-and-paste process possesses many of the same properties as its predecessors, including finite-dimensional transition probabilities that can be expressed in terms of a paintbox process, a unique equilibrium measure under general conditions, a Poissonian construction, and an associated mass-valued process. A special subfamily with parameters α > 0 and k ≥ 1 is related to the Chinese restaurant process and is reversible with respect to the two-parameter Pitman-Yor family with parameter (−α/k, α). An extension of the (α, k)-subfamily has a third parameter Σ, a symmetric square matrix with non-negative entries, called the similarity matrix.

From a family of partition-valued Markov kernels, we show how to construct a Markov process on the space of N-rooted fragmentation trees through the ancestral branching procedure. If the family of kernels is infinitely exchangeable, then so is its associated ancestral branching process. In addition, the ancestral branching process based on the cut-and-paste Markov kernel possesses a unique equilibrium measure, admits a Poissonian construction and has an associated mass fragmentation-valued process almost surely.
Furthermore, the results can be extended to characterize a Markov process on the space of trees with edge lengths.

Aside from the Erdős-Rényi process and its variants, infinitely exchangeable graph-valued processes are uncommon in the literature. We show a construction for a family of infinitely exchangeable Poisson random hypergraphs which is induced by a consistent family of Poisson point processes on the power set of the natural numbers. Infinitely exchangeable families of hereditary hypergraphs and undirected graphs are induced from an infinitely exchangeable Poisson random hypergraph by projection.

Finally, we consider balanced and even partition structures, which are families of distributions on partitions with a prespecified block structure. Consistency of these families can be shown under a random deletion procedure. We show Chinese restaurant-type constructions for a special class of these structures based on the two-parameter Pitman-Yor family, and discuss connections to randomization in experimental design.

ACKNOWLEDGMENTS

First and foremost, this thesis reflects the influence and hard work of my parents, Harry and Regina, who have supported me unconditionally throughout my life. They are due my deepest and most sincere gratitude. I also thank my sister, Kayla, who has been supportive when I needed it most. At Chicago, I thank Peter McCullagh for being readily available with his time and insights, and especially for introducing me to partition- and tree-valued processes, for which I have developed a strong affinity. Among the many qualities of his I hope to emulate are attention to detail and patience in research pursuits. I thank the other members of my committee for their valuable contributions: Steve Lalley for providing insightful comments about the direction of my research and about several parts of this thesis, and Mathias Drton for his advice and encouragement throughout my final year in Chicago. I also thank: Michael Wichura, whose emphasis on precision and dedication to his teaching have been an important aspect of my education, and Mei Wang, who has been generous with her time and extremely encouraging. I would also like to thank those I have met in Chicago who have contributed to my experience here: Alan, Marcin, Joe, Walter, Lior, Winfried, Andrei and, of course, Sherman. Last, but not least, I thank Jie for making my last year here the best of all.

TABLE OF CONTENTS

ABSTRACT ...... iii

ACKNOWLEDGMENTS ...... v

Chapter

1 INTRODUCTION ...... 2
1.1 Preliminary remarks ...... 2
1.2 Integer partitions ...... 4
1.2.1 Random integer partitions ...... 5
1.3 Projective systems ...... 6
1.4 Projective systems of partitions, trees and graphs ...... 8
1.4.1 Set partitions ...... 10
1.4.2 Fragmentation trees ...... 12
1.4.3 Graphs and permutations ...... 14
1.5 Exchangeable random partitions ...... 14
1.5.1 Distribution of block sizes ...... 16
1.6 Ewens process ...... 17
1.6.1 Pitman-Yor process ...... 18
1.6.2 Gibbs partitions ...... 18
1.6.3 Product partition models ...... 20
1.7 Mass partitions ...... 21
1.8 Paintbox process ...... 22
1.8.1 Asymptotic frequencies ...... 23
1.9 Exchangeable coalescents ...... 24
1.10 Exchangeable fragmentations ...... 26
1.10.1 Exchangeable random fragmentation trees ...... 27
1.10.2 Gibbs fragmentation trees ...... 27
1.10.3 Genealogical interpretation of a tree ...... 28
1.11 Exchangeable fragmentation-coalescence processes ...... 31
1.12 Random graphs ...... 32
1.12.1 Heavy-tailed networks ...... 32
1.12.2 Small-world networks ...... 34
1.13 Organization of thesis ...... 34


2 A CONSISTENT MARKOV PARTITION PROCESS GENERATED BY THE PAINTBOX PROCESS ...... 36
2.1 Preliminaries ...... 36
2.2 The Cut-and-Paste process ...... 37
2.2.1 Equilibrium measure ...... 42
2.3 Continuous-time version of CP(ν)-process ...... 45
2.3.1 Poissonian construction ...... 47
2.4 Asymptotic frequencies ...... 49
2.4.1 Poissonian construction ...... 50
2.4.2 Equilibrium measure ...... 51
2.5 A two parameter subfamily ...... 52
2.6 A three-parameter extension ...... 55
2.6.1 Similarity and dissimilarity matrices ...... 55
2.6.2 The extended model ...... 57
2.7 Properties of the CP(α, k; Σ) process ...... 58
2.8 Discussion ...... 63

3 ANCESTRAL BRANCHING AND TREE-VALUED PROCESSES ...... 64
3.1 Introduction ...... 64
3.2 Ancestral branching kernels ...... 66
3.3 Exchangeable ancestral branching Markov kernels ...... 67
3.4 Consistent ancestral branching kernels ...... 69
3.5 Cut-and-paste ancestral branching processes ...... 73
3.5.1 Construction of the cut-and-paste ancestral branching on T ...... 74
3.5.2 Equilibrium measure ...... 76
3.5.3 Continuous-time ancestral branching process ...... 77
3.5.4 Poissonian construction ...... 79
3.5.5 ...... 79
3.6 Mass fragmentations ...... 82
3.6.1 Associated mass fragmentation process ...... 83
3.6.2 Equilibrium measure ...... 85
3.6.3 Poissonian construction ...... 86
3.7 Weighted trees ...... 88
3.8 Discussion ...... 92

4 INFINITE RANDOM HYPERGRAPHS ...... 93
4.1 Introduction ...... 93
4.1.1 Projective systems of hypergraphs ...... 94
4.2 Infinite Poisson random hypergraphs ...... 98
4.2.1 Construction of the infinite random hypergraph ...... 99
4.3 Induced hypergraphs, hereditary hypergraphs, and undirected graphs ...... 102
4.3.1 Random hypergraphs ...... 103
4.3.2 Hereditary hypergraphs and monotone sets ...... 105
4.3.3 Random undirected graphs ...... 109
4.4 Discussion ...... 110

5 BALANCED AND EVEN PARTITION STRUCTURES ...... 111
5.1 Preliminaries ...... 111
5.2 Balanced partitions ...... 112
5.3 Even partitions ...... 113
5.4 Partition structures ...... 114
5.5 Balanced and even permutations ...... 115
5.6 Relating balanced and even partitions ...... 115
5.7 Chinese restaurant constructions ...... 116
5.7.1 Chinese restaurant construction for balanced partitions ...... 116
5.7.2 Chinese restaurant construction for even partitions ...... 117
5.8 Randomization ...... 118

Notation

[n]          {1, 2, . . . , n}
N            the natural numbers, [∞] := {1, 2, . . .}
2^A          the power set of A, i.e. {a : a ⊆ A}
P            set partitions of N
P^(k)        set partitions of N with at most k blocks
P_I          interval partitions of [0, 1]
R↓           ranked-mass partitions
R↓(k)        ranked k-simplex; ranked-mass partitions with at most k positive components
P_n          integer partitions of n ∈ N
P_n^(k)      integer partitions of n with at most k parts
P_{n,k}      integer partitions of n with exactly k parts
P_[n]        set partitions of [n]
P_[n]^(k)    partitions of [n] with at most k blocks
P_{[nj]:j}   partitions of [nj] with block sizes divisible by j; j-even partitions of [nj]
P′_{[nj]:j}  partitions of [nj] with each element labeled as one of j types and each block containing an equal number of elements of each type; j-balanced partitions
ρ_s          paintbox based on s ∈ R↓
ρ_ν          ν-mixture of s-paintboxes; ρ_ν(·) := ∫_{R↓} ρ_s(·) ν(ds)
µ^(k)        k-fold product measure µ ⊗ · · · ⊗ µ of µ
S            permutations of N; symmetric group acting on N
S_n          permutations of [n]; symmetric group acting on [n]
T            N-rooted (fragmentation) trees
T̄            weighted N-rooted trees; N-rooted trees with edge lengths
T^(k)        k-ary N-rooted trees
T̄^(k)        weighted k-ary N-rooted trees; k-ary N-rooted trees with edge lengths
T_n          [n]-rooted trees
T̄_n          weighted [n]-rooted trees; [n]-rooted trees with edge lengths
T_n^(k)      k-ary [n]-rooted trees
T̄_n^(k)      weighted k-ary [n]-rooted trees; k-ary [n]-rooted trees with edge lengths
G            N-labeled undirected graphs
G_n          undirected graphs with vertices labeled in [n]

CHAPTER 1

INTRODUCTION

In this volume, we discuss infinitely exchangeable probability models for random partitions, trees and graphs. Our development builds upon theory that has been refined through decades of research in the fields of probability, statistics and various scientific disciplines. We begin this chapter by discussing the motivation for the work contained in subsequent chapters. Some commonly used notation is found on page 1.

1.1 Preliminary remarks

Exchangeable probability models have been studied in relation to the theory and application of statistical models [9, 10, 12, 76], in which discrete collections of units are labeled by some countable index set, without loss of generality the natural numbers N := {1, 2, . . .}. Meanwhile, the theory of partitions has been studied in direct connection to cumulants [73], population genetics [53, 69], experimental design and association schemes [15] and [93]. Our study of partition models is motivated by their potential use in mathematical biology and population genetics, as demonstrated by the widespread use of coalescent theory in these areas. Specifically, the concept of chapter 2 preceded the rest of this thesis and was born of the following modeling consideration. For a sample of units (individuals) from a population, suppose the data take the form of a sequence of RNA/DNA nucleotides along a contiguous region of the chromosome, as in table 1.1. By disregarding the nucleotides and considering only the equivalence classes of the species at each site, we obtain a sequence of partitions of the species. For example, the partition at site 1 in table 1.1 is

{snake, iguana, lizard, crocodile, bird, whale, monkey}, {cow, human};

site       123 ...
snake      TAGGATTAGATACCC
iguana     TAGGATTAGATACCC
lizard     TAGGATTAGATACCC
crocodile  TAGGATTAGATACCC
bird       TGGGATTAGATACCC
whale      TGGGATTAGATACCC
cow        AAGCATC-TACACCC
human      AACCCCCGCCCATCC
monkey     TGGGATTAGATACCC

Table 1.1: Mitochondrial DNA (mtDNA) data for nine species, obtained from http://www.bch.umontreal.ca/ogmp/projects/other/mt list.html in January 2012. Note that the entry for cow at site 8 shows as ‘-’, which does not correspond to missing data, but is a result of sequence alignment and other biological and evolutionary processes which are far beyond the scope of this thesis.

The partition at site 2 is

{snake, iguana, lizard, crocodile, cow, human}, {bird, whale, monkey}; and so on. One area where data of this sort arises is in phylogenetic inference, in which DNA sequence data is used to infer phylogenetic relationships among individuals or species. The scientific consensus for the shape of the phylogenetic tree relating the nine species in table 1.1 is shown in figure 1.1.
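The site-by-site reduction described above can be made concrete. The following sketch is ours, not from the thesis: the dictionary encoding and the function name `site_partition` are illustrative conventions only.

```python
# Illustration (ours): recover the set partition of the species at a given
# site of the alignment in table 1.1 by grouping species that share a
# nucleotide there, then discarding the nucleotide labels themselves.

SEQUENCES = {  # rows of table 1.1
    "snake":     "TAGGATTAGATACCC",
    "iguana":    "TAGGATTAGATACCC",
    "lizard":    "TAGGATTAGATACCC",
    "crocodile": "TAGGATTAGATACCC",
    "bird":      "TGGGATTAGATACCC",
    "whale":     "TGGGATTAGATACCC",
    "cow":       "AAGCATC-TACACCC",
    "human":     "AACCCCCGCCCATCC",
    "monkey":    "TGGGATTAGATACCC",
}

def site_partition(seqs, site):
    """Set partition of the species induced by site `site` (1-based)."""
    blocks = {}
    for species, seq in seqs.items():
        blocks.setdefault(seq[site - 1], []).append(species)
    return [frozenset(b) for b in blocks.values()]  # labels discarded
```

Varying `site` yields the dependent partition sequence that motivates the cut-and-paste process of chapter 2.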


Figure 1.1: Consensus phylogenetic tree. (Obtained from Theobald, Douglas L., “29+ Evidences for Macroevolution”, TalkOrigins; Department of Biochemistry, Brandeis University.)

Felsenstein [55] recounts much of the work in this area in recent decades, which

includes statistical and non-statistical methods. There is also recent work by Holmes [28, 63, 64] on the estimation of unknown phylogenetic trees. In recent years, techniques from algebraic geometry have been developed for use in computational biology, particularly phylogenetic inference; Pachter and Sturmfels [88] provide a recent survey of some of this work. In chapter 2 we endeavor to develop a theoretical framework within which to model dependent partition sequences, e.g. the partition sequence determined by the DNA data in table 1.1. To this end, the work in chapter 2 is necessary, as previously studied partition-valued processes, e.g. coalescent and fragmentation processes, are not suited to modeling the sequences which arise from DNA data of the sort we discuss here. The extended model in section 2.6 seems to have potential for this application; however, we do not address any application here.

The remainder of this thesis deals primarily with theory relating to random partitions, trees and graphs. We now give a brief overview of the relevant literature in this field and introduce notation and terminology critical to the rest of the thesis. Though inessential to the rest of this thesis, it is common to introduce the theory of random set partitions by first discussing random integer partitions, which we do in section 1.2; sections 1.3 and 1.4 are essential to all subsequent chapters; sections 1.5-1.8 are central to chapters 2 and 3, and section 1.10 is relevant to chapter 3; sections 1.9-1.12 contain mostly background material.

1.2 Integer partitions

An integer partition n := (n1, . . . , nm) ≡ n1 + · · · + nm of n ∈ N is a list of parts such that ∑_{i=1}^{m} ni = n. We write #n to denote the number of parts of n, which is m in our description. Note that the order in which the parts are listed is irrelevant,

though it is conventional to list parts in decreasing order. We write Pn to denote the space of integer partitions of n. Alternatively, a partition of n can be written as a list of multiplicities λ = (λ1, . . . , λn), sometimes also denoted 1^{λ1} 2^{λ2} · · · n^{λn}, where λj is the number of parts of size j, so that ∑_{j=1}^{n} jλj = n. The number of parts of λ is the sum λ. := ∑_{j=1}^{n} λj.

These two ways for writing integer partitions are equivalent. For example, an integer partition of 8 with four parts of size 3, 2, 2, 1 can be denoted by

• a list of parts: (3, 2, 2, 1) ≡ 3 + 2 + 2 + 1 ≡ 2 + 1 + 3 + 2, or

• a list of multiplicities: (1, 2, 1) ≡ 1^1 2^2 3^1.

Andrews [13] provides a thorough account of the theory of integer partitions in the fields of combinatorics and number theory.
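The equivalence of the two encodings can be sketched mechanically; the function names below are ours, not the thesis's. Note that the shorthand (1, 2, 1) in the example above truncates the trailing zeros of the full vector (λ1, . . . , λ8).

```python
# Conversions (ours) between the two encodings of an integer partition:
# a decreasing list of parts and the multiplicity vector (λ1, ..., λn).
from collections import Counter

def parts_to_multiplicities(parts):
    """(3, 2, 2, 1) -> (1, 2, 1, 0, 0, 0, 0, 0): λj = number of parts of size j."""
    n = sum(parts)
    counts = Counter(parts)
    return tuple(counts.get(j, 0) for j in range(1, n + 1))

def multiplicities_to_parts(lam):
    """Inverse map, listing the parts in decreasing order."""
    parts = [j for j, lam_j in enumerate(lam, start=1) for _ in range(lam_j)]
    return tuple(sorted(parts, reverse=True))
```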

1.2.1 Random integer partitions

The study of random partitions dates back to the work of Ewens [53] who intro- duced the Ewens sampling formula (ESF) as a one-parameter distribution on integer partitions. For n ∈ N, the Ewens sampling formula with parameter α > 0, ESF(α), on Pn is

pn(λ; α) = (n!/α↑n) ∏_{j=1}^{n} α^{λj} / (j^{λj} λj!),   λ ∈ Pn,   (1.1)

where α↑n := α(α + 1) ··· (α + n − 1) = Γ(n + α)/Γ(α) is the ascending factorial and

Γ(α) := ∫_0^∞ x^{α−1} e^{−x} dx is the gamma function. The Ewens sampling formula was derived from the study of sampling theory for selectively neutral alleles at a single locus in population genetics. In this setting, n ≥ 1 is the size of a sample of individuals and λ ∈ Pn is the integer partition associated with this sample, where λ. represents the number of different alleles which appear at the locus in the sample, and the parts of λ represent the numbers of individuals who share a particular allele. Kingman [66, 67] introduced the notion of a partition structure as a sequence

(P1, P2, . . .) of distributions, where Pn is a distribution on the integer partitions Pn of n for each n ≥ 1, which is sampling consistent under uniform deletion of one of the elements. In other

words, for πn ∼ Pn, the partition πn−1 ∈ Pn−1 obtained by choosing a part p ∈ πn with probability proportional to its size and reducing it by 1 has distribution Pn−1. Gnedin and Pitman [58, 60, 61] have studied general properties of regenerative and self-similar partition structures under some alternative methods of sampling. Kingman [69] has shown a one-to-one correspondence between partition structures and exchangeable random partitions of the set N and, consequently, that the theory of partition structures is more naturally developed in the context of random set partitions of N, which can be treated as a collection of probability distributions on a projective system, which we now discuss.
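Both the normalization of (1.1) and the sampling-consistency property just described can be checked by brute force for small n. The following sketch is ours, not from the thesis; exact rational arithmetic avoids rounding issues.

```python
# Brute-force check (ours) that ESF(α) is a probability distribution on P_n
# and that the family (P_n, n >= 1) is a partition structure: size-biased
# deletion of one element from π_n ~ P_n yields a partition with law P_{n-1}.
from fractions import Fraction
from math import factorial

def integer_partitions(n, largest=None):
    """All integer partitions of n as decreasing tuples of parts."""
    if n == 0:
        yield ()
        return
    largest = n if largest is None else largest
    for first in range(min(n, largest), 0, -1):
        for rest in integer_partitions(n - first, first):
            yield (first,) + rest

def esf(parts, alpha):
    """Ewens sampling formula (1.1) at the partition with the given parts."""
    n = sum(parts)
    rising = Fraction(1)                    # ascending factorial α↑n
    for i in range(n):
        rising *= alpha + i
    p = Fraction(factorial(n)) / rising
    for j in range(1, n + 1):
        lam_j = parts.count(j)              # multiplicity of parts of size j
        p *= (Fraction(alpha) / j) ** lam_j / factorial(lam_j)
    return p

def delete_one(parts):
    """Distribution on P_{n-1}: pick a part size-biasedly, reduce it by 1."""
    n = sum(parts)
    out = {}
    for i, part in enumerate(parts):
        new = list(parts)
        new[i] -= 1
        new = tuple(sorted((x for x in new if x > 0), reverse=True))
        out[new] = out.get(new, 0) + Fraction(part, n)
    return out
```

Mixing `delete_one` over λ ∼ ESF(α) on Pn and comparing with ESF(α) on Pn−1 verifies Kingman's consistency condition exactly for small n.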

1.3 Projective systems

A projective system indexed by N associates with each finite set [n] := {1, . . . , n} a set Qn and with each injective map ϕ : [m] → [n], m ≤ n, a projection ϕ∗ : Qn → Qm which maps Qn into Qm such that

• if ϕ is the identity [n] → [n], then ϕ∗ is the identity Qn → Qn, and

• if ψ : [l] → [m], l ≤ m, and ψ∗ : Qm → Ql is its associated projection, the composition (ϕψ) : [l] → [n] satisfies (ϕψ)∗ ≡ ψ∗ϕ∗ : Qn → Ql.

[Diagram: [l] →ψ [m] →ϕ [n] on the left is sent by (Q, ∗) to Qn →ϕ∗ Qm →ψ∗ Ql on the right, with (ϕψ)∗ = ψ∗ϕ∗ : Qn → Ql.]

In terms of category theory, a projective system is a contravariant functor (Q, ∗) from the category of finite sets and morphisms between the sets to a category of sets

(Qn, n ≥ 1) and morphisms in the opposite direction between these sets, as shown in the above diagram; see [14, 72].

The construction of a family of probability distributions on a projective system allows one to uniquely characterize a probability measure on the projective limit space defined by Q := (Qn, n ≥ 1) by Kolmogorov’s extension theorem.

Theorem 1.3.1 (Kolmogorov's extension theorem [87, 99]). For each n ≥ 1, let (Ωn, Fn, µn) be a projective system of measure spaces such that for every injection ϕm,n : [m] → [n], m ≤ n, with associated measurable projection ϕ∗m,n : Ωn → Ωm,

µm(Bm) = µn((ϕ∗m,n)^{−1}(Bm))   (1.2)

holds for all Bm ∈ Fm. Then (µn, n ≥ 1) can be uniquely extended to a measure µ on (Ω, F), the projective limit measurable space of (Ωn, Fn).

A collection of measures (µn, n ≥ 1) which satisfies (1.2) with respect to some collection of measurable functions (ϕ∗m,n, 1 ≤ m ≤ n) is called self-consistent with respect to sub-sampling. The original statement of Kolmogorov's theorem, as it appears e.g. in [87], extends a measure on a finite product space to the infinite product space. Bochner [30] showed a generalization of Kolmogorov's theorem to the projective limit measurable space, which we have restated above and which is relevant to our study of partitions, trees and graphs labeled by N.

In our study of processes of random partitions, trees and graphs, we are interested in constructing families of self-consistent probability measures which are finitely exchangeable.

Definition 1.3.2. A collection of measures (µn, n ≥ 1) on a family of measurable spaces (Ωn, Fn) is finitely exchangeable if for each n ≥ 1, Bn ∈ Fn and measurable bijective map ϕ :Ωn → Ωn

−1 µn(Bn) = µn(ϕ (Bn)). (1.3)

For our treatment, an injective map ϕ : Ωn → Ωn corresponds to an element of the symmetric group Sn of permutations acting on [n], and finite exchangeability

corresponds to invariance under relabeling of elements. A family of measures (µn, n ≥ 1) which is self-consistent and finitely exchangeable is said to be infinitely exchangeable and its unique extension to (Ω, F) is invariant under measurable injective mappings Ω → Ω. Infinitely exchangeable collections of measures have a natural interpretation in terms of statistical models in which statistical units are labeled by N and a finite sample of size n ∈ N is taken from an infinite population. Finite exchangeability (1.3) guarantees that inference based on a sample of size n is unaffected by arbitrary labeling of the units, while self-consistency (1.2), also referred to as consistency under subsampling, guarantees that inference based on the finite sample is unaffected by unsampled statistical units [75]. In particular, a statistic based on a sample of size m taken from the population has the same distribution as a statistic based on a subsample of size m ≤ n taken from a sample of size n.

The collection (Pn, n ≥ 1) of integer partitions is not a projective system, and so theorem 1.3.1 cannot be applied to study integer partitions; however, the collections

(P[n], n ≥ 1) of finite set partitions of [n], (Tn, n ≥ 1) of [n]-rooted fragmentation trees and (Gn, n ≥ 1) of [n]-labeled graphs are projective systems under appropriate projection operations, which we now discuss.

1.4 Projective systems of partitions, trees and graphs

If Qn := 2^{[n]^2} is the set of subsets of [n]^2, i.e. the space of directed graphs with n vertices, one can define the projection Qn → Qm either by restriction or by delete-and-repair. Each A ∈ Qn can be represented as an n × n matrix with entries in {0, 1}

such that Aij = 1 if (i, j) ∈ A and Aij = 0 otherwise. For each n ≥ 1, let ϕn,n+1 be the operation on Qn+1 which restricts A to the

complement of {n+1}. In matrix form, ϕn,n+1A =: A|[n] is the n×n matrix obtained from A by removing the last row and last column of A and keeping the rest of the

entries unchanged. It is clear that the compositions ϕm,n := ϕm,m+1 ◦ · · · ◦ ϕn−1,n for m ≤ n are well-defined as the restriction of A ∈ Qn to [m] by removing the last n − m rows and columns of A. We call the maps (ϕm,n, 1 ≤ m ≤ n) the restriction, or deletion, maps on (Qn, n ≥ 1). A permutation σ ∈ Sn acts on each element A ∈ Qn componentwise in the usual way. That is, for each i, j ∈ [n], σ(A)(i, j) := Aσ(i, j) := A(σ(i), σ(j)). The restriction maps (ϕm,n, m ≤ n) together with permutation maps (σ ∈ Sn, n ≥ 1) and their compositions make Q := (Qn, n ≥ 1) a projective system, written Q^ϕ ≡ (Qn, ϕm,n). Another way to specify a projective system on (Qn, n ≥ 1) is by delete-and-repair. For n ≥ m ≥ 1, let ψm act on A ∈ Qn by removing the mth row and column of A and directing an edge from each i in {j ∈ [n] : (j, m) ∈ A} to each k in

{j ∈ [n]:(m, j) ∈ A}. In other words, ψmA is obtained by deleting the vertex labeled m from A and connecting two vertices i and k by a directed edge from i to k if both (i, m) and (m, k) are elements of A, i.e. there is a directed path i → m → k in A.

For m ≤ n, define ψm,n := ψm+1 ◦ · · · ◦ ψn. Plainly, ψm,n is well-defined since for each n ≥ 2, ψn−2,n ≡ ψn−1 ◦ ψn = ψn ◦ ψn−1 and ψl,n = ψl,m ◦ ψm,n. The delete-and-repair maps (ψm,n, m ≤ n) together with permutation maps (σ ∈ Sn, n ≥ 1) and compositions also make (Qn, n ≥ 1) a projective system, written Q^ψ ≡ (Qn, ψm,n). Note that the two projective systems (Qn, ϕm,n) and (Qn, ψm,n) have the same objects but are different projective systems; the arrows are different.

Example 1.4.1. A permutation σ ∈ Sn is a one-to-one and onto map [n] → [n], and we regard σ as a subset of [n]^2 with (i, j) ∈ σ if σ(i) = j. For σ ∈ Sn+1, delete-and-repair acts on σ by putting σ′ := ψn,n+1σ, which satisfies

σ′(i) = σ(n + 1) if i = σ^{−1}(n + 1), and σ′(i) = σ(i) otherwise.

(Sn, n ≥ 1) together with delete-and-repair maps (ψm,n, m ≤ n) and permutation maps is a projective system.
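The delete-and-repair map of example 1.4.1 can be sketched directly; the dict encoding of σ below is ours, not the thesis's.

```python
# A sketch (ours) of psi_{n,n+1} acting on a permutation of [n+1] stored
# as the dict {i: sigma(i)}: remove n+1, rerouting its cycle through.

def delete_and_repair(sigma):
    """Remove the element n+1, sending sigma^{-1}(n+1) to sigma(n+1)."""
    n1 = len(sigma)                # n + 1
    out = {}
    for i in range(1, n1):
        if sigma[i] == n1:         # i = sigma^{-1}(n+1): bypass n+1
            out[i] = sigma[n1]
        else:
            out[i] = sigma[i]
    return out
```

For instance, deleting 4 from the cycle (1 3 4 2) yields the cycle (1 3 2) on [3]; if n + 1 is a fixed point of σ, the map reduces to plain restriction.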

The collections (P[n], n ≥ 1), (Gn, n ≥ 1) and (Sn, n ≥ 1) of partitions, graphs and permutations respectively can each be described as a special case of one or more

of the above formulations. A different specification, however, is required when we

discuss the collection (Tn, n ≥ 1) of fragmentation trees later.

1.4.1 Set partitions

For any subset A ⊂ N, a (set) partition π of A is a collection of disjoint non-empty subsets of A, called blocks, written {π1, . . .}, such that ∪i πi = A. In general, we write PA to denote the collection of partitions of A. Specifically, the space of partitions of

[n] is denoted by P[n]. For each n ≥ 1, a set partition π ∈ P[n] can be regarded as

(i) a collection of disjoint subsets; e.g. {{1, 2, 5}, {3, 6}, {4}} ≡ 125|36|4 ≡ 36|125|4;

(ii) an equivalence relation π :[n] × [n] → {0, 1} such that π(i, j) = 1 if and only if

i ∼π j, i.e. i and j are in the same block of π;

(iii) a symmetric Boolean matrix with entries given by the equivalence relation in (ii); e.g. 125|36|4 corresponds to

  1 1 0 0 1 0    1 1 0 0 1 0       0 0 1 0 0 1    .    0 0 0 1 0 0     1 1 0 0 1 0    0 0 1 0 0 1

Note that the order in which blocks are listed in (i) is irrelevant, though it is conventional to list blocks in order of their least element. In general, we write π := (π1, . . .), as opposed to {π1, . . .}, when we wish to emphasize the order in which blocks are listed. Description (i) is often best for visualization of partitions, (ii) is useful for theoretical treatment of partitions and (iii) is sometimes convenient for computations involving partitions. As the above three descriptions are equivalent, we need not specify which we are using; however, by using description (iii), we can discuss a partition of [n] in the context of a subset of [n]^2, as we have at the beginning of section 1.4.
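Descriptions (i) and (iii) can be converted into one another mechanically; the following sketch is ours, with illustrative function names.

```python
# Conversions (ours) between description (i) (a list of blocks) and
# description (iii) (the symmetric Boolean matrix of the equivalence relation).

def blocks_to_matrix(blocks, n):
    """Entry (i, j) is 1 iff i and j lie in the same block."""
    label = {}
    for b in blocks:
        for i in b:
            label[i] = min(b)              # tag each element by its block
    return [[1 if label[i] == label[j] else 0 for j in range(1, n + 1)]
            for i in range(1, n + 1)]

def matrix_to_blocks(M):
    """Recover the blocks, listed in order of their least element."""
    n = len(M)
    seen, blocks = set(), []
    for i in range(n):
        if i not in seen:
            blocks.append(frozenset(j + 1 for j in range(n) if M[i][j] == 1))
            seen |= {j for j in range(n) if M[i][j] == 1}
    return blocks
```

For example, `blocks_to_matrix([{1, 2, 5}, {3, 6}, {4}], 6)` reproduces the 6 × 6 matrix displayed above.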

For any set partition π, we write #π to denote the number of blocks of π; for example, for π = {{1, 2, 5}, {3, 6}, {4}} we have #π = 3. We write b ∈ π to denote a block b of π and #b denotes the number of elements in b. The notation P[n]^(k) denotes the subspace of P[n] which consists of partitions of [n] with at most k ≥ 1 blocks, i.e. P[n]^(k) := {π ∈ P[n] : #π ≤ k}. For B ⊂ A ⊂ N, let π := {π1, . . .} be a set partition of A. We write π|B := {πi ∩ B : πi ∈ π} \ {∅} to denote the restriction of π to B, excluding the empty set. For each n ≥ 1, we define the restriction map Dn,n+1 : P[n+1] → P[n] in the obvious way by Dn,n+1π := π|[n] ≡ {b ∩ [n] : b ∈ π} \ {∅} for every π ∈ P[n+1]. It is clear that, for m ≤ n, Dm,n corresponds to the composition Dm,m+1 ◦ · · · ◦ Dn−1,n.

The symmetric group Sn acts on a set partition π ∈ P[n] by relabeling. We write σπ := σ(π) ≡ σ ◦ π to denote the partition of [n] defined by i ∼σπ j if and only if σ^{−1}(i) ∼π σ^{−1}(j). In other words, for π regarded as a function [n] × [n] → {0, 1}, (σπ)(i, j) = π(σ^{−1}(i), σ^{−1}(j)) for each i, j ∈ [n].

The spaces P[n] for n = 1, 2, 3, 4 are

P[1] : 1
P[2] : 12, 1|2
P[3] : 123, 1|23, 12|3, 13|2, 1|2|3
P[4] : 1234, 1|234[4], 12|34[3], 1|2|34[6], 1|2|3|4.

For π ∈ P[n], we write π[m] to denote the coset of π under action of the symmetric group, i.e. π[m] := {σπ : σ ∈ Sn}; m denotes the cardinality of this coset. In particular, 1|234[4] := {1|234, 134|2, 124|3, 123|4}.

The cardinality of P[n] for n = 1, 2, 3, 4 is 1, 2, 5, 15 respectively. In general, the cardinality of P[n] is given by the nth Bell number (section 1.6.2). For each n ≥ 1, the set (P[n], ≤) is a partially ordered lattice with binary relation ≤ called sub-partition. For B, B′ ∈ P[n], we write B ≤ B′ if, for every b ∈ B, there is a b′ ∈ B′ such that b ⊆ b′. For B, B′ ∈ P[n], we define the infimum, B ∧ B′, by B ∧ B′ := sup{π ∈ P[n] : π ≤ B & π ≤ B′}. In words, B ∧ B′ is the

largest partition π ∈ P[n] such that π ≤ B and π ≤ B′. Conversely, the supremum, B ∨ B′ := inf{π ∈ P[n] : B ≤ π & B′ ≤ π}, is the smallest partition π ∈ P[n] such that B ≤ π and B′ ≤ π. We illustrate the lattice P[3] in the Hasse diagram below.

123

1|23 12|3 13|2

1|2|3
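The lattice operations have short concrete implementations: the meet is the common refinement, whose blocks are the non-empty pairwise intersections of blocks, and the join merges any blocks connected through overlap. The sketch below is ours, not from the thesis.

```python
# A sketch (ours) of the lattice operations on partitions of [n], with a
# partition stored as a set of frozensets.

def meet(p, q):
    """B ∧ B': the largest partition below both (common refinement)."""
    return {frozenset(b & c) for b in p for c in q if b & c}

def join(p, q):
    """B ∨ B': the smallest partition above both (merge overlapping blocks)."""
    blocks = [set(b) for b in p]
    for c in q:
        hits = [b for b in blocks if b & c]      # blocks meeting c
        merged = set(c).union(*hits)
        blocks = [b for b in blocks if not (b & c)] + [merged]
    return {frozenset(b) for b in blocks}
```

Consistent with the Hasse diagram, any two distinct middle elements of P[3] meet at 1|2|3 and join at 123.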

By the transitivity property of set partitions (description (ii)), the operations restriction and delete-and-repair are equivalent on the space of partitions. Hence, the collection (P[n], n ≥ 1) of finite set partitions together with restriction maps (Dm,n, m ≤ n), permutation maps (σ ∈ Sn, n ≥ 1) and their admissible compositions

constitute a projective system (P[n],Dm,n), whose limit we denote by P, the space of partitions of N.

1.4.2 Fragmentation trees

For any subset A ⊂ N, a collection of non-empty subsets T ⊂ 2^A, the power set of A, is a rooted tree if

(i) A ∈ T , called the root of T and denoted root(T ) = A, and

(ii) B,C ∈ T implies B ∩ C ∈ {∅,B,C}. That is, either B and C are disjoint or one is a subset of the other.

If T contains all singleton subsets of A, T is a rooted fragmentation tree. Throughout the rest of this thesis, the words tree and fragmentation are both understood to mean

rooted fragmentation tree. We write TA to denote the space of fragmentations of A and, specifically, Tn to denote the space of fragmentations of [n], or [n]-rooted fragmentation trees.

An element TA of TA can be regarded as either

(a) a collection of subsets of A satisfying (i) and (ii) above. For example, TA := {12345, 12, 345, 45, 1, 2, 3, 4, 5}; or

(b) the tree with a distinguished vertex, labeled root, and k other vertices labeled

by subsets of TA, {A1, . . . , Ak}, such that there is a directed edge root ↦ A and a directed edge Ai ↦ Aj if and only if Aj is a child of Ai (see section 1.10.3).

For example, the collection TA in (a) can be represented by the tree below.

[Tree: root ↦ 12345; 12345 ↦ 12, 345; 12 ↦ 1, 2; 345 ↦ 3, 45; 45 ↦ 4, 5.]

As these descriptions are equivalent, we need not specify which we intend; however, specification (a) is often the more useful of the two. For k ≥ 2, write T^(k) ⊂ T to denote the space of fragmentations of N such that each parent in t ∈ T^(k) has at most k children. For n ≥ 1, we write Tn and Tn^(k) for the restriction to [n] of T and T^(k) respectively.

For any subset B ⊂ A, the restriction of T ∈ TA to B is defined by T|B := {B ∩ t : t ∈ T} (excluding the empty set), the reduced sub-tree of Aldous [2]. We abuse notation slightly to write Dn,n+1 : Tn+1 → Tn to denote the operation Dn,n+1T :=

T|[n] on trees. Note that the apparent overloading of Dn,n+1 as a function on both P[n+1] and Tn+1 should cause no confusion as it is fundamentally defined, in both cases, as a function on collections of subsets of N. For n ≥ 1 and σ ∈ Sn, σ acts on each T ∈ Tn componentwise in the usual way. That is, for T := {Ai : i ≥ 1}, σT := T^σ := {Ai^σ : i ≥ 1}, where A^σ := {σ(i) : i ∈ A}. The collection (Tn, n ≥ 1) of [n]-rooted trees together with restriction maps (Dm,n, 1 ≤ m ≤ n), permutation maps (σ ∈ Sn, n ≥ 1) and their compositions defines a projective system. We write T := (Tn, n ≥ 1) to denote the projective limit space of N-rooted fragmentation trees.

As in the description of partitions of N, any fragmentation T ∈ T can be expressed

as a compatible sequence (T|[n], n ≥ 1) of reduced subtrees, and we often write T := (T|[n], n ≥ 1).
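Conditions (i)-(ii) and the reduced sub-tree operation translate directly into code; the sketch below is ours, with a fragmentation tree stored as a set of frozensets.

```python
# A sketch (ours) of a fragmentation tree as a collection of subsets, with a
# check of conditions (i)-(ii) and the reduced sub-tree (restriction) map.

def is_fragmentation_tree(T, A):
    """Root A present, subsets pairwise nested or disjoint, all singletons."""
    nested = all(not (b & c) or b <= c or c <= b for b in T for c in T)
    return frozenset(A) in T and nested and all(frozenset({a}) in T for a in A)

def restrict(T, B):
    """The reduced sub-tree: intersect every subset with B, drop the empty set."""
    return {frozenset(B & t) for t in T if B & t}
```

Restricting the example tree TA to B = {1, 3, 4} yields a fragmentation tree rooted at B, illustrating why the restriction maps are well-defined on (Tn, n ≥ 1).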

1.4.3 Graphs and permutations

An undirected graph is a subset A of [n]^2 such that (i, j) ∈ A implies (j, i) ∈ A. That is, the relationships in G are symmetric, and it is often convenient to

write i ∼G j to denote that i and j are neighbors or adjacent in G. A permutation σ of [n] is a one-to-one and onto function [n] → [n] which can be

described as a directed graph in the sense of example 1.4.1 with σij = 1 if and only if σ(i) = j.

The collection (Gn, n ≥ 1) is a projective system under both restriction and delete-and-repair, but in this thesis we only study the projective system (Gn, n ≥ 1) characterized by restriction maps (ϕm,n, m ≤ n) in chapter 4. On the other hand, the collection (Sn, n ≥ 1) of permutations is a projective system under delete-and-repair maps (ψm,n, m ≤ n). We write G and S to denote the projective limit spaces characterized by (Gn, ϕm,n) and (Sn, ψm,n) respectively.

1.5 Exchangeable random partitions

Let (Sn, ψm,n) and (P[n], ϕm,n) be the projective systems of permutations and partitions of [n] respectively. There are natural maps π : Sn → P[n] and ν : P[n] → Pn where

(i) for σ ∈ Sn, i ∼π(σ) j if and only if there exists k ≥ 1 such that σ^k(i) := σ ◦ · · · ◦ σ(i) = j, the k-fold composition of σ applied to i; that is, π(σ) is the set partition whose blocks correspond to the cycles of σ; and

(ii) for π ∈ P[n], ν(π) := (λ1, . . . , λn) where λj := #{b ∈ π : #b = j}; that is, ν(π) is the integer partition whose parts correspond to the block sizes of π.
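To make the two maps concrete, here is a minimal Python sketch (ours, not part of the thesis; the function names cycle_partition and block_size_partition are hypothetical):

```python
def cycle_partition(sigma):
    """pi: S_n -> P_[n]. Map a permutation of [n], given as a dict
    {i: sigma(i)}, to the set partition of [n] formed by its cycles."""
    seen, blocks = set(), []
    for i in sorted(sigma):
        if i in seen:
            continue
        block, j = set(), i
        while j not in block:       # follow the orbit of i under sigma
            block.add(j)
            seen.add(j)
            j = sigma[j]
        blocks.append(frozenset(block))
    return set(blocks)


def block_size_partition(partition, n):
    """nu: P_[n] -> P_n. Map a set partition of [n] to
    (lambda_1, ..., lambda_n), where lambda_j counts blocks of size j."""
    lam = [0] * n
    for b in partition:
        lam[len(b) - 1] += 1
    return tuple(lam)
```

For example, the permutation σ = (1 2)(3)(4 5) of [5] has cycle partition 12|3|45 and ν = (1, 2, 0, 0, 0): one block of size 1 and two blocks of size 2.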

The interplay between the mappings ν, π, ψm,n, and ϕm,n is summarized in diagram (1.4), in which the arrows commute.

              π              ν
      Sn  ------->  P[n]  ------->  Pn
      |              |              |
      | ψm,n         | ϕm,n         |
      v              v              v
      Sm  ------->  P[m]  ------->  Pm          (1.4)
              π              ν

To each integer partition λ ∈ Pn, there are

    n! / ( ∏_{j=1}^n (j!)^{λj} λj! )

partitions of [n] in the inverse image ν^{−1}(λ) := {π ∈ P[n] : ν(π) = λ}, i.e. partitions of [n] whose block sizes correspond to the parts of λ. Hence, a straightforward way to obtain a random set partition of [n] given a distribution Pn on Pn is to first sample λ ∼ Pn and then, given λ, sample uniformly from the collection ν^{−1}(λ) of set partitions which correspond to λ. The resulting distribution P*n on P[n] is

    P*n(π) = (Pn(ν(π)) / n!) ∏_{j=1}^n (j!)^{λj} λj!        (1.5)

where (λj, 1 ≤ j ≤ n) = ν(π), i.e. λj is the number of blocks of π of size j.
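For small n, the count of ν^{−1}(λ) can be checked by brute-force enumeration; the following Python sketch (ours; the helper names are hypothetical) compares the displayed formula against direct enumeration:

```python
from math import factorial

def set_partitions(elements):
    """Enumerate all set partitions of a list of distinct elements."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        # insert `first` into each existing block, or open a new block
        for i in range(len(smaller)):
            yield smaller[:i] + [smaller[i] + [first]] + smaller[i + 1:]
        yield [[first]] + smaller

def inverse_image_size(lam):
    """n! / prod_j (j!)^{lambda_j} lambda_j!, the size of nu^{-1}(lambda);
    lam = (lambda_1, ..., lambda_n) lists the block-size multiplicities."""
    n = sum(j * lj for j, lj in enumerate(lam, start=1))
    denom = 1
    for j, lj in enumerate(lam, start=1):
        denom *= factorial(j) ** lj * factorial(lj)
    return factorial(n) // denom
```

For n = 4 and λ = (0, 2, 0, 0), both the formula and the enumeration give the 3 set partitions 12|34, 13|24 and 14|23.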

A random partition Π ∈ P[n], for n ≥ 1, is called exchangeable if its distribution is invariant under the natural action of Sn on P[n]. Equivalently, a random partition Π is exchangeable if, for each partition {π1, . . . , πk} of [n], its distribution can be expressed as

    P(Π = {π1, . . . , πk}) = p(#π1, . . . , #πk)

for some symmetric function p of integer partitions (n1, . . . , nk) of n. This function is called the exchangeable partition probability function (EPPF) [93]. Note that any EPPF can be formulated as in (1.5) for an appropriate choice of distribution Pn on Pn.

1.5.1 Distribution of block sizes

Let Π be a random partition of [n] whose distribution is determined by EPPF p and consider N↓ := (N↓1, . . . , N↓Kn), the partition of n induced by Π, i.e. the decreasing sequence of block sizes of Π. Then the distribution of N↓ is obtained by inverting (1.5) to obtain

    P(N↓ = (n1, . . . , nk)) = [n! / (∏_{i=1}^n (i!)^{λi} λi!)] p(n1, . . . , nk),        (1.6)

where

    λi := ∑_{l=1}^k 1{nl = i}

is the number of components of size i in the partition of n. We call (1.6) the distribution induced on block sizes in decreasing order.

Alternatively, we could consider the distribution of Ñ := (Ñ1, . . . , ÑKn) of block sizes of Π in order of appearance, or size-biased order, which is obtained by ordering the blocks {π1, . . . , πk} of Π by their least elements and putting Ñi = #πi for each i = 1, . . . , k. The distribution of Ñ is obtained by multiplying (1.6) by the factor

    (∏_{i=1}^k λi! ni) / [nk(nk + nk−1) · · · (nk + · · · + n1)],

which yields

    P(Ñ = (n1, . . . , nk)) = [n! / (nk(nk + nk−1) · · · (nk + · · · + n1) ∏_{i=1}^k (ni − 1)!)] p(n1, . . . , nk).        (1.7)

Finally, we can consider the block sizes of Π in exchangeable random order as follows. Conditional on Π = {π1, . . . , πk}, generate a uniform permutation σ of [k] and put N^ex := (#πσ(1), . . . , #πσ(k)). The distribution of N^ex is

    P(N^ex = (n1, . . . , nk)) = [n! / (n1! · · · nk!)] (1/k!) p(n1, . . . , nk).        (1.8)

1.6 Ewens process

For α > 0, the Ewens distribution on P[n] induced by ESF(α) (1.1) through (1.5) is

    pn(π; α) = (α^{#π} / α↑n) ∏_{b∈π} Γ(#b).        (1.9)

As each of the finite-dimensional distributions in the Ewens family depends on π ∈ P[n] only through ν(π), equation (1.9) is an exchangeable partition probability function for each n ≥ 1, and the Ewens distributions on P[n] are finitely exchangeable. The collection (pn(·; α), n ≥ 1) on the projective system (P[n], n ≥ 1) under restriction is self-consistent in the sense of (1.2). This is most readily seen by a sequential construction of pn(·; α) through the Chinese restaurant process (CRP).

Chinese restaurant process In a Chinese restaurant process with parameter α > 0, CRP(α), we assume customers are labeled 1, 2,... and arrive sequentially in a Chinese restaurant, with infinite seating capacity, as follows. (The tables of the restaurant correspond to blocks of a partition.)

(i) The first customer sits at his own table;

(ii) after n customers have been seated in configuration π ∈ P[n], the (n + 1)st labeled customer enters the restaurant and randomly chooses a table (block) at which to sit according to the following seating rule

    pr(n + 1 7→ b | π) = { #b/(n + α),   b ∈ π,
                          { α/(n + α),    b = ∅.

In other words, a new customer sits at each occupied table with probability proportional to the number of customers already seated at that table and sits at a new (unoccupied) table with probability proportional to α.
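The seating rule translates directly into a short simulation; the sketch below is ours (not the thesis's code) and uses only the Python standard library:

```python
import random

def crp(n, alpha, rng=random):
    """Sample a partition of [n] from the Chinese restaurant process
    CRP(alpha), alpha > 0. Returns the list of tables (blocks), each a
    list of customer labels in order of arrival."""
    tables = []
    for customer in range(1, n + 1):
        # occupied table b is chosen w.p. #b/(customer - 1 + alpha),
        # a new table w.p. alpha/(customer - 1 + alpha)
        weights = [len(b) for b in tables] + [alpha]
        idx = rng.choices(range(len(tables) + 1), weights=weights)[0]
        if idx == len(tables):
            tables.append([customer])     # new (unoccupied) table
        else:
            tables[idx].append(customer)
    return tables
```

Consistency under restriction is immediate from this construction: the seating of the first n customers does not depend on later arrivals.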

It is clear from this construction that the finite-dimensional distributions in (1.9) are self-consistent and characterize an infinitely exchangeable measure on (P, σ(∪n P[n])), which is called the Ewens(α) process on P.

1.6.1 Pitman-Yor process

Pitman [90] and Pitman and Yor [94] study a two-parameter extension of the Ewens(α) process, which we call the Pitman-Yor(α, θ) process, or simply the (α, θ)-model. For (α, θ) satisfying either

• α = −κ < 0 and θ = mκ for some m = 1, 2,..., or

• 0 ≤ α ≤ 1 and θ > −α, the modified Chinese restaurant seating rule

    pr(n + 1 7→ b | π) = { (#b − α)/(n + θ),      b ∈ π,
                          { (α#π + θ)/(n + θ),     b = ∅

generates finite-dimensional distributions

    pn(π; α, θ) = ((θ/α)↑#π / θ↑n) ∏_{b∈π} (−(−α)↑#b)        (1.10)

on P[n] which satisfy (1.2) and (1.3) on (P[n], n ≥ 1) under restriction. Note that when α = 0, (1.10) coincides with (1.9) with parameter θ > 0. The (α, θ)-model has several nice properties and appears in a variety of contexts in the theory of random partitions, as we see in chapter 2. Pitman [91, 93] has established several properties of the above process, including another construction via a residual allocation model, or stick-breaking scheme, and also its relation to the Poisson-Dirichlet distribution and stable subordinators.
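The modified seating rule is equally easy to simulate; the sketch below (ours) covers the parameter range 0 ≤ α < 1 and θ > −α:

```python
import random

def pitman_yor_crp(n, alpha, theta, rng=random):
    """Sample a partition of [n] from the (alpha, theta) seating rule for
    0 <= alpha < 1 and theta > -alpha: after m customers occupy k tables,
    table b is chosen w.p. (#b - alpha)/(m + theta) and a new table
    w.p. (alpha*k + theta)/(m + theta)."""
    tables = []
    for customer in range(1, n + 1):
        if not tables:
            tables.append([customer])     # the first customer opens a table
            continue
        k = len(tables)
        weights = [len(b) - alpha for b in tables] + [alpha * k + theta]
        idx = rng.choices(range(k + 1), weights=weights)[0]
        if idx == k:
            tables.append([customer])
        else:
            tables[idx].append(customer)
    return tables
```

Taking alpha = 0 recovers the CRP(theta) seating rule of the previous section.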

1.6.2 Gibbs partitions

Composite structures and the Bell numbers The enumeration of integer partitions has been studied widely in the field of combinatorics, and a more general notion of an integer partition is that of a composite structure [21]. Let v• := (v1, v2, . . .) and w• := (w1, w2, . . .) be sequences of non-negative integers and let V and W be two species of combinatorial structures so that, for each finite set F with cardinality #F = n, the collection of V -structures (respectively W -structures) of F is a set V (F ) (respectively W (F )) having vn (respectively wn) elements. For a finite set F , the V ◦ W composite structure of F , written (V ◦ W )(F ), is the set of all ways to partition F into blocks {F1, . . . , Fk}, assign the collection of blocks a V -structure, and assign each block a W -structure. For a set F with #F = n, the cardinality of (V ◦ W )(F ) is given by the expression

    #(V ◦ W )(F ) := Bn(v•, w•) := ∑_{k=1}^n vk Bn,k(w•),        (1.11)

where

    Bn,k(w•) := ∑_{{B1,...,Bk}} ∏_{i=1}^k w#Bi,        (1.12)

and the sum is taken over all partitions {B1, . . . , Bk} of F into k non-empty blocks. Bn(v•, w•) is called the nth Bell polynomial.

Example 1.6.1. Let v• ≡ w• ≡ 1. Then Bn(v•, w•) = # P[n] corresponds to the number of partitions of [n] for each n ≥ 1; these are called the Bell numbers.
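The Bell polynomials in (1.11)-(1.12) can be computed with the standard recursion Bn,k(w) = ∑_j C(n−1, j−1) wj Bn−j,k−1(w), obtained by conditioning on the size of the block containing the first element of the n-set. The Python sketch below (ours) passes v and w as lists indexed from 1, with index 0 unused:

```python
from math import comb

def partial_bell(n, k, w):
    """B_{n,k}(w): sum over partitions of an n-set into k blocks of the
    product of w_{#block}; w is a list indexed from 1 (w[0] unused)."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    # condition on the size j of the block containing the first element
    return sum(comb(n - 1, j - 1) * w[j] * partial_bell(n - j, k - 1, w)
               for j in range(1, n - k + 2))

def bell_polynomial(n, v, w):
    """B_n(v, w) = sum_k v_k B_{n,k}(w), as in (1.11)."""
    return sum(v[k] * partial_bell(n, k, w) for k in range(1, n + 1))
```

With v• ≡ w• ≡ 1 this returns the Bell numbers (B4 = 15), and with vk = α^k, wj = (j − 1)! it returns the rising factorial α↑n, the normalizing constant of the Ewens distribution (α = 2, n = 3 gives 2 · 3 · 4 = 24).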

Gibbs partitions The Ewens process, and more generally the (α, θ)-model, appears in a wide range of contexts and is a special case of several classes of random partition models. One such class is the class of Gibbs partitions, which are defined on P[n] by finite-dimensional distributions

    P(Πn = {π1, . . . , πk}) = vk ∏_{i=1}^k w#πi / Bn(v, w)        (1.13)

where v := (v1, . . .) and w := (w1, . . .) are sequences of non-negative real numbers and Bn(v, w) is the Bell polynomial from (1.11). A random partition of [n] distributed according to (1.13) for suitable sequences v and w is said to have a Gibbs[n](v, w) distribution.

Gibbs partitions have been studied in some detail in the literature. The sequences v and w give a Gibbs partition a natural physical interpretation: each block of size j can be in any of wj internal states, a partition with k blocks can be in any of vk internal states, and each configuration is equally likely. A Gibbs[n](v, w) partition with k blocks of sizes (λ1, . . . , λk) is obtained by ignoring the states of the associated V ◦ W composite structure and considering only the partition of [n] which it induces. See chapter 1.5 of [93] and [20, 59] for an overview of Gibbs partitions. McCullagh, Pitman and Winkel [80] study Gibbs fragmentation processes, which we discuss in more detail in section 1.10 and chapter 3.

Example 1.6.2. The Ewens(α) distribution on P[n] is a Gibbs[n](v, w) distribution with v = (αn, n ≥ 1) and w = ((n − 1)!, n ≥ 1).

1.6.3 Product partition models

Hartigan [62] studies a family of random partition models called product partition models which assign probability

    Pn(Π = π) ∝ ∏_{b∈π} c(b)        (1.14)

to each π ∈ P[n], where c : 2^[n] → R+ is a (non-negative) cohesion function associated with subsets of [n].

Example 1.6.3. The Ewens(α) distribution is a product partition distribution with c(b) := αΓ(#b) for each b ⊂ N.

Note the relationship among exchangeable, Gibbs and product partition models: these classes overlap, but none is a subset of another. By the form of a Gibbs partition in (1.13), it is clear that a Gibbs[n](v, w) partition is finitely exchangeable, as it admits the form of a finite EPPF; however, a collection (Pn, n ≥ 1), where Pn is a Gibbs[n](v, w) distribution for suitable sequences v and w for each n ≥ 1, is not, in general, consistent. Furthermore, a product partition distribution need not be either finitely exchangeable or consistent.

1.7 Mass partitions

There is a relationship between exchangeable random partitions of N and partitions of unit mass, or mass partitions, originally introduced by Kingman [67] with subsequent developments summarized by Bertoin [25, 27] and Pitman [91, 93]. We exploit this relationship in our development of partition-valued and tree-valued Markov processes in chapters 2 and 3.

Definition 1.7.1. A (ranked)-mass partition s := (s1, s2,...) is a sequence of non- negative real numbers such that

• s1 ≥ s2 ≥ · · · ≥ 0 and

• ∑_{i=1}^∞ si ≤ 1.

A mass partition for which ∑_i si = 1 is called proper; otherwise, it is improper. The residual mass s0 := 1 − ∑_i si is called dust.

Let R↓ := {(s1, s2, . . .) : s1 ≥ s2 ≥ · · · ≥ 0, ∑_i si ≤ 1} denote the space of ranked-mass partitions and R↓(k) := {s ∈ R↓ : sj = 0 for all j > k, ∑_i si = 1} the space of proper mass partitions with at most k positive components, commonly referred to as the ranked k-simplex. Bertoin [25] shows that the space R↓ arises as a subclass of the space S↓ := {(x1, . . .) : x1 ≥ x2 ≥ · · · ≥ 0} of decreasing sequences of non-negative numbers, which is the projective limit of the spaces S↓ε := {(x1, . . .) : x1 ≥ x2 ≥ · · · ≥ ε ≥ 0}, obtained by applying the threshold operator ϕε : [0, 1] → [0, 1], ϕε(x) = x 1{x > ε}, to each component of x ∈ S↓.

An interval partition of (0, 1) is the collection of interval components of an arbitrary open set θ ⊂ (0, 1). We denote the space of interval partitions of (0, 1) by PI. Each θ ∈ PI determines a mass partition s ∈ R↓ in an obvious way: the ith component si of s is the length of the ith largest interval of θ. Conversely, given a mass partition s ∈ R↓, an interval partition can be obtained in infinitely many ways; however, for our purposes, we need only choose an arbitrary interval partition which corresponds to s = (s1, s2, . . .). We choose this to be (0, 1)\θs where θs := {∑_{i=1}^k si, k ≥ 1}. In what follows, we abuse notation and write θs to refer to the interval partition (0, 1)\{∑_{i=1}^k si, k ≥ 1}.

1.8 Paintbox process

Kingman [67] developed the notion of the paintbox process, which establishes a bijection between measures on R↓ and exchangeable partitions of N. For any s ∈ R↓, the paintbox process based on s provides a construction of a random partition of N as follows. Let θs be the interval partition corresponding to s defined above. Generate an i.i.d. sequence U := (U1, U2, . . .) of random variables uniformly distributed on (0, 1). Given U := (u1, u2, . . .), define the partition π(U) by the equivalence relation

i ∼π(U) j ⇔ Ui and Uj are in the same sub-interval component of θs.

Equivalently, a random partition distributed as π(U) can be generated by an independent sequence of random variables, X := (X1, X2, . . .), with Xi having distribution

    Ps[Xi = k] = { sk,            k ≥ 1,
                  { 1 − ∑_i si,   k = −i,
                  { 0,            otherwise.

The random partition Π(X) of N generated by s through X is the partition of N defined by

i ∼Π(X) j if and only if Xi = Xj.

Π(X) is called the paintbox based on s, written Π(X) ∼ %s. Given a measure ν on R↓, we write %ν(·) := ∫_{R↓} %s(·) ν(ds) to denote the ν-mixture of paintboxes.
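Restricted to [n], the paintbox based on s is straightforward to simulate; the sketch below is ours (the name paintbox is hypothetical), with dust points assigned negative labels so that each forms its own singleton block:

```python
import random
from bisect import bisect
from itertools import accumulate

def paintbox(s, n, rng=random):
    """Sample the restriction to [n] of the paintbox partition based on a
    ranked mass partition s (a sequence with sum <= 1). Point i falling in
    the dust gets the label -i, hence a singleton block."""
    cuts = list(accumulate(s))      # right endpoints of the intervals of theta_s
    blocks = {}
    for i in range(1, n + 1):
        u = rng.random()
        k = bisect(cuts, u)         # index of the interval containing u
        label = k if k < len(cuts) else -i
        blocks.setdefault(label, []).append(i)
    return sorted(blocks.values())
```

With s = (1) every point falls in a single block; with s = () (pure dust) the sample is the partition into singletons.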

Theorem 1.8.1. (Kingman's correspondence) [27, 93] To any infinitely exchangeable partition Π of N, there exists a unique measure ν on R↓ such that Π ∼ %ν. Conversely, any paintbox based on ν is an infinitely exchangeable partition of N.

The above result is principally obtained by application of de Finetti's theorem [47] for infinitely exchangeable sequences of random variables.

1.8.1 Asymptotic frequencies

Definition 1.8.2. A set A ⊂ N is said to have asymptotic frequency ||A|| if the limit

    ||A|| := lim_{n→∞} #(A ∩ [n]) / n

exists. A partition B ∈ P has asymptotic frequencies ||B|| if each of its blocks possesses an asymptotic frequency, in which case we write ||B||↓ := (||B1||, ||B2||, . . .)↓ ∈ R↓, the decreasing rearrangement of the asymptotic block frequencies of B.

Theorem 1.8.3. ([27]) Let Π be an infinitely exchangeable partition of N; then it possesses asymptotic frequencies ||Π|| almost surely. Moreover, ||Π||↓ is distributed according to ν, where ν is the unique measure on R↓ such that Π ∼ %ν as in theorem 1.8.1.

A distribution on R↓ which has garnered in-depth study in the literature is the Poisson-Dirichlet distribution with parameter (α, θ), denoted by PD(α, θ). A PD(α, θ) mass partition can be obtained from the asymptotic frequencies of a Pitman-Yor(α, θ) partition [93]. A size-biased ordering of the mass components of a PD(α, θ) distribution is distributed according to the Griffiths-Engen-McCloskey (GEM) distribution with parameter (α, θ) [89].

For 0 ≤ α < 1 and θ > −α, let (Wk, k ≥ 1) be a sequence of independent random variables such that Wk ∼ Beta(1 − α, θ + kα). Put

    V1 = W1,  V2 = W2(1 − W1),  . . . ,  Vk = Wk ∏_{i=1}^{k−1} (1 − Wi),  . . . .        (1.15)

Definition 1.8.4. ([56]) Let V := (Vk, k ≥ 1) be as above for 0 ≤ α < 1 and θ > −α. The collection (V1, V2, . . .) has the two-parameter Griffiths-Engen-McCloskey distribution, written V ∼ GEM(α, θ). The collection V↓ := (Vk, k ≥ 1)↓ of frequencies listed in decreasing order is said to have the two-parameter Poisson-Dirichlet distribution, written V↓ ∼ PD(α, θ).
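The stick-breaking recipe (1.15) is a one-loop simulation; this sketch (ours) uses the standard library's Beta sampler:

```python
import random

def gem(alpha, theta, k, rng=random):
    """First k frequencies V_1, ..., V_k of a GEM(alpha, theta) sequence,
    0 <= alpha < 1 and theta > -alpha, via stick breaking as in (1.15)."""
    freqs, stick = [], 1.0
    for j in range(1, k + 1):
        w = rng.betavariate(1 - alpha, theta + j * alpha)   # W_j
        freqs.append(w * stick)     # V_j = W_j * prod_{i<j} (1 - W_i)
        stick *= 1.0 - w            # remaining stick length
    return freqs
```

Sorting the output in decreasing order gives an approximate draw from PD(alpha, theta), up to truncation at k terms.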

Properties of the Poisson-Dirichlet distribution have been established by Bertoin [26], Holst [65] and Pitman [91, 94]. A book-length treatment of the Poisson-Dirichlet distribution can be found in Feng [56].

1.9 Exchangeable coalescents

Kingman [69] constructed a Markov process Π := (Π(t), t ≥ 0) on P, called a coalescent process, for which Π(t) is an exchangeable partition of N for each t ≥ 0. Given two set partitions π, π′, we say π is a refinement of π′ if π ≤ π′, as defined in section 1.4.1. For such a pair, π′ is said to be obtained from a coagulation of the blocks of π. For example, for π = 126|35|47|8 and π′ = 12356|47|8, π′ is obtained from π by the coagulation of the blocks {1, 2, 6} and {3, 5}. In general, more than two blocks of π can coagulate to form a single block of π′. We call a coagulation simple if all but one of the blocks of π′ are identical to blocks of π.

Definition 1.9.1. An (exchangeable) coalescent process Π := (Π(t), t ≥ 0) is a collection of exchangeable random partitions of N indexed by t ∈ R+ such that Π(s) ≤ Π(t) for every s ≤ t.

Definition 1.9.2. An n-coalescent is a collection Πn := (Πn(t), t ≥ 0) on P[n] satisfying Πn(s) ≤ Πn(t) for every s ≤ t which evolves as follows.

• Πn(0) = π0 for some π0 ∈ P[n];

• given Πn(t) = π for some t ≥ 0, any pair b, b′ of blocks of π coagulates at exponential rate 1, and all other collections of blocks of π coagulate at rate 0;

• 1[n] = {[n]}, the trivial one block partition of [n], is an absorbing state for Πn.

In other words, given that Πn(t) = π such that #π = k, the process stays at π for an exponential time with rate parameter k(k−1)/2, the number of pairs of non-empty blocks of π, and then jumps to one of the k(k − 1)/2 partitions of [n] which can be obtained from π by a simple coagulation of its blocks, with each such jump having equal probability.
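The dynamics just described can be simulated directly; the following sketch (ours) runs an n-coalescent started from the singleton partition:

```python
import random

def n_coalescent(n, rng=random):
    """Simulate Kingman's n-coalescent from the singleton partition.
    Returns the sequence of (time, partition) pairs up to absorption."""
    t, partition = 0.0, [frozenset([i]) for i in range(1, n + 1)]
    history = [(t, list(partition))]
    while len(partition) > 1:
        k = len(partition)
        t += rng.expovariate(k * (k - 1) / 2)   # hold time at a k-block state
        i, j = rng.sample(range(k), 2)          # the pair that coagulates
        merged = partition[i] | partition[j]
        partition = [b for l, b in enumerate(partition) if l not in (i, j)]
        partition.append(merged)
        history.append((t, list(partition)))
    return history
```

Starting from n singletons the chain makes exactly n − 1 simple coagulations before absorbing at the one-block partition.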

Theorem 1.9.3. (Existence of Kingman’s coalescent)([27]) For every n ≥ 2,

the restriction Πn|[n−1] of an n-coalescent to [n − 1] is an (n − 1)-coalescent. Moreover, there exists a unique (in law) process ΠK := (ΠK(t), t ≥ 0) on P such that for every n ∈ N the process induced by the restriction to [n], ΠK|[n] := (ΠK|[n](t), t ≥ 0), is an n-coalescent.

The process ΠK is called Kingman's coalescent and is an exchangeable partition-valued Markov process. Moreover, the standard coalescent, i.e. the coalescent with ΠK(0) = 0N, the partition of N into singletons, comes down from infinity almost surely, i.e. #ΠK(t) < ∞ for every t > 0.

Kingman's coalescent arises as the limit of the Wright-Fisher model in the genealogy of populations. For a particular application to neutral mutations and allelic partitions and its association with a PD(0, θ) distribution, see section 2.4 of [27] or Tavaré [96].

In general, an exchangeable coalescent can be characterized by a unique measure on P as follows. For a subset A ⊂ N, let π ∈ PA and π′ ∈ P[k] where k ≥ #π ≥ 1. Define the coagulation of π by π′, written Coag(π, π′), as the partition π′′ := (π′′j, j ≥ 1) of A where

    π′′j := ∪_{i∈π′j} πi,    j ∈ N,

and (πi, i ≥ 1) and (π′j, j ≥ 1) are the blocks of π and π′ listed in order of appearance, respectively. For example, let π := 145|28|3|67 and π′ := 13|2|4; then Coag(π, π′) = 1345|28|67. Note that Coag(·, ·) preserves the ordering of blocks according to least elements, provided the blocks of the first argument are ordered by their least elements.

The transition mechanism of an exchangeable coalescent can be described in terms of a unique measure µ on P where, given that a jump occurs at time t ≥ 0, Π(t) = Coag(Π(t−), π) where π ∼ µ. For example, for the Kingman coalescent, write K(i, j) for the partition of N whose blocks are the pair {i, j} and the singletons {k} for k ≠ i, j. Then the measure µK which characterizes the Kingman coalescent is

    µK(·) := ∑_{1≤i<j<∞} δK(i,j)(·)

and is called the Kingman measure.
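The Coag operator itself is a short computation once blocks are listed in order of least element; a sketch (ours):

```python
def coag(pi, pi_prime):
    """Coag(pi, pi'): list the blocks of pi in order of least element,
    then merge them according to the blocks of pi', whose elements are
    1-based indices of blocks of pi."""
    blocks = sorted(pi, key=min)            # order of appearance
    return [frozenset().union(*(blocks[i - 1] for i in b))
            for b in sorted(pi_prime, key=min)]
```

For the example in the text, coag applied to 145|28|3|67 and 13|2|4 returns 1345|28|67.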

Let ν be a measure on R↓ which satisfies

    ν((0, 0, . . .)) = 0   and   ∫_{R↓} ( ∑_{i=1}^∞ si^2 ) ν(ds) < ∞.        (1.16)

Then we have the following for a general exchangeable coalescent.

Theorem 1.9.4. (Characterization of exchangeable coalescent processes) ([25]) Let Π := (Π(t), t ≥ 0) be an exchangeable coalescent on P. Then there exist a unique c ≥ 0 and a unique measure ν on R↓ fulfilling (1.16) such that the transitions of Π are described by Coag(Π(t−), π) where π ∼ µ := cµK + %ν.

The Coag(·, ·) operator is convenient for constructing coalescent processes via a Poisson point process on R+ × P with intensity measure dt ⊗ µ(dπ). The Poissonian construction of Markov partition processes is useful for establishing special properties of the process and will be used in sections 2.3.1 and 3.5.4.

In addition to Kingman's coalescent, there are several special families of coalescents which fall under the theory of general exchangeable coalescents. The additive coalescent and multiplicative coalescent are processes with collision rates K(i, j) = i + j and K(i, j) = ij respectively. Other such coalescents are the Marcus-Lushnikov coalescent, the Bolthausen-Sznitman coalescent and the beta coalescent. A review of these processes, and relevant references, can be found in either Bertoin [25, 27] or Pitman [93]. Bertoin also studies mass-coalescents, the process induced on the space R↓ of mass partitions by the asymptotic frequencies of an exchangeable coalescent. Pitman [92] studies coalescents which allow simultaneous multiple collisions.

1.10 Exchangeable fragmentations

Fragmentations of a set A ⊂ N are given special treatment in chapter 3. We now review the relevant literature in this area while introducing some notation which we use later.

Heuristically, an exchangeable fragmentation process Π := (Π(t), t ≥ 0) is the time reversal of an exchangeable coalescent. That is, a fragmentation process is a family of partitions (Π(t), t ≥ 0) indexed by t ∈ R+ such that Π(t) ≤ Π(s) for every s ≤ t. However, the time reversal of an exchangeable coalescent is not, in general, an exchangeable fragmentation process, because time reversal preserves the branching property only in special cases.

In our study, we distinguish between fragmentations in discrete time, (Πn, n ≥ 1), which we call fragmentation trees, and fragmentations in continuous time, which we call fragmentation processes or weighted fragmentation trees.

1.10.1 Exchangeable random fragmentation trees

A random fragmentation of A is a random element of TA which satisfies

(a) the branching property: Given the root partition ΠT , the subtrees {T|b : b ∈ ΠT } are distributed independently, and

(b) for each S ∈ ΠT , the subtree T|S is a random fragmentation of S.

A random fragmentation of A is exchangeable if T ∼ T^σ := σ(T ) for any σ ∈ SA, the symmetric group of all permutations acting on A. A family of random fragmentations {PS : S ⊆ A} is consistent if T ∼ PA implies T|S ∼ PS for all S ⊆ A; that is, the marginal distribution of each restricted subtree T|S corresponds to PS. A family of distributions P := {PA : A ⊂ N} defines an infinitely exchangeable fragmentation of N if PA is exchangeable for each A ⊂ N and P is consistent, as in the discussion of section 1.3.

1.10.2 Gibbs fragmentation trees

The distribution of a random fragmentation tree is described by a splitting rule which is tantamount to a collection (pA, A ⊂ N) of probability distributions on partitions of each A ⊂ N. McCullagh, Pitman and Winkel [80] study a family of fragmentation trees called Gibbs fragmentation trees.

For a Gibbs fragmentation tree, the splitting rule p has the form

    p∪Ai(A1, . . . , Ak) ∝ vk ∏_{i=1}^k w#Ai        (1.17)

for some pair of non-negative sequences v := (vk, k ≥ 1) and w := (wk, k ≥ 1). A tree T is called binary if each vertex of T has 0 or 2 children.

Let ν(dx) = x^β (1 − x)^β dx for β ∈ (−2, ∞) and ν(dx) = δ1/2(dx) for β = ∞. Aldous' beta-splitting model is a model for fragmentation trees with binary splitting rule

    p(i, j) ∝ ∫_0^1 x^i (1 − x)^j ν(dx).

Theorem 1.10.1. ([80]) Aldous's beta-splitting models for β ∈ (−2, ∞] are the only consistent Markovian binary fragmentations with splitting rule of the form

p(i, j) ∝ wiwj, i, j ≥ 1

for some sequence of non-negative weights w := (w1,...).

Infinitely exchangeable fragmentation trees have been studied elsewhere in the literature, see [2, 7, 25].

1.10.3 Genealogical interpretation of a tree

As a collection of subsets of A ⊂ N, the elements of T ∈ TA are partially ordered by inclusion. That is, if A, B ∈ T are such that A ⊂ B and #B < ∞, then the intervals [A, B], (A, B], and [A, B) are well-defined subsets of T . This partial ordering induces a natural genealogical interpretation of the relationships among the elements of a tree. For each t ∈ T , the subset anc(t) := (t, A] := {s ∈ T : t ⊂ s} denotes the set of ancestors of t. Note that anc(root(T )) = ∅ and, for each t ≠ root(T ), anc(t) has a least element denoted by pa(t) := min anc(t), the parent of t.

Conversely, except for the singleton elements of T , each t ∈ T is the parent of some collection of subsets of T , called the children of t, which is given by pa^{−1}(t) := frag(t) := {t′ ∈ T : pa(t′) = t}. For each non-singleton t ∈ T with #t < ∞, frag(t) forms a non-trivial partition of t. In particular, for any tree T , the children of root(T ) form the root partition, denoted ΠT := rp(T ) := frag(root(T )). The fragmentation degree of T is given by max_{t∈T} # frag(t), which may be infinite. For k ≥ 1, we write TA^(k) to denote the collection of trees of A with fragmentation degree at most k.

Weighted fragmentation trees

The trees discussed so far are unweighted, or Boolean, meaning their edges are assigned unit weight, or length. A weighted tree T̄ is a Boolean tree T together with a collection of non-negative edge lengths {tb : b ∈ T }. We write T̄ to denote the space of weighted trees and usually write T̄ ∈ T̄ as the pair (T, W ), where T is the associated Boolean tree and W := {tb : b ∈ T } is the collection of weights attached to the edges of T . All notation for Boolean trees carries over to weighted trees, with the modification that a bar is placed over any symbol to indicate that we are discussing weighted trees. In the literature, weighted trees are also called fragmentation processes, where edge weights represent how long a fragment survives before breaking into smaller pieces.

Given a Markovian coalescent process (Π(t), t ≥ 0), the time-reversal (Π(−t), t ≥ 0) is also Markovian, but is not, in general, time-homogeneous, as such a time-reversal does not possess the branching property of fragmentation processes, which assumes that different fragments evolve independently. Pitman [93] discusses the duality between coalescent and fragmentation processes.

As in the case of the Coag operator for describing the behavior of exchangeable coalescent processes, there is an analogous operator, called Frag, by which we can describe a fragmentation process. For a subset A ⊂ N, let π, π′ ∈ PA with #π = m and let k ∈ [m]. Define the fragmentation of the kth block of π by π′, written Frag(π, π′, k), as the partition π′′ of A with blocks πi for i ≠ k together with the sub-blocks of π′|πk := {π′i ∩ πk : i ≥ 1}. For example, let π := 134789|256, π′ := 1268|39|457 and k = 1; then Frag(π, π′, k) = 18|256|39|47.
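Like Coag, the Frag operator reduces to a few lines of set arithmetic; a sketch (ours):

```python
def frag(pi, pi_prime, k):
    """Frag(pi, pi', k): replace the kth block of pi (blocks listed in
    order of least element, k 1-based) by its non-empty intersections
    with the blocks of pi'."""
    blocks = [frozenset(b) for b in sorted(pi, key=min)]
    target = blocks[k - 1]
    pieces = [target & frozenset(b) for b in pi_prime]
    kept = [b for i, b in enumerate(blocks, start=1) if i != k]
    return sorted(kept + [p for p in pieces if p], key=min)
```

For the example in the text, frag applied to 134789|256 and 1268|39|457 with k = 1 returns 18|256|39|47.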

Bertoin [25] characterizes the behavior of homogeneous fragmentation processes by the Frag(·, ·, ·) operator. For n ∈ N, let en be the partition of N into two non-empty blocks, N\{n} and {n}. Define the erosion measure ε by

    ε(·) := ∑_{n≥1} δen(·).        (1.18)

The transitions of an exchangeable fragmentation process Π := (Π(t), t ≥ 0) can be described in terms of a unique measure µ ⊗ # on P × N, where # denotes the counting measure on N. Given that a jump occurs at time t ≥ 0, Π(t) = Frag(Π(t−), π, k) for (π, k) ∼ µ ⊗ #. Let ν be a measure on R↓ which satisfies

    ν((1, 0, . . .)) = 0   and   ∫_{R↓} (1 − s1) ν(ds) < ∞.        (1.19)

Then we have the following for a general exchangeable fragmentation process.

Theorem 1.10.2. (Characterization of exchangeable fragmentation processes) ([25]) Let Π := (Π(t), t ≥ 0) be an exchangeable fragmentation process on P. Then there exist a unique c ≥ 0 and a unique measure ν on R↓ fulfilling (1.19) such that the transitions of Π are given by Frag(Π(t−), π, k), where π ∼ µ := cε + %ν and k is uniform on [#Π(t)].

The Frag operator is convenient for constructing fragmentation processes via a Poisson point process on R+ × P × N with intensity measure dt ⊗ µ(dπ) ⊗ #. McCullagh, Pitman and Winkel [80] discuss how weights can be attached to the edges of a Gibbs fragmentation tree in a consistent way. Bertoin [22, 23, 24] has done extensive work in this area and provides thorough explanation and further references in his book [25]. Aldous [2, 3] studies the continuum random tree (CRT) within a different framework than that which we discuss above and in chapter 3.

1.11 Exchangeable fragmentation-coalescence processes

The most general exchangeable process of fragmentation and coalescent type is called the exchangeable fragmentation-coalescence (EFC) process. Essentially, an EFC process is the combination of an exchangeable coalescent and an exchangeable fragmentation into one process on P. These processes arise in applications in certain physical sciences [11] and allow one to construct processes which are more flexible than coalescent or fragmentation processes. We summarize a main result of Berestycki [19] for EFC processes. Durrett, Granovsky and Gueron [48] provide general results for the equilibrium behavior of certain EFC processes which predate the work of Berestycki.

Definition 1.11.1. ([19]) A P-valued Markov process (Π(t), t ≥ 0) is an exchangeable fragmentation-coalescent process if it has the following properties.

• It is exchangeable.

• Its restrictions Π|[n] are càdlàg finite state Markov processes which can only evolve by fragmentation of one block or by coagulation. More precisely, the transition rate of Π|[n](·) from π to π′, say qn(π, π′), is non-zero only if there exists π′′ such that π′ = Coag(π, π′′), or there exist π′′ and k ≥ 1 such that π′ = Frag(π, π′′, k).

Theorem 1.11.2. (Characterization of EFC processes) ([19]) An EFC process Π := (Π(t), t ≥ 0) is characterized by exchangeable measures C and F on P such that C({0N}) = 0 and C(P\{π ∈ P : π|[n] = 0[n]}) < ∞ for every n ∈ N, F ({1N}) = 0 and F (P\{π ∈ P : π|[n] = 1[n]}) < ∞ for every n ∈ N, and there exist unique constants ck, ce ≥ 0 and unique measures νC, νF on R↓ such that νC fulfills (1.16), νF fulfills (1.19) and

    C = ck µK + %νC   and   F = ce ε + %νF.

The above theorem completely characterizes all EFC processes. The manner in which transitions occur in an EFC process is a natural generalization of both coalescent and fragmentation processes, and the behavior of such processes can be studied using well-established results for both coalescent and fragmentation processes.

1.12 Random graphs

The study of random graphs began in the 1950s with the seminal work of Erdős and Rényi [50, 51], in which they constructed a random graph as follows. Let n ≥ 1 and 0 < p < 1; then G(n, p) is the ensemble of graphs with n vertices obtained by including, independently for each pair i, j ∈ [n], an edge between i and j with probability p. Several results have been shown for these graphs, which we call the Erdős-Rényi random graph with parameter p, including phase transitions, emergence of a giant component, and several other aspects related to its degree distribution and connectivity; see Chung and Lu [39] or recent work by Bollobás [31, 32, 33, 34] for more details.

Another property of the Erdős-Rényi family is that it characterizes an exchangeable graph on countably many vertices, i.e. an infinite graph indexed by N. We call an infinite graph G whose finite-dimensional restrictions G := (G1, G2, . . .) are distributed as G(1, p), G(2, p), . . . an Erdős-Rényi process with parameter p, written ER(p). Though infinite exchangeability is obvious for the Erdős-Rényi process, it is still a striking property, as the only widely studied instances of infinitely exchangeable random graphs in the literature are variants of the Erdős-Rényi process. Because of its nice properties and intuitive construction, the Erdős-Rényi process has been the subject of considerable study in the field of mathematics; however, applicability of the ER process is restricted by its inability to replicate many properties of real-world networks, e.g. heavy-tailed degree distributions, clustering and the small-world property.

The terms graph and network are interchangeable, though we generally use graph to refer to a mathematical object, i.e. a set of subsets of [n]^2, and network to refer to a graph which is used to model some real-world object, e.g. the Internet.
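Sampling G(n, p) is a one-liner over the C(n, 2) vertex pairs; a sketch (ours):

```python
import random
from itertools import combinations

def erdos_renyi(n, p, rng=random):
    """Sample G(n, p): each of the C(n, 2) possible edges on [n] is
    included independently with probability p."""
    return {frozenset(e) for e in combinations(range(1, n + 1), 2)
            if rng.random() < p}
```

Restriction consistency is visible here: discarding vertex n + 1 and its incident edges from a G(n + 1, p) sample leaves exactly a G(n, p) sample, which is the source of the infinite exchangeability noted above.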

1.12.1 Heavy-tailed networks

Faloutsos, Faloutsos and Faloutsos [54] presented an empirical study of the degree distribution of the Internet which has led to an explosion of research in the field of complex networks. In this work, Faloutsos et al. claim that the degree distribution

observed in the Internet at the router level is scale-free; i.e., letting u denote a vertex and du its degree,

    P(du = k) ∼ k^{−α}

for some α > 0. Later studies suggest that several other real-world networks exhibit similar topology to the Internet, with α typically between 2 and 3; see e.g. Newman [85] or Albert and Barabási [1]. The findings of these papers had been widely accepted until the sampling method used in arriving at these results was called into question in [71, 98]. Heavy-tailed degree distributions give rise to heterogeneous networks, which reflect the notion that most networks, e.g. social, biological, etc., consist of a high proportion of vertices with a very small degree which are connected to each other principally through a small, but significant, proportion of high-degree vertices, sometimes called hubs or connectors.

In the past decade, there has been much research in the area of modeling heterogeneous networks. One of these models is the Barabási-Albert model.

Barabási–Albert model. The Barabási–Albert model [17] generates scale-free networks of preferential attachment type, which evolve as follows. Given a network on n vertices, the (n + 1)st vertex is connected to a subset of the first n vertices with a probability that favors vertices with higher degree. This same phenomenon is reflected in the Chinese restaurant construction of the Pitman–Yor process, whereby individuals tend to choose tables which already have a larger number of individuals. This phenomenon is commonly referred to as the rich-get-richer or Matthew effect.

To generate a network from the Barabási–Albert model, start with m_0 ≥ 2 initial nodes, each having degree at least one. Let G_n be the network obtained from this procedure after n nodes have been added, and add node n + 1 with m ≤ m_0 edges connected to nodes i_1, ..., i_m in the vertex set of G_n with probability proportional to ∏_{j=1}^m d_{i_j}, i.e. the product of the degrees of the chosen set of vertices. In particular, if m = 1 then

    P(n + 1 ↦ i) = d_i / ∑_{j=1}^n d_j.
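The m = 1 case above admits a standard sampling trick: choosing a uniform endpoint of a uniformly chosen edge selects node i with probability d_i / ∑_j d_j, so no degree distribution need be recomputed. A minimal Python sketch (names are illustrative, not from the text):

```python
import random

def barabasi_albert(n, seed=None):
    """Preferential attachment with m = 1: node n+1 attaches to node i with
    probability d_i / sum_j d_j.  A uniform endpoint of a uniform edge is
    exactly a degree-biased choice."""
    rng = random.Random(seed)
    edges = [(1, 2)]                  # two seed nodes joined by one edge
    degree = {1: 1, 2: 1}
    for new in range(3, n + 1):
        target = rng.choice(rng.choice(edges))   # degree-biased choice
        edges.append((target, new))
        degree[target] += 1
        degree[new] = 1
    return edges, degree
```

The result is a random tree on n nodes; repeated simulation exhibits the heavy-tailed degree behavior discussed above.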

The Barabási–Albert model has drawn attention in the study of complex networks and their application to social networks, the Internet, etc., partly because of its straightforward and intuitive way of generating scale-free networks. One drawback is that the model is not fully specified: given that we start with m_0 ≥ 2 vertices each having degree at least one, the generating algorithm does not specify how these initial vertices should be connected. Another drawback of the Barabási–Albert model is its lack of exchangeability. This issue has been raised in some of the literature discussing how to implement this model in applications, since it is generally unclear how to properly label observed units to match the labels of the Barabási–Albert generated network.

1.12.2 Small-world networks

Another property commonly observed in networks, particularly social networks, is the small-world property. A network G has the small-world property if the average shortest path length between any two vertices is short, which replicates the notion of a “small world” in which every node in the network is closely connected to every other node in the network [84]. Mathematically, a network is a small-world if the average path length L is asymptotically proportional to log N, i.e. L ∼ α log N for some α > 0. Watts and Strogatz [97] introduced a model for small-world networks which starts with a fixed, regular ring of vertices each having degree k ≥ 2 and generates a random small-world graph by rewiring each edge with probability p ∈ (0, 1).
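The Watts–Strogatz rewiring scheme can be sketched in a few lines. This is a simplified version under stated assumptions: k is even (k/2 neighbours per side of the ring), and duplicate edges created by rewiring are collapsed.

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Ring lattice on n vertices (k/2 neighbours per side, k even), then
    rewire each edge with probability p; duplicate edges are collapsed."""
    rng = random.Random(seed)
    ring = [(i, (i + d) % n) for i in range(n) for d in range(1, k // 2 + 1)]
    edges = set()
    for i, j in ring:
        if rng.random() < p:
            j = rng.randrange(n)      # redirect the far endpoint uniformly
            while j == i:
                j = rng.randrange(n)  # avoid self-loops
        edges.add((min(i, j), max(i, j)))
    return edges
```

At p = 0 the ring lattice is returned unchanged; small p > 0 already introduces the shortcuts responsible for the small average path length.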

1.13 Organization of thesis

The rest of this volume is organized as follows. Chapter 2 constructs a Markov process on P that admits transitions of a different nature from the EFC process discussed in section 1.11. Chapter 3 builds upon the theory of chapter 2 to construct an exchangeable Markov process on T and T̄. Chapter 4 shows a construction of an infinitely exchangeable random graph using a Poisson point process on the power set of the natural numbers. Chapter 5 extends the theory of infinitely exchangeable random partitions, particularly the Pitman–Yor family, to projective systems which can be associated with structures called even and balanced partitions.

CHAPTER 2

A CONSISTENT MARKOV PARTITION PROCESS GENERATED BY THE PAINTBOX PROCESS

Markov processes on the space of partitions appear in many situations in the scientific literature, including physical chemistry, astronomy, and population genetics; see Aldous [11] for a relatively recent overview of this literature. In chapter 1 we reviewed exchangeable coalescent processes, fragmentation processes and their concatenation, the exchangeable fragmentation-coalescence (EFC) process. Well-behaved, mathematically tractable models of random partitions are of interest to probabilists as well as statisticians and scientists. Ewens [53] introduced the Ewens sampling formula in the context of theoretical population biology, and Kingman's coalescent [69] was introduced as a model for population genetics, still its most natural setting. However, since the seminal work of Ewens and Kingman, random partitions have appeared in areas ranging from classification models [29, 81, 82] to probability theory [25, 93]. McCullagh [77] describes how the Ewens model can be used in the classical problem of estimating the number of unseen species, introduced by Fisher [57] and later studied by many, including Efron and Thisted [49].

2.1 Preliminaries

In this chapter, we discuss Markov processes on P^(k), the space of set partitions of the natural numbers ℕ with at most k ≥ 1 blocks. See section 1.4.1 for notation and terminology which arises in this context. For each pair π, π′ ∈ P of partitions of the natural numbers, define the metric d : P × P → ℝ by

    d(π, π′) = 1 / max{n ∈ ℕ : π_{|[n]} = π′_{|[n]}}.


The space (P, d) is compact [27].
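The metric d above can be computed from finite restrictions. The sketch below assumes a particular representation (a partition as a set of frozenset blocks) and uses a finite horizon to stand in for an infinite partition; both choices are illustrative, not notation from the text.

```python
def restrict(pi, n):
    """Restriction pi|[n] of a partition (a set of frozenset blocks) to [n]."""
    base = set(range(1, n + 1))
    return {frozenset(b & base) for b in pi} - {frozenset()}

def partition_distance(pi1, pi2, horizon=1000):
    """d(pi, pi') = 1 / max{n : pi|[n] == pi'|[n]}; the finite `horizon`
    stands in for an infinite partition, so identical inputs return
    1/horizon rather than the limiting value 0."""
    best = 0
    for n in range(1, horizon + 1):
        if restrict(pi1, n) == restrict(pi2, n):
            best = n
        else:
            break
    return 1.0 / best if best else 1.0

one_block = {frozenset({1, 2, 3})}
split = {frozenset({1, 2}), frozenset({3})}
assert partition_distance(one_block, split, horizon=3) == 0.5
```

In the example, the two partitions agree when restricted to [2] but disagree on [3], so d = 1/2.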

Our focus is on families of exchangeable Markovian transition probabilities (p_n, n ≥ 1) on (P_[n], D_{m,n}) such that if Π_n := (π^n_j, j ≥ 1) is a Markov chain on P_[n] governed by p_n(·, ·), then the restricted process D_{m,n}(Π_n) := (π^n_{j|[m]}, j ≥ 1) is a Markov chain on P_[m] governed by p_m(·, ·), for any m ≤ n. This property is equivalent to the condition: for every n ≥ 1,

    p_n(B, B′) = ∑_{B′′ ∈ D^{-1}_{n,n+1}(B′)} p_{n+1}(B*, B′′),        (2.1)

for each B, B′ ∈ P_[n] and B* ∈ D^{-1}_{n,n+1}(B). Burke and Rosenblatt [36] show that (2.1) is necessary and sufficient for the restricted process D_{m,n}(Π_n) to be a Markov chain, and hence for the collection (p_n(·, ·), n ≥ 1) to be self-consistent.

Likewise, for a continuous-time Markov process, (Bn(t), t ≥ 0)n∈N, where Bn(t) is a process on P[n] with infinitesimal generator Qn, it is sufficient that the entries of Qn satisfy (2.1) for there to be a Markov process on P with those finite-dimensional transition rates.

2.2 The Cut-and-Paste process

Let n, k ∈ ℕ and let ν be a probability measure on the ranked k-simplex R^↓(k), so that the paintbox based on ν is obtained by sampling a conditionally i.i.d. sequence X_1, X_2, ... from ν, i.e. given s ∼ ν, the X_1, X_2, ... are i.i.d. with P_s(X_i = j) = s_j for each j = 1, ..., k. For convenience, we write B ∈ P^(k) as an ordered list (B_1, ..., B_k), where B_i corresponds to the ith block of B in order of appearance for i ≤ #B and B_i = ∅ for i = #B + 1, ..., k.

Consider the following Markov transition operation B ↦ B′ on P^(k). Let B = (B_1, ..., B_k) ∈ P^(k) and, independently of B, generate C_1, C_2, ... which are independent partitions of ℕ distributed according to a paintbox based on ν. For each i, we write C_i := (C_{i1}, ..., C_{ik}) ∈ P^(k). Independently of B, C_1, C_2, ..., generate σ_1, σ_2, ... which are independent uniform random permutations of [k]. Given σ := (σ_1, σ_2, ..., σ_k), we arrange B, C_1, ..., C_k in matrix form as follows:

             C_{·1}               C_{·2}               ...   C_{·k}
    B_1   [ C_{1,σ_1(1)} ∩ B_1   C_{1,σ_1(2)} ∩ B_1   ...   C_{1,σ_1(k)} ∩ B_1 ]
    B_2   [ C_{2,σ_2(1)} ∩ B_2   C_{2,σ_2(2)} ∩ B_2   ...   C_{2,σ_2(k)} ∩ B_2 ]
     ⋮    [        ⋮                    ⋮              ⋱            ⋮          ]
    B_k   [ C_{k,σ_k(1)} ∩ B_k   C_{k,σ_k(2)} ∩ B_k   ...   C_{k,σ_k(k)} ∩ B_k ]   =: B ∩ C^σ

B ∩ C^σ is a matrix whose row totals, the union over columns of the entries within each row, correspond to the blocks of B, and whose column totals are C_{·j} = ⋃_{i=1}^k (C_{i,σ_i(j)} ∩ B_i). Finally, B′ is obtained as the collection of non-empty blocks of (C_{·1}, ..., C_{·k}), written CP(B, C, σ). The non-empty entries of B ∩ C^σ form a partition in P^(k²) which corresponds to the greatest lower bound B ∧ B′ in the partition lattice.
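One transition of the matrix construction can be sketched directly: given the blocks of B, the k auxiliary partitions C_i, and fresh uniform permutations σ_i, the new state B′ is read off from the non-empty column totals. A minimal Python sketch (names and data layout are illustrative):

```python
import random

def cp_step(B, C, rng):
    """One cut-and-paste transition.  B is a list of k (possibly empty) blocks;
    C[i] is a list of the k blocks of the auxiliary partition C_i.  Column j
    of the matrix collects C_{i, sigma_i(j)} intersected with B_i over rows i."""
    k = len(B)
    sigmas = []
    for _ in range(k):
        s = list(range(k))
        rng.shuffle(s)                 # sigma_i: uniform permutation of [k]
        sigmas.append(s)
    columns = []
    for j in range(k):
        col = set()
        for i in range(k):
            col |= C[i][sigmas[i][j]] & B[i]
        columns.append(col)
    return [c for c in columns if c]   # non-empty column totals form B'

rng = random.Random(7)
B = [{1, 2, 3}, {4, 5}, set()]
C = [[{1, 3}, {2}, set()], [{4}, {5}, set()], [set(), set(), set()]]
B_prime = cp_step(B, C, rng)
assert set().union(*B_prime) == {1, 2, 3, 4, 5}   # B' partitions [5] again
```

Since each C_i partitions ℕ, the column totals always partition the same ground set as B, with at most k blocks, as the construction requires.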

Proposition 2.2.1. The above description gives rise to finite-dimensional transition probabilities on P^(k)_[n]:

    p^ν_n(B, B′) = (k! / (k − #B′)!) ∏_{b∈B} [(k − #B′_{|b})! / k!] ϱ_ν(B′_{|b}).        (2.2)

Proof. Let A ∈ P^(k). Fix n, k ∈ ℕ and put B := A_{|[n]} ∈ P^(k)_[n]. Let C_1, ..., C_k be i.i.d. ϱ_ν-distributed partitions and σ := (σ_1, ..., σ_k) i.i.d. uniform random permutations of [k], as described above. Let B′ be the set partition obtained from the column totals of the matrix B ∩ C^σ in the above construction.

From the matrix construction, it is clear that for each i = 1, ..., k, the restriction B′_{|B_i} is equal to the set partition in P^(k) associated with C_i[B_i] := (C_{i1} ∩ B_i, ..., C_{ik} ∩ B_i). Conversely, the transition B ↦ B′ occurs only if the collection (C_1, ..., C_k) is such that, for each B_i ∈ B, C_i[B_i] = B′_{|B_i}. By consistency of the paintbox process, for each i = 1, ..., k, C_i[B_i] has probability

    ϱ_ν(C_i[B_i]) = ϱ_ν(B′_{|B_i}) := ϱ_ν({π ∈ P : π_{|[B_i]} = B′_{|B_i}}).

Independence of the C_i implies that the probability of B ∧ B′ given B is

    ∏_{b∈B} ϱ_ν(B′_{|b}).

Finally, each uniform permutation σ_i has probability 1/k! and there are

    (k! / (k − #B′)!) ∏_{b∈B} (k − #B′_{|b})!

collections σ_1, ..., σ_{#B} such that the column totals of B ∩ C^σ correspond to the blocks of B′. This completes the proof.

Definition 2.2.2. A P^(k)-valued Markov process Π with finite-dimensional transition probabilities of the form (2.2) is called an (exchangeable) cut-and-paste¹ process with parameter ν, written Π ∼ CP(ν).

For fixed n, (2.2) depends on B and B′ only through ϱ_ν and the numbers of blocks of B and B′, and is, therefore, finitely exchangeable. Consistency of (2.2) is automatic by the construction of the process on P^(k). However, we provide a longer proof below which appeals directly to (2.1) and gives some insight into the transitions of the process.

Proposition 2.2.3. For any measure ν on R^↓(k), let (p^ν_n(·, ·))_{n≥1} be the collection of transition probabilities on P^(k)_[n] defined in (2.2). Then (p^ν_n) is a consistent family of transition probabilities.

Proof. Fix n, k ∈ ℕ and let B, B′ ∈ P^(k)_[n]. To establish consistency it is enough to verify condition (2.1) from theorem 1 of [36], i.e. for every ν and B* ∈ D^{-1}_{n,n+1}(B),

    p^ν_{n+1}(B*, D^{-1}_{n,n+1}(B′)) = p^ν_n(B, B′).

We assume without loss of generality that B* ∈ D^{-1}_{n,n+1}(B) is obtained from B by the operation n + 1 ↦ B_1 ∈ B, and we write B*_1 := B_1 ∪ {n + 1}. Likewise, for

1. The name "cut-and-paste process" was suggested by Marcin Hitczenko as a descriptive name for the transition procedure of the process.

B′′ ∈ D^{-1}_{n,n+1}(B′) obtained by n + 1 ↦ B′_i ∈ B′ ∪ {∅}, write B′*_i := B′_i ∪ {n + 1}. So either n + 1 ∈ B′*_i for some i = 1, ..., #B′ or n + 1 is inserted in B′ as a singleton. The change to B ∩ C^σ that results from inserting n + 1 into B_1 ∈ B and B′_i ∈ B′ is summarized by the following matrix. Note that B′_j = ∅ for j > #B′.

             B′_1          B′_2         ...   B′*_i                    ...   B′_k
    B*_1  [ B′_1 ∩ B_1    B′_2 ∩ B_1    ...   (B′_i ∩ B_1) ∪ {n + 1}  ...   B′_k ∩ B_1 ]
    B_2   [ B′_1 ∩ B_2    B′_2 ∩ B_2    ...   B′_i ∩ B_2              ...   B′_k ∩ B_2 ]
     ⋮    [     ⋮              ⋮         ⋱         ⋮                   ⋱        ⋮      ]
    B_k   [ B′_1 ∩ B_k    B′_2 ∩ B_k    ...   B′_i ∩ B_k              ...   B′_k ∩ B_k ]

Here, the blocks of B are listed in any order, with empty sets inserted as needed, and the blocks of B′ are listed in order of least elements, with k − #B′ empty sets at the end.

Given B′, the set of compatible partitions D^{-1}_{n,n+1}(B′) consists of three types of partitions, depending on the subset B_1 ⊂ [n] and the block of B′ into which {n + 1} is inserted. Let B′′ ∈ D^{-1}_{n,n+1}(B′) be the partition of [n + 1] obtained by inserting n + 1 in B′. Either

(i) n + 1 is inserted into a block B′_i such that B′_i ∩ B_1 ≠ ∅ ⇒ #B′′_{|B*_1} = #B′_{|B_1};

(ii) n + 1 is inserted into a block B′_i ≠ ∅ such that B′_i ∩ B_1 = ∅ ⇒ #B′′_{|B*_1} = #B′_{|B_1} + 1; or

(iii) n + 1 is inserted into B′ as a singleton block ⇒ #B′′_{|B*_1} = #B′_{|B_1} + 1 and #B′′ = #B′ + 1; we denote this partition by B′_∅.

There are k − #B′ empty columns in which {n + 1} can be inserted as a singleton in B′, as in (iii). For B′′ obtained by (ii), the restriction of B′′ to B*_1 coincides with the restriction of B′_∅ to B*_1, so each of these restrictions has the same probability under ϱ_ν. For notational convenience in the following calculation, let D_1 be those elements of D^{-1}_{n,n+1}(B′) which satisfy condition (i) above and D_2 those which satisfy condition (ii).

    p^ν_{n+1}(B*, D^{-1}_{n,n+1}(B′))
      = ∑_{B′′∈D^{-1}_{n,n+1}(B′)} (k!/(k − #B′′)!) ∏_{b∈B*} [(k − #B′′_{|b})!/k!] ϱ_ν(B′′_{|b})        (2.3)
      = (k!/(k − #B′)!) ∏_{b∈B} [(k − #B′_{|b})!/k!] ×
          [ ∑_{B′′∈D_1} ∏_{b∈B*} ϱ_ν(B′′_{|b})
            + ∑_{B′′∈D_2} (1/(k − #B′_{|B_1})) ∏_{b∈B*} ϱ_ν(B′′_{|b})
            + ((k − #B′)/(k − #B′_{|B_1})) ∏_{b∈B*} ϱ_ν(B′_{∅|b}) ]        (2.4)
      = (k!/(k − #B′)!) ∏_{b∈B} [(k − #B′_{|b})!/k!] ∏_{b∈B: b≠B_1} ϱ_ν(B′_{|b}) ×
          [ ∑_{B′′∈D_1} ϱ_ν(B′′_{|B*_1})
            + ∑_{B′′∈D_2} (1/(k − #B′_{|B_1})) ϱ_ν(B′′_{|B*_1})
            + ((k − #B′)/(k − #B′_{|B_1})) ϱ_ν(B′_{∅|B*_1}) ]
      = (k!/(k − #B′)!) ∏_{b∈B} [(k − #B′_{|b})!/k!] ∏_{b∈B: b≠B_1} ϱ_ν(B′_{|b}) ×
          [ ∑_{B′′∈D_1} ϱ_ν(B′′_{|B*_1}) + ϱ_ν(B′_{∅|B*_1}) ]        (2.5)
      = (k!/(k − #B′)!) ∏_{b∈B} [(k − #B′_{|b})!/k!] ∏_{b∈B: b≠B_1} ϱ_ν(B′_{|b}) ×
          [ ∑_{B′′∈D^{-1}_{#B_1,#B_1+1}(B′_{|B_1})} ϱ_ν(B′′) ]        (2.6)
      = (k!/(k − #B′)!) ∏_{b∈B} [(k − #B′_{|b})!/k!] ∏_{b∈B: b≠B_1} ϱ_ν(B′_{|b}) · ϱ_ν(B′_{|B_1})        (2.7)
      = (k!/(k − #B′)!) ∏_{b∈B} [(k − #B′_{|b})!/k!] ϱ_ν(B′_{|b})
      = p^ν_n(B, B′).

Here, (2.4) is obtained from (2.3) by factoring (k!/(k − #B′)!) ∏_{b∈B} [(k − #B′_{|b})!/k!] out of the sum and using observations (i), (ii) and (iii). In (2.5), we use the fact that for any B′′ ∈ D_2, B′′_{|B*_1} = B′_{∅|B*_1}, and there are #B′ − #B′_{|B_1} elements in D_2 according to (ii). Line (2.6) follows by observing that each B′′ ∈ D_1 corresponds to an element of D^{-1}_{#B_1,#B_1+1}(B′_{|B_1}), and B′_{∅|B*_1} is the element of D^{-1}_{#B_1,#B_1+1}(B′_{|B_1}) obtained by inserting {n + 1} as a singleton in B′_{|B_1}. Finally, (2.7) follows from (2.6) by consistency of the paintbox process. This completes the proof.

The following result is immediate from finite exchangeability and consistency of (2.2) for every n, together with Kolmogorov's extension theorem (theorem 1.3.1).

Theorem 2.2.4. There exists a transition probability measure p^ν(·, ·) on (P^(k), σ(⋃_n P^(k)_[n])) whose finite-dimensional restrictions are given by (2.2). In particular, the cut-and-paste process exists.

2.2.1 Equilibrium measure

From (2.2), it is clear that for each n, k ∈ ℕ and B, B′ ∈ P^(k)_[n], p^ν_n(B, B′) is strictly positive provided ν is such that ν(s) > 0 for some s = (s_1, ..., s_k) ∈ R^↓(k) with s_k > 0. Under this condition, the finite-dimensional chains are aperiodic and irreducible on P^(k)_[n] and, therefore, have a unique stationary distribution. In fact, the finite-dimensional chains based on ν are aperiodic and irreducible provided ν is not degenerate at (1, 0, ..., 0) ∈ R^↓(k). The existence of a unique stationary distribution for each n implies that there is a unique stationary probability measure on (P^(k), σ(⋃_n P^(k)_[n])) for p^ν(·, ·) from theorem 2.2.4.

Proposition 2.2.5. Let ν be a measure on R^↓(k) such that ν is non-degenerate at (1, 0, ..., 0) ∈ R^↓(k). Then there exists a unique stationary distribution θ^ν_n(·) for p^ν_n(·, ·) for each n ≥ 1.

Proof. Fix n ∈ ℕ and let ν be any measure on R^↓(k) other than that which puts unit mass at (1, 0, ..., 0). For B = (B_1, ..., B_m) ∈ P^(k)_[n], (2.2) gives the transition probability

    p^ν_n(B, B) = (k!/(k − m)!) ∏_{i=1}^m (1/k) ϱ_ν(B_i),

and ϱ_ν(B_i) = ϱ_ν([#B_i]) > 0 for each i = 1, ..., m. Hence, p^ν_n(B, B) > 0 for every B ∈ P^(k)_[n] and the chain is aperiodic.

To see that the chain is irreducible, let B, B′ ∈ P^(k)_[n] and let 1_n denote the one-block partition of [n]. Then

    p^ν_n(B, 1_n) = k ∏_{b∈B} (1/k) ϱ_ν([#b]) > 0

and, since ν is not degenerate at (1, 0, ..., 0), there exists a path 1_n ↦ B′ obtained by recursively partitioning 1_n until it coincides with B′. For instance, let B′ := (B′_1, ..., B′_m) ∈ P^(k). One such path from 1_n to B′ is

    1_n → (B′_1, ⋃_{i=2}^m B′_i) → (B′_1, B′_2, ⋃_{i=3}^m B′_i) → ··· → B′,

which has positive probability for any non-degenerate ν. Hence p^ν_n(·, ·) is irreducible, which establishes the existence of a unique stationary distribution for each n.

Theorem 2.2.6. Let ν be a measure on R^↓(k) such that ν((1, 0, ..., 0)) < 1. Then there exists a unique stationary probability measure θ^ν(·) for the CP(ν)-Markov chain on (P^(k), σ(⋃_n P^(k)_[n])).

Proof. For ν satisfying ν((1, 0, ..., 0)) < 1, proposition 2.2.5 shows that a stationary distribution exists for each n ≥ 1. Let (θ^ν_n(·), n ≥ 1) be the collection of stationary distributions for the finite-dimensional transition probabilities (p^ν_n(·, ·), n ≥ 1). We now show that the θ_n are consistent and finitely exchangeable for each n.

Fix n ∈ ℕ and let B ∈ P^(k)_[n]. Then stationarity of θ^ν_n(·) implies

    ∑_{B′∈P^(k)_[n]} θ^ν_n(B′) p^ν_n(B′, B) = θ^ν_n(B).

Now write θ_n(·) ≡ θ^ν_n(·) and p_n(·, ·) ≡ p^ν_n(·, ·) for convenience and let B′ ∈ P^(k)_[n]. Then

    (θ_{n+1} D^{-1}_{n,n+1})(B′) := ∑_{B′′∈D^{-1}_{n,n+1}(B′)} θ_{n+1}(B′′)
      = ∑_{B′′∈D^{-1}_{n,n+1}(B′)} ∑_{B*∈P^(k)_[n+1]} θ_{n+1}(B*) p_{n+1}(B*, B′′)
      = ∑_{B*∈P^(k)_[n+1]} θ_{n+1}(B*) [ ∑_{B′′∈D^{-1}_{n,n+1}(B′)} p_{n+1}(B*, B′′) ]
      = ∑_{B∈P^(k)_[n]} ∑_{B*∈D^{-1}_{n,n+1}(B)} θ_{n+1}(B*) p_n(B, B′)
      = ∑_{B∈P^(k)_[n]} p_n(B, B′) ∑_{B*∈D^{-1}_{n,n+1}(B)} θ_{n+1}(B*)
      = ∑_{B∈P^(k)_[n]} p_n(B, B′) (θ_{n+1} D^{-1}_{n,n+1})(B).

So we have that θ_{n+1} D^{-1}_{n,n+1} is stationary for p_n, which implies that θ_n ≡ θ_{n+1} D^{-1}_{n,n+1} by uniqueness, and hence θ_n is consistent for each n.

Let σ be a permutation of [n]. Then for any B, B′ ∈ P^(k)_[n], p_n(σ(B), σ(B′)) = p_n(B, B′) by exchangeability of p_n. It follows that θ_n is finitely exchangeable for each n since

    ∑_{B∈P^(k)_[n]} θ_n(σ(B)) p_n(σ(B), σ(B′)) = θ_n(σ(B′))

by stationarity, and p_n(σ(B), σ(B′)) = p_n(B, B′) implies that

    ∑_{B∈P^(k)_[n]} θ_n(σ(B)) p_n(B, B′) = θ_n(σ(B′)).

Hence, θ_n ∘ σ is stationary for p_n and θ_n ≡ θ_n ∘ σ by uniqueness.

Kolmogorov's extension theorem (theorem 1.3.1) implies that there exists a unique exchangeable stationary probability measure θ on P^(k) whose restriction to [n] is θ_n for each n ∈ ℕ. This completes the proof.

2.3 Continuous-time version of CP(ν)-process

Let λ > 0, let ν be a measure on R^↓(k), and for each n ∈ ℕ define Markovian infinitesimal jump rates for a Markov process on P^(k)_[n] by

    q_n(B, B′) = { λ p^ν_n(B, B′),   B′ ≠ B
                 { 0,                otherwise,        (2.8)

where p^ν_n is as in (2.2). The infinitesimal generator, Q^ν_n, of the process on P^(k)_[n] governed by q_n has entries

    Q^ν_n(B, B′) = λ × { p^ν_n(B, B′),       B′ ≠ B
                       { p^ν_n(B, B) − 1,   B′ = B.        (2.9)
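Relations (2.8)–(2.9) say that the generator is Q^ν_n = λ(P^ν_n − I), where P^ν_n denotes the transition matrix (2.2). On a toy state space this is a one-liner; the 3 × 3 matrix below is a stand-in for p_n on partitions, not derived from (2.2).

```python
def generator_from_transition(P, lam=1.0):
    """Q = lam * (P - I): off-diagonal entries lam * p(x, y), diagonal
    entries lam * (p(x, x) - 1), as in (2.8)-(2.9)."""
    n = len(P)
    return [[lam * (P[x][y] - (1.0 if x == y else 0.0)) for y in range(n)]
            for x in range(n)]

# toy 3-state transition matrix standing in for p_n on partitions
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.2, 0.2, 0.6]]
Q = generator_from_transition(P, lam=2.0)
assert all(abs(sum(row)) < 1e-12 for row in Q)   # generator rows sum to zero
```

The zero row sums are the defining property of a Q-matrix; a stationary row vector of P is then in the left null space of Q, which is the content of corollary 2.3.4 below.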

Since λ acts only as a time-scaling parameter, we can assume λ = 1 without loss of generality. We now construct a Markov process B := (B(t), t ≥ 0) in continuous time whose finite-dimensional transition rates are given by (2.8).

Definition 2.3.1. A process B := (B(t), t ≥ 0) on P^(k) is a (continuous-time) cut-and-paste process with parameter ν if, for each n ∈ ℕ, B_{|[n]} is a Markov process on P^(k)_[n] with Q-matrix Q^ν_n as in (2.9).

A process on P^(k) whose finite-dimensional restrictions are governed by Q^ν_n can be constructed according to the matrix construction from section 2.2 by permitting only transitions B ↦ B′ for B′ ≠ B, where B, B′ ∈ P^(k)_[n], and adding a hold time which is exponentially distributed with mean −1/Q^ν_n(B, B).

Proposition 2.3.2. For a measure ν on R^↓(k), let (Q^ν_n)_{n∈ℕ} be the collection of Q-matrices in (2.9). For every n ∈ ℕ, the entries of Q^ν_n satisfy (2.1).

Proof. Fix n ∈ ℕ and let B, B′ ∈ P^(k)_[n] such that B ≠ B′. Then

    Q^ν_n(B, B′) = ∑_{B′′∈D^{-1}_{n,n+1}(B′)} Q^ν_{n+1}(B*, B′′)

for all B* ∈ D^{-1}_{n,n+1}(B), by the consistency of p_n from proposition 2.2.3.

For B′ = B and B* ∈ D^{-1}_{n,n+1}(B), we have

    ∑_{B′′∈D^{-1}_{n,n+1}(B)} Q^ν_{n+1}(B*, B′′)
      = Q^ν_{n+1}(B*, B*) + ∑_{B′′∈D^{-1}_{n,n+1}(B)∖{B*}} Q^ν_{n+1}(B*, B′′)
      = p^ν_{n+1}(B*, B*) − 1 + ∑_{B′′∈D^{-1}_{n,n+1}(B)∖{B*}} p^ν_{n+1}(B*, B′′)
      = ∑_{B′′∈D^{-1}_{n,n+1}(B)} p^ν_{n+1}(B*, B′′) − 1
      = p^ν_n(B, B) − 1
      = Q^ν_n(B, B).

Theorem 2.3.3. For each measure ν on R^↓(k), there exists a Markov process (B(t), t ≥ 0) on P^(k) that has finite-dimensional transition rates given in (2.8).

Proof. Let ν be a measure on R^↓(k) and let (B_{|[n]}(t), t ≥ 0)_{n∈ℕ} be the collection of restrictions of a CP(ν)-process with consistent Q-matrices (Q^ν_n)_{n∈ℕ} as in (2.9). For each n, Q^ν_n is finitely exchangeable and consistent with Q^ν_{n+1} by proposition 2.3.2, which is sufficient for B_{|[n]} to be consistent with B_{|[n+1]} for every n. Kolmogorov's extension theorem implies that there exist transition rates Q^ν on P^(k) such that for every B, B′ ∈ P^(k)_[n],

    Q^ν(B*, {B′′ ∈ P^(k) : B′′_{|[n]} = B′}) = Q^ν_n(B, B′),

for every B* ∈ {B′′ ∈ P^(k) : B′′_{|[n]} = B}.

Finally, for every B ∈ P^(k)_[n], Q^ν_n(B, P^(k)_[n] ∖ {B}) = 1 − p^ν_n(B, B) < ∞, so that the sample paths of B_{|[n]} are càdlàg for every n, which implies that B is càdlàg.

Corollary 2.3.4. For ν satisfying the condition of theorem 2.2.6, the continuous-time process B := (B(t), t ≥ 0) with finite-dimensional rates q^ν_n(·, ·) in (2.8) has unique stationary distribution θ^ν(·) from theorem 2.2.6.

Proof. For each n ∈ ℕ, let θ^ν_n(·) be the unique finite-dimensional stationary distribution of p^ν_n(·, ·) from (2.2). It is easy to verify that, for each n ∈ ℕ, Θ^ν_n := (θ^ν_n(B), B ∈ P^(k)_[n]) satisfies

    (Θ^ν_n)^t Q^ν_n = 0,

which establishes that Θ^ν_n is stationary for Q^ν_n for every n. The rest follows by theorem 2.2.6.

2.3.1 Poissonian construction

From the matrix construction at the beginning of section 2.2, a consistent family of finite-dimensional Markov processes with transition rates as in (2.8) can be constructed from a Poisson point process on ℝ⁺ × ∏_{i=1}^k P^(k) as follows. Let P = {(t, C_1, ..., C_k)} ⊂ ℝ⁺ × ∏_{i=1}^k P^(k) be a Poisson point process with intensity measure dt ⊗ ϱ^(k)_ν for some measure ν on R^↓(k), where ϱ^(k)_ν is the product measure ϱ_ν ⊗ ··· ⊗ ϱ_ν on ∏_{i=1}^k P^(k). Construct an exchangeable process B := (B(t), t ≥ 0) on P^(k) by taking π ∈ P^(k) to be some exchangeable random partition and setting B(0) = π.

For each n ∈ N, put B|[n](0) = π|[n] and

• if t is not an atom time for P , then B|[n](t) = B|[n](t−);

• if t is an atom time for P, so that (t, C_1, ..., C_k) ∈ P, then, independently of (B(s), s < t) and (t, C_1, ..., C_k), generate σ_1, ..., σ_k i.i.d. uniform random permutations of [k] and construct B′ from the set partition induced by the

column totals (C_{·1}, ..., C_{·k}) of

             C_{·1}               C_{·2}               ...   C_{·k}
    B_1   [ C_{1,σ_1(1)} ∩ B_1   C_{1,σ_1(2)} ∩ B_1   ...   C_{1,σ_1(k)} ∩ B_1 ]
    B_2   [ C_{2,σ_2(1)} ∩ B_2   C_{2,σ_2(2)} ∩ B_2   ...   C_{2,σ_2(k)} ∩ B_2 ]
     ⋮    [        ⋮                    ⋮              ⋱            ⋮          ]
    B_k   [ C_{k,σ_k(1)} ∩ B_k   C_{k,σ_k(2)} ∩ B_k   ...   C_{k,σ_k(k)} ∩ B_k ]   =: B ∩ C^σ

where (B1,...,Bk) are the blocks of B = B|[n](t−) listed in order of their least element, with k − #B empty sets at the end of the list.

– if B′ ≠ B, then B_{|[n]}(t) = B′;

– if B′ = B, then B_{|[n]}(t) = B_{|[n]}(t−).
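The recipe above is driven purely by the atom times of a unit-rate Poisson process on ℝ⁺: the state is held constant between atoms and updated at each atom. A generic sketch (the `step` argument stands in for the cut-and-paste transition; names are illustrative):

```python
import random

def poisson_driven_path(x0, step, t_max, rng, rate=1.0):
    """Run a chain driven by a rate-`rate` Poisson clock on [0, t_max]: the
    state is held between atom times and `step` is applied at each atom.
    Returns the list of (atom time, state) pairs, starting from (0, x0)."""
    t, x = 0.0, x0
    path = [(t, x)]
    while True:
        t += rng.expovariate(rate)     # exponential gap to the next atom
        if t > t_max:
            return path
        x = step(x, rng)               # e.g. one cut-and-paste transition
        path.append((t, x))
```

Plugging in a transition like the `cp_step` sketch from section 2.2 reproduces the hold-and-jump behavior described in the bullets, with exponential holding times as in definition 2.3.1.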

Proposition 2.3.5. The above process B is a Markov process on P^(k) with transition rates Q^ν defined by theorem 2.3.3.

Proof. This is clear from the infinite exchangeability of both the paintbox process (theorem 1.8.1) and the Q^ν_n-matrices for every n (proposition 2.3.2), together with the fact that, by this construction, for any n such that B_{|[n]}(t) = π, we have B_{|[m]}(t) = D_{m,n}(π) for all m < n and B_{|[p]}(t) ∈ D^{-1}_{n,p}(π) for all p > n.

Let (P_t) be the semi-group of a CP(ν)-process B(·), i.e. for any continuous ϕ : P^(k) → ℝ,

    P_t ϕ(π) := E_π ϕ(B(t)),

the expectation of ϕ(B(t)) given B(0) = π.

Corollary 2.3.6. A CP(ν)-process has the Feller property, i.e.

• for every continuous function ϕ : P^(k) → ℝ and π ∈ P^(k), one has

    lim_{t↓0} P_t ϕ(π) = ϕ(π), and

• for all t > 0, π ↦ P_t ϕ(π) is continuous.

Proof. The proof follows the same program as the proof of corollary 6 in [19]. Let C_f := {f : P^(k) → ℝ : ∃n ∈ ℕ s.t. π_{|[n]} = π′_{|[n]} ⇒ f(π) = f(π′)} be a set of functions which is dense in the space of continuous functions from P^(k) to ℝ. It is clear that for g ∈ C_f, lim_{t↓0} P_t g(π) = g(π), since the first jump-time of B(·) is an exponential variable with finite mean. The first point follows for all continuous functions P^(k) → ℝ by denseness of C_f.

For the second point, let π, π′ ∈ P^(k) be such that d(π, π′) < 1/n and use the same Poisson point process P to construct two CP(ν)-processes, B(·) and B′(·), with starting points π and π′ respectively. By the construction, B_{|[n]} = B′_{|[n]} and d(B(t), B′(t)) < 1/n for all t ≥ 0. It follows that for any continuous g, π ↦ P_t g(π) is continuous.

This allows us to characterize the CP(ν)-process in terms of its infinitesimal generator. Let B := (B(t), t ≥ 0) be the CP(ν)-process on P^(k) with transition rates as in (2.8). The infinitesimal generator, A, of B is given by

    A(f)(π) = ∫_{P^(k)} (f(π′) − f(π)) Q^ν(π, dπ′),

for every f ∈ C_f.

2.4 Asymptotic frequencies

Adopting the notation of section 1.8.1, let Λ(B) = (‖B_1‖, ‖B_2‖, ...)^↓ be the decreasing arrangement of asymptotic frequencies of a partition B = (B_1, B_2, ...) ∈ P which possesses asymptotic frequencies, some of which could be 0. The process described in section 2.2 only assigns positive probability to transitions involving two partitions with at most k blocks.

From the Poissonian construction of the transition rates in section 2.3.1, it is evident that the states of B = (B(t), t ≥ 0) will have at most k blocks almost surely. Moreover, the description of the transition rates in terms of the paintbox process allows us to describe the associated measure-valued process of B := (B(t), t ≥ 0) characterized by λ and ν.

2.4.1 Poissonian construction

Consider the following Poissonian construction of a measure-valued process X := (X(t), t ≥ 0) on R^↓(k). For k ∈ ℕ and ν as above, let P′ = {(t, P′_1, ..., P′_k)} ⊂ ℝ⁺ × ∏_{i=1}^k R^↓(k) be a Poisson point process with intensity measure dt ⊗ ν^(k), where ν^(k) is the k-fold product measure ν ⊗ ··· ⊗ ν on ∏_{i=1}^k R^↓(k).

Construct a process X := (X(t), t ≥ 0) on R^↓(k) by generating p_0 from some probability measure on R^↓(k). Put X(0) = p_0 and

• if t is not an atom time for P′, then X(t) = X(t−);

• if t is an atom time for P′, so that (t, P′_1, ..., P′_k) ∈ P′, with P′_j = (P^j_1, ..., P^j_k) for each j = 1, ..., k, and X(t−) = (x_1, ..., x_k) ∈ R^↓(k), then, independently of (X(s), s < t) and (t, P′_1, ..., P′_k), generate σ_1, ..., σ_k i.i.d. uniform random permutations of [k] and construct X(t) from the marginal column totals of

             P_1                 P_2                ...   P_k
    x_1   [ x_1 P¹_{σ_1(1)}    x_1 P¹_{σ_1(2)}    ...   x_1 P¹_{σ_1(k)}  ]
    x_2   [ x_2 P²_{σ_2(1)}    x_2 P²_{σ_2(2)}    ...   x_2 P²_{σ_2(k)}  ]
     ⋮    [       ⋮                  ⋮             ⋱          ⋮          ]
    x_k   [ x_k P^k_{σ_k(1)}   x_k P^k_{σ_k(2)}   ...   x_k P^k_{σ_k(k)} ]

i.e. put X(t) = (P̄_1, P̄_2, ..., P̄_k)^↓ := (∑_{i=1}^k x_i P^i_{σ_i(j)}, 1 ≤ j ≤ k)^↓.

Theorem 2.4.1. Let X := (X(t), t ≥ 0) be the process constructed above. Then

X =_L Λ(B), where B := (B(t), t ≥ 0) is the CP(ν)-process from theorem 2.3.3, and =_L denotes "equal in law".

Proof. Fix k ∈ ℕ and let ν(·) be a measure on R^↓(k). In the description of the sample paths of B in section 2.3, note that generating (C_1, ..., C_k) ∼ ϱ^(k)_ν is equivalent to first generating s_i ∼ ν independently for each i = 1, ..., k, and then generating random partitions C_i by sampling from s_i for each i = 1, ..., k. Finally, B′_i is set equal to the marginal total of column i of the matrix B ∩ C^σ, where σ := (σ_1, ..., σ_k) is an i.i.d. collection of uniform random permutations of [k]. Hence, we can couple the two processes X and B together using the Poisson point process P′ described above.

Let X evolve according to the Poisson point process P′ on ℝ⁺ × ∏_{i=1}^k R^↓(k) as described above. Let B evolve by the modification that if t is an atom time of P′ then we obtain partitions (C_1, ..., C_k) by sampling X^i := (X^i_1, X^i_2, ...) i.i.d. from P′_i for each i = 1, ..., k, i.e.

    P(X^i_1 = j | P′_i) = P^i_j,

and defining the blocks of C_i as the equivalence classes of X^i. Constructed in this way, ‖C_{ij}‖ = P^i_j almost surely for each i, j = 1, ..., k, and (C_1, ..., C_k) ∼ ϱ^(k)_ν.

After obtaining the C_i, generate, independently of B, C_1, ..., C_k, P′, i.i.d. uniform permutations σ_1, ..., σ_k of [k] and proceed as in the construction of section 2.3.1, where B, C_1, ..., C_k are arranged in the matrix B ∩ C^σ and the blocks of B′ are obtained as the marginal column totals of B ∩ C^σ. The (i, j)th entry of B ∩ C^σ is C_{i,σ_i(j)} ∩ B_i, for which we have ‖C_{i,σ_i(j)} ∩ B_i‖ = ‖C_{i,σ_i(j)}‖ ‖B_i‖ = x_i P^i_{σ_i(j)} a.s.

By this construction, B(t) is constructed according to a Poisson point process with the same law as that described in section 2.3.1, and B(t) possesses ranked asymptotic frequencies which correspond to X(t) almost surely for all t ≥ 0.

Corollary 2.4.2. The process X := (Λ(B(t)), t ≥ 0) exists almost surely.

2.4.2 Equilibrium measure

Just as the process (B(t), t ≥ 0) on P^(k) converges to a stationary distribution, so does its associated measure-valued process (X(t), t ≥ 0) from section 2.4.1.

Theorem 2.4.3. The associated measure-valued process X for a CP(ν)-process with unique stationary measure θ^ν has equilibrium measure |θ^ν|^↓, the distribution of the ranked frequencies of a θ^ν(·)-partition.

Proof. Proposition 1.4 in [27] states that if a sequence of exchangeable random partitions converges in law on P to π_∞, then its sequence of ranked asymptotic frequencies converges in law to |π_∞|^↓. Hence, from corollary 2.3.4, X has equilibrium distribution |θ^ν|^↓.

2.5 A two parameter subfamily

For k ∈ ℕ and α > 0, the Pitman–Yor(−α, kα) process (1.10) has finite-dimensional distributions

    ρ_n(B; α, k) = (k!/(k − #B)!) · (∏_{b∈B} α^{↑#b}) / (kα)^{↑n},        (2.10)

which is supported by P^(k)_[n], where α^{↑n} := α(α + 1) ··· (α + n − 1). For notational convenience, introduce the α-permanent [78] of an n × n matrix K,

    per_α K = ∑_{σ∈S_n} α^{#σ} ∏_{i=1}^n K_{i,σ(i)},        (2.11)

where #σ is the number of cycles of the permutation σ. Note that when B ∈ P_[n] is regarded as a matrix,

    per_α B = ∏_{b∈B} per_α B_{|b} = ∏_{b∈B} α^{↑#b},        (2.12)

which allows us to write (2.10) as

    ρ_n(B; α, k) = (k!/(k − #B)!) · per_α B / (kα)^{↑n}.        (2.13)
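Definition (2.11) can be checked by brute force for small matrices, and identity (2.12) verified numerically when B is the 0/1 block matrix of a partition. A sketch (O(n!) enumeration, for illustration only; `math.prod` requires Python ≥ 3.8):

```python
import itertools
import math

def cycle_count(perm):
    """Number of cycles of the permutation i -> perm[i] of {0, ..., n-1}."""
    seen, cycles = set(), 0
    for i in range(len(perm)):
        if i not in seen:
            cycles += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return cycles

def alpha_permanent(K, alpha):
    """per_alpha K = sum_sigma alpha^{#sigma} prod_i K[i][sigma(i)], as in (2.11)."""
    n = len(K)
    return sum(alpha ** cycle_count(s) * math.prod(K[i][s[i]] for i in range(n))
               for s in itertools.permutations(range(n)))

def rising(a, m):
    """Rising factorial a^{up m} = a (a+1) ... (a+m-1)."""
    return math.prod(a + i for i in range(m))

# the partition {1,2} | {3} of [3] as a 0/1 block matrix
B = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
# (2.12): per_alpha B factorizes over blocks into rising factorials
assert abs(alpha_permanent(B, 0.7) - rising(0.7, 2) * rising(0.7, 1)) < 1e-12
```

Permutations mixing distinct blocks contribute a zero product, which is why the permanent factorizes over the blocks as in (2.12).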

We now consider a specific sub-family of reversible CP(ν)-processes whose transition probabilities can be written down explicitly. For k ∈ ℕ and α > 0, let ν be the PD(−α/k, α) distribution on R^↓(k) and define transition probabilities according to the matrix construction based on ν as in section 2.2. We call this process the CP(α, k)-process.

Proposition 2.5.1. The CP(α, k)-process has finite-dimensional transition probabilities

    p_n(B, B′; α, k) = (k!/(k − #B′)!) ∏_{b∈B} (∏_{b′∈B′} (α/k)^{↑#(b∩b′)}) / α^{↑#b}        (2.14)
                     = (k!/(k − #B′)!) · per_{α/k}(B ∧ B′) / per_α B.        (2.15)

Proof. Theorem 3.2 and definition 3.3 from [93] show that the distribution of B ∼ ϱ_ν, where ν = PD(−α/k, α), is

    ρ_n(B; α/k, k) = (k!/(k − #B)!) · per_{α/k} B / α^{↑n}.

Combining this with (2.2) yields (2.14); (2.15) follows from (2.12).

Proposition 2.5.2. For each (α, k) ∈ ℝ⁺ × ℕ and n ∈ ℕ, p_n(·, ·; α, k) defined in proposition 2.5.1 is reversible with respect to (2.10) with parameter (−α, kα).

Proof. Let ρ_n(·; α, k) be the distribution with parameter (−α, kα) defined in (2.10), and p_n(·, ·; α, k) be as defined in (2.14). For any B, B′ ∈ P^(k)_[n], it is immediate that

    ρ_n(B; α, k) p_n(B, B′; α, k) = ρ_n(B′; α, k) p_n(B′, B; α, k),        (2.16)

which establishes reversibility.

Bertoin [26] discusses some reversible EFC processes which have a PD(α, θ) distribution as their equilibrium measure, for 0 < α < 1 and θ > −α. Here we have shown reversibility with respect to PD(α, θ) for α < 0 and θ = −mα for m ∈ ℕ. The construction of the continuous-time process is a special case of the procedure in section 2.3. The measure-valued process (X(t), t ≥ 0) based on the (α, k)-Markov process has unique stationary measure PD(−α, kα), the distribution of the ranked frequencies of a partition with finite-dimensional distributions as in (2.10) with parameter (−α, kα).

The parameter α is related to how local the jumps in the CP(α, k)-process are likely to be. Larger values of α assign higher probability to large jumps, while small values of α favor more local jumps. The asymptotic behavior of the CP(α, k)-process as α and k vary is summarized in the following proposition.

Proposition 2.5.3. For α > 0 and k ≥ 1, let pn(·, ·; α, k) be the finite-dimensional transition probabilities (2.14) of the CP(α, k) process. The following asymptotic tran- sition laws hold.

• For fixed α > 0 and k → ∞, (2.14) converges to

    p_n(B, B′; α) = { α^{#(B∧B′)} ∏_{b∈B} (∏_{b′∈B′} Γ(#(b∩b′))) / α^{↑#b},   B′ ≤ B
                    { 0,                                                       otherwise,

the law of a Ewens fragmentation chain.

• For fixed k ≥ 1 and α → ∞, (2.14) converges to

    p_n(B, B′; k) = P_n(B′; k) = k^{↓#B′} / k^n.        (2.17)

• For fixed k ≥ 1 and α → 0, (2.14) converges to

pn(B, B′; k) = k↓#B′/k^{#B}  if B ≤ B′, and 0 otherwise,

a discrete-time coalescent chain.

• For α/k → 0, k → ∞ such that α → θ > 0, (2.14) converges to

pn(B, B′; θ) = θ^{#(B∧B′)} ∏_{b∈B} [∏_{b′∈B′} Γ(#(b ∩ b′))] / θ↑#b  if B′ ≤ B, and 0 otherwise,

a discrete-time fragmentation chain where each block fragments independently according to a Ewens distribution with parameter θ.

• For α → 0 and k → ∞, (2.14) converges to

pn(B, B′) = I{B = B′},

the unit mass at B.

Remark 2.5.4. Note that the asymptotic transition law for α → ∞ shown in (2.17) does not depend on B. The weak limit of a sequence of partitions as α → ∞ is that of an independent and identically distributed sequence of partitions governed by the coupon-collector distribution on partitions. For each i = 1, 2, . . ., Bi is chosen independently of Bi−1, and each element chooses its block membership uniformly among the k blocks of the partition, independently of how the other elements are configured. In other words, all structure of the model is lost and all elements act independently at all time points.
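This limiting transition law is easy to simulate: each element independently and uniformly chooses one of k block labels, and the partition is read off from the labels. The sketch below is our own illustration (the function name is not from the thesis); under this law a partition B with #B blocks has probability k↓#B/kⁿ, matching (2.17).

```python
import random
from collections import defaultdict

def coupon_collector_partition(n, k, rng):
    """Each element of [n] picks one of k block labels uniformly and
    independently; the sampled partition is read off from the labels."""
    blocks = defaultdict(list)
    for i in range(1, n + 1):
        blocks[rng.randrange(k)].append(i)
    # list blocks in order of their least element, ignoring unused labels
    return sorted(blocks.values())

rng = random.Random(0)
B = coupon_collector_partition(10, 3, rng)
```

For n = 2 and k = 2, for example, the two-block partition {{1}, {2}} has probability k↓2/k² = 1/2, which a longer simulation reproduces empirically.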

Remark 2.5.5. The results in proposition 2.5.3 elucidate the role of the parameter α in the CP(α, k) process. In particular, as α grows large, the process becomes more erratic (asymptotically i.i.d.), while small values of α lead to more controlled behavior, which tends toward more local one-step transitions (pure coalescence).

2.6 A three-parameter extension

For fixed k ≥ 1, the CP(α, k)-process is a two-parameter family. We now introduce a three-parameter extension with an additional parameter Σ, a symmetric square matrix with non-negative entries called the similarity matrix. In particular, the entry Σij measures the similarity between elements i and j, and a larger value of Σij leads to a higher probability that elements i and j appear in the same block over the course of the partition sequence.

2.6.1 Similarity and dissimilarity matrices

An n × n symmetric matrix Σ is a similarity matrix if

(1) Σij ≥ 0 for all i, j, and

(2) Σij ≤ Σii ∧ Σjj for all i, j.

Condition (2) permits the interpretation of Σ as a matrix of pairwise similarities, as it requires that any element i is more similar to itself than to any other element in the set. In many instances, it may be natural for Σ to be constant along the diagonal, signifying that all elements are equally similar to themselves, but we do not explicitly require this.

Any T := {(Ai, λi), i ≥ 1} ∈ T̄n, the space of [n]-labeled trees with edge lengths, can be written as a matrix by putting T = ∑ᵢ₌₁ᵏ λi ei ⊗ ei, where ei := (ei1, . . . , ein) is a vector with eij = 1 if j ∈ Ai and eij = 0 otherwise, and ei ⊗ ei is the outer product of ei with itself. Any rooted tree T satisfies the three-point inequality

Tij ≥ Tik ∧ Tjk (2.18)

for all i, j, k. The entry Tij in the matrix T can be interpreted as the distance from the root of T to the first point at which i and j appear on different branches of T. Any rooted [n]-tree is a similarity matrix. Similarity matrices enter our discussion in section 2.6.2 when we extend the CP(α, k) process to include a parameter Σ, a similarity matrix on the index set. When used to model certain phenomena in genetics, it may be natural that Σ represents a phylogenetic tree or, more generally, a hierarchical clustering of the elements of the sample.

In some settings, it may be more convenient or more natural to represent relationships through a matrix of pairwise distances. An n × n symmetric matrix ∆ is a dissimilarity, or distance, matrix if

(1) ∆ii = 0 for all i,

(2) ∆ij ≥ 0 for all i, j, and

(3) √∆ij ≤ √∆ik + √∆jk for all i, j, k.

To any similarity matrix S, there corresponds a dissimilarity matrix D with entries

Dij = Sii + Sjj − 2Sij.   (2.19)

A particular class of dissimilarity matrices of interest is the space UTn of unrooted [n]-trees. An element ∆ ∈ UTn is a symmetric n × n matrix which satisfies (1), (2) and (3) above along with Buneman's four-point condition

∆ij + ∆kl ≤ max{∆ik + ∆jl, ∆il + ∆jk}. (2.20)

For any S ∈ Tn, the dissimilarity matrix D obtained from S by (2.19) is an unrooted [n]-tree.
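The relationship between rooted-tree similarity matrices and unrooted-tree dissimilarity matrices can be checked numerically. The sketch below is our own illustration with hypothetical integer edge weights: it builds T = ∑ᵢ λᵢ eᵢ ⊗ eᵢ from a nested family of labeled blocks, and the induced D via (2.19); the three-point inequality (2.18) and Buneman's four-point condition (2.20) can then be verified by enumeration.

```python
from itertools import product

def tree_matrix(blocks, n):
    """T = sum_i lambda_i * (e_i outer e_i) for blocks = [(A_i, lambda_i)],
    where each A_i is a subset of {1, ..., n} and lambda_i its edge length."""
    T = [[0] * n for _ in range(n)]
    for A, lam in blocks:
        for i in A:
            for j in A:
                T[i - 1][j - 1] += lam
    return T

def to_dissimilarity(S):
    """D_ij = S_ii + S_jj - 2 * S_ij, as in (2.19)."""
    n = len(S)
    return [[S[i][i] + S[j][j] - 2 * S[i][j] for j in range(n)] for i in range(n)]

# a hypothetical rooted [4]-tree: nested blocks with integer edge lengths
blocks = [({1, 2, 3, 4}, 2), ({1, 2}, 2), ({3, 4}, 3),
          ({1}, 1), ({2}, 1), ({3}, 1), ({4}, 1)]
T = tree_matrix(blocks, 4)
D = to_dissimilarity(T)
```

Integer weights keep the arithmetic exact, so the equality cases of the four-point condition are not disturbed by rounding.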

2.6.2 The extended model

Recall the α-permanent of a matrix X, perα X, from (2.11). The α-permanent also has the following convolution property, which was shown by McCullagh and Møller [78]. Let α1, . . . , αk ≥ 0 and α. := α1 + · · · + αk; then

perα. X = ∑_{(w1,...,wk)} perα1(X[w1]) · · · perαk(X[wk]),   (2.21)

where the sum is over ordered collections (w1, . . . , wk) of disjoint subsets of [n] such that ⋃ᵢ₌₁ᵏ wi = [n], and X[w] := (Xij : i ∈ w, j ∈ w) is the sub-matrix of X with rows and columns indexed by w ⊆ [n].

For our purposes, (2.21) can be used by putting α1 = · · · = αk = α > 0, so that α. = kα, and noting that for any B ∈ P[n]

perα(X · B) = ∏_{b∈B} perα X[b].

In this case, (2.21) becomes

perkα X = ∑_{B∈P^(k)_[n]} k↓#B perα(X · B).   (2.22)

Based on (2.22), we have that any matrix Σ for which the entries Σ1,σ(1), . . . , Σn,σ(n) are all strictly positive for some σ ∈ Sn defines a probability distribution on P^(k)_[n] by

µn(B; α, k, Σ) = k↓#B perα(Σ · B)/perkα Σ,   (2.23)

which can be seen as a generalization of (2.13). Moreover, the expression in (2.15) can be extended to include the matrix parameter Σ:

pn(B, B′; α, k, Σ) = k↓#B′ perα/k(Σ · B · B′)/perα(Σ · B),   (2.24)

which is reversible with respect to (2.23).

To guarantee that there is some pair B, B′ ∈ P^(k)_[n] for which the transition probability (2.24) is positive, Σ need only satisfy positivity for a properly chosen set of n of its entries. Thus, general choices of Σ can lead to restrictions to certain subspaces within P^(k)_[n].

Definition 2.6.1. For n ≥ 1, k ≥ 1, α > 0 and Σ an n × n matrix with entries which are non-negative everywhere and strictly positive on the diagonal, a Markov chain governed by the transition probabilities in (2.24) is a discrete-time cut-and-paste process on P^(k)_[n] with concentration parameter α and similarity matrix Σ, written CP(α, k; Σ).

Remark 2.6.2. If Σ ≡ 1 is the matrix of all ones, (2.24) coincides with (2.15) and we have the standard CP(α, k) process.

Note that the similarity of elements i and j, given by Σij, only plays a role in the probability of the transition B ↦ B′ if Bij = 1. In this case, a larger value of Σij will tend to favor transitions to states B′ for which B′ij = 1. If Bij = 0, then i and j are in distinct blocks of B and, according to the transition procedure for the cut-and-paste process, choose their blocks in B′ independently of Σ.

2.7 Properties of the CP(α, k; Σ) process

Though Σ is naturally disposed toward an interpretation as a similarity matrix, the expression (2.24) permits any matrix with strictly positive diagonal entries. Let M⁺ₙ denote the space of non-negative n × n matrices which are strictly positive along the diagonal. The unrestricted parameter space of the CP(α, k; Σ) process is (α, k, Σ) ∈ R⁺ × N × M⁺ₙ; however, for r1, . . . , rn > 0 and c1, . . . , cn > 0, let R = diag(r1, . . . , rn) and C = diag(c1, . . . , cn) be diagonal matrices with entries ri and ci, i = 1, . . . , n, respectively. For any Σ ∈ M⁺ₙ, the matrix RΣC is the image of Σ under multiplication of the ith row of Σ by ri and the jth column of Σ by cj. The α-permanent of RΣC satisfies

perα(RΣC) = perα Σ × ∏ᵢ₌₁ⁿ ri ci,

and the Hadamard product identity (RΣC) · X = R(Σ · X)C implies that

perα/k((RΣC) · B · B′)/perα((RΣC) · B) = perα/k(R(Σ · B · B′)C)/perα(R(Σ · B)C) = perα/k(Σ · B · B′)/perα(Σ · B).

Therefore, for fixed n ≥ 1, it is only necessary to consider matrices Σ in the space DSn of doubly stochastic n × n matrices, i.e. matrices with non-negative entries for which all row and column totals are one, which are positive along the diagonal.

For any Σ ∈ M⁺ₙ, the entry Σij represents a measure of similarity between elements i and j, and, in general, it is not possible to arbitrarily permute the labels of the index set without performing a corresponding permutation of the rows and columns of Σ. In this sense, a CP(α, k; Σ) process is not exchangeable for a fixed value of Σ and, in the absence of a natural projective system of arrays with rows and columns indexed by N, it is not possible to establish any self-consistency property for finite-dimensional distributions.

We show in corollary 2.7.3 that, for fixed values of Σ, the CP(α, k; Σ) process on P^(k)_[n] is exchangeable under certain strict conditions on Σ. However, the following equivariance property holds for any fixed Σ. For an n × n matrix X and permutation σ ∈ Sn, we write X^σ := σXσ⁻¹ to denote the image of X under permutation of its rows and columns by σ.
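The row-column scaling identity can be checked directly against the usual permutation expansion of the α-permanent, perα X = ∑_{σ∈Sn} α^{#cyc(σ)} ∏ᵢ Xi,σ(i). The brute-force sketch below is our own illustration with hypothetical entries, feasible only for small n; exact rational arithmetic avoids rounding issues.

```python
from itertools import permutations
from fractions import Fraction

def alpha_permanent(X, alpha):
    """per_alpha(X) = sum over permutations sigma of
    alpha^(#cycles of sigma) * prod_i X[i][sigma(i)]."""
    n = len(X)
    total = 0
    for sigma in permutations(range(n)):
        seen, cycles = set(), 0          # count the cycles of sigma
        for i in range(n):
            if i not in seen:
                cycles += 1
                j = i
                while j not in seen:
                    seen.add(j)
                    j = sigma[j]
        term = alpha ** cycles
        for i in range(n):
            term *= X[i][sigma[i]]
        total += term
    return total

# hypothetical similarity matrix and row/column scaling factors
Sigma = [[Fraction(2), Fraction(1), Fraction(1)],
         [Fraction(1), Fraction(3), Fraction(2)],
         [Fraction(1), Fraction(2), Fraction(4)]]
r = [Fraction(2), Fraction(3), Fraction(5)]
c = [Fraction(1), Fraction(1, 3), Fraction(1, 5)]
RSigmaC = [[r[i] * Sigma[i][j] * c[j] for j in range(3)] for i in range(3)]
alpha = Fraction(1, 2)
```

Each permutation term picks up every rᵢ and every cⱼ exactly once, which is why perα(RΣC) factors as perα Σ × ∏ᵢ rᵢcᵢ.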

Proposition 2.7.1. Let B, B′ ∈ P^(k)_[n] be such that the transition B ↦ B′ is governed by (2.24) for some choice of α > 0, k ≥ 1 and Σ ∈ M⁺ₙ. For any σ ∈ Sn, (2.24) satisfies

pn(B, B′; α, k, Σ) = pn(B^σ, B′^σ; α, k, Σ^σ).   (2.25)

Proof. For any matrices X, Y and σ ∈ Sn, perα X = perα(X^σ) and (X^σ) · (Y^σ) = (X · Y)^σ. It is now clear that (2.25) holds for (2.24).

If Σ ∈ M⁺ₙ is introduced as an exchangeable random matrix parameter, i.e. Σ ∼ ζ for some probability measure ζ on M⁺ₙ such that Σ ∼ σΣσ⁻¹ for every permutation σ ∈ Sn, then a sequence of partitions which is distributed according to CP(α, k; Σ) conditionally on Σ is unconditionally exchangeable, as the next proposition shows.

Proposition 2.7.2. Suppose Σ is a partially exchangeable matrix with distribution ζ(·) on M⁺ₙ, i.e. Σ^σ ∼ Σ ∼ ζ for every σ ∈ Sn. Further suppose that, conditional on Σ, the sequence B1, B2, . . . of partitions is distributed according to CP(α, k; Σ) (2.24) on P^(k)_[n] for some α > 0 and k ≥ 1. Then the unconditional distribution of (B1, B2, . . .) is exchangeable, i.e. (B1^σ, B2^σ, . . .) ∼ (B1, B2, . . .) for every σ ∈ Sn.

Proof. Let σ : [n] → [n] be any permutation of [n]. Let B := (B1, B2, . . .) be a sequence of partitions governed by the CP(α, k; Σ)-model with initial distribution ρn which is equivariant with respect to Σ. For m ≥ 1, write B[m] := (B1, . . . , Bm) to denote the restriction of the sequence to its first m elements. For fixed α > 0 and Σ ∈ M⁺ₙ, let

Pm(B; Σ) := P(B[m]; Σ) = ρn(B1) ∏ⱼ₌₁^{m−1} pn(Bj, Bj+1; Σ),

where pn(·, ·; Σ) is the transition probability (2.24) with parameter (α, k, Σ). Here we treat α as a fixed parameter and therefore suppress it in the notation. Write Pm(B) = ∫_{M⁺ₙ} Pm(B; Σ) ζ(dΣ) for the unconditional distribution of B[m]. Then

we have

Pm(B) := ∫_{M⁺ₙ} Pm(B; Σ) ζ(dΣ)
       = ∫_{M⁺ₙ} Pm(B^σ; Σ^σ) ζ(dΣ)   (2.26)
       = ∫_{M⁺ₙ} Pm(B^σ; Σ) ζσ⁻¹(dΣ)   (2.27)
       = ∫_{M⁺ₙ} Pm(B^σ; Σ) ζ(dΣ)   (2.28)
       = Pm(B^σ).   (2.29)

Here (2.26) follows by equivariance of (2.24) and (2.23) from proposition 2.7.1; (2.27) follows by the change of variables Σ^σ ↦ Σ^{σσ⁻¹}; (2.28) follows by partial exchangeability of Σ, so that ζ = ζσ⁻¹; and (2.29) follows by definition. This holds for all m ≥ 1. In addition, the unconditional law satisfies

P[(B1, . . . , Bm, Bm+1, . . . , Bm′) ∈ B1 × · · · × Bm × P^(k)_[n] × · · · × P^(k)_[n]] = P[(B1, . . . , Bm) ∈ B1 × · · · × Bm].

Corollary 2.7.3. Fix n, k ≥ 1, α > 0 and Σ ∈ M⁺ₙ such that Σii = a > 0 for all i and Σij = b ≥ 0 for i ≠ j. Then the CP(α, k; Σ) process is finitely exchangeable.

Σ as a partition

When Σ is restricted to the space of partitions, the expression (2.24) simplifies and more results are possible than for general matrices Σ. In particular, infinite exchangeability can be established if Σ represents an infinitely exchangeable random partition, and self-consistency of (2.24) can be shown for all fixed values of Σ ∈ P. Remark 2.6.2 points out that Σ ≡ 1 coincides with the standard CP(α, k) process. Equation (2.24) also coincides with the CP(α, k) process when Σ is the trivial

partition of [n] with one block. In this case, we write Σ = 1n. Another special case arises when Σ = In, the n × n identity matrix. In this case, (2.24) becomes

pn(B, B′; α, k, Σ = In) = k↓#B′/kⁿ,

the coupon-collector distribution shown in (2.17). The identity matrix corresponds to the partition of [n] into singletons, denoted 0n.

Theorem 2.7.4. Let Σ be a partition of N and write Σ[n] for the restriction of Σ to [n], regarded as a matrix. For k ≥ 1 and α > 0, the transition probabilities based on (α, Σ) are consistent. In particular, for every n ≥ 1, B, B′ ∈ P^(k)_[n] and B∗ ∈ D⁻¹n,n+1(B) we have

pn(B, B′; α, k, Σ[n]) = ∑_{B′′∈D⁻¹n,n+1(B′)} pn+1(B∗, B′′; α, k, Σ[n + 1]).   (2.30)

Proof. Let B, B′ ∈ P^(k)_[n] and B∗ ∈ D⁻¹n,n+1(B). The partition Σ[n + 1] is obtained from Σ[n] by adding the element labeled n + 1 to one of the blocks of Σ[n] or as a member of its own block. Likewise, a partition B′′ ∈ D⁻¹n,n+1(B′) is obtained by adding n + 1 to a block b ∈ B′ or to its own block.

Let b∗ denote the block of Σ[n] ∧ B = Σ[n] · B to which n + 1 is added to obtain Σ[n + 1] ∧ B∗. Then, conditional on Σ[n + 1] ∧ B∗, the probability that n + 1 is added to a block b′∗ of B′ is

P(n + 1 ↦ b′∗ | Σ[n + 1] ∧ B∗, b∗) = (α/k + #(b∗ ∩ b′∗))/(α + #b∗)  if b′∗ ∈ B′, and (k − #B′)(α/k)/(α + #b∗)  if b′∗ = ∅.

Based on these conditional probabilities, it is easy to see that

∑_{b′∈B′} (α/k + #(b∗ ∩ b′))/(α + #b∗) + (k − #B′)(α/k)/(α + #b∗) = 1,

and the finite-dimensional transition probabilities of the (α, k; Σ)-model satisfy (2.30) for any partition Σ ∈ P and every n ≥ 1.
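The displayed identity can be verified exactly with rational arithmetic. The helper below is our own sketch (all names ours), taking the block b∗ and the partition B′ as Python sets; the last entry of the returned list is the total mass assigned to opening one of the k − #B′ unused blocks.

```python
from fractions import Fraction

def placement_probabilities(alpha, k, b_star, B_prime):
    """Probabilities that n+1 joins each block b' of B', followed by the
    total probability of opening a new block; see the display above."""
    denom = alpha + len(b_star)
    probs = [(alpha / k + len(b_star & b)) / denom for b in B_prime]
    probs.append((k - len(B_prime)) * (alpha / k) / denom)
    return probs

alpha, k = Fraction(3, 2), 4
b_star = {1, 2, 5}                       # hypothetical block of Sigma[n] ^ B
B_prime = [{1, 2, 3}, {4, 5}, {6}]       # hypothetical next partition
probs = placement_probabilities(alpha, k, b_star, B_prime)
```

Since every element of b∗ falls in exactly one block of B′, the intersection counts sum to #b∗, which is what makes the total equal one.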

A consequence of proposition 2.7.2 and theorem 2.7.4 is that any mixture of CP(α, k; Σ), for Σ distributed as an infinitely exchangeable random partition of N, characterizes an infinitely exchangeable partition-valued process on P(k).

Theorem 2.7.5. Let Σ ∼ Π be an infinitely exchangeable random partition of N and, conditional on Σ, suppose B := (B1,B2,...) ∼ CP(α, k; Σ). Then the unconditional distribution of B is infinitely exchangeable.

2.8 Discussion

The cut-and-paste process shares many of the same properties as exchangeable EFC processes [19], with the exception that the paths of the CP(ν)-process are confined to P(k). While the EFC-process [19] has a natural interpretation as a model in certain physical sciences, the CP(ν)-process has no clear interpretation as a physical model. However, the (α, k) class of models could be useful as a statistical model for relationships among statistical units which are known to fall into one of k classes. In statistical work, it is important that any observation has positive probability under the specified model. The (α, k)-process assigns positive probability to all possible transitions, and so any observed sequence of partitions in P^(k)_[n] will have positive probability for any choice of α > 0. In addition, the model is exchangeable, consistent and reversible, particularly attractive mathematical properties which could have a natural interpretation in certain applications. The CP(α, k; Σ) process may be natural for modeling DNA sequences where Σ represents either a block structure or phylogenetic tree structure for the units under study.

CHAPTER 3
ANCESTRAL BRANCHING AND TREE-VALUED PROCESSES

In this chapter, we study a Markov process on the space of fragmentation trees whose transition probabilities are a product of transition probabilities on the space of partitions. The result is a family of transition probabilities on the space of fragmentation trees for which we can characterize an infinitely exchangeable process under general conditions.

3.1 Introduction

Stochastic processes on the state space of fragmentation trees have appeared in the literature related to inference of unknown phylogenetic trees as well as Markov chain Monte Carlo (MCMC) methods on the space of trees. For large n, the space of [n]-rooted trees is prohibitively large for an exhaustive search, and random methods and algorithms have been derived for this purpose. Felsenstein [55] catalogs various transition procedures on the space of trees which have arisen over the years. Subtree prune-and-regraft (SPR) is one procedure for generating a new phylogenetic tree T′ from another T by first pruning a subtree from T and then regrafting that subtree to a randomly chosen branch in the unpruned part of T. Processes based on SPR have been studied by Aldous [4] and Aldous and Pitman [6], who show that an SPR-generated tree-valued process is related to the Galton-Watson process. Evans and Winter [52] study a tree-valued process based on SPR which is reversible with respect to Aldous's continuum random tree (CRT) [2]. Diaconis and Holmes [44] studied a tree-valued Markov chain in relation to random matchings and MCMC.


We are interested in studying tree-valued processes in the context of modeling sequences of phylogenetic trees and partitions generated from genetic data. Some previous work which studies trees from this perspective appears in Aldous and Popovic [8], Aldous, Krikun and Popovic [5], and Donnelly and Kurtz [45, 46].

Let n ≥ 1 and let {pb(·, ·): b ⊆ [n]} be a collection of transition probabilities on Pb, set partitions of each b ⊆ [n], such that pb(π, 1b) < 1 for every π ∈ Pb, where

1b = {b} is the one block partition of b. For t ∈ Tn and b ⊆ [n], we write t|b for the reduced subtree of t restricted to b in the natural way, and Πt is the root partition of t; see section 1.10.3 for discussion of notation and terminology for fragmentation trees. The ancestral branching kernel based on {pb, b ⊆ [n]} assigns probability

Qn(t, t′) = ∏_{b∈t′: #b≥2} pb(Π_{t|b}, Π_{t′|b}) / [1 − pb(Π_{t|b}, 1b)]   (3.1)

to the transition t ↦ t′ for every pair t, t′ ∈ Tn. The transition probability in (3.1) is the product, over all non-singleton ancestors of t′, of transition probabilities between root partitions of reduced subtrees of t and t′. In general, for b ⊂ N, we write Qb(·, ·) for the distribution on Tb, b-rooted fragmentation trees, corresponding to (3.1). A main result is given by theorem 3.1.1.

Theorem 3.1.1. Let {pb : b ⊂ N} be a finitely exchangeable and consistent family of transition probabilities on partitions of b for each b ⊂ N. Then {Qb : b ⊂ N} based on {pb : b ⊂ N} is finitely exchangeable, consistent and characterizes an infinitely exchangeable transition measure Q on (T, σ(⋃_{n≥1} Tn)), the space of fragmentations of N with sigma field generated by the finite sets Tn for each n ≥ 1. Furthermore, if pb is exchangeable for every b with #b = 2, and if pb(π, 1b) = pa(ϕ∗(π), 1a) for all a, b with #a = #b and any injection ϕ∗ : Pb → Pa, then {Qb : b ⊂ N} infinitely exchangeable implies that {pb : b ⊂ N} is infinitely exchangeable.

We are able to show the inheritance of several properties from the associated transition probability on partitions. In section 3.5, we show specific connections for the ancestral branching process based on the transition probabilities of the cut-and-paste process with parameter ν.

3.2 Ancestral branching kernels

A Markov kernel on a measurable space (Ω, F) is a map P : Ω × F → R⁺ such that

• for each ω ∈ Ω, P (ω, ·) is a probability measure on (Ω, F), and

• for each B ∈ F, P(·, B) is measurable between F and B(R⁺), the Borel sigma field of R⁺.

Let A ⊂ N be a finite subset such that #A ≥ 2 and let {PS : S ⊆ A} be a collection of Markov kernels on PS, set partitions of S, such that PS(·, 1S) < 1 for all S ⊆ A. We define the Markov kernel QA(·, ·) on TA as in (3.1) by

QA(t, t′) = ∏_{b∈t′: #b≥2} pb(Π_{t|b}, Π_{t′|b}) / [1 − pb(Π_{t|b}, 1b)],   (3.2)

the product of Markov kernels on the root partitions of the reduced subtrees of all parents of t′, conditioned to be non-trivial, i.e. not the one block partition 1b. The form of (3.2) admits the recursive expression

QA(t, t′) = [pA(Πt, Πt′) / (1 − pA(Πt, 1A))] ∏_{b∈Πt′} Qb(t|b, t′|b),   (3.3)

which has an intuitive interpretation in terms of independent self-similar transitions on the space of reduced subtrees of the children of the root of t′. The reader familiar with the literature on fragmentation processes may draw parallels to a typical description of a fragmentation process in terms of successive partitioning of fragments; see e.g. [25, 80]. The specification in (3.2) is related to this description, but has the added feature of including a Markovian dependence on the previous state in a sequence of fragmentation trees.
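As a concrete illustration of the recursion (3.3), the toy implementation below (all names ours) encodes a fragmentation tree as nested tuples with integer leaves, and uses a uniform partition kernel as a stand-in for pb. Since every pb(·, ·) is then a genuine transition probability with pb(·, 1b) < 1, the resulting Q sums to one over the target trees.

```python
def labels(t):
    """Label set of a tree; leaves are ints, internal vertices are tuples."""
    return frozenset([t]) if isinstance(t, int) else frozenset().union(*map(labels, t))

def root_partition(t):
    return frozenset(labels(c) for c in t)

def restrict(t, b):
    """Reduced subtree on label set b: drop empty children and collapse
    internal vertices left with a single child."""
    if isinstance(t, int):
        return t
    kids = [restrict(c, b) for c in t if labels(c) & b]
    return kids[0] if len(kids) == 1 else tuple(kids)

def Q(t, t_new, p):
    """Ancestral branching probability via the recursion (3.3)."""
    A = labels(t_new)
    if len(A) < 2:
        return 1.0
    prob = p(root_partition(t), root_partition(t_new)) \
        / (1.0 - p(root_partition(t), frozenset([A])))
    for c in t_new:                      # recurse into non-singleton children
        b = labels(c)
        if len(b) >= 2:
            prob *= Q(restrict(t, b), c, p)
    return prob

BELL = [1, 1, 2, 5, 15, 52]              # Bell numbers B_0..B_5
def p_uniform(pi, pi_new):
    """Toy kernel: uniform over all partitions of the base set."""
    n = len(frozenset().union(*pi_new))
    return 1.0 / BELL[n]

# the four rooted fragmentation trees on {1, 2, 3}
trees = [(1, 2, 3), ((1, 2), 3), ((1, 3), 2), ((2, 3), 1)]
total = sum(Q((1, 2, 3), s, p_uniform) for s in trees)
```

With the uniform kernel each of the four target trees receives probability 1/4 from any starting tree, so the total mass is one, as (3.2) requires of a Markov kernel.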

Definition 3.2.1. A Markov kernel on TA of the form (3.2) is called an ancestral branching (AB) Markov kernel on TA.

Remark 3.2.2. The ancestral branching Markov kernel only requires the associated P-valued transition probabilities to be defined on PS\{1S} for each S ⊆ A. However, we always assume that we have a family of transition probabilities which is well-defined on the whole space PS and satisfies pS(·, 1S) < 1. This distinction becomes convenient when we consider infinitely exchangeable processes of AB type later on.

3.3 Exchangeable ancestral branching Markov kernels

A collection of Markov kernels Q := {QA(·, ·): A ⊂ N} on (TA,A ⊂ N) is finitely exchangeable if for each n ≥ 1, A, B ⊂ N with #A = #B = n, and t ∈ TA

QA(t, ·) = QB(ϕ∗(t), ϕ∗(·))   (3.4)

for every bijection ϕ : A → B, where ϕ∗ : TA → TB denotes the associated bijection TA → TB. If the kernels QA are finitely exchangeable, then there exists a canonical injection map σA : A → [n] such that QA(·, ·) = Qn(σA(·), σA(·)) =: QnσA(·, ·), the exchangeable transition probability function for n. We define the canonical injection A → [n] as follows. Suppose, without loss of generality, that A = {a1, . . . , an} with a1 < a2 < . . . < an. Then we define the canonical injection by σA : A → [n], ai ↦ i. For each A ⊆ N such that #A = n, QA(·, ·) = QnσA(·, ·) whenever {QA : A ⊂ N} is finitely exchangeable, as in (3.4). Therefore, for a finitely exchangeable family of Markovian transition probabilities, we need only specify a transition probability Qn(·, ·) on Tn for each n ≥ 1 and, in this case, we denote Qb ≡ Qnσb.

Theorem 3.3.1. Let n ≥ 1 and, for each A ⊂ N with #A < ∞, let QA(·, ·) be a branching Markov kernel on TA induced by the family {P∗S : S ⊆ A}, where P∗S := {pS(B, ·) : B ∈ PS} is a Markov kernel on PS. Also, assume that pA(π, 1A) = pB(ψ∗(π), 1B) for every finite A, B ⊂ N with #A = #B and bijection ψ∗ : PA → PB. Then the family {QA : A ⊂ N} is finitely exchangeable if and only if the restricted collection {P∗S : S ⊂ N} is finitely exchangeable.

Proof. Let A ⊂ N be a finite subset and P := {PS : S ⊆ A} be a family of Markov kernels on {PS : S ⊆ A}. From (3.2), the AB Markov kernel on TA based on P is

QA(T, T′) = ∏_{b∈T′: #b≥2} pb(Π_{T|b}, Π_{T′|b}) / [1 − pb(Π_{T|b}, 1b)].

For A, B ⊂ N with #A = #B and injection map ϕ : A → B with associated injection ϕ∗ : TA → TB, we also write ϕ∗ to denote the associated injection PA → PB, which should cause no confusion since it is clear from context to which we are referring.

For n = 2 and A, B ⊂ N such that #A = #B = 2, let ψ : A → B be an injective map with associated injection ψ∗ : PA → PB. In this case, #PA = #PB = 2, so that we can write A1, A2 for the elements of PA with #A1 = 1 and #A2 = 2. Likewise, we write B1 and B2 for the one and two block (respectively) elements of PB. Hence, ψ∗(Ai) = Bi for i = 1, 2. It is assumed that pB(ψ∗(π), 1B) = pA(π, 1A) for every π ∈ PA. Hence, pB(ψ∗(π), 1B) = pB(ψ∗(π), B1) = pA(π, A1), 1 − pB(ψ∗(π), B1) = pB(ψ∗(π), B2) = pA(π, A2) = 1 − pA(π, A1), and pB = pAψ∗⁻¹ for #A = #B = 2. So {pA(·, ·) : #A = 2} is exchangeable. Also, #TA = #TB = 1 implies that for t ∈ TA, QA(t, t) = QB(ψ∗(t), ψ∗(t)) = 1 trivially. And so, {QA(·, ·) : #A = 2} is exchangeable.

Now, fix n > 2 and suppose that, for any pair A, B ⊂ N with #A = #B ≤ n and any injective map ϕ : A → B, QB = QAϕ∗⁻¹ implies pB = pAϕ∗⁻¹ on PB. Consider A∗, B∗ ⊂ N with #A∗ = #B∗ = n + 1 and A∗ ⊃ A, B∗ ⊃ B, and let ψ : A∗ → B∗ be an injective map whose restriction to A corresponds to ϕ. Write ψ∗ : TA∗ → TB∗ for its associated injection.

Assume that QA∗ = QB∗ψ∗ and let t, t′ ∈ TA∗. Then

QA∗(t, t′) = [pA∗(Πt, Πt′)/(1 − pA∗(Πt, 1A∗))] ∏_{b∈Πt′: #b≥2} Qb(t|b, t′|b)   (3.5)
  = [pB∗(Πψ∗(t), Πψ∗(t′))/(1 − pB∗(Πψ∗(t), 1B∗))] ∏_{b∈Πψ∗(t′): #b≥2} Qb(ψ∗(t)|b, ψ∗(t′)|b)   (3.6)
  = [pB∗(Πψ∗(t), Πψ∗(t′))/(1 − pB∗(Πψ∗(t), 1B∗))] ∏_{b∈Πψ∗(t′): #b≥2} Qb(ψ∗(t|ψ⁻¹(b)), ψ∗(t′|ψ⁻¹(b)))   (3.7)
  = [pB∗(Πψ∗(t), Πψ∗(t′))/(1 − pB∗(Πψ∗(t), 1B∗))] ∏_{b∈Πt′: #b≥2} Qb(t|b, t′|b),   (3.8)

which implies that

pA∗(π, π′)/(1 − pA∗(π, 1A∗)) = pB∗(ψ∗(π), ψ∗(π′))/(1 − pB∗(ψ∗(π), 1B∗))

for all one-to-one functions ψ∗ : PA∗ → PB∗ and all π, π′ ∈ PA∗\{1A∗} =: P∗A∗; hence,

pA∗(π, π′) = pB∗(ψ∗(π), ψ∗(π′))

by the assumption that pB∗(ψ∗(·), 1B∗) = pA∗(·, 1A∗) for all A∗, B∗ such that #A∗ = #B∗ and every injective map ψ : A∗ → B∗.

This establishes finite exchangeability for {pA(·, ·):#A ≤ n + 1}. Induction implies this holds for all n ≥ 1 and hence also implies finite exchangeability of

{pA(·, ·) : A ⊂ N, #A < ∞}.

If {P∗S : S ⊂ N} is finitely exchangeable, then pA(π, 1A) = pB(ψ∗(π), 1B) for any A, B with #A = #B and any injection ψ∗ : PA → PB. The reverse implication is immediate.

3.4 Consistent ancestral branching kernels

Let A ⊆ N and, for ∅ ≠ C ⊂ B ⊆ A, let DC,B : TB → TC be the restriction map TB → TC (section 1.4.2). A family of Markov kernels {QS : S ⊆ A} defined on the projective system {TS : S ⊆ A} under restriction is consistent if, for all ∅ ≠ C ⊂ B ⊆ A and t ∈ TC,

QBD⁻¹C,B(t, ·) := QB(t∗, D⁻¹C,B(·)) = QC(t, ·)   (3.9)

for every choice of t∗ ∈ D⁻¹C,B(t).

Let S ⊂ N. Below we write Sˣ := S ∪ {x} for any x ∉ S, the set S augmented by x. We write eSˣ = {S, {x}}, the two block partition of Sˣ having {x} as one of its blocks.

Theorem 3.4.1. Let Q := {QS : S ⊆ A} be a family of ancestral branching Markov kernels based on a collection P := {PS : S ⊆ A}. The family Q is consistent if pS(π, ·) is consistent for all S ⊆ A and π ≠ 1S. Moreover, if, in addition,

(A) pSˣ(π∗, eSˣ) + pSˣ(π∗, 1Sˣ) = pS(π, 1S)

for every S ⊆ A, π ∈ PS and π∗ ∈ D⁻¹S,Sˣ(π), then Q consistent implies pS is consistent for all S ⊆ A.

Proof. Suppose Q is consistent and (A) holds for every S ⊆ A, π ∈ PS and π∗ ∈ D⁻¹S,Sˣ(π). We show that P is consistent by induction. For S ⊂ A such that #S = 2, TS contains exactly one element, which we denote tS. QS(tS, tS) = 1 and, for any Sˣ ⊆ A,

∑_{t′′∈D⁻¹S,Sˣ(tS)} QSˣ(t∗, t′′) = ∑_{t′′∈TSˣ} QSˣ(t∗, t′′) = ∑_{π∈PSˣ\{1Sˣ}} pSˣ(Πt∗, π)/(1 − pSˣ(Πt∗, 1Sˣ)) = 1

for all t∗ ∈ TSˣ, since QSˣ is a transition probability. By assumption (A),

1 − pSˣ(Πt∗, 1Sˣ) = ∑_{π∈D⁻¹S,Sˣ(ΠtS)} pSˣ(Πt∗, π) + [pS(ΠtS, 1S) − pSˣ(Πt∗, 1Sˣ)]
pS(ΠtS, ΠtS) + pS(ΠtS, 1S) = ∑_{π∈D⁻¹S,Sˣ(ΠtS)} pSˣ(Πt∗, π) + pS(ΠtS, 1S)
pS(ΠtS, ΠtS) = ∑_{π∈D⁻¹S,Sˣ(ΠtS)} pSˣ(Πt∗, π),

and pS is consistent with pSˣ for every Sˣ.

Now, for each S ⊂ A with #S = m < #A, assume that pT(·, ·) is consistent for all T ⊆ S. Let Sˣ = S ∪ {x} for some x ∈ A ∩ Sᶜ. Assume t, t′ ∈ TS and let t∗ ∈ D⁻¹S,Sˣ(t). For a partition π ∈ PS and b ∈ π, write bˣ ∈ π∗ ∈ D⁻¹S,Sˣ(π) to denote the block of π∗ which contains x, and write b∗ ∈ π to denote the block of π to which x is added to obtain bˣ ∈ π∗. Then

∑_{t′′∈D⁻¹S,Sˣ(t′)} QSˣ(t∗, t′′)   (3.10)
  = ∑_{π∗∈D⁻¹S,Sˣ(Πt′)} ∑_{t′′∈D⁻¹b∗,bˣ(t′|b∗)} [pSˣ(Πt∗, π∗)/(1 − pSˣ(Πt∗, 1Sˣ))] [∏_{b∈π∗: b≠bˣ} Qb(t|b, t′|b)] Qbˣ(t∗|bˣ, t′′)
    + [pSˣ(Πt∗, eSˣ)/(1 − pSˣ(Πt∗, 1Sˣ))] QS(t, t′)   (3.11)
  = ∑_{π∗∈D⁻¹S,Sˣ(Πt′)} [pSˣ(Πt∗, π∗)/(1 − pSˣ(Πt∗, 1Sˣ))] ∏_{b∈π∗: b≠bˣ} Qb(t|b, t′|b) ∑_{t′′∈D⁻¹b∗,bˣ(t′|b∗)} Qbˣ(t∗|bˣ, t′′)
    + [pSˣ(Πt∗, eSˣ)/(1 − pSˣ(Πt∗, 1Sˣ))] QS(t, t′)   (3.12)
  = ∑_{π∗∈D⁻¹S,Sˣ(Πt′)} [pSˣ(Πt∗, π∗)/(1 − pSˣ(Πt∗, 1Sˣ))] ∏_{b∈Πt′} Qb(t|b, t′|b) + [pSˣ(Πt∗, eSˣ)/(1 − pSˣ(Πt∗, 1Sˣ))] QS(t, t′)   (3.13)
  = QS(t, t′) [ pSˣ(Πt∗, eSˣ)/(1 − pSˣ(Πt∗, 1Sˣ)) + ∑_{π∗∈D⁻¹S,Sˣ(Πt′)} (pSˣ(Πt∗, π∗)/(1 − pSˣ(Πt∗, 1Sˣ))) · (1 − pS(Πt, 1S))/pS(Πt, Πt′) ].   (3.14)

Here, (3.11) follows by noticing that the restrictions t∗|b and t′|b are unaffected unless b = bˣ, and that the sum over t′′ ∈ D⁻¹S,Sˣ(t′) can be broken down into a sum over π∗ ∈ D⁻¹S,Sˣ(Πt′) and a sum over trees in the inverse image of the reduced subtree t′|b∗. Line (3.12) follows by bringing factors that do not depend on bˣ outside of the sum. Line (3.13) follows by the induction hypothesis that Qb is consistent for all b ⊆ S. And line (3.14) follows by the recursive expression (3.3).

Consistency requires that ∑_{t′′∈D⁻¹S,Sˣ(t′)} QSˣ(t∗, t′′) = QS(t, t′) for all t∗ ∈ D⁻¹S,Sˣ(t)

and hence

pSˣ(Πt∗, eSˣ)/(1 − pSˣ(Πt∗, 1Sˣ)) + ∑_{π∗∈D⁻¹S,Sˣ(Πt′)} (pSˣ(Πt∗, π∗)/(1 − pSˣ(Πt∗, 1Sˣ))) · (1 − pS(Πt, 1S))/pS(Πt, Πt′) = 1,

which is equivalent to

pSˣ(Πt∗, eSˣ) + (1 − pS(Πt, 1S))/pS(Πt, Πt′) ∑_{π∗∈D⁻¹S,Sˣ(Πt′)} pSˣ(Πt∗, π∗) = 1 − pSˣ(Πt∗, 1Sˣ).

Suppose that ∑_{π∗∈D⁻¹S,Sˣ(Πt′)} pSˣ(Πt∗, π∗) ≠ pS(Πt, Πt′); then

(1 − pS(Πt, 1S))/pS(Πt, Πt′) ∑_{π∗∈D⁻¹S,Sˣ(Πt′)} pSˣ(Πt∗, π∗) ≠ 1 − pS(Πt, 1S)

and

pSˣ(Πt∗, eSˣ)/(1 − pSˣ(Πt∗, 1Sˣ)) + ∑_{π∗∈D⁻¹S,Sˣ(Πt′)} (pSˣ(Πt∗, π∗)/(1 − pSˣ(Πt∗, 1Sˣ))) · (1 − pS(Πt, 1S))/pS(Πt, Πt′) ≠ 1

by the assumption that pT(·, ·) is consistent for all T ⊆ S and assumption (A). Hence, we conclude that consistency of QS and QSˣ, along with assumption (A), implies that pS(π, ·) and pSˣ(π∗, ·) are consistent for all π ∈ PS with #π > 1 and all π∗ ∈ D⁻¹S,Sˣ(π). Reversal of the above argument shows that consistency of pS(π, ·) for π with #π > 1 implies consistency of QS in (3.14).

Infinitely exchangeable kernels. From section 1.4.2, the collection (Tn, n ≥ 1) is a projective system under restriction, and so a collection {QA : A ⊂ N} of Markov kernels on TA for each finite A characterizes an infinitely exchangeable kernel Q on T if it satisfies the necessary finite exchangeability and self-consistency conditions. Putting together theorems 3.3.1 and 3.4.1, we arrive at a condition for the infinite exchangeability of Q in terms of associated partition-valued Markov kernels. In particular, if

P := {pS : S ⊂ N} is finitely exchangeable and consistent, and pS(·, 1S) < 1 for every S, then the collection Q := {QS : S ⊂ N} based on P is infinitely exchangeable and there is a unique transition measure Q∞ on (T, σ(⋃_{n≥1} Tn)) such that for every n ≥ 1 and t, t′ ∈ Tn,

Q∞(t∗, {t∞ ∈ T : t∞|[n] = t′}) = Qn(t, t′)

for all t∗ ∈ {t∞ ∈ T : t∞|[n] = t}. The coalescent process does not satisfy this condition because it becomes absorbed in the one-block state almost surely, but other known processes do, e.g. exchangeable fragmentation-coalescence (EFC) processes [19] and CP(ν) Markov processes [40]. We now turn our attention to the special properties of the ancestral branching kernel induced by the transition probabilities of the CP(ν)-process.

3.5 Cut-and-paste ancestral branching processes

For the rest of this chapter, the space T is assumed to be equipped with the sigma field σ(⋃_{n≥1} Tn), as in the previous sections.

For n ≥ 1, k ≥ 2 and ν a probability measure on R↓(k), let pνn(·, ·) denote the CP(ν) transition probability on P^(k)_[n] in (2.2) and qνn(·, ·) = 1 − pνn(·, ·) its complementary probability. The family {pνn(·, ·), n ≥ 1} is infinitely exchangeable on (P, σ(⋃_{n≥1} P^(k)_[n])) and so defines a unique transition probability pνA(·, ·) on P^(k)_A for each A ⊂ N by pνA(·, ·) := pν#A(·, ·) for #A < ∞, and pνA(·, ·) = pνN(·, ·) otherwise. Furthermore, if ν is non-degenerate at (1, 0, . . . , 0), then pνb(·, 1b) < 1 for all b ⊂ N with #b > 1; and so (3.2) is well-defined and the results of section 3.4 hold. In particular, the T-valued process induced by the finite-dimensional transition probabilities (2.2) is infinitely exchangeable.

3.5.1 Construction of the cut-and-paste ancestral branching Markov chain on T

We introduce a genealogical indexing system to label the elements of tA ∈ TA (chapter 1.2.1 of Bertoin [25]). We write

U := ⋃_{n=0}^{∞} Nⁿ

to denote the infinite set of all indices, with the convention that N⁰ = {∅}. For a fragmentation tree T, the nth generation of T is the collection of children

t ∈ T such that # anc(t) = n − 1. For each u = (u1, . . . , un) ≡ u1u2 ··· un ∈ U, n is the generation of u. Let u− := (u1, . . . , un−1) denote the parent of u and let ui := (u, i) := (u1, . . . , un, i) denote the ith child of u. The ith child of t ∈ T is defined as the ith element to appear in a list when the elements of frag(t), the children of t, are listed in order of their least element. A Markov chain on T (k) governed by (3.2) can be constructed by a genealogical branching procedure as follows. Let k ≥ 2 and ν be a probability measure on R↓(k) which is non-degenerate at (1, 0,..., 0). For T,T 0 ∈ T (k), the transition T 7→ T 0 occurs as follows. Generate u (k) (k) {B : u ∈ U} i.i.d. %ν partition sequences, where %ν := %ν ⊗ · · · ⊗ %ν is the k-fold product measure of paintboxes based on ν, and {σu : u ∈ U} i.i.d. k-tuples of i.i.d. uniform permutations of [k].

Genealogical Branching Procedure

(i) Put Π_{T'} = CP(Π_T, B^∅, σ^∅), the partition obtained from the column totals of Π_T ∩ (B^∅)^{σ^∅}, as shown in section 2.2;

(ii) for A^u ∈ T', put A^{uj} equal to the jth block of CP(Π_{T_{|A^u}}, B^u, σ^u), listed in order of least elements.

In other words, each B^u is an independent k-tuple of independent paintboxes based on ν, and we index this sequence just as we index the vertices of a tree using U. Likewise, each σ^u is an independent k-tuple (σ^u_1, …, σ^u_k) of i.i.d. uniform permutations of [k].

The next state T' is obtained from T by a sequential branching procedure which starts from the root and progressively branches the roots of the subtrees restricted to each child of T'. The children of T' are given by {A^u : u ∈ U} and, for each n ≥ 1, the restriction of T' to [n] is T'_{|[n]} = {A^u ∩ [n] : u ∈ U}, excluding the empty set. The genealogical branching procedure simultaneously generates sequences of trees on T_n for every n ≥ 1. It should be plain that this construction is equivalent to that in section 3.2, since it uses the matrix construction of the CP(ν) transition probabilities on P^(k)_A. The benefit of this construction is that it gives an explicit recipe which will be employed in the proofs of various properties of the CP(ν)-based ancestral branching process in later sections.
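As a concrete illustration of the matrix construction invoked above, the following sketch performs a single cut-and-paste step on a partition of a finite set. It is a minimal rendering under our own encoding (the function `cut_and_paste`, its arguments and the worked inputs are ours, not the thesis's): row i of the matrix is the ith block of the current partition cut by the blocks of the ith auxiliary paintbox partition, columns are reordered by σ_i, and the next partition is read off from the non-empty column totals.

```python
# One cut-and-paste step on a partition of a finite set: a sketch of
# the matrix construction of section 2.2 (names and encoding are ours).
def cut_and_paste(blocks, aux, sigmas, k):
    """blocks: list of at most k disjoint sets (the current partition);
    aux[i]: list of k sets partitioning the same ground set (paintbox i);
    sigmas[i]: a permutation of range(k) reordering the columns of row i."""
    rows = blocks + [set()] * (k - len(blocks))  # pad to k rows
    new_blocks = []
    for j in range(k):
        # column j total: union over rows of (block_i cut by a paintbox block)
        col = set()
        for i in range(k):
            col |= rows[i] & aux[i][sigmas[i][j]]
        if col:
            new_blocks.append(col)
    # list the new blocks in order of their least elements
    return sorted(new_blocks, key=min)

# A worked transition on [4] with k = 2 (inputs invented for illustration):
nxt = cut_and_paste(
    blocks=[{1, 2, 3}, {4}],
    aux=[[{1, 2}, {3, 4}], [{1, 2, 3, 4}, set()]],
    sigmas=[[0, 1], [0, 1]],
    k=2,
)
```

With these inputs the new partition is {{1, 2, 4}, {3}}; the column totals always partition the ground set, so the chain stays in the space of partitions with at most k blocks.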

Proposition 3.5.1. Let T ↦ T' ∈ T^(k) be a transition generated by the above genealogical branching procedure. For n ≥ 1, the finite-dimensional transition probability of the restricted transition T_{|[n]} ↦ T'_{|[n]} is

\[ Q_n(T, T') := \prod_{b \in T'} \frac{p^\nu_b(\Pi_{T_{|b}}, \Pi_{T'_{|b}})}{q^\nu_b(\Pi_{T_{|b}}, 1_b)}. \tag{3.15} \]

Proof. Write p_n(·, ·) ≡ p^ν_n(·, ·) and q_n(·, ·) ≡ q^ν_n(·, ·). For n ≥ 1, the branching of the root of T'_{|[n]} given T_{|[n]} is given by the children of A^{u(M)}_{|[n]}, where u(m) := (1, …, 1, 0, …) ∈ U (with m ones) and M is chosen to be the smallest m ≥ 0 such that A^{u(m)}_{|[n]} = 1_n and A^{u(m+1)}_{|[n]} ≠ 1_n. If M = 0, we put u(M) = ∅ and A^{u(M)}_{|[n]} is the root of T' restricted to [n]. The distribution of the branching of the root of T'_{|[n]} given T_{|[n]} obtained in this way is

\[ \sum_{i=0}^{\infty} p_n\bigl(\Pi_{T_{|[n]}}, \Pi_{T'_{|[n]}}\bigr)\, p_n\bigl(\Pi_{T_{|[n]}}, 1_n\bigr)^i = \frac{p_n\bigl(\Pi_{T_{|[n]}}, \Pi_{T'_{|[n]}}\bigr)}{q_n\bigl(\Pi_{T_{|[n]}}, 1_n\bigr)}. \]

By independence of the steps of the procedure, we can write the distribution of the transition T ↦ T' recursively, as in (3.3), by

\[ Q_n(T, T') = \frac{p_n(\Pi_T, \Pi_{T'})}{q_n(\Pi_T, 1_n)} \prod_{b \in \Pi_{T'}} Q_b\bigl(T_{|b}, T'_{|b}\bigr). \]

Iterating the above argument yields (3.15).

3.5.2 Equilibrium measure

The form of Q^ν_n(T, T') in (3.15) is a product of independent transition probabilities of the branching at the root in each of the subtrees of T'. It is known that, for ν non-degenerate at (1, 0, …, 0) ∈ R^↓(k), p^ν_n(·, ·) has a unique equilibrium distribution for each n ≥ 1 [40]. Since p^ν_n(B, B) > 0 for every n ≥ 1 and B ∈ P^(k)_[n], we have that Q^ν_n(t, t) > 0 for all t ∈ T^(k)_n, and so each Q^ν_n(·, ·) is aperiodic for non-degenerate ν. Irreducibility of Q^ν_n(·, ·) follows from the irreducibility of p^ν_n(·, ·) and the assignment of positive probability to B ↦ B for every B ∈ P^(k)_[n]. The following proposition is immediate.
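Numerically, uniqueness of the stationary measure for a finite, irreducible, aperiodic kernel can be seen by power iteration. The 3 × 3 matrix below is a made-up toy example, not a CP(ν) kernel; it merely shares the two features used in the argument, irreducibility and strictly positive diagonal entries.

```python
# Power iteration for the stationary distribution of a finite chain
# with positive diagonal entries (hence aperiodic), as in the text.
def stationary(P, iters=5000):
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Toy irreducible, aperiodic kernel (illustrative only).
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
pi = stationary(P)
```

For this toy birth-death chain, detailed balance gives a stationary vector proportional to (1, 2, 1), i.e. (0.25, 0.5, 0.25).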

Proposition 3.5.2. Let ν be a probability measure on R^↓(k) such that ν((1, 0, …, 0)) < 1 and let Q^ν_n(·, ·) be the CP(ν)-ancestral branching Markov kernel. Then, for each n ≥ 1, there exists a unique measure ρ^ν_n(·) on T^(k)_n which is stationary for Q^ν_n(·, ·).

The existence of ρ^ν_n and the finite exchangeability and consistency of Q^ν_n for each n ≥ 1 induce finite exchangeability and consistency for the collection (ρ^ν_n, n ≥ 1) of equilibrium measures. The proof of the following proposition is identical to the proof of theorem 2.2.6 with the obvious changes of notation.

Proposition 3.5.3. Let (Q_n(·, ·), n ≥ 1) be an infinitely exchangeable collection of ancestral branching Markov kernels (3.2) on (T_n, n ≥ 1) and suppose that, for each n ≥ 1, ρ_n(·) is the unique stationary distribution for Q_n(·, ·). Then the family (ρ_n(·), n ≥ 1) is infinitely exchangeable. Moreover, there exists a unique measure ρ on T such that

\[ \rho\bigl(\{T \in T : T_{|[n]} = T_n\}\bigr) = \rho_n(T_n). \]

Remark 3.5.4. The above results for the equilibrium measure ρ apply specifically to the CP(ν) ancestral branching process under the condition that ν is non-degenerate at (1, 0, …, 0) ∈ R^↓(k). The above proposition can be extended to general infinitely exchangeable ancestral branching Markov chains by replacing the non-degeneracy condition on ν with the condition that, for each n ≥ 1, p_n is irreducible and p_n(B, B) > 0 for every B ∈ P_[n].

Remark 3.5.5. There appears to be nothing specific to the ancestral branching process, or to tree-valued processes in general, in our proof of theorem 2.2.6. In particular, for a general infinitely exchangeable collection of transition measures (P_n(·, ·), n ≥ 1) on a countably indexed projective system such that each P_n has a unique stationary measure π_n, the collection (π_n, n ≥ 1) is infinitely exchangeable.

3.5.3 Continuous-time ancestral branching process

An infinitely exchangeable collection (Q_n, n ≥ 1) of ancestral branching transition probabilities in (3.2) can be embedded in continuous time in a straightforward way by defining the Markovian infinitesimal jump rates r_n(·, ·) on T_n by

\[ r_n(T, T') = \begin{cases} \lambda Q_n(T, T'), & T' \neq T \\ 0, & \text{otherwise,} \end{cases} \tag{3.16} \]

for some λ > 0. Since λ acts only as a time-scaling parameter, we can assume, without loss of generality, that λ = 1.
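In simulation terms, the embedding is straightforward. The sketch below uses a generic finite-state kernel P as a stand-in for Q_n (the state space, matrix and seed are invented): hold at the current state s for an exponential time with rate 1 − P(s, s), then jump according to the law of P conditioned on leaving s.

```python
import random

# Continuous-time embedding of a discrete kernel P (a stand-in for Q_n):
# hold at state s for an Exp(1 - P[s][s]) time, then jump to s' != s
# with probability P[s][s'] / (1 - P[s][s]).
def embed(P, s0, horizon, rng):
    t, s = 0.0, s0
    path = [(0.0, s0)]
    while True:
        rate = 1.0 - P[s][s]          # total jump rate out of s
        t += rng.expovariate(rate)
        if t >= horizon:
            return path
        r, acc = rng.random() * rate, 0.0
        for j, pj in enumerate(P[s]):
            if j == s:
                continue
            acc += pj
            if r <= acc:
                s = j
                break
        path.append((t, s))

# Toy kernel and seed, purely for illustration.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
path = embed(P, 0, 25.0, random.Random(1))
```

By construction the recorded path never jumps from a state to itself, matching the jump-chain description in the text.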

Definition 3.5.6. A process T := (T (t), t ≥ 0) is an ancestral branching Markov process if for each n ≥ 1, the restriction T|[n] := (T|[n](t), t ≥ 0) is a Markov process on Tn with infinitesimal transition rates rn(·, ·) in (3.16) for (Qn, n ≥ 1) as in (3.2).

A process on T whose finite-dimensional restrictions are governed by r_n can be constructed by running a Markov chain on T_n governed by (3.2) in which only transitions T ↦ T' for T' ≠ T are permitted, and adding a hold time at each state T which is exponentially distributed with mean 1/[1 − Q_n(T, T)]. The following proposition is a corollary of theorems 3.3.1 and 3.4.1.

Corollary 3.5.7. For an infinitely exchangeable family (Q_n(·, ·), n ≥ 1) of ancestral branching kernels (3.2), the collection (R_n, n ≥ 1) of finite-dimensional Q-matrices based on (Q_n, n ≥ 1), with entries R_n(T, T') = r_n(T, T') as in (3.16) for T' ≠ T and R_n(T, T) = Q_n(T, T) − 1, is infinitely exchangeable.

We write R^ν_n to denote the Q-matrix based on the transition kernel Q^ν_n in (3.15). The existence of a continuous-time process with embedded jump chain governed by (3.15) is clear from corollary 3.5.7 and the discussion at the end of section 3.4.

Theorem 3.5.8. There exists a continuous-time Markov process (T(t), t ≥ 0) on (T^(k), σ(⋃_n T^(k)_n)) governed by R^ν such that

\[ R^\nu\bigl(T^\infty, \{T'' \in T^{(k)} : T''_{|[n]} = T'\}\bigr) = R^\nu_n(T, T'), \]

for every T^∞ ∈ {T^* ∈ T^(k) : T^*_{|[n]} = T}.

Proof. Corollary 3.5.7 establishes that the finite-dimensional infinitesimal jump rates (r^ν_n, n ≥ 1) are finitely exchangeable and consistent. Kolmogorov's extension theorem implies the existence of R^ν with finite-dimensional restrictions given by (R^ν_n, n ≥ 1). Furthermore, for each n ≥ 1 and T ∈ T_n, −R^ν_n(T, T) = 1 − Q^ν_n(T, T) < 1 < ∞, so that the finite-dimensional paths are càdlàg for each n, which implies that the paths of (T(t), t ≥ 0) governed by R^ν are càdlàg.

The transition rates above are defined in terms of a collection of infinitely exchangeable transition probabilities (Q_n(·, ·), n ≥ 1). If Q has a unique equilibrium measure ρ, then so does its associated continuous-time process. We have the following corollary for the stationary measure of the continuous-time process.

Corollary 3.5.9. Let (T (t), t ≥ 0) be a continuous-time process governed by an infinitely exchangeable collection (Qn, n ≥ 1) of ancestral branching transition proba- bilities (3.2). If the characteristic measure Q on T has unique equilibrium measure ρ as in proposition 3.5.3, then (T (t), t ≥ 0) has unique equilibrium measure ρ.

We now restrict our attention to the CP(ν) subfamily of ancestral branching processes on T^(k) for fixed k ≥ 1. We index transition measures and stationary measures by ν, e.g. Q^ν_n, p^ν_n, etc., to make this explicit.

3.5.4 Poissonian construction

A consequence of the above continuous-time embedding and the alternative specification of the cut-and-paste ancestral branching algorithm given in section 3.5.1 is another construction via a Poisson point process.

Let P = {(t, B^u : u ∈ U)} ⊂ R^+ × ∏_{u∈U} [∏_{j=1}^k P^(k)] be a Poisson point process with intensity measure dt ⊗ ⨂_{u∈U} ϱ^(k)_ν, where ϱ^(k)_ν is the k-fold product measure ϱ_ν ⊗ ⋯ ⊗ ϱ_ν on ∏_{j=1}^k P^(k). For each (t, B^u) ∈ P, B^u := (B^u_1, …, B^u_k) ∈ ∏_{j=1}^k P^(k) is distributed as ϱ^(k)_ν and is labeled according to the genealogical index system of section 3.5.1. Construct a continuous-time CP(ν)-ancestral branching Markov process as follows. Let τ ∈ T^(k) be an infinitely exchangeable random fragmentation tree. For each

n ≥ 1, put T|[n](0) = τ|[n] and for t > 0

• if t is not an atom time for P , then T|[n](t) = T|[n](t−);

• if t is an atom time for P, so that (t, B^u : u ∈ U) ∈ P, generate σ := (σ^u : u ∈ U) ∈ ∏_{u∈U} [∏_{j=1}^k S_k], an i.i.d. collection of k-tuples of uniform permutations of [k]. Put T := T(t−) and T' equal to the tree constructed from T, {B^u : u ∈ U} and σ through the function CP(·, ·, ·) described in section 2.2. If T'_{|[n]} ≠ T_{|[n]}, put T_{|[n]}(t) = T'_{|[n]}; otherwise, put T_{|[n]}(t) = T_{|[n]}(t−).
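The recipe above can be mimicked for a generic state space: atom times of a unit-rate Poisson process drive the dynamics, and atoms whose candidate move does not change the restricted state are invisible at that level. The proposal mechanism below is a placeholder, not the CP map.

```python
import random

# Sketch of the Poissonian recipe: atom times of a unit-rate Poisson
# process; at each atom a candidate next state is drawn, and the state
# changes only when the candidate differs, so "silent" atoms leave the
# recorded path constant (as in the text).
def run(propose, x0, horizon, rng):
    t, x = 0.0, x0
    path = [(0.0, x0)]
    while True:
        t += rng.expovariate(1.0)      # inter-atom times are Exp(1)
        if t >= horizon:
            return path
        y = propose(x, rng)
        if y != x:                     # visible atom
            x = y
            path.append((t, x))

# Placeholder proposal: a uniform pick from a toy state space.
rng = random.Random(7)
path = run(lambda x, r: r.choice([0, 1, 2]), 0, 10.0, rng)
```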

Proposition 3.5.10. The above process T is a Markov process on T (k) with transition matrix Rν defined by theorem 3.5.8.

Proof. By the above construction, for every n ≥ 1 and t > 0, T_{|[n]}(t) evolves according to r^ν_n in (3.16), D_{m,n} T_{|[n]}(t) = T_{|[m]}(t) for all m ≤ n, and T_{|[p]}(t) ∈ D^{-1}_{n,p}(T_{|[n]}(t)) for all p > n. Hence, the restriction T_{|[n]} is an R^ν_n-governed Markov process for each n ≥ 1, and the result is clear by consistency of R^ν_n (corollary 3.5.7).

3.5.5 Feller process

In section 2.3.1, we showed that the cut-and-paste process with finite-dimensional Markovian jump rates corresponding to the transition probabilities in (2.1) is a Feller process. We now show that the ancestral branching Markov process on T^(k) which is associated to the CP(ν) Markov process is also Fellerian.

Define the metric d : T × T → R^+ by

\[ d(T, T') := 1 \big/ \max\{n \in \mathbb{N} : T_{|[n]} = T'_{|[n]}\}, \tag{3.17} \]

for every T, T' ∈ T, with the convention that 1/∞ = 0.

Proposition 3.5.11. d is an ultrametric on T . That is, for any T,T 0,T 00 ∈ T ,

\[ d(T, T') \leq \max\bigl(d(T, T''), d(T', T'')\bigr). \]

Proof. Positivity and symmetry are obvious. To see that the ultrametric inequality holds, let T, T', T'' ∈ T with d(T, T') = 1/a for some a ≥ 1. If d(T, T'') = 1/b ≥ 1/a, then the ultrametric inequality is trivially satisfied. If d(T, T'') = 1/b < 1/a, then b > a, so T_{|[a+1]} = T''_{|[a+1]}; also T_{|[a]} = T'_{|[a]} but T_{|[a+1]} ≠ T'_{|[a+1]} by assumption. Hence d(T', T'') = 1/a and the ultrametric inequality holds.
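Both the metric (3.17) and the ultrametric inequality are easy to experiment with once a tree is abstracted as its compatible sequence of restrictions. The tuple encoding below is ours (entry n − 1 stands for the restriction to [n], so genuine trees always agree at n = 1).

```python
# d(T, T') = 1 / max{n : the first n restrictions agree}, with the
# convention 1/inf = 0; trees are abstracted as equal-length tuples
# of restrictions (entry n-1 standing for the restriction to [n]).
def d(t, u):
    n = 0
    for a, b in zip(t, u):
        if a != b:
            break
        n += 1
    if n == len(t) == len(u):
        return 0.0                 # agree everywhere (1/inf = 0)
    return 1.0 / n                 # n >= 1 for genuine trees

# Toy "restriction sequences" standing in for three trees.
t = ('r', 'ab', 'abc')
u = ('r', 'ab', 'abx')
v = ('r', 'ay', 'ayz')
```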

Proposition 3.5.12. (T , d) is a compact space.

Proof. Let (T^1, T^2, …) be a sequence in T. Any element T ∈ T can be written as a compatible sequence of finite-dimensional restrictions, T := (T_{|[1]}, T_{|[2]}, …) =: (T_1, T_2, …). The set T_n is finite for each n, and so one can extract a convergent subsequence (T^(1), T^(2), …) of (T^1, T^2, …) by the diagonal procedure such that d(T^(i), T^(j)) ≤ 1/min{i, j} for all i, j.

Lemma 3.5.13. C_f := {f : T → R : ∃ n ∈ N s.t. d(T, T') ≤ 1/n ⇒ f(T) = f(T')} is dense in the space of continuous functions T → R under the metric ρ(f, f') := sup_{τ∈T} |f(τ) − f'(τ)|.

Proof. Let ϕ : T → R be a continuous function. Then for every ε > 0 there exists n(ε) ∈ N such that τ, τ' ∈ T satisfying d(τ, τ') ≤ 1/n(ε) implies |ϕ(τ) − ϕ(τ')| ≤ ε. For fixed ε > 0, let N = n(ε) and define f : T → R as follows. First, partition T into equivalence classes {τ ∈ T : τ_{|[N]} = t_{|[N]}} for each t ∈ T. For each equivalence class U, choose a representative element ũ ∈ U and put f(u) := ϕ(ũ) for all u ∈ U, so that f ∈ C_f. For any t ∈ T, let t̃ denote the representative of t obtained in this way. Hence, f(t) = f(t') = f(t̃) for all t, t' such that d(t, t') ≤ 1/N. Thus,

|f(τ) − ϕ(τ)| = |ϕ(τ̃) − ϕ(τ)| ≤ ε for all τ ∈ T, by continuity of ϕ, and

ρ(f, ϕ) = sup_τ |f(τ) − ϕ(τ)| ≤ ε,

which establishes the density of C_f.

Let P_t be the semigroup of a CP(ν)-ancestral branching process T(·), i.e. for any continuous ϕ : T^(k) → R,

\[ P_t\varphi(\tau) := \mathbb{E}_\tau\,\varphi(T(t)), \]

the expectation of ϕ(T(t)) given T(0) = τ.

Corollary 3.5.14. A CP(ν)-ancestral branching Markov process has the Feller prop- erty, i.e.

• for every continuous function ϕ : T^(k) → R and τ ∈ T^(k), one has lim_{t↓0} P_tϕ(τ) = ϕ(τ); and

• for all t > 0, τ ↦ P_tϕ(τ) is continuous.

Proof. The proof follows the same line of reasoning as corollary 2.3.6. Let ϕ be a continuous function T^(k) → R. For g ∈ C_f, lim_{t↓0} P_t g(τ) = g(τ) is clear since the first jump time of T(·) is exponential with finite mean. Denseness of C_f establishes the first point.

For the second point, let n ≥ 1 and τ, τ' ∈ T^(k) be such that d(τ, τ') ≤ 1/n, i.e. τ_{|[n]} = τ'_{|[n]}. Use the same Poisson point process P, as in section 3.5.4, to construct T(·) and T'(·) such that T(0) = τ and T'(0) = τ'. By construction, T_{|[n]} = T'_{|[n]} and d(T(t), T'(t)) ≤ 1/n for all t ≥ 0. Hence, for any continuous ϕ, τ ↦ P_tϕ(τ) is continuous.

By corollary 3.5.14, we can characterize the CP(ν)-ancestral branching Markov process (T(t), t ≥ 0) with finite-dimensional rates (r^ν_n(·, ·), n ≥ 1) by its infinitesimal generator G, given by

\[ G(f)(\tau) = \int_{T^{(k)}} \bigl(f(\tau') - f(\tau)\bigr)\, R^\nu(\tau, d\tau') \]

for every f ∈ C_f, where R^ν is from theorem 3.5.8.

3.6 Mass fragmentations

A mass fragmentation of x ∈ R^+ is a collection M_x of masses such that

(i) x ∈ Mx and

(ii) there are m_1, …, m_k ∈ M_x such that ∑_{i=1}^k m_i ≤ x and

\[ M_x = \{x\} \cup M_{m_1} \cup \cdots \cup M_{m_k}. \]

We write M_x to denote the space of mass fragmentations of x. Essentially, a mass fragmentation of x is a fragmentation tree whose vertices are labeled by masses such that the children of a vertex comprise a ranked-mass partition of its parent vertex. The case where the children {m_1, …, m_k} of a vertex m satisfy ∑_{i=1}^k m_i < m is called a dissipative mass fragmentation. We are interested in conservative mass fragmentations, which have the property that the children {m_1, …, m_k} of every vertex m ∈ M_x satisfy ∑_{i=1}^k m_i = m. It is plain that M_x is isomorphic to M_1 by scaling, i.e. M_x = xM_1, and so it is sufficient to study M_1. See Bertoin [25] for a study of Markov processes on M_1 called fragmentation chains. We construct a Markov process on M_1 which corresponds to the associated mass fragmentation-valued process of the CP(ν)-ancestral branching Markov process on T^(k).
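The conservative property is a purely local constraint and can be verified recursively. A sketch with a nested-tuple encoding of our own (node = (mass, list of children)):

```python
# Check the conservative property: at every internal node the children's
# masses sum to the parent's mass (nested tuples: node = (mass, children)).
def is_conservative(node, tol=1e-12):
    mass, children = node
    if not children:                   # a leaf imposes no constraint
        return True
    return (abs(sum(c[0] for c in children) - mass) <= tol
            and all(is_conservative(c, tol) for c in children))

# Two toy fragmentations of 1 (numbers invented for illustration).
conservative = (1.0, [(0.5, [(0.3, []), (0.2, [])]), (0.5, [])])
dissipative = (1.0, [(0.6, []), (0.3, [])])   # children sum to 0.9 < 1
```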

Recall from definition 1.8.2 in section 1.8.1 that a partition B = {B_1, B_2, …} ∈ P is said to possess asymptotic frequency ‖B‖ if each of its blocks has an asymptotic frequency, and we write ‖B‖ := (‖B_1‖, …)^↓ ∈ R^↓, the decreasing rearrangement of block frequencies of B. According to Kingman's correspondence (theorems 1.8.1 and 1.8.3), any infinitely exchangeable partition B of N possesses asymptotic frequencies which are distributed according to ν, where ν is the unique measure on R^↓ such that B ∼ ϱ_ν.

3.6.1 Associated mass fragmentation process

Fix k ≥ 2 and let ν be a probability measure on R^↓(k). Let M^(k)_1 := {μ ∈ M_1 : # frag(A) ≤ k for every A ∈ μ} be the subspace of conservative mass fragmentations of 1 such that each A ∈ μ ∈ M^(k)_1 has at most k children.

Construct a Markov chain on M^(k)_1 as follows. For μ ∈ M^(k)_1, the transition μ ↦ μ̃ ∈ M^(k)_1 is generated by an i.i.d. collection S := {s^u : u ∈ U} of k-tuples of mass partitions, i.e. each s^u := (s^u_1, …, s^u_k) ∈ ∏_{i=1}^k R^↓(k) is an i.i.d. collection of mass partitions distributed according to ν, with s^w independent of s^v for all w ≠ v, together with an i.i.d. collection Σ := {σ^u : u ∈ U} of k-tuples of i.i.d. uniform permutations of [k].

(i) Write μ := {μ^u : u ∈ U} and μ̃ := {μ̃^u : u ∈ U}.

(ii) Put μ̃^∅ = 1, the root of μ̃.

(iii) Given μ̃^u ∈ μ̃, put μ̃^{uj} equal to the jth largest column total of the k × k matrix whose (i, m) entry is μ̃^u μ^i s^u_{i, σ^u_i(m)}:

\[
\begin{pmatrix}
\tilde\mu^u \mu^1 s^u_{1,\sigma^u_1(1)} & \tilde\mu^u \mu^1 s^u_{1,\sigma^u_1(2)} & \cdots & \tilde\mu^u \mu^1 s^u_{1,\sigma^u_1(k)} \\
\tilde\mu^u \mu^2 s^u_{2,\sigma^u_2(1)} & \tilde\mu^u \mu^2 s^u_{2,\sigma^u_2(2)} & \cdots & \tilde\mu^u \mu^2 s^u_{2,\sigma^u_2(k)} \\
\vdots & \vdots & \ddots & \vdots \\
\tilde\mu^u \mu^k s^u_{k,\sigma^u_k(1)} & \tilde\mu^u \mu^k s^u_{k,\sigma^u_k(2)} & \cdots & \tilde\mu^u \mu^k s^u_{k,\sigma^u_k(k)}
\end{pmatrix}
\]

i.e. μ̃^{uj} := (∑_{i=1}^k μ̃^u μ^i s^u_{i,σ^u_i(m)}, m = 1, …, k)^↓_j, where μ^1, …, μ^k correspond to the mass fragmentation of the root of μ.
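For a single node, step (iii) amounts to ranking the column totals of the k × k matrix. The sketch below does exactly that; all numerical inputs are invented for illustration.

```python
# Children masses as ranked column totals of the k x k matrix with
# (i, m) entry parent * mu[i] * s[i][sigma[i][m]] (step (iii) above).
def split_masses(parent, mu, s, sigma, k):
    cols = [sum(parent * mu[i] * s[i][sigma[i][m]] for i in range(k))
            for m in range(k)]
    return sorted(cols, reverse=True)   # jth entry = jth largest total

# Invented inputs: two mass partitions (rows sum to 1) and mu weights.
children = split_masses(
    parent=1.0,
    mu=[0.5, 0.5],
    s=[[0.8, 0.2], [0.6, 0.4]],
    sigma=[[0, 1], [0, 1]],
    k=2,
)
```

Since each row of s sums to 1 and the μ-weights sum to 1 here, the children masses sum back to the parent mass, so the step is conservative.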

Definition 3.6.1. For a fragmentation tree T ∈ T , we write M(T ) to denote the associated mass fragmentation of T , i.e. the mass fragmentation of 1 obtained by replacing each child of T by its asymptotic frequency, if it exists. If M(T ) does not exist we put M(T ) = ∂, an extra point added to M1.

The map M : T → M_1 ∪ {∂} such that T ↦ M(T) is a natural measurable mapping. For the remainder of this chapter, we equip M_1 with the sigma field σ(M) generated by {m ∈ M_1} ∪ {∂}, which makes the map M measurable between σ(⋃_n T^(k)_n) and σ(M_1).

Theorem 3.6.2. Let T := (T_n, n ≥ 0) be a CP(ν)-ancestral branching Markov chain with transition measure Q^ν(·, ·) on T^(k) and with initial distribution τ, which is some infinitely exchangeable measure on T. Let μ := (μ_n, n ≥ 1) be the Markov chain on M^(k)_1 generated from the above procedure. Then M(T) =_L μ. Moreover, the transition measure λ^ν(·, ·), for μ ∈ M_1 and C ∈ σ(M), is

\[ \lambda^\nu(\mu, C) = Q^\nu\bigl(T_\mu, M^{-1}(C)\bigr), \]

where T_μ is any element of M^{-1}(μ) := {T ∈ T^(k) : M(T) = μ}.

Proof. Fix k ≥ 2 and ν a probability measure on R^↓(k) such that ν((1, 0, …, 0)) < 1. For T ∼ Q^ν(·, ·) with initial distribution T_0 ∼ τ, the set of children {t_1, …, t_m} of t forms an exchangeable partition of t ⊂ N for every n ≥ 1 and t ∈ T_n, and so possesses asymptotic frequency ‖t‖ almost surely by Kingman's correspondence.

The construction of the Markov chain T with transition measure Q^ν(·, ·) in section 3.5.1 can also be specified as follows. Let S := {s^u : u ∈ U} be the collection of mass partitions in the construction at the beginning of this section. Given S, generate B := {B^u : u ∈ U} ∈ ∏_{u∈U} [∏_{i=1}^k P^(k)] by letting B^u := (B^u_1, …, B^u_k) with B^u_j ∼ ϱ_{s^u_j}, independently of all other B^v_i. Constructed in this way, {B^u : u ∈ U} is a collection of independent ϱ^(k)_ν partitions whose asymptotic frequencies satisfy ‖B^u_j‖ = s^u_j almost surely. Furthermore, the unconditional distribution of each B^u is ϱ^(k)_ν.

Next, we let Σ := {σ^u : u ∈ U} be a collection of i.i.d. k-tuples of i.i.d. uniform permutations of [k] and generate transitions of T from the construction of section 3.5.1 based on Σ and {B^u : u ∈ U}. Using the same Σ and S, generate a Markov chain μ on M_1 as above. Then T is a Markov chain with transition measure Q^ν(·, ·) on (T^(k), σ(⋃_{n≥1} T^(k)_n)) and, furthermore, the associated mass fragmentation chain M(T) := (M(T_n), n ≥ 1) is equal to μ almost surely. By the construction of transitions on M_1 at the beginning of this section, it is clear that μ is a Markov chain. Hence, M(T) is a Markov chain, and the result of Burke and Rosenblatt [36] states that it is necessary that the transition measure of M(T) satisfies

\[ Q^\nu M^{-1}(m, C) = \int_{M^{-1}(C)} Q^\nu(T_m, dt) \]

for all T_m ∈ M^{-1}(m) := {T ∈ T^(k) : M(T) = m} and C ∈ σ(M). Finally, since M(T) = μ almost surely and M is measurable, the transition measure λ^ν of μ on M_1 satisfies λ^ν = Q^ν M^{-1}.

Corollary 3.6.3. The associated mass fragmentation process M(T) of a CP(ν) an- cestral branching process on T (k) exists almost surely.

3.6.2 Equilibrium measure

As in section 3.5.2, suppose ν is non-degenerate at (1, 0, …, 0) ∈ R^↓(k). Proposition 3.5.3 states that a Markov chain T := (T_n, n ≥ 1) governed by Q^ν(·, ·) possesses a unique equilibrium measure ρ^ν(·). The following theorem follows immediately from this fact and from theorem 3.6.2.

Theorem 3.6.4. Let ν be a probability measure on R^↓(k) such that ν((1, 0, …, 0)) < 1. The mass fragmentation chain μ := (μ_n, n ≥ 1) on M_1 governed by Q^ν M^{-1}(·, ·) possesses a unique stationary measure ζ^ν(·). Moreover, for μ ∈ M^(k)_1,

\[ \zeta^\nu(\mu) = \rho^\nu\bigl(M^{-1}(\mu)\bigr), \]

where ρ^ν(·) is the unique equilibrium measure of Q^ν(·, ·) on T^(k) from proposition 3.5.3.

Proof. Let μ be a Markov chain on M_1 with transition measure λ^ν(·, ·) governed by the transition procedure at the beginning of section 3.6. By theorem 3.6.2, λ^ν ≡ Q^ν M^{-1}, where Q^ν(·, ·) is the transition measure of the CP(ν)-ancestral branching Markov chain on T^(k) with unique equilibrium measure ρ^ν(·) from proposition 3.5.3. It is shown in theorem 3.6.2 that μ is equal in distribution to the associated mass fragmentation chain of a Markov chain on T^(k) governed by Q^ν(·, ·). Hence, for B ∈ σ(⋃_n T^(k)_n),

\[ \rho^\nu(B) = \int_{T^{(k)}} Q^\nu(\tau, B)\, \rho^\nu(d\tau), \]

and for C ∈ σ(M),

\begin{align*}
\rho^\nu M^{-1}(C) &= \rho^\nu\bigl[M^{-1}(C)\bigr] \\
&= \int_{T^{(k)}} Q^\nu\bigl(\tau, M^{-1}(C)\bigr)\, \rho^\nu(d\tau) \\
&= \int_{M_1} Q^\nu M^{-1}(\mu, C)\, \rho^\nu M^{-1}(d\mu) \\
&= \int_{M_1} \lambda^\nu(\mu, C)\, \rho^\nu M^{-1}(d\mu),
\end{align*}

which shows that ζ^ν := ρ^ν M^{-1} is stationary for λ^ν. Uniqueness of ρ^ν implies uniqueness of ζ^ν.

3.6.3 Poissonian construction

Just as the CP(ν)-ancestral branching process on T^(k) admits a Poissonian construction of the T^(k)-valued process in continuous time (section 3.5.4), so does its associated mass fragmentation-valued process.

Let ν be a probability measure on R^↓(k). Let S = {(t, s^u) : u ∈ U} ⊂ R^+ × ∏_{u∈U} [∏_{i=1}^k R^↓(k)] be a Poisson point process with intensity dt ⊗ ⨂_{u∈U} ν^(k), where ν^(k) := ν ⊗ ⋯ ⊗ ν is the k-fold product measure on ∏_{i=1}^k R^↓(k) and s^u := (s^u_1, …, s^u_k) ∈ ∏_{i=1}^k R^↓(k) for each u ∈ U.

Construct a Markov process μ := (μ(t), t ≥ 0) in continuous time on M_1 as follows. Let μ_0 ∈ M_1 be a random mass fragmentation. Put μ(0) = μ_0 and

• if t is not an atom time for S, µ(t) = µ(t−);

• if t is an atom time for S, generate Σ_t := {σ^u : u ∈ U}, where σ^v and σ^w are independent for all v ≠ w and σ^u := (σ^u_1, …, σ^u_k) is an i.i.d. sequence of uniform permutations of [k] for each u ∈ U. Given (t, s^u) ∈ S, σ^u and μ(t−) = {μ^u : u ∈ U}, put μ(t) = {μ̃^u : u ∈ U}, where

1) μ̃^∅ = 1, and

2) given μ̃^u, put μ̃^{uj} equal to the jth largest column total of the k × k matrix whose (i, m) entry is μ̃^u μ^i s^u_{i, σ^u_i(m)}, as in step (iii) of the discrete-time construction, i.e.

\[ \tilde\mu^{uj} := \Bigl(\sum_{i=1}^k \tilde\mu^u \mu^i s^u_{i,\sigma^u_i(m)},\ m = 1, \ldots, k\Bigr)^{\downarrow}_j. \]

Theorem 3.6.5. Let T := (T(t), t ≥ 0) be a CP(ν)-ancestral branching Markov process from section 3.5.3 and let X := (X(t), t ≥ 0) be the Markov process on M_1 generated from the above Poisson point process. Then M(T) =_L X.

Proof. Let k ∈ N and let ν be a measure on R^↓(k). Let S = {(t, s^u) : u ∈ U} ⊂ R^+ × ∏_{u∈U} [∏_{i=1}^k R^↓(k)] be a Poisson point process with intensity dt ⊗ ⨂_{u∈U} ν^(k), as above, and let X := (X(t), t ≥ 0) be the process on M_1 constructed above. Given S, generate P := {(t, B^u) : u ∈ U} ⊂ R^+ × ∏_{u∈U} [∏_{i=1}^k P^(k)], where for each (t, s^u : u ∈ U) ∈ S we let B^u := (B^u_1, …, B^u_k) ∈ ∏_{i=1}^k P^(k) be a k-tuple of partitions such that B^u_i ∼ ϱ_{s^u_i} for each i = 1, …, k, with all components independent. Thus, P is a Poisson point process on R^+ × ∏_{u∈U} [∏_{i=1}^k P^(k)] with intensity measure dt ⊗ ⨂_{u∈U} ϱ^(k)_ν. Given P and S, generate Σ := {σ^u : u ∈ U} independently of P and S such that σ^v and σ^w are independent for all v ≠ w and each σ^u = (σ^u_1, …, σ^u_k) is an i.i.d. collection of uniform permutations of [k].

Let T := (T(t), t ≥ 0) be the process on T^(k) constructed from Σ and P, as in section 3.5.4, so that T is a CP(ν)-ancestral branching Markov process. Likewise, let X := (X(t), t ≥ 0) be the process on M_1 constructed from Σ and S as above. Now for any atom time t ≥ 0, let T(t−) = τ and T(t) = τ̃. From section 3.5.1, let u ∈ U and τ̃^{uj} be the jth block of CP(Π_{τ_{|τ^u}}, B^u, σ^u). Since τ and τ̃ are infinitely exchangeable and #τ^u = ∞ almost surely, we have that Π_{τ_{|τ^u}} = (Π_τ)_{|τ^u} almost surely, and hence (Π_τ)_{|τ^u} ≠ 1_{τ^u} almost surely. Therefore, we have

\[ \tilde\tau^{uj} = \tilde\tau^u \cap \Bigl(\bigcup_{i=1}^k \bigl(\tau^i \cap B^u_{i,\sigma^u_i(j)}\bigr)\Bigr) \]

for each u ∈ U and j = 1, . . . , k which has asymptotic frequency

\[ \|\tilde\tau^{uj}\| = \sum_{i=1}^k \|\tilde\tau^u\|\, \|\tau^i\|\, \|B^u_{i,\sigma^u_i(j)}\| = \sum_{i=1}^k \tilde\mu^u \mu^i s^u_{i,\sigma^u_i(j)} \quad \text{a.s.} \]

Hence we have that μ = M(T) almost surely; and so, μ =_L M(T).

Corollary 3.6.6. The process M(T) := (M(T (t)), t ≥ 0) exists almost surely.

3.7 Weighted trees

A weighted tree is a fragmentation tree with edge lengths. We write T̄ := T × R^{+U} to denote the space of weighted trees; i.e. each T̄ ∈ T̄ is a pair (T, {t_b : b ∈ T}) consisting of a fragmentation tree and a set of edge lengths corresponding to each edge of the tree, with the convention that t_b ≡ 0 if b ∉ T.

Consider T̄ = (T, {t_b : b ∈ T}) and T̄^* = (T^*, {t^*_b : b ∈ T^*}) such that T^* ∈ D^{-1}_{n,n+1}(T). Then T^* has a vertex A ∪ {n+1} with children {n+1} and A ∈ T. This is the branch of T on which the leaf {n+1} is attached. Denote this vertex by A^* ∈ T^* and require that t^*_b = t_b for b ∉ {A^*, A} and t^*_{A^*} + t^*_A = t_A. We define D̄_{n,n+1} : T̄_{n+1} → T̄_n to be the projection mapping T̄^* := (T^*, {t^*_b : b ∈ T^*}) ↦ (T^*_{|[n]}, {t_b := t^*_{b∪{n+1}} + t^*_b, b ∈ T^*_{|[n]}}). In general, for 1 ≤ m ≤ n, we define D̄_{m,n} := D̄_{m,m+1} ∘ ⋯ ∘ D̄_{n−1,n} in the usual way by composition. We denote by D̄^{-1}_{n,n+1}(T̄) the set of T̄^* satisfying these conditions. A permutation σ ∈ S_n maps T̄ := (T, {t_b : b ∈ T}) ↦ (T^σ, {t_{σ(b)} : b ∈ T}).

For each n ≥ 1, let σ(T̄_n) denote the sigma field which makes the maps D̄_{m,n} and σ ∈ S_n measurable. The measurable space (T̄, σ(⋃_{n≥1} σ(T̄_n))) is the projective limit of the finite-dimensional measurable spaces (T̄_n, σ(T̄_n)) under the projection maps (D̄_{m,n}, m ≤ n), the permutation maps (σ ∈ S_n, n ≥ 1) and their composite mappings. For notational convenience, we write T̄_n or T̄ to mean the measurable spaces equipped with their appropriate sigma fields.

Remark 3.7.1. We prefer the term weighted tree to the alternative fragmentation process, which is generally thought of as a non-increasing sequence of random partitions of N, B := (B(t), t ≥ 0), indexed by t ∈ R^+, i.e. B(t) ≤ B(s) for all t ≥ s. By referring to these objects as weighted trees, we hope to emphasize T̄ ∈ T̄ as an object, rather than a process. In this way, our construction of a Markov process on T̄^(k) is naturally interpreted as a process on this space of objects with only one temporal component.

In section 3.5 we introduced a family of finite-dimensional transition probabilities Q^ν_n(T, ·) for each k ≥ 2, T ∈ T^(k)_n and ν a probability measure on R^↓(k). The results of section 3.5 establish the existence of a transition measure Q^ν(T, ·) on T^(k) with infinitely exchangeable stationary measure ρ^ν(·).

We now construct a transition procedure on T̄^(k). Let T̄ = (T, {t_b : b ∈ T}) ∈ T̄^(k)_n and generate T̄' = (T', {t'_b : b ∈ T'}) ∈ T̄^(k)_n as follows.

1. Generate T' from Q^ν_n(T, ·);

2. given T', generate each t'_b from an exponential distribution with rate parameter β q^ν_b(Π_{T_{|b}}, 1_b) := β × (1 − p^ν_b(Π_{T_{|b}}, 1_b)) (i.e. mean 1/[β q^ν_b(Π_{T_{|b}}, 1_b)]), independently for each b ∈ T', for some β > 0.

Henceforth, we shall assume β = 1 without loss of generality.
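Step 2 is easy to sketch in isolation: given the branches of the new tree and the rate q_b attached to each, the edge lengths are independent exponentials. The dictionary encoding and the numerical rates below are ours, invented for illustration.

```python
import random

# Independent exponential edge lengths, one per branch, with rate q_b
# attached to branch b (step 2 above, with beta = 1; rates invented).
def edge_lengths(rates, rng):
    return {b: rng.expovariate(q) for b, q in rates.items()}

rng = random.Random(3)
lengths = edge_lengths({frozenset({1, 2, 3}): 0.4,
                        frozenset({1, 2}): 0.7,
                        frozenset({3}): 0.7}, rng)
```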

Proposition 3.7.2. The above procedure yields finite-dimensional transition densities on T̄^(k)_n for each n ≥ 1, given by

\[ \bar Q^\nu_n(\bar T, \bar T') = \prod_{b \in T'} p^\nu_b\bigl(\Pi_{T_{|b}}, \Pi_{T'_{|b}}\bigr)\, e^{-t'_b\, q^\nu_b(\Pi_{T_{|b}},\, 1_b)}\, dt'_b. \tag{3.18} \]

The purpose of choosing each waiting time t'_b to be an exponential random variable with parameter q^ν_b(Π_{T_{|b}}, 1_b) is to ensure the consistency of the process under restriction; see proposition 3.7.3. Consistency requires that, for a tree T̄'' ∼ Q̄^ν_{n+1}(T̄^*, ·), the restriction T̄' := T̄''_{|[n]} is distributed as Q̄^ν_n(T̄^*_{|[n]}, ·).

Proposition 3.7.3. Let ν be a probability measure on R^↓(k), n ≥ 1, T̄^* ∈ T̄^(k)_{n+1} and T̄'' ∼ Q̄^ν_{n+1}(T̄^*, ·). Then the restriction T̄' := T̄''_{|[n]} is distributed as Q̄^ν_n(T̄^*_{|[n]}, ·).

Proof. Let T̄^* = (T^*, {t^*_b : b ∈ T^*}) ∈ T̄^(k)_{n+1} and T̄'' = (T'', {t''_b : b ∈ T''}) ∈ T̄^(k)_{n+1}. By construction of Q̄^ν_n(·, ·) on T̄^(k)_n for each n ≥ 1, we have that T''_{|[n]} ∼ Q^ν_n(T^*_{|[n]}, ·) and the induced process on the underlying trees is consistent.

Let t''_{[n+1]} denote the length of the edge labeled by [n+1] in T̄'', the root edge of T̄'', and consider the length of the root edge of the restriction T̄''_{|[n]}, denoted t'_{[n]}. Recall that we write e_n to denote the partition of [n] with two blocks, {1, …, n−1} and {n}. If Π_{T''} ≠ e_{n+1}, then t'_{[n]} = t''_{[n+1]}. Otherwise, t'_{[n]} = t''_{[n+1]} + t''_{[n]}. Hence, t'_{[n]} ∼ τ + τ' I_A, where τ and τ' are, respectively, independent exponential random variables with parameters q_{n+1}(Π_{T^*}, 1_{n+1}) and q_n(Π_{T^*_{|[n]}}, 1_n), and A := {Π_{T''} = e_{n+1}} is the event that the children of the root [n+1] in T'' are [n] and {n+1}, which is independent of τ and τ'.

For notational convenience, we drop the dependence on ν and write q_b(·, ·) ≡ q^ν_b(·, ·) for any b ⊂ N, and likewise p_b(·, ·) ≡ p^ν_b(·, ·), where q_n and p_n are defined in section 3.5.

An exponential random variable with rate parameter λ > 0 has moment generating function E_λ(t) := λ/(λ − t). Write q_{n+1} := q_{n+1}(Π_{T^*}, 1_{n+1}), q_n := q_n(Π_{T^*_{|[n]}}, 1_n) and p_{n+1} := p_{n+1}(Π_{T^*}, e_{n+1}), so that P(A) = p_{n+1}/q_{n+1}. The moment generating function of t'_{[n]} is

\begin{align}
E\, e^{t(\tau + \tau' I_A)} &= E\, e^{t\tau}\; E\, e^{t\tau' I_A} \tag{3.19} \\
&= \frac{q_{n+1}}{q_{n+1} - t}\Bigl[ E\bigl(e^{t\tau'} \mid A\bigr) P(A) + E\bigl(e^{t\tau' I_A} \mid A^c\bigr) P(A^c) \Bigr] \tag{3.20} \\
&= \frac{q_{n+1}}{q_{n+1} - t}\Bigl[ \frac{p_{n+1}}{q_{n+1}} \cdot \frac{q_n}{q_n - t} + 1 - \frac{p_{n+1}}{q_{n+1}} \Bigr] \tag{3.21} \\
&= \frac{q_{n+1}}{q_{n+1} - t} \cdot \frac{p_{n+1}\, q_n + q_{n+1}(q_n - t) - p_{n+1}(q_n - t)}{q_{n+1}(q_n - t)} \tag{3.22} \\
&= \frac{q_{n+1}}{q_{n+1} - t} \cdot \frac{q_{n+1}\, q_n - t\, q_{n+1} + t\, p_{n+1}}{q_{n+1}(q_n - t)} \tag{3.23} \\
&= \frac{q_{n+1}}{q_{n+1} - t} \cdot \frac{q_{n+1}\, q_n - t\, q_n}{q_{n+1}(q_n - t)} \tag{3.24} \\
&= \frac{q_n}{q_n - t}, \tag{3.25}
\end{align}

the moment generating function of τ'. Line (3.19) follows by independence of τ, τ' and A; (3.20) uses the tower property of conditional expectations; (3.21) substitutes explicit expressions into (3.20); (3.22) puts the bracketed terms over a common denominator; (3.23) is obtained from (3.22) by cancelling terms in the numerator; (3.24) follows from (3.23) by the fact that q_n(Π_{T^*_{|[n]}}, 1_n) = q_{n+1}(Π_{T^*}, 1_{n+1}) − p_{n+1}(Π_{T^*}, e_{n+1}), by consistency of (3.15); finally, (3.25) is obtained by simplifying (3.24).

So the length of the root edge of the tree T̄'' restricted to [n] is distributed as the root edge of a fragmentation tree T̄' ∈ T̄_n conditional on current state T̄ := T̄''_{|[n]}. It remains to show that all edges of the restricted tree T̄' are distributed as exponentials with the appropriate means. By the branching nature of the construction (section 3.5.1), the subtrees restricted to the children of the root evolve independently of each

other, and of the length of the root edge. Let Π_{T''} := {B_1, ..., B_m} be the root partition of T'' ∈ T_{n+1}. Then Π_{T''_{|[n]}} coincides with Π_{T''} for all but one child, namely the child of T'' which contains n+1, which we denote B*. For B_i ≠ B*, Q̄_{B_i}(T̄*_{|B_i}, T̄''_{|B_i}) = Q̄_{B_i ∩ [n]}(T̄*_{|B_i ∩ [n]}, T̄''_{|B_i ∩ [n]}). For B*, the subtree T̄''_{|B*} behaves as a weighted tree on #B* elements. By independence, we know the root edge of this subtree restricted to elements of [n] is distributed as exponential with mean 1/q_{B* ∩ [n]}(Π_{T*_{|B* ∩ [n]}}, ·). Since n is finite, we can apply this procedure iteratively in a straightforward way to obtain that T̄''_{|[n]} ∼ Q̄_n(T̄_{|[n]}, ·).
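The cancellation in (3.19)–(3.25) can also be checked numerically. The sketch below (the rate values are arbitrary illustrations, not taken from the text) verifies that the moment generating function of τ + τ'·I_A collapses to that of a single exponential precisely under the consistency relation q_n = q_{n+1} − p_{n+1}:

```python
# A numerical check of the identity behind (3.19)-(3.25): if tau ~ Exp(rate q1),
# tau' ~ Exp(rate q0) and A is an independent event with P(A) = p/q1, then the
# MGF of tau + tau' * 1_A equals the MGF of an Exp(rate q0) variable exactly
# when q0 = q1 - p.  The numerical values below are illustrative assumptions.

def mgf_exp(rate, t):
    """MGF of an exponential random variable with the given rate, for t < rate."""
    return rate / (rate - t)

def mgf_root_edge(q1, p, q0, t):
    """E exp(t(tau + tau' 1_A)) computed as in lines (3.19)-(3.21)."""
    prob_A = p / q1
    return mgf_exp(q1, t) * (prob_A * mgf_exp(q0, t) + (1.0 - prob_A))

q1, p = 3.0, 1.2          # stand-ins for q_{n+1} and p_{n+1}
q0 = q1 - p               # consistency: q_n = q_{n+1} - p_{n+1}
for t in [0.1, 0.5, 1.0, 1.5]:
    assert abs(mgf_root_edge(q1, p, q0, t) - mgf_exp(q0, t)) < 1e-12
print("MGF identity verified")
```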

Finite exchangeability is immediate by inspecting the form of (3.18). The existence of a transition density on T̄^(k) is once again immediate by Kolmogorov's theorem.

Theorem 3.7.4. There exists a transition density Q̄_ν(·, ·) on T̄^(k) whose finite-dimensional restrictions are given by (3.18).

3.8 Discussion

In this chapter, we introduced a family of infinitely exchangeable transition probabilities on the space of fragmentation trees. To my knowledge, Markov kernels on these spaces have mostly been studied for the purpose of searching the space of trees, and much of this work has been related to phylogenetic inference. I have not come across previous work which establishes infinite exchangeability for such processes. The feasibility of implementing the Markov chains studied here is not immediately clear. However, there seems to be an opportunity to further develop the section on weighted trees, which is a topic for future research.

CHAPTER 5 is preceded by the following chapter on random hypergraphs.

CHAPTER 4

INFINITE RANDOM HYPERGRAPHS

We now consider processes on hypergraphs and graphs, which we construct from a Poisson point process on the power set 2N of the natural numbers.

4.1 Introduction

For any finite set V, a multi-hypergraph, or hypergraph with multiplicity, H on V is a map

    H : 2^V → Z^+,

where Z^+ := {0, 1, 2, ...} is the set of nonnegative integers. A hypergraph H on V is a collection of subsets of V, called hyperedges, which can be expressed as a map

    H : 2^V → {0, 1}.

In general, a multi-hypergraph can be regarded as a pair (V, H), where H(A) denotes the multiplicity of the hyperedge A ⊆ V. Typically, the term hypergraph on V refers to a subset of 2^V, as we have defined above. However, throughout this chapter, we adopt the convention of Darling and Norris [41] and use the term hypergraph to mean hypergraph with multiplicity. If we wish to emphasize a hypergraph as a subset of 2^V, we shall use the term hypergraph without multiplicity. For n ≥ 1, we write H_n to denote the space of multi-hypergraphs on [n].

For each n ≥ 1, H_n is partially ordered by the total ordering ≤ on Z^+. That is, for H, H' ∈ H_n,

    H ≤ H'  ⟺  (∀ a ⊆ [n]) [H(a) ≤ H'(a)].


A graph G on V is a map

    G : V × V → {0, 1},

and an undirected graph can be regarded as a special case of a hypergraph H : 2^V → Z^+ for which H({i}) = 1 for all i ∈ V, H(A) = 0 whenever #A ≥ 3, and im H ⊆ {0, 1}. A pair i, j ∈ V such that G(i, j) = 1 is called an edge between i and j. Alternatively, for E := E(G) = {(i, j) ∈ V × V : G(i, j) = 1}, we can write a graph G on V as the pair (V, E). A graph G for which G(i, j) = G(j, i) for all i, j ∈ V is called undirected; otherwise, G is directed.

Darling and Norris [41] study the structure of random hypergraphs and show scaling limits for certain statistics of Poisson random hypergraphs. Our study of Poisson random hypergraphs differs from that in [41]; we are interested in characterizing families of infinite random hypergraphs.

4.1.1 Projective systems of hypergraphs

We now define projective systems on the collection (Hn, n ≥ 1) of hypergraphs on [n], which correspond to contravariant functors associated to insertion maps and injective maps.

For m ≤ n, define i_{m,n} : [m] → [n], j ↦ j, as the insertion map [m] → [n]. There is an associated contravariant functor (H, *) which associates [n] with H_n and i_{m,n} with i*_{m,n} : H_n → H_m as follows. For n ≥ 1, H ∈ H_{n+1} and A ⊆ [n], we define

    (i*_{n,n+1} H)(A) = H(A) + H(A ∪ {n+1}).

For m ≤ n, we define i*_{m,n} = i*_{m,m+1} ∘ ··· ∘ i*_{n−1,n}. In general, for H ∈ H_n and A ⊆ [m], we have

    (i*_{m,n} H)(A) = Σ_{X ⊆ {m+1, ..., n}} H(A ∪ X).   (4.1)

Note that the sum in (4.1) is equivalent to taking the sum of H(B) over all subsets B of [n] such that B ∩ [m] = A, which corresponds to the notion of restriction for hypergraphs. See section 1.4 for discussion of restriction for partitions and trees. For m = n, it is clear that i_{n,n} is the identity [n] → [n] and i*_{n,n} is the identity H_n → H_n. By construction, we have i*_{l,m} ∘ i*_{m,n} = i*_{l,n} for l ≤ m ≤ n, and so (H_n, i*_{m,n}) is a projective system.
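The restriction operation (4.1) is simple to implement. The sketch below (helper names such as `restrict` are ours, purely for illustration) represents a multi-hypergraph on [n] as a dictionary from frozensets to multiplicities and checks the functoriality property i*_{l,m} ∘ i*_{m,n} = i*_{l,n} on a small example:

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of an iterable, as frozensets."""
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def restrict(H, m, n):
    """(i*_{m,n} H)(A) = sum of H(B) over B in [n] with B meet [m] = A, as in
    (4.1).  H maps frozensets of {1,...,n} to nonnegative multiplicities."""
    extra = range(m + 1, n + 1)
    return {A: sum(H.get(A | X, 0) for X in subsets(extra))
            for A in subsets(range(1, m + 1))}

# A toy multi-hypergraph on [3]; unlisted hyperedges have multiplicity 0.
H = {frozenset(): 1, frozenset({1}): 2, frozenset({1, 3}): 1,
     frozenset({2, 3}): 3, frozenset({1, 2, 3}): 1}
H3 = {A: H.get(A, 0) for A in subsets(range(1, 4))}

# Functoriality: restricting [3] -> [1] directly agrees with [3] -> [2] -> [1].
assert restrict(H3, 1, 3) == restrict(restrict(H3, 2, 3), 1, 2)
```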

For any injective map ϕ_{m,n} : [m] → [n], m ≤ n, there corresponds a map ϕ_{m,n} : 2^[m] → 2^[n] which is defined for each A ⊆ [m] by A ↦ ϕ_{m,n}[A] := {ϕ_{m,n}(a) : a ∈ A} in the usual way. There is also a corresponding map ϕ'_{m,n} : 2^[n] → 2^[m] defined by

    ϕ'_{m,n}(B) := {j ∈ [m] : ϕ_{m,n}(j) ∈ B}  for each B ⊆ [n].

In general, for any injective map ϕ_{m,n} : [m] → [n], we define its associated projection ϕ*_{m,n} : H_n → H_m by

    (ϕ*_{m,n} H)(A) := Σ_{B : B ∩ im ϕ_{m,n} = ϕ_{m,n}[A]} H(B),   (4.2)

for each H ∈ H_n and A ⊆ [m]. Equation (4.2) corresponds to the sum of H(B) over subsets B ⊆ [n] whose restriction to the image of ϕ_{m,n} is ϕ_{m,n}[A]; that is, the sum over the set {B : ϕ'_{m,n}(B) = A}.

Proposition 4.1.1. (H_n, ϕ*_{m,n}) is a projective system.

Proof. We must verify that

(i) if ϕ_{n,n} is the identity [n] → [n], then ϕ*_{n,n} is the identity H_n → H_n, and

(ii) if l ≤ m ≤ n and ϕ_{l,n} = ϕ_{m,n} ∘ ϕ_{l,m} : [l] → [n], then its associated projection ϕ*_{l,n} satisfies ϕ*_{l,n} = ϕ*_{l,m} ∘ ϕ*_{m,n}.

Condition (i) is obvious by definition since each ϕ_{m,n} is injective, and so if ϕ_{n,n} is the identity, then ϕ_{n,n}[A] = A; hence, {B ⊆ [n] : B ∩ [n] = A} = {A}.

To establish (ii), we have the following. For H ∈ Hn and A ⊆ [l], the associated

projection of the composition ϕl,n = ϕm,n ◦ ϕl,m is defined to be

    (ϕ*_{l,n} H)(A) := Σ_{B : B ∩ im ϕ_{l,n} = ϕ_{l,n}[A]} H(B).   (4.3)

It remains to show that ϕ*_{l,n} = ϕ*_{l,m} ∘ ϕ*_{m,n}. We have

    ϕ*_{l,m}(ϕ*_{m,n} H)(A) = ϕ*_{l,m}( Σ_{B : B ∩ im ϕ_{m,n} = ϕ_{m,n}[·]} H(B) )(A)
        = Σ_{C : C ∩ im ϕ_{l,m} = ϕ_{l,m}[A]}  Σ_{B : B ∩ im ϕ_{m,n} = ϕ_{m,n}[C]} H(B).   (4.4)

We need to show that (4.4) corresponds to (4.3), which amounts to showing that the sets

    Z* := ∪_{C : C ∩ im ϕ_{l,m} = ϕ_{l,m}[A]} {B ⊆ [n] : B ∩ im ϕ_{m,n} = ϕ_{m,n}[C]}

and

    Z := {D ⊆ [n] : D ∩ im ϕ_{l,n} = ϕ_{l,n}[A]}

are equal. We have the following lemma.

Lemma 4.1.2. For Z and Z∗ defined above, Z∗ = Z.

Proof. Let ϕ_{l,m}, ϕ_{m,n}, ϕ_{l,n} and A ⊆ [l] be as above. Suppose D ∈ Z. Then D ∩ im ϕ_{l,n} = ϕ_{l,n}[A], and so D can be written as a disjoint union D = ϕ_{l,n}[A] ∪ D* for some D* ⊆ (im ϕ_{l,n})^c, the complement of im ϕ_{l,n} in [n]. For d ∈ D*, either

• d∈ / im ϕm,n or

• d ∈ im ϕ_{m,n}, but there is no c ∈ im ϕ_{l,m} such that ϕ_{m,n}(c) = d.

Define two sets, B∗ and C∗, as follows. For d ∈ D∗,

• if d ∉ im ϕ_{m,n}, put d ∈ B*;

• otherwise, put d ∈ C∗.

We can write D* as the disjoint union B* ∪ C*, and we define C** := {c' ∈ [m] : ϕ_{m,n}(c') ∈ C*} as the pre-image of C* under ϕ_{m,n}. By construction, we have C* = ϕ_{m,n}[C**] and C** ∩ im ϕ_{l,m} = ∅. By putting C := ϕ_{l,m}[A] ∪ C**, it is clear that

    C ∩ im ϕ_{l,m} = (ϕ_{l,m}[A] ∩ im ϕ_{l,m}) ∪ (C** ∩ im ϕ_{l,m}) = ϕ_{l,m}[A] ∪ ∅ = ϕ_{l,m}[A].

It follows that

    (ϕ_{m,n}[ϕ_{l,m}[A] ∪ C**] ∪ B*) ∩ im ϕ_{m,n} = ϕ_{l,n}[A] ∪ (C* ∩ im ϕ_{m,n}) ∪ (B* ∩ im ϕ_{m,n})
        = ϕ_{l,n}[A] ∪ C* ∪ ∅
        = ϕ_{m,n}[ϕ_{l,m}[A] ∪ C**].

Also, D = ϕ_{l,n}[A] ∪ C* ∪ B*. Hence, D ∈ Z* and Z ⊆ Z*.

Conversely, let C ⊆ [m] satisfy C ∩ im ϕ_{l,m} = ϕ_{l,m}[A] and let B ⊆ [n] satisfy B ∩ im ϕ_{m,n} = ϕ_{m,n}[C]. Then we can write C = ϕ_{l,m}[A] ∪ C* for some C* ⊆ (im ϕ_{l,m})^c and B = ϕ_{m,n}[C] ∪ B* for some B* ⊆ (im ϕ_{m,n})^c. Note that neither B* nor ϕ_{m,n}[C*] overlaps with im ϕ_{l,n}, and we can assume, without loss of generality, that B* ∩ C* = ∅. Hence,

    B = ϕ_{m,n}[ϕ_{l,m}[A] ∪ C*] ∪ B* = ϕ_{l,n}[A] ∪ (ϕ_{m,n}[C*] ∪ B*) = ϕ_{l,n}[A] ∪ D*,  where D* := ϕ_{m,n}[C*] ∪ B*,

with D* ⊆ (im ϕ_{l,n})^c. Hence, ϕ_{l,n}[A] ∪ D* ∈ Z and Z* ⊆ Z.

Finally, since the lemma shows the equivalence of Z and Z*, (4.3) and (4.4) are equivalent, and ϕ*_{m,n} satisfies (ii).

Remark 4.1.3. Lemma 4.1.2 allows for the construction of different projective systems on the collection (H_n, n ≥ 1) by considering functions other than summation in (4.1) and (4.2). For example, for an injective map ϕ_{m,n} : [m] → [n], m ≤ n, we can define a projection ϕ∨_{m,n} : H_n → H_m as follows. Let A ⊆ [m] and H ∈ H_n; then

    (ϕ∨_{m,n} H)(A) := max{H(B) : B ∩ im ϕ_{m,n} = ϕ_{m,n}[A]}

determines a projective system (H_n, ϕ∨_{m,n}) on the space of multi-hypergraphs.

Corollary 4.1.4. (H_n, ϕ∨_{m,n}) is a projective system.
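Corollary 4.1.4 can be checked numerically: projecting along a composite injective map in one step agrees with projecting in two steps, for the max projection of remark 4.1.3 just as for the sum projection (4.2). A small sketch (function and variable names are ours, for illustration only):

```python
import random
from itertools import chain, combinations

def subsets(s):
    """All subsets of an iterable, as frozensets."""
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def project(H, phi, m, combine=max):
    """(phi H)(A) = combine of H(B) over the fiber B meet im(phi) = phi[A]."""
    image = frozenset(phi.values())
    out = {}
    for A in subsets(range(1, m + 1)):
        phiA = frozenset(phi[a] for a in A)
        out[A] = combine(H[B] for B in H if B & image == phiA)
    return out

random.seed(0)
H = {B: random.randint(0, 3) for B in subsets(range(1, 5))}  # hypergraph on [4]
phi23 = {1: 1, 2: 4}          # injective [2] -> [4]
phi12 = {1: 2}                # injective [1] -> [2]
phi13 = {1: phi23[phi12[1]]}  # composition [1] -> [4]

# Lemma 4.1.2 in action: one-step and two-step max projections agree.
assert project(H, phi13, 1) == project(project(H, phi23, 2), phi12, 1)
```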

The projective systems (H_n, i*_{m,n}) and (H_n, ϕ*_{m,n}) allow one to construct a family of infinite Poisson random hypergraphs by applying Kolmogorov's extension theorem (theorem 1.3.1), as we now discuss.

4.2 Infinite Poisson random hypergraphs

For a probability space (Ω, F, P), a random hypergraph on V is a measurable map

    H : Ω × 2^V → Z^+.

For a sequence β := (β_j : j ∈ Z^+), Darling and Norris [41] define a Poisson(β) random hypergraph to be a random hypergraph H such that

1. H(A), A ∈ 2V , are independent;

2. H(A) depends only on #A;

3. Σ_{A : #A = j} H(A) ∼ Poisson(nβ_j), j = 0, 1, ..., n.
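Conditions 1.–3. pin down a concrete sampling recipe: draw H(A) independently, Poisson with mean nβ_j/C(n, j) whenever #A = j. A minimal sketch (function names are ours; Poisson sampling is done by inversion, which is adequate for small rates):

```python
import math, random
from itertools import combinations

def sample_poisson_hypergraph(n, beta, rng):
    """Sample a Poisson(beta) random multi-hypergraph on [n] in the sense of
    Darling and Norris: for #A = j, the H(A) are i.i.d. Poisson with mean
    n*beta[j] / C(n, j), independently across hyperedges."""
    H = {}
    for j, bj in enumerate(beta):
        if j > n:
            break
        rate = n * bj / math.comb(n, j)
        for A in combinations(range(1, n + 1), j):
            # Poisson sampling by inversion of the CDF.
            k, p, u = 0, math.exp(-rate), rng.random()
            acc = p
            while u > acc:
                k += 1
                p *= rate / k
                acc += p
            H[frozenset(A)] = k
    return H

rng = random.Random(1)
H = sample_poisson_hypergraph(5, [0.0, 0.5, 1.0], rng)
assert all(v >= 0 for v in H.values())
assert len(H) == 1 + 5 + 10  # hyperedges of sizes 0, 1 and 2 on [5]
```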

By infinite divisibility of the Poisson distribution, these conditions imply that {H(A) : #A = j} is a collection of independent and identically distributed Poisson(nβ_j / C(n, j)) random variables for each j = 0, 1, ..., n, where C(n, j) denotes the binomial coefficient. Here, we study conditions on (β_j : j ∈ Z^+)

for which a collection (H_n, n ≥ 1) of Poisson random hypergraphs characterizes a unique random hypergraph H_∞ : Ω × 2^N → Z^+, the infinite random hypergraph.

For each n ≥ 1, let μ_n be a probability measure on H_n and let (H_n, ψ_{m,n}) be a projective system on the space of hypergraphs. The family of measures (μ_n, n ≥ 1) on (H_n, n ≥ 1) characterizes a measure μ_∞ on the projective limit measurable space of (H_n, ψ_{m,n}) if

    μ_m = μ_n ψ_{m,n}^{−1}   (4.5)

for every m ≤ n and ψ_{m,n} : H_n → H_m; see theorem 1.3.1.

4.2.1 Construction of the infinite random hypergraph

Initially, we only assume that, for n ≥ 1, we have a collection {X_a(ω) : a ⊆ [n]} of independent Poisson random variables with each X_a having mean Λ(a); we do not specifically assume that conditions 2. and 3. for Poisson random hypergraphs hold. Let X be a Poisson point process on 2^[n] with mean measure Λ, so that {X_a(ω) : a ∈ 2^[n]} is a collection of independent Poisson random variables with each X_a having mean Λ(a) ≥ 0. The process X determines a random hypergraph on [n] through the map

    X : Ω × 2^[n] → Z^+

defined by X(ω, a) := X_a(ω). For convenience, we drop the dependence on ω and write X_a(ω) ≡ X_a.

Proposition 4.2.1. For each m ≤ n, let i_{m,n} : [m] → [n] be the insertion map from the previous section with associated projection i*_{m,n} : H_n → H_m given in (4.1). A family (X^n, n ≥ 1) of Poisson random hypergraphs on (H_n, n ≥ 1) with mean measures (Λ_n, n ≥ 1) characterizes an infinite process X^∞ on (H_n, i*_{m,n}) if and only if, for each n ≥ 1 and a ⊆ [n],

    Λ_n(a) = Λ_{n+1}(a) + Λ_{n+1}(a ∪ {n+1}).

Proof. Consistency requires i*_{m,n} X^n(ω, a) =_L X^m(ω, a) for all a ∈ 2^[m], where "=_L" denotes "equal in law." By definition, i*_{n,n+1} X^{n+1}(ω, a) = X^{n+1}(ω, a) + X^{n+1}(ω, a ∪ {n+1}). Therefore, i*_{n,n+1} X^{n+1}(ω, a) ∼ Poisson(Λ_n(a)) only if X^{n+1}(ω, a) + X^{n+1}(ω, a ∪ {n+1}) ∼ Poisson(Λ_n(a)), which requires, for each n ≥ 1 and a ∈ 2^[n], that

Λn(a) = Λn+1(a) + Λn+1(a ∪ {n + 1}).

The reverse implication is obvious by superposition.

For the projective system (H_n, ϕ*_{m,n}) under all injective maps, the result is similar, but we have an added condition on the mean measures.

For n ≥ 1, let H_{n+1} and H_n be random hypergraphs determined by Poisson point processes X^{n+1} and X^n with mean measures Λ_{n+1} and Λ_n, respectively. For m = n, condition (4.5) says that the distribution of H_n is invariant under the action of any permutation ϕ : [n] → [n], which corresponds to condition 2. above. We have the following elementary lemma.

Lemma 4.2.2. Let ϕ be an injective map [m] → [n] for m ≤ n, and for every a ∈ 2^[m], let S(a) := {a' ∈ 2^[n] : im ϕ ∩ a' = ϕ[a]}. Then #S(a) = 2^{n−m}.

Proof. It suffices to show this for n = m + 1. Let ϕ : [m] → [m+1] be a one-to-one mapping. Since ϕ is one-to-one, # im ϕ = m and #ϕ[a] = #a for a ∈ 2^[m]. Hence, for a' ∈ S(a), either #a' = #a or #a' = #a + 1. If #a' = #a, then a' = ϕ[a]. Otherwise, let x ∈ [m+1] \ im ϕ be the element of [m+1] that is not in im ϕ; then a' = ϕ[a] ∪ {x} ∈ S(a). If y ∉ ϕ[a] and y ≠ x, it is clear that ϕ[a] ∪ {y} ∉ S(a) since ϕ is injective. It is also clear that a' ∈ S(a) cannot have #a' < #a or #a' > #a + 1. Hence, S(a) = {ϕ[a], ϕ[a] ∪ {x}} and #S(a) = 2.

By lemma 4.2.2, we have that for each n ≥ 1 and H_{n+1} ∈ H_{n+1},

    (ϕ*_{n,n+1} H_{n+1})(a) = H_{n+1}(a) + H_{n+1}(a ∪ {x}),

where x is the unique element of [n+1] such that x ∉ im ϕ_{n,n+1}. Hence (4.5) corresponds to

    Λ_n(a) = Λ_{n+1}(a) + Λ_{n+1}(a ∪ {x})

by the superposition property of independent Poisson random variables.

Theorem 4.2.3. Let (H_n, n ≥ 1) be a collection of Poisson random hypergraphs with mean measures (Λ_n(·), n ≥ 1). The law of (H_n) satisfies (4.5) on (H_n, ϕ*_{m,n}) if and only if

(1) for all n ≥ 1 and a ⊆ [n], Λ_n(a) = λ_n(#a) for some sequence λ := (λ_n(r) : n ≥ 1, 0 ≤ r ≤ n), and

(2) the sequence λ satisfies λn(r) = λn+1(r) + λn+1(r + 1) for all n ≥ 1 and 0 ≤ r ≤ n.

Proof. The argument for the 'only if' direction is given in the text leading up to the theorem. In particular, condition (1) follows from the requirement that Λ_n is invariant under permutation maps; condition (2) is a consequence of condition (1) and lemma 4.2.2. The 'if' direction is apparent by lemma 4.2.2, condition (4.5) and the superposition property of the Poisson distribution.

Note the difference between proposition 4.2.1 and theorem 4.2.3 from the standpoint of Kolmogorov's extension theorem and infinite exchangeability. Both results appeal to Kolmogorov's theorem, and so both characterize an infinite hypergraph on the appropriate projective system, but the conditions of theorem 4.2.3 impose an invariance under permutation maps which makes the process infinitely exchangeable. Proposition 4.2.1 characterizes an infinite random hypergraph which need not be exchangeable.

Remark 4.2.4. There are few obvious families of functions (λ_n(r), n ≥ 1, 0 ≤ r ≤ n) which satisfy both (1) and (2) of theorem 4.2.3. The recursion λ_n(r) =

λ_{n+1}(r) + λ_{n+1}(r + 1) is related to the Hausdorff moment problem; see [42]. Some valid explicit choices for λ_n(r) are

• λ_n(r) ∝ C(n−1, r)^{−1}, and

• λ_n(r) ∝ ∫_0^∞ ∫_0^1 β α^r (1−α)^{n−r} μ(dα, dβ) for some measure μ(·, ·) on (0, 1) × R^+.
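Both displayed choices can be verified directly against condition (2) of theorem 4.2.3. In the sketch below the first family is taken with the explicit normalization λ_n(r) = 1/(n·C(n−1, r)), defined for 0 ≤ r ≤ n−1, and the second with μ a point mass at a single (α, β); both normalizations are illustrative assumptions of ours, not taken from the text:

```python
from math import comb

def lam_binom(n, r):
    """lambda_n(r) = 1 / (n * C(n-1, r)): one explicit normalization of the
    first example, defined for 0 <= r <= n-1 (our choice of constant)."""
    return 1.0 / (n * comb(n - 1, r))

def lam_mixture(n, r, alpha=0.3, beta=2.0):
    """lambda_n(r) = beta * alpha^r * (1-alpha)^(n-r): the mixture example
    with mu a point mass at (alpha, beta) (our illustrative choice)."""
    return beta * alpha ** r * (1 - alpha) ** (n - r)

# Check the recursion lambda_n(r) = lambda_{n+1}(r) + lambda_{n+1}(r+1).
for n in range(1, 10):
    for r in range(0, n):        # lam_binom is defined for 0 <= r <= n-1
        assert abs(lam_binom(n, r)
                   - (lam_binom(n + 1, r) + lam_binom(n + 1, r + 1))) < 1e-12
    for r in range(0, n + 1):
        assert abs(lam_mixture(n, r)
                   - (lam_mixture(n + 1, r) + lam_mixture(n + 1, r + 1))) < 1e-12
```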

4.3 Induced hypergraphs, hereditary hypergraphs, and undirected graphs

A multi-hypergraph can be projected into the space of hypergraphs on [n] without multiplicity, hereditary hypergraphs on [n], and undirected graphs on [n], as we show later in this section. These projections carry with them the various furnishings of the associated projective system on hypergraphs, either (H_n, i*_{m,n}), (H_n, ϕ*_{m,n}), or perhaps some other projective system on (H_n, n ≥ 1), which we generically denote by (H_n, ψ*_{m,n}).

Let ([n], ψ_{m,n}) be a category consisting of the finite sets [n] with the maps ψ_{m,n} : [m] → [n] as arrows. Let (H_n, ψ*_{m,n}) and (A_n, ψ+_{m,n}) be projective systems, i.e. contravariant functors associated to the category ([n], ψ_{m,n}) of finite sets and maps ψ_{m,n} : [m] → [n]. Here, H_n is a collection of hypergraphs on [n] and (A_n, n ≥ 1) is some collection of abstract spaces A_n equipped with maps ψ+_{m,n} : A_n → A_m which correspond to ψ_{m,n} : [m] → [n] for m ≤ n. For each n ≥ 1, suppose there is a measurable map T_n : H_n → A_n such that for each ψ_{m,n} : [m] → [n], m ≤ n,

    ψ+_{m,n} ∘ T_n = T_m ∘ ψ*_{m,n}.

Then the collection (T_n, n ≥ 1) of maps H_n → A_n determines a natural transformation T : (H_n, ψ*_{m,n}) → (A_n, ψ+_{m,n}) as follows. Let m ≤ n and let ψ : [m] → [n] be associated with the projection ψ* : H_n → H_m. Then, for each n ≥ 1, T associates H_n with A_n and ψ* with ψ+ : A_n → A_m such that ψ+ ∘ T_n = T_m ∘ ψ*.

In other words, T maps elements in (H_n, n ≥ 1) to elements in (A_n, n ≥ 1) and

projections in (ψ*_{m,n}) to projections in (ψ+_{m,n}) such that the following diagram commutes.

              T_n
    [n]    H_n ----> A_n
     |      |         |
     ψ     ψ*        ψ+
     |      |         |
    [m]    H_m ----> A_m
              T_m

Proposition 4.3.1. Let (H_n, ψ*_{m,n}), (A_n, ψ+_{m,n}) and T_n : H_n → A_n be as described above. If (μ_n, n ≥ 1) is a collection of measures on (H_n, ψ*_{m,n}) satisfying (4.5), then (μ_n T_n^{−1}, n ≥ 1), the family of measures induced on A_n by μ_n through T, also satisfies (4.5).

Proof. For 1 ≤ m ≤ n, let A ∈ A_m and let ψ*_{m,n} be a projection H_n → H_m with associated projection T(ψ*_{m,n}) ≡ ψ+_{m,n} : A_n → A_m. Then

    μ_n T_n^{−1} (ψ+_{m,n})^{−1}(A) = μ_n (ψ+_{m,n} ∘ T_n)^{−1}(A)
        = μ_n (T_m ∘ ψ*_{m,n})^{−1}(A)
        = μ_n (ψ*_{m,n})^{−1} T_m^{−1}(A)
        = μ_m T_m^{−1}(A),

and the family (μ_n T_n^{−1}, n ≥ 1) of induced measures on (A_n, ψ+_{m,n}) satisfies (4.5).

Below, we will see that there are some natural choices for T and (A_n, ψ+_{m,n}) which satisfy the conditions of proposition 4.3.1.

4.3.1 Random hypergraphs

By ignoring multiplicities, each realization of a Poisson(β) random multi-hypergraph

X defines a random hypergraph without multiplicities, Set(X) := {a ⊆ [n] : X_a > 0} ⊆ 2^[n], which consists of those subsets a ⊆ [n] for which X_a > 0. The function Set : H_n → H_n maps H ↦ H ∨ 1. For ψ*_{m,n} corresponding to either i*_{m,n} or ϕ*_{m,n}, Set(·) determines a transformation from (H_n, ψ*_{m,n}) into (H_n, ψ+_{m,n}) where each hypergraph

H ∈ H_n is mapped to Set(H) := H ∨ 1 and each projection ψ*_{m,n} : H_n → H_m maps to ψ+_{m,n} := ψ*_{m,n} ∨ 1 = ψ∨_{m,n}. It is clear that ψ+_{m,n} satisfies the conditions of proposition 4.3.1.

              Set
    [n]    H_n ----> H_n
     |      |         |
     ψ     ψ*        ψ+ := ψ* ∨ 1 = ψ∨
     |      |         |
    [m]    H_m ----> H_m
              Set
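The commutativity asserted by this diagram can be tested on random examples: truncating multiplicities after the sum projection agrees with applying the max projection to the truncated hypergraph. A sketch, in which we implement Set(H) as the 0/1 truncation 1{H > 0} and where all helper names are ours:

```python
import random
from itertools import chain, combinations

def subsets(s):
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def set_of(H):
    """Set(H): forget multiplicities, keeping a 0/1 hypergraph."""
    return {A: min(v, 1) for A, v in H.items()}

def project(H, phi, m, combine):
    """Fiberwise projection under an injective map phi: [m] -> [n]."""
    image = frozenset(phi.values())
    return {A: combine(H[B] for B in H
                       if B & image == frozenset(phi[a] for a in A))
            for A in subsets(range(1, m + 1))}

random.seed(2)
H = {B: random.randint(0, 2) for B in subsets(range(1, 5))}
phi = {1: 2, 2: 4}  # injective [2] -> [4]

# Truncating after the sum projection agrees with the max projection of the
# truncated hypergraph: Set(phi* H) = phi_vee(Set(H)).
assert set_of(project(H, phi, 2, sum)) == project(set_of(H), phi, 2, max)
```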

Proposition 4.3.2. Let X be a Poisson(Λ) random hypergraph on [n]. For each E ⊆ 2^[n], the induced distribution of Set(X) ⊆ 2^[n] is

    P(Set(X) ⊆ E) = ∏_{a ∉ [∅, E]} exp{−Λ(a)} = ∏_{a ⊆ [n] : a ∉ E} exp{−Λ(a)}.   (4.6)

Moreover, for Λ_n satisfying the conditions of either proposition 4.2.1 or theorem 4.2.3, the family of measures (4.6) induced on H_n through Set(·) characterizes a measure on the appropriate limit space, either (H_n, i∨_{m,n}) or (H_n, ϕ∨_{m,n}).

Proof. The calculation of (4.6) is clear by the independence of the components of the Poisson point process X and the definition of the function Set(·).

Suppose Λ_n satisfies the conditions so that the family of measures (μ_n, n ≥ 1) on the appropriate space (H_n, ψ*_{m,n}) satisfies (4.5). With each ψ*_{m,n} : H_n → H_m, we associate ψ+_{m,n} : H_n → H_m, defined by ψ+_{m,n} := ψ*_{m,n} ∨ 1. We have, for each H ∈ H_n,

    ψ+_{m,n} ∘ Set(H) = ψ+_{m,n}(H ∨ 1) = ψ*_{m,n}(H ∨ 1) ∨ 1 = ψ*_{m,n}(H) ∨ 1 = Set ∘ ψ*_{m,n}(H).

Hence, ψ+_{m,n} satisfies the conditions of proposition 4.3.1 and the consistency of the family of induced measures (μ_n Set^{−1}, n ≥ 1) follows immediately.

4.3.2 Hereditary hypergraphs and monotone sets

So far, we have regarded a hypergraph H on V as any function H : 2^V → Z^+. It is sometimes natural to require that H satisfies the heredity condition: H(A) ≥ H(A') for every pair of subsets A ⊆ A' ⊆ V.

Definition 4.3.3. A hypergraph H on V which satisfies the heredity condition is called a hereditary hypergraph.

If we only consider hypergraphs without multiplicity, an equivalent condition for a hypergraph H : 2^V → {0, 1} to be hereditary is

    a ∈ H  ⟹  2^a ⊆ H.

Definition 4.3.4. A subset A ⊆ 2^[n] is monotone, or an abstract simplicial complex, if a ∈ A implies 2^a ⊆ A.

In this chapter, we use the term monotone subset in favor of abstract simplicial complex. A hereditary hypergraph without multiplicity is a monotone subset. For example, the set A = {∅, {1}, {2}, {3}, {1,2}} = ⟨{1,2}, {3}⟩ is a monotone set with maximal elements {1,2} and {3}, which constitute the generating class of A, written G(A). An element a of the generating class of a monotone set A is maximal in the sense that no other element a' ∈ A contains a as a subset. The generating class G(A) of a monotone subset A consists of all maximal elements of A.

We write Mon_[n] ⊆ H_n for the set of hereditary hypergraphs on [n]. We discuss two natural transformations: Mon∨ between (H_n, ϕ∨_{m,n}) and (Mon_[n], ϕ∨_{m,n}), and Mon* between (H_n, ϕ*_{m,n}) and (Mon_[n], ϕ*_{m,n}). The maps Mon∨ and Mon* give rise to a notion of a minimal hereditary hypergraph. Define Mon∨ and Mon* as follows.

For H ∈ HV and A ⊆ V , we put

    (Mon∨(H))(A) := max_{B : B ⊇ A} H(B)   (4.7)

and

    (Mon*(H))(A) := Σ_{B : B ⊇ A} H(B).   (4.8)
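Both operators are one-liners on the dictionary representation of a hypergraph, and the heredity condition is easy to check directly. A sketch (function names ours, for illustration only):

```python
from itertools import chain, combinations

def subsets(s):
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def mon_vee(H):
    """(Mon_vee H)(A) = max of H(B) over supersets B of A, as in (4.7)."""
    return {A: max(v for B, v in H.items() if B >= A) for A in H}

def mon_star(H):
    """(Mon* H)(A) = sum of H(B) over supersets B of A, as in (4.8)."""
    return {A: sum(v for B, v in H.items() if B >= A) for A in H}

def is_hereditary(H):
    """Check H(A) >= H(B) whenever A is a subset of B."""
    return all(H[A] >= H[B] for A in H for B in H if B >= A)

# A toy hypergraph on [3] with two nonzero hyperedges.
H = {A: 0 for A in subsets(range(1, 4))}
H[frozenset({1, 2})] = 2
H[frozenset({3})] = 1

assert is_hereditary(mon_vee(H)) and is_hereditary(mon_star(H))
assert mon_vee(H)[frozenset({1})] == 2   # inherited from the hyperedge {1,2}
```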

Recall from remark 4.1.3 that for each 1 ≤ m ≤ n and injective map ϕ_{m,n} : [m] → [n], we define the projection ϕ∨_{m,n} : H_n → H_m by

    (ϕ∨_{m,n} H)(A) := max_{B : B ∩ im ϕ_{m,n} = ϕ_{m,n}[A]} H(B),

for each H ∈ H_n and A ⊆ [m]. In order for both Mon∨ and Mon* to determine natural transformations, proposition 4.3.1 requires that the diagrams

              Mon∨
    [n]    H_n ----> Mon_[n] ⊆ H_n
     |      |              |
     ϕ     ϕ∨             ϕ∨
     |      |              |
    [m]    H_m ----> Mon_[m] ⊆ H_m
              Mon∨

and

              Mon*
    [n]    H_n ----> Mon_[n] ⊆ H_n
     |      |              |
     ϕ     ϕ*             ϕ*
     |      |              |
    [m]    H_m ----> Mon_[m] ⊆ H_m
              Mon*

commute.

For convenience, write ϕ ≡ ϕm,n, then

    (Mon∨ ∘ ϕ∨ H)(A) = max_{B ⊇ A}  max_{C : C ∩ im ϕ = ϕ[B]} H(C)

and

    (ϕ∨ ∘ Mon∨ H)(A) = max_{B : B ∩ im ϕ = ϕ[A]}  max_{C ⊇ B} H(C) = max_{C ⊇ ϕ[A]} H(C).

Also,

    (Mon* ∘ ϕ* H)(A) = Σ_{B ⊇ A}  Σ_{C : C ∩ im ϕ = ϕ[B]} H(C)

and

    (ϕ* ∘ Mon* H)(A) = Σ_{C ⊇ ϕ[A]} H(C).

The above diagrams commute if

    {C ⊆ [n] : C ⊇ ϕ[A]} = ∪_{B ⊆ [m] : B ⊇ A} {C ⊆ [n] : C ∩ im ϕ = ϕ[B]}.

We have the following lemma.

Lemma 4.3.5. For 1 ≤ m ≤ n, let A ⊆ [m] and let ϕ : [m] → [n] be an injective map. Then Z* = Z, where Z* and Z are the sets defined by

    Z* := {C ⊆ [n] : C ⊇ ϕ[A]}

and

    Z := ∪_{B ⊆ [m] : B ⊇ A} {C ⊆ [n] : C ∩ im ϕ = ϕ[B]}.

Proof. Let [m] ⊇ B ⊇ A; then ϕ[B] ⊇ ϕ[A], and C ∩ im ϕ = ϕ[B] implies C ⊇ ϕ[B] ⊇ ϕ[A]. Hence, C ⊇ ϕ[A] and C ∈ Z*. This shows that Z ⊆ Z*.

Let [n] ⊇ C ⊇ ϕ[A]; then C = ϕ[A] ∪ D* for some D* ⊆ [n] disjoint from ϕ[A]. We can write D* as a disjoint union D* = B* ∪ C*, where B* = D* ∩ im ϕ and C* = D* ∩ (im ϕ)^c. Hence, we can write C as a disjoint union C = ϕ[A] ∪ B* ∪ C*. Let B := {b ∈ [m] : ϕ(b) ∈ B*} be the pre-image of B* under ϕ. Then C = ϕ[A ∪ B] ∪ C*; therefore, we have A ∪ B ⊇ A and C ∩ im ϕ = ϕ[A ∪ B]. Hence, C ∈ Z and Z* ⊆ Z.

The following is a corollary of lemma 4.3.5 and proposition 4.3.1.

Corollary 4.3.6. Let X be a random hypergraph whose law satisfies (4.5) on (H_n, ϕ∨_{m,n}) or (H_n, ϕ*_{m,n}). Then the induced law of Mon∨(X) and Mon*(X) satisfies (4.5) on the corresponding space, (Mon_[n], ϕ∨_{m,n}) or (Mon_[n], ϕ*_{m,n}), respectively.

Every hypergraph H ∈ H_n has a least monotone cover in Mon_[n], which we denote by Mon^1(H) := Mon∨(H) ∨ 1 and which is given by Mon^1(H)(A) := (max_{B ⊇ A} H(B)) ∨ 1 for each A ∈ 2^[n]. For a hypergraph H ∈ (H_n, ϕ∨_{m,n}) without multiplicity, the least monotone cover and the minimal hereditary hypergraph under Mon∨ coincide, i.e. Mon^1(H) = Mon∨(H) = {2^a : a ∈ H}.

Corollary 4.3.7. Let (H_n, n ≥ 1) be a family of random hypergraphs without multiplicity whose law satisfies (4.5) on (H_n, ϕ∨_{m,n}). Then the induced family (Mon^1(H_n), n ≥ 1) of hereditary hypergraphs on (H_n, ϕ∨_{m,n}) satisfies (4.5).

Proposition 4.3.8. The map Mon^1 : H_n → Mon_[n] determines a natural transformation from (H_n, ϕ*_{m,n}) to (Mon_[n], ϕ∨_{m,n}). Moreover, the Poisson random hypergraph satisfying the conditions of proposition 4.2.1 or theorem 4.2.3 determines an infinite random hereditary hypergraph without multiplicity, i.e. an infinite random monotone subset, on (H_n, i∨_{m,n}) or (H_n, ϕ∨_{m,n}), respectively.

Proof. For 1 ≤ m ≤ n, let ϕ_{m,n} be an injective map [m] → [n]. It suffices to show that

    Mon^1 ∘ ϕ*_{m,n} = ϕ∨_{m,n} ∘ Mon^1.

Let H ∈ Hn and A ⊆ [m]. We have

    (Mon^1 ϕ*_{m,n} H)(A) = (max_{B ⊇ A} Σ_{C : C ∩ im ϕ_{m,n} = ϕ_{m,n}[B]} H(C)) ∨ 1
        = (max_{B ⊇ A}  max_{C : C ∩ im ϕ_{m,n} = ϕ_{m,n}[B]} H(C)) ∨ 1

and

    (ϕ∨_{m,n} Mon^1 H)(A) = max_{B : B ∩ im ϕ_{m,n} = ϕ_{m,n}[A]} ((max_{C ⊇ B} H(C)) ∨ 1)
        = (max_{B : B ∩ im ϕ_{m,n} = ϕ_{m,n}[A]}  max_{C ⊇ B} H(C)) ∨ 1.

Put

    Z := ∪_{B ⊆ [m] : B ⊇ A} {C ⊆ [n] : C ∩ im ϕ_{m,n} = ϕ_{m,n}[B]}

and

    Z* := ∪_{B ⊆ [n] : B ∩ im ϕ_{m,n} = ϕ_{m,n}[A]} {C ⊆ [n] : C ⊇ B}.

Let B, C satisfy B ⊇ A and C ∩ im ϕ_{m,n} = ϕ_{m,n}[B]; then C ∈ Z. B can be written as a disjoint union B = A ∪ A*, and C can be written as a disjoint union C = ϕ_{m,n}[B] ∪ C* for some C* ⊆ (im ϕ_{m,n})^c. Hence, C can be written as a disjoint union C = ϕ_{m,n}[A] ∪ ϕ_{m,n}[A*] ∪ C*. By assumption, B ⊇ A implies ϕ_{m,n}[B] ⊇ ϕ_{m,n}[A] since ϕ_{m,n} is injective. Hence, (ϕ_{m,n}[A] ∪ C*) ∩ im ϕ_{m,n} = ϕ_{m,n}[A], and C ⊇ ϕ_{m,n}[A] ∪ C* implies C ∈ Z*. Thus, Z ⊆ Z*.

Conversely, let B, C satisfy B ∩ im ϕ_{m,n} = ϕ_{m,n}[A] and C ⊇ B; then C ∈ Z*. We can write C as a disjoint union C = ϕ_{m,n}[A] ∪ C* ∪ B*, where B* ⊆ (im ϕ_{m,n})^c and C* ⊆ im ϕ_{m,n}. Here, B corresponds to ϕ_{m,n}[A] ∪ B*. Now, since ϕ_{m,n} is injective and C* ⊆ im ϕ_{m,n}, we can uniquely define C' := {c ∈ [m] : ϕ_{m,n}(c) ∈ C*} as the pre-image of C* under ϕ_{m,n}. In this case, we have A ∪ C' ⊇ A and (ϕ_{m,n}[A] ∪ C* ∪ B*) ∩ im ϕ_{m,n} = ϕ_{m,n}[A] ∪ C* = ϕ_{m,n}[A ∪ C']. Hence C ∈ Z and Z* ⊆ Z.

This allows us to apply proposition 4.3.1 and thus completes the proof.

Corollary 4.3.9. For n ≥ 1, let A ∈ Mon_[n] and let Ā denote the complement of A, i.e. Ā = 1 − A : 2^[n] → {0, 1}. For X a Poisson(Λ) random hypergraph, the induced distribution of Mon^1(X) on Mon_[n] is

  1  X  Pn(Mon (X) ⊆ A) = exp − Λ(a) .  a∈A¯ 

4.3.3 Random undirected graphs

An undirected graph G : V × V → {0, 1} is easily induced from a hypergraph H : 2^V → Z^+ by putting

    (GH)(v, w) := G(v, w) := (Mon^1 H)({v, w})

for each v, w ∈ V. We write G_n to denote the space of undirected graphs on [n]. In this sense, an undirected graph G : V × V → {0, 1} can be regarded as a hereditary hypergraph G : 2^V → {0, 1} for which G(A) = 0 for all A ⊆ V with #A ≥ 3. This observation leads immediately to the following.

Corollary 4.3.10. Let H := (H_n, n ≥ 1) be a family of Poisson(Λ) random hypergraphs which characterizes an infinite hypergraph through either proposition 4.2.1 or theorem 4.2.3, and denote the corresponding projective system generically by (H_n, ψ_{m,n}). Then the induced family of undirected graphs (GH_n, n ≥ 1) characterizes an infinite random graph G_∞ on the projective limit space of (G_n, ψ∨_{m,n}).

Example 4.3.11. A random graph distributed according to the Erdős–Rényi process on n ≥ 1 vertices with parameter p ∈ (0, 1), G(n, p) from section 1.12, can be generated by the projection Mon^1(H), where H := (H_n, n ≥ 1) is a family of Poisson random hypergraphs on (H_n, ϕ*_{m,n}) with λ_n(0) ≥ 0, λ_n(1) = 0, λ_n(2) = −log(1−p) and λ_n(j) = 0 for j ≥ 3, for n = 1, 2, ....
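Example 4.3.11 rests on the observation that a Poisson(−log(1−p)) multiplicity is strictly positive with probability exactly p. The following sketch (function name ours) generates the induced graph by placing such a multiplicity on each pair and keeping the pairs with positive count, then checks the empirical edge frequency by Monte Carlo:

```python
import math, random
from itertools import combinations

def erdos_renyi_via_poisson(n, p, rng):
    """Generate G(n, p) by truncating a Poisson random hypergraph: each pair
    {i, j} receives an independent Poisson(-log(1-p)) multiplicity, and a
    pair is an edge when its multiplicity is positive, which happens with
    probability 1 - exp(log(1-p)) = p."""
    rate = -math.log(1.0 - p)
    edges = set()
    for pair in combinations(range(1, n + 1), 2):
        # Sample Poisson(rate) by inversion of the CDF.
        k, q, u = 0, math.exp(-rate), rng.random()
        acc = q
        while u > acc:
            k += 1
            q *= rate / k
            acc += q
        if k > 0:
            edges.add(frozenset(pair))
    return edges

rng = random.Random(3)
trials, n, p = 2000, 10, 0.3
total = sum(len(erdos_renyi_via_poisson(n, p, rng)) for _ in range(trials))
freq = total / (trials * math.comb(n, 2))
assert abs(freq - p) < 0.02  # empirical edge frequency should be close to p
```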

4.4 Discussion

A hereditary hypergraph corresponds to a factorial model in statistics; see e.g. [79] for the definition of a factorial model. Speed and Bailey [95] and McCullagh [74] study factorial models in greater depth. A potential application of the above discussion of infinitely exchangeable hereditary hypergraphs is as a class of prior distributions on factorial models in a Bayesian setting.

The persistent study of models for complex networks has resulted in several recent books, e.g. Chung and Lu [39] and Kolaczyk [70], review papers, e.g. Albert and Barabási [1] and Newman [86], and publications, e.g. [18, 31, 32, 34, 37, 38, 43]. These models are studied from different perspectives and for different purposes. In this chapter, we constructed a class of random graphs which are infinitely exchangeable; however, from a modeling perspective, this class seems to lack computational tractability. The extent to which the above class of Poisson random graph models can be understood, and subsequently applied, is a topic for future research.

CHAPTER 5

BALANCED AND EVEN PARTITION STRUCTURES

We now return to the study of projective systems of set partitions and permutations. We construct processes on products of projective systems which can be associated to specially structured partitions, called balanced and even partitions.

5.1 Preliminaries

For j, n ∈ N, B ∈ P_[nj] is j-divisible, or j-even, if the size of each b ∈ B is divisible by j. Let P_{[nj]:j} denote the set of partitions of [nj] that are j-even. For a set J of j ≥ 1 types, a j-balanced partition is a j-even partition B for which each element is assigned a type in J and each b ∈ B contains an equal number of elements of each type. Let P'_{[nj]:j} denote the set of j-balanced partitions of [nj]. The collections (P_{[nj]:j}, n ≥ 1) and (P'_{[nj]:j}, n ≥ 1) of j-even and j-balanced partitions, respectively, do not form projective systems for j ≥ 2; nevertheless, it is possible to define the analog of a partition structure on these spaces, as we discuss in sections 5.2 and 5.3.

Recalling the exposition from section 1.4, let S_n denote the symmetric group acting on [n]. The collection (S_n, n ≥ 1) is a projective system under the delete-and-repair mappings ψ_{m,n} : S_n → S_m, m ≤ n, which are defined for each σ ∈ S_n by

    (ψ_{n−1,n} σ)(i) = σ(n)  if i = σ^{−1}(n),  and  (ψ_{n−1,n} σ)(i) = σ(i)  otherwise.

For m ≤ n, we define ψ_{m,n} := ψ_{m,m+1} ∘ ··· ∘ ψ_{n−1,n}. To each partition B ∈ P_[n], there correspond ∏_{b ∈ B} (#b − 1)! permutations of [n] whose cycles correspond to the blocks of B. We define Υ to be the law of an infinite uniform permutation of N, which has finite-dimensional distributions Υ_n(σ) = 1/n! for each n ≥ 1 and σ ∈ S_n.

5.2 Balanced partitions

Let ν be a probability measure on R↓. A compatible sequence of j-balanced partitions can be constructed by first constructing a j-tuple (B, σ^2, ..., σ^j) with distribution ϱ_ν ⊗ Υ ⊗ ··· ⊗ Υ on the product space P × S_N × ··· × S_N. We define D̃_{m,n} := (D_{m,n}, ψ_{m,n}, ..., ψ_{m,n}) in the obvious way as restriction maps on P × S_N × ··· × S_N which act componentwise. The product space is a projective system under (D̃_{m,n}, m ≤ n) and, for n ≥ 1, the distribution of the restriction (B, σ^2, ..., σ^j)_{|[n]} is

    P_n((B_{|[n]}, σ^2_{|[n]}, ..., σ^j_{|[n]})) = ϱ_ν({π ∈ P : π_{|[n]} = B_{|[n]}}) / (n!)^{j−1}.   (5.1)

For convenience, we write B_{|[n]} := {π ∈ P : π_{|[n]} = B_{|[n]}} to denote the set of partitions of N compatible with B_{|[n]}. To obtain a j-balanced partition of [nj]* := [n]^(1) × ··· × [n]^(j) from the restriction of (B, σ^2, ..., σ^j) to [n], we regard σ^i : [n]^(1) → [n]^(i) as a uniform matching from the population [n]^(1) of individuals of type (1) to the population [n]^(i) of individuals of type (i), for each i = 2, ..., j. A j-balanced partition B*_n of [nj]* is obtained by regarding σ^1 as the identity map [n]^(1) → [n]^(1) and putting

 j  ∗  [ k  Bn := σ|[n](Bi): i ≥ 1 , (5.2) k=1  where σ(b) := {σ(a): a ∈ b} for each b ∈ B.

Proposition 5.2.1. Let (B, σ^2, ..., σ^j) ∼ ϱ_ν ⊗ Υ ⊗ ··· ⊗ Υ be as above. The distribution of B*_n in (5.2) is

    P[B*_n = π] = (ϱ_ν(π*) / (n!)^{j−1}) ∏_{b ∈ π} ((#b/j)!)^{j−1},   (5.3)

where π* ∈ P_[n] is any partition with blocks of sizes (#b/j, b ∈ π).

Proof. Each block b ∈ B*_n contains an equal number of individuals of each type (1), ..., (j). In addition, (σ^i_{|[n]}, i = 2, ..., j) are i.i.d. uniform matchings. Hence, within each block b of B*_n, there are #b/j elements of type (i), and for any specific allocation of individuals u^i_1, ..., u^i_{#b/j} of type (i) within b, there are (#b/j)! admissible matchings σ^i : [n]^(1) → [n]^(i), for each i = 2, ..., j.

5.3 Even partitions

A compatible sequence of j-even partitions can be constructed from a pair (B, σ) ∼

ϱ_ν ⊗ Υ on the product projective system (P_[n] × S_{nj} : n ≥ 1) with restriction maps D̄_{m,n} := (D_{m,n}, ψ_{mj,nj}) which act componentwise. A j-even partition B*_n of [nj] is obtained from (B, σ) by defining u(i) := {(i−1)j + 1, ..., ij} for i = 1, ..., n and putting

    B*_n := { ∪_{k ∈ b} ∪_{l ∈ u(k)} σ_{|[nj]}(l) : b ∈ B_{|[n]} }.   (5.4)

Proposition 5.3.1. The law of B*_n in (5.4) is

    P(B*_n = π) = ϱ_ν(π*) (n!/(nj)!) ∏_{b ∈ π} (#b)!/(#b/j)!   (5.5)

where π∗ is any partition of [n] with blocks of size (#b/j, b ∈ π).

Proof. The distribution of the partition B_{|[n]} is ϱ_ν(π*), since the block sizes of π are obtained by multiplying each block size of an exchangeable partition with distribution ϱ_ν by j, and the block sizes of an exchangeable partition are sufficient to determine its distribution [93].

Given the block sizes (n_1, ..., n_k) of π*, write m_i := Σ_{j'=1}^k I{n_{j'} = i} for the number of blocks of size i in π*. There are n! / ∏_{i=1}^n (i!)^{m_i} m_i! partitions of [n] with the same block sizes as π*. For each of these partitions, there are ∏_{i=1}^n ((ji)!)^{m_i} m_i! permutations of [nj] which result in the partition π in (5.4), each of which has probability 1/(nj)!. Hence, the probability that σ_{|[nj]} maps π* to π is (n!/(nj)!) ∏_{b ∈ π} (#b)!/(#b/j)!.

5.4 Partition structures

Kingman [67] defines a partition structure on integer partitions as a collection of probability measures $(P_1, P_2, \dots)$ on the spaces of integer partitions of each $n \ge 1$ such that if $\lambda'$ is obtained from $\lambda \sim P_n$ by randomly choosing a part of $\lambda$ with probability proportional to its size and reducing it by one, then $\lambda' \sim P_{n-1}$. We introduce a corresponding notion on the spaces of balanced and even partitions which, like integer partitions, are not projective systems.

Suppose $B_n^* \sim P_n$ is a random balanced (respectively, even) partition of $[nj]$. Randomly remove a group of $j$ elements from $B_n^*$ as follows.

(1) choose $b \in B_n^*$ randomly with probability proportional to $\#b$;

(2) given $b$, for each $i = 1, \dots, j$, choose $u^{(i)}$ uniformly among all type $(i)$ elements in $b$ (respectively, choose $j$ elements from $b$ by drawing uniformly without replacement from the elements of $b$);

(3) remove $u^* := \{u^{(i)},\ i = 1, \dots, j\}$ from $B_n^*$;

(4) relabel units within each population $[n-1]^{(i)}$ uniformly at random (respectively, relabel units uniformly in $[(n-1)j]$).
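The deletion rule above, in its even variant, can be sketched in a few lines (an illustrative sketch, not from the text; the function name `delete_group` and the list-of-blocks encoding are mine):

```python
import random

def delete_group(partition, j):
    """One random-deletion step for a j-even partition (sketch).

    `partition` is a list of blocks (lists of labels) of a j-even
    partition of [nj]; returns a j-even partition of [(n-1)j].
    """
    # (1) choose a block with probability proportional to its size
    idx = random.choices(range(len(partition)),
                         weights=[len(b) for b in partition])[0]
    # (2)-(3) remove j elements from the chosen block, drawn uniformly
    # without replacement (the "even" variant of the rule)
    removed = set(random.sample(partition[idx], j))
    blocks = [[x for x in b if x not in removed] if i == idx else list(b)
              for i, b in enumerate(partition)]
    blocks = [b for b in blocks if b]  # drop the block if it was emptied
    # (4) relabel the surviving units uniformly at random in [(n-1)j]
    survivors = [x for b in blocks for x in b]
    relabel = dict(zip(survivors,
                       random.sample(range(1, len(survivors) + 1),
                                     len(survivors))))
    return [sorted(relabel[x] for x in b) for b in blocks]
```

Repeating this step maps a $j$-even partition of $[nj]$ to one of $[(n-1)j]$, which is the consistency property Proposition 5.4.1 asserts for the measures (5.3) and (5.5).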

Proposition 5.4.1. Let $P := (P_n,\ n \ge 1)$ be the collection of finite-dimensional measures (5.3) or (5.5). Then $P$ is a partition structure on $\mathcal{P}'_{[nj]:j}$ or $\mathcal{P}_{[nj]:j}$, respectively, with respect to the above process of random deletion.

Proof. Since units are randomly relabeled in step (4), the choice of which units to remove in steps (2) and (3) does not affect the distribution of the reduced partition. Furthermore, by exchangeability of the paintbox process, given $B_n^*$, the probability of the event $\{u^* \subseteq b\}$ for some $b \in B_n^*$ is $\#b/n$. As the paintbox process is consistent under the deletion mappings, the structure of the reduced partition is that of a random balanced (respectively, even) partition distributed according to $P_{n-1}$.

5.5 Balanced and even permutations

The product measures $\varrho_\nu \otimes \Upsilon \otimes \cdots \otimes \Upsilon$ and $\varrho_\nu \otimes \Upsilon$ which generate balanced partitions and even partitions, respectively, also induce a distribution on $\prod_{i=1}^{j} \mathcal{S}_n$ and $\mathcal{S}_n \times \mathcal{S}_{nj}$, respectively, in a straightforward way by choosing $\sigma' \in \mathcal{S}_n$ uniformly among the permutations of $[n]$ corresponding to $\pi \sim \varrho_\nu$. For balanced permutations, we randomize the arrangement of the elements within each block by letting $\sigma^* \in \mathcal{S}_j$ be a uniform permutation of $[j]$. The permutation $\sigma^*$ then specifies the arrangement of types within groups. Additional rearrangement can be obtained by generating an independent uniform permutation $\sigma_i^*$ for each cycle of the induced balanced permutation, which determines the order of the types for each cycle independently. It is straightforward to compute the distributions of these induced permutations through the formulae in Propositions 5.2.1 and 5.3.1.

5.6 Relating balanced and even partitions

To any $B \in \mathcal{P}_{[n]}$, there corresponds an integer partition $\nu(B) := (\lambda_1, \dots, \lambda_n)$ of $n$, where $\lambda_i := \#\{b \in B : \#b = i\}$ is the number of blocks of $B$ of size $i$.

Proposition 5.6.1. For $n, j \ge 1$, let $B \in \mathcal{P}'_{[nj]:j}$ have distribution (5.3) and $B' \in \mathcal{P}_{[nj]:j}$ have distribution (5.5). Then $\nu(B) \sim \nu(B')$. Moreover,

\[
\mathbb{P}(\nu(B) = (m_1, \dots, m_{nj})) = \frac{n!}{\prod_{i=1}^{n} (i!)^{m_{ij}}\, m_{ij}!}\, \varrho_\nu(B^*), \tag{5.6}
\]

where $B^* \in \mathcal{P}_{[n]}$ is any partition with blocks of sizes $(\#b/j,\ b \in B)$.

Proof. Each $\pi \in \mathcal{P}'_{[nj]:j}$ can be naturally regarded as a $j$-tuple of partitions $(\pi^{(1)}, \dots, \pi^{(j)})$, where $\pi^{(i)}$ is the partition of the elements of type $(i)$ induced by $\pi$. The block sizes of the $\pi^{(i)}$ are all equivalent. If $(m_1, \dots, m_n)$ is the integer partition corresponding to each $\pi^{(i)}$, then there are $n! / \prod_{i=1}^{n} (i!)^{m_i} m_i!$ partitions of $[n]$ which correspond to $(m_1, \dots, m_n)$, for each $i = 1, \dots, j$. Also, for any partition $\pi^{(1)}$ of $[n]$ corresponding to $(m_1, \dots, m_n)$, there are $\prod_{i=1}^{n} m_i!$ ways to match each such partition of $[n]$ to $\pi^{(1)}$. Hence, the distribution of $\nu(B)$ is (5.3) multiplied by the number of balanced partitions with this profile, which yields (5.6). On the other hand, there are $(nj)! / \prod_{i=1}^{n} ((ij)!)^{m_{ij}}\, m_{ij}!$ partitions of $[nj]$ which correspond to the integer partition $(m_1, \dots, m_{nj})$, and each such partition is $j$-even. Hence, the distribution of $\nu(B')$ also corresponds to (5.6).

5.7 Chinese restaurant constructions

A special family of exchangeable random partition models is the (α, θ)-model [93], which is a more general version of the celebrated Ewens process [53]. We describe this model, as well as its construction via the Chinese restaurant process, in section 1.6. We now describe a Chinese restaurant-type construction for balanced and even partition structures.

5.7.1 Chinese restaurant construction for balanced partitions

Let $J = [j]$ be a set of $j \in \mathbb{N}$ types and suppose individuals arrive in groups of size $j$, one individual of each type, and are labeled in $\mathbb{N} \times [j]$, where the label $(x, y)$ corresponds to the individual of type $y$ in the $x$th group to arrive. For $(\alpha, \theta)$ in the parameter space of (1.10), we construct $B_n \in \mathcal{P}'_{[nj]:j}$ as follows.

(1) the first j units, (1, 1),. . . , (1, j), are seated at the same table;

(2) after nj arrivals are seated according to partition π, the next j units, (n + 1, 1),. . . , (n + 1, j) are seated randomly as follows:

a. for each $i = 2, \dots, j$, $(n+1, i)$ chooses $u^{(i)}$ uniformly among $(1, i), (2, i), \dots, (n+1, i)$ and switches positions with $u^{(i)}$;

b. $((n+1, 1), u^{(2)}, \dots, u^{(j)})$ is treated as a single unit, chooses a table $b \in \pi$ according to CRP$(nj, \alpha, \theta)$, and is seated at $b$.
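The two-step seating rule above can be sketched as follows (a hedged illustration; `crp_table` and `balanced_crp` are my names, and the CRP$(m, \alpha, \theta)$ weights, $\#b - \alpha$ for an occupied table and $\theta + \alpha \cdot \#\text{tables}$ for a new one, follow the standard two-parameter Chinese restaurant process, which I assume is what the text intends):

```python
import random

def crp_table(sizes, alpha, theta):
    # two-parameter CRP step: occupied table b has weight (#b - alpha),
    # a new table has weight (theta + alpha * #tables); assumes
    # 0 <= alpha < 1 and theta > -alpha so all weights are positive
    k = len(sizes)
    w = [s - alpha for s in sizes] + [theta + alpha * k]
    return random.choices(range(k + 1), weights=w)[0]

def balanced_crp(n, j, alpha, theta):
    """Sample a j-balanced partition of {(x, y): x in [n], y in [j]}."""
    blocks, table = [], {}
    for g in range(1, n + 1):
        if g == 1:
            blocks.append([(1, y) for y in range(1, j + 1)])
            for lab in blocks[0]:
                table[lab] = 0
            continue
        bundle = [(g, 1)]
        for i in range(2, j + 1):
            u = (random.randint(1, g), i)  # (2a): uniform among (1,i)..(g,i)
            if u == (g, i):
                bundle.append(u)           # identity switch
            else:
                t = table[u]               # (g, i) takes u's seat ...
                blocks[t].remove(u)
                blocks[t].append((g, i))
                table[(g, i)] = t
                bundle.append(u)           # ... and u joins the bundle
        t = crp_table([len(b) for b in blocks], alpha, theta)  # (2b)
        if t == len(blocks):
            blocks.append([])
        blocks[t].extend(bundle)
        for lab in bundle:
            table[lab] = t
    return blocks
```

Since the switches preserve the per-type counts at each table and every bundle carries one unit of each type, each table always holds equally many units of each type, so the output is balanced.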

Proposition 5.7.1. The probability distribution on $\mathcal{P}'_{[nj]:j}$ that results from the above construction is

\[
p_n(\pi; \alpha, \theta) = \frac{(\theta/\alpha)^{\uparrow \#\pi}}{(\theta/j)^{\uparrow n} (n!)^{j-1}} \prod_{b \in \pi} \left( -(-\alpha/j)^{\uparrow (\#b/j)} \right) \left[ (\#b/j)! \right]^{j-1}. \tag{5.7}
\]

Proof. The block sizes of $\pi$ correspond to those of a CRP$(n, \alpha/j, \theta/j)$ partition whose block sizes are multiplied by $j$. After $n$ steps, step (2b) corresponds to a uniform matching $[n]^{(1)} \to [n]^{(i)}$ for each $i = 2, \dots, j$ and the identity map $[n]^{(1)} \to [n]^{(1)}$. Equation (5.7) corresponds to (5.3) for $\nu$ corresponding to the Poisson-Dirichlet distribution with parameter $(\alpha, \theta)$ (Bertoin [25], p. 74).

For $\alpha = -\kappa < 0$ and $\theta = m\kappa$, $p_n(\cdot\,; \alpha, \theta)$ in (5.7) has limit

\[
p_n(\pi; \lambda) = \frac{(\lambda/j)^{\#\pi}}{(\lambda/j)^{\uparrow n} (n!)^{j-1}} \prod_{b \in \pi} \left[ (\#b/j)! \right]^{j-1} \Gamma(\#b/j), \tag{5.8}
\]

as $m \to \infty$ and $\theta \to \lambda$; this is the distribution on $\mathcal{P}'_{[nj]:j}$ obtained if seating is done according to CRP$(nj, 0, \lambda)$ in step (2b).

5.7.2 Chinese restaurant construction for even partitions

Alternatively, a compatible sequence of even partitions can be constructed from the following random seating rule. In this case, we label individuals in N and

(1) customers arrive in groups of size j;

(2) the first j customers, labeled 1, . . . , j, are seated at the same table;

(3) after nj arrivals are seated according to partition π, the next j customers, labeled nj + 1,..., (n + 1)j, are seated as follows:

a. for each $i = 2, \dots, j$, $nj + i$ chooses a label $u^{(i)}$ uniformly among $1, 2, \dots, nj + i - 1$ and switches positions with $u^{(i)}$;

b. $(nj + 1, u^{(2)}, \dots, u^{(j)})$ is treated as a single unit, chooses a block $b \in \pi$ according to CRP$(nj, \alpha, \theta)$, and is seated at $b$.
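The even-partition seating rule can be sketched analogously (again an illustrative sketch with my own bookkeeping, not the text's notation; `loc` records whether a label currently sits at a table or in the arriving bundle, and the CRP weights are the standard two-parameter ones):

```python
import random

def even_crp(n, j, alpha, theta):
    """Sample a j-even partition of [nj] via the seating rule of 5.7.2
    (sketch; assumes 0 <= alpha < 1 and theta > -alpha)."""
    blocks, loc = [], {}
    for g in range(n):
        base = g * j
        if g == 0:
            blocks.append(list(range(1, j + 1)))
            for lab in blocks[0]:
                loc[lab] = ('table', 0)
            continue
        bundle = {1: base + 1}                 # slot -> label
        loc[base + 1] = ('bundle', 1)
        for i in range(2, j + 1):
            new = base + i
            u = random.randint(1, base + i - 1)   # (3a): uniform earlier label
            kind, where = loc[u]
            if kind == 'table':                # `new` takes u's seat at a table
                blocks[where].remove(u)
                blocks[where].append(new)
            else:                              # u was itself in the bundle
                bundle[where] = new
            loc[new] = (kind, where)
            bundle[i] = u                      # u moves into slot i of the bundle
            loc[u] = ('bundle', i)
        group = list(bundle.values())
        k = len(blocks)                        # (3b): CRP(nj, alpha, theta) step
        w = [len(b) - alpha for b in blocks] + [theta + alpha * k]
        t = random.choices(range(k + 1), weights=w)[0]
        if t == k:
            blocks.append([])
        blocks[t].extend(group)
        for lab in group:
            loc[lab] = ('table', t)
    return [sorted(b) for b in blocks]
```

Switches exchange one label for another within a block, and groups are seated $j$ at a time, so every block size remains a multiple of $j$.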

Proposition 5.7.2. The probability distribution on $\mathcal{P}_{[nj]:j}$ that results from the above construction is

\[
p_n(\pi; \alpha, \theta) = \frac{\Gamma(n)}{\Gamma(nj)}\, j^{\#\pi - 1}\, \frac{(\theta/\alpha)^{\uparrow \#\pi}}{(\theta/j)^{\uparrow n}} \prod_{b \in \pi} \left( -(-\alpha/j)^{\uparrow (\#b/j)} \right) \frac{\Gamma(\#b)}{\Gamma(\#b/j)}. \tag{5.9}
\]

Proof. The block sizes of $\pi$ correspond to those of a CRP$(n, \alpha/j, \theta/j)$ partition whose block sizes are multiplied by $j$, which is exchangeable. Step (3b) results in $\pi$ being chosen uniformly at random from the even partitions corresponding to the integer partition $(\#b,\ b \in \pi)$. Propositions 5.3.1 and 5.6.1 imply (5.9).

Corollary 5.7.3. For $\alpha = -\kappa < 0$ and $\theta = m\kappa$ for some $m \in \mathbb{N}$, (5.9) is a distribution on $\mathcal{P}^{(m)}_{[nj]:j}$ with limit (as $\kappa \to 0$ and $\theta \to \lambda$)

\[
p_n(\pi; \lambda) = \frac{\Gamma(n)}{\Gamma(nj)}\, \frac{\lambda^{\#\pi}}{j\, (\lambda/j)^{\uparrow n}} \prod_{b \in \pi} \Gamma(\#b), \tag{5.10}
\]

the Ewens distribution with parameter λ conditioned to be even of order j.

From (5.10), we obtain the combinatorial identity

\[
\frac{1}{(nk)!} \sum_{\pi \in \mathcal{P}_{[nk]:k}} \alpha^{\#\pi}\, \Gamma(\pi) = \frac{(\alpha/k)^{\uparrow n}}{n!}, \tag{5.11}
\]

where $\Gamma(\pi) := \prod_{b \in \pi} \Gamma(\#b)$.
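As a sanity check on (5.11), both sides can be computed exactly for small $n$ and $k$ (a verification sketch; the helpers `set_partitions`, `lhs` and `rhs` are mine, and `lhs`/`rhs` encode the two sides of the identity as read here, with $x^{\uparrow n} = x(x+1)\cdots(x+n-1)$):

```python
from fractions import Fraction
from math import factorial, prod

def set_partitions(s):
    # enumerate all set partitions of the list s
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for p in set_partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]
        yield [[first]] + p

def lhs(n, k, alpha):
    # (1/(nk)!) * sum over k-even partitions pi of [nk] of
    # alpha^{#pi} * prod_b Gamma(#b), with Gamma(m) = (m-1)!
    total = Fraction(0)
    for p in set_partitions(list(range(n * k))):
        if all(len(b) % k == 0 for b in p):
            total += (Fraction(alpha) ** len(p)
                      * prod(factorial(len(b) - 1) for b in p))
    return total / factorial(n * k)

def rhs(n, k, alpha):
    # (alpha/k)^{rising n} / n!
    x = Fraction(alpha) / k
    return prod(x + i for i in range(n)) / factorial(n)
```

For instance, at $n = k = 2$ and $\alpha = 3$ both sides evaluate to $15/8$, and at $k = 1$ the check reduces to the classical Ewens identity $\sum_{\pi} \alpha^{\#\pi} \prod_b (\#b - 1)! = \alpha^{\uparrow n}$.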

5.8 Randomization

Brien and Bailey [35] discuss randomization schemes for designed experiments with multiple tiers of randomization. For now, we consider a standard designed experiment with a set $U$ of units, $T$ of treatments and $B$ of blocks such that $\#T = t \ge 1$, $\#B = b \ge 0$ and $\#U = n \ge bt$. In a randomized experiment, units are placed into blocks according to similarities in certain characteristics and, within each block, treatments are randomly assigned to units. We highlight two scenarios: $b = 0$ and $b \ge 1$.

If $b = 0$, then individuals need only be randomly assigned a treatment. In this case $\#U = nt$ and it is common to assign each treatment to $n$ distinct units. We can regard this treatment assignment as an $n$-divisible partition in which each of the blocks is randomly assigned a label in $T$. This can be obtained by modifying the random seating rule in section 5.7.2 so that a new group is always placed in a new block.

If $b \ge 1$, each unit is assigned to one of $b$ blocks. If $n = 1$, so that $\#U = bt$, a balanced, complete design assigns each treatment to exactly one unit within each block. In the terminology of balanced partitions introduced here, $B$ acts as a set of $b$ types, and balanced groups of units are placed into treatment groups (blocks of a partition) randomly according to some procedure which generates a balanced partition. In this way, each treatment is assigned to the same number of individuals in each block and the design is balanced. If each new group is assigned a different treatment from all previous groups, we obtain a $b$-balanced partition of $[bt]$ with $t$ blocks and the design is complete; this corresponds to the classical randomized complete block design (RCBD) (Bailey [16], Mead [83]). In the case where blocks have unequal sizes, dummy units which do not correspond to any experimental unit can be introduced to maintain the randomization procedure and obtain a balanced or even partition. This corresponds to a randomized incomplete block design (RIBD).
In general, if treatments are assigned to the blocks of a $b$-balanced or $b$-even partition of $[nbt]$ distributed according to either (5.7) ($b \ge 1$) or (5.9) ($b = 0$), respectively, then a randomized design is obtained in which, for $b \ge 1$, each treatment is applied the same number of times to each block. A design obtained in this way is not necessarily balanced or complete; different treatments are likely to be assigned in different frequencies due to the imbalance in block sizes of the resulting random partition, and not all $t$ treatments are necessarily assigned to at least one unit. Usually, randomization is done in the interest of fairness in assigning treatments to units, but is sufficiently restricted so that each treatment is guaranteed to occur a prescribed number of times. Any computational or inferential benefits which can be derived from randomizing according to a more general rule for random even and balanced partitions are not immediately clear; however, partitions have previously appeared in the theory of designed experiments and association schemes, see e.g.

Bailey [15], p. 145. The discussion on balanced partitions resembles a more general version of group-divisible association schemes.

REFERENCES

[1] R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Rev. Modern Phys., 74(1):47–97, 2002.

[2] D. Aldous. The continuum random tree. I. Ann. Probab., 19(1):1–28, 1991.

[3] D. Aldous. The continuum random tree. III. Ann. Probab., 21(1):248–289, 1993.

[4] D. Aldous. Tree-valued Markov chains and Poisson-Galton-Watson distribu- tions. In Microsurveys in discrete probability (Princeton, NJ, 1997), volume 41 of DIMACS Ser. Discrete Math. Theoret. Comput. Sci., pages 1–20. Amer. Math. Soc., Providence, RI, 1998.

[5] D. Aldous, M. Krikun, and L. Popovic. Stochastic models for phylogenetic trees on higher-order taxa. J. Math. Biol., 56(4):525–557, 2008.

[6] D. Aldous and J. Pitman. Tree-valued Markov chains derived from Galton-Watson processes. Ann. Inst. H. Poincaré Probab. Statist., 34(5):637–686, 1998.

[7] D. Aldous and J. Pitman. A family of random trees with random edge lengths. Random Structures Algorithms, 15(2):176–195, 1999.

[8] D. Aldous and L. Popovic. A critical branching process model for biodiversity. Adv. in Appl. Probab., 37(4):1094–1115, 2005.

[9] D. J. Aldous. Representations for partially exchangeable arrays of random vari- ables. J. Multivariate Anal., 11(4):581–598, 1981.

[10] D. J. Aldous. Exchangeability and related topics. In Ecole´ d’´et´ede probabilit´es de Saint-Flour, XIII—1983, volume 1117 of Lecture Notes in Math., pages 1–198. Springer, Berlin, 1985.


[11] D. J. Aldous. Deterministic and stochastic models for coalescence (aggregation and coagulation): a review of the mean-field theory for probabilists. Bernoulli, 5(1):3–48, 1999.

[12] D. J. Aldous. More uses of exchangeability: representations of complex random structures. In Probability and mathematical genetics, volume 378 of London Math. Soc. Lecture Note Ser., pages 35–63. Cambridge Univ. Press, Cambridge, 2010.

[13] G. Andrews. The Theory of Partitions. The Encyclopedia of Mathematics and Its Applications Series. Addison-Wesley Pub. Co., New York, 1976.

[14] S. Awodey. Category theory, volume 49 of Oxford Logic Guides. The Clarendon Press Oxford University Press, New York, 2006.

[15] R. A. Bailey. Association Schemes: Designed Experiments, Algebra and Combi- natorics, volume 84 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2004.

[16] R. A. Bailey. Design of comparative experiments. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2008.

[17] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.

[18] E. Beer, J. Fill, S. Janson, and E. Scheinerman. On vertex, edge and vertex-edge random graphs. Accessed at arXiv:0812.1410, 2010.

[19] J. Berestycki. Exchangeable fragmentation-coalescence processes and their equi- librium measures. Electron. J. Probab., 9:no. 25, 770–824 (electronic), 2004.

[20] N. Berestycki and J. Pitman. Gibbs distributions for random partitions generated by a fragmentation process. J. Stat. Phys., 127(2):381–418, 2007.

[21] F. Bergeron, G. Labelle, and P. Leroux. Combinatorial species and tree-like structures. Cambridge University Press, Cambridge, 1998.

[22] J. Bertoin. Eternal additive coalescents and certain bridges with exchangeable increments. Ann. Probab., 29(1):344–360, 2001.

[23] J. Bertoin. Self-similar fragmentations. Ann. Inst. H. Poincaré Probab. Statist., 38(3):319–340, 2002.

[24] J. Bertoin. Some aspects of additive coalescents. In Proceedings of the In- ternational Congress of Mathematicians, Vol. III (Beijing, 2002), pages 15–23, Beijing, 2002. Higher Ed. Press.

[25] J. Bertoin. Random fragmentation and coagulation processes, volume 102 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cam- bridge, 2006.

[26] J. Bertoin. Two-parameter Poisson-Dirichlet measures and reversible exchange- able fragmentation-coalescence processes. Combin. Probab. Comput., 17(3):329– 337, 2008.

[27] J. Bertoin. Exchangeable Coalescents. Lecture notes for PIMS Summer School in Probability, University of Washington and Microsoft Research, Seattle, WA, 2010.

[28] L. Billera, S. Holmes, and K. Vogtmann. Geometry of the space of phylogenetic trees. Adv. Appl. Math., 27(4):733–767, 2001.

[29] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

[30] S. Bochner. Harmonic analysis and the theory of probability. 1955.

[31] B. Bollobás, S. Janson, and O. Riordan. The phase transition in inhomogeneous random graphs. Random Structures Algorithms, 31(1):3–122, 2007.

[32] B. Bollobás, S. Janson, and O. Riordan. Sparse random graphs with clustering. Random Structures Algorithms, 38(3):269–323, 2011.

[33] B. Bollobás and O. Riordan. The diameter of a scale-free random graph. Combinatorica, 24(1):5–34, 2004.

[34] B. Bollobás and O. Riordan. Sparse graphs: metrics and random models. Random Structures and Algorithms, (39):1–38, 2011.

[35] C. Brien and R. Bailey. Multiple randomizations (with discussion). JRSS B, 68:571–609, 2006.

[36] C. J. Burke and M. Rosenblatt. A Markovian function of a Markov chain. Ann. Math. Statist., 29:1112–1122, 1958.

[37] S. Chatterjee and P. Diaconis. Estimating and understanding exponential ran- dom graph models. Preprint, 2011.

[38] S. Chatterjee, P. Diaconis, and A. Sly. Random graphs with a given degree sequence. Ann. Appl. Probab., 2011.

[39] F. Chung and L. Lu. Complex graphs and networks, volume 107 of CBMS Re- gional Conference Series in Mathematics. Published for the Conference Board of the Mathematical Sciences, Washington, DC, 2006.

[40] H. Crane. A consistent Markov partition process generated from the paintbox process. J. Appl. Probab., 43(3):778–791, 2011.

[41] R. Darling and J. Norris. Structure of large random hypergraphs. Ann. Appl. Probab., 15(1A):125–152, 2005.

[42] P. Diaconis and D. Freedman. The Markov moment problem and de Finetti’s theorem. I. Math. Z., 247(1):183–199, 2004.

[43] P. Diaconis, S. Holmes, and S. Janson. Interval graph limits. Submitted, 2011.

[44] P. Diaconis and S. P. Holmes. Random walks on trees and matchings. Electron. J. Probab., 7:no. 6, 17 pp. (electronic), 2002.

[45] P. Donnelly and T. G. Kurtz. Genealogical processes for Fleming-Viot models with selection and recombination. Ann. Appl. Probab., 9(4):1091–1148, 1999.

[46] P. Donnelly and T. G. Kurtz. Particle representations for measure-valued pop- ulation models. Ann. Probab., 27(1):166–205, 1999.

[47] R. Durrett. Probability: Theory and Examples. The Wadsworth & Brooks/Cole Statistics/Probability Series. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, CA, 1991. Theory and examples.

[48] R. Durrett, B. L. Granovsky, and S. Gueron. The equilibrium behavior of re- versible coagulation-fragmentation processes. J. Theoret. Probab., 12(2):447–474, 1999.

[49] B. Efron and R. Thisted. Estimating the number of unseen species: How many words did Shakespeare know? Biometrika, 63:435–447, 1976.

[50] P. Erdős and A. Rényi. On random graphs. I. Publ. Math. Debrecen, 6:290–297, 1959.

[51] P. Erdős and A. Rényi. On the evolution of random graphs. Bull. Inst. Internat. Statist., 38:343–347, 1961.

[52] S. N. Evans and A. Winter. Subtree prune and regraft: a reversible real tree- valued Markov process. Ann. Probab., 34(3):918–961, 2006.

[53] W. J. Ewens. The sampling theory of selectively neutral alleles. Theoret. Pop- ulation Biology, 3:87–112; erratum, ibid. 3 (1972), 240; erratum, ibid. 3 (1972), 376, 1972.

[54] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the Internet topology. ACM Comp. Comm. Review, (29), 1999.

[55] J. Felsenstein. Inferring Phylogenies. Sinauer Associates, Inc., Sunderland, MA, 2004.

[56] S. Feng. The Poisson-Dirichlet Distribution and Related Topics. Probability and its Applications. Springer-Verlag, Berlin, 2010.

[57] R. Fisher, A. Corbet, and C. Williams. The relation between the number of species and the number of individuals in a random sample of an animal popula- tion. J. Animal Ecology, 12:42–58, 1943.

[58] A. Gnedin and J. Pitman. Regenerative partition structures. Electron. J. Com- bin., 11(2):Research Paper 12, 21, 2004/06.

[59] A. Gnedin and J. Pitman. Exchangeable Gibbs partitions and Stirling triangles. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI), 325(Teor. Predst. Din. Sist. Komb. i Algoritm. Metody. 12):83–102, 244–245, 2005.

[60] A. Gnedin and J. Pitman. Self-similar and Markov composition structures. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI), 326(Teor. Predst. Din. Sist. Komb. i Algoritm. Metody. 13):59–84, 280–281, 2005.

[61] A. Gnedin, J. Pitman, and M. Yor. Asymptotic laws for regenerative com- positions: gamma subordinators and the like. Probab. Theory Related Fields, 135(4):576–602, 2006.

[62] J. A. Hartigan. Partition models. Comm. Statist. Theory Methods, 19(8):2745– 2756, 1990.

[63] S. Holmes. Statistics for phylogenetic trees. Theoretical Population Biology, 63(1):17–32, 2002.

[64] S. Holmes. Statistical approach to tests involving phylogenies. In Mathematics of Evolution and Phylogeny, O. Gascuel. (ed.). Oxford University Press, USA, 2007.

[65] L. Holst. The Poisson-Dirichlet distribution and its relatives revisited. Available at http://www.math.kth.se/matstat/fofu/reports/PoiDir.pdf, 2001.

[66] J. F. C. Kingman. Random partitions in population genetics. Proc. Roy. Soc. London Ser. A, 361(1704):1–20, 1978.

[67] J. F. C. Kingman. The representation of partition structures. J. London Math. Soc. (2), 18(2):374–380, 1978.

[68] J. F. C. Kingman. Mathematics of genetic diversity, volume 34 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, Pa., 1980.

[69] J. F. C. Kingman. The coalescent. Stochastic Process. Appl., 13(3):235–248, 1982.

[70] E. D. Kolaczyk. Statistical analysis of network data. Springer Series in Statistics. Springer, New York, 2009. Methods and models.

[71] L. Li, D. Alderson, J. C. Doyle, and W. Willinger. Towards a theory of scale-free graphs: definition, properties, and implications. Internet Math., 2(4):431–523, 2005.

[72] S. Mac Lane. Categories for the Working Mathematician. Springer, New York, 1998.

[73] P. McCullagh. Tensor Methods in Statistics. Chapman and Hall, London, 1987.

[74] P. McCullagh. Invariance and factorial models. J. Roy Statist. Soc. B, 62(62):209–256, 2000.

[75] P. McCullagh. What is a statistical model? Ann. Statist., 30(5):1225–1310, 2002. With comments and a rejoinder by the author.

[76] P. McCullagh. Exchangeability and regression models. In Celebrating Statistics: Papers in honour of Sir David Cox on his 80th birthday, volume 33 of Oxford Statistical Science Series, pages 89–113. 2005.

[77] P. McCullagh. Random permutations and partition models. In M. Lovric, editor, International Encyclopedia of Statistical Science. 2010.

[78] P. McCullagh and J. Møller. The permanental process. Adv. in Appl. Probab., 38(4):873–888, 2006.

[79] P. McCullagh and J. Nelder. Generalized Linear Models, volume 37 of Mono- graphs on Statistics and Applied Probability. CRC Press, 1989.

[80] P. McCullagh, J. Pitman, and M. Winkel. Gibbs fragmentation trees. Bernoulli, 14(4):988–1002, 2008.

[81] P. McCullagh and J. Yang. Stochastic classification models. In International Congress of Mathematicians. Vol. III, pages 669–686. Eur. Math. Soc., Zürich, 2006.

[82] P. McCullagh and J. Yang. How many clusters? Bayesian Anal., 3(1):101–120, 2008.

[83] R. Mead. The Design of Experiments: Statistical Principles for Practical Appli- cations. Chapman and Hall, Cambridge, 1988.

[84] S. Milgram. The small world problem. Psych. Today, 1(1):60–67, 1967.

[85] M. E. J. Newman. Random graphs as models of networks. In Handbook of graphs and networks, pages 35–68. Wiley-VCH, Weinheim, 2003.

[86] M. E. J. Newman. The structure and function of complex networks. SIAM Rev., 45(2):167–256 (electronic), 2003.

[87] B. Øksendal. Stochastic differential equations. Universitext. Springer-Verlag, Berlin, sixth edition, 2003. An introduction with applications.

[88] L. Pachter and B. Sturmfels. Algebraic Statistics for Computational Biology. Cambridge University Press, Cambridge, UK, 2005.

[89] M. Perman, J. Pitman, and M. Yor. Size-biased sampling of Poisson point processes and excursions. Probab. Th. Relat. Fields, 92:21–39, 1992.

[90] J. Pitman. The two-parameter generalization of Ewens' random partition structure. In U.C. Berkeley Department of Statistics Technical Reports, number 345. Berkeley, CA, 1992.

[91] J. Pitman. Random discrete distributions invariant under size-biased permuta- tion. Adv. in Appl. Probab., 28(2):525–539, 1996.

[92] J. Pitman. Coalescents with multiple collisions. Ann. Probab., 27(4):1870–1902, 1999.

[93] J. Pitman. Combinatorial stochastic processes, volume 1875 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 2006. Lectures from the 32nd Sum- mer School on Probability Theory held in Saint-Flour, July 7–24, 2002, With a foreword by Jean Picard.

[94] J. Pitman and M. Yor. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab., 25(2):855–900, 1997.

[95] T. Speed and R. Bailey. Factorial dispersion models. International Statistical Review, (55):261–277, 1987.

[96] S. Tavaré. Ancestral Inference in Population Genetics, volume 1837 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 2004. Lectures from the 31st Summer School on Probability Theory held in Saint-Flour, 2001.

[97] D. Watts and S. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393:440–442, 1998.

[98] W. Willinger, D. Alderson, and J. C. Doyle. Mathematics and the Internet: a source of enormous confusion and great potential. Notices Amer. Math. Soc., 56(5):586–599, 2009.

[99] Y. Yamasaki. Kolmogorov’s extension theorem for infinite measures. Publ. RIMS, Kyoto Univ., 10:381–411, 1975.