THE UNIVERSITY OF CHICAGO
INFINITELY EXCHANGEABLE PARTITION, TREE AND GRAPH-VALUED STOCHASTIC PROCESSES
A DISSERTATION SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF STATISTICS
BY HARRY CRANE
CHICAGO, ILLINOIS
APRIL 2012

To my parents, Harry and Regina Crane

ABSTRACT
The theory of infinitely exchangeable random partitions began with the work of Ewens [53] as a model for species sampling in population biology, known as the Ewens sampling formula. Kingman [66, 67] established a correspondence between infinitely exchangeable partitions and probability measures on partitions of the unit interval, called the paintbox representation. Later, Kingman [66, 68] introduced the coalescent, an exchangeable Markov process on the space of set partitions, in the field of population genetics.

In this thesis, we build on Kingman's theory to construct an infinitely exchangeable Markov process on the space of partitions whose sample paths differ from previously studied coalescent and fragmentation type processes; we call this process the cut-and-paste process. The cut-and-paste process possesses many of the same properties as its predecessors, including finite-dimensional transition probabilities that can be expressed in terms of a paintbox process, a unique equilibrium measure under general conditions, a Poissonian construction, and an associated mass-valued process almost surely. A special subfamily with parameter α > 0 and k ≥ 1 is related to the Chinese restaurant process and is reversible with respect to the two-parameter Pitman-Yor family with parameter (−α/k, α). An extension of the (α, k)-subfamily has a third parameter Σ, a symmetric square matrix with non-negative entries, called the similarity matrix.

From a family of partition-valued Markov kernels, we show how to construct a Markov process on the space of N-rooted fragmentation trees through the ancestral branching procedure. If the family of kernels is infinitely exchangeable, then its associated ancestral branching process is infinitely exchangeable.
In addition, the ancestral branching process based on the cut-and-paste Markov kernel possesses a unique equilibrium measure, admits a Poissonian construction and has an associated mass fragmentation-valued process almost surely. Furthermore, the results can be extended to characterize a Markov process on the space of trees with edge lengths.
Aside from the Erdős-Rényi process and its variants, infinitely exchangeable graph-valued processes are uncommon in the literature. We show a construction for a family of infinitely exchangeable Poisson random hypergraphs which is induced by a consistent family of Poisson point processes on the power set of the natural numbers. Infinitely exchangeable families of hereditary hypergraphs and undirected graphs are induced from an infinitely exchangeable Poisson random hypergraph by projection.

Finally, we consider balanced and even partition structures, which are families of distributions on partitions with a prespecified block structure. Consistency of these families can be shown under a random deletion procedure. We show Chinese restaurant-type constructions for a special class of these structures based on the two-parameter Pitman-Yor family, and discuss connections to randomization in experimental design.

ACKNOWLEDGMENTS
First and foremost, this thesis reflects the influence and hard work of my parents, Harry and Regina, who have supported me unconditionally throughout my life. They are due my deepest and most sincere gratitude. I also thank my sister, Kayla, who has been supportive when I needed it most. At Chicago, I thank Peter McCullagh for being readily available with his time and insights, and especially for introducing me to partition and tree-valued processes, for which I have developed a strong affinity. Among the many qualities of his I hope to emulate are attention to detail and patience in research pursuits. I thank the other members of my committee for their valuable contributions: Steve Lalley for providing insightful comments about the direction of my research and about several parts of this thesis, and Mathias Drton for his advice and encouragement throughout my final year in Chicago. I also thank: Michael Wichura, whose emphasis on precision and dedication to his teaching have been an important aspect of my education, and Mei Wang, who has been generous with her time and extremely encouraging. I would also like to thank those whom I have met in Chicago and who have contributed to my experience here: Alan, Marcin, Joe, Walter, Lior, Winfried, Andrei and, of course, Sherman. Last, but not least, I thank Jie for making my last year here the best of all.
TABLE OF CONTENTS
ABSTRACT

ACKNOWLEDGMENTS

1 INTRODUCTION
  1.1 Preliminary remarks
  1.2 Integer partitions
    1.2.1 Random integer partitions
  1.3 Projective systems
  1.4 Projective systems of partitions, trees and graphs
    1.4.1 Set partitions
    1.4.2 Fragmentation trees
    1.4.3 Graphs and permutations
  1.5 Exchangeable random partitions
    1.5.1 Distribution of block sizes
  1.6 Ewens process
    1.6.1 Pitman-Yor process
    1.6.2 Gibbs partitions
    1.6.3 Product partition models
  1.7 Mass partitions
  1.8 Paintbox process
    1.8.1 Asymptotic frequencies
  1.9 Exchangeable coalescents
  1.10 Exchangeable fragmentations
    1.10.1 Exchangeable random fragmentation trees
    1.10.2 Gibbs fragmentation trees
    1.10.3 Genealogical interpretation of a tree
  1.11 Exchangeable fragmentation-coalescence processes
  1.12 Random graphs
    1.12.1 Heavy-tailed networks
    1.12.2 Small-world networks
  1.13 Organization of thesis

2 A CONSISTENT MARKOV PARTITION PROCESS GENERATED BY THE PAINTBOX PROCESS
  2.1 Preliminaries
  2.2 The Cut-and-Paste process
    2.2.1 Equilibrium measure
  2.3 Continuous-time version of CP(ν)-process
    2.3.1 Poissonian construction
  2.4 Asymptotic frequencies
    2.4.1 Poissonian construction
    2.4.2 Equilibrium measure
  2.5 A two parameter subfamily
  2.6 A three-parameter extension
    2.6.1 Similarity and dissimilarity matrices
    2.6.2 The extended model
  2.7 Properties of the CP(α, k; Σ) process
  2.8 Discussion

3 ANCESTRAL BRANCHING AND TREE-VALUED PROCESSES
  3.1 Introduction
  3.2 Ancestral branching kernels
  3.3 Exchangeable ancestral branching Markov kernels
  3.4 Consistent ancestral branching kernels
  3.5 Cut-and-paste ancestral branching processes
    3.5.1 Construction of the cut-and-paste ancestral branching Markov chain on T
    3.5.2 Equilibrium measure
    3.5.3 Continuous-time ancestral branching process
    3.5.4 Poissonian construction
    3.5.5 Feller process
  3.6 Mass fragmentations
    3.6.1 Associated mass fragmentation process
    3.6.2 Equilibrium measure
    3.6.3 Poissonian construction
  3.7 Weighted trees
  3.8 Discussion

4 INFINITE RANDOM HYPERGRAPHS
  4.1 Introduction
    4.1.1 Projective systems of hypergraphs
  4.2 Infinite Poisson random hypergraphs
    4.2.1 Construction of the infinite random hypergraph
  4.3 Induced hypergraphs, hereditary hypergraphs, and undirected graphs
    4.3.1 Random hypergraphs
    4.3.2 Hereditary hypergraphs and monotone sets
    4.3.3 Random undirected graphs
  4.4 Discussion

5 BALANCED AND EVEN PARTITION STRUCTURES
  5.1 Preliminaries
  5.2 Balanced partitions
  5.3 Even partitions
  5.4 Partition structures
  5.5 Balanced and even permutations
  5.6 Relating balanced and even partitions
  5.7 Chinese restaurant constructions
    5.7.1 Chinese restaurant construction for balanced partitions
    5.7.2 Chinese restaurant construction for even partitions
  5.8 Randomization
Notation
[n]        {1, 2, . . . , n}
N          the natural numbers, [∞] := {1, 2, . . .}
2^A        the power set of A, i.e. {a : a ⊆ A}
P          set partitions of N
P(k)       set partitions of N with at most k blocks
PI         interval partitions of [0, 1]
R↓         ranked-mass partitions
R↓(k)      ranked k-simplex; ranked-mass partitions with at most k positive components
Pn         integer partitions of n ∈ N
Pn(k)      integer partitions of n with at most k parts
Pn,k       integer partitions of n with exactly k parts
P[n]       set partitions of [n]
P[n](k)    partitions of [n] with at most k blocks
P[nj]:j    partitions of [nj] with block sizes divisible by j; j-even partitions of [nj]
P′[nj]:j   partitions of [nj] with each element labeled as one of j types and each block containing an equal number of elements of each type; j-balanced partitions
ϱs         paintbox based on s ∈ R↓
ϱν         ν-mixture of s-paintboxes; ϱν(·) := ∫R↓ ϱs(·) ν(ds)
µ(k)       k-fold product measure µ ⊗ · · · ⊗ µ of µ
S          permutations of N; symmetric group acting on N
Sn         permutations of [n]; symmetric group acting on [n]
T          N-rooted (fragmentation) trees
T̄          weighted N-rooted trees; N-rooted trees with edge lengths
T(k)       k-ary N-rooted trees
T̄(k)       weighted k-ary N-rooted trees; k-ary N-rooted trees with edge lengths
Tn         [n]-rooted trees
T̄n         weighted [n]-rooted trees; [n]-rooted trees with edge lengths
Tn(k)      k-ary [n]-rooted trees
T̄n(k)      weighted k-ary [n]-rooted trees; k-ary [n]-rooted trees with edge lengths
G          N-labeled undirected graphs
Gn         undirected graphs with vertices labeled in [n]

CHAPTER 1

INTRODUCTION
In this volume, we discuss infinitely exchangeable probability models for random partitions, trees and graphs. Our development builds upon theory that has been refined through decades of research in the fields of probability, statistics and various scientific disciplines. We begin this chapter by discussing the motivation for the work contained in subsequent chapters. Commonly used notation is collected in the list preceding this chapter.
1.1 Preliminary remarks
Exchangeable probability models have been studied in relation to the theory and application of statistical models [9, 10, 12, 76], in which discrete collections of units are labeled by some countable index set, without loss of generality the natural numbers N := {1, 2, . . .}. Meanwhile, the theory of partitions has been studied in direct connection to cumulants [73], population genetics [53, 69], experimental design and association schemes [15] and probability theory [93]. Our study of partition models is motivated by their potential use in mathematical biology and population genetics, as demonstrated by the widespread use of coalescent theory in these areas. Specifically, the concept of chapter 2 preceded the rest of this thesis and was born of the following modeling consideration. For a sample of units (individuals) from a population, suppose the data take the form of a sequence of RNA/DNA along a contiguous region of the chromosome, as in table 1.1. By disregarding the nucleotides and considering only the equivalence classes of the species at each site, we obtain a sequence of partitions of the species. For example, the partition at site 1 in table 1.1 is
{snake, iguana, lizard, crocodile, bird, whale, monkey}, {cow, human};
site        1 2 3 . . .
snake       TAGGATTAGATACCC
iguana      TAGGATTAGATACCC
lizard      TAGGATTAGATACCC
crocodile   TAGGATTAGATACCC
bird        TGGGATTAGATACCC
whale       TGGGATTAGATACCC
cow         AAGCATC-TACACCC
human       AACCCCCGCCCATCC
monkey      TGGGATTAGATACCC
Table 1.1: Mitochondrial DNA (mtDNA) data for nine species obtained from http://www.bch.umontreal.ca/ogmp/projects/other/mt list.html in January 2012. Note that the entry of cow at site 8 shows as '-', which does not correspond to missing data, but is a result of sequence alignment and other biological and evolutionary processes which are far beyond the scope of this thesis.

the partition at site 2 is
{snake, iguana, lizard, crocodile, cow, human}, {bird, whale, monkey}; and so on. One area where data of this sort arises is in phylogenetic inference, in which DNA sequence data is used to infer phylogenetic relationships among individuals or species. The scientific consensus for the shape of the phylogenetic tree relating the nine species in table 1.1 is shown in figure 1.1.
Figure 1.1: Consensus phylogenetic tree for the nine species of table 1.1, with tips labeled snake, lizard, iguana, crocodile, bird, whale, cow, human and monkey. (Obtained from Theobald, Douglas L. "29+ Evidences for Macroevolution." TalkOrigins. Department of Biochemistry, Brandeis University.)
Felsenstein [55] recounts much of the work in this area in recent decades, which includes statistical and non-statistical methods. There is also recent work by Holmes [28, 63, 64] on the estimation of unknown phylogenetic trees. In recent years, techniques from algebraic geometry have been developed for use in computational biology, particularly phylogenetic inference; Pachter and Sturmfels [88] provide a recent survey of some of this work.

In chapter 2 we endeavor to develop a theoretical framework within which to model dependent partition sequences, e.g. the partition sequence determined by DNA data in table 1.1. To this end, the work in chapter 2 is necessary, as previously studied partition-valued processes, e.g. coalescent and fragmentation processes, are not suited to modeling the sequences which arise from DNA data of the sort we discuss here. The extended model in section 2.6 seems to have potential for this application; however, we do not address any application here.

The remainder of this thesis deals primarily with theory relating to random partitions, trees and graphs. We now give a brief overview of the relevant literature in this field and introduce notation and terminology critical to the rest of the thesis. Though inessential to the rest of this thesis, it is common to introduce the theory of random set partitions by first discussing random integer partitions, which we do in section 1.2; sections 1.3 and 1.4 are essential to all subsequent chapters; sections 1.5-1.8 are central to chapters 2 and 3, and section 1.10 is relevant to chapter 3; sections 1.9-1.12 contain mostly background material.
1.2 Integer partitions
An integer partition n := (n1, . . . , nm) ≡ n1 + · · · + nm of n ∈ N is a list of parts such that n1 + · · · + nm = n. We write #n to denote the number of parts of n, which is m in our description. Note that the order in which the parts are listed is irrelevant, though it is conventional to list parts in decreasing order. We write Pn to denote the space of integer partitions of n.

Alternatively, a partition of n can be written as a list of multiplicities λ = (λ1, . . . , λn), sometimes also denoted 1^λ1 2^λ2 · · · n^λn, where λj is the number of parts of size j, so that λ1 + 2λ2 + · · · + nλn = n. The number of parts of λ is the sum λ. := λ1 + · · · + λn.
These two ways for writing integer partitions are equivalent. For example, an integer partition of 8 with four parts of size 3, 2, 2, 1 can be denoted by
• a list of parts: (3, 2, 2, 1) ≡ 3 + 2 + 2 + 1 ≡ 2 + 1 + 3 + 2, or
• a list of multiplicities: (1, 2, 1) ≡ 1^1 2^2 3^1.
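The two encodings above can be converted mechanically; a minimal Python sketch (function names are ours, not the thesis's):

```python
from collections import Counter

def parts_to_multiplicities(parts):
    """Convert a list of parts, e.g. [3, 2, 2, 1], to multiplicities
    (lambda_1, ..., lambda_n), where lambda_j counts the parts of size j."""
    n = sum(parts)
    counts = Counter(parts)
    return [counts.get(j, 0) for j in range(1, n + 1)]

def multiplicities_to_parts(lam):
    """Inverse conversion: multiplicities back to a decreasing list of parts."""
    parts = []
    for j, lam_j in enumerate(lam, start=1):
        parts.extend([j] * lam_j)
    return sorted(parts, reverse=True)

lam = parts_to_multiplicities([3, 2, 2, 1])  # the example partition of 8 above
```

Either direction of the conversion loses nothing, matching the claim that the two descriptions are equivalent.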
Andrews [13] provides a thorough account of the theory of integer partitions in the fields of combinatorics and number theory.
1.2.1 Random integer partitions
The study of random partitions dates back to the work of Ewens [53] who introduced the Ewens sampling formula (ESF) as a one-parameter distribution on integer partitions. For n ∈ N, the Ewens sampling formula with parameter α > 0, ESF(α), on Pn is
    pn(λ; α) = (n!/α↑n) ∏_{j=1}^{n} α^λj / (j^λj λj!),   λ ∈ Pn,   (1.1)
where α↑n := α(α + 1) ··· (α + n − 1) = Γ(n + α)/Γ(α) is the ascending factorial and
    Γ(α) := ∫_0^∞ x^{α−1} e^{−x} dx

is the gamma function. The Ewens sampling formula was derived from the study of sampling theory for selectively neutral alleles at a single locus in population genetics. In this setting, n ≥ 1 is the size of a sample of individuals and λ ∈ Pn is the integer partition associated with this sample, where λ. represents the number of different alleles which appear at one locus in the sample, and the parts of λ represent the number of individuals who share a particular allele. Kingman [66, 67] introduced the notion of a partition structure as a sequence
(P1, P2, . . .) of distributions, where Pn is a distribution on Pn, the integer partitions of n, for each n ≥ 1, which is sampling consistent under uniform deletion of one of the elements. In other words, for πn ∼ Pn, the partition πn−1 ∈ Pn−1 obtained by choosing a part p ∈ πn with probability proportional to its size and reducing it by 1 has distribution Pn−1. Gnedin and Pitman [58, 60, 61] have studied general properties of regenerative and self-similar partition structures under some alternative methods of sampling. Kingman [69] has shown a one-to-one correspondence between partition structures and exchangeable random partitions of the set N and, consequently, that the theory of partition structures is more naturally developed in the context of random set partitions of N, which can be treated as a collection of probability distributions on a projective system, which we now discuss.
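Before moving on, we note that the Ewens sampling formula (1.1) is easy to evaluate numerically; the following sketch (function names ours) computes pn(λ; α) from a list of parts and checks that the probabilities sum to 1 over Pn for a small n:

```python
from math import factorial

def integer_partitions(n, max_part=None):
    """Enumerate the integer partitions of n as decreasing tuples of parts."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in integer_partitions(n - first, first):
            yield (first,) + rest

def esf(parts, alpha):
    """Ewens sampling formula (1.1), evaluated at the integer partition
    given by `parts`, with parameter alpha > 0."""
    n = sum(parts)
    rising = 1.0                      # ascending factorial alpha^{(n)}
    for i in range(n):
        rising *= alpha + i
    lam = [parts.count(j) for j in range(1, n + 1)]
    prob = factorial(n) / rising
    for j, lam_j in enumerate(lam, start=1):
        prob *= alpha ** lam_j / (j ** lam_j * factorial(lam_j))
    return prob

total = sum(esf(p, 1.5) for p in integer_partitions(6))  # should be ~1
```

For n = 2 and α = 1 the two partitions (2) and (1, 1) each receive probability 1/2, consistent with the general values 1/(α + 1) and α/(α + 1).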
1.3 Projective systems
A projective system indexed by N associates with each finite set [n] := {1, . . . , n} a set Qn and with each injective map ϕ : [m] → [n], m ≤ n, a projection ϕ* : Qn → Qm such that
• if ϕ is the identity [n] → [n], then ϕ* is the identity Qn → Qn, and

• if ψ : [l] → [m], l ≤ m, and ψ* : Qm → Ql is its associated projection, then the composition ϕψ : [l] → [n] satisfies (ϕψ)* ≡ ψ*ϕ* : Qn → Ql.
[Commutative diagram: the injections ψ : [l] → [m] and ϕ : [m] → [n] are sent by (Q, *) to the projections ψ* : Qm → Ql and ϕ* : Qn → Qm, running in the opposite direction, with (ϕψ)* = ψ*ϕ* : Qn → Ql.]
In terms of category theory, a projective system is a contravariant functor (Q, *) from the category of finite sets, with morphisms between the sets, to a category of sets (Qn, n ≥ 1), with morphisms in the opposite direction between these sets, as shown in the above diagram; see [14, 72].
The construction of a family of probability distributions on a projective system allows one to uniquely characterize a probability measure on the projective limit space defined by Q := (Qn, n ≥ 1) by Kolmogorov’s extension theorem.
Theorem 1.3.1. (Kolmogorov's extension theorem) ([87, 99]) For each n ≥ 1, let (Ωn, Fn, µn) be a projective system of measure spaces such that for every injection ϕm,n : [m] → [n], m ≤ n, with associated measurable projection ϕ*m,n : Ωn → Ωm,

    µm(Bm) = µn((ϕ*m,n)^{−1}(Bm))        (1.2)

holds for all Bm ∈ Fm. Then (µn, n ≥ 1) can be uniquely extended to a measure µ on (Ω, F), the projective limit measurable space of (Ωn, Fn).
A collection of measures (µn, n ≥ 1) which satisfies (1.2) with respect to some collection of measurable functions (ϕ*m,n, 1 ≤ m ≤ n) is called self-consistent with respect to sub-sampling. The original statement of Kolmogorov's theorem, as it appears e.g. in [87], extended a measure on a finite product space to the infinite product space. Bochner [30] showed a generalization of Kolmogorov's theorem to the projective limit measurable space, which we have rewritten above and which is relevant to our study of partitions, trees and graphs labeled by N. In our study of processes of random partitions, trees and graphs, we are interested in constructing families of self-consistent probability measures which are finitely exchangeable.
Definition 1.3.2. A collection of measures (µn, n ≥ 1) on a family of measurable spaces (Ωn, Fn) is finitely exchangeable if for each n ≥ 1, Bn ∈ Fn and measurable bijective map ϕ :Ωn → Ωn
    µn(Bn) = µn(ϕ−1(Bn)).        (1.3)
For our treatment, a bijective map ϕ : Ωn → Ωn corresponds to an element of the symmetric group Sn of permutations acting on [n] and finite exchangeability
corresponds to invariance under relabeling of elements. A family of measures (µn, n ≥ 1) which is self-consistent and finitely exchangeable is said to be infinitely exchangeable and its unique extension to (Ω, F) is invariant under measurable injective mappings Ω → Ω. Infinitely exchangeable collections of measures have a natural interpretation in terms of statistical models in which statistical units are labeled by N and a finite sample of size n ∈ N is taken from an infinite population. Finite exchangeability (1.3) guarantees that inference based on a sample of size n is unaffected by arbitrary labeling of the units, while self-consistency (1.2), also referred to as consistency under subsampling, guarantees that inference based on the finite sample is unaffected by unsampled statistical units [75]. In particular, a statistic based on a sample of size m taken from the population has the same distribution as a statistic based on a subsample of size m ≤ n taken from a sample of size n.
The collection (Pn, n ≥ 1) of integer partitions is not a projective system, and so theorem 1.3.1 cannot be applied to study integer partitions; however, the collections
(P[n], n ≥ 1) of finite set partitions of [n], (Tn, n ≥ 1) of [n]-rooted fragmentation trees and (Gn, n ≥ 1) of [n]-labeled graphs are projective systems under appropriate projection operations, which we now discuss.
1.4 Projective systems of partitions, trees and graphs
If Qn := 2^{[n]²} is the set of subsets of [n]², i.e. the space of directed graphs with n vertices, one can define the projection Qn → Qm either by restriction or by delete-and-repair. Each A ∈ Qn can be represented as an n × n matrix with entries in {0, 1}
such that Aij = 1 if (i, j) ∈ A and Aij = 0 otherwise. For each n ≥ 1, let ϕn,n+1 be the operation on Qn+1 which restricts A to the
complement of {n+1}. In matrix form, ϕn,n+1A =: A|[n] is the n×n matrix obtained from A by removing the last row and last column of A and keeping the rest of the
entries unchanged. It is clear that the compositions ϕm,n := ϕm,m+1 ◦ · · · ◦ ϕn−1,n for m ≤ n are well-defined as the restriction of A ∈ Qn to [m] by removing the last n − m rows and columns of A. We call the maps (ϕm,n, 1 ≤ m ≤ n) the restriction, or deletion, maps on (Qn, n ≥ 1). A permutation σ ∈ Sn acts on each element A ∈ Qn componentwise in the usual way. That is, for each i, j ∈ [n], σ(A)(i, j) := A^σ(i, j) := A(σ(i), σ(j)). The restriction maps (ϕm,n, m ≤ n) together with permutation maps (σ ∈ Sn, n ≥ 1) and their compositions make Q := (Qn, n ≥ 1) a projective system, written Q^ϕ ≡ (Qn, ϕm,n). Another way to specify a projective system on (Qn, n ≥ 1) is by delete-and-repair. For n ≥ m ≥ 1, let ψm act on A ∈ Qn by removing the mth row and column of A and directing an edge from each i in {j ∈ [n] : (j, m) ∈ A} to each k in
{j ∈ [n]:(m, j) ∈ A}. In other words, ψmA is obtained by deleting the vertex labeled m from A and connecting two vertices i and k by a directed edge from i to k if both (i, m) and (m, k) are elements of A, i.e. there is a directed path i → m → k in A.
For m ≤ n, define ψm,n := ψm+1 ◦ · · · ◦ ψn. Plainly, ψm,n is well-defined since for each n ≥ 2, ψn−2,n ≡ ψn−1 ◦ ψn = ψn ◦ ψn−1 and ψl,n = ψl,m ◦ ψm,n. The delete-and-repair maps (ψm,n, m ≤ n) together with permutation maps (σ ∈ Sn, n ≥ 1) and compositions also make (Qn, n ≥ 1) a projective system, written Q^ψ ≡ (Qn, ψm,n). Note that the two projective systems (Qn, ϕm,n) and (Qn, ψm,n) have the same objects but are different projective systems; the arrows are different.
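The difference between the two projections can be seen on a small example; a minimal sketch (0-indexed adjacency matrices, function names ours) in which restriction loses a directed path that delete-and-repair repairs:

```python
def restrict(A, m):
    """Restriction phi: keep the top-left m x m block of adjacency matrix A."""
    return [row[:m] for row in A[:m]]

def delete_and_repair(A, m):
    """Delete-and-repair psi: remove vertex m (0-indexed) and add an edge
    i -> k whenever the directed path i -> m -> k existed in A."""
    n = len(A)
    keep = [v for v in range(n) if v != m]
    B = [[A[i][k] for k in keep] for i in keep]
    for bi, i in enumerate(keep):
        for bk, k in enumerate(keep):
            if A[i][m] and A[m][k]:
                B[bi][bk] = 1
    return B

# Directed path 0 -> 2 -> 1 on three vertices: deleting vertex 2 by
# delete-and-repair keeps a repaired edge 0 -> 1, while restriction loses it.
A = [[0, 0, 1],
     [0, 0, 0],
     [0, 1, 0]]
```

Applying both maps to A illustrates why the two systems share objects but not arrows: restriction gives the empty graph on two vertices, while delete-and-repair gives the single edge 0 -> 1.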
Example 1.4.1. A permutation σ ∈ Sn is a one-to-one and onto map [n] → [n], and we regard σ as a subset of [n]² with (i, j) ∈ σ if σ(i) = j. For σ ∈ Sn+1, delete-and-repair acts on σ by putting σ′ := ψn,n+1 σ, which satisfies

    σ′(i) = σ(n + 1)   if i = σ−1(n + 1),
    σ′(i) = σ(i)       otherwise.
(Sn, n ≥ 1) together with delete-and-repair maps (ψm,n, m ≤ n) and permutation maps is a projective system.
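The displayed formula of example 1.4.1 can be checked directly; a small sketch (representing σ ∈ Sn+1 as a dict on {1, . . . , n+1}, names ours):

```python
def delete_and_repair_perm(sigma):
    """Apply delete-and-repair to a permutation sigma of [n+1], following the
    displayed formula: sigma'(i) = sigma(n+1) when i = sigma^{-1}(n+1), and
    sigma'(i) = sigma(i) otherwise. If sigma fixes n+1, sigma' is simply the
    restriction of sigma to [n]."""
    n1 = len(sigma)                          # n + 1
    inv = {v: k for k, v in sigma.items()}   # sigma^{-1}
    out = {}
    for i in range(1, n1):
        out[i] = sigma[n1] if i == inv[n1] else sigma[i]
    return out

# The cycle (1 3 4 2): deleting 4 splices it to the cycle (1 3 2).
sigma = {1: 3, 2: 1, 3: 4, 4: 2}
```

The output is again a genuine permutation of [n], which is what makes (Sn, n ≥ 1) stable under delete-and-repair.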
The collections (P[n], n ≥ 1), (Gn, n ≥ 1) and (Sn, n ≥ 1) of partitions, graphs and permutations respectively can each be described as a special case of one or more 10
of the above formulations. A different specification, however, is required when we
discuss the collection (Tn, n ≥ 1) of fragmentation trees later.
1.4.1 Set partitions
For any subset A ⊂ N, a (set) partition π of A is a collection of disjoint non-empty subsets of A, called blocks, written {π1, . . .}, such that ⋃i πi = A. In general, we write PA to denote the collection of partitions of A. Specifically, the space of partitions of
[n] is denoted by P[n]. For each n ≥ 1, a set partition π ∈ P[n] can be regarded as (i) a collection of disjoint subsets; e.g. {{1, 2, 5}, {3, 6}, {4}} ≡ 125|36|4 ≡ 36|125|4;
(ii) an equivalence relation π :[n] × [n] → {0, 1} such that π(i, j) = 1 if and only if
i ∼π j, i.e. i and j are in the same block of π;
(iii) a symmetric Boolean matrix with entries given by the equivalence relation in (ii); e.g. 125|36|4 corresponds to
    1 1 0 0 1 0
    1 1 0 0 1 0
    0 0 1 0 0 1
    0 0 0 1 0 0
    1 1 0 0 1 0
    0 0 1 0 0 1
Note that the order in which blocks are listed in (i) is irrelevant, though it is conven-
tional to list blocks in order of their least element. In general, we write π := (π1,...), as opposed to {π1,...}, when we wish to emphasize the order in which blocks are listed. Description (i) is often best for visualization of partitions, (ii) is useful for theoretical treatment of partitions and (iii) is sometimes convenient for computations involving partitions. As the above three descriptions are equivalent, we need not spec- ify which we are using; however, by using description (iii), we can discuss a partition of [n] in the context of a subset of [n]2 as we have at the beginning of section 1.4. 11
For any set partition π, we write #π to denote the number of blocks of π; for example, for π = {{1, 2, 5}, {3, 6}, {4}} we have #π = 3. We write b ∈ π to denote a block b of π and #b denotes the number of elements in b. The notation P[n](k) denotes the subspace of P[n] which consists of partitions of [n] with at most k ≥ 1 blocks, i.e. P[n](k) := {π ∈ P[n] : #π ≤ k}. For B ⊂ A ⊂ N, let π := {π1, . . .} be a set partition of A. We write π|B := {πi ∩ B : πi ∈ π} \ {∅} to denote the restriction of π to B, excluding the empty set. For each n ≥ 1, we define the restriction map Dn,n+1 : P[n+1] → P[n] in the obvious way by Dn,n+1 π := π|[n] ≡ {b ∩ [n] : b ∈ π} \ {∅} for every π ∈ P[n+1]. It is clear that, for m ≤ n, Dm,n corresponds to the composition Dm,m+1 ◦ · · · ◦ Dn−1,n.
The symmetric group Sn acts on a set partition π ∈ P[n] by relabeling. We write π^σ := σ(π) ≡ σπ ≡ σ ◦ π to be the partition of [n] defined by i ∼σπ j if and only if σ−1(i) ∼π σ−1(j). In other words, for π regarded as a function [n] × [n] → {0, 1}, π^σ(i, j) = π(σ(i), σ(j)) for each i, j ∈ [n].
The spaces P[n] for n = 1, 2, 3, 4 are
P[1] : 1
P[2] : 12   1|2
P[3] : 123   1|23   12|3   13|2   1|2|3
P[4] : 1234   1|234[4]   12|34[3]   1|2|34[6]   1|2|3|4.
For π ∈ P[n], we write π[m] to denote the coset of π under action of the symmetric group, i.e. π[m] := {σπ : σ ∈ Sn}; m denotes the cardinality of this coset. In particular, 1|234[4] := {1|234, 134|2, 124|3, 123|4}.
The cardinality of P[n] for n = 1, 2, 3, 4 is 1, 2, 5, 15 respectively. In general, the cardinality of P[n] is given by the Bell numbers (section 1.6.2). For each n ≥ 1, the set (P[n], ≤) is a partially ordered lattice with binary relation ≤ called sub-partition. For B, B′ ∈ P[n], we write B ≤ B′ if, for every b ∈ B, there is a b′ ∈ B′ such that b ⊆ b′. For B, B′ ∈ P[n], we define the infimum, B ∧ B′, by B ∧ B′ := sup{π ∈ P[n] : π ≤ B & π ≤ B′}. In words, B ∧ B′ is the largest partition π ∈ P[n] such that π ≤ B and π ≤ B′. Conversely, the supremum, B ∨ B′ := inf{π ∈ P[n] : B ≤ π & B′ ≤ π}, is the smallest partition π ∈ P[n] such that B ≤ π and B′ ≤ π. We illustrate the lattice P[3] in the Hasse diagram below.
          123

    1|23   12|3   13|2

         1|2|3
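The meet and join admit short computations: the infimum is the set of non-empty pairwise intersections of blocks, while the supremum merges blocks transitively whenever they intersect. A sketch (function names ours; partitions as sets of frozensets):

```python
from itertools import product

def meet(B, Bp):
    """Infimum B ∧ B': blocks are the non-empty pairwise intersections."""
    return {frozenset(b & c) for b, c in product(B, Bp) if b & c}

def join(B, Bp):
    """Supremum B ∨ B': merge blocks transitively whenever they intersect,
    via a small union-find over the ground set."""
    elems = set().union(*B)
    parent = {x: x for x in elems}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for block in list(B) + list(Bp):
        block = list(block)
        for x in block[1:]:
            parent[find(block[0])] = find(x)
    classes = {}
    for x in elems:
        classes.setdefault(find(x), set()).add(x)
    return {frozenset(c) for c in classes.values()}
```

On the lattice P[3] drawn above, meet(1|23, 12|3) = 1|2|3 and join(1|23, 12|3) = 123, as the Hasse diagram indicates.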
By the transitivity property of set partitions (description (ii)), the operations restriction and delete-and-repair are equivalent on the space of partitions. Hence, the collection (P[n], n ≥ 1) of finite set partitions together with restriction maps (Dm,n, m ≤ n), permutation maps (σ ∈ Sn, n ≥ 1) and their admissible compositions constitute a projective system (P[n], Dm,n), whose limit we denote by P, the space of partitions of N.
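The restriction maps Dm,n are straightforward to compute; a minimal sketch (function names ours) applying D4,6 to 125|36|4 and checking that one-step restrictions compose:

```python
def restrict_partition(pi, m):
    """Restriction D_{m,n}: intersect each block with [m] = {1, ..., m} and
    drop the empty set."""
    ground = set(range(1, m + 1))
    return {frozenset(b & ground) for b in pi} - {frozenset()}

# The running example 125|36|4 as a set of frozensets.
pi = {frozenset({1, 2, 5}), frozenset({3, 6}), frozenset({4})}

# Restricting to [4] gives 12|3|4; the composition property
# D_{4,6} = D_{4,5} ∘ D_{5,6} holds block by block.
```

This composition property is exactly the self-consistency that makes (P[n], Dm,n) a projective system.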
1.4.2 Fragmentation trees
For any subset A ⊂ N, a collection of non-empty subsets T ⊂ 2^A, the power set of A, is a rooted tree if
(i) A ∈ T , called the root of T and denoted root(T ) = A, and
(ii) B,C ∈ T implies B ∩ C ∈ {∅,B,C}. That is, either B and C are disjoint or one is a subset of the other.
If T contains all singleton subsets of A, T is a rooted fragmentation tree. Throughout the rest of this thesis, the words tree and fragmentation are both understood to mean
rooted fragmentation tree. We write TA to denote the space of fragmentations of A and, specifically, Tn to denote the space of fragmentations of [n], or [n]-rooted fragmentation trees.
An element TA of TA can be regarded as either
(a) a collection of subsets of A satisfying (i) and (ii) above. For example, TA := {12345, 12, 345, 45, 1, 2, 3, 4, 5}; or
(b) the tree with a distinguished vertex, labeled root, and k other vertices labeled
by subsets of TA, {A1, . . . , Ak}, such that there is a directed edge root ↦ A and a directed edge Ai ↦ Aj if and only if Aj is a child of Ai (see section 1.10.3).
For example, the collection TA in (a) can be represented by the tree below.
[Tree diagram for TA: root ↦ 12345; 12345 splits into 12 and 345; 12 splits into 1 and 2; 345 splits into 3 and 45; 45 splits into 4 and 5.]
As these descriptions are equivalent, we need not specify which we intend; however, specification (a) is often the more useful of the two. For k ≥ 2, write T(k) ⊂ T to denote the space of fragmentations of N such that each parent in t ∈ T(k) has at most k children. For n ≥ 1, we write Tn and Tn(k) for the restriction to [n] of T and T(k) respectively.
For any subset B ⊂ A, the restriction of T ∈ TA to B is defined by T|B := {B ∩ t : t ∈ T} (excluding the empty set), the reduced sub-tree of Aldous [2]. We abuse notation slightly to write Dn,n+1 : Tn+1 → Tn to denote the operation Dn,n+1 T := T|[n] on trees. Note that the apparent overloading of Dn,n+1 as a function on both P[n+1] and Tn+1 should cause no confusion as it is fundamentally defined, in both cases, as a function on collections of subsets of N. For n ≥ 1 and σ ∈ Sn, σ acts on each T ∈ Tn componentwise in the usual way. That is, for T := {Ai : i ≥ 1}, σT := T^σ := {Ai^σ : i ≥ 1} where A^σ := {σ(i) : i ∈ A}. The collection (Tn, n ≥ 1) of [n]-rooted trees together with restriction maps (Dm,n, 1 ≤ m ≤ n), permutation maps (σ ∈ Sn, n ≥ 1) and their compositions defines a projective system. We write T := (Tn, n ≥ 1) to denote the projective limit space of N-rooted fragmentation trees.
As in the description of partitions of N, any fragmentation T ∈ T can be expressed
as a compatible sequence (T|[n], n ≥ 1) of reduced subtrees, and we often write T := (T|[n], n ≥ 1).
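The reduced sub-tree operation can be sketched directly on description (a); a minimal illustration (function names ours; trees as sets of frozensets):

```python
def restrict_tree(T, B):
    """Reduced sub-tree T|_B: intersect every member of T with B and drop
    the empty set."""
    B = frozenset(B)
    return {frozenset(t & B) for t in T} - {frozenset()}

def is_fragmentation_tree(T, A):
    """Check conditions (i)-(ii) plus the singleton condition on root set A."""
    if frozenset(A) not in T:
        return False                       # (i): the root A must belong to T
    for B in T:
        for C in T:                        # (ii): laminarity
            if (B & C) not in (frozenset(), B, C):
                return False
    return all(frozenset({a}) in T for a in A)

# The example tree TA = {12345, 12, 345, 45, 1, 2, 3, 4, 5} from the text.
T = {frozenset(s) for s in [{1, 2, 3, 4, 5}, {1, 2}, {3, 4, 5}, {4, 5},
                            {1}, {2}, {3}, {4}, {5}]}
```

Restricting T to {1, 2, 3} again yields a fragmentation tree, which is the consistency that underlies the projective system (Tn, Dm,n).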
1.4.3 Graphs and permutations
An undirected graph is a subset A of [n]² such that (i, j) ∈ A implies (j, i) ∈ A. That is, the relationships in G are symmetric and it is often convenient to
write i ∼G j to denote that i and j are neighbors or adjacent in G. A permutation σ of [n] is a one-to-one and onto function [n] → [n] which can be
described as a directed graph in the sense of example 1.4.1 with σij = 1 if and only if σ(i) = j.
The collection (Gn, n ≥ 1) is a projective system under both restriction and delete-and-repair, but in this thesis we only study the projective system (Gn, n ≥ 1) characterized by restriction maps (ϕm,n, m ≤ n) in chapter 4. On the other hand, the collection (Sn, n ≥ 1) of permutations is a projective system under delete-and-repair maps (ψm,n, m ≤ n). We write G and S to denote the projective limit spaces characterized by (Gn, ϕm,n) and (Sn, ψm,n) respectively.
1.5 Exchangeable random partitions
Let (Sn, ψm,n) and (P[n], ϕm,n) be the projective systems of permutations and par- titions of [n] respectively. There are natural maps π : Sn → P[n] and ν : P[n] → Pn where
(i) for σ ∈ Sn, i ∼π(σ) j if and only if there exists k ≥ 1 such that σ^k(i) := σ ◦ · · · ◦ σ(i) = j (the k-fold composition of σ); that is, π(σ) is the set partition whose blocks correspond to the cycles of σ; and
(ii) for π ∈ P[n], ν(π) := (λ1, . . . , λn) where λj := #{b ∈ π : #b = j}; that is, ν(π) is the integer partition whose parts correspond to the block sizes of π.
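As an illustrative sketch (not part of the thesis), the two maps can be computed directly; the function names and the representation of blocks as Python frozensets are our own.

```python
# Sketch of the maps pi : Sn -> P[n] (cycle partition of a permutation)
# and nu : P[n] -> Pn (multiplicities of block sizes).

def cycle_partition(sigma):
    """pi(sigma): the set partition of [n] whose blocks are the cycles of
    sigma, given as a dict mapping i -> sigma(i) on {1, ..., n}."""
    seen, blocks = set(), []
    for i in sorted(sigma):
        if i in seen:
            continue
        cycle, j = set(), i
        while j not in cycle:       # follow the orbit of i under sigma
            cycle.add(j)
            seen.add(j)
            j = sigma[j]
        blocks.append(frozenset(cycle))
    return set(blocks)

def integer_partition(pi, n):
    """nu(pi) = (lambda_1, ..., lambda_n) with lambda_j = #{b in pi : #b = j}."""
    lam = [0] * n
    for b in pi:
        lam[len(b) - 1] += 1
    return tuple(lam)

sigma = {1: 2, 2: 1, 3: 3, 4: 5, 5: 4}   # the permutation with cycles (12)(3)(45)
pi = cycle_partition(sigma)               # {{1,2}, {3}, {4,5}}
lam = integer_partition(pi, 5)            # (1, 2, 0, 0, 0): one block of size 1, two of size 2
```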
The interplay between mappings ν, π, ψm,n, and ϕm,n is summarized in (1.4) such that the arrows commute.
    Sn --π--> P[n] --ν--> Pn
    |ψm,n     |ϕm,n
    Sm --π--> P[m] --ν--> Pm        (1.4)
To each integer partition λ ∈ Pn there correspond n! / (∏_{j=1}^n (j!)^{λj} λj!) partitions of [n] in the inverse image ν^{−1}(λ) := {π ∈ P[n] : ν(π) = λ}, the partitions of [n] whose block sizes correspond to the parts of λ. Hence, a straightforward way to obtain a random set partition of [n] given a distribution Pn on Pn is to first sample λ ∼ Pn and then, given λ, sample uniformly from the collection ν^{−1}(λ) of set partitions which correspond to λ. The resulting distribution P*n on P[n] is

P*n(π) = (Pn(ν(π)) / n!) ∏_{j=1}^n (j!)^{λj} λj!        (1.5)

where (λ1, . . . , λn) := ν(π).
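The second stage of this two-stage scheme — sampling uniformly from ν^{−1}(λ) — can be sketched as follows (our own illustration, not from the thesis): shuffle [n] uniformly and cut the shuffled sequence into consecutive blocks of the prescribed sizes; each set partition with those block sizes arises from the same number of shuffles, so the draw is uniform.

```python
import random

def sample_uniform_given_sizes(n, sizes, rng=random):
    """Uniform draw from nu^{-1}(lambda): a set partition of [n] whose block
    sizes form the given multiset.  A sketch of the uniform stage in (1.5)."""
    assert sum(sizes) == n
    perm = list(range(1, n + 1))
    rng.shuffle(perm)                       # uniform random ordering of [n]
    blocks, i = [], 0
    for s in sizes:                         # cut into consecutive blocks
        blocks.append(frozenset(perm[i:i + s]))
        i += s
    return set(blocks)

blocks = sample_uniform_given_sizes(5, [2, 2, 1], random.Random(0))
```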
A random partition Π ∈ P[n] for n ≥ 1 is called exchangeable if its distribution is invariant under the natural action of Sn on P[n]. Equivalently, a random partition Π is exchangeable if, for each partition {π1, . . . , πk} of [n], its distribution can be expressed as
P(Π = {π1, . . . , πk}) = p(#π1, . . . , #πk)

for some symmetric function p of integer partitions (n1, . . . , nk) of n. This function is called the exchangeable partition probability function (EPPF) [93]. Note that any EPPF can be formulated as in (1.5) for an appropriate choice of distribution Pn on Pn.

1.5.1 Distribution of block sizes
Let Π be a random partition of [n] whose distribution is determined by EPPF p, and consider N↓ = (N↓1, . . . , N↓Kn), the partition of n induced by Π, i.e. the decreasing sequence of block sizes of Π. Then the distribution of N↓ is obtained by inverting (1.5):

P(N↓ = (n1, . . . , nk)) = (n! / ∏_{i=1}^n (i!)^{λi} λi!) p(n1, . . . , nk),        (1.6)

where λi := Σ_{l=1}^k 1{nl = i} is the number of components of size i in the partition of n. We call (1.6) the distribution induced on block sizes in decreasing order.

Alternatively, we could consider the distribution of Ñ := (Ñ1, . . . , ÑKn), the block sizes of Π in order of appearance, or size-biased order, which is obtained by ordering the blocks {π1, . . . , πk} of Π by their least elements and putting Ñi = #πi for each i = 1, . . . , k. The distribution of Ñ is obtained by multiplying (1.6) by the factor

∏_{i=1}^k λi! ni / (nk(nk + nk−1) · · · (nk + · · · + n1)),

which yields

P(Ñ = (n1, . . . , nk)) = (n! / (nk(nk + nk−1) · · · (nk + · · · + n1) ∏_{i=1}^k (ni − 1)!)) p(n1, . . . , nk).        (1.7)

Finally, we can consider the block sizes of Π in exchangeable random order as follows. Conditional on Π = {π1, . . . , πk}, generate a uniform permutation σ of [k] and put N^ex = (#πσ(1), . . . , #πσ(k)). The distribution of N^ex is
P(N^ex = (n1, . . . , nk)) = (n choose n1, . . . , nk) (1/k!) p(n1, . . . , nk).        (1.8)

1.6 Ewens process
For α > 0, the Ewens distribution on P[n] induced by ESF(α) (1.1) through (1.5) is
pn(π; α) = (α^{#π} / α↑n) ∏_{b∈π} Γ(#b).        (1.9)
As each of the finite-dimensional distributions in the Ewens family depends on π ∈
P[n] through ν(π), equation (1.9) is an exchangeable partition probability function for each n ≥ 1, and the collection of Ewens distributions on P[n] is finitely exchangeable. The collection (pn(·; α), n ≥ 1) on the projective system (P[n], n ≥ 1) under restriction is self-consistent in the sense of (1.2). This is most readily seen by a
sequential construction of pn(·; α) through the Chinese restaurant process (CRP).
Chinese restaurant process In a Chinese restaurant process with parameter α > 0, CRP(α), we assume customers are labeled 1, 2,... and arrive sequentially in a Chinese restaurant, with infinite seating capacity, as follows. (The tables of the restaurant correspond to blocks of a partition.)
(i) The first customer sits at his own table;
(ii) after n customers have been seated in configuration π ∈ P[n], the (n + 1)st labeled customer enters the restaurant and randomly chooses a table (block) at which to sit according to the following seating rule
pr(n + 1 ↦ b | π) = #b/(n + α),  b ∈ π,
pr(n + 1 ↦ b | π) = α/(n + α),   b = ∅.
In other words, a new customer sits at each occupied table with probability proportional to the number of customers already seated at that table and sits at a new (unoccupied) table with probability proportional to α.
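The seating rule can be simulated directly; the following sketch (our own illustration, not from the thesis) represents tables as lists of customer labels.

```python
import random

def chinese_restaurant(n, alpha, rng=random):
    """Sample a partition of [n] from CRP(alpha): customer m+1 joins an
    occupied table b with probability #b/(m + alpha) and opens a new table
    with probability alpha/(m + alpha)."""
    tables = []                      # each table is a list of customer labels
    for m in range(n):               # m customers already seated
        r = rng.uniform(0, m + alpha)
        acc = 0.0
        for b in tables:
            acc += len(b)            # occupied table: mass #b
            if r < acc:
                b.append(m + 1)
                break
        else:
            tables.append([m + 1])   # new table: residual mass alpha
    return [set(b) for b in tables]

p = chinese_restaurant(10, 1.0, random.Random(1))
```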
It is clear by this construction that the finite-dimensional distributions in (1.9) are self-consistent and characterize an infinitely exchangeable measure on P, which is called the Ewens(α) process on P.

1.6.1 Pitman-Yor process
Pitman [90] and Pitman and Yor [94] study a two-parameter extension of the Ewens(α) process, which we call the Pitman-Yor(α, θ) process, or simply the (α, θ)-model. For (α, θ) satisfying either
• α = −κ < 0 and θ = mκ for some m = 1, 2,..., or
• 0 ≤ α ≤ 1 and θ > −α, the modified Chinese restaurant seating rule
pr(n + 1 ↦ b | π) = (#b − α)/(n + θ),      b ∈ π,
pr(n + 1 ↦ b | π) = (θ + α#π)/(n + θ),     b = ∅
generates finite-dimensional distributions
pn(π; α, θ) = ((θ/α)↑#π / θ↑n) ∏_{b∈π} ( −(−α)↑#b )        (1.10)
on P[n] which satisfy (1.2) and (1.3) on P[n], n ≥ 1, under restriction. Note that when α = 0, (1.10) coincides with (1.9) with parameter θ > 0. The (α, θ)-model has several nice properties and appears in a variety of contexts in the theory of random partitions, as we see in chapter 2. Pitman [91, 93] has established several properties of the above process, including another construction via a residual allocation model, or stick-breaking scheme, and also its relation to the Poisson-Dirichlet distribution and stable subordinators.
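The modified seating rule differs from CRP(α) only in the table masses. The following sketch (our own illustration) assumes, for simplicity, the parameter range 0 ≤ α < 1 and θ > 0, a subset of the range above.

```python
import random

def pitman_yor_crp(n, alpha, theta, rng=random):
    """Sample a partition of [n] via the (alpha, theta) seating rule: join
    block b with probability (#b - alpha)/(m + theta), open a new block with
    probability (theta + alpha*k)/(m + theta), k the current number of blocks.
    Sketch assuming 0 <= alpha < 1 and theta > 0."""
    blocks = []
    for m in range(n):
        r = rng.uniform(0, m + theta)
        acc = 0.0
        for b in blocks:
            acc += len(b) - alpha        # discounted mass of an occupied block
            if r < acc:
                b.add(m + 1)
                break
        else:
            blocks.append({m + 1})       # residual mass theta + alpha*k
    return blocks

blocks = pitman_yor_crp(8, 0.5, 1.0, random.Random(2))
```

Setting α = 0 recovers the CRP(θ) seating rule, matching the remark that (1.10) reduces to (1.9).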
1.6.2 Gibbs partitions
Composite structures and the Bell numbers The enumeration of integer partitions has been studied widely in the field of combinatorics, and a more general notion of an integer partition is that of a composite structure [21]. Let v• := (v1, v2, . . .) and w• := (w1, w2, . . .) be sequences of non-negative integers and let V and W be two species of combinatorial structures so that for each finite set F with cardinality #F = n, the collection of V-structures (respectively W-structures) of F is a set
V (F ) (respectively W (F )) having vn (respectively wn) elements. For a finite set F , the V ◦ W composite structure of F , written (V ◦ W )(F ), is the set of all ways to partition F into blocks {F1,...,Fk}, assign the collection of blocks a V -structure, and assign each block a W -structure. For a set F with #F = n, the cardinality of (V ◦W )(F ) is given by the expression
#(V ◦ W)(F) := Bn(v•, w•) := Σ_{k=1}^n vk Bn,k(w•),        (1.11)

where

Bn,k(w•) := Σ_{{B1,...,Bk}} ∏_{i=1}^k w_{#Bi},        (1.12)

and the sum is taken over all partitions {B1, . . . , Bk} of F into k non-empty blocks. Bn(v•, w•) is called the nth Bell polynomial.
Example 1.6.1. Let v• ≡ w• ≡ 1. Then Bn(v•, w•) = # P[n] corresponds to the number of partitions of [n] for each n ≥ 1; these are called the Bell numbers.
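The Bell polynomials of (1.11)-(1.12) can be computed without enumerating set partitions via a standard recurrence for the partial Bell polynomials (conditioning on the size of the block containing a distinguished element); this sketch is our own illustration, not from the thesis.

```python
from functools import lru_cache
from math import comb

def bell_polynomial(n, v, w):
    """B_n(v, w) = sum_k v(k) * B_{n,k}(w) as in (1.11), with the partial
    Bell polynomials of (1.12) computed by the recurrence
    B_{n,k} = sum_j C(n-1, j-1) * w(j) * B_{n-j, k-1}
    (choose the block containing element n, of size j)."""
    @lru_cache(maxsize=None)
    def B(n, k):
        if n == 0:
            return 1 if k == 0 else 0
        if k == 0:
            return 0
        return sum(comb(n - 1, j - 1) * w(j) * B(n - j, k - 1)
                   for j in range(1, n - k + 2))
    return sum(v(k) * B(n, k) for k in range(1, n + 1))

# v = w = 1 recovers the Bell numbers of example 1.6.1
bell = [bell_polynomial(n, lambda k: 1, lambda j: 1) for n in range(1, 7)]
```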
Gibbs partitions The Ewens process, and more generally the (α, θ)-model, appears in a wide range of contexts and is a special case of several classes of random partition models. One such class is the class of Gibbs partitions, which are defined on P[n] by finite-dimensional distributions
P(Πn = {π1, . . . , πk}) = vk ∏_{i=1}^k w_{#πi} / Bn(v, w)        (1.13)

where v := (v1, . . .) and w := (w1, . . .) are sequences of non-negative real numbers and Bn(v, w) is the Bell polynomial from (1.11). A random partition of [n] distributed according to (1.13) for suitable sequences v and w is said to have a Gibbs[n](v, w) distribution.
Gibbs partitions have been studied in some detail in the literature. The sequences v and w give a Gibbs partition a natural physical interpretation: each block of size j can be in any of wj internal states, a partition with k blocks can be in any of vk internal states, and each configuration is equally likely. A Gibbs[n](v, w) partition with k blocks of sizes (λ1, . . . , λk) is obtained by ignoring the states of the associated V ◦ W composite structure and considering only the partition of [n] which it induces. See chapter 1.5 of [93] and [20, 59] for an overview of Gibbs partitions. McCullagh, Pitman and Winkel [80] study Gibbs fragmentation processes, which we discuss in more detail in section 1.10 and chapter 3.
Example 1.6.2. The Ewens(α) distribution on P[n] is a Gibbs[n](v, w) distribution with v = (α^n, n ≥ 1) and w = ((n − 1)!, n ≥ 1).
1.6.3 Product partition models
Hartigan [62] studies a family of random partition models called product partition models which assign probability
Pn(Π = π) ∝ ∏_{b∈π} c(b)        (1.14)

to each π ∈ P[n], where c : 2^{[n]} → R+ is a (non-negative) cohesion function associated with subsets of [n].
Example 1.6.3. The Ewens(α) distribution is a product partition distribution with c(b) := αΓ(#b) for each b ⊂ N.
Note the relationship among exchangeable, Gibbs and product partition models: these classes overlap, but none is a subset of another. By the form of a Gibbs partition in (1.13), it is clear that a Gibbs[n](v, w) partition is finitely exchangeable, as it admits the form of a finite EPPF; however, a collection (Pn, n ≥ 1), where Pn is a Gibbs[n](v, w) distribution for suitable sequences v and w for each n ≥ 1, is not, in general, consistent. Furthermore, a product partition distribution need not be either finitely exchangeable or consistent.

1.7 Mass partitions
There is a relationship between exchangeable random partitions of N and partitions of unit mass, or mass partitions, originally introduced by Kingman [67] with subsequent developments summarized by Bertoin [25, 27] and Pitman [91, 93]. We exploit this relationship in our development of partition-valued and tree-valued Markov processes in chapters 2 and 3.
Definition 1.7.1. A (ranked) mass partition s := (s1, s2, . . .) is a sequence of non-negative real numbers such that

• s1 ≥ s2 ≥ · · · ≥ 0 and
• Σ_{i=1}^∞ si ≤ 1.

A mass partition for which Σ_i si = 1 is called proper; otherwise, it is improper. The residual mass s0 := 1 − Σ_i si is called dust.

Let R↓ := {(s1, s2, . . .) : s1 ≥ s2 ≥ · · · ≥ 0, Σ_i si ≤ 1} denote the space of ranked mass partitions and R↓(k) := {s ∈ R↓ : sj = 0 for all j > k, Σ_i si = 1} the space of proper mass partitions with at most k positive components, commonly referred to as the ranked k-simplex. Bertoin [25] shows that the space R↓ arises as a subclass of the space S↓ := {(x1, . . .) : x1 ≥ x2 ≥ · · · ≥ 0} of decreasing sequences of non-negative numbers, which is the projective limit of the spaces S↓_ε obtained by applying the threshold operator ϕ_ε : [0, 1] → [0, 1], ϕ_ε(x) = x 1{x > ε}, to each component of x ∈ S↓.

An interval partition of (0, 1) is the collection of interval components of an arbitrary open subset θ ⊂ (0, 1). We denote the space of interval partitions of (0, 1) by PI. Each θ ∈ PI determines a mass partition s ∈ R↓ in an obvious way: the ith component si of s is the length of the ith largest interval of θ. Conversely, given a mass partition s ∈ R↓, an interval partition can be obtained in infinitely many ways; however, for our purposes, we need only choose one interval partition which corresponds to s = (s1, s2, . . .). We choose this to be (0, 1)\θs where θs := {Σ_{i=1}^k si, k ≥ 1}. In what follows, we abuse notation and write θs to refer to the interval partition (0, 1)\θs := (0, 1)\{Σ_{i=1}^k si, k ≥ 1}.

1.8 Paintbox process
Kingman [67] developed the notion of the paintbox process, which establishes a bijection between measures on R↓ and exchangeable partitions of N. For any s ∈ R↓, the paintbox process based on s provides a construction of a random partition of N as follows. Let θs be the interval partition corresponding to s defined above. Generate an i.i.d. sequence U := (U1, U2, . . .) of random variables uniformly distributed on (0, 1). Given U := (u1, u2, . . .), define the partition π(U) by the equivalence relation
i ∼π(U) j ⇔ Ui and Uj are in the same sub-interval component of θs.
Equivalently, a random partition distributed as π(U) can be generated by an independent sequence of random variables, X := (X1, X2, . . .), with Xi having distribution
Ps[Xi = k] = sk for k ≥ 1,   Ps[Xi = −i] = 1 − Σ_j sj,   and Ps[Xi = k] = 0 otherwise.
The random partition Π(X) of N generated by s through X is the partition of N defined by
i ∼Π(X) j if and only if Xi = Xj.
Π(X) is called the paintbox based on s, written Π(X) ∼ %s. Given a measure ν on R↓, we write %ν(·) := ∫_{R↓} %s(·) ν(ds) to denote the ν-mixture of paintboxes.
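The construction via the labels X can be simulated directly; in this sketch (our own illustration, not from the thesis) a dust draw receives the unique label −i and hence forms a singleton block.

```python
import bisect
import random

def paintbox(n, s, rng=random):
    """Sample the restriction to [n] of a paintbox based on s: X_i = k with
    probability s_k, X_i = -i with the residual (dust) probability, and the
    blocks of the partition are the level sets of (X_1, ..., X_n)."""
    cum, tot = [], 0.0
    for sk in s:                        # cumulative sums of s
        tot += sk
        cum.append(tot)
    labels = []
    for i in range(1, n + 1):
        u = rng.random()
        k = bisect.bisect_right(cum, u)        # interval of theta_s containing u
        labels.append(k + 1 if k < len(s) else -i)   # dust -> singleton label
    blocks = {}
    for i, x in enumerate(labels, start=1):
        blocks.setdefault(x, set()).add(i)
    return set(map(frozenset, blocks.values()))

one_block = paintbox(5, (1.0,), random.Random(0))   # s = (1, 0, ...) glues everything
```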
Theorem 1.8.1. (Kingman's correspondence) [27, 93] To any infinitely exchangeable partition Π of N, there exists a unique measure ν on R↓ such that Π ∼ %ν. Conversely, any paintbox based on ν is an infinitely exchangeable partition of N.
The above result is principally obtained by application of de Finetti's theorem [47] for infinitely exchangeable sequences of random variables.

1.8.1 Asymptotic frequencies
Definition 1.8.2. A set A ⊂ N is said to have asymptotic frequency ||A|| if the limit
||A|| := lim_{n→∞} #(A ∩ [n]) / n
exists. A partition B ∈ P has asymptotic frequency ||B|| if each of its blocks possesses asymptotic frequencies, in which case we write ||B||↓ := (||B1||, ||B2||, . . .)↓ ∈ R↓, the decreasing rearrangement of the asymptotic block frequencies of B.
Theorem 1.8.3. ([27]) Let Π be an infinitely exchangeable partition of N; then it possesses asymptotic frequencies ||Π|| almost surely. Moreover, ||Π||↓ is distributed as ν, where ν is the unique measure on R↓ such that Π ∼ %ν as in theorem 1.8.1.

A distribution on R↓ which has garnered in-depth study in the literature is the Poisson-Dirichlet distribution with parameter (α, θ), denoted by PD(α, θ). A PD(α, θ) mass partition can be obtained from the asymptotic frequencies of a Pitman-Yor(α, θ) partition [93]. A size-biased ordering of the mass components of a PD(α, θ) distribution is distributed according to the Griffiths-Engen-McCloskey (GEM) distribution with parameter (α, θ) [89].
For 0 ≤ α < 1 and θ > −α, let (Wk, k ≥ 1) be a sequence of independent random variables such that Wk ∼ Beta(1 − α, θ + kα). Put
V1 = W1,  V2 = W2(1 − W1),  . . . ,  Vk = Wk ∏_{i=1}^{k−1} (1 − Wi),  . . . .        (1.15)
Definition 1.8.4. ([56]) Let V := (Vk, k ≥ 1) be as above for 0 ≤ α < 1 and θ > −α. The collection (V1, V2, . . .) has the two-parameter Griffiths-Engen-McCloskey distribution, written V ∼ GEM(α, θ). The collection V↓ := (V↓k, k ≥ 1) of frequencies listed in decreasing order is said to have the two-parameter Poisson-Dirichlet distribution, written V↓ ∼ PD(α, θ).
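The stick-breaking recursion (1.15) can be sketched as follows (our own illustration); the residual product ∏_{i<j}(1 − W_i) is tracked as the length of the remaining stick.

```python
import random

def gem_stick_breaking(alpha, theta, k, rng=random):
    """First k frequencies V_1, ..., V_k of GEM(alpha, theta) via (1.15):
    W_j ~ Beta(1 - alpha, theta + j*alpha) and V_j = W_j * prod_{i<j}(1 - W_i).
    Sorting the output in decreasing order approximates a PD(alpha, theta) draw."""
    vs, remaining = [], 1.0
    for j in range(1, k + 1):
        w = rng.betavariate(1 - alpha, theta + j * alpha)
        vs.append(w * remaining)        # break off a W_j-fraction of the stick
        remaining *= 1 - w
    return vs

v = gem_stick_breaking(0.5, 1.0, 20, random.Random(3))
```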
Properties of the Poisson-Dirichlet distribution have been shown by Bertoin [26], Holst [65] and Pitman [91, 94]. A book-length treatment of the Poisson-Dirichlet distribution can be found in Feng [56].

1.9 Exchangeable coalescents
Kingman [69] constructed a Markov process Π := (Π(t), t ≥ 0) on P, called a coalescent process, for which Π(t) is an exchangeable partition of N for each t ≥ 0. Given two set partitions π, π′, we say π is a refinement of π′ if π ≤ π′, as defined in section 1.4.1. For such a pair, π′ is said to be obtained from a coagulation of the blocks of π. For example, for π = 126|35|47|8 and π′ = 12356|47|8, π′ is obtained from π by the coagulation of the blocks {1, 2, 6} and {3, 5}. In general, a block of π′ can be the coagulation of more than two blocks of π. We call a coagulation simple if all but one of the blocks of π′ are identical to those of π.
Definition 1.9.1. An (exchangeable) coalescent process Π := (Π(t), t ≥ 0) is a collection of exchangeable random partitions of N indexed by t ∈ R+ such that Π(s) ≤ Π(t) for every s ≤ t.
Definition 1.9.2. An n-coalescent is a collection Πn := (Πn(t), t ≥ 0) on P[n] satisfying Πn(s) ≤ Πn(t) for every s ≤ t which evolves as follows.
• Πn(0) = π0 for some π0 ∈ P[n];
• given Πn(t) = π for some t ≥ 0, any pair b, b′ of blocks of π coagulates at exponential rate 1, and all other collections of blocks of π coagulate at rate 0;
• 1[n] = {[n]}, the trivial one block partition of [n], is an absorbing state for Πn.
In other words, given that Πn(t) = π such that #π = k, the process stays at π for an exponential time with rate parameter k(k−1)/2, the number of pairs of non-empty blocks of π, and then jumps to one of the k(k − 1)/2 partitions of [n] which can be obtained from π by a simple coagulation of its blocks, with each such jump having equal probability.
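These dynamics can be simulated directly by alternating an exponential holding time with a uniform choice of merging pair; this sketch (our own illustration, not from the thesis) starts from the partition of [n] into singletons.

```python
import random

def n_coalescent(n, rng=random):
    """Simulate Kingman's n-coalescent from the singleton partition: with k
    blocks present, hold for an Exponential(k*(k-1)/2) time, then merge a
    uniformly chosen pair.  Returns the list of (time, partition) states down
    to the absorbing state {[n]}."""
    blocks = [frozenset([i]) for i in range(1, n + 1)]
    t, history = 0.0, [(0.0, list(blocks))]
    while len(blocks) > 1:
        k = len(blocks)
        t += rng.expovariate(k * (k - 1) / 2)   # total merge rate = #pairs
        i, j = rng.sample(range(k), 2)          # uniform pair, a simple coagulation
        merged = blocks[i] | blocks[j]
        blocks = [b for idx, b in enumerate(blocks) if idx not in (i, j)]
        blocks.append(merged)
        history.append((t, list(blocks)))
    return history

history = n_coalescent(5, random.Random(4))
```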
Theorem 1.9.3. (Existence of Kingman’s coalescent)([27]) For every n ≥ 2,
the restriction Πn|[n−1] of an n-coalescent to [n − 1] is an (n − 1)-coalescent. Moreover, there exists a unique (in law) process ΠK := (ΠK(t), t ≥ 0) on P such that for every n ∈ N the process induced by the restriction to [n], ΠK|[n] := (ΠK|[n](t), t ≥ 0), is an n-coalescent.
The process ΠK is called Kingman's coalescent and is an exchangeable partition-valued Markov process. Moreover, the standard coalescent, i.e. the case ΠK(0) = 0N, the partition of N into singletons, comes down from infinity almost surely, i.e. #ΠK(t) < ∞ for every t > 0.

Kingman's coalescent arises as the limit of the Wright-Fisher model in the genealogy of populations. For a particular application to neutral mutations and allelic partitions and its association with a PD(0, θ) distribution, see section 2.4 of [27] or Tavaré [96].

In general, an exchangeable coalescent can be characterized by a unique measure on P as follows. For a subset A ⊂ N, let π ∈ PA and π′ ∈ P[k] where k ≥ #π ≥ 1. Define the coagulation of π by π′, written Coag(π, π′), as the partition π′′ := (π′′_j, j ≥ 1) of A where

π′′_j := ∪_{i ∈ π′_j} π_i,  j ∈ N,

where (π_i, i ≥ 1) and (π′_j, j ≥ 1) are the blocks of π and π′ listed in order of appearance, respectively. For example, let π := 145|28|3|67 and π′ := 13|2|4; then Coag(π, π′) = 1345|28|67. Note that Coag(·, ·) preserves the ordering of blocks according to least elements, provided the blocks of the first argument are ordered by their least elements.

The transition mechanism of an exchangeable coalescent can be described in terms of a unique measure µ on P where, given that a jump occurs at time t ≥ 0, Π(t) = Coag(Π(t−), π) where π ∼ µ. For example, for the Kingman coalescent, write K(i, j) for the partition of N whose blocks are the pair {i, j} and the singletons {k} for k ≠ i, j. Then the measure µK which characterizes the Kingman coalescent is
µK(·) := Σ_{1 ≤ i < j < ∞} δ_{K(i,j)}(·)

and is called the Kingman measure.

Let ν be a measure on R↓ which satisfies

ν((0, 0, . . .)) = 0 and ∫_{R↓} (Σ_{i=1}^∞ s_i^2) ν(ds) < ∞.        (1.16)

Then we have the following for a general exchangeable coalescent.

Theorem 1.9.4. (Characterization of exchangeable coalescent processes) ([25]) Let Π := (Π(t), t ≥ 0) be an exchangeable coalescent on P. Then there exists a unique c ≥ 0 and a unique measure ν on R↓ fulfilling (1.16) such that the transitions of Π are described by Coag(Π(t−), π) where π ∼ µ := cµK + %ν.

The Coag(·, ·) operator is convenient for constructing coalescent processes via a Poisson point process on R+ × P with intensity measure dt ⊗ µ(dπ). The Poissonian construction of Markov partition processes is useful for establishing special properties of the process and will be used in sections 2.3.1 and 3.5.4.

In addition to Kingman's coalescent, there are several special families of coalescents which fall under the theory of general exchangeable coalescents. The additive coalescent and multiplicative coalescent are processes with collision rates K(i, j) = i + j and K(i, j) = ij respectively. Other such coalescents are the Marcus-Lushnikov coalescent, the Bolthausen-Sznitman coalescent and the beta coalescent. A review of these processes, and relevant references, can be found in either Bertoin [25, 27] or Pitman [93]. Bertoin also studies mass-coalescents, the processes induced on the space R↓ of mass partitions by the asymptotic frequencies of an exchangeable coalescent. Pitman [92] studies coalescents which allow simultaneous multiple collisions.

1.10 Exchangeable fragmentations

Fragmentations of a set A ⊂ N are given special treatment in chapter 3. We now review the relevant literature in this area while introducing some notation which we use later. Heuristically, an exchangeable fragmentation process Π := (Π(t), t ≥ 0) is the time reversal of an exchangeable coalescent.
That is, a fragmentation process is a family of partitions (Π(t), t ≥ 0) indexed by t ∈ R+ such that Π(t) ≤ Π(s) for every s ≤ t. However, the time reversal of an exchangeable coalescent is not, in general, an exchangeable fragmentation process, because time reversal preserves the branching property only in special cases. In our study, we distinguish between fragmentations in discrete time, (Πn, n ≥ 1), which we call fragmentation trees, and fragmentations in continuous time, which we call fragmentation processes or weighted fragmentation trees.

1.10.1 Exchangeable random fragmentation trees

A random fragmentation of A is a probability distribution on TA which satisfies

(a) the branching property: given the root partition ΠT, the subtrees {T|b : b ∈ ΠT} are distributed independently, and

(b) for each S ∈ ΠT, the subtree T|S is a random fragmentation of S.

A random fragmentation of A is exchangeable if T ∼ T^σ := σ(T) for any σ ∈ SA, the symmetric group of all permutations acting on A. A family of random fragmentations {PS : S ⊆ A} is consistent if T ∼ PA implies T|S ∼ PS for all S ⊆ A. That is, the marginal distribution of each subtree restricted to S ⊂ A corresponds to PS. A family of distributions P := {PA : A ⊂ N} defines an infinitely exchangeable fragmentation of N if PA is exchangeable for each A ⊂ N and P is consistent as in the discussion of section 1.3.

1.10.2 Gibbs fragmentation trees

The distribution of a random fragmentation tree is described by a splitting rule, which is tantamount to a collection (pA, A ⊂ N) of probability distributions on partitions of each A ⊂ N. McCullagh, Pitman and Winkel [80] study a family of fragmentation trees called Gibbs fragmentation trees.
For a Gibbs fragmentation tree, the splitting rule p has the form

p_{∪Ai}(A1, . . . , Ak) ∝ vk ∏_{i=1}^k w_{#Ai}        (1.17)

for some pair of non-negative sequences v := (vk, k ≥ 1) and w := (wk, k ≥ 1). A tree T is called binary if each parent of T has 0 or 2 children.

Let ν(dx) = x^β (1 − x)^β dx for β ∈ (−2, ∞) and ν(dx) = δ_{1/2}(dx) for β = ∞. Aldous's beta-splitting model is a model for fragmentation trees with binary splitting rule

p(i, j) ∝ ∫_0^1 x^i (1 − x)^j ν(dx).

Theorem 1.10.1. ([80]) Aldous's beta-splitting models for β ∈ (−2, ∞] are the only consistent Markovian binary fragmentations with splitting rule of the form p(i, j) ∝ wi wj, i, j ≥ 1, for some sequence of non-negative weights w := (w1, . . .).

Infinitely exchangeable fragmentation trees have been studied elsewhere in the literature; see [2, 7, 25].

1.10.3 Genealogical interpretation of a tree

As a collection of subsets of A ⊂ N, the elements of T ∈ TA are partially ordered by inclusion. That is, if s, t ∈ T are such that t ⊂ s and #s < ∞, then the intervals [t, s], (t, s], and [t, s) are well-defined subsets of T. This partial ordering induces a natural genealogical interpretation of the relationships among the elements of a tree. For each t ∈ T, the subset anc(t) := (t, A] := {s ∈ T : t ⊂ s} denotes the set of ancestors of t. Note that anc(root(T)) = ∅ and, for each t ≠ root(T), anc(t) has a least element denoted by pa(t) := min anc(t), the parent of t.

Conversely, except for the singleton elements of T, each t ∈ T is the parent of some collection of subsets of T, called the children of t, which is given by pa^{−1}(t) := frag(t) := {t′ ∈ T : pa(t′) = t}. For each non-singleton t ∈ T with #t < ∞, frag(t) forms a non-trivial partition of t. In particular, for any tree T, the children of root(T) form the root partition, denoted ΠT := rp(T) := frag(root(T)). The fragmentation degree of T is given by max_{t∈T} # frag(t), which may be infinite. For k ≥ 1, we write T_A^(k) to denote the collection of trees of A with fragmentation degree at most k.

Weighted fragmentation trees

The trees discussed so far are unweighted, or Boolean, meaning their edges are assigned unit weight, or length.
A weighted tree T̄ is a Boolean tree T together with a collection {tb : b ∈ T} of non-negative edge lengths. We write T̄ to denote the space of weighted trees and usually write T̄ ∈ T̄ as the pair (T, W) where T is the associated Boolean tree and W := {tb : b ∈ T} is the collection of weights attached to the edges of T. All notation from Boolean trees carries over to weighted trees with the modification that a bar is placed over any symbol to indicate that we are discussing weighted trees. In the literature, weighted trees are also called fragmentation processes, where edge weights represent how long a fragment survives before breaking into smaller pieces.

Given a Markovian coalescent process (Π(t), t ≥ 0), the time-reversal (Π(−t), t ≥ 0) is also Markovian, but is not, in general, time-homogeneous, as such a time-reversal does not possess the branching property of fragmentation processes, which assumes that different fragments evolve independently. Pitman [93] discusses the duality between coalescent and fragmentation processes.

As in the case of the Coag operator for describing the behavior of exchangeable coalescent processes, there is an analogous operator, called Frag, by which we can describe a fragmentation process. For a subset A ⊂ N, let π, π′ ∈ PA with #π = m and let k ∈ [m]. Define the fragmentation of the kth block of π by π′, written Frag(π, π′, k), as the partition π′′ of A with blocks π_i for i ≠ k plus the sub-blocks of π′|πk := {π′_i ∩ π_k : i ≥ 1}. For example, let π := 134789|256, π′ := 1268|39|457 and k = 1; then Frag(π, π′, k) = 18|256|39|47.

Bertoin [25] characterizes the behavior of homogeneous fragmentation processes by the Frag(·, ·, ·) operator. For n ∈ N, let en be the partition of N into two non-empty blocks, N\{n} and {n}. Define the erosion measure by

ε(·) := Σ_{n≥1} δ_{en}(·).        (1.18)

The transitions of an exchangeable fragmentation process Π := (Π(t), t ≥ 0) can be described in terms of a unique measure µ ⊗ # on P × N, where # denotes the counting measure on N. Given that a jump occurs at time t ≥ 0, Π(t) = Frag(Π(t−), π, k) for (π, k) ∼ µ ⊗ #.

Let ν be a measure on R↓ which satisfies

ν((1, 0, . . .)) = 0 and ∫_{R↓} (1 − s1) ν(ds) < ∞.        (1.19)

Then we have the following for a general exchangeable fragmentation process.

Theorem 1.10.2. (Characterization of exchangeable fragmentation processes) ([25]) Let Π := (Π(t), t ≥ 0) be an exchangeable fragmentation process on P. Then there exists a unique c ≥ 0 and a unique measure ν on R↓ fulfilling (1.19) such that the transitions of Π are given by Frag(Π(t−), π, k) where π ∼ µ := cε + %ν and k is uniform on [#Π(t)].

The Frag operator is convenient for constructing fragmentation processes via a Poisson point process on R+ × P × N with intensity measure dt ⊗ µ(dπ) ⊗ #. McCullagh, Pitman and Winkel [80] discuss how weights can be attached to the edges of a Gibbs fragmentation tree in a consistent way. Bertoin [22, 23, 24] has done extensive work in this area and provides thorough explanation and further references in his book [25]. Aldous [2, 3] studies the continuum random tree (CRT) within a different framework from that which we discuss above and in chapter 3.

1.11 Exchangeable fragmentation-coalescence processes

The most general exchangeable process of fragmentation and coalescent type is called the exchangeable fragmentation-coalescence (EFC) process. Essentially, an EFC process is the combination of an exchangeable coalescent and an exchangeable fragmentation into one process on P. These processes arise in applications in certain physical sciences [11] and allow one to construct processes which are more flexible than the coalescent or fragmentation processes. We summarize a main result of Berestycki [19] for EFC processes.
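The Coag and Frag operators defined above can be sketched on finite partitions as follows (our own illustration, not from the thesis), using the thesis's own worked examples as checks.

```python
def _ordered_blocks(pi):
    """Blocks listed in order of appearance (increasing least element)."""
    return sorted(pi, key=min)

def coag(pi, pi2):
    """Coag(pi, pi2): merge the blocks of pi according to the blocks of pi2,
    both lists of blocks being taken in order of appearance."""
    blocks = _ordered_blocks(pi)
    out = []
    for b2 in _ordered_blocks(pi2):
        merged = set()
        for i in b2:                    # b2 names indices of blocks of pi
            if i <= len(blocks):
                merged |= blocks[i - 1]
        if merged:
            out.append(frozenset(merged))
    return set(out)

def frag(pi, pi2, k):
    """Frag(pi, pi2, k): split the kth block (in order of appearance) of pi
    by intersecting it with the blocks of pi2."""
    blocks = _ordered_blocks(pi)
    target = blocks[k - 1]
    out = [b for b in blocks if b != target]
    out += [frozenset(target & b2) for b2 in pi2 if target & b2]
    return set(out)

# Thesis examples: Coag(145|28|3|67, 13|2|4) and Frag(134789|256, 1268|39|457, 1)
example_coag = coag({frozenset({1, 4, 5}), frozenset({2, 8}), frozenset({3}),
                     frozenset({6, 7})},
                    {frozenset({1, 3}), frozenset({2}), frozenset({4})})
example_frag = frag({frozenset({1, 3, 4, 7, 8, 9}), frozenset({2, 5, 6})},
                    {frozenset({1, 2, 6, 8}), frozenset({3, 9}),
                     frozenset({4, 5, 7})}, 1)
```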
Durrett, Granovsky and Gueron [48] provide general results for the equilibrium behavior of certain EFC processes which predate the work of Berestycki.

Definition 1.11.1. ([19]) A P-valued Markov process (Π(t), t ≥ 0) is an exchangeable fragmentation-coalescence process if it has the following properties.

• It is exchangeable.
• Its restrictions Π|[n] are càdlàg finite state Markov processes which can only evolve by fragmentation of one block or by coagulation.

More precisely, the transition rate of Π|[n](·) from π to π′, say qn(π, π′), is non-zero only if there exists π′′ such that π′ = Coag(π, π′′) or there exist π′′ and k ≥ 1 such that π′ = Frag(π, π′′, k).

Theorem 1.11.2. (Characterization of EFC processes) ([19]) An EFC process Π := (Π(t), t ≥ 0) is characterized by exchangeable measures C and F on P such that C({0N}) = 0, C(P\{π′ ∈ P : π′|[n] = 0[n]}) < ∞ for every n ∈ N, F({1N}) = 0 and F(P\{π′ ∈ P : π′|[n] = 1[n]}) < ∞ for every n ∈ N, and there exist unique constants ck, ce ≥ 0 and unique measures νC, νF on R↓ such that νC fulfills (1.16), νF fulfills (1.19) and

C = ck µK + %νC  and  F = ce ε + %νF.

The above theorem completely characterizes all EFC processes. The manner in which transitions occur in an EFC process is a natural generalization of both coalescent and fragmentation processes, and the behavior of such processes can be studied using well-established results for both coalescent and fragmentation processes.

1.12 Random graphs

The study of random graphs began in the 1950s with the seminal work of Erdős and Rényi [50, 51], in which they constructed a random graph as follows. Let n ≥ 1 and 0 < p < 1; then G(n, p) is the ensemble of graphs with n vertices obtained by including, independently for each pair i, j ∈ [n], an edge between i and j with probability p.
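This construction can be simulated directly; the following sketch (our own illustration, not from the thesis) stores the symmetric edge relation as a set of ordered pairs, matching the definition of an undirected graph in section 1.4.3.

```python
import random

def erdos_renyi(n, p, rng=random):
    """Sample G(n, p): include each edge {i, j}, i < j, independently with
    probability p, stored symmetrically as ordered pairs (i, j) and (j, i)."""
    edges = set()
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            if rng.random() < p:
                edges.add((i, j))
                edges.add((j, i))   # keep the relation symmetric
    return edges

g = erdos_renyi(10, 0.3, random.Random(5))
```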
Several results have been shown for these graphs, which we call the Erdős-Rényi random graph with parameter p, including phase transitions, the emergence of a giant component, and several other aspects related to its degree distribution and connectivity; see Chung and Lu [39] or recent work by Bollobás [31, 32, 33, 34] for more details.

Another property of the Erdős-Rényi family is that it characterizes an exchangeable graph on countably many vertices, i.e. an infinite graph indexed by N. We call an infinite graph G whose finite-dimensional restrictions G := (G1, G2, . . .) are distributed as G(1, p), G(2, p), . . . an Erdős-Rényi process with parameter p, written ER(p). Though infinite exchangeability is obvious for the Erdős-Rényi process, it is still a striking property, as the only widely studied instances of infinitely exchangeable random graphs in the literature are variants of the Erdős-Rényi process. Because of its nice properties and intuitive construction, the Erdős-Rényi process has been the subject of considerable study in the field of mathematics; however, the applicability of the ER process is restricted by its inability to replicate many properties of real-world networks, e.g. heavy-tailed degree distributions, clustering, and the small-world property.

The terms graph and network are interchangeable, though we generally use graph to refer to a mathematical object, i.e. a set of subsets of [n]^2, and network to refer to a graph which is used to model some real-world object, e.g. the Internet.

1.12.1 Heavy-tailed networks

Faloutsos, Faloutsos and Faloutsos [54] presented an empirical study of the degree distribution of the Internet which has led to an explosion of research in the field of complex networks. In this work, Faloutsos et al. claim that the degree distribution observed in the Internet at the router level is scale-free; i.e. let u denote a vertex and du its degree; then

P(du = k) ∼ k^{−α}

for some α > 0.
Later studies suggest that several other real-world networks exhibit topology similar to the Internet, with α typically between 2 and 3; see e.g. Newman [85] or Albert and Barabási [1]. The findings of these papers had been widely accepted until the sampling method used in arriving at these results was called into question in [71, 98]. Heavy-tailed degree distributions give rise to heterogeneous networks, which reflect the notion that most networks, e.g. social, biological, etc., consist of a high proportion of vertices with a very small degree which are connected to each other principally through a small, but significant, proportion of high-degree vertices, sometimes called hubs or connectors.

In the past decade, there has been much research in the area of modeling heterogeneous networks. One of these models is the Barabási-Albert model.

Barabási-Albert model The Barabási-Albert model [17] generates scale-free networks of preferential attachment type which evolve as follows. Given a network on n vertices, the (n + 1)st vertex is connected to a subset of the first n vertices with a probability that favors vertices with higher degree. This same phenomenon is reflected in the Chinese restaurant construction of the Pitman-Yor process, whereby individuals tend to choose tables which already have a larger number of individuals. This phenomenon is commonly referred to as the rich get richer or Matthew effect.

To generate a network from the Barabási-Albert model, start with m0 ≥ 2 initial nodes, each having degree at least one. Let Gn be the network obtained from this procedure after n nodes have been added, and add node n + 1 with m ≤ m0 edges connected to nodes i1, . . . , im in the vertex set of Gn with probability proportional to ∏_{j=1}^m d_{ij}, i.e. the product of the degrees of the chosen set of vertices. In particular, if m = 1 then

P(n + 1 ↦ i) = di / Σ_{j=1}^n dj.

The Barabási-Albert model has drawn attention in the study of complex networks and their application to social networks, the Internet, etc., partly because of its straightforward and intuitive way of generating scale-free networks. One drawback is that the model is not fully specified: given that we start with m0 ≥ 2 vertices each having degree at least one, the generating algorithm does not specify how these initial vertices should be connected. Another drawback of the Barabási-Albert model is its lack of exchangeability. This issue has been raised in some literature discussing how to implement this model in applications, since it is generally unclear how to properly label observed units to match the labels of the Barabási-Albert generated network.

1.12.2 Small-world networks

Another property commonly observed in networks, particularly social networks, is the small-world property. A network G has the small-world property if the average shortest path length between any two vertices is short, which replicates the notion of a "small world" in which every node in the network is closely connected to every other node in the network [84]. Mathematically, a network is a small world if the average path length L is asymptotically proportional to log N, i.e. L ∼ α log N for some α > 0. Watts and Strogatz [97] introduced a model for small-world networks which starts with a fixed, regular ring of vertices, each having degree k ≥ 2, and generates a random small-world graph by rewiring each edge with probability p ∈ (0, 1).

1.13 Organization of thesis

The rest of this volume is organized as follows. Chapter 2 constructs a Markov process on P that admits transitions of a different nature from the EFC process discussed in section 1.11. Chapter 3 builds upon the theory of chapter 2 to construct an exchangeable Markov process on T and T̄.
Chapter 4 gives a construction of an infinitely exchangeable random graph using a Poisson point process on the power set of the natural numbers. Chapter 5 extends the theory of infinitely exchangeable random partitions, particularly the Pitman–Yor family, to projective systems which can be associated with structures called even and balanced partitions.

CHAPTER 2
A CONSISTENT MARKOV PARTITION PROCESS GENERATED BY THE PAINTBOX PROCESS

Markov processes on the space of partitions appear in many situations in the scientific literature, including, but not limited to, physical chemistry, astronomy, and population genetics. See Aldous [11] for a relatively recent overview of this literature. In chapter 1 we reviewed exchangeable coalescent processes, fragmentation processes and their concatenation, the exchangeable fragmentation-coalescence (EFC) process. Well-behaved, mathematically tractable models of random partitions are of interest to probabilists as well as statisticians and scientists. Ewens [53] introduced the Ewens sampling formula in the context of theoretical population biology, and Kingman's coalescent model [69] was introduced as a model for population genetics, still its most natural setting. However, since the seminal work of Ewens and Kingman, random partitions have appeared in areas ranging from classification models [29, 81, 82] to probability theory [25, 93]. McCullagh [77] describes how the Ewens model can be used in the classical problem of estimating the number of unseen species, introduced by Fisher [57] and later studied by many, including Efron and Thisted [49].

2.1 Preliminaries

In this chapter, we discuss Markov processes on $\mathcal{P}^{(k)}$, the space of set partitions of the natural numbers $\mathbb{N}$ with at most $k \geq 1$ blocks. See section 1.4.1 for notation and terminology which arises in this context. For each $\pi, \pi' \in \mathcal{P}$, partitions of the natural numbers, define the metric $d : \mathcal{P} \times \mathcal{P} \to \mathbb{R}$ by
$$d(\pi, \pi') = 1/\max\{n \in \mathbb{N} : \pi_{|[n]} = \pi'_{|[n]}\}.$$
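This metric can be evaluated on finite truncations. The following minimal Python sketch (helper names are ours, and the cutoff `n_max` stands in for scanning all of $\mathbb{N}$) represents a partition as a list of blocks and returns $1/\max\{n : \pi_{|[n]} = \pi'_{|[n]}\}$, with full agreement up to the cutoff reported as distance 0:

```python
def restrict(partition, m):
    """Restrict a set partition (a list of blocks, given as sets) to {1, ..., m}."""
    blocks = [sorted(b & set(range(1, m + 1))) for b in partition]
    return sorted(b for b in blocks if b)

def partition_distance(p1, p2, n_max):
    """d(pi, pi') = 1 / max{n : pi|[n] == pi'|[n]}, scanned up to n_max.

    Returns 0.0 if the partitions agree on all of [n_max] (a finite stand-in
    for agreement on all of N)."""
    agree = 0
    for n in range(1, n_max + 1):
        if restrict(p1, n) != restrict(p2, n):
            break
        agree = n
    if agree == n_max:
        return 0.0
    return 1.0 / agree if agree else 1.0

# The partitions {1,2}{3,4} and {1,2}{3}{4} agree when restricted to [3]
# but differ on [4], so their distance is 1/3.
print(partition_distance([{1, 2}, {3, 4}], [{1, 2}, {3}, {4}], 4))
```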
The space $(\mathcal{P}, d)$ is compact [27]. Our focus is on families of exchangeable Markovian transition probabilities $(p_n, n \geq 1)$ on $(\mathcal{P}_{[n]}, D_{m,n})$ such that if $\Pi_n := (\pi_j^n, j \geq 1)$ is a Markov chain on $\mathcal{P}_{[n]}$ governed by $p_n(\cdot,\cdot)$, then the restricted process $D_{m,n}(\Pi_n) := (\pi^n_{j|[m]}, j \geq 1)$ is a Markov chain on $\mathcal{P}_{[m]}$ governed by $p_m(\cdot,\cdot)$, for any $m \leq n$. This property is equivalent to the condition: for every $n \geq 1$,
$$p_n(B, B') = \sum_{B'' \in D_{n,n+1}^{-1}(B')} p_{n+1}(B^*, B''), \qquad (2.1)$$
for each $B, B' \in \mathcal{P}_{[n]}$ and $B^* \in D_{n,n+1}^{-1}(B)$. Burke and Rosenblatt [36] show that (2.1) is necessary and sufficient for $D_{m,n}(\Pi_n)$ to be a Markov chain, and hence for the collection $(p_n(\cdot,\cdot), n \geq 1)$ to be self-consistent. Likewise, for a continuous-time Markov process $(B_n(t), t \geq 0)_{n \in \mathbb{N}}$, where $B_n(t)$ is a process on $\mathcal{P}_{[n]}$ with infinitesimal generator $Q_n$, it is sufficient that the entries of $Q_n$ satisfy (2.1) for there to be a Markov process on $\mathcal{P}$ with those finite-dimensional transition rates.

2.2 The Cut-and-Paste process

Let $n, k \in \mathbb{N}$ and let $\nu$ be a probability measure on the ranked $k$-simplex $\mathcal{R}^{\downarrow(k)}$, so that the paintbox based on $\nu$ is obtained by sampling a conditionally i.i.d. sequence $X_1, X_2, \ldots$ from $\nu$; i.e. given $s \sim \nu$, the $X_1, X_2, \ldots$ are i.i.d. with $P_s(X_i = j) = s_j$ for each $j = 1, \ldots, k$. For convenience, we write $B \in \mathcal{P}^{(k)}$ as an ordered list $(B_1, \ldots, B_k)$, where $B_i$ corresponds to the $i$th block of $B$ in order of appearance for $i \leq \#B$ and $B_i = \emptyset$ for $i = \#B + 1, \ldots, k$.

Consider the following Markov transition operation $B \mapsto B'$ on $\mathcal{P}^{(k)}$. Let $B = (B_1, \ldots, B_k) \in \mathcal{P}^{(k)}$ and, independently of $B$, generate $C_1, C_2, \ldots$, independent partitions of $\mathbb{N}$ distributed according to a paintbox based on $\nu$. For each $i$, we write $C_i := (C_{i1}, \ldots, C_{ik}) \in \mathcal{P}^{(k)}$. Independently of $B, C_1, C_2, \ldots$, generate $\sigma_1, \sigma_2, \ldots$, independent uniform random permutations of $[k]$. Given $\sigma := (\sigma_1, \ldots, \sigma_k)$, we arrange $B, C_1, \ldots, C_k$ in matrix form as follows:
$$\begin{array}{c|cccc}
 & C_{.1} & C_{.2} & \cdots & C_{.k} \\ \hline
B_1 & C_{1,\sigma_1(1)} \cap B_1 & C_{1,\sigma_1(2)} \cap B_1 & \cdots & C_{1,\sigma_1(k)} \cap B_1 \\
B_2 & C_{2,\sigma_2(1)} \cap B_2 & C_{2,\sigma_2(2)} \cap B_2 & \cdots & C_{2,\sigma_2(k)} \cap B_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
B_k & C_{k,\sigma_k(1)} \cap B_k & C_{k,\sigma_k(2)} \cap B_k & \cdots & C_{k,\sigma_k(k)} \cap B_k
\end{array} \;=:\; B \cap C^{\sigma}.$$
$B \cap C^{\sigma}$ is a matrix whose row totals, the union over columns of the entries within each row, correspond to the blocks of $B$, and whose column totals are $C_{.j} = \bigcup_{i=1}^k (C_{i,\sigma_i(j)} \cap B_i)$. Finally, $B'$ is obtained as the collection of non-empty blocks of $(C_{.1}, \ldots, C_{.k})$, written $CP(B, C, \sigma)$. The non-empty entries of $B \cap C^{\sigma}$ form a partition in $\mathcal{P}^{(k^2)}$ which corresponds to the greatest lower bound $B \wedge B'$ in the partition lattice.

Proposition 2.2.1. The above description gives rise to finite-dimensional transition probabilities on $\mathcal{P}^{(k)}_{[n]}$:
$$p_n^{\nu}(B, B') = \frac{k!}{(k - \#B')!} \prod_{b \in B} \frac{(k - \#B'_{|b})!}{k!}\, \varrho_{\nu}(B'_{|b}). \qquad (2.2)$$

Proof. Let $A \in \mathcal{P}^{(k)}$. Fix $n, k \in \mathbb{N}$ and put $B := A_{|[n]} \in \mathcal{P}^{(k)}_{[n]}$. Let $C_1, \ldots, C_k$ be i.i.d. $\varrho_{\nu}$-distributed partitions and $\sigma := (\sigma_1, \ldots, \sigma_k)$ i.i.d. uniform random permutations of $[k]$, as described above. Let $B'$ be the set partition obtained from the column totals of the matrix $B \cap C^{\sigma}$ in the above construction. From the matrix construction, it is clear that for each $i = 1, \ldots, k$, the restriction $B'_{|B_i}$ is equal to the set partition in $\mathcal{P}^{(k)}$ associated with $C_i[B_i] := (C_{i1} \cap B_i, \ldots, C_{ik} \cap B_i)$. Conversely, the transition $B \mapsto B'$ occurs only if the collection $(C_1, \ldots, C_k)$ is such that, for each $B_i \in B$, $C_i[B_i] = B'_{|B_i}$. By consistency of the paintbox process, for each $i = 1, \ldots, k$, $C_i[B_i]$ has probability
$$\varrho_{\nu}(C_i[B_i]) = \varrho_{\nu}(B'_{|B_i}) := \varrho_{\nu}(\{\pi \in \mathcal{P} : \pi_{|B_i} = B'_{|B_i}\}).$$
Independence of the $C_i$ implies that the probability of $B \wedge B'$ given $B$ is
$$\prod_{b \in B} \varrho_{\nu}(B'_{|b}).$$
Finally, each uniform permutation $\sigma_i$ has probability $1/k!$, and there are
$$\frac{k!}{(k - \#B')!} \prod_{b \in B} (k - \#B'_{|b})!$$
collections $\sigma_1, \ldots, \sigma_{\#B}$ such that the column totals of $B \cap C^{\sigma}$ correspond to the blocks of $B'$. This completes the proof.

Definition 2.2.2.
A $\mathcal{P}^{(k)}$-valued Markov process $\Pi$ with finite-dimensional transition probabilities of the form (2.2) is called an (exchangeable) cut-and-paste¹ process with parameter $\nu$, written $\Pi \sim CP(\nu)$.

¹The name "cut-and-paste process" was suggested by Marcin Hitczenko as a descriptive name for the transition procedure of the process.

For fixed $n$, (2.2) depends on $B$ and $B'$ only through $\varrho_{\nu}$ and the numbers of blocks of $B$ and $B'$, and is, therefore, finitely exchangeable. Consistency of (2.2) is automatic by the construction of the process on $\mathcal{P}^{(k)}$. However, we provide a longer proof below which appeals directly to (2.1) and gives some insight into the transitions of the process.

Proposition 2.2.3. For any measure $\nu$ on $\mathcal{R}^{\downarrow(k)}$, let $(p_n^{\nu}(\cdot,\cdot))_{n \geq 1}$ be the collection of transition probabilities on $\mathcal{P}^{(k)}_{[n]}$ defined in (2.2). Then $(p_n)$ is a consistent family of transition probabilities.

Proof. Fix $n, k \in \mathbb{N}$ and let $B, B' \in \mathcal{P}^{(k)}_{[n]}$. To establish consistency it is enough to verify condition (2.1) from theorem 1 of [36], i.e. for every $\nu$ and $B^* \in D_{n,n+1}^{-1}(B)$,
$$p_{n+1}^{\nu}(B^*, D_{n,n+1}^{-1}(B')) = p_n^{\nu}(B, B').$$
We assume without loss of generality that $B^* \in D_{n,n+1}^{-1}(B)$ is obtained from $B$ by the operation $n+1 \mapsto B_1 \in B$, and we write $B_1^* := B_1 \cup \{n+1\}$. Likewise, for $B'' \in D_{n,n+1}^{-1}(B')$ obtained by $n+1 \mapsto B'_i \in B' \cup \{\emptyset\}$, write $B'^*_i := B'_i \cup \{n+1\}$. So either $n+1 \in B'^*_i$ for some $i = 1, \ldots, \#B'$ or $n+1$ is inserted in $B'$ as a singleton. The change to $B \cap C^{\sigma}$ that results from inserting $n+1$ into $B_1 \in B$ and $B'_i \in B'$ is summarized by the following matrix; note that $B'_j = \emptyset$ for $j > \#B'$:
$$\begin{array}{c|ccccc}
 & B'_1 & B'_2 & \cdots & B'^*_i & \cdots \quad B'_k \\ \hline
B_1^* & B'_1 \cap B_1 & B'_2 \cap B_1 & \cdots & (B'_i \cap B_1) \cup \{n+1\} & \cdots \quad B'_k \cap B_1 \\
B_2 & B'_1 \cap B_2 & B'_2 \cap B_2 & \cdots & B'_i \cap B_2 & \cdots \quad B'_k \cap B_2 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
B_k & B'_1 \cap B_k & B'_2 \cap B_k & \cdots & B'_i \cap B_k & \cdots \quad B'_k \cap B_k
\end{array}$$
Here, the blocks of $B$ are listed in any order, with empty sets inserted as needed, and the blocks of $B'$ are listed in order of least elements, with $k - \#B'$ empty sets at the end.
Given $B'$, the set of compatible partitions $D_{n,n+1}^{-1}(B')$ consists of three types of partitions, depending on the subset $B_1 \subset [n]$ and the block of $B'$ into which $\{n+1\}$ is inserted. Let $B'' \in D_{n,n+1}^{-1}(B')$ be the partition of $[n+1]$ obtained by inserting $n+1$ in $B'$. Either

(i) $n+1$ is inserted into a block $B'_i$ such that $B'_i \cap B_1 \neq \emptyset$, so that $\#B''_{|B_1^*} = \#B'_{|B_1}$;

(ii) $n+1$ is inserted into a block $B'_i \neq \emptyset$ such that $B'_i \cap B_1 = \emptyset$, so that $\#B''_{|B_1^*} = \#B'_{|B_1} + 1$; or

(iii) $n+1$ is inserted into $B'$ as a singleton block, so that $\#B''_{|B_1^*} = \#B'_{|B_1} + 1$ and $\#B'' = \#B' + 1$; we denote this partition by $B'_{\emptyset}$.

There are $k - \#B'$ empty columns into which $\{n+1\}$ can be inserted as a singleton in $B'$, as in (iii). For $B''$ obtained by (ii), the restriction of $B''$ to $B_1^*$ coincides with the restriction of $B'_{\emptyset}$ to $B_1^*$, so each of these restrictions has the same probability under $\varrho_{\nu}$. For notational convenience in the following calculation, let $D_1$ be those elements of $D_{n,n+1}^{-1}(B')$ which satisfy condition (i) above and $D_2$ those which satisfy condition (ii). Then
$$\begin{aligned}
p_{n+1}^{\nu}(B^*, D_{n,n+1}^{-1}(B')) &= \sum_{B'' \in D_{n,n+1}^{-1}(B')} \frac{k!}{(k - \#B'')!} \prod_{b \in B^*} \frac{(k - \#B''_{|b})!}{k!}\, \varrho_{\nu}(B''_{|b}) \qquad (2.3) \\
&= \frac{k!}{(k - \#B')!} \prod_{b \in B} \frac{(k - \#B'_{|b})!}{k!} \left[ \sum_{B'' \in D_1} \prod_{b \in B^*} \varrho_{\nu}(B''_{|b}) + \sum_{B'' \in D_2} \frac{1}{k - \#B'_{|B_1}} \prod_{b \in B^*} \varrho_{\nu}(B''_{|b}) + \frac{k - \#B'}{k - \#B'_{|B_1}} \prod_{b \in B^*} \varrho_{\nu}(B'_{\emptyset|b}) \right] \qquad (2.4) \\
&= \frac{k!}{(k - \#B')!} \prod_{b \in B} \frac{(k - \#B'_{|b})!}{k!} \prod_{b \in B:\, b \neq B_1} \varrho_{\nu}(B'_{|b}) \left[ \sum_{B'' \in D_1} \varrho_{\nu}(B''_{|B_1^*}) + \sum_{B'' \in D_2} \frac{1}{k - \#B'_{|B_1}}\, \varrho_{\nu}(B''_{|B_1^*}) + \frac{k - \#B'}{k - \#B'_{|B_1}}\, \varrho_{\nu}(B'_{\emptyset|B_1^*}) \right] \\
&= \frac{k!}{(k - \#B')!} \prod_{b \in B} \frac{(k - \#B'_{|b})!}{k!} \prod_{b \in B:\, b \neq B_1} \varrho_{\nu}(B'_{|b}) \left[ \sum_{B'' \in D_1} \varrho_{\nu}(B''_{|B_1^*}) + \varrho_{\nu}(B'_{\emptyset|B_1^*}) \right] \qquad (2.5) \\
&= \frac{k!}{(k - \#B')!} \prod_{b \in B} \frac{(k - \#B'_{|b})!}{k!} \prod_{b \in B:\, b \neq B_1} \varrho_{\nu}(B'_{|b}) \sum_{B'' \in D_{\#B_1, \#B_1+1}^{-1}(B'_{|B_1})} \varrho_{\nu}(B'') \qquad (2.6) \\
&= \frac{k!}{(k - \#B')!} \prod_{b \in B} \frac{(k - \#B'_{|b})!}{k!} \prod_{b \in B:\, b \neq B_1} \varrho_{\nu}(B'_{|b}) \left[ \varrho_{\nu}(B'_{|B_1}) \right] \qquad (2.7) \\
&= \frac{k!}{(k - \#B')!} \prod_{b \in B} \frac{(k - \#B'_{|b})!}{k!}\, \varrho_{\nu}(B'_{|b}) \\
&= p_n^{\nu}(B, B').
\end{aligned}$$
Here, (2.4) is obtained from (2.3) by factoring $\frac{k!}{(k - \#B')!} \prod_{b \in B} \frac{(k - \#B'_{|b})!}{k!}$ out of the sum and using observations (i), (ii) and (iii). In (2.5), we use the fact that for any $B'' \in D_2$, $B''_{|B_1^*} = B'_{\emptyset|B_1^*}$, and that there are $\#B' - \#B'_{|B_1}$ elements in $D_2$ according to (ii). Line (2.6) follows by observing that each $B'' \in D_1$ corresponds to an element of $D_{\#B_1,\#B_1+1}^{-1}(B'_{|B_1})$, and $B'_{\emptyset|B_1^*}$ is the element of $D_{\#B_1,\#B_1+1}^{-1}(B'_{|B_1})$ obtained by inserting $\{n+1\}$ as a singleton in $B'_{|B_1}$. Finally, (2.7) follows from (2.6) by consistency of the paintbox process. This completes the proof.

The following result is immediate from finite exchangeability and consistency of (2.2) for every $n$, together with Kolmogorov's extension theorem (theorem 1.3.1).

Theorem 2.2.4. There exists a transition probability measure $p^{\nu}(\cdot,\cdot)$ on $(\mathcal{P}^{(k)}, \sigma\langle \bigcup_n \mathcal{P}^{(k)}_{[n]} \rangle)$ whose finite-dimensional restrictions are given by (2.2). In particular, the cut-and-paste process exists.

2.2.1 Equilibrium measure

From (2.2), it is clear that for each $n, k \in \mathbb{N}$ and $B, B' \in \mathcal{P}^{(k)}_{[n]}$, $p_n^{\nu}(B, B')$ is strictly positive provided $\nu$ is such that $\nu(s) > 0$ for some $s = (s_1, \ldots, s_k) \in \mathcal{R}^{\downarrow(k)}$ with $s_k > 0$. Under this condition, the finite-dimensional chains are aperiodic and irreducible on $\mathcal{P}^{(k)}_{[n]}$ and, therefore, have a unique stationary distribution. In fact, the finite-dimensional chains based on $\nu$ are aperiodic and irreducible provided $\nu$ is not degenerate at $(1, 0, \ldots, 0) \in \mathcal{R}^{\downarrow(k)}$. The existence of a unique stationary distribution for each $n$ implies that there is a unique stationary probability measure on $(\mathcal{P}^{(k)}, \sigma\langle \bigcup_n \mathcal{P}^{(k)}_{[n]} \rangle)$ for $p^{\nu}(\cdot,\cdot)$ from theorem 2.2.4.

Proposition 2.2.5. Let $\nu$ be a measure on $\mathcal{R}^{\downarrow(k)}$ such that $\nu$ is non-degenerate at $(1, 0, \ldots, 0) \in \mathcal{R}^{\downarrow(k)}$. Then there exists a unique stationary distribution $\theta_n^{\nu}(\cdot)$ for $p_n^{\nu}(\cdot,\cdot)$ for each $n \geq 1$.

Proof. Fix $n \in \mathbb{N}$ and let $\nu$ be any measure on $\mathcal{R}^{\downarrow(k)}$ other than that which puts unit mass at $(1, 0, \ldots, 0)$. For $B = (B_1, \ldots, B_m) \in \mathcal{P}^{(k)}_{[n]}$, (2.2) gives the transition probability
$$p_n^{\nu}(B, B) = \frac{k!}{(k-m)!} \prod_{i=1}^{m} \frac{1}{k}\, \varrho_{\nu}(B_i)$$
and $\varrho_{\nu}(B_i) = \varrho_{\nu}([\#B_i]) > 0$ for each $i = 1, \ldots, m$. Hence, $p_n^{\nu}(B, B) > 0$ for every $B \in \mathcal{P}^{(k)}_{[n]}$ and the chain is aperiodic.

To see that the chain is irreducible, let $B, B' \in \mathcal{P}^{(k)}_{[n]}$ and let $\mathbf{1}_n$ denote the one-block partition of $[n]$. Then
$$p_n^{\nu}(B, \mathbf{1}_n) = k \prod_{b \in B} \frac{1}{k}\, \varrho_{\nu}([\#b]) > 0$$
and, since $\nu$ is not degenerate at $(1, 0, \ldots, 0)$, there exists a path $\mathbf{1}_n \mapsto B'$ obtained by recursively partitioning $\mathbf{1}_n$ until it coincides with $B'$. For instance, let $B' := (B'_1, \ldots, B'_m) \in \mathcal{P}^{(k)}_{[n]}$. One such path from $\mathbf{1}_n$ to $B'$ is
$$\mathbf{1}_n \to \Big(B'_1, \bigcup_{i=2}^{m} B'_i\Big) \to \Big(B'_1, B'_2, \bigcup_{i=3}^{m} B'_i\Big) \to \cdots \to B',$$
which has positive probability for any non-degenerate $\nu$. Hence $p_n^{\nu}(\cdot,\cdot)$ is irreducible, which establishes the existence of a unique stationary distribution for each $n$.

Theorem 2.2.6. Let $\nu$ be a measure on $\mathcal{R}^{\downarrow(k)}$ such that $\nu((1, 0, \ldots, 0)) < 1$. Then there exists a unique stationary probability measure $\theta^{\nu}(\cdot)$ for the $CP(\nu)$ Markov chain on $(\mathcal{P}^{(k)}, \sigma\langle \bigcup_n \mathcal{P}^{(k)}_{[n]} \rangle)$.

Proof. For $\nu$ satisfying $\nu((1, 0, \ldots, 0)) < 1$, proposition 2.2.5 shows that a stationary distribution exists for each $n \geq 1$. Let $(\theta_n^{\nu}(\cdot), n \geq 1)$ be the collection of stationary distributions for the finite-dimensional transition probabilities $(p_n^{\nu}(\cdot,\cdot), n \geq 1)$. We now show that the $\theta_n$ are consistent and finitely exchangeable for each $n$.

Fix $n \in \mathbb{N}$ and let $B \in \mathcal{P}^{(k)}_{[n]}$. Stationarity of $\theta_n^{\nu}(\cdot)$ implies
$$\sum_{B' \in \mathcal{P}^{(k)}_{[n]}} \theta_n^{\nu}(B')\, p_n^{\nu}(B', B) = \theta_n^{\nu}(B).$$
Now write $\theta_n(\cdot) \equiv \theta_n^{\nu}(\cdot)$ and $p_n(\cdot,\cdot) \equiv p_n^{\nu}(\cdot,\cdot)$ for convenience and let $B' \in \mathcal{P}^{(k)}_{[n]}$. Then
$$\begin{aligned}
(\theta_{n+1} D_{n,n+1}^{-1})(B') := \sum_{B'' \in D_{n,n+1}^{-1}(B')} \theta_{n+1}(B'') &= \sum_{B'' \in D_{n,n+1}^{-1}(B')} \;\sum_{B^* \in \mathcal{P}^{(k)}_{[n+1]}} \theta_{n+1}(B^*)\, p_{n+1}(B^*, B'') \\
&= \sum_{B^* \in \mathcal{P}^{(k)}_{[n+1]}} \theta_{n+1}(B^*) \sum_{B'' \in D_{n,n+1}^{-1}(B')} p_{n+1}(B^*, B'') \\
&= \sum_{B \in \mathcal{P}^{(k)}_{[n]}} \;\sum_{B^* \in D_{n,n+1}^{-1}(B)} \theta_{n+1}(B^*)\, p_n(B, B') \\
&= \sum_{B \in \mathcal{P}^{(k)}_{[n]}} p_n(B, B') \sum_{B^* \in D_{n,n+1}^{-1}(B)} \theta_{n+1}(B^*) \\
&= \sum_{B \in \mathcal{P}^{(k)}_{[n]}} p_n(B, B')\, (\theta_{n+1} D_{n,n+1}^{-1})(B).
\end{aligned}$$
So $\theta_{n+1} D_{n,n+1}^{-1}$ is stationary for $p_n$, which implies that $\theta_n \equiv \theta_{n+1} D_{n,n+1}^{-1}$ by uniqueness, and the $\theta_n$ are consistent.

Let $\sigma$ be a permutation of $[n]$. Then for any $B, B' \in \mathcal{P}^{(k)}_{[n]}$, $p_n(\sigma(B), \sigma(B')) = p_n(B, B')$ by exchangeability of $p_n$. It follows that $\theta_n$ is finitely exchangeable for each $n$ since
$$\sum_{B \in \mathcal{P}^{(k)}_{[n]}} \theta_n(\sigma(B))\, p_n(\sigma(B), \sigma(B')) = \theta_n(\sigma(B'))$$
by stationarity, and $p_n(\sigma(B), \sigma(B')) = p_n(B, B')$ implies that
$$\sum_{B \in \mathcal{P}^{(k)}_{[n]}} \theta_n(\sigma(B))\, p_n(B, B') = \theta_n(\sigma(B')).$$
Hence $\theta_n \circ \sigma$ is stationary for $p_n$ and $\theta_n \equiv \theta_n \circ \sigma$ by uniqueness. Kolmogorov's extension theorem (theorem 1.3.1) implies that there exists a unique exchangeable stationary probability measure $\theta$ on $\mathcal{P}^{(k)}$ whose restriction to $[n]$ is $\theta_n$ for each $n \in \mathbb{N}$. This completes the proof.

2.3 Continuous-time version of the CP(ν)-process

Let $\lambda > 0$, let $\nu$ be a measure on $\mathcal{R}^{\downarrow(k)}$ and, for each $n \in \mathbb{N}$, define Markovian infinitesimal jump rates for a Markov process on $\mathcal{P}^{(k)}_{[n]}$ by
$$q_n^{\nu}(B, B') = \begin{cases} \lambda\, p_n^{\nu}(B, B'), & B' \neq B \\ 0, & \text{otherwise,} \end{cases} \qquad (2.8)$$
where $p_n^{\nu}$ is as in (2.2). The infinitesimal generator $Q_n^{\nu}$ of the process on $\mathcal{P}^{(k)}_{[n]}$ governed by $q_n$ has entries
$$Q_n^{\nu}(B, B') = \lambda \times \begin{cases} p_n^{\nu}(B, B'), & B' \neq B \\ p_n^{\nu}(B, B) - 1, & B' = B. \end{cases} \qquad (2.9)$$
Since the parameter $\lambda$ acts only to scale time, we can assume $\lambda = 1$ without loss of generality. We now construct a Markov process $B := (B(t), t \geq 0)$ in continuous time whose finite-dimensional transition rates are given by (2.8).

Definition 2.3.1. A process $B := (B(t), t \geq 0)$ on $\mathcal{P}^{(k)}$ is a (continuous-time) cut-and-paste process with parameter $\nu$ if, for each $n \in \mathbb{N}$, $B_{|[n]}$ is a Markov process on $\mathcal{P}^{(k)}_{[n]}$ with Q-matrix $Q_n^{\nu}$ as in (2.9).

A process on $\mathcal{P}^{(k)}$ whose finite-dimensional restrictions are governed by $Q_n^{\nu}$ can be constructed according to the matrix construction from section 2.2 by permitting only transitions $B \mapsto B'$ for $B' \neq B$, where $B, B' \in \mathcal{P}^{(k)}_{[n]}$, and adding a hold time which is exponentially distributed with mean $-1/Q_n^{\nu}(B, B)$.

Proposition 2.3.2. For a measure $\nu$ on $\mathcal{R}^{\downarrow(k)}$, let $(Q_n^{\nu})_{n \in \mathbb{N}}$ be the collection of Q-matrices in (2.9). For every $n \in \mathbb{N}$, the entries of $Q_n^{\nu}$ satisfy (2.1).

Proof. Fix $n \in \mathbb{N}$ and let $B, B' \in \mathcal{P}^{(k)}_{[n]}$ with $B \neq B'$. Then
$$Q_n^{\nu}(B, B') = \sum_{B'' \in D_{n,n+1}^{-1}(B')} Q_{n+1}^{\nu}(B^*, B'')$$
for all $B^* \in D_{n,n+1}^{-1}(B)$, by the consistency of $p_n$ from proposition 2.2.3. For $B' = B$ and $B^* \in D_{n,n+1}^{-1}(B)$, we have
$$\begin{aligned}
\sum_{B'' \in D_{n,n+1}^{-1}(B)} Q_{n+1}^{\nu}(B^*, B'') &= Q_{n+1}^{\nu}(B^*, B^*) + \sum_{B'' \in D_{n,n+1}^{-1}(B) \setminus \{B^*\}} Q_{n+1}^{\nu}(B^*, B'') \\
&= p_{n+1}^{\nu}(B^*, B^*) - 1 + \sum_{B'' \in D_{n,n+1}^{-1}(B) \setminus \{B^*\}} p_{n+1}^{\nu}(B^*, B'') \\
&= \sum_{B'' \in D_{n,n+1}^{-1}(B)} p_{n+1}^{\nu}(B^*, B'') - 1 \\
&= p_n^{\nu}(B, B) - 1 \\
&= Q_n^{\nu}(B, B).
\end{aligned}$$

Theorem 2.3.3. For each measure $\nu$ on $\mathcal{R}^{\downarrow(k)}$, there exists a Markov process $(B(t), t \geq 0)$ on $\mathcal{P}^{(k)}$ with the finite-dimensional transition rates given in (2.8).

Proof. Let $\nu$ be a measure on $\mathcal{R}^{\downarrow(k)}$ and let $(B_{|[n]}(t), t \geq 0)_{n \in \mathbb{N}}$ be the collection of restrictions of a $CP(\nu)$-process with consistent Q-matrices $(Q_n^{\nu})_{n \in \mathbb{N}}$ as in (2.9). For each $n$, $Q_n^{\nu}$ is finitely exchangeable and consistent with $Q_{n+1}^{\nu}$ by proposition 2.3.2, which is sufficient for $B_{|[n]}$ to be consistent with $B_{|[n+1]}$ for every $n$. Kolmogorov's extension theorem implies that there exist transition rates $Q^{\nu}$ on $\mathcal{P}^{(k)}$ such that for every $B, B' \in \mathcal{P}^{(k)}_{[n]}$,
$$Q^{\nu}(B^*, \{B'' \in \mathcal{P}^{(k)} : B''_{|[n]} = B'\}) = Q_n^{\nu}(B, B'),$$
for every $B^* \in \{B'' \in \mathcal{P}^{(k)} : B''_{|[n]} = B\}$. Finally, for every $B \in \mathcal{P}^{(k)}_{[n]}$, $Q_n^{\nu}(B, \mathcal{P}^{(k)}_{[n]} \setminus \{B\}) = 1 - p_n^{\nu}(B, B) < \infty$, so that the sample paths of $B_{|[n]}$ are càdlàg for every $n$, which implies that $B$ is càdlàg.

Corollary 2.3.4. For $\nu$ satisfying the condition of theorem 2.2.6, the continuous-time process $B := (B(t), t \geq 0)$ with finite-dimensional rates $q_n^{\nu}(\cdot,\cdot)$ in (2.8) has unique stationary distribution $\theta^{\nu}(\cdot)$ from theorem 2.2.6.

Proof. For each $n \in \mathbb{N}$, let $\theta_n^{\nu}(\cdot)$ be the unique finite-dimensional stationary distribution of $p_n^{\nu}(\cdot,\cdot)$ from (2.2).
It is easy to verify that for each $n \in \mathbb{N}$, $\Theta_n^{\nu} := (\theta_n^{\nu}(B), B \in \mathcal{P}^{(k)}_{[n]})$ satisfies
$$(\Theta_n^{\nu})^t Q_n^{\nu} = 0,$$
which establishes that $\Theta_n^{\nu}$ is stationary for $Q_n^{\nu}$ for every $n$. The rest follows by theorem 2.2.6.

2.3.1 Poissonian construction

From the matrix construction at the beginning of section 2.2, a consistent family of finite-dimensional Markov processes with transition rates as in (2.8) can be constructed from a Poisson point process on $\mathbb{R}^+ \times \prod_{i=1}^k \mathcal{P}^{(k)}$ as follows. Let $P = \{(t, C_1, \ldots, C_k)\} \subset \mathbb{R}^+ \times \prod_{i=1}^k \mathcal{P}^{(k)}$ be a Poisson point process with intensity measure $dt \otimes \varrho_{\nu}^{(k)}$ for some measure $\nu$ on $\mathcal{R}^{\downarrow(k)}$, where $\varrho_{\nu}^{(k)}$ is the product measure $\varrho_{\nu} \otimes \cdots \otimes \varrho_{\nu}$ on $\prod_{i=1}^k \mathcal{P}^{(k)}$. Construct an exchangeable process $B := (B(t), t \geq 0)$ on $\mathcal{P}^{(k)}$ by taking $\pi \in \mathcal{P}^{(k)}$ to be some exchangeable random partition and setting $B(0) = \pi$. For each $n \in \mathbb{N}$, put $B_{|[n]}(0) = \pi_{|[n]}$ and

• if $t$ is not an atom time for $P$, then $B_{|[n]}(t) = B_{|[n]}(t-)$;

• if $t$ is an atom time for $P$, so that $(t, C_1, \ldots, C_k) \in P$, then, independently of $(B(s), s < t)$ and $(t, C_1, \ldots, C_k)$, generate $\sigma_1, \ldots, \sigma_k$ i.i.d. uniform random permutations of $[k]$ and construct $B'$ from the set partition induced by the column totals $(C_{.1}, \ldots, C_{.k})$ of
$$\begin{array}{c|cccc}
 & C_{.1} & C_{.2} & \cdots & C_{.k} \\ \hline
B_1 & C_{1,\sigma_1(1)} \cap B_1 & C_{1,\sigma_1(2)} \cap B_1 & \cdots & C_{1,\sigma_1(k)} \cap B_1 \\
B_2 & C_{2,\sigma_2(1)} \cap B_2 & C_{2,\sigma_2(2)} \cap B_2 & \cdots & C_{2,\sigma_2(k)} \cap B_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
B_k & C_{k,\sigma_k(1)} \cap B_k & C_{k,\sigma_k(2)} \cap B_k & \cdots & C_{k,\sigma_k(k)} \cap B_k
\end{array} \;=:\; B \cap C^{\sigma},$$
where $(B_1, \ldots, B_k)$ are the blocks of $B = B_{|[n]}(t-)$ listed in order of their least element, with $k - \#B$ empty sets at the end of the list;
  – if $B' \neq B$, then $B_{|[n]}(t) = B'$;
  – if $B' = B$, then $B_{|[n]}(t) = B_{|[n]}(t-)$.

Proposition 2.3.5. The above process $B$ is a Markov process on $\mathcal{P}^{(k)}$ with transition matrix $Q^{\nu}$ defined by theorem 2.3.3.

Proof.
This is clear from the infinite exchangeability of both the paintbox process (theorem 1.8.1) and the $Q_n^{\nu}$-matrices for every $n$ (proposition 2.3.2), and the fact that, by this construction, for any $n$ such that $B_{|[n]}(t) = \pi$ we have $B_{|[n]|[m]}(t) = D_{m,n}(\pi)$ for all $m < n$ and $B_{|[p]}(t) \in D_{n,p}^{-1}(\pi)$ for all $p > n$.

Let $P_t$ be the semigroup of a $CP(\nu)$-process $B(\cdot)$; i.e. for any continuous $\varphi : \mathcal{P}^{(k)} \to \mathbb{R}$,
$$P_t \varphi(\pi) := E_{\pi} \varphi(B(t)),$$
the expectation of $\varphi(B(t))$ given $B(0) = \pi$.

Corollary 2.3.6. A $CP(\nu)$-process has the Feller property, i.e.

• for every continuous function $\varphi : \mathcal{P}^{(k)} \to \mathbb{R}$ and $\pi \in \mathcal{P}^{(k)}$ one has $\lim_{t \downarrow 0} P_t \varphi(\pi) = \varphi(\pi)$, and

• for all $t > 0$, $\pi \mapsto P_t \varphi(\pi)$ is continuous.

Proof. The proof follows the same program as the proof of corollary 6 in [19]. Let $C_f := \{f : \mathcal{P}^{(k)} \to \mathbb{R} : \exists\, n \in \mathbb{N} \text{ such that } \pi_{|[n]} = \pi'_{|[n]} \Rightarrow f(\pi) = f(\pi')\}$ be a set of functions which is dense in the space of continuous functions from $\mathcal{P}^{(k)}$ to $\mathbb{R}$. It is clear that for $g \in C_f$, $\lim_{t \downarrow 0} P_t g(\pi) = g(\pi)$, since the first jump time of $B(\cdot)$ is an exponential variable with finite mean. The first point follows for all continuous functions $\mathcal{P}^{(k)} \to \mathbb{R}$ by denseness of $C_f$.

For the second point, let $\pi, \pi' \in \mathcal{P}^{(k)}$ be such that $d(\pi, \pi') < 1/n$ and use the same Poisson point process $P$ to construct two $CP(\nu)$-processes, $B(\cdot)$ and $B'(\cdot)$, with starting points $\pi$ and $\pi'$ respectively. By the construction, $B_{|[n]} = B'_{|[n]}$ and $d(B(t), B'(t)) < 1/n$ for all $t \geq 0$. It follows that for any continuous $g$, $\pi \mapsto P_t g(\pi)$ is continuous.

This allows us to characterize the $CP(\nu)$-process in terms of its infinitesimal generator. Let $B := (B(t), t \geq 0)$ be the $CP(\nu)$-process on $\mathcal{P}^{(k)}$ with transition rates as in (2.8). The infinitesimal generator $A$ of $B$ is given by
$$A(f)(\pi) = \int_{\mathcal{P}^{(k)}} (f(\pi') - f(\pi))\, Q^{\nu}(\pi, d\pi'),$$
for every $f \in C_f$.

2.4 Asymptotic frequencies

Adopting the notation of section 1.8.1, let $\Lambda(B) = (\|B_1\|, \|B_2\|, \ldots)^{\downarrow}$ be the decreasing arrangement of asymptotic frequencies of a partition $B = (B_1, B_2, \ldots)$
$\in \mathcal{P}$ which possesses asymptotic frequencies, some of which could be 0. The process described in section 2.2 assigns positive probability only to transitions involving two partitions with at most $k$ blocks. From the Poissonian construction of the transition rates in section 2.3.1, it is evident that the states of $B = (B(t), t \geq 0)$ will have at most $k$ blocks almost surely. Moreover, the description of the transition rates in terms of the paintbox process allows us to describe the associated measure-valued process of $B := (B(t), t \geq 0)$, characterized by $\lambda$ and $\nu$.

2.4.1 Poissonian construction

Consider the following Poissonian construction of a measure-valued process $X := (X(t), t \geq 0)$ on $\mathcal{R}^{\downarrow(k)}$. For any $k \in \mathbb{N}$ and $\nu$ as above, let $P' = \{(t, P'_1, \ldots, P'_k)\} \subset \mathbb{R}^+ \times \prod_{i=1}^k \mathcal{R}^{\downarrow(k)}$ be a Poisson point process with intensity measure $dt \otimes \nu^{(k)}$, where $\nu^{(k)}$ is the $k$-fold product measure $\nu \otimes \cdots \otimes \nu$ on $\prod_{i=1}^k \mathcal{R}^{\downarrow(k)}$.

Construct a process $X := (X(t), t \geq 0)$ on $\mathcal{R}^{\downarrow(k)}$ by generating $p_0$ from some probability measure on $\mathcal{R}^{\downarrow(k)}$. Put $X(0) = p_0$ and

• if $t$ is not an atom time for $P'$, then $X(t) = X(t-)$;

• if $t$ is an atom time for $P'$, so that $(t, P'_1, \ldots, P'_k) \in P'$ with $P'_j = (P_1^j, \ldots, P_k^j)$ for each $j = 1, \ldots, k$, and $X(t-) = (x_1, \ldots, x_k) \in \mathcal{R}^{\downarrow(k)}$, then, independently of $(X(s), s < t)$ and $(t, P'_1, \ldots, P'_k)$, generate $\sigma_1, \ldots, \sigma_k$ i.i.d. uniform random permutations of $[k]$ and construct $X(t)$ from the marginal column totals of
$$\begin{array}{c|cccc}
 & P_{.1} & P_{.2} & \cdots & P_{.k} \\ \hline
x_1 & x_1 P^1_{\sigma_1(1)} & x_1 P^1_{\sigma_1(2)} & \cdots & x_1 P^1_{\sigma_1(k)} \\
x_2 & x_2 P^2_{\sigma_2(1)} & x_2 P^2_{\sigma_2(2)} & \cdots & x_2 P^2_{\sigma_2(k)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_k & x_k P^k_{\sigma_k(1)} & x_k P^k_{\sigma_k(2)} & \cdots & x_k P^k_{\sigma_k(k)}
\end{array};$$
i.e. put $X(t) = (P_{.1}, P_{.2}, \ldots, P_{.k})^{\downarrow} := \big(\sum_{i=1}^k x_i P^i_{\sigma_i(j)},\; 1 \leq j \leq k\big)^{\downarrow}$.

Theorem 2.4.1. Let $X := (X(t), t \geq 0)$ be the process constructed above. Then $X =_L \Lambda(B)$, where $B := (B(t), t \geq 0)$ is the $CP(\nu)$-process from theorem 2.3.3 and $=_L$ denotes "equal in law."

Proof. Fix $k \in \mathbb{N}$ and let $\nu(\cdot)$ be a measure on $\mathcal{R}^{\downarrow(k)}$.
In the description of the sample paths of $B$ in section 2.3, note that generating $(C_1, \ldots, C_k) \sim \varrho_{\nu}^{(k)}$ is equivalent to first generating $s_i \sim \nu$ independently for each $i = 1, \ldots, k$, and then generating random partitions $C_i$ by sampling from $s_i$ for each $i = 1, \ldots, k$. Finally, $B'_i$ is set equal to the marginal total of column $i$ of the matrix $B \cap C^{\sigma}$, where $\sigma := (\sigma_1, \ldots, \sigma_k)$ is an i.i.d. collection of uniform random permutations of $[k]$. Hence, we can couple the two processes $X$ and $B$ together using the Poisson point process $P'$ described above.

Let $X$ evolve according to the Poisson point process $P'$ on $\mathbb{R}^+ \times \prod_{i=1}^k \mathcal{R}^{\downarrow(k)}$ as described above. Let $B$ evolve with the modification that if $t$ is an atom time of $P'$, then we obtain partitions $(C_1, \ldots, C_k)$ by sampling $X^i := (X_1^i, X_2^i, \ldots)$ i.i.d. from $P'_i$ for each $i = 1, \ldots, k$, i.e.
$$P(X_1^i = j \mid P'_i) = P_j^i,$$
and defining the blocks of $C_i$ as the equivalence classes of $X^i$. Constructed in this way, $\|C_{ij}\| = P_j^i$ almost surely for each $i, j = 1, \ldots, k$ and $(C_1, \ldots, C_k) \sim \varrho_{\nu}^{(k)}$. After obtaining the $C_i$, generate, independently of $B, C_1, \ldots, C_k, P'$, i.i.d. uniform permutations $\sigma_1, \ldots, \sigma_k$ of $[k]$ and proceed as in the construction of section 2.3.1, where $B, C_1, \ldots, C_k$ are arranged in the matrix $B \cap C^{\sigma}$ and the blocks of $B'$ are obtained as the marginal column totals of $B \cap C^{\sigma}$. The $(i,j)$th entry of $B \cap C^{\sigma}$ is $C_{i,\sigma_i(j)} \cap B_i$, for which we have
$$\|C_{i,\sigma_i(j)} \cap B_i\| = \|C_{i,\sigma_i(j)}\|\, \|B_i\| = x_i P^i_{\sigma_i(j)} \quad \text{a.s.}$$
By this construction, $B(t)$ evolves according to a Poisson point process with the same law as that described in section 2.3.1, and $B(t)$ possesses ranked asymptotic frequencies which correspond to $X(t)$ almost surely for all $t \geq 0$.

Corollary 2.4.2. $X(t) := (\Lambda(B(t)), t \geq 0)$ exists almost surely.

2.4.2 Equilibrium measure

Just as the process $(B(t), t \geq 0)$ on $\mathcal{P}^{(k)}$ converges to a stationary distribution, so does its associated measure-valued process $(X(t), t \geq 0)$ from section 2.4.1.

Theorem 2.4.3.
The associated measure-valued process $X$ of a $CP(\nu)$-process with unique stationary measure $\theta^{\nu}$ has equilibrium measure $|\theta^{\nu}|^{\downarrow}$, the distribution of the ranked frequencies of a $\theta^{\nu}(\cdot)$-partition.

Proof. Proposition 1.4 in [27] states that if a sequence of exchangeable random partitions converges in law on $\mathcal{P}$ to $\pi_{\infty}$, then its sequence of ranked asymptotic frequencies converges in law to $|\pi_{\infty}|^{\downarrow}$. Hence, from corollary 2.3.4, $X$ has equilibrium distribution $|\theta^{\nu}|^{\downarrow}$.

2.5 A two-parameter subfamily

For $k \in \mathbb{N}$ and $\alpha > 0$, the Pitman–Yor$(-\alpha, k\alpha)$ process (1.10) has finite-dimensional distributions
$$\rho_n(B; \alpha, k) = \frac{k!}{(k - \#B)!} \frac{\prod_{b \in B} \alpha^{\uparrow \#b}}{(k\alpha)^{\uparrow n}}, \qquad (2.10)$$
which is supported on $\mathcal{P}^{(k)}_{[n]}$, where $\alpha^{\uparrow n} := \alpha(\alpha+1)\cdots(\alpha+n-1)$. For notational convenience, introduce the $\alpha$-permanent [78] of an $n \times n$ matrix $K$,
$$\operatorname{per}_{\alpha} K = \sum_{\sigma \in S_n} \alpha^{\#\sigma} \prod_{i=1}^{n} K_{i,\sigma(i)}, \qquad (2.11)$$
where $\#\sigma$ is the number of cycles of the permutation $\sigma$. Note that when $B \in \mathcal{P}_{[n]}$ is regarded as a matrix,
$$\operatorname{per}_{\alpha} B = \prod_{b \in B} \operatorname{per}_{\alpha} B_{|b} = \prod_{b \in B} \alpha^{\uparrow \#b}, \qquad (2.12)$$
which allows us to write (2.10) as
$$\rho_n(B; \alpha, k) = \frac{k!}{(k - \#B)!} \frac{\operatorname{per}_{\alpha} B}{(k\alpha)^{\uparrow n}}. \qquad (2.13)$$
We now consider a specific subfamily of reversible $CP(\nu)$-processes whose transition probabilities can be written down explicitly. For $k \in \mathbb{N}$ and $\alpha > 0$, let $\nu$ be the $PD(-\alpha/k, \alpha)$ distribution on $\mathcal{R}^{\downarrow(k)}$ and define transition probabilities according to the matrix construction based on $\nu$ as in section 2.2. We call this process the $CP(\alpha, k)$-process.

Proposition 2.5.1. The $CP(\alpha, k)$-process has finite-dimensional transition probabilities
$$p_n(B, B'; \alpha, k) = \frac{k!}{(k - \#B')!} \prod_{b \in B} \frac{\prod_{b' \in B'} (\alpha/k)^{\uparrow \#(b \cap b')}}{\alpha^{\uparrow \#b}} \qquad (2.14)$$
$$= \frac{k!}{(k - \#B')!} \frac{\operatorname{per}_{\alpha/k}(B \wedge B')}{\operatorname{per}_{\alpha} B}. \qquad (2.15)$$
Proof. Theorem 3.2 and definition 3.3 of [93] show that the distribution of $B \sim \varrho_{\nu}$ with $\nu = PD(-\alpha/k, \alpha)$ is
$$\rho_n(B; \alpha/k, k) = \frac{k!}{(k - \#B)!} \frac{\operatorname{per}_{\alpha/k} B}{\alpha^{\uparrow n}}.$$
Combining this with (2.2) yields (2.14); (2.15) follows from (2.12).

Proposition 2.5.2.
For each $(\alpha, k) \in \mathbb{R}^+ \times \mathbb{N}$ and $n \in \mathbb{N}$, $p_n(\cdot,\cdot; \alpha, k)$ defined in proposition 2.5.1 is reversible with respect to (2.10) with parameter $(-\alpha, k\alpha)$.

Proof. Let $\rho_n(\cdot; \alpha, k)$ be the distribution with parameter $(-\alpha, k\alpha)$ defined in (2.10), and let $p_n(\cdot,\cdot; \alpha, k)$ be as defined in (2.14). For any $B, B' \in \mathcal{P}^{(k)}_{[n]}$, it is immediate that
$$\rho_n(B; \alpha, k)\, p_n(B, B'; \alpha, k) = \rho_n(B'; \alpha, k)\, p_n(B', B; \alpha, k), \qquad (2.16)$$
which establishes reversibility.

Bertoin [26] discusses some reversible EFC processes which have the $PD(\alpha, \theta)$ distribution as their equilibrium measure, for $0 < \alpha < 1$ and $\theta > -\alpha$. Here we have shown reversibility with respect to $PD(\alpha, \theta)$ for $\alpha < 0$ and $\theta = -m\alpha$ with $m \in \mathbb{N}$. The construction of the continuous-time process is a special case of the procedure in section 2.3. The measure-valued process $(X(t), t \geq 0)$ based on the $(\alpha, k)$-Markov process has unique stationary measure $PD(-\alpha, k\alpha)$, the distribution of the ranked frequencies of a partition with finite-dimensional distributions as in (2.10) with parameter $(-\alpha, k\alpha)$.

The parameter $\alpha$ governs how local the jumps of the $CP(\alpha, k)$-process are likely to be. Larger values of $\alpha$ assign higher probability to large jumps, while small values of $\alpha$ favor more local jumps. The asymptotic behavior of the $CP(\alpha, k)$-process as $\alpha$ and $k$ vary is summarized in the following proposition.

Proposition 2.5.3. For $\alpha > 0$ and $k \geq 1$, let $p_n(\cdot,\cdot; \alpha, k)$ be the finite-dimensional transition probabilities (2.14) of the $CP(\alpha, k)$-process. The following asymptotic transition laws hold.

• For fixed $\alpha > 0$ and $k \to \infty$, (2.14) converges to
$$p_n(B, B'; \alpha) = \begin{cases} \alpha^{\#(B \wedge B')} \displaystyle\prod_{b \in B} \frac{\prod_{b' \in B':\, b \cap b' \neq \emptyset} \Gamma(\#(b \cap b'))}{\alpha^{\uparrow \#b}}, & B' \leq B \\ 0, & \text{otherwise,} \end{cases}$$
the law of a Ewens fragmentation chain.

• For fixed $k \geq 1$ and $\alpha \to \infty$, (2.14) converges to
$$p_n(B, B'; k) = P_n(B'; k) = k^{\downarrow \#B'}/k^n. \qquad (2.17)$$

• For fixed $k \geq 1$ and $\alpha \to 0$, (2.14) converges to
$$p_n(B, B'; k) = \begin{cases} \dfrac{k^{\downarrow \#B'}}{k^{\#B}}, & B \leq B' \\ 0, & \text{otherwise,} \end{cases}$$
a discrete-time coalescent chain.
• For $\alpha/k \to 0$ and $k \to \infty$ such that $\alpha \to \theta > 0$, (2.14) converges to
$$p_n(B, B'; \theta) = \begin{cases} \theta^{\#(B \wedge B')} \displaystyle\prod_{b \in B} \frac{\prod_{b' \in B':\, b \cap b' \neq \emptyset} \Gamma(\#(b \cap b'))}{\theta^{\uparrow \#b}}, & B' \leq B \\ 0, & \text{otherwise,} \end{cases}$$
a discrete-time fragmentation chain in which each block fragments independently according to a Ewens distribution with parameter $\theta$.

• For $\alpha \to 0$ and $k \to \infty$, (2.14) converges to
$$p_n(B, B') = I_{\{B = B'\}},$$
the unit mass at $B$.

Remark 2.5.4. Note that the asymptotic transition law for $\alpha \to \infty$ shown in (2.17) does not depend on $B$. The weak limit of a sequence of partitions under $\alpha \to \infty$ is that of an independent and identically distributed sequence of partitions governed by the coupon-collector distribution on partitions. For each $i = 1, 2, \ldots$, $B_i$ is chosen independently of $B_{i-1}$, and each element chooses its block membership uniformly among the $k$ blocks of the partition, independently of how the other elements are configured. In other words, all structure of the model is lost and all elements act independently at all time points.

Remark 2.5.5. The results in proposition 2.5.3 elucidate the role of the parameter $\alpha$ in the $CP(\alpha, k)$-process. In particular, as $\alpha$ grows large, the process becomes more erratic (asymptotically i.i.d.), while small values of $\alpha$ lead to more controlled behavior, which tends toward more local one-step transitions (pure coalescence).

2.6 A three-parameter extension

For fixed $k \geq 1$, the $CP(\alpha, k)$-process is a two-parameter family. We now introduce a three-parameter extension with an extra parameter $\Sigma$. The parameter $\Sigma$ is a symmetric square matrix with non-negative entries, called the similarity matrix. In particular, the entry $\Sigma_{ij}$ is a measure of the similarity between elements $i$ and $j$, and a higher value of $\Sigma_{ij}$ leads to a higher probability that elements $i$ and $j$ appear in the same block over the course of the partition sequence.

2.6.1 Similarity and dissimilarity matrices

An $n \times n$ symmetric matrix $\Sigma$ is a similarity matrix if

(1) $\Sigma_{ij} \geq 0$ for all $i, j$, and

(2) $\Sigma_{ij} \leq \Sigma_{ii} \wedge \Sigma_{jj}$ for all $i, j$.
Condition (2) permits the interpretation of $\Sigma$ as a matrix of pairwise similarities: it requires that any element $i$ be at least as similar to itself as it is to any other element of the set. In many instances it may be natural for $\Sigma$ to be constant along the diagonal, signifying that all elements are equally similar to themselves, but we do not explicitly require this.

Any $T := \{(A_i, \lambda_i),\, i \geq 1\} \in \bar{\mathcal{T}}_n$, the space of $[n]$-labeled trees with edge lengths, can be written as a matrix by putting $T = \sum_{i=1}^{k} \lambda_i\, e_i \otimes e_i$, where $e_i := (e_{i1}, \ldots, e_{in})$ is the vector with $e_{ij} = 1$ if $j \in A_i$ and $e_{ij} = 0$ otherwise, and $e_i \otimes e_i$ is the outer product of $e_i$ with itself. Any rooted tree $T$ satisfies the three-point inequality
$$T_{ij} \geq T_{ik} \wedge T_{jk} \qquad (2.18)$$
for all $i, j, k$. The entry $T_{ij}$ can be interpreted as the distance from the root of $T$ to the first point at which $i$ and $j$ appear on different branches of $T$. Any rooted $[n]$-tree is a similarity matrix.

Similarity matrices enter our discussion in section 2.6.2, where we extend the CP$(\alpha,k)$ process to include a parameter $\Sigma$, a similarity matrix on the index set. When used to model certain phenomena in genetics, it may be natural for $\Sigma$ to represent a phylogenetic tree or, more generally, a hierarchical clustering of the elements of the sample.

In some settings, it may be more convenient or more natural to represent relationships through a matrix of pairwise distances. An $n \times n$ symmetric matrix $\Delta$ is a dissimilarity, or distance, matrix if

(1) $\Delta_{ii} = 0$ for all $i$,

(2) $\Delta_{ij} \geq 0$ for all $i, j$, and

(3) $\sqrt{\Delta_{ij}} \leq \sqrt{\Delta_{ik}} + \sqrt{\Delta_{jk}}$ for all $i, j, k$.

To any similarity matrix $S$, there corresponds a dissimilarity matrix $D$ with entries
$$D_{ij} = S_{ii} + S_{jj} - 2S_{ij}. \qquad (2.19)$$

A particular class of dissimilarity matrices of interest is the space $\mathcal{UT}_n$ of unrooted $[n]$-trees. An element $\Delta \in \mathcal{UT}_n$ is a symmetric $n \times n$ matrix which satisfies (1), (2) and (3) above along with Buneman's four-point condition
$$\Delta_{ij} + \Delta_{kl} \leq \max\{\Delta_{ik} + \Delta_{jl},\ \Delta_{il} + \Delta_{jk}\}. \qquad (2.20)$$
For any $S \in \mathcal{T}_n$, the dissimilarity matrix $D$ obtained from $S$ by (2.19) is an unrooted $[n]$-tree.

2.6.2 The extended model

Recall the $\alpha$-permanent of a matrix $X$, $\mathrm{per}_\alpha X$, from (2.11). The $\alpha$-permanent also has the following convolution property, shown by McCullagh and Møller [78]. Let $\alpha_1, \ldots, \alpha_k \geq 0$ and $\alpha_\cdot := \sum_{i=1}^k \alpha_i$; then
$$\mathrm{per}_{\alpha_\cdot} X = \sum_{(w_1,\ldots,w_k)} \mathrm{per}_{\alpha_1}(X[w_1]) \cdots \mathrm{per}_{\alpha_k}(X[w_k]), \qquad (2.21)$$
where the sum is over ordered collections $(w_1, \ldots, w_k)$ of disjoint subsets of $[n]$ such that $\bigcup_{i=1}^k w_i = [n]$, and $X[w] := (X_{ij} : i \in w,\ j \in w)$ is the sub-matrix of $X$ with rows and columns indexed by $w \subseteq [n]$.

For our purposes, (2.21) can be used by putting $\alpha_1 = \cdots = \alpha_k = \alpha > 0$, so that $\alpha_\cdot = k\alpha$, and noting that for any $B \in \mathcal{P}_{[n]}$
$$\mathrm{per}_\alpha(X \cdot B) = \prod_{b \in B} \mathrm{per}_\alpha X[b].$$
In this case, (2.21) becomes
$$\mathrm{per}_{k\alpha} X = \sum_{B \in \mathcal{P}^{(k)}_{[n]}} k^{\downarrow \# B}\, \mathrm{per}_\alpha(X \cdot B). \qquad (2.22)$$

Based on (2.22), any matrix $\Sigma$ for which the entries $\Sigma_{1\sigma(1)}, \ldots, \Sigma_{n\sigma(n)}$ are all strictly positive for some $\sigma \in \mathcal{S}_n$ defines a probability distribution on $\mathcal{P}^{(k)}_{[n]}$ by
$$\mu_n(B; \alpha, k, \Sigma) = k^{\downarrow \# B}\, \mathrm{per}_\alpha(\Sigma \cdot B)/\mathrm{per}_{k\alpha} \Sigma, \qquad (2.23)$$
which can be seen as a generalization of (2.13). Moreover, the expression in (2.15) can be extended to include the matrix parameter $\Sigma$:
$$p_n(B, B'; \alpha, k, \Sigma) = k^{\downarrow \# B'}\, \mathrm{per}_{\alpha/k}(\Sigma \cdot B \cdot B')/\mathrm{per}_\alpha(\Sigma \cdot B), \qquad (2.24)$$
which is reversible with respect to (2.23).

To guarantee that there is some pair $B, B' \in \mathcal{P}^{(k)}_{[n]}$ for which the transition probability (2.24) is positive, $\Sigma$ need only satisfy positivity for a suitably chosen set of $n$ of its entries. Thus, general choices of $\Sigma$ can lead to restrictions to certain subspaces of $\mathcal{P}^{(k)}_{[n]}$.

Definition 2.6.1. For $n \geq 1$, $k \geq 1$, $\alpha > 0$ and $\Sigma$ an $n \times n$ matrix with non-negative entries which are strictly positive on the diagonal, a Markov chain governed by the transition probabilities in (2.24) is a discrete-time cut-and-paste process on $\mathcal{P}^{(k)}_{[n]}$ with concentration parameter $\alpha$ and similarity matrix $\Sigma$, written CP$(\alpha, k; \Sigma)$.

Remark 2.6.2.
If $\Sigma \equiv 1$ is the matrix of all ones, (2.24) coincides with (2.15) and we have the standard CP$(\alpha,k)$ process.

Note that the similarity of elements $i$ and $j$, given by $\Sigma_{ij}$, plays a role in the probability of the transition $B \mapsto B'$ only if $B_{ij} = 1$. In this case, a larger value of $\Sigma_{ij}$ will tend to favor transitions to states $B'$ for which $B'_{ij} = 1$. If $B_{ij} = 0$, then $i$ and $j$ are in distinct blocks of $B$ and, according to the transition procedure for the cut-and-paste process, choose their blocks in $B'$ independently of $\Sigma$.

2.7 Properties of the CP(α, k; Σ) process

Though $\Sigma$ is naturally disposed toward an interpretation as a similarity matrix, the expression (2.24) permits any matrix with strictly positive diagonal entries. Let $M_n^+$ denote the space of non-negative $n \times n$ matrices which are strictly positive along the diagonal. The unrestricted parameter space of the CP$(\alpha,k;\Sigma)$ process is $(\alpha, k, \Sigma) \in \mathbb{R}_+ \times \mathbb{N} \times M_n^+$. However, for $r_1, \ldots, r_n > 0$ and $c_1, \ldots, c_n > 0$, let $R = \mathrm{diag}(r_1, \ldots, r_n)$ and $C = \mathrm{diag}(c_1, \ldots, c_n)$ be diagonal matrices with entries $r_i$ and $c_i$, $i = 1, \ldots, n$, respectively. For any $\Sigma \in M_n^+$, the matrix $R\Sigma C$ is the image of $\Sigma$ under multiplication of the $i$th row of $\Sigma$ by $r_i$ and the $j$th column of $\Sigma$ by $c_j$. The $\alpha$-permanent of $R\Sigma C$ satisfies
$$\mathrm{per}_\alpha(R\Sigma C) = \mathrm{per}_\alpha \Sigma \times \prod_{i=1}^n r_i c_i,$$
and the Hadamard product identity $(R\Sigma C) \cdot X = R(\Sigma \cdot X)C$ implies that
$$\frac{\mathrm{per}_{\alpha/k}((R\Sigma C) \cdot B \cdot B')}{\mathrm{per}_\alpha((R\Sigma C) \cdot B)} = \frac{\mathrm{per}_{\alpha/k}(R(\Sigma \cdot B \cdot B')C)}{\mathrm{per}_\alpha(R(\Sigma \cdot B)C)} = \frac{\mathrm{per}_{\alpha/k}(\Sigma \cdot B \cdot B')}{\mathrm{per}_\alpha(\Sigma \cdot B)}.$$
Therefore, for fixed $n \geq 1$, it is only necessary to consider matrices $\Sigma$ in the space $\mathcal{DS}_n$ of doubly stochastic $n \times n$ matrices, i.e. matrices with non-negative entries for which all row and column totals are one, which are positive along the diagonal.

For any $\Sigma \in M_n^+$, the entry $\Sigma_{ij}$ represents a measure of similarity between elements $i$ and $j$ and, in general, it is not possible to permute the labels of the index set arbitrarily without performing a corresponding permutation of the rows and columns of $\Sigma$.
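The row-and-column scaling identity above is easy to check numerically. The sketch below assumes the standard definition $\mathrm{per}_\alpha X = \sum_{\sigma \in \mathcal{S}_n} \alpha^{\#\mathrm{cyc}(\sigma)} \prod_i X_{i\sigma(i)}$ recalled in (2.11); the function names are ours, and the $O(n!)$ enumeration is intended only for small $n$.

```python
import itertools
import math
import random

def per_alpha(X, alpha):
    """alpha-permanent: sum over permutations sigma of
    alpha^(#cycles of sigma) * prod_i X[i][sigma(i)]."""
    n = len(X)
    total = 0.0
    for sigma in itertools.permutations(range(n)):
        seen, cycles = [False] * n, 0
        for i in range(n):          # count the cycles of sigma
            if not seen[i]:
                cycles += 1
                j = i
                while not seen[j]:
                    seen[j] = True
                    j = sigma[j]
        w = alpha ** cycles
        for i in range(n):
            w *= X[i][sigma[i]]
        total += w
    return total

# numerical check of per_alpha(R Sigma C) = per_alpha(Sigma) * prod_i r_i c_i
random.seed(1)
n, alpha = 4, 1.7
Sigma = [[random.random() for _ in range(n)] for _ in range(n)]
r = [random.uniform(0.5, 2.0) for _ in range(n)]
c = [random.uniform(0.5, 2.0) for _ in range(n)]
RSC = [[r[i] * Sigma[i][j] * c[j] for j in range(n)] for i in range(n)]
lhs = per_alpha(RSC, alpha)
rhs = per_alpha(Sigma, alpha) * math.prod(ri * ci for ri, ci in zip(r, c))
assert abs(lhs - rhs) < 1e-9 * abs(rhs)
```

A quick sanity check on the definition: for the all-ones $n \times n$ matrix, $\mathrm{per}_\alpha$ reduces to the rising factorial $\alpha^{\uparrow n}$.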
In this sense, a CP$(\alpha,k;\Sigma)$ process is not exchangeable for a fixed value of $\Sigma$ and, in the absence of a natural projective system of arrays with rows and columns indexed by $\mathbb{N}$, it is not possible to establish any self-consistency property for the finite-dimensional distributions. We show in corollary 2.7.3 that, for fixed values of $\Sigma$, the CP$(\alpha,k;\Sigma)$ process on $\mathcal{P}^{(k)}_{[n]}$ is exchangeable under certain strict conditions on $\Sigma$. However, the following equivariance property holds for any fixed $\Sigma$. For an $n \times n$ matrix $X$ and permutation $\sigma \in \mathcal{S}_n$, we write $X^\sigma := \sigma X \sigma^{-1}$ to denote the image of $X$ under permutation of its rows and columns by $\sigma$.

Proposition 2.7.1. Let $B, B' \in \mathcal{P}^{(k)}_{[n]}$, so that the transition $B \mapsto B'$ is governed by (2.24) for some choice of $\alpha > 0$, $k \geq 1$ and $\Sigma \in M_n^+$. For any $\sigma \in \mathcal{S}_n$, (2.24) satisfies
$$p_n(B, B'; \alpha, k, \Sigma) = p_n(B^\sigma, B'^\sigma; \alpha, k, \Sigma^\sigma). \qquad (2.25)$$

Proof. For any matrices $X, Y$ and $\sigma \in \mathcal{S}_n$, $\mathrm{per}_\alpha X = \mathrm{per}_\alpha(X^\sigma)$ and $(X^\sigma) \cdot (Y^\sigma) = (X \cdot Y)^\sigma$. It is now clear that (2.25) holds for (2.24).

If $\Sigma \in M_n^+$ is introduced as an exchangeable random matrix parameter, i.e. $\Sigma \sim \zeta$ for some probability measure $\zeta$ on $M_n^+$ such that $\Sigma \sim \sigma\Sigma\sigma^{-1}$ for every permutation $\sigma \in \mathcal{S}_n$, then a sequence of partitions which is distributed according to CP$(\alpha,k;\Sigma)$ conditionally on $\Sigma$ is unconditionally exchangeable, as the next proposition shows.

Proposition 2.7.2. Suppose $\Sigma$ is a partially exchangeable matrix with distribution $\zeta(\cdot)$ on $M_n^+$, i.e. $\Sigma \sim \Sigma^\sigma \sim \zeta$ for every $\sigma \in \mathcal{S}_n$. Further suppose that, conditional on $\Sigma$, the sequence $B_1, B_2, \ldots$ of partitions is distributed according to CP$(\alpha,k;\Sigma)$ (2.24) on $\mathcal{P}^{(k)}_{[n]}$ for some $\alpha > 0$ and $k \geq 1$. Then the unconditional distribution of $(B_1, B_2, \ldots)$ is exchangeable, i.e. $(B_1^\sigma, B_2^\sigma, \ldots) \sim (B_1, B_2, \ldots)$ for every $\sigma \in \mathcal{S}_n$.

Proof. Let $\sigma : [n] \to [n]$ be any permutation of $[n]$. Let $\mathbf{B} := (B_1, B_2, \ldots)$ be a sequence of partitions governed by the CP$(\alpha,k;\Sigma)$-model with initial distribution $\rho_n$ which is equivariant with respect to $\Sigma$.
For $m \geq 1$, write $\mathbf{B}[m] := (B_1, \ldots, B_m)$ to denote the restriction of the sequence to its first $m$ elements. For fixed $\alpha > 0$ and $\Sigma \in M_n^+$, let
$$P_m(\mathbf{B}; \Sigma) := P(\mathbf{B}[m]; \Sigma) = \rho_n(B_1) \prod_{j=1}^{m-1} p_n(B_j, B_{j+1}; \Sigma),$$
where $p_n(\cdot,\cdot;\Sigma)$ is the transition probability (2.24) with parameter $(\alpha, k, \Sigma)$. Here we treat $\alpha$ as a fixed parameter and suppress it in the notation. Write $P_m(\mathbf{B}) = \int_{M_n^+} P_m(\mathbf{B}; \Sigma)\,\zeta(d\Sigma)$ for the unconditional distribution of $\mathbf{B}[m]$. Then we have
$$\begin{aligned} P_m(\mathbf{B}) &:= \int_{M_n^+} P_m(\mathbf{B}; \Sigma)\,\zeta(d\Sigma)\\ &= \int_{M_n^+} P_m(\mathbf{B}^\sigma; \Sigma^\sigma)\,\zeta(d\Sigma) \qquad (2.26)\\ &= \int_{M_n^+} P_m(\mathbf{B}^\sigma; \Sigma)\,\zeta\sigma^{-1}(d\Sigma) \qquad (2.27)\\ &= \int_{M_n^+} P_m(\mathbf{B}^\sigma; \Sigma)\,\zeta(d\Sigma) \qquad (2.28)\\ &= P_m(\mathbf{B}^\sigma). \qquad (2.29) \end{aligned}$$
Here (2.26) follows by the equivariance of (2.24) and (2.23) from proposition 2.7.1; (2.27) follows by the change of variables $\Sigma^\sigma \mapsto \Sigma^{\sigma\sigma^{-1}}$; (2.28) follows by partial exchangeability of $\Sigma$, so that $\zeta = \zeta\sigma^{-1}$; and (2.29) follows by definition. This holds for all $m \geq 1$. In addition, the unconditional law satisfies
$$P\left[(B_1, \ldots, B_m, B_{m+1}, \ldots, B_{m'}) \in \mathcal{B}_1 \times \cdots \times \mathcal{B}_m \times \mathcal{P}^{(k)}_{[n]} \times \cdots \times \mathcal{P}^{(k)}_{[n]}\right] = P\left[(B_1, \ldots, B_m) \in \mathcal{B}_1 \times \cdots \times \mathcal{B}_m\right].$$

Corollary 2.7.3. Fix $n, k \geq 1$ and $\alpha > 0$, and suppose $\Sigma \in M_n^+$ is such that $\Sigma_{ii} = a > 0$ for all $i$ and $\Sigma_{ij} = b \geq 0$ for all $i \neq j$. Then the CP$(\alpha, k; \Sigma)$ process is finitely exchangeable.

Σ as a partition

When $\Sigma$ is restricted to the space of partitions, the expression (2.24) simplifies and stronger results are possible than for general matrices $\Sigma$. In particular, infinite exchangeability can be established if $\Sigma$ represents an infinitely exchangeable random partition, and self-consistency of (2.24) can be shown for every fixed value of $\Sigma \in \mathcal{P}$.

Remark 2.6.2 points out that $\Sigma \equiv 1$ coincides with the standard CP$(\alpha,k)$ process. Equation (2.24) also coincides with the CP$(\alpha,k)$ process when $\Sigma$ is the trivial partition of $[n]$ with one block; in this case, we write $\Sigma = 1_n$. Another special case arises when $\Sigma = I_n$, the $n \times n$ identity matrix. In this case, (2.24) becomes
$$p_n(B, B'; \alpha, k, \Sigma = I_n) = k^{\downarrow \# B'}/k^n,$$
the coupon-collector distribution shown in (2.17).
The identity matrix corresponds to the partition of $[n]$ into singletons, denoted $0_n$.

Theorem 2.7.4. Let $\Sigma$ be a partition of $\mathbb{N}$ and write $\Sigma[n]$ for the restriction of $\Sigma$ to $[n]$, regarded as a matrix. For $k \geq 1$ and $\alpha > 0$, the transition probabilities based on $(\alpha, \Sigma)$ are consistent. In particular, for every $n \geq 1$, $B, B' \in \mathcal{P}^{(k)}_{[n]}$ and $B^* \in D_{n,n+1}^{-1}(B)$, we have
$$p_n(B, B'; \alpha, k, \Sigma[n]) = \sum_{B'' \in D_{n,n+1}^{-1}(B')} p_{n+1}(B^*, B''; \alpha, k, \Sigma[n+1]). \qquad (2.30)$$

Proof. Let $B, B' \in \mathcal{P}^{(k)}_{[n]}$ and $B^* \in D_{n,n+1}^{-1}(B)$. The partition $\Sigma[n+1]$ is obtained from $\Sigma[n]$ by adding the element labeled $n+1$ to one of the blocks of $\Sigma[n]$ or as a member of its own block. Likewise, a partition $B'' \in D_{n,n+1}^{-1}(B')$ is obtained by adding $n+1$ to a block $b' \in B'$ or to its own block.

Let $b_*$ denote the block of $\Sigma[n] \wedge B = \Sigma[n] \cdot B$ to which $n+1$ is added to obtain $\Sigma[n+1] \wedge B^*$. Then, conditional on $\Sigma[n+1] \wedge B^*$, the probability that $n+1$ is added to a block $b'_*$ of $B'$ is
$$P(n+1 \mapsto b'_* \mid \Sigma[n+1] \wedge B^*, b_*) = \begin{cases} \dfrac{\alpha/k + \#(b_* \cap b'_*)}{\alpha + \# b_*}, & b'_* \in B'\\[2mm] \dfrac{(k - \# B')\alpha/k}{\alpha + \# b_*}, & b'_* = \emptyset. \end{cases}$$
Based on these conditional probabilities, it is easy to see that
$$\sum_{b' \in B'} \frac{\alpha/k + \#(b_* \cap b')}{\alpha + \# b_*} + \frac{(k - \# B')\alpha/k}{\alpha + \# b_*} = 1,$$
and the finite-dimensional transition probabilities of the $(\alpha, k; \Sigma)$-model satisfy (2.30) for any partition $\Sigma \in \mathcal{P}$ and every $n \geq 1$.

A consequence of proposition 2.7.2 and theorem 2.7.4 is that any mixture of CP$(\alpha, k; \Sigma)$, for $\Sigma$ distributed as an infinitely exchangeable random partition of $\mathbb{N}$, characterizes an infinitely exchangeable partition-valued process on $\mathcal{P}^{(k)}$.

Theorem 2.7.5. Let $\Sigma \sim \Pi$ be an infinitely exchangeable random partition of $\mathbb{N}$ and, conditional on $\Sigma$, suppose $\mathbf{B} := (B_1, B_2, \ldots) \sim$ CP$(\alpha, k; \Sigma)$. Then the unconditional distribution of $\mathbf{B}$ is infinitely exchangeable.

2.8 Discussion

The cut-and-paste process shares many of the same properties as exchangeable EFC processes [19], with the exception that the paths of the CP$(\nu)$-process are confined to $\mathcal{P}^{(k)}$.
While the EFC-process [19] has a natural interpretation as a model in certain physical sciences, the CP$(\nu)$-process has no clear interpretation as a physical model. However, the $(\alpha,k)$ class of models could be useful as a statistical model for relationships among statistical units which are known to fall into one of $k$ classes.

In statistical work, it is important that any observation have positive probability under the specified model. The $(\alpha,k)$-process assigns positive probability to all possible transitions, and so any observed sequence of partitions in $\mathcal{P}^{(k)}_{[n]}$ has positive probability for any choice of $\alpha > 0$. In addition, the model is exchangeable, consistent and reversible, particularly attractive mathematical properties which could have a natural interpretation in certain applications. The CP$(\alpha, k; \Sigma)$ process may be natural for modeling DNA sequences, where $\Sigma$ represents either a block structure or a phylogenetic tree structure for the units under study.

CHAPTER 3
ANCESTRAL BRANCHING AND TREE-VALUED PROCESSES

In this chapter, we study a Markov process on the space of fragmentation trees whose transition probabilities are a product of transition probabilities on the space of partitions. The result is a family of transition probabilities on the space of fragmentation trees for which we can characterize an infinitely exchangeable process under general conditions.

3.1 Introduction

Stochastic processes on the state space of fragmentation trees have appeared in the literature on inference of unknown phylogenetic trees, as well as on Markov chain Monte Carlo (MCMC) methods on the space of trees. For large $n$, the space of $[n]$-rooted trees is prohibitively large to search exhaustively, and various random methods and algorithms have been derived for this purpose. Felsenstein [55] catalogs various transition procedures on the space of trees which have arisen over the years.
Subtree prune-and-regraft (SPR) is one procedure for generating a new phylogenetic tree $T'$ from another tree $T$: first a subtree is pruned from $T$, and then that subtree is regrafted to a randomly chosen branch in the unpruned part of $T$. Processes based on SPR have been studied by Aldous [4] and Aldous and Pitman [6], who show that an SPR-generated tree-valued process is related to the Galton-Watson process. Evans and Winter [52] study a tree-valued process based on SPR which is reversible with respect to Aldous's continuum random tree (CRT) [2]. Diaconis and Holmes [44] studied a tree-valued Markov chain in relation to random matchings and MCMC.

We are interested in studying tree-valued processes in the context of modeling sequences of phylogenetic trees and partitions generated from genetic data. Some previous work which studies trees from this perspective appears in Aldous and Popovic [8], Aldous, Krikun and Popovic [5], and Donnelly and Kurtz [45, 46].

Let $n \geq 1$ and let $\{p_b(\cdot,\cdot) : b \subseteq [n]\}$ be a collection of transition probabilities on $\mathcal{P}_b$, the set partitions of each $b \subseteq [n]$, such that $p_b(\pi, 1_b) < 1$ for every $\pi \in \mathcal{P}_b$, where $1_b = \{b\}$ is the one-block partition of $b$. For $t \in \mathcal{T}_n$ and $b \subseteq [n]$, we write $t_{|b}$ for the reduced subtree of $t$ restricted to $b$ in the natural way, and $\Pi_t$ for the root partition of $t$; see section 1.10.3 for notation and terminology for fragmentation trees. The ancestral branching kernel based on $\{p_b,\ b \subseteq [n]\}$ assigns probability
$$Q_n(t, t') = \prod_{b \in t' : \# b \geq 2} p_b(\Pi_{t_{|b}}, \Pi_{t'_{|b}})\big/\big[1 - p_b(\Pi_{t_{|b}}, 1_b)\big] \qquad (3.1)$$
to the transition $t \mapsto t'$ for every pair $t, t' \in \mathcal{T}_n$. The transition probability (3.1) is the product, over all non-singleton ancestors of $t'$, of transition probabilities between root partitions of reduced subtrees of $t$ and $t'$. In general, for $b \subset \mathbb{N}$, we write $Q_b(\cdot,\cdot)$ for the distribution on $\mathcal{T}_b$, the $b$-rooted fragmentation trees, corresponding to (3.1). A main result is given by theorem 3.1.1.

Theorem 3.1.1.
Let $\{p_b : b \subset \mathbb{N}\}$ be a finitely exchangeable and consistent family of transition probabilities on partitions of $b$ for each $b \subset \mathbb{N}$. Then $\{Q_b : b \subset \mathbb{N}\}$ based on $\{p_b : b \subset \mathbb{N}\}$ is finitely exchangeable and consistent, and characterizes an infinitely exchangeable transition measure $Q$ on $\left(\mathcal{T}, \sigma\langle \bigcup_{n \geq 1} \mathcal{T}_n \rangle\right)$, the space of fragmentations of $\mathbb{N}$ with sigma-field generated by the finite sets $\mathcal{T}_n$ for each $n \geq 1$. Furthermore, if $p_b$ is exchangeable for every $b$ with $\# b = 2$, and if $p_b(\pi, 1_b) = p_a(\varphi^*(\pi), 1_a)$ for all $a, b$ with $\# a = \# b$ and any injection $\varphi^* : \mathcal{P}_b \to \mathcal{P}_a$, then $\{Q_b : b \subset \mathbb{N}\}$ infinitely exchangeable implies that $\{p_b : b \subset \mathbb{N}\}$ is infinitely exchangeable.

We are able to show the inheritance of several properties from the associated transition probabilities on partitions. In section 3.5, we show specific connections for the ancestral branching process based on the transition probabilities of the cut-and-paste process with parameter $\nu$.

3.2 Ancestral branching kernels

A Markov kernel on a measurable space $(\Omega, \mathcal{F})$ is a map $P : \Omega \times \mathcal{F} \to \mathbb{R}_+$ such that

• for each $\omega \in \Omega$, $P(\omega, \cdot)$ is a probability measure on $(\Omega, \mathcal{F})$, and

• for each $B \in \mathcal{F}$, $P(\cdot, B)$ is measurable between $\mathcal{F}$ and $\mathcal{B}(\mathbb{R}_+)$, the Borel sigma-field of $\mathbb{R}_+$.

Let $A \subset \mathbb{N}$ be a finite subset with $\# A \geq 2$ and let $\{P_S : S \subseteq A\}$ be a collection of Markov kernels on $\mathcal{P}_S$, the set partitions of $S$, such that $P_S(\cdot, 1_S) < 1$ for all $S \subseteq A$. We define the Markov kernel $Q_A(\cdot,\cdot)$ on $\mathcal{T}_A$ as in (3.1) by
$$Q_A(t, t') = \prod_{b \in t' : \# b \geq 2} \frac{p_b(\Pi_{t_{|b}}, \Pi_{t'_{|b}})}{1 - p_b(\Pi_{t_{|b}}, 1_b)}, \qquad (3.2)$$
the product of Markov kernels on the root partitions of the reduced subtrees of all parents of $t'$, conditioned to be non-trivial, i.e. not the one-block partition $1_b$. The form of (3.2) admits the recursive expression
$$Q_A(t, t') = \frac{p_A(\Pi_t, \Pi_{t'})}{1 - p_A(\Pi_t, 1_A)} \prod_{b \in \Pi_{t'}} Q_b(t_{|b}, t'_{|b}), \qquad (3.3)$$
which has an intuitive interpretation in terms of independent self-similar transitions on the spaces of reduced subtrees of the children of the root of $t'$.
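The product formula (3.1) can be checked by brute force on a small ground set. The sketch below uses a toy partition kernel that ignores the previous state (uniform over all set partitions of $b$), which is enough to verify that the conditioned product defines a probability kernel on fragmentation trees; all function names are ours, and the representation of trees as nested `(label, children)` pairs is an implementation choice, not the thesis's notation.

```python
from itertools import product

def partitions(s):
    """All set partitions of the finite set s, as frozensets of frozensets."""
    s = sorted(s)
    first, rest = s[0], s[1:]
    if not rest:
        yield frozenset([frozenset([first])])
        return
    for p in partitions(frozenset(rest)):
        blocks = list(p)
        for i in range(len(blocks)):       # add `first` to an existing block
            yield frozenset([blocks[i] | {first}] + blocks[:i] + blocks[i + 1:])
        yield frozenset(blocks + [frozenset([first])])  # or to its own block

def trees(b):
    """All fragmentation trees of b, as (label, children) pairs."""
    b = frozenset(b)
    if len(b) == 1:
        yield (b, ())
        return
    for pi in partitions(b):
        if len(pi) == 1:
            continue                       # root partitions are non-trivial
        for kids in product(*[list(trees(blk)) for blk in sorted(pi, key=min)]):
            yield (b, kids)

def restricted_root_partition(t, b):
    """Root partition of the reduced subtree t|b (for #b >= 2)."""
    node = t
    while True:
        hits = [kid for kid in node[1] if kid[0] & b]
        if len(hits) >= 2:
            return frozenset(frozenset(kid[0] & b) for kid in hits)
        node = hits[0]

def internal_blocks(t):
    """Blocks b in t with #b >= 2 (the non-singleton ancestors)."""
    out, stack = [], [t]
    while stack:
        node = stack.pop()
        if len(node[0]) >= 2:
            out.append(node[0])
            stack.extend(node[1])
    return out

def p_toy(b, pi_from, pi_to):
    # toy kernel: uniform over all set partitions of b, ignoring pi_from
    return 1.0 / sum(1 for _ in partitions(b))

def ab_kernel(t, t_new):
    """Ancestral branching probability (3.1) of the transition t -> t_new."""
    prob = 1.0
    for b in internal_blocks(t_new):
        pi_from = restricted_root_partition(t, b)
        pi_to = restricted_root_partition(t_new, b)
        p_one = p_toy(b, pi_from, frozenset([b]))
        prob *= p_toy(b, pi_from, pi_to) / (1.0 - p_one)
    return prob

ground = frozenset({1, 2, 3})
all_trees = list(trees(ground))
for t in all_trees:
    assert abs(sum(ab_kernel(t, t2) for t2 in all_trees) - 1.0) < 1e-12
```

For three leaves there are four fragmentation trees, and each row of the toy kernel sums to one, as (3.1) requires of a conditioned product of partition kernels.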
The reader familiar with the literature on fragmentation processes may draw parallels to the typical description of a fragmentation process in terms of successive partitioning of fragments; see e.g. [25, 80]. The specification in (3.2) is related to this description, but has the added feature of a Markovian dependence on the previous state in a sequence of fragmentation trees.

Definition 3.2.1. A Markov kernel on $\mathcal{T}_A$ of the form (3.2) is called an ancestral branching (AB) Markov kernel on $\mathcal{T}_A$.

Remark 3.2.2. The ancestral branching Markov kernel only requires the associated $\mathcal{P}$-valued transition probabilities to be defined on $\mathcal{P}_S \setminus \{1_S\}$ for each $S \subseteq A$. However, we always assume that we have a family of transition probabilities which is well-defined on the whole space $\mathcal{P}_S$ and satisfies $p_S(\cdot, 1_S) < 1$. This distinction becomes convenient when we consider infinitely exchangeable processes of AB type later on.

3.3 Exchangeable ancestral branching Markov kernels

A collection of Markov kernels $\mathcal{Q} := \{Q_A(\cdot,\cdot) : A \subset \mathbb{N}\}$ on $(\mathcal{T}_A, A \subset \mathbb{N})$ is finitely exchangeable if, for each $n \geq 1$, $A, B \subset \mathbb{N}$ with $\# A = \# B = n$, and $t \in \mathcal{T}_A$,
$$Q_A(t, \cdot) = Q_B(\varphi^*(t), \varphi^*(\cdot)) \qquad (3.4)$$
for every bijection $\varphi : A \to B$, where $\varphi^* : \mathcal{T}_A \to \mathcal{T}_B$ denotes the associated bijection $\mathcal{T}_A \to \mathcal{T}_B$.

If the kernels $Q_A$ are finitely exchangeable, then there exists a canonical injection $\sigma_A : A \to [n]$ such that $Q_A(\cdot,\cdot) = Q_n(\sigma_A(\cdot), \sigma_A(\cdot)) =: Q_n\sigma_A(\cdot,\cdot)$, the exchangeable transition probability function for $n$. We define the canonical injection $A \to [n]$ as follows. Suppose, without loss of generality, that $A = \{a_1, \ldots, a_n\}$ with $a_1 < a_2 < \cdots < a_n$. Then the canonical injection is $\sigma_A : A \to [n]$, $a_i \mapsto i$. For each $A \subseteq \mathbb{N}$ with $\# A = n$, $Q_A(\cdot,\cdot) = Q_n\sigma_A(\cdot,\cdot)$ whenever $\{Q_A : A \subset \mathbb{N}\}$ is finitely exchangeable, as in (3.4). Therefore, for a finitely exchangeable family of Markovian transition probabilities, we need only specify a transition probability $Q_n(\cdot,\cdot)$ on $\mathcal{T}_n$ for each $n \geq 1$, and in this case we denote $Q_b \equiv Q_n\sigma_b$.
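As a small illustration of the canonical injection $\sigma_A$ (the function names are ours):

```python
def canonical_injection(A):
    """sigma_A: map the i-th smallest element of A to i, for i = 1..n."""
    return {a: i for i, a in enumerate(sorted(A), start=1)}

def relabel_partition(pi, sigma):
    """Apply an injection blockwise to a set partition."""
    return frozenset(frozenset(sigma[x] for x in b) for b in pi)

A = {3, 7, 19, 24}
sigma = canonical_injection(A)              # {3: 1, 7: 2, 19: 3, 24: 4}
pi = frozenset([frozenset({3, 19}), frozenset({7, 24})])
assert relabel_partition(pi, sigma) == frozenset(
    [frozenset({1, 3}), frozenset({2, 4})]
)
```

In this way a kernel on partitions (or trees) of $A$ is identified with the kernel on $[n]$ obtained by pushing states through $\sigma_A$.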
Theorem 3.3.1. Let $n \geq 1$ and, for each $A \subset \mathbb{N}$ with $\# A < \infty$, let $Q_A(\cdot,\cdot)$ be an ancestral branching Markov kernel on $\mathcal{T}_A$ induced by the family $\{P_S^* : S \subseteq A\}$, where $P_S^* := \{p_S(B, \cdot) : B \in \mathcal{P}_S\}$ is a Markov kernel on $\mathcal{P}_S$. Also, assume that $p_A(\pi, 1_A) = p_B(\psi^*(\pi), 1_B)$ for every finite $A, B \subset \mathbb{N}$ with $\# A = \# B$ and bijection $\psi^* : \mathcal{P}_A \to \mathcal{P}_B$. Then the family $\{Q_A : A \subset \mathbb{N}\}$ is finitely exchangeable if and only if the restricted collection $\{P_S^* : S \subset \mathbb{N}\}$ is finitely exchangeable.

Proof. Let $A \subset \mathbb{N}$ be a finite subset and $\mathcal{P} := \{P_S : S \subseteq A\}$ be a family of Markov kernels on $\{\mathcal{P}_S : S \subseteq A\}$. From (3.2), the AB Markov kernel on $\mathcal{T}_A$ based on $\mathcal{P}$ is
$$Q_A(T, T') = \prod_{b \in T' : \# b \geq 2} \frac{p_b(\Pi_{T_{|b}}, \Pi_{T'_{|b}})}{1 - p_b(\Pi_{T_{|b}}, 1_b)}.$$
For $A, B \subset \mathbb{N}$ with $\# A = \# B$ and injection $\varphi : A \to B$ with associated injection $\varphi^* : \mathcal{T}_A \to \mathcal{T}_B$, we also write $\varphi^*$ for the associated injection $\mathcal{P}_A \to \mathcal{P}_B$; this should cause no confusion, since it is clear from context which is meant.

For $n = 2$ and $A, B \subset \mathbb{N}$ such that $\# A = \# B = 2$, let $\psi : A \to B$ be an injective map with associated injection $\psi^* : \mathcal{P}_A \to \mathcal{P}_B$. In this case, $\#\mathcal{P}_A = \#\mathcal{P}_B = 2$, so we may write $A_1, A_2$ for the elements of $\mathcal{P}_A$ with $\# A_1 = 1$ and $\# A_2 = 2$. Likewise, we write $B_1$ and $B_2$ for the one- and two-block (respectively) elements of $\mathcal{P}_B$. Hence, $\psi^*(A_i) = B_i$ for $i = 1, 2$. It is assumed that $p_B(\psi^*(\pi), 1_B) = p_A(\pi, 1_A)$ for every $\pi \in \mathcal{P}_A$. Hence, $p_B(\psi^*(\pi), 1_B) = p_B(\psi^*(\pi), B_1) = p_A(\pi, A_1)$, $1 - p_B(\psi^*(\pi), B_1) = p_B(\psi^*(\pi), B_2) = p_A(\pi, A_2) = 1 - p_A(\pi, A_1)$, and $p_B = p_A\psi^{*-1}$ for $\# A = \# B = 2$. So $\{p_A(\cdot,\cdot) : \# A = 2\}$ is exchangeable. Also, $\#\mathcal{T}_A = \#\mathcal{T}_B = 1$ implies, for $t \in \mathcal{T}_A$, that $Q_A(t, t) = Q_B(\psi^*(t), \psi^*(t)) = 1$ trivially. And so $\{Q_A(\cdot,\cdot) : \# A = 2\}$ is exchangeable.

Now, fix $n > 2$ and suppose that, for any pair $A, B \subset \mathbb{N}$ with $\# A = \# B \leq n$ and any injective map $\varphi : A \to B$, $Q_B = Q_A\varphi^{*-1}$ implies $p_B = p_A\varphi^{*-1}$ on $\mathcal{P}_B$. Consider $A^*, B^* \subset \mathbb{N}$ with $\# A^* = \# B^* = n+1$, $A^* \supset A$ and $B^* \supset B$, and let $\psi : A^* \to B^*$ be an injective map whose restriction to $A \to B$ corresponds to $\varphi$.
Write $\psi^* : \mathcal{T}_{A^*} \to \mathcal{T}_{B^*}$ for its associated injection. Assume that $Q_{A^*} = Q_{B^*}\psi^*$ and let $t, t' \in \mathcal{T}_{A^*}$. Then
$$\begin{aligned} Q_{A^*}(t, t') &= \frac{p_{A^*}(\Pi_t, \Pi_{t'})}{1 - p_{A^*}(\Pi_t, 1_{A^*})} \prod_{b \in \Pi_{t'} : \# b \geq 2} Q_b(t_{|b}, t'_{|b}) \qquad (3.5)\\ &= \frac{p_{B^*}(\Pi_{\psi^*(t)}, \Pi_{\psi^*(t')})}{1 - p_{B^*}(\Pi_{\psi^*(t)}, 1_{B^*})} \prod_{b \in \Pi_{\psi^*(t')} : \# b \geq 2} Q_b(\psi^*(t)_{|b}, \psi^*(t')_{|b}) \qquad (3.6)\\ &= \frac{p_{B^*}(\Pi_{\psi^*(t)}, \Pi_{\psi^*(t')})}{1 - p_{B^*}(\Pi_{\psi^*(t)}, 1_{B^*})} \prod_{b \in \Pi_{\psi^*(t')} : \# b \geq 2} Q_b(\psi^*(t_{|\psi^{-1}(b)}), \psi^*(t'_{|\psi^{-1}(b)})) \qquad (3.7)\\ &= \frac{p_{B^*}(\Pi_{\psi^*(t)}, \Pi_{\psi^*(t')})}{1 - p_{B^*}(\Pi_{\psi^*(t)}, 1_{B^*})} \prod_{b \in \Pi_{t'} : \# b \geq 2} Q_b(t_{|b}, t'_{|b}), \qquad (3.8) \end{aligned}$$
which implies that
$$\frac{p_{A^*}(\pi, \pi')}{1 - p_{A^*}(\pi, 1_{A^*})} = \frac{p_{B^*}(\psi^*(\pi), \psi^*(\pi'))}{1 - p_{B^*}(\psi^*(\pi), 1_{B^*})}$$
for all one-to-one functions $\psi^* : \mathcal{P}_{A^*} \to \mathcal{P}_{B^*}$ and all $\pi, \pi' \in \mathcal{P}_{A^*} \setminus \{1_{A^*}\} =: \mathcal{P}^*_{A^*}$; hence,
$$p_{A^*}(\pi, \pi') = p_{B^*}(\psi^*(\pi), \psi^*(\pi'))$$
by the assumption that $p_{B^*}(\psi^*(\cdot), 1_{B^*}) = p_{A^*}(\cdot, 1_{A^*})$ for all $A^*, B^*$ such that $\# A^* = \# B^*$ and all injective maps $\psi : A^* \to B^*$. This establishes finite exchangeability of $\{p_A(\cdot,\cdot) : \# A \leq n+1\}$. Induction implies this holds for all $n \geq 1$, and hence finite exchangeability of $\{p_A(\cdot,\cdot) : A \subset \mathbb{N},\ \# A < \infty\}$.

If $\{P_S^* : S \subset \mathbb{N}\}$ is finitely exchangeable, then $p_A(\pi, 1_A) = p_B(\psi^*(\pi), 1_B)$ for any $A, B$ with $\# A = \# B$ and any injection $\psi^* : \mathcal{P}_A \to \mathcal{P}_B$. The reverse implication is immediate.

3.4 Consistent ancestral branching kernels

Let $A \subseteq \mathbb{N}$ and, for $\emptyset \neq C \subset B \subseteq A$, let $D_{C,B} : \mathcal{T}_B \to \mathcal{T}_C$ be the restriction map $\mathcal{T}_B \to \mathcal{T}_C$ (section 1.4.2). A family of Markov kernels $\{Q_S : S \subseteq A\}$ defined on the projective system $\{\mathcal{T}_S : S \subseteq A\}$ under restriction is consistent if, for all $\emptyset \neq C \subset B \subseteq A$ and $t \in \mathcal{T}_C$,
$$Q_B D_{C,B}^{-1}(t, \cdot) := Q_B(t^*, D_{C,B}^{-1}(\cdot)) = Q_C(t, \cdot) \qquad (3.9)$$
for every choice of $t^* \in D_{C,B}^{-1}(t)$.

Let $S \subset \mathbb{N}$. Below we write $S^x := S \cup \{x\}$ for any $x \notin S$, the set $S$ augmented by $x$. We write $e_{S^x} := \{S, \{x\}\}$, the two-block partition of $S^x$ having $\{x\}$ as one of its blocks.

Theorem 3.4.1. Let $\mathcal{Q} := \{Q_S : S \subseteq A\}$ be a family of ancestral branching Markov kernels based on a collection $\mathcal{P} := \{P_S : S \subseteq A\}$. The family $\mathcal{Q}$ is consistent if $p_S(\pi, \cdot)$ is consistent for all $S \subseteq A$ and $\pi \neq 1_S$.
Moreover, if, in addition,
$$\text{(A)} \qquad p_{S^x}(\pi^*, e_{S^x}) + p_{S^x}(\pi^*, 1_{S^x}) = p_S(\pi, 1_S)$$
for every $S \subseteq A$, $\pi \in \mathcal{P}_S$ and $\pi^* \in D_{S,S^x}^{-1}(\pi)$, then $\mathcal{Q}$ consistent implies that $p_S$ is consistent for all $S \subseteq A$.

Proof. Suppose $\mathcal{Q}$ is consistent and (A) holds for every $S \subseteq A$, $\pi \in \mathcal{P}_S$ and $\pi^* \in D_{S,S^x}^{-1}(\pi)$. We show that $\mathcal{P}$ is consistent by induction. For $S \subset A$ such that $\# S = 2$, $\mathcal{T}_S$ contains exactly one element, which we denote $t_S$. $Q_S(t_S, t_S) = 1$ and, for any $S^x \subseteq A$,
$$\sum_{t'' \in D_{S,S^x}^{-1}(t_S)} Q_{S^x}(t^*, t'') = \sum_{t'' \in \mathcal{T}_{S^x}} Q_{S^x}(t^*, t'') = \sum_{\pi \in \mathcal{P}_{S^x} \setminus \{1_{S^x}\}} \frac{p_{S^x}(\Pi_{t^*}, \pi)}{1 - p_{S^x}(\Pi_{t^*}, 1_{S^x})} = 1$$
for all $t^* \in \mathcal{T}_{S^x}$, since $Q_{S^x}$ is a transition probability. By assumption (A),
$$\begin{aligned} 1 - p_{S^x}(\Pi_{t^*}, 1_{S^x}) &= \sum_{\pi \in D_{S,S^x}^{-1}(\Pi_{t_S})} p_{S^x}(\Pi_{t^*}, \pi) + \left[p_S(\Pi_{t_S}, 1_S) - p_{S^x}(\Pi_{t^*}, 1_{S^x})\right]\\ p_S(\Pi_{t_S}, \Pi_{t_S}) + p_S(\Pi_{t_S}, 1_S) &= \sum_{\pi \in D_{S,S^x}^{-1}(\Pi_{t_S})} p_{S^x}(\Pi_{t^*}, \pi) + p_S(\Pi_{t_S}, 1_S)\\ p_S(\Pi_{t_S}, \Pi_{t_S}) &= \sum_{\pi \in D_{S,S^x}^{-1}(\Pi_{t_S})} p_{S^x}(\Pi_{t^*}, \pi), \end{aligned}$$
and $p_S$ is consistent with $p_{S^x}$ for every $S^x$.

Now, for each $S \subset A$ with $\# S = m < \# A$, assume that $p_T(\cdot,\cdot)$ is consistent for all $T \subseteq S$. Let $S^x = S \cup \{x\}$ for some $x \in A \cap S^c$. Let $t, t' \in \mathcal{T}_S$ and let $t^* \in D_{S,S^x}^{-1}(t)$. For a partition $\pi \in \mathcal{P}_S$ and $b \in \pi$, write $b^x \in \pi^* \in D_{S,S^x}^{-1}(\pi)$ to denote the block of $\pi^*$ which contains $x$, and write $b_* \in \pi$ to denote the block of $\pi$ to which $x$ is added to obtain $b^x \in \pi^*$. Then
$$\begin{aligned} \sum_{t'' \in D_{S,S^x}^{-1}(t')} &Q_{S^x}(t^*, t'') \qquad (3.10)\\ &= \sum_{\pi^* \in D_{S,S^x}^{-1}(\Pi_{t'})} \frac{p_{S^x}(\Pi_{t^*}, \pi^*)}{1 - p_{S^x}(\Pi_{t^*}, 1_{S^x})} \sum_{t'' \in D_{b_*,b^x}^{-1}(t'_{|b_*})} \left[\prod_{b \in \pi^* : b \neq b^x} Q_b(t_{|b}, t'_{|b})\right] Q_{b^x}(t^*_{|b^x}, t''_{|b^x}) + \frac{p_{S^x}(\Pi_{t^*}, e_{S^x})}{1 - p_{S^x}(\Pi_{t^*}, 1_{S^x})}\, Q_S(t, t') \qquad (3.11)\\ &= \sum_{\pi^* \in D_{S,S^x}^{-1}(\Pi_{t'})} \frac{p_{S^x}(\Pi_{t^*}, \pi^*)}{1 - p_{S^x}(\Pi_{t^*}, 1_{S^x})} \prod_{b \in \pi^* : b \neq b^x} Q_b(t_{|b}, t'_{|b}) \sum_{t'' \in D_{b_*,b^x}^{-1}(t'_{|b_*})} Q_{b^x}(t^*_{|b^x}, t'') + \frac{p_{S^x}(\Pi_{t^*}, e_{S^x})}{1 - p_{S^x}(\Pi_{t^*}, 1_{S^x})}\, Q_S(t, t') \qquad (3.12)\\ &= \sum_{\pi^* \in D_{S,S^x}^{-1}(\Pi_{t'})} \frac{p_{S^x}(\Pi_{t^*}, \pi^*)}{1 - p_{S^x}(\Pi_{t^*}, 1_{S^x})} \prod_{b \in \Pi_{t'}} Q_b(t_{|b}, t'_{|b}) + \frac{p_{S^x}(\Pi_{t^*}, e_{S^x})}{1 - p_{S^x}(\Pi_{t^*}, 1_{S^x})}\, Q_S(t, t') \qquad (3.13)\\ &= Q_S(t, t')\left[\frac{p_{S^x}(\Pi_{t^*}, e_{S^x})}{1 - p_{S^x}(\Pi_{t^*}, 1_{S^x})} + \sum_{\pi^* \in D_{S,S^x}^{-1}(\Pi_{t'})} \frac{p_{S^x}(\Pi_{t^*}, \pi^*)}{1 - p_{S^x}(\Pi_{t^*}, 1_{S^x})} \cdot \frac{1 - p_S(\Pi_t, 1_S)}{p_S(\Pi_t, \Pi_{t'})}\right]. \qquad (3.14) \end{aligned}$$
Here, (3.11) follows by noticing that the restrictions $t^*_{|b}$ and $t'_{|b}$ are unaffected unless $b = b^x$, and that the sum over $t'' \in D_{S,S^x}^{-1}(t')$ can be broken down into a sum over $\pi^* \in D_{S,S^x}^{-1}(\Pi_{t'})$ and a sum over trees in the inverse image of the reduced subtree $t'_{|b_*}$. Line (3.12) follows by bringing factors that do not depend on $b^x$ outside of the inner sum. Line (3.13) follows by the induction hypothesis that $Q_b$ is consistent for all $b \subseteq S$. And line (3.14) follows by the recursive expression (3.3).

Consistency requires that $\sum_{t'' \in D_{S,S^x}^{-1}(t')} Q_{S^x}(t^*, t'') = Q_S(t, t')$ for all $t^* \in D_{S,S^x}^{-1}(t)$, and hence
$$\frac{p_{S^x}(\Pi_{t^*}, e_{S^x})}{1 - p_{S^x}(\Pi_{t^*}, 1_{S^x})} + \sum_{\pi^* \in D_{S,S^x}^{-1}(\Pi_{t'})} \frac{p_{S^x}(\Pi_{t^*}, \pi^*)}{1 - p_{S^x}(\Pi_{t^*}, 1_{S^x})} \cdot \frac{1 - p_S(\Pi_t, 1_S)}{p_S(\Pi_t, \Pi_{t'})} = 1,$$
which is equivalent to
$$p_{S^x}(\Pi_{t^*}, e_{S^x}) + \frac{1 - p_S(\Pi_t, 1_S)}{p_S(\Pi_t, \Pi_{t'})} \sum_{\pi^* \in D_{S,S^x}^{-1}(\Pi_{t'})} p_{S^x}(\Pi_{t^*}, \pi^*) = 1 - p_{S^x}(\Pi_{t^*}, 1_{S^x}).$$
Suppose that $\sum_{\pi^* \in D_{S,S^x}^{-1}(\Pi_{t'})} p_{S^x}(\Pi_{t^*}, \pi^*) \neq p_S(\Pi_t, \Pi_{t'})$; then
$$\frac{1 - p_S(\Pi_t, 1_S)}{p_S(\Pi_t, \Pi_{t'})} \sum_{\pi^* \in D_{S,S^x}^{-1}(\Pi_{t'})} p_{S^x}(\Pi_{t^*}, \pi^*) \neq 1 - p_S(\Pi_t, 1_S)$$
and
$$\frac{p_{S^x}(\Pi_{t^*}, e_{S^x})}{1 - p_{S^x}(\Pi_{t^*}, 1_{S^x})} + \sum_{\pi^* \in D_{S,S^x}^{-1}(\Pi_{t'})} \frac{p_{S^x}(\Pi_{t^*}, \pi^*)}{1 - p_{S^x}(\Pi_{t^*}, 1_{S^x})} \cdot \frac{1 - p_S(\Pi_t, 1_S)}{p_S(\Pi_t, \Pi_{t'})} \neq 1,$$
by the assumption that $p_T(\cdot,\cdot)$ is consistent for all $T \subseteq S$ and assumption (A). Hence, we conclude that consistency of $Q_S$ and $Q_{S^x}$, together with assumption (A), implies that $p_S(\pi, \cdot)$ and $p_{S^x}(\pi^*, \cdot)$ are consistent for all $\pi \in \mathcal{P}_S$ with $\# \pi > 1$ and all $\pi^* \in D_{S,S^x}^{-1}(\pi)$. Reversing the above argument shows that consistency of $p_S(\pi, \cdot)$ for $\pi$ with $\# \pi > 1$ implies consistency of $Q_S$ in (3.14).

Infinitely exchangeable kernels. From section 1.4.2, the collection $(\mathcal{T}_n, n \geq 1)$ is a projective system under restriction, and so a collection $\{Q_A : A \subset \mathbb{N}\}$ of Markov kernels on $\mathcal{T}_A$ for each finite $A$ characterizes an infinitely exchangeable kernel $Q$ on $\mathcal{T}$ if it satisfies the necessary finite exchangeability and self-consistency conditions.
Putting together theorems 3.3.1 and 3.4.1, we arrive at a condition for the infinite exchangeability of $Q$ in terms of the associated partition-valued Markov kernels. In particular, if $\mathcal{P} := \{p_S : S \subset \mathbb{N}\}$ is finitely exchangeable and consistent, and $p_S(\cdot, 1_S) < 1$ for every $S$, then the collection $\mathcal{Q} := \{Q_S : S \subset \mathbb{N}\}$ based on $\mathcal{P}$ is infinitely exchangeable, and there is a unique transition measure $Q^\infty$ on $\left(\mathcal{T}, \sigma\langle\bigcup_{n \geq 1}\mathcal{T}_n\rangle\right)$ such that, for every $n \geq 1$ and $t, t' \in \mathcal{T}_n$,
$$Q^\infty(t^*, \{t'' \in \mathcal{T} : t''_{|[n]} = t'\}) = Q_n(t, t')$$
for all $t^* \in \{t'' \in \mathcal{T} : t''_{|[n]} = t\}$. The coalescent process does not satisfy this condition, because it is absorbed in the one-block state almost surely, but other known processes do, e.g. exchangeable fragmentation-coalescence (EFC) processes [19] and CP$(\nu)$ Markov processes [40]. We now turn our attention to the special properties of the ancestral branching kernel induced by the transition probabilities of the CP$(\nu)$-process.

3.5 Cut-and-paste ancestral branching processes

For the rest of this chapter, the space $\mathcal{T}$ is assumed to be equipped with the sigma-field $\sigma\langle\bigcup_{n \geq 1}\mathcal{T}_n\rangle$, as in the previous sections.

For $n \geq 1$, $k \geq 2$ and $\nu$ a probability measure on $\mathbb{R}^{\downarrow(k)}$, let $p_n^\nu(\cdot,\cdot)$ denote the CP$(\nu)$ transition probability on $\mathcal{P}^{(k)}_{[n]}$ in (2.2) and $q_n^\nu(\cdot,\cdot) = 1 - p_n^\nu(\cdot,\cdot)$ its complementary probability. The family $\{p_n^\nu(\cdot,\cdot),\ n \geq 1\}$ is infinitely exchangeable on $\left(\mathcal{P}, \sigma\langle\bigcup_{n \geq 1}\mathcal{P}^{(k)}_{[n]}\rangle\right)$ and so defines a unique transition probability $p_A^\nu(\cdot,\cdot)$ on $\mathcal{P}_A^{(k)}$ for each $A \subset \mathbb{N}$ by
$$p_A^\nu(\cdot,\cdot) := p_{\# A}^\nu(\cdot,\cdot) \ \text{ for } \# A < \infty, \qquad \text{and } p_A^\nu(\cdot,\cdot) = p_{\mathbb{N}}^\nu(\cdot,\cdot) \ \text{ otherwise}.$$
Furthermore, if $\nu$ is non-degenerate at $(1, 0, \ldots, 0)$, then $p_b^\nu(\cdot, 1_b) < 1$ for all $b \subset \mathbb{N}$ with $\# b > 1$; and so (3.2) is well-defined and the results of section 3.4 hold. In particular, the $\mathcal{T}$-valued process induced by the finite-dimensional transition probabilities (2.2) is infinitely exchangeable.
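For intuition, here is a rough simulation sketch of a single cut-and-paste step as we read it from the column-totals description of $\mathrm{CP}(\Pi_T, B^u, \sigma^u)$ in section 3.5.1: each block of the current partition is cut by an independent paintbox into at most $k$ labeled pieces, the labels are permuted uniformly, and pieces sharing a label are pasted together. The fixed mass vector `s` stands in for a single draw from $\nu$, all names are ours, and details of the construction in section 2.2 may differ (this is a sketch, not the thesis's definition).

```python
import random

def paintbox_class(s, rng):
    """Sample a class label 0..k-1 from a mass vector s (sums to one, so no dust)."""
    u, acc = rng.random(), 0.0
    for j, sj in enumerate(s):
        acc += sj
        if u < acc:
            return j
    return len(s) - 1

def cut_and_paste_step(B, s, k, rng):
    """One hypothetical CP transition: cut each block of B into k labeled pieces
    using an independent paintbox with masses s, permute the labels uniformly,
    then paste together pieces sharing a label (the 'column totals')."""
    columns = [set() for _ in range(k)]
    for block in B:
        sigma = list(range(k))
        rng.shuffle(sigma)            # independent uniform permutation of labels
        for x in block:
            columns[sigma[paintbox_class(s, rng)]].add(x)
    return [sorted(col) for col in columns if col]

rng = random.Random(7)
B = [{1, 2, 3}, {4, 5}, {6}]
B_next = cut_and_paste_step(B, s=(0.5, 0.3, 0.2), k=3, rng=rng)
# B_next is again a partition of {1,...,6} with at most k = 3 blocks
```

The output always has at most $k$ blocks, which is why the paths of the process are confined to $\mathcal{P}^{(k)}$.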
3.5.1 Construction of the cut-and-paste ancestral branching Markov chain on T

We introduce a genealogical indexing system to label the elements of $t_A \in \mathcal{T}_A$ (chapter 1.2.1 of Bertoin [25]). We write
$$\mathcal{U} := \bigcup_{n=0}^{\infty} \mathbb{N}^n$$
to denote the infinite set of all indices, with the convention that $\mathbb{N}^0 = \{\emptyset\}$. For a fragmentation tree $T$, the $n$th generation of $T$ is the collection of children $t \in T$ such that $\#\,\mathrm{anc}(t) = n - 1$. For each $u = (u_1, \ldots, u_n) \equiv u_1 u_2 \cdots u_n \in \mathcal{U}$, $n$ is the generation of $u$. Let $u- := (u_1, \ldots, u_{n-1})$ denote the parent of $u$ and let $ui := (u, i) := (u_1, \ldots, u_n, i)$ denote the $i$th child of $u$. The $i$th child of $t \in T$ is defined as the $i$th element to appear when the elements of $\mathrm{frag}(t)$, the children of $t$, are listed in order of their least elements.

A Markov chain on $\mathcal{T}^{(k)}$ governed by (3.2) can be constructed by a genealogical branching procedure as follows. Let $k \geq 2$ and let $\nu$ be a probability measure on $\mathbb{R}^{\downarrow(k)}$ which is non-degenerate at $(1, 0, \ldots, 0)$. For $T, T' \in \mathcal{T}^{(k)}$, the transition $T \mapsto T'$ occurs as follows. Generate $\{B^u : u \in \mathcal{U}\}$ i.i.d. $\varrho_\nu^{(k)}$ partition sequences, where $\varrho_\nu^{(k)} := \varrho_\nu \otimes \cdots \otimes \varrho_\nu$ is the $k$-fold product measure of paintboxes based on $\nu$, and $\{\sigma^u : u \in \mathcal{U}\}$ i.i.d. $k$-tuples of i.i.d. uniform permutations of $[k]$.

Genealogical Branching Procedure

(i) Put $\Pi_{T'} = \mathrm{CP}(\Pi_T, B^\emptyset, \sigma^\emptyset)$, the partition obtained from the column totals of $\Pi_T \cap (B^\emptyset)^{\sigma^\emptyset}$, as shown in section 2.2;

(ii) for $A^u \in T'$, put $A^{uj}$ equal to the $j$th block of $\mathrm{CP}(\Pi_{T_{|A^u}}, B^u, \sigma^u)$, listed in order of least elements.

In other words, each $B^u$ is an independent $k$-tuple of independent paintboxes based on $\nu$, and we index this sequence just as we index the vertices of a tree using $\mathcal{U}$. Likewise, each $\sigma^u$ is an independent $k$-tuple $(\sigma_1^u, \ldots, \sigma_k^u)$ of i.i.d. uniform permutations of $[k]$. The next state $T'$ is obtained from $T$ by a sequential branching procedure which starts at the root and progressively branches the roots of the subtrees restricted to each child of $T'$.
The children of $T'$ are given by $\{A^u,\ u \in \mathcal{U}\}$, and for each $n \geq 1$ the restriction of $T'$ to $[n]$ is $T'_{|[n]} = \{A^u \cap [n],\ u \in \mathcal{U}\}$, excluding the empty set. The genealogical branching procedure simultaneously generates sequences of trees on $\mathcal{T}_n$ for every $n \geq 1$. It should be plain that this construction is equivalent to that in section 3.2, since it uses the matrix construction of the CP$(\nu)$ transition probabilities on $\mathcal{P}_A^{(k)}$. The benefit of this construction is that it gives an explicit recipe which will be employed in the proofs of various properties of the CP$(\nu)$-based ancestral branching process in later sections.

Proposition 3.5.1. Let $T \mapsto T' \in \mathcal{T}^{(k)}$ be a transition generated by the above genealogical branching procedure. For $n \geq 1$, the finite-dimensional transition probability of the restricted transition $T_{|[n]} \mapsto T'_{|[n]}$ is
$$Q_n^\nu(T, T') := \prod_{b \in T' : \# b \geq 2} \frac{p_b^\nu(\Pi_{T_{|b}}, \Pi_{T'_{|b}})}{q_b^\nu(\Pi_{T_{|b}}, 1_b)}. \qquad (3.15)$$

Proof. Write $p_n(\cdot,\cdot) \equiv p_n^\nu(\cdot,\cdot)$ and $q_n(\cdot,\cdot) \equiv q_n^\nu(\cdot,\cdot)$. For $n \geq 1$, the branching of the root of $T'_{|[n]}$ given $T_{|[n]}$ is given by the children of $A^{u(M)}_{|[n]}$, where $u(m) := (\underbrace{1, \ldots, 1}_{m \text{ times}}, 0, \ldots) \in \mathcal{U}$ and $M$ is chosen to be the smallest $m \geq 0$ such that $A^{u(m)}_{|[n]} = 1_n$ and $A^{u(m+1)}_{|[n]} \neq 1_n$. If $M = 0$, we put $u(m) = \emptyset$, and $A^{u(m)}_{|[n]}$ is the root of $T'$ restricted to $[n]$. The distribution of the branching of the root of $T'_{|[n]}$ given $T_{|[n]}$ obtained in this way is
$$\sum_{i=0}^{\infty} p_n(\Pi_{T_{|[n]}}, \Pi_{T'_{|[n]}})\, p_n(\Pi_{T_{|[n]}}, 1_n)^i = \frac{p_n(\Pi_{T_{|[n]}}, \Pi_{T'_{|[n]}})}{q_n(\Pi_{T_{|[n]}}, 1_n)}.$$
By the independence of the steps of the procedure, we can write the distribution of the transition $T \mapsto T'$ recursively, as in (3.3), by
$$Q_n(T, T') = \frac{p_n(\Pi_T, \Pi_{T'})}{q_n(\Pi_T, 1_n)} \prod_{b \in \Pi_{T'}} Q_b(T_{|b}, T'_{|b}).$$
Iterating the above argument yields (3.15).

3.5.2 Equilibrium measure

The form of $Q_n^\nu(T, T')$ in (3.15) is a product of independent transition probabilities of the branching at the root in each of the subtrees of $T'$.
It is known that, for ν non-degenerate at (1, 0, . . . , 0) ∈ R↓(k), p_n^ν(·, ·) has a unique equilibrium distribution for each n ≥ 1 [40]. Since p_n^ν(B, B) > 0 for every n ≥ 1 and B ∈ P_{[n]}^(k), we have that Q_n^ν(t, t) > 0 for all t ∈ T_n^(k), and so each Q_n^ν(·, ·) is aperiodic for non-degenerate ν ∈ R↓(k). Irreducibility of Q_n^ν(·, ·) follows from the irreducibility of p_n^ν(·, ·) and the assignment of positive probability to B ↦ B for every B ∈ P_{[n]}^(k). The following proposition is immediate.

Proposition 3.5.2. Let ν be a probability measure on R↓(k) such that ν((1, 0, . . . , 0)) < 1 and let Q_n^ν(·, ·) be the CP(ν)-ancestral branching Markov kernel. Then there exists a unique measure ρ_n^ν(·) on T_n^(k) which is stationary for Q_n^ν(·, ·) for each n ≥ 1.

The existence of ρ_n^ν and the finite exchangeability and consistency of Q_n^ν for each n ≥ 1 induce finite exchangeability and consistency for the collection (ρ_n^ν, n ≥ 1) of equilibrium measures. The proof of the following proposition is identical to the proof of theorem 2.2.6, with the obvious changes of notation.

Proposition 3.5.3. Let (Q_n(·, ·), n ≥ 1) be an infinitely exchangeable collection of ancestral branching Markov kernels (3.2) on (T_n, n ≥ 1) and suppose that, for each n ≥ 1, ρ_n(·) is the unique stationary distribution for Q_n(·, ·). Then the family (ρ_n(·), n ≥ 1) is infinitely exchangeable. Moreover, there exists a unique measure ρ on T such that

    ρ({T ∈ T : T_{|[n]} = T_n}) = ρ_n(T_n).

Remark 3.5.4. The above results for the equilibrium measure ρ apply specifically to the CP(ν) ancestral branching process under the condition that ν is non-degenerate at (1, 0, . . . , 0) ∈ R↓(k). The above proposition can be extended to general infinitely exchangeable ancestral branching Markov chains by replacing the condition ν ≠ (1, 0, . . . , 0) with the condition that, for each n ≥ 1, p_n is irreducible and p_n(B, B) > 0 for every B ∈ P_{[n]}.

Remark 3.5.5.
There appears to be nothing particular to the ancestral branching, or tree-valued, process in our proof of theorem 2.2.6. In particular, for a general infinitely exchangeable collection of transition measures (P_n(·, ·), n ≥ 1) on a countably indexed projective system such that each P_n has unique stationary measure π_n, the collection (π_n, n ≥ 1) is infinitely exchangeable.

3.5.3 Continuous-time ancestral branching process

An infinitely exchangeable collection (Q_n, n ≥ 1) of ancestral branching transition probabilities in (3.2) can be embedded in continuous time in a straightforward way by defining the Markovian infinitesimal jump rates r_n(·, ·) on T_n:

    r_n(T, T′) = λ Q_n(T, T′) if T′ ≠ T, and r_n(T, T′) = 0 otherwise,    (3.16)

for some λ > 0. Since λ acts only as a scaling parameter for time, we can assume, without loss of generality, that λ = 1.

Definition 3.5.6. A process T := (T(t), t ≥ 0) is an ancestral branching Markov process if, for each n ≥ 1, the restriction T_{|[n]} := (T_{|[n]}(t), t ≥ 0) is a Markov process on T_n with infinitesimal transition rates r_n(·, ·) in (3.16) for (Q_n, n ≥ 1) as in (3.2).

A process on T whose finite-dimensional restrictions are governed by r_n can be constructed by running a Markov chain on T_n governed by (3.2) in which only transitions T ↦ T′ with T′ ≠ T are permitted, and adding hold times which are exponentially distributed with mean 1/[1 − Q_n(T, T)]. The following proposition is a corollary of theorems 3.3.1 and 3.4.1.

Corollary 3.5.7. For an infinitely exchangeable family (Q_n(·, ·), n ≥ 1) of ancestral branching kernels (3.2), the collection (R_n, n ≥ 1) of finite-dimensional Q-matrices based on (Q_n, n ≥ 1), with entries R_n(T, T′) = r_n(T, T′) as in (3.16) for T′ ≠ T and R_n(T, T) = Q_n(T, T) − 1, is infinitely exchangeable.

We write R_n^ν to denote the Q-matrix based on the transition kernel Q_n^ν in (3.15).
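The continuous-time embedding of (3.16), with λ = 1, can be sketched for a finite-state kernel: hold in the current state for an exponential time with rate 1 − Q(x, x), then jump according to Q conditioned off the diagonal; and the Q-matrix R = Q − I of corollary 3.5.7 has the same equilibrium as the jump chain, since ρQ = ρ implies ρ(Q − I) = 0. The 2-state kernel below is an arbitrary stand-in for Q_n on T_n, assuming Q(x, x) < 1 for every state.

```python
import random

def stationary(Q, iters=20_000):
    """Stationary row vector of a finite transition matrix by power iteration."""
    n = len(Q)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * Q[i][j] for i in range(n)) for j in range(n)]
    return pi

def embed_continuous_time(Q, x0, t_max, rng):
    """Embed a transition matrix Q in continuous time as in (3.16), lambda = 1:
    hold in state x for an Exponential time with rate 1 - Q[x][x]
    (mean 1/[1 - Q[x][x]]), then jump to y != x with probability
    Q[x][y] / (1 - Q[x][x]).  Assumes Q[x][x] < 1 for all x."""
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        t += rng.expovariate(1.0 - Q[x][x])          # exponential hold time
        if t >= t_max:
            return path
        ys = [y for y in range(len(Q)) if y != x]    # jump chain off the diagonal
        x = rng.choices(ys, weights=[Q[x][y] for y in ys], k=1)[0]
        path.append((t, x))

# a hypothetical 2-state kernel standing in for Q_n
Q = [[0.5, 0.5],
     [0.3, 0.7]]
rho = stationary(Q)
# rho (Q - I) = rho Q - rho = 0: the continuous-time process shares the
# jump chain's stationary measure
residual = [sum(rho[i] * (Q[i][j] - (i == j)) for i in range(2)) for j in range(2)]
```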
The existence of a continuous-time process with embedded jump chain governed by (3.15) is clear from corollary 3.5.7 and the discussion at the end of section 3.4.

Theorem 3.5.8. There exists a continuous-time Markov process (T(t), t ≥ 0) on (T^(k), σ(⋃_n T_n^(k))) governed by R^ν such that

    R^ν(T″, {T^∞ ∈ T^(k) : T^∞_{|[n]} = T′}) = R_n^ν(T, T′),

for every T″ ∈ {T* ∈ T^(k) : T*_{|[n]} = T}.

Proof. Corollary 3.5.7 establishes that the finite-dimensional infinitesimal jump rates (r_n^ν, n ≥ 1) are finitely exchangeable and consistent. Kolmogorov's extension theorem implies the existence of R^ν with finite-dimensional restrictions given by (R_n^ν, n ≥ 1). Furthermore, for each n ≥ 1 and T ∈ T_n, −R_n^ν(T, T) = 1 − Q_n^ν(T, T) < 1 < ∞, so the finite-dimensional paths are càdlàg for each n, which implies that the paths of (T(t), t ≥ 0) governed by R^ν are càdlàg.

The transition rates above are defined in terms of a collection of infinitely exchangeable transition probabilities (Q_n(·, ·), n ≥ 1). If Q_n has unique equilibrium measure ρ, then so does its associated continuous-time process. We have the following corollary for the stationary measure of the continuous-time process.

Corollary 3.5.9. Let (T(t), t ≥ 0) be a continuous-time process governed by an infinitely exchangeable collection (Q_n, n ≥ 1) of ancestral branching transition probabilities (3.2). If the characteristic measure Q on T has unique equilibrium measure ρ as in proposition 3.5.3, then (T(t), t ≥ 0) has unique equilibrium measure ρ.

We now restrict our attention to the CP(ν) subfamily of ancestral branching processes on T^(k) for fixed k ≥ 1. We index transition measures and stationary measures by ν, e.g. Q_n^ν, p_n^ν, etc., to make this explicit.

3.5.4 Poissonian construction

A consequence of the above continuous-time embedding and the alternative specification of the cut-and-paste ancestral branching algorithm given in section 3.5.1 is another construction via a Poisson point process.
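The driving object of such a construction can be sketched as a marked Poisson process on R+: atom times with i.i.d. exponential gaps, each atom carrying an independent mark, which here stands in for the collections of paintbox partitions attached to each atom. All names in this sketch are illustrative, and the mark sampler is a hypothetical conservative mass partition in place of a draw from the product-of-paintboxes intensity.

```python
import random

def marked_poisson(t_max, mark_sampler, rng, rate=1.0):
    """Atoms of a rate-`rate` Poisson point process on (0, t_max], each
    carrying an independent mark from mark_sampler; the gaps between
    successive atom times are i.i.d. Exponential(rate)."""
    atoms, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t > t_max:
            return atoms
        atoms.append((t, mark_sampler(rng)))

def mass_mark(rng, k=3):
    """A hypothetical mark: a ranked conservative mass partition."""
    w = [rng.random() for _ in range(k)]
    s = sum(w)
    return tuple(sorted((x / s for x in w), reverse=True))

atoms = marked_poisson(10.0, mass_mark, random.Random(2))
```

At each atom time the process jumps according to the mark; between atoms it is constant, matching the hold-and-jump description of section 3.5.3.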
Let P = {(t, (B^u : u ∈ U))} ⊂ R+ × ∏_{u∈U} [∏_{j=1}^k P^(k)] be a Poisson point process with intensity measure dt ⊗ ⨂_{u∈U} ϱ_ν^(k), where ϱ_ν^(k) is the k-fold product measure ϱ_ν ⊗ · · · ⊗ ϱ_ν on ∏_{j=1}^k P^(k). For each (t, B^u) ∈ P, B^u := (B_1^u, . . . , B_k^u) ∈ ∏_{j=1}^k P^(k) is distributed as ϱ_ν^(k) and is labeled according to the genealogical indexing system of section 3.5.1.

Construct a continuous-time CP(ν)-ancestral branching Markov process as follows. Let τ ∈ T^(k) be an infinitely exchangeable random fragmentation tree. For each n ≥ 1, put T_{|[n]}(0) = τ_{|[n]} and, for t > 0,

• if t is not an atom time for P, then T_{|[n]}(t) = T_{|[n]}(t−);

• if t is an atom time for P, so that (t, (B^u : u ∈ U)) ∈ P, generate σ := (σ^u : u ∈ U) ∈ ∏_{u∈U} [∏_{j=1}^k S_k], an i.i.d. collection of k-tuples of uniform permutations of [k]. Put T := T(t−) and T′ equal to the tree constructed from T, {B^u : u ∈ U} and σ through the function CP(·, ·, ·) described in section 2.2. If T′_{|[n]} ≠ T_{|[n]}, put T_{|[n]}(t) = T′_{|[n]}; otherwise, put T_{|[n]}(t) = T_{|[n]}(t−).

Proposition 3.5.10. The above process T is a Markov process on T^(k) with transition matrix R^ν defined by theorem 3.5.8.

Proof. By the above construction, for every n ≥ 1 and t > 0, T_{|[n]}(t) evolves according to r_n^ν in (3.16), D_{m,n} T_{|[n]}(t) = T_{|[m]}(t) for all m ≤ n, and T_{|[p]}(t) ∈ D_{n,p}^{−1}(T_{|[n]}(t)) for all p > n. Hence, the restriction T_{|[n]} is an R_n^ν-governed Markov process for each n ≥ 1 and the result is clear by consistency of R_n^ν (corollary 3.5.7).

3.5.5 Feller process

In section 2.3.1, we showed that the cut-and-paste process with finite-dimensional Markovian jump rates corresponding to the transition probabilities in (2.1) is a Feller process. We now show that the ancestral branching Markov process on T^(k) which is associated to the CP(ν) Markov process is also Fellerian.

Define the metric d : T × T → R+ by

    d(T, T′) := 1/max{n ∈ N : T_{|[n]} = T′_{|[n]}},    (3.17)

for every T, T′ ∈ T, with the convention that 1/∞ = 0.
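The metric (3.17) depends only on how long the finite restrictions of two trees agree, so it can be sketched on abstract stand-ins: represent a tree by the tuple of its restrictions and measure the common prefix. This is an illustrative sketch (the representation and the handling of an empty common prefix are choices made here, not the thesis's), together with a randomized check of the ultrametric inequality.

```python
import random

def tree_metric(T1, T2):
    """d(T, T') = 1/max{n : T|[n] = T'|[n]} from (3.17), with 1/inf = 0.
    A 'tree' is represented abstractly by the tuple of its finite
    restrictions (T|[1], T|[2], ...); any hashable stand-ins work here."""
    n = 0
    for a, b in zip(T1, T2):
        if a != b:
            break
        n += 1
    if n == len(T1) and n == len(T2):
        return 0.0                     # all restrictions agree: 1/inf = 0
    return 1.0 if n == 0 else 1.0 / n  # empty common prefix treated as distance 1

# randomized check of the ultrametric inequality
rng = random.Random(7)
trees = [tuple(rng.choice("ab") for _ in range(8)) for _ in range(30)]
for T in trees:
    for U in trees:
        for V in trees:
            assert tree_metric(T, U) <= max(tree_metric(T, V), tree_metric(U, V))
```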
Proposition 3.5.11. d is an ultrametric on T. That is, for any T, T′, T″ ∈ T,

    d(T, T′) ≤ max(d(T, T″), d(T′, T″)).

Proof. Positivity and symmetry are obvious. To see that the ultrametric inequality holds, let T, T′, T″ ∈ T with d(T, T′) = 1/a for some a ≥ 1, so that T_{|[a]} = T′_{|[a]} but T_{|[a+1]} ≠ T′_{|[a+1]}. Suppose first that d(T, T″) = 1/b ≥ 1/a. Then the ultrametric inequality is trivially satisfied. If d(T, T″) = 1/b < 1/a, then b > a and T_{|[b]} = T″_{|[b]}; in particular, T″_{|[a]} = T_{|[a]} = T′_{|[a]} and T″_{|[a+1]} = T_{|[a+1]} ≠ T′_{|[a+1]}. Hence d(T′, T″) = 1/a and the ultrametric inequality holds.

Proposition 3.5.12. (T, d) is a compact space.

Proof. Let (T^1, T^2, . . .) be a sequence in T. Any element T ∈ T can be written as a compatible sequence of finite-dimensional restrictions, T := (T_{|[1]}, T_{|[2]}, . . .) := (T_1, T_2, . . .). The set T_n is finite for each n, and so one can extract a convergent subsequence (T^(1), T^(2), . . .) of (T^1, T^2, . . .) by the diagonal procedure such that d(T^(i), T^(j)) ≤ 1/min{i, j} for all i, j.

Lemma 3.5.13. C_f := {f : T → R : ∃ n ∈ N s.t. d(T, T′) ≤ 1/n ⇒ f(T) = f(T′)} is dense in the space of continuous functions T → R under the metric ρ(f, f′) := sup_{τ∈T} |f(τ) − f′(τ)|.

Proof. Let ϕ : T → R be a continuous function. Then for every ε > 0 there exists n(ε) ∈ N such that τ, τ′ ∈ T satisfying d(τ, τ′) ≤ 1/n(ε) implies |ϕ(τ) − ϕ(τ′)| ≤ ε. For fixed ε > 0, let N = n(ε) and define f : T → R as follows. First, partition T into equivalence classes {τ ∈ T : τ_{|[N]} = t_{|[N]}} for each t ∈ T. For each equivalence class U, choose a representative element ũ ∈ U and put f(u) := ϕ(ũ) for all u ∈ U, so that f ∈ C_f. For any t ∈ T, let t̃ denote the representative of t obtained in this way. Hence, f(t) = f(t′) = f(t̃) for all t, t′ such that d(t, t′) ≤ 1/N. Thus, |f(τ) − ϕ(τ)| = |ϕ(τ̃) − ϕ(τ)| ≤ ε for all τ ∈ T by continuity of ϕ, and ρ(f, ϕ) = sup_τ |f(τ) − ϕ(τ)| ≤ ε, which establishes density of C_f.

Let P_t be the semigroup of a CP(ν)-ancestral branching process T(·), i.e.
for any continuous ϕ : T^(k) → R,

    P_t ϕ(τ) := E_τ[ϕ(T(t))],

the expectation of ϕ(T(t)) given T(0) = τ.

Corollary 3.5.14. A CP(ν)-ancestral branching Markov process has the Feller property, i.e.

• for every continuous function ϕ : T^(k) → R and τ ∈ T^(k), one has lim_{t↓0} P_t ϕ(τ) = ϕ(τ), and

• for all t > 0, τ ↦ P_t ϕ(τ) is continuous.

Proof. The proof follows the same line of reasoning as corollary 2.3.6. Let ϕ be a continuous function T^(k) → R. For g ∈ C_f, lim_{t↓0} P_t g(τ) = g(τ) is clear since the first jump time of T(·) is exponential with finite mean. Density of C_f establishes the first point.

For the second point, let n ≥ 1 and τ, τ′ ∈ T^(k) be such that d(τ, τ′) ≤ 1/n, i.e. τ_{|[n]} = τ′_{|[n]}. Use the same Poisson point process P as in section 3.5.4 to construct T(·) and T′(·) such that T(0) = τ and T′(0) = τ′. By construction, T_{|[n]} = T′_{|[n]} and d(T(t), T′(t)) ≤ 1/n for all t ≥ 0. Hence, for any continuous ϕ, τ ↦ P_t ϕ(τ) is continuous.

By corollary 3.5.14, we can characterize the CP(ν)-ancestral branching Markov process (T(t), t ≥ 0) with finite-dimensional rates (r_n^ν(·, ·), n ≥ 1) by its infinitesimal generator G, given by

    G(f)(τ) = ∫_{T^(k)} (f(τ′) − f(τ)) R^ν(τ, dτ′)

for every f ∈ C_f, where R^ν is from theorem 3.5.8.

3.6 Mass fragmentations

A mass fragmentation of x ∈ R+ is a collection M_x of masses such that

(i) x ∈ M_x, and

(ii) there are m_1, . . . , m_k ∈ M_x such that ∑_{i=1}^k m_i ≤ x and M_x = {x} ∪ M_{m_1} ∪ · · · ∪ M_{m_k}.

We write M_x to denote the space of mass fragmentations of x. Essentially, a mass fragmentation of x is a fragmentation tree whose vertices are labeled by masses such that the children of a vertex comprise a ranked-mass partition of its parent vertex. The case where the children {m_1, . . . , m_k} of a vertex m satisfy ∑_{i=1}^k m_i < m is called a dissipative mass fragmentation. We are interested in conservative mass fragmentations, which have the property that the children {m_1, . . . , m_k} of every vertex m ∈ M_x satisfy ∑_{i=1}^k m_i = m.
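A conservative mass fragmentation can be sketched by recursively splitting mass along genealogical indices. This is a minimal illustration of the definition only: the splitting rule `split_mass` is a hypothetical conservative dislocation, standing in for a draw from a measure ν supported on conservative mass partitions, and the names are invented for the sketch.

```python
import random

def split_mass(m, k, rng):
    """Split mass m into k nonnegative pieces summing to m, ranked in
    decreasing order -- a hypothetical conservative dislocation."""
    w = [rng.random() for _ in range(k)]
    tot = sum(w)
    return sorted((m * x / tot for x in w), reverse=True)

def mass_fragmentation(m, k, depth, rng):
    """Build a conservative mass fragmentation of m down to `depth`
    generations, as a dict mapping a genealogical index u (a tuple, with the
    root ()) to the mass of vertex u."""
    frag = {(): m}
    frontier = [()]
    for _ in range(depth):
        nxt = []
        for u in frontier:
            for j, piece in enumerate(split_mass(frag[u], k, rng)):
                frag[u + (j,)] = piece
                nxt.append(u + (j,))
        frontier = nxt
    return frag

frag = mass_fragmentation(1.0, 3, 3, random.Random(5))
# conservative: at every internal vertex the children's masses sum to the
# parent's mass, and each sibling set is ranked
```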
It is plain that M_x is isomorphic to M_1 by scaling, i.e. M_x = x M_1, and so it suffices to study M_1. See Bertoin [25] for a study of Markov processes on M_1 called fragmentation chains. We construct a Markov process on M_1 which corresponds to the associated mass-fragmentation-valued process of the CP(ν)-ancestral branching Markov process on T^(k).

Recall from definition 1.8.2 in section 1.8.1 that a partition B = {B_1, B_2, . . .} ∈ P is said to possess asymptotic frequency ||B|| if each of its blocks has an asymptotic frequency, and we write ||B|| := (||B_1||, . . .)↓ ∈ R↓, the decreasing rearrangement of the block frequencies of B. According to Kingman's correspondence (theorems 1.8.1 and 1.8.3), any infinitely exchangeable partition B of N possesses asymptotic frequencies which are distributed according to ν, where ν is the unique measure on R↓ such that B ∼ ϱ_ν.

3.6.1 Associated mass fragmentation process

Fix k ≥ 2 and let ν be a probability measure on R↓(k). Let M_1^(k) := {µ ∈ M_1 : # frag(A) ≤ k for every A ∈ µ} be the subspace of conservative mass fragmentations of 1 such that each A ∈ µ ∈ M_1^(k) has at most k children.

Construct a Markov chain on M_1^(k) as follows. For µ ∈ M_1^(k), the transition µ ↦ µ̃ ∈ M_1^(k) is generated by an i.i.d. collection S := {s^u : u ∈ U} of ν mass partitions, i.e. each s^u := (s_1^u, . . . , s_k^u) ∈ ∏_{i=1}^k R↓(k) is an i.i.d. collection of mass partitions distributed according to ν, with s^w independent of s^v for all w ≠ v, and an i.i.d. collection Σ := {σ^u : u ∈ U} of k-tuples of i.i.d. uniform permutations of [k].

(i) Write µ := {µ^u : u ∈ U} and µ̃ := {µ̃^u : u ∈ U}.

(ii) Put µ̃^∅ = 1, the root of µ̃.

(iii) Given µ̃^u ∈ µ̃, put µ̃^{uj} equal to the jth largest column total of the k × k matrix whose ith row has entries

    µ̃^u µ^i s^u_{i,σ_i^u(1)},  µ̃^u µ^i s^u_{i,σ_i^u(2)},  . . . ,  µ̃^u µ^i s^u_{i,σ_i^u(k)},

i.e.

    µ̃^{uj} := (∑_{i=1}^k µ̃^u µ^i s^u_{i,σ_i^u(m)}, m = 1, . . . , k)↓_j,

where µ^1, . . . , µ^k correspond to the mass fragmentation of the root of µ.

Definition 3.6.1. For a fragmentation tree T ∈ T, we write M(T) to denote the associated mass fragmentation of T, i.e. the mass fragmentation of 1 obtained by replacing each child of T by its asymptotic frequency, if it exists. If M(T) does not exist, we put M(T) = ∂, an extra point added to M_1.

The map M : T → M_1 ∪ {∂} such that T ↦ M(T) is a natural measurable mapping. For the remainder of this chapter, we equip M_1 with the sigma-field σ(M) generated by {m ∈ M_1} ∪ {∂}, which makes the map M measurable between σ(⋃_n T_n^(k)) and σ(M_1).

Theorem 3.6.2. Let T := (T_n, n ≥ 0) be a CP(ν)-ancestral branching Markov chain with transition measure Q^ν(·, ·) on T^(k) and with initial distribution τ, some infinitely exchangeable measure on T. Let µ := (µ_n, n ≥ 1) be the Markov chain on M_1^(k) generated by the above procedure. Then M(T) =_L µ. Moreover, the transition measure λ^ν(·, ·), for µ ∈ M_1 and C ∈ σ(M), is

    λ^ν(µ, C) = Q^ν(T_µ, M^{−1}(C)),

where T_µ is any element of M^{−1}(µ) := {T ∈ T^(k) : M(T) = µ}.

Proof. Fix k ≥ 2 and ν a probability measure on R↓(k) such that ν((1, 0, . . . , 0)) < 1. For T ∼ Q^ν(·, ·) with initial distribution T_0 ∼ τ, the set of children {t_1, . . . , t_m} of t forms an exchangeable partition of {t} ⊂ N for every n ≥ 1 and t ∈ T_n, and so possesses asymptotic frequency ||t|| almost surely by Kingman's correspondence.

The construction of the Markov chain T with transition measure Q^ν(·, ·) given in section 3.5.1 can also be specified as follows. Let S := {s^u : u ∈ U} be the collection of mass partitions in the construction at the beginning of this section. Given S, generate B := {B^u : u ∈ U} ∈ ∏_{u∈U} [∏_{i=1}^k P^(k)] by letting B^u := (B_1^u, . . . , B_k^u) with B_j^u ∼ ϱ_{s_j^u} independently of all other B^v.
Constructed in this way, {B^u : u ∈ U} is a collection of independent ϱ_ν^(k) partitions whose asymptotic frequencies satisfy ||B_j^u|| = s_j^u almost surely. Furthermore, the unconditional distribution of each B^u is ϱ_ν^(k).

Next, we let Σ := {σ^u : u ∈ U} be a collection of i.i.d. k-tuples of i.i.d. uniform permutations of [k] and generate transitions of T from the construction of section 3.5.1 based on Σ and {B^u : u ∈ U}. Using the same Σ and S, generate a Markov chain µ on M_1 as above. Then T is a Markov chain with transition measure Q^ν(·, ·) on (T^(k), σ(⋃_{n≥1} T_n^(k))) and, furthermore, the associated mass fragmentation chain M(T) := (M(T_n), n ≥ 1) is equal to µ almost surely.

By the construction of transitions on M_1 at the beginning of this section, it is clear that µ is a Markov chain. Hence, M(T) is a Markov chain, and so the result of Burke and Rosenblatt [36] states that it is necessary that the transition measure of M(T) satisfy

    Q^ν M^{−1}(m, C) = ∫_{M^{−1}(C)} Q(T_m, dt)

for all T_m ∈ M^{−1}(m) := {T ∈ T^(k) : M(T) = m} and C ∈ σ(M). Finally, since M(T) = µ almost surely and M is measurable, the transition measure λ^ν of µ on M_1 satisfies λ^ν = Q^ν M^{−1}.

Corollary 3.6.3. The associated mass fragmentation process M(T) of a CP(ν)-ancestral branching process on T^(k) exists almost surely.

3.6.2 Equilibrium measure

As in section 3.5.2, suppose ν is non-degenerate at (1, 0, . . . , 0) ∈ R↓(k). Proposition 3.5.3 states that a Markov chain T := (T_n, n ≥ 1) governed by Q^ν(·, ·) possesses a unique equilibrium measure ρ^ν(·). The following theorem follows immediately from this fact and from theorem 3.6.2.

Theorem 3.6.4. Let ν be a probability measure on R↓(k) such that ν((1, 0, . . . , 0)) < 1. The mass fragmentation chain µ := (µ_n, n ≥ 1) on M_1 governed by Q^ν M^{−1}(·, ·) possesses a unique stationary measure ζ^ν(·).
Moreover, for µ ∈ M_1^(k),

    ζ^ν(µ) = ρ^ν(M^{−1}(µ)),

where ρ^ν(·) is the unique equilibrium measure of Q^ν(·, ·) on T^(k) from proposition 3.5.3.

Proof. Let µ be a Markov chain on M_1 with transition measure λ^ν(·, ·), governed by the transition procedure at the beginning of section 3.6. By theorem 3.6.2, λ^ν ≡ Q^ν M^{−1}, where Q^ν(·, ·) is the transition measure of the CP(ν)-ancestral branching Markov chain on T^(k) with unique equilibrium measure ρ^ν(·) from proposition 3.5.3. It is shown in theorem 3.6.2 that µ is equal in distribution to the associated mass fragmentation chain of a Markov chain on T^(k) governed by Q^ν(·, ·). Hence, for B ∈ σ(⋃_n T_n^(k)),

    ρ^ν(B) = ∫_{T^(k)} Q^ν(τ, B) ρ^ν(dτ),

and for C ∈ σ(M),

    ρ^ν M^{−1}(C) = ρ^ν[M^{−1}(C)]
                  = ∫_{T^(k)} Q^ν(τ, M^{−1}(C)) ρ^ν(dτ)
                  = ∫_{M_1} Q^ν M^{−1}(µ, C) ρ^ν M^{−1}(dµ)
                  = ∫_{M_1} λ^ν(µ, C) ρ^ν M^{−1}(dµ),

which shows that ζ^ν := ρ^ν M^{−1} is stationary for λ^ν. Uniqueness of ρ^ν implies uniqueness of ζ^ν.

3.6.3 Poissonian construction

Just as the CP(ν)-ancestral branching process on T^(k) admits a Poissonian construction of the T^(k)-valued process in continuous time (section 3.5.4), so does its associated mass-fragmentation-valued process.

Let ν be a probability measure on R↓(k). Let S = {(t, (s^u : u ∈ U))} ⊂ R+ × ∏_{u∈U} [∏_{i=1}^k R↓(k)] be a Poisson point process with intensity dt ⊗ ⨂_{u∈U} ν^(k), where ν^(k) := ν ⊗ · · · ⊗ ν is the k-fold product measure on ∏_{i=1}^k R↓(k) and s^u := (s_1^u, . . . , s_k^u) ∈ ∏_{i=1}^k R↓(k) for each u ∈ U.

Construct a Markov process µ := (µ(t), t ≥ 0) in continuous time on M_1 as follows. Let µ_0 ∈ M_1 be a random mass fragmentation. Put µ(0) = µ_0 and, for t > 0,

• if t is not an atom time for S, µ(t) = µ(t−);

• if t is an atom time for S, generate Σ_t := {σ^u : u ∈ U}, where σ^v and σ^w are independent for all v ≠ w and σ^u := (σ_1^u, . . . , σ_k^u) is an i.i.d. sequence of uniform permutations of [k] for each u ∈ U.
Given (t, s^u) ∈ S, σ^u and µ(t−) = {µ^u : u ∈ U}, put µ(t) = {µ̃^u : u ∈ U}, where

1) µ̃^∅ = 1, and

2) given µ̃^u, µ̃^{uj} equals the jth largest column total of the k × k matrix whose ith row has entries

    µ̃^u µ^i s^u_{i,σ_i^u(1)},  µ̃^u µ^i s^u_{i,σ_i^u(2)},  . . . ,  µ̃^u µ^i s^u_{i,σ_i^u(k)},

i.e.

    µ̃^{uj} := (∑_{i=1}^k µ̃^u µ^i s^u_{i,σ_i^u(m)}, m = 1, . . . , k)↓_j.

Theorem 3.6.5. Let T := (T(t), t ≥ 0) be a CP(ν)-ancestral branching Markov process from section 3.5.3 and let X := (X(t), t ≥ 0) be the Markov process on M_1 generated from the above Poisson point process. Then M(T) =_L X.

Proof. Let k ∈ N and let ν be a measure on R↓(k). Let S = {(t, (s^u : u ∈ U))} ⊂ R+ × ∏_{u∈U} [∏_{i=1}^k R↓(k)] be a Poisson point process with intensity dt ⊗ ⨂_{u∈U} ν^(k), as above, and let X := (X(t), t ≥ 0) be the process on M_1 constructed above. Given S, generate P := {(t, B^u) : u ∈ U} ⊂ R+ × ∏_{u∈U} [∏_{i=1}^k P^(k)], where for each (t, (s^u : u ∈ U)) ∈ S we let B^u := (B_1^u, . . . , B_k^u) ∈ ∏_{i=1}^k P^(k) be a k-tuple of partitions such that B_i^u ∼ ϱ_{s_i^u} for each i = 1, . . . , k, with all components independent. Thus, P is a Poisson point process on R+ × ∏_{u∈U} [∏_{i=1}^k P^(k)] with intensity measure dt ⊗ ⨂_{u∈U} ϱ_ν^(k). Given P and S, generate Σ := {σ^u : u ∈ U} independently of P and S such that σ^v and σ^w are independent for all v ≠ w and each σ^u = (σ_1^u, . . . , σ_k^u) is an i.i.d. collection of uniform permutations of [k].

Let T := (T(t), t ≥ 0) be the process on T^(k) constructed from Σ and P, as shown in section 3.5.4, so that T is a CP(ν)-ancestral branching Markov process. Likewise, let X := (X(t), t ≥ 0) be the process on M_1 constructed from Σ and S as above. Now, for any atom time t ≥ 0, let T(t−) = τ and T(t) = τ̃. From section 3.5.1, let u ∈ U and τ̃^{uj} = CP(Π_{τ|τ^u}, B^u, σ^u).
Since τ and τ̃ are infinitely exchangeable and #τ^u = ∞ almost surely, we have that Π_{τ|τ^u} = (Π_τ)_{|τ^u} almost surely, and hence (Π_τ)_{|τ^u} ≠ 1_{τ^u} almost surely. Therefore, we have

    τ̃^{uj} = τ̃^u ∩ ⋃_{i=1}^k (τ^i ∩ B^u_{i,σ_i^u(j)})

for each u ∈ U and j = 1, . . . , k, which has asymptotic frequency

    ||τ̃^{uj}|| = ||τ̃^u|| ∑_{i=1}^k ||τ^i|| ||B^u_{i,σ_i^u(j)}|| = µ̃^u ∑_{i=1}^k µ^i s^u_{i,σ_i^u(j)}  a.s.

Hence, X = M(T) almost surely, and so X =_L M(T).

Corollary 3.6.6. The process M(T) := (M(T(t)), t ≥ 0) exists almost surely.

3.7 Weighted trees