Random Permutations and Partition Models

Peter McCullagh
John D. MacArthur Distinguished Service Professor
University of Chicago, USA

Miodrag Lovric (ed.), International Encyclopedia of Statistical Science, © Springer-Verlag Berlin Heidelberg

Set Partitions

For $n \ge 1$, a partition $B$ of the finite set $[n] = \{1, \ldots, n\}$ is

● A collection $B = \{b_1, \ldots\}$ of disjoint non-empty subsets, called blocks, whose union is $[n]$
● An equivalence relation or Boolean function $B\colon [n] \times [n] \to \{0, 1\}$ that is reflexive, symmetric and transitive
● A symmetric Boolean matrix such that $B_{ij} = 1$ if $i, j$ belong to the same block

These equivalent representations are not distinguished in the notation, so $B$ is a set of subsets, a matrix, a Boolean function, or a subset of $[n] \times [n]$, as the context demands. In practice, a partition is sometimes written in an abbreviated form, such as $B = 2|13$ for a partition of $[3]$. In this notation, the five partitions of $[3]$ are

$$123, \quad 12|3, \quad 13|2, \quad 23|1, \quad 1|2|3.$$

The blocks are unordered, so $2|13$ is the same partition as $13|2$ and $2|31$.

A partition $B$ is a sub-partition of $B^*$ if each block of $B$ is a subset of some block of $B^*$ or, equivalently, if $B_{ij} = 1$ implies $B^*_{ij} = 1$. This relationship is a partial order denoted by $B \le B^*$, which can be interpreted as $B \subset B^*$ if each partition is regarded as a subset of $[n]^2$. The partition lattice $\mathcal{E}_n$ is the set of partitions of $[n]$ with this partial order. To each pair of partitions $B, B'$ there corresponds a greatest lower bound $B \wedge B'$, which is the set intersection or Hadamard component-wise matrix product. The least upper bound $B \vee B'$ is the least element that is greater than both, the transitive completion of $B \cup B'$. The least element of $\mathcal{E}_n$ is the partition $0_n$ with $n$ singleton blocks, and the greatest element is the single-block partition denoted by $1_n$.

A permutation $\sigma\colon [n] \to [n]$ induces an action $B \mapsto B^\sigma$ by composition such that $B^\sigma(i, j) = B(\sigma(i), \sigma(j))$. In matrix notation, $B^\sigma = \sigma B \sigma^{-1}$, so the action by conjugation permutes both the rows and columns of $B$ in the same way. The block sizes are preserved and are maximally invariant under conjugation. In this way, the 15 partitions of $[4]$ may be grouped into five orbits or equivalence classes as follows, where a bracketed number gives the size of an orbit containing more than one partition:

$$1234, \quad 123|4\ [4], \quad 12|34\ [3], \quad 12|3|4\ [6], \quad 1|2|3|4.$$

Thus, for example, $12|34$ is the representative element for one orbit, which also includes $13|24$ and $14|23$.

The symbol $\#$ applied to a set denotes the number of elements, so $\#B$ is the number of blocks, and $\#b$ is the size of block $b \in B$. If $\mathcal{E}_n$ is the set of equivalence relations on $[n]$, or the set of partitions of $[n]$, the first few values of $\#\mathcal{E}_n$ are 1, 2, 5, 15, 52, called Bell numbers. More generally, $\#\mathcal{E}_n$ is the $n$th moment of the unit Poisson distribution whose exponential generating function is $\exp(e^t - 1)$. In the discussion of explicit probability models on $\mathcal{E}_n$, it is helpful to use the ascending and descending factorial symbols

$$\alpha^{\uparrow r} = \alpha(\alpha + 1) \cdots (\alpha + r - 1) = \Gamma(r + \alpha)/\Gamma(\alpha)$$
$$k^{\downarrow r} = k(k - 1) \cdots (k - r + 1)$$

for integer $r \ge 0$. Note that $k^{\downarrow r} = 0$ for positive integers $r > k$. By convention $\alpha^{\uparrow 0} = 1$.

Partition Model

The term partition model refers to a probability distribution, or family of probability distributions, on the set $\mathcal{E}_n$ of partitions of $[n]$. In some cases, the probability is concentrated on the subset $\mathcal{E}_n^k \subset \mathcal{E}_n$ of partitions having $k$ or fewer blocks. A distribution on $\mathcal{E}_n$ such that $p_n(B) = p_n(\sigma B \sigma^{-1})$ for every permutation $\sigma\colon [n] \to [n]$ is said to be finitely exchangeable. Equivalently, $p_n$ is exchangeable if $p_n(B)$ depends only on the block sizes of $B$.

Historically, the most important examples are Dirichlet-multinomial random partitions generated for fixed $k$ in three steps as follows.

● First generate the random probability vector $\pi = (\pi_1, \ldots, \pi_k)$ from the Dirichlet distribution with parameter $(\theta_1, \ldots, \theta_k)$.
● Given $\pi$, the sequence $Y_1, \ldots, Y_n, \ldots$ is independent and identically distributed, each component taking values in $\{1, \ldots, k\}$ with probability $\pi$. Each sequence of length $n$ in which the value $r$ occurs $n_r \ge 0$ times has probability
$$\frac{\Gamma(\theta_\cdot) \prod_{j=1}^{k} \theta_j^{\uparrow n_j}}{\Gamma(n + \theta_\cdot)},$$
where $\theta_\cdot = \sum_j \theta_j$.
● Now forget the labels $1, \ldots, k$ and consider only the partition $B$ generated by the sequence $Y$, i.e. $B_{ij} = 1$ if $Y_i = Y_j$. The distribution is exchangeable, but an explicit simple formula is available only for the uniform case $\theta_j = \lambda/k$, which is now assumed. The number of sequences generating the same partition $B$ is $k^{\downarrow \#B}$, and these have equal probability in the uniform case. Consequently, the induced partition has probability
$$p_{nk}(B, \lambda) = k^{\downarrow \#B} \, \frac{\Gamma(\lambda) \prod_{b \in B} (\lambda/k)^{\uparrow \#b}}{\Gamma(n + \lambda)}, \qquad (1)$$
called the uniform Dirichlet-multinomial partition distribution. The factor $k^{\downarrow \#B}$ ensures that partitions having more than $k$ blocks have zero probability.

In the limit as $k \to \infty$, the uniform Dirichlet-multinomial partition becomes

$$p_n(B, \lambda) = \frac{\lambda^{\#B} \prod_{b \in B} \Gamma(\#b)}{\lambda^{\uparrow n}}. \qquad (2)$$

This is the celebrated Ewens distribution, or Ewens sampling formula, which arises in population genetics as the partition generated by allele type in a population evolving according to the Fisher–Wright model by random mutation with no selective advantage of allele types (Ewens 1972). The preceding derivation, a version of which can be found in Kingman (1980), goes back to Watterson (1974). The Ewens partition is the same as the partition generated by a sequence drawn according to the Blackwell-MacQueen urn scheme (Blackwell and MacQueen 1973).

Although the derivation makes sense only if $k$ is a positive integer, the distribution (1) is well defined for negative values $-\lambda < k < 0$. For a discussion of this and the connection with GEM distributions and Poisson-Dirichlet distributions, see Pitman (2006).

Partition Processes and Partition Structures

Deletion of element $n$ from the set $[n]$, or deletion of the last row and column from $B \in \mathcal{E}_n$, determines a map $D_n\colon \mathcal{E}_n \to \mathcal{E}_{n-1}$, a projection from the larger to the smaller lattice. These deletion maps make the sets $\{\mathcal{E}_1, \mathcal{E}_2, \ldots\}$ into a projective system

$$\cdots \; \mathcal{E}_{n+1} \xrightarrow{D_{n+1}} \mathcal{E}_n \xrightarrow{D_n} \mathcal{E}_{n-1} \; \cdots$$

A family $p = (p_1, p_2, \ldots)$ in which $p_n$ is a probability distribution on $\mathcal{E}_n$ is said to be mutually consistent, or Kolmogorov-consistent, if each $p_{n-1}$ is the marginal distribution obtained from $p_n$ under deletion of element $n$ from the set $[n]$. In other words, $p_{n-1}(A) = p_n(D_n^{-1} A)$ for $A \subset \mathcal{E}_{n-1}$. Kolmogorov consistency guarantees the existence of a random partition of the natural numbers whose finite restrictions are distributed as $p_n$. The partition is infinitely exchangeable if each $p_n$ is finitely exchangeable. Some authors, for example Kingman (1980), refer to $p$ as a partition structure.

An exchangeable partition process may be generated from an exchangeable sequence $Y_1, Y_2, \ldots$ by the transformation $B_{ij} = 1$ if $Y_i = Y_j$ and zero otherwise. The Dirichlet-multinomial and the Ewens processes are generated in this way. Kingman's (1978) paintbox construction shows that every exchangeable partition process may be generated from an exchangeable sequence in this manner.

Let $B$ be an infinitely exchangeable partition, $B[n] \sim p_n$, let $B^*$ be a fixed partition in $\mathcal{E}_n$, and suppose that the event $B[n] \le B^*$ occurs. Then $B[n]$ lies in the lattice interval $[0_n, B^*]$, which means that $B[n] = B[b_1] \,|\, B[b_2] \,|\, \ldots$ is the concatenation of partitions of the blocks $b \in B^*$. For each block $b \in B^*$, the restriction $B[b]$ is distributed as $p_{\#b}$, so it is natural to ask whether, and under what conditions, the blocks of $B^*$ are partitioned independently given $B[n] \le B^*$. Conditional independence implies that

$$p_n(B \mid B[n] \le B^*) = \prod_{b \in B^*} p_{\#b}(B[b]), \qquad (3)$$

which is a type of non-interference or lack-of-memory property not dissimilar to that of the exponential distribution on the real line. It is straightforward to check that the condition is satisfied by (2) but not by (1). Aldous (1996) shows that conditional independence uniquely characterizes the Ewens family.

Chinese Restaurant Process

A partition process is a random partition $B \sim p$ of a countably infinite set $\{u_1, u_2, \ldots\}$, and the restriction $B[n]$ of $B$ to $\{u_1, \ldots, u_n\}$ is distributed as $p_n$. The conditional distribution of $B[n+1]$ given $B[n]$ is determined by the probabilities assigned to those events in $\mathcal{E}_{n+1}$ that are consistent with $B[n]$, i.e. the events $u_{n+1} \mapsto b$ for $b \in B$ and $b = \emptyset$. For the uniform Dirichlet-multinomial model (1), these are

$$\operatorname{pr}(u_{n+1} \mapsto b \mid B[n] = B) = \begin{cases} (\#b + \lambda/k)/(n + \lambda) & b \in B \\ \lambda(1 - \#B/k)/(n + \lambda) & b = \emptyset. \end{cases} \qquad (4)$$

In the limit as $k \to \infty$, we obtain

$$\operatorname{pr}(u_{n+1} \mapsto b \mid B[n] = B) = \begin{cases} \#b/(n + \lambda) & b \in B \\ \lambda/(n + \lambda) & b = \emptyset, \end{cases} \qquad (5)$$

which is the conditional probability for the Ewens process.

... with probability (4) or (5), as determined by the partition process. If the table is occupied, the new arrival sits to the left of one customer selected uniformly at random from the table occupants. The random permutation thus generated is $j \mapsto \sigma(j)$ from $j$ to the left neighbour $\sigma(j)$.

Provided that the partition process is consistent and exchangeable, the distributions $p_n$ on permutations of $[n]$ are exchangeable and mutually consistent under the projection $\Pi_n \to \Pi_{n-1}$ on permutations in which element $n$ is deleted from the cyclic representation (Pitman 2006). In this way, every infinitely exchangeable ran-
