School of Computer Science
Probabilistic Graphical Models
Dirichlet Process and Hierarchical DP
-- case studies in genetics and text analysis
Eric Xing Lecture 20, March 31, 2014
Reading:
© Eric Xing @ CMU, 2005-2014

Dirichlet Process
A random distribution G follows a Dirichlet Process if, for any measurable finite partition (A_1, A_2, …, A_m) of the sample space:

(G(A_1), G(A_2), …, G(A_m)) ~ Dirichlet(α G_0(A_1), …, α G_0(A_m))

where G_0 is the base measure and α is the scale parameter. A draw G is a discrete distribution over a continuous space; thus a Dirichlet Process defines a distribution over distributions.
Stick-breaking Process
G = Σ_{k=1}^∞ π_k δ(θ_k)

θ_k ~ G_0   (location)
π_k = β_k ∏_{j=1}^{k-1} (1 - β_j)   (mass)
β_k ~ Beta(1, α)

[Figure: breaking a unit-length stick: e.g. π_1 = 0.4, leaving 0.6; β_2 = 0.5 gives π_2 = 0.3; β_3 = 0.8 gives π_3 = 0.24]
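The stick-breaking construction translates directly into code; a minimal truncated sketch (the concentration α = 2.0 and the truncation level are illustrative choices):

```python
import random

def stick_breaking(alpha, num_weights, rng=random):
    """Truncated stick-breaking draw: pi_k = beta_k * prod_{j<k} (1 - beta_j),
    with beta_k ~ Beta(1, alpha). Returns the first num_weights weights."""
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(num_weights):
        beta_k = rng.betavariate(1.0, alpha)
        weights.append(remaining * beta_k)
        remaining *= (1.0 - beta_k)
    return weights

pi = stick_breaking(alpha=2.0, num_weights=100)
```

Smaller α breaks off larger pieces early, concentrating mass on a few atoms; larger α spreads mass over many atoms.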
Chinese Restaurant Process
P(c_i = k | c_{-i}) = m_k / (i - 1 + α)   for an existing table k with m_k customers
P(c_i = new table | c_{-i}) = α / (i - 1 + α)
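The seating rule can be simulated in a few lines; a minimal sketch (the concentration `alpha` is illustrative):

```python
import random

def crp(num_customers, alpha, rng=random):
    """Sample a partition from the Chinese Restaurant Process.
    Customer i joins existing table k with prob m_k / (i - 1 + alpha),
    or opens a new table with prob alpha / (i - 1 + alpha)."""
    table_sizes = []   # m_k for each existing table
    assignments = []
    for i in range(1, num_customers + 1):
        # unnormalized probabilities: existing tables, then a new one
        u = rng.random() * (i - 1 + alpha)
        cumulative, choice = 0.0, len(table_sizes)
        for k, m_k in enumerate(table_sizes + [alpha]):
            cumulative += m_k
            if u < cumulative:
                choice = k
                break
        if choice == len(table_sizes):
            table_sizes.append(1)   # open a new table
        else:
            table_sizes[choice] += 1
        assignments.append(choice)
    return assignments, table_sizes

seats, sizes = crp(num_customers=10, alpha=1.0)
```

The rich-get-richer effect is visible in the counts: large tables attract new customers in proportion to their size.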
The CRP defines an exchangeable distribution over partitions of an (infinite) sequence of samples; such a distribution is formally known as the Dirichlet Process (DP)

Graphical Model Representations of DP mixture
[Figure: two plate diagrams of the DP mixture. Left: the CRP construction (G_0, α → θ_i → x_i, with G integrated out). Right: the stick-breaking construction (G_0, α → G → y_i → x_i). Each plate covers N data points.]
Case I: Ancestral Inference
[Figure: plate diagram with ancestral haplotypes A_k, individual haplotypes H_{n1}, H_{n2}, and genotypes G_n over N individuals]
Essentially a clustering problem, but …
Better recovery of the ancestors leads to better haplotyping results (because of more accurate grouping of common haplotypes)
True haplotypes are obtainable at high cost, but they can validate the model more objectively (as opposed to examining the saliency of clustering)
Many other biological/scientific utilities

Example: DP-haplotyper [Xing et al., 2004]
Clustering human populations
G ~ DP(α, G_0): infinite mixture components A_k (for population haplotypes)
Likelihood model (for individual haplotypes H_{n1}, H_{n2} and genotypes G_n)
[Figure: plate diagram of the DP-haplotyper]
Inference: Markov Chain Monte Carlo (MCMC)
Metropolis-Hastings
The DP Mixture of Ancestral Haplotypes
The customers around a table in CRP form a cluster
associate a mixture component (i.e., a population haplotype) with a table
sample {a_k, θ_k} at each table from a base measure G_0 to obtain the population haplotype and nucleotide-substitution frequency for that component
[Figure: customers 1-9 seated around tables, each table associated with a population haplotype {A}]
With p(h | {a, θ}) and p(g | h_1, h_2), the CRP yields a posterior distribution over the number of population haplotypes (and over the haplotype configurations and the nucleotide-substitution frequencies)
Inheritance and Observation Models
Single-locus mutation model (ancestral pool A_1, A_2, A_3, … → individual haplotypes H_{i1}, H_{i2}):

P_H(h_t | a_t, θ) = θ                      if h_t = a_t
P_H(h_t | a_t, θ) = (1 - θ) / (|B| - 1)    otherwise

i.e., the haplotype allele copies the ancestral allele with probability θ, and otherwise mutates to one of the remaining |B| - 1 alleles uniformly.

Noisy observation model (haplotype pair → genotype G_i):

P(g_t | h_{1,t}, h_{2,t}): g_t is consistent with the pair (h_{1,t}, h_{2,t}) with high probability
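The single-locus mutation probability is easy to express as a function; a small sketch (the allele values and θ = 0.95 are illustrative):

```python
def mutation_prob(h_t, a_t, theta, alphabet_size):
    """P_H(h_t | a_t, theta): the haplotype allele equals the ancestral allele
    with probability theta; otherwise it is one of the remaining
    alphabet_size - 1 alleles, uniformly at random."""
    if h_t == a_t:
        return theta
    return (1.0 - theta) / (alphabet_size - 1)

# e.g. binary alleles (|B| = 2) and no-mutation probability 0.95
p_same = mutation_prob('A', 'A', theta=0.95, alphabet_size=2)
p_diff = mutation_prob('C', 'A', theta=0.95, alphabet_size=2)
```

Note that the per-locus probabilities sum to one over the alphabet, as required for a valid emission model.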
MCMC for Haplotype Inference
Gibbs sampling for exploring the posterior distribution under the proposed model: integrate out the parameters (e.g., θ and α) and sample c_{ie}, a_k, and h_{ie}:

p(c_{ie} = k | c_{-[ie]}, h, a) ∝ p(c_{ie} = k | c_{-[ie]}) · p(h_{ie} | a_k, h_{-[ie]}, c)
        posterior               ∝       prior (CRP)         ·        likelihood
Gibbs sampling algorithm: draw a sample of each random variable in turn, given the current values of all the remaining variables
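The conditional above drives a collapsed Gibbs sweep over cluster indicators. A toy sketch, with a Beta-Bernoulli posterior predictive standing in for the haplotype likelihood of the slides (the data, α, and the predictive are all illustrative):

```python
import random
from collections import Counter

def gibbs_sweep(data, z, alpha, pred_lik, rng=random):
    """One collapsed-Gibbs sweep for a DP mixture: remove item i, then
    resample z_i with p(z_i = k) proportional to m_k * pred_lik(x_i | cluster k),
    and p(new cluster) proportional to alpha * pred_lik(x_i | empty cluster)."""
    for i, x in enumerate(data):
        counts = Counter(z)
        counts[z[i]] -= 1                  # remove item i from its cluster
        if counts[z[i]] == 0:
            del counts[z[i]]
        labels, weights = sorted(counts), []
        for k in labels:
            members = [data[j] for j in range(len(data)) if z[j] == k and j != i]
            weights.append(counts[k] * pred_lik(x, members))
        labels.append(max(z) + 1)          # a fresh cluster label
        weights.append(alpha * pred_lik(x, []))
        u = rng.random() * sum(weights)
        acc = 0.0
        for lab, w in zip(labels, weights):
            acc += w
            if u < acc:
                z[i] = lab
                break
    return z

def beta_bernoulli_pred(x, members):
    """Posterior predictive of a Bernoulli likelihood under a Beta(1, 1) prior."""
    p1 = (sum(members) + 1.0) / (len(members) + 2.0)
    return p1 if x == 1 else 1.0 - p1

data = [1, 1, 1, 0, 0, 0]
z = [0] * len(data)
for _ in range(20):
    z = gibbs_sweep(data, z, alpha=0.5, pred_lik=beta_bernoulli_pred)
```

Because the cluster parameters are integrated out, only the indicators are resampled; the number of occupied clusters grows and shrinks automatically across sweeps.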
MCMC for Haplotype Inference (cont.)
1. Sample c_{ie}^{(j)} from its conditional posterior
2. Sample a_k from its conditional posterior
3. Sample h_{ie}^{(j)} from its conditional posterior

For the DP scale parameter α: a vague inverse-Gamma prior
Convergence of Ancestral Inference
DP vs. Finite Mixture via EM
[Figure: individual error on five data sets, DP vs. a finite mixture fit by EM]
Multi-population Genetic Demography
Pool everything together and solve one haplotype problem?
--- ignores population structure
Solve 4 haplotype problems separately?
--- data fragmentation
Co-clustering … solve 4 coupled haplotype problems jointly
Solving Multiple Clustering Problems?
One DP per group, G ~ DP(α, G_0), separately for Nature, PNAS, and Science articles? Or a single shared G ~ DP(α, G_0)?

How? And what is the challenge?
How to share mixture components?
Because the base measure is continuous, we have zero probability of picking the same component twice.
If we want to pick the same topic twice, we need to use a discrete base measure.
For example, we could choose the base measure to be a finite set of weighted atoms (point masses).
We want there to be an infinite number of topics, so we want an infinite, discrete base measure.
We want the location of the topics to be random, so we want an infinite, discrete, random base measure.
Hierarchical Dirichlet Process (Teh et al., 2006)
Solution: Sample the base measure from a Dirichlet process!
Hierarchical Dirichlet Process [Teh et al., 2005; Xing et al., 2005]
Two-level CRP scheme: the Chinese restaurant franchise

At the i-th step in the j-th "group" (restaurant):
    choose an existing dish φ_k with prob. ∝ n_{jk}
    go to the upper-level ("oracle") DP with prob. ∝ α_0:
        choose an existing dish φ_k with prob. ∝ m_k
        draw a new dish from G_0 with prob. ∝ γ
Chinese restaurant franchise
Imagine a franchise of restaurants, serving an infinitely large, global menu.
Each table in each restaurant orders a single dish.
Let n_{rt} be the number of customers in restaurant r sitting at table t.
Let m_{rd} be the number of tables in restaurant r serving dish d.
Let m_{·d} be the number of tables, across all restaurants, serving dish d.
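The franchise metaphor can be simulated directly; a minimal sketch (the concentrations `alpha` and `gamma` and the restaurant sizes are illustrative):

```python
import random

def franchise(customers_per_restaurant, alpha, gamma, rng=random):
    """Simulate the Chinese restaurant franchise. Within restaurant r, a
    customer joins table t with prob proportional to n_rt, or opens a new table
    with prob proportional to alpha. A new table orders dish d with prob
    proportional to m_.d (tables serving d across all restaurants), or a
    brand-new dish with prob proportional to gamma."""
    dish_tables = []   # m_.d for each dish d
    menus = []         # menus[r][i]: dish served to customer i in restaurant r
    for n_customers in customers_per_restaurant:
        table_sizes, table_dish, served = [], [], []
        for i in range(n_customers):
            u = rng.random() * (i + alpha)   # i customers already seated
            acc, t = 0.0, len(table_sizes)
            for k, n in enumerate(table_sizes):
                acc += n
                if u < acc:
                    t = k
                    break
            if t == len(table_sizes):        # new table: choose its dish
                v = rng.random() * (sum(dish_tables) + gamma)
                acc2, d = 0.0, len(dish_tables)
                for dd, m in enumerate(dish_tables):
                    acc2 += m
                    if v < acc2:
                        d = dd
                        break
                if d == len(dish_tables):
                    dish_tables.append(1)    # brand-new dish joins the menu
                else:
                    dish_tables[d] += 1
                table_sizes.append(1)
                table_dish.append(d)
            else:
                table_sizes[t] += 1
            served.append(table_dish[t])
        menus.append(served)
    return menus, dish_tables

menus, m_dot = franchise([8, 8], alpha=1.0, gamma=1.0)
```

Because every new table consults the same global menu, dishes (mixture components) are shared across restaurants (groups), which is exactly the property the single continuous base measure could not provide.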
Recall: Graphical Model Representations of DP
[Figure: two plate diagrams of the DP mixture, as before: the Pólya urn construction and the stick-breaking construction]
Hierarchical DP Mixture
[Figure: plate diagram: H → G_0 → G_j → θ_i → x_i, with N observations in each of J groups]

β ~ Stick(γ):
    β'_k ~ Beta(1, γ),   β_k = β'_k ∏_{l=1}^{k-1} (1 - β'_l)
    G_0 = Σ_{k=1}^∞ β_k δ(θ_k),   θ_k ~ H

π_j ~ Stick(α, β):
    π'_{jk} ~ Beta(α β_k, α (1 - Σ_{l=1}^{k} β_l)),   π_{jk} = π'_{jk} ∏_{l=1}^{k-1} (1 - π'_{jl})
    G_j = Σ_{k=1}^∞ π_{jk} δ(θ_k)

Results - Simulated Data
5 populations with 20 individuals each (two kinds of mutation rates)
5 populations share parts of their ancestral haplotypes
the sequence length = 10
[Figure: haplotype error]
Results - International HapMap DB
Different sample sizes and different numbers of sub-populations
Constructing a topic model with infinitely many topics
LDA: Each document is associated with a distribution over K topics.
Problem: How to choose the number of topics?
Solution:
Infinitely many topics!
Replace the Dirichlet distribution over topics with a Dirichlet process!
Problem: We want to make sure the topics are shared between documents
Infinite Topic Models
[Figure: three plate diagrams. An LDA: Dirichlet prior over topic proportions, a single image with K topics. A DP: stick-breaking prior, a single image with infinitely many topics. An HDP: J images sharing infinitely many topics.]

An infinite topic model
Restaurants = documents; dishes = topics.
Let H be a V-dimensional Dirichlet distribution, so a sample from H is a distribution over a vocabulary of V words.
Sample a global distribution over topics, G_0 ~ DP(γ, H).
For each document m = 1, …, M
    Sample a document-level distribution over topics, G_m ~ DP(α, G_0).
    For each word n = 1, …, N_m
        Sample a topic ϕ_mn ~ G_m.
        Sample a word w_mn ~ Discrete(ϕ_mn).
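The generative process above can be sketched with a finite truncation, using the standard finite-Dirichlet approximation to each DP draw (the vocabulary, truncation level, and all parameter values are illustrative):

```python
import random

def sample_discrete(weights, rng):
    """Draw an index with probability proportional to weights."""
    u = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if u < acc:
            return i
    return len(weights) - 1

def stick(alpha, k, rng):
    """Truncated, renormalized stick-breaking weights."""
    remaining, w = 1.0, []
    for _ in range(k):
        b = rng.betavariate(1.0, alpha)
        w.append(remaining * b)
        remaining *= 1.0 - b
    total = sum(w)
    return [x / total for x in w]

def hdp_lda(num_docs, words_per_doc, vocab, gamma, alpha, truncation, rng=random):
    """Truncated sketch of the HDP topic model's generative process."""
    # topics: each phi_k ~ symmetric Dirichlet over the vocabulary (the base H)
    topics = []
    for _ in range(truncation):
        g = [rng.gammavariate(1.0, 1.0) for _ in vocab]
        s = sum(g)
        topics.append([x / s for x in g])
    beta = stick(gamma, truncation, rng)   # global topic weights, beta ~ Stick(gamma)
    docs = []
    for _ in range(num_docs):
        # document weights: Dirichlet(alpha * beta) approximates G_m ~ DP(alpha, G_0)
        g = [rng.gammavariate(alpha * b, 1.0) for b in beta]
        s = sum(g)
        pi = [x / s for x in g]
        words = []
        for _ in range(words_per_doc):
            k = sample_discrete(pi, rng)                          # topic ~ G_m
            words.append(vocab[sample_discrete(topics[k], rng)])  # word ~ phi_k
        docs.append(words)
    return docs

corpus = hdp_lda(3, 20, vocab=list("abcde"), gamma=1.0, alpha=5.0, truncation=10)
```

Since every document's weights are centered on the same global weights β, the same topic atoms recur across documents, which is the sharing property the hierarchy was introduced to achieve.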
The “right” number of topics
Dynamic Dirichlet Process
Two Main Ideas:
Infinite HMM: a hidden Markov DP (see appendix)
Dependent DP/HDP: directly evolving a DP/HDP
Evolutionary Clustering
Adapts the number of mixture components over time
Mixture components can die out
New mixture components are born at any time
Parameters of retained mixture components evolve according to Markovian dynamics
[Figure: topics of research papers (Phy, Bio, CS) evolving over time, 1900 to 2000]
The Chinese Restaurant Process
Customers correspond to data points
Tables correspond to clusters/mixture components
Dishes correspond to the parameters of the mixture components
Temporal DPM [Ahmed and Xing, 2008]
The Recurrent Chinese Restaurant Process
The restaurant operates in epochs
The restaurant is closed at the end of each epoch
The state of the restaurant at time epoch t depends on that at time epoch t-1
Can be extended to higher-order dependencies.
T=1: tables with dishes θ_{1,1}, θ_{2,1}, θ_{3,1}
Dish eaten at table 3 at time epoch 1 OR the parameters of cluster 3 at time epoch 1
Generative process: customers at time T=1 are seated as before:
    choose an existing table j with prob. ∝ N_{j,1} and sample x_i ~ f(θ_{j,1})
    or choose a new table K+1: sample θ_{K+1,1} ~ G_0 and sample x_i ~ f(θ_{K+1,1})
T=1: tables θ_{1,1}, θ_{2,1}, θ_{3,1} end the epoch with counts N_{1,1} = 2, N_{2,1} = 3, N_{3,1} = 1.

T=2: the restaurant re-opens. The tables of epoch 1 are carried over as pseudo-counts: a customer joins table k with prob. ∝ (N_{k,1} + customers seated there so far in epoch 2), or opens a new table with prob. ∝ α.

A retained table's dish evolves across epochs: sample θ_{1,2} ~ P(· | θ_{1,1}).

And so on …

At the end of epoch 2: tables θ_{1,2}, θ_{2,2}, θ_{4,2} with counts N_{1,2} = 2, N_{2,2} = 2, N_{4,2} = 1. Cluster 3 has died out; cluster 4 is newly born.

T=3: …
Temporal DPM
Can be extended to model higher-order dependencies
Can decay dependencies over time
Pseudo-count for table k at time t:

N'_{k,t} = Σ_{w=1}^{W} e^{-w/λ} N_{k,t-w}

where W is the history size, λ is the decay factor, and N_{k,t-w} is the number of customers sitting at table k at time epoch t-w.
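The decayed pseudo-count can be computed directly; a small sketch (the history, W, and λ values are illustrative):

```python
import math

def decayed_pseudo_count(history, width, decay):
    """Pseudo-count for a table at epoch t in the recurrent CRP:
    sum_{w=1}^{W} exp(-w / lambda) * N_{k, t-w}, where history[-w] holds the
    table's customer count w epochs ago (missing epochs count as 0)."""
    total = 0.0
    for w in range(1, width + 1):
        n_past = history[-w] if w <= len(history) else 0
        total += math.exp(-w / decay) * n_past
    return total

# table occupied by 3, then 2 customers in the two most recent past epochs
count = decayed_pseudo_count(history=[3, 2], width=4, decay=1.0)
```

A large λ keeps old epochs influential for longer; as λ shrinks toward 0, the pseudo-counts vanish and the epochs decouple into independent DPMs.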
[Figure: with decayed history, the pseudo-count N_{2,3} for table 2 at epoch T=3 is computed from its counts in epochs 1 and 2]
TDPM Generative Power
DPM: W = T
TDPM: power-law curve (e.g., W = 4)
Independent DPMs: W = 0 (any λ)
Results: NIPS 12
Building a simple dynamic topic model
Chain dynamics are as before
Emission model for document x_{k,t}:
    project θ_{k,t} onto the simplex
    sample x_{k,t} | c_{t,i} ~ Multinomial(· | Logistic(θ_{k,t}))
Unlike LDA, here a document belongs to a single topic
Use this model to analyze NIPS12 corpus
Proceedings of the NIPS conference, 1987-1999
The Big Picture

[Figure: a 2x2 grid of models. Dimension axis: fixed vs. infinite; time axis: static vs. dynamic. K-means (fixed-dimension, static) vs. dynamic fixed-dimension clustering; DPM (infinite, static) vs. TDPM (infinite, dynamic).]
Summary
A non-parametric Bayesian model for pattern discovery
Finite mixture model of latent patterns (e.g., image segments, objects) extended to:
    infinite mixture of prototypes: an alternative to model selection
    hierarchical infinite mixture
    temporal infinite mixture model
Applications in general data-mining …
Appendix:
What if we have an HMM with unknown # of states?
E.g., “recombination” over unknown number of chromosomes?
A common inheritance model to begin with …
Each individual haplotype is a mosaic of ancestral haplotypes
[Figure: ancestral chromosomes (K = 5); each individual chromosome is a mosaic of ancestral segments]
The Hidden Markov Model
[Figure: K x K transition matrix over ancestral states, C_t → C_{t+1}; ancestral pool A_1, A_2, A_3, … emits haplotypes H_{i1}, H_{i2}, which yield genotype G_i]

Transition process (recombination):
    p(c_{i,t+1} = k' | c_{i,t} = k) = e^{-dρ} δ(k, k') + (1 - e^{-dρ}) π(k, k')

Emission process (mutation):
    p(h_{i,t} | a_{k,t}, θ_k) = θ_k^{I(h_{i,t} = a_{k,t})} [ (1 - θ_k) / (|B| - 1) ]^{I(h_{i,t} ≠ a_{k,t})}

How many recombining ancestors?
Hidden Markov Dirichlet Process
(Xing and Sohn, Bayesian Analysis, 2007)
Hidden Markov Dirichlet process mixtures:
    extension of the HMM to an infinite ancestral space
    infinite-dimensional transition matrix
    each row of the transition matrix is modeled with a DP
[Figure: infinite transition matrix from C_t to C_{t+1}, with columns 1, 2, 3, …]
HMDP as a Graphical Model
Ancestor allele reconstruction
Inferring population structure
Inferring recombination hotspots

[Figure: graphical model with ancestor pool A, H and chain C_{y1}, C_{y2}, C_{y3}, …, C_{yN}]
Recombination Analysis
[Figure: recombination analysis for the HapMap populations CEU, YRI, and HCB+JPT]