
School of Computer Science
Probabilistic Graphical Models
Dirichlet Process and Hierarchical DP -- case studies in genetics and text analysis
Eric Xing
Lecture 20, March 31, 2014
Reading:
© Eric Xing @ CMU, 2005-2014

Dirichlet Process
A CDF $G$ on possible worlds of random partitions follows a Dirichlet process if, for any measurable finite partition $(\phi_1, \phi_2, \ldots, \phi_m)$ of the space:
  $(G(\phi_1), G(\phi_2), \ldots, G(\phi_m)) \sim \mathrm{Dirichlet}(\alpha G_0(\phi_1), \ldots, \alpha G_0(\phi_m))$
where $G_0$ is the base measure and $\alpha$ is the scale parameter.
- A draw $G$ is a discrete distribution over a continuous space.
- Thus a Dirichlet process defines a distribution over distributions.

Stick-breaking Process
  $G = \sum_{k=1}^{\infty} \pi_k \delta_{\theta_k}(\cdot)$
- Locations: $\theta_k \sim G_0$.
- Masses: $\pi_k = \beta_k \prod_{j=1}^{k-1} (1 - \beta_j)$, with $\beta_k \sim \mathrm{Beta}(1, \alpha)$ and $\sum_{k=1}^{\infty} \pi_k = 1$.
(E.g., break 0.4 off the unit stick; then take 0.5 of the remaining 0.6 to get 0.3; then 0.8 of the remaining 0.3 to get 0.24; and so on.)

Chinese Restaurant Process
Customer $i$ joins an occupied table $k$ or starts a new one:
  $P(c_i = k \mid c_{-i}) = \frac{m_k}{i - 1 + \alpha}$ for an occupied table $k$ with $m_k$ customers,
  $P(c_i = \text{new table} \mid c_{-i}) = \frac{\alpha}{i - 1 + \alpha}$.
(The first customer sits at table 1 with probability 1; the second joins it with probability $\frac{1}{1+\alpha}$ or starts a new table with probability $\frac{\alpha}{1+\alpha}$; and so on.)
The CRP defines an exchangeable distribution on partitions over an (infinite) sequence of samples; such a distribution is formally known as the Dirichlet process (DP).

Graphical Model Representations of DP Mixture
[Plate diagrams: the CRP construction and the stick-breaking construction]

Case I: Ancestral Inference
- Essentially a clustering problem, but ...
- Better recovery of the ancestors leads to better haplotyping results (because of more accurate grouping of common haplotypes).
- True haplotypes are obtainable at high cost, but they can validate the model more objectively (as opposed to examining the saliency of a clustering).
- Many other biological/scientific utilities.

Example: DP-haplotyper [Xing et al., 2004]
- Clustering human populations.
- $G_0 \to \mathrm{DP} \to G$: infinite mixture components (for population haplotypes).
- Likelihood model (for individual haplotypes and genotypes).
- Inference: Markov chain Monte Carlo (MCMC) -- Gibbs sampling, Metropolis-Hastings.

The DP Mixture of Ancestral Haplotypes
- The customers around a table in the CRP form a cluster:
  - associate a mixture component (i.e., a population haplotype) with each table;
  - sample $\{a, \theta\}$ at each table from the base measure $G_0$ to obtain the population haplotype and nucleotide-substitution frequency for that component.
- With $p(h \mid \{a, \theta\})$ and $p(g \mid h_1, h_2)$, the CRP yields a posterior distribution on the number of population haplotypes (and on the haplotype configurations and the nucleotide-substitution frequencies).

Inheritance and Observation Models
- Single-locus mutation model (ancestral pool $\{A_k\}$, individual haplotypes $H_{i1}, H_{i2}$):
  $P(h_t \mid a_t, \theta) = \begin{cases} \theta & \text{for } h_t = a_t \\ \frac{1-\theta}{|B|-1} & \text{for } h_t \neq a_t \end{cases}$
  i.e., allele $h_t$ copies the ancestral allele $a_t$ with probability $\theta$ and mutates to each of the other $|B|-1$ alleles with the remaining mass.
- Noisy observation model: $P_G(g \mid h_1, h_2)$, where genotype $g_t = h_{1,t} \oplus h_{2,t}$ with high probability (the remaining mass models genotyping error).

MCMC for Haplotype Inference
- Gibbs sampling for exploring the posterior distribution under the proposed model.
- Integrate out parameters such as $\theta$, and sample $c_{ie}$, $a_k$, and $h_{ie}$:
  $p(c_{ie} = k \mid c_{[-ie]}, h, a) \propto p(c_{ie} = k \mid c_{[-ie]}) \, p(h_{ie} \mid a_k, h_{[-ie]}, c)$
  (posterior $\propto$ CRP prior $\times$ likelihood).
- Gibbs sampling algorithm: draw samples of each random variable to be sampled given the values of all the remaining variables:
  1. Sample $c_{ie}$ from its conditional posterior;
  2. Sample $a_k$ from its conditional posterior;
  3. Sample $h_{ie}$ from its conditional posterior.
- For the DP scale parameter $\alpha$: a vague inverse-Gamma prior.

Convergence of Ancestral Inference
[Figure omitted]

DP vs. Finite Mixture via EM
[Bar chart: individual error of the DP model vs. a finite mixture fit by EM, on five data sets]

Multi-population Genetic Demography
- Pool everything together and solve one haplotype problem? --- ignores population structure.
- Solve four haplotype problems separately?
--- data fragmentation.
- Co-clustering: solve four coupled haplotype problems jointly.

Solving Multiple Clustering Problems
- Each group (e.g., Nature articles, PNAS articles, Science articles) has its own mixture, $G \sim \mathrm{DP}(\alpha, G_0)$.
- How can the mixture components be shared across the groups?

How? And What Is the Challenge?
- How to share mixture components?
- Because the base measure is continuous, we have zero probability of picking the same component twice.
- If we want to pick the same topic twice, we need to use a discrete base measure. For example, if we chose the base measure to be ...
- We want there to be an infinite number of topics, so we want an infinite, discrete base measure.
- We want the locations of the topics to be random, so we want an infinite, discrete, random base measure.

Hierarchical Dirichlet Process (Teh et al., 2006)
Solution: sample the base measure itself from a Dirichlet process!

Hierarchical Dirichlet Process [Teh et al., 2005; Xing et al., 2005]
Two-level CRP scheme: the Chinese restaurant franchise. At the $i$-th step in the $j$-th "group":
- choose an existing component $k$ with probability $\propto n_{jk}$ (its local count);
- with probability $\propto \alpha_0$, go to the upper-level DP (the "oracle"), where:
  - an existing component $k$ is chosen with probability $\propto m_{\cdot k}$;
  - a new component is drawn with probability $\propto \gamma$.

Chinese Restaurant Franchise
- Imagine a franchise of restaurants, serving an infinitely large, global menu. Each table in each restaurant orders a single dish.
- Let $n_{rt}$ be the number of customers in restaurant $r$ sitting at table $t$.
- Let $m_{rd}$ be the number of tables in restaurant $r$ serving dish $d$.
- Let $m_{\cdot d}$ be the number of tables, across all restaurants, serving dish $d$.

Recall: Graphical Model Representations of DP
[Plate diagrams: the Pólya urn construction and the stick-breaking construction]

Hierarchical DP Mixture
- Top level: $\theta_k \sim H$ and $\beta \sim \mathrm{Stick}(\gamma)$, so $G_0 = \sum_{k=1}^{\infty} \beta_k \delta_{\theta_k}(\cdot)$, where $\beta_k = \beta'_k \prod_{l=1}^{k-1}(1 - \beta'_l)$ and $\beta'_k \sim \mathrm{Beta}(1, \gamma)$.
- Group level: $\pi_j \sim \mathrm{Stick}(\alpha_0, \beta)$, so $G_j = \sum_{k=1}^{\infty} \pi_{jk} \delta_{\theta_k}(\cdot)$, where $\pi_{jk} = \pi'_{jk} \prod_{l=1}^{k-1}(1 - \pi'_{jl})$ and $\pi'_{jk} \sim \mathrm{Beta}\big(\alpha_0 \beta_k, \; \alpha_0 (1 - \sum_{l=1}^{k} \beta_l)\big)$.

Results - Simulated Data
- 5 populations with 20 individuals each (two kinds of mutation rates).
- The 5 populations share parts of their ancestral haplotypes.
- The sequence length = 10.
- [Figure: haplotype error]

Results - International HapMap DB
- Different sample sizes, and different numbers of sub-populations.

Constructing a Topic Model with Infinitely Many Topics
- LDA: each document is associated with a distribution over K topics.
- Problem: how to choose the number of topics?
- Solution: infinitely many topics! Replace the Dirichlet distribution over topics with a Dirichlet process!
- Problem: we want to make sure the topics are shared between documents.

Infinite Topic Models
[Plate diagrams: an LDA (a single image, K topics, Dirichlet over topics), a DP (a single image, infinitely many topics, stick-breaking), and an HDP (J images sharing infinitely many topics)]

An Infinite Topic Model
- Restaurants = documents; dishes = topics.
- Let H be a V-dimensional Dirichlet distribution, so a sample from H is a distribution over a vocabulary of V words.
- Sample a global distribution over topics, $G_0$.
- For each document m = 1, ..., M:
  - Sample a distribution over topics, $G_m \sim \mathrm{DP}(\gamma, G_0)$.
  - For each word n = 1, ..., $N_m$:
    - Sample a topic $\phi_{mn} \sim G_m$.
    - Sample a word $w_{mn} \sim \mathrm{Discrete}(\phi_{mn})$.
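The generative process above can be sketched in code. The following is a minimal simulation, not the course's reference implementation: it uses a truncated stick-breaking approximation with T topics (an exact HDP has infinitely many), approximates each document-level DP draw by a finite Dirichlet with concentration $\alpha_0 \beta$, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(gamma, T, rng):
    """Truncated stick-breaking: T weights whose sum approaches 1 as T grows."""
    b = rng.beta(1.0, gamma, size=T)                      # beta'_k ~ Beta(1, gamma)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - b[:-1])])
    return b * remaining                                  # beta_k = beta'_k * prod_{l<k}(1 - beta'_l)

V, T = 50, 20              # vocabulary size, truncation level (illustrative values)
eta, gamma, alpha0 = 0.5, 1.0, 1.0

# Global level: topic atoms phi_k ~ Dirichlet_V(eta), global weights beta ~ Stick(gamma)
phi = rng.dirichlet(np.full(V, eta), size=T)              # T topics, each a distribution over V words
beta = stick_breaking(gamma, T, rng)
beta /= beta.sum()                                        # renormalise the truncated weights

def sample_document(n_words):
    # Document level: pi_m ~ DP(alpha0, beta), approximated by a finite Dirichlet
    pi = rng.dirichlet(alpha0 * beta)
    z = rng.choice(T, size=n_words, p=pi)                 # topic indicator for each word
    return [rng.choice(V, p=phi[k]) for k in z]           # word drawn from its topic

doc = sample_document(100)
```

Because every document's weights $\pi_m$ are drawn around the same global $\beta$ over the same atoms $\phi_k$, the topics are shared across documents, which is exactly the property the discrete random base measure was introduced to obtain.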
The "Right" Number of Topics
[Figure omitted]

Dynamic Dirichlet Process
Two main ideas:
- Infinite HMM: a hidden Markov DP (see appendix).
- Dependent DP/HDP: directly evolving a DP/HDP over time.

Evolutionary Clustering
- Adapts the number of mixture components over time.
- Mixture components can die out.
- New mixture components can be born at any time.
- Retained mixture components' parameters evolve according to a Markovian dynamics.
[Figure: topics (Phy, Bio, CS) in research papers, 1900-2000]

The Chinese Restaurant Process
- Customers correspond to data points.
- Tables correspond to clusters/mixture components.
- Dishes correspond to the parameters of the mixture components.

Temporal DPM [Ahmed and Xing, 2008]
The Recurrent Chinese Restaurant Process:
- The restaurant operates in epochs.
- The restaurant is closed at the end of each epoch.
- The state of the restaurant at time epoch t depends on that at time epoch t-1.
- Can be extended to higher-order dependencies.

Generative process (walkthrough):
- Customers at time T=1 are seated as in the ordinary CRP: choose an existing table j with probability $\propto N_{j,1}$ and sample $x_i \sim f(\theta_{j,1})$, or choose a new table K+1, sample $\theta_{K+1,1} \sim G_0$, and sample $x_i \sim f(\theta_{K+1,1})$. ($\theta_{3,1}$ denotes the dish eaten at table 3 at time epoch 1, i.e., the parameters of cluster 3 at epoch 1.)
- Say epoch 1 ends with table sizes $N_{1,1}=2$, $N_{2,1}=3$, $N_{3,1}=1$. At T=2 the restaurant reopens with these counts acting as pseudo-counts: a customer chooses an existing table j with probability $\propto N_{j,1}$ (here 2/6, 3/6, 1/6) or a new table with probability $\propto \alpha$.
- The parameters of retained tables evolve: sample $\theta_{1,2} \sim P(\cdot \mid \theta_{1,1})$, and so on.
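The recurrent CRP walkthrough above can be sketched as follows. This is a hedged illustration, not the paper's exact algorithm: it implements the first-order version in which only the previous epoch's table sizes carry forward as pseudo-counts, and the function and variable names are my own.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 1.0   # DP concentration (illustrative value)

def rcrp_epoch(prev_counts, n_customers, alpha, rng):
    """Seat one epoch's customers in the recurrent CRP.

    prev_counts[j] is N_{j,t-1}: the size of table j at the previous epoch,
    used as a pseudo-count prior for this epoch's seating. At t=1 it is
    empty and the procedure reduces to the ordinary CRP.
    """
    prev = list(prev_counts)          # inherited popularity of existing tables
    curr = [0] * len(prev)            # customers actually seated this epoch
    for _ in range(n_customers):
        # existing table j: weight N_{j,t-1} + n_{j,t}; new table: weight alpha
        w = np.array([p + c for p, c in zip(prev, curr)] + [alpha], dtype=float)
        j = rng.choice(len(w), p=w / w.sum())
        if j == len(curr):            # a newborn cluster (would draw theta ~ G0 here)
            prev.append(0)
            curr.append(1)
        else:                         # a retained cluster (theta_{j,t} ~ P(.|theta_{j,t-1}))
            curr[j] += 1
    return curr

c1 = rcrp_epoch([], 6, alpha, rng)    # epoch 1: plain CRP
c2 = rcrp_epoch(c1, 6, alpha, rng)    # epoch 2: seating biased by epoch-1 sizes
```

Note how the slide's dynamics fall out of this scheme: a table that attracts no customers in an epoch contributes nothing to the next epoch's prior, so unused clusters die out, while new tables can open at any epoch.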