Probabilistic Circuits: Representations, Inference, Learning and Theory
Antonio Vergari (University of California, Los Angeles)
YooJung Choi (University of California, Los Angeles)
Robert Peharz (TU Eindhoven)
Guy Van den Broeck (University of California, Los Angeles)

January 7th, 2021 - IJCAI-PRICAI 2020

The Alphabet Soup of probabilistic models

[Figure: a word cloud of probabilistic models: fully factorized models, NaiveBayes, AndOrGraphs, PDGs, Trees, PSDDs, CNets, LTMs, SPNs, NADEs, Thin Junction Trees, ACs, MADEs, MAFs, VAEs, DPPs, FVSBNs, TACs, IAFs, NAFs, RAEs, Mixtures, BNs, NICE, FGs, GANs, RealNVP, MNs. The same figure is revisited to separate intractable from tractable models, to show that tractability is a spectrum, and to motivate expressive models without compromises: a unifying framework for tractable models.]

Tutorial outline:

1. Why tractable inference? or expressiveness vs tractability
2. Probabilistic circuits: a unified framework for tractable probabilistic modeling
3. Learning circuits: learning their structure and parameters from data
4. Advanced representations: tracing the boundaries of tractability and connections to other formalisms

Why tractable inference?
or the inherent trade-off of tractability vs. expressiveness

Why probabilistic inference?

q1: What is the probability that today is a Monday and there is a traffic jam on Westwood Blvd.?
q2: Which day is most likely to have a traffic jam on my route to campus?

How can we answer many such probabilistic queries?

"What is the most likely street to have a traffic jam at 12.00?"

[Figure: a data matrix over variables $X_1, \ldots, X_5$ with instances $x_1, \ldots, x_8$.]

$q_1(m_1)? \approx p_{m_1}(Y \mid X)$

…by fitting predictive models! One model $m_1$ is fit for this one query.
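As a toy illustration of this one-model-per-query approach (ours, not the tutorial's; the feature encoding, the random placeholder data, and the scikit-learn classifier are all assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical traffic data: features X = (day of week, hour), label Y = jam
# on one fixed street. Random placeholders stand in for real sensor readings.
rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(0, 7, 1000),    # day: 0 = Monday, ..., 6 = Sunday
                     rng.integers(0, 24, 1000)])  # hour of day
Y = rng.integers(0, 2, 1000)                      # jam indicator (placeholder labels)

# One predictive model m1 answers exactly one query family: p_m1(Y | X).
m1 = LogisticRegression().fit(X, Y)

# q1(m1): estimated probability of a jam given Day = Monday, Time = 12:00.
print(m1.predict_proba([[0, 12]])[0, 1])
```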
"What is the most likely time to see a traffic jam at Sunset Blvd.?"

$q_2(m_2)? \approx p_{m_2}(Y \mid X)$

…by fitting another predictive model $m_2$ for the new target!

"What is the probability of a traffic jam on Westwood Blvd. on Monday?"

$q_3(m_?)?\quad p_{m_?}(Y)$? Neither $m_1$ nor $m_2$ answers it: fitting one predictive model per query does not scale.

$q_1(m)?,\ q_2(m)?,\ \ldots,\ q_k(m)? \approx p_m(\mathbf{X})$

…by fitting generative models! A single model of the joint distribution supports arbitrarily many queries, e.g. for exploratory data analysis.

Why probabilistic inference?

Let $\mathbf{X} = \{Day, Time, Jam_{Str_1}, Jam_{Str_2}, \ldots, Jam_{Str_N}\}$.

q1: What is the probability that today is a Monday and there is a traffic jam on Westwood Blvd.?
$q_1(m) = p_m(Day = Mon, Jam_{Wwood} = 1)$
⇒ marginals

q2: Which day is most likely to have a traffic jam on my route to campus?
$q_2(m) = \operatorname{argmax}_d\ p_m(Day = d \wedge \bigvee_{i \in route} Jam_{Str_i})$
⇒ marginals + MAP + logical events

Tractable Probabilistic Inference

A class of queries $\mathcal{Q}$ is tractable on a family of probabilistic models $\mathcal{M}$ iff for any query $q \in \mathcal{Q}$ and model $m \in \mathcal{M}$, exactly computing $q(m)$ runs in time $O(\mathrm{poly}(|m|))$.

⇒ often poly will in fact be linear!
⇒ Note: if $\mathcal{M}$ is compact in the number of random variables $\mathbf{X}$, that is, $|m| \in O(\mathrm{poly}(|\mathbf{X}|))$, then query time is $O(\mathrm{poly}(|\mathbf{X}|))$.
⇒ Why exactness? It is the highest guarantee possible!
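A minimal sketch of how mild this requirement can be (ours, not the tutorial's): in a fully factorized model over binary variables, the simplest family in the alphabet soup above, any marginal query is exact and runs in time linear in the evidence size.

```python
import numpy as np

# A fully factorized model over n binary variables: p(X) = prod_i p_i(X_i),
# stored as one Bernoulli parameter per variable (toy parameters, |X| = 4).
theta = np.array([0.1, 0.7, 0.4, 0.9])

def marginal(evidence):
    """Exact p(evidence) for partial evidence {index: value}; each
    unobserved variable sums out to 1, so it costs nothing."""
    p = 1.0
    for i, v in evidence.items():
        p *= theta[i] if v == 1 else 1.0 - theta[i]
    return p  # O(|evidence|) <= O(|X|): exact, linear-time inference

print(marginal({0: 1, 2: 0}))  # p(X0=1, X2=0) = 0.1 * 0.6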
Stay tuned for...

Next: 1. What are classes of queries? 2. Are my favorite models tractable? 3. Are tractable models expressive?
After: we introduce probabilistic circuits as a unified framework for tractable probabilistic modeling.

[Figure: "tractable bands": model families $\mathcal{M}$ arranged by the query classes $\mathcal{Q}$ they answer tractably.]

Complete evidence (EVI)

q3: What is the probability that today is a Monday at 12.00 and there is a traffic jam only on Westwood Blvd.?

$\mathbf{X} = \{Day, Time, Jam_{Wwood}, Jam_{Str_2}, \ldots, Jam_{Str_N}\}$
$q_3(m) = p_m(\mathbf{X} = \{Mon, 12{:}00, 1, 0, \ldots, 0\})$

…fundamental in maximum likelihood learning:
$\theta_m^{MLE} = \operatorname{argmax}_{\theta} \prod_{x \in \mathcal{D}} p_m(x; \theta)$

Generative Adversarial Networks

$\min_\theta \max_\phi\ \mathbb{E}_{x \sim p_{data}(x)} \log D_\phi(x) + \mathbb{E}_{z \sim p(z)} \log(1 - D_\phi(G_\theta(z)))$

no explicit likelihood! ⇒ adversarial training instead of MLE ⇒ no tractable EVI
good sample quality ⇒ but lots of samples needed for Monte Carlo estimates
unstable training ⇒ mode collapse

Goodfellow et al., "Generative adversarial nets", 2014

[Figure: tractable bands, with GANs outside the EVI band.]

Variational Autoencoders

$p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz$

an explicit likelihood model! …but computing $\log p_\theta(x)$ is intractable
⇒ an infinite and uncountable mixture ⇒ no tractable EVI
we need to optimize the ELBO instead:
$\log p_\theta(x) \geq \mathbb{E}_{z \sim q_\phi(z \mid x)} \log p_\theta(x \mid z) - \mathrm{KL}(q_\phi(z \mid x)\,\|\,p(z))$
…which is "tricky" [Alemi et al. 2017; Dai et al. 2019; Ghosh et al. 2019]

Rezende et al., "Stochastic backprop. and approximate inference in deep generative models", 2014
Kingma and Welling, "Auto-Encoding Variational Bayes", 2014

[Figure: tractable bands, with both GANs and VAEs outside the EVI band.]

Normalizing flows

$p_X(x) = p_Z(f^{-1}(x)) \left|\det \frac{\partial f^{-1}}{\partial x}\right|$

an explicit likelihood! …plus structured Jacobians
⇒ tractable EVI queries!
many neural variants: RealNVP [Dinh et al. 2016], MAF [Papamakarios et al. 2017], MADE [Germain et al. 2015], PixelRNN [Oord et al. 2016]
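To see why the change-of-variables formula yields tractable EVI, here is a toy sketch (ours; a single 1-D affine layer standing in for the cited architectures): the exact log-density is one inverse pass plus a log-determinant term.

```python
import numpy as np

# A toy 1-D flow: z ~ N(0, 1), x = f(z) = a*z + b (invertible, affine).
# Change of variables: p_X(x) = p_Z(f^{-1}(x)) * |d f^{-1} / dx|.
a, b = 2.0, 1.0  # toy parameters; real flows stack many learned layers

def log_px(x):
    z = (x - b) / a                              # inverse pass: f^{-1}(x)
    log_pz = -0.5 * (z**2 + np.log(2 * np.pi))   # standard normal log-density
    log_det = -np.log(abs(a))                    # log |d f^{-1} / dx|
    return log_pz + log_det                      # exact EVI in one pass

print(log_px(1.0))  # here f^{-1}(1.0) = 0, so log p_X(1.0) = log(p_Z(0) / 2)
```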