Tutorial on Probabilistic Programming with Pymc3

185.A83 Machine Learning for Health Informatics 2017S, VU, 2.0 h, 3.0 ECTS Tutorial 02 - 04.04.2017 Tutorial on Probabilistic Programming with PyMC3 [email protected] http://hci-kdd.org/machine-learning-for-health-informatics-course Holzinger Group hci-kdd.org 1 MAKE Health T01 Schedule ▪ 01. Introduction to Probabilistic Programming ▪ 02. PyMC3 ▪ 03. linear regression – the Bayesian way ▪ 04. generalized linear models with PyMC3 Holzinger Group hci-kdd.org 2 MAKE Health T01 01. Introduction and Overview ▪ Probabilistic Programming (PP) ▪ allows automatic Bayesian inference ▪ on complex, user-defined probabilistic models ▪ utilizing “Markov chain Monte Carlo” (MCMC) sampling ▪ PyMC3 ▪ a PP framework ▪ compiles probabilistic programs on-the-fly to C ▪ allows model specification in Python code Salvatier J, Wiecki TV, Fonnesbeck C. (2016) Probabilistic programming in Python using PyMC3. PeerJ Computer Science 2:e55 https://doi.org/10.7717/peerj-cs.55 Holzinger Group hci-kdd.org 3 MAKE Health T01 Properties of Probabilistic Programs ▪ IS NOT ▪ Software that behaves probabilistically ▪ General programming language ▪ IS ▪ Toolset for statistical / Bayesian modeling ▪ Framework to describe probabilistic models ▪ Tool to perform (automatic) inference ▪ Closely related to graphical models and Bayesian networks ▪ Extension to basic language (e.g. PyMC3 for Python) “does in 50 lines of code what used to take thousands” Kulkarni, T. D., Kohli, P., Tenenbaum, J. B. & Mansinghka, V. Picture: A probabilistic programming language for scene perception. in Proceedings of the ieee conference on computer vision and pattern recognition 4390–4399 (2015). Holzinger Group hci-kdd.org 4 MAKE Health T01 Probabilistic Programs ▪ Machine learning algorithms / models often a black box PP “open box” ▪ Simple approach 1. Define and build model 2. Automatic inference 3. Interpretation of results not much equations anymore! ▪ “inference”: guess latent variables based on observations, using e.g. MCMC Holzinger Group hci-kdd.org 5 MAKE Health T01 Markov chain Monte Carlo (MCMC) ▪ Markov chain ▪ Stochastic process ▪ “memoryless” (Markov property) ▪ Conditional probability distribution of future states depends only upon the present state ▪ Sampling from probability distributions ▪ State of chain sample of distribution ▪ Quality improves with number of steps ▪ Class of algorithms / methods ▪ Numerical approximation of complex integrals Holzinger Group hci-kdd.org 6 MAKE Health T01 Markov chain Monte Carlo (MCMC) (animated) Holzinger Group hci-kdd.org 7 MAKE Health T01 Markov chain Monte Carlo (MCMC) ▪ Metropolis-Hastings: random walk ▪ Gibbs-sampling: popular, complex, no tuning ▪ PyMC3 ▪ No-U-Turn Sampler (NUTS) ▪ Hamiltonian Monte Carlo (HMC) ▪ Metropolis ▪ Slice ▪ BinaryMetropolis Holzinger Group hci-kdd.org 8 MAKE Health T01 Prior & posterior distributions ▪ Quantity of interest: 휃 (theta) ▪ Prior = probability distribution ▪ Uncertainty before observation: p(휃) ▪ Belief in absence of data ▪ Posterior = probability distribution ▪ Uncertainty after observation X: p(휃|X) ▪ Likelihood: p(푋|휃) 푃표푠푡푒푟표푟 ∝ 퐿푘푒푙ℎ표표푑 × 푃푟표푟 푝 푥 휃 푝(휃) 푝 휃 푥 = 푝(푥) ▪ Calculating posterior from observation and prior = updating beliefs Coin toss example Holzinger Group hci-kdd.org 9 MAKE Health T01 PyMC3 ▪ Python 3 package / framework ▪ Probabilistic machine learning ▪ specification and fitting of Bayesian models ▪ Inference by MCMC & variational fitting algorithms ▪ Performance enhancements: cross-compilation to C (Python numerical computation package “Theano”) ▪ Accessible, natural syntax ▪ Various capabilities: GPU computing, sampling backends, object-oriented, extendable design Holzinger Group hci-kdd.org 10 MAKE Health T01 Live presentation ▪ PyMC3 syntax introduction ▪ linear regression – the Bayesian way ▪ generalized linear models with PyMC3 Holzinger Group hci-kdd.org 11 MAKE Health T01 Thank you! Holzinger Group hci-kdd.org 12 MAKE Health T01 Sample Questions ▪ What is probabilistic programming? What type of problems can be solved? ▪ What are the typical steps in a probabilistic program? ▪ What is inference? ▪ What is PyMC3? ▪ What is the posterior distribution? Holzinger Group hci-kdd.org 13 MAKE Health T01 Appendix ▪ Main sources ▪ Salvatier J, Wiecki TV, Fonnesbeck C. (2016) Probabilistic programming in Python using PyMC3. PeerJ Computer Science 2:e55 https://doi.org/10.7717/peerj-cs.55 ▪ Davidson-Pilon, C. Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference. (Addison-Wesley Professional, 2015). ▪ PyMC3’s documentation http://pymc-devs.github.io/pymc3/index.html Holzinger Group hci-kdd.org 14 MAKE Health T01.

Tutorial on Probabilistic Programming with Pymc3

University of North Carolina at Charlotte Belk College of Business

The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo

Arxiv:1801.02309V4 [Stat.ML] 11 Dec 2019

An Introduction to Data Analysis Using the Pymc3 Probabilistic Programming Framework: a Case Study with Gaussian Mixture Modeling

Hamiltonian Monte Carlo Swindles

Is Structure Based Drug Design Ready for Selectivity Optimization?

Hamiltonian Monte Carlo Within Stan

Mixed Hamiltonian Monte Carlo for Mixed Discrete and Continuous Variables

Migrating MATLAB®To Python

Markov Chain Monte Carlo Methods for Estimating Systemic Risk Allocations

The Geometric Foundations of Hamiltonian Monte Carlo 3

Probabilistic Programming in Python Using Pymc3