Automated Scalable Bayesian Inference via Data Summarization
Tamara Broderick, ITT Career Development Assistant Professor, MIT
With: Trevor Campbell, Jonathan H. Huggins

“Core” of the data set
• Observe: redundancies can exist even if data isn’t “tall”
[Figure: example image data — “Football” vs. “Curling”]
• Coresets: pre-process data to get a smaller, weighted data set
• Theoretical guarantees on quality
• Previous heuristics: data squashing, big data GPs
• Cf. subsampling
• How to develop coresets for diverse tasks/geometries?
[Bădoiu, Har-Peled, Indyk 2002; Agarwal et al 2005; Feldman & Langberg 2011; DuMouchel et al 1999; Madigan et al 2002; Huggins, Campbell, Broderick 2016; Campbell, Broderick 2017; Campbell, Broderick 2018]

Roadmap
• The “core” of the data set
• Approximate Bayes review
• Uniform data subsampling isn’t enough
• Importance sampling for “coresets”
• Optimization for “coresets”
• Approximate sufficient statistics

Bayesian inference
• Goals: good estimates, uncertainties; interpretable; complex; expert information
• Challenge: existing methods can be slow, tedious, unreliable
• Our proposal: develop coresets for scalable, automated algorithms with error bounds for output quality
[Application image credits: Gillon et al 2017; Grimm et al 2018; ESO/L. Calçada/M. Kornmesser 2017; Abbott et al 2016a,b; Stone et al 2014; Kuikka et al 2014; Baltic Salmon Fund; Woodard et al 2017; amcharts.com 2016; Meager 2018a,b; Chati, Balakrishnan 2017; Julian Hertzog 2016; mc-stan.org]

Bayesian inference
• Posterior: $p(\theta \mid y) \propto_\theta p(y \mid \theta)\, p(\theta)$
[Figure: logistic regression example — “Normal” vs. “Phishing” data $(x_n, y_n)$; exact posterior over $(\theta_1, \theta_2)$ and MFVB approximation $q^*(\theta)$; Bishop 2006]
• MCMC: eventually accurate but can be slow [Bardenet, Doucet, Holmes 2017]
• (Mean-field) variational Bayes: (MF)VB
  • Fast, streaming, distributed (3.6M Wikipedia articles, 32 cores, ~1 hour) [Broderick, Boyd, Wibisono, Wilson, Jordan 2013]
  • Misestimation & lack of quality guarantees [MacKay 2003; Bishop 2006; Wang, Titterington 2004; Turner, Sahani 2011; Fosdick 2013; Dunson 2014; Bardenet, Doucet, Holmes 2017; Opper, Winther 2003; Giordano, Broderick, Jordan 2015, 2017]
• Automation: e.g. Stan, NUTS, ADVI [http://mc-stan.org/; Hoffman, Gelman 2014; Kucukelbir, Tran, Ranganath, Gelman, Blei 2017]
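To make the MCMC bullet concrete, here is a minimal sketch (my own illustration, not the speaker's code) of random-walk Metropolis for a Bayesian logistic regression posterior on toy data; every iteration touches all N data points, which is what makes plain MCMC slow at scale.

```python
# Minimal sketch: random-walk Metropolis for Bayesian logistic regression on toy data.
import numpy as np

rng = np.random.default_rng(0)
N, D = 1000, 2
x = rng.normal(size=(N, D))
theta_true = np.array([1.0, -2.0])
y = np.where(rng.random(N) < 1.0 / (1.0 + np.exp(-x @ theta_true)), 1.0, -1.0)

def log_joint(theta):
    # log p(y | x, theta) + log p(theta), with a standard normal prior on theta
    log_lik = -np.sum(np.log1p(np.exp(-y * (x @ theta))))
    return log_lik - 0.5 * theta @ theta

theta = np.zeros(D)
samples = []
for _ in range(5000):                       # every iteration evaluates all N likelihood terms
    prop = theta + 0.1 * rng.normal(size=D)
    if np.log(rng.random()) < log_joint(prop) - log_joint(theta):
        theta = prop
    samples.append(theta.copy())
print("posterior mean estimate:", np.mean(samples[1000:], axis=0))
```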

Bayesian coresets
• Posterior: $p(\theta \mid y) \propto_\theta p(y \mid \theta)\, p(\theta)$
• Log likelihood: $\mathcal{L}_n(\theta) := \log p(y_n \mid \theta)$, $\quad \mathcal{L}(\theta) := \sum_{n=1}^N \mathcal{L}_n(\theta)$
• Coreset log likelihood: $\mathcal{L}(w, \theta) := \sum_{n=1}^N w_n \mathcal{L}_n(\theta)$ s.t. $\|w\|_0 \ll N$
[Figure: “Normal” vs. “Phishing” data; coreset log likelihood $\mathcal{L}(w)$ approximating $\mathcal{L}$]
• ε-coreset: $\|\mathcal{L}(w) - \mathcal{L}\| \le \epsilon$
• Approximate posterior close in Wasserstein distance: $d_{W,j}\big(p_w(\cdot \mid y),\, p(\cdot \mid y)\big) \le C_j \, \|\mathcal{L}(w) - \mathcal{L}\|$, $\; j \in \{1, 2\}$
[Huggins, Kasprzak, Campbell, Broderick, forthcoming]
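As a concrete illustration of the coreset log likelihood above, here is a short sketch in my own notation (the function names are hypothetical, not from the talk): the full-data log likelihood corresponds to w = (1, …, 1), and a coreset replaces it with a sparse nonnegative w.

```python
# Sketch: coreset log likelihood L(w, theta) = sum_n w_n * L_n(theta)
# for the logistic regression example from the slides.
import numpy as np

def log_lik_terms(theta, x, y):
    """Per-datum log likelihoods L_n(theta) = log p(y_n | x_n, theta), y_n in {-1, +1}."""
    return -np.log1p(np.exp(-y * (x @ theta)))

def coreset_log_lik(theta, x, y, w):
    """L(w, theta): weighted sum of per-datum log likelihoods."""
    return w @ log_lik_terms(theta, x, y)

# w = np.ones(N) recovers the full-data log likelihood L(theta);
# an epsilon-coreset keeps only a few nonzero weights (||w||_0 << N)
# while keeping ||L(w) - L|| small.
```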


Uniform subsampling revisited
[Figure: “Normal” vs. “Phishing” data; posterior approximations from uniformly subsampled data at M = 10, 100, 1000]
• Might miss important data
• Noisy estimates
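A minimal sketch of uniform subsampling viewed as a randomized coreset (my notation): sample M indices uniformly and give each a weight of N/M, so the weighted log likelihood is unbiased for the full-data one; the estimate is noisy and can entirely miss small but influential groups of points, as the figures above suggest.

```python
# Sketch: uniform subsampling as a randomized coreset.
import numpy as np

def uniform_coreset(N, M, rng):
    idx = rng.choice(N, size=M, replace=True)   # uniformly chosen data points
    w = np.zeros(N)
    np.add.at(w, idx, N / M)                    # weight N/M makes L(w, .) unbiased for L(.)
    return w

w = uniform_coreset(N=10_000, M=100, rng=np.random.default_rng(0))
```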


Importance sampling
[Figure: “Normal” vs. “Phishing” data]
• Sample data points with probability $\propto \|\mathcal{L}_n\|$, where $\sigma_n := \|\mathcal{L}_n\|$ and $\sigma := \sum_{n=1}^N \|\mathcal{L}_n\|$
• Procedure: 1. data → 2. importance weights → 3. importance sample → 4. invert weights
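A sketch of this construction in my own notation (sigma_n is passed in; computing the norm ‖L_n‖ depends on the chosen norm, which the slides leave abstract): sample M points with probability proportional to sigma_n, then invert the weights so the weighted log likelihood remains unbiased.

```python
# Sketch: importance-sampling coreset with probabilities proportional to ||L_n||.
import numpy as np

def importance_coreset(sigma_n, M, rng):
    sigma = sigma_n.sum()
    p = sigma_n / sigma                                         # importance weights (step 2)
    idx = rng.choice(len(sigma_n), size=M, replace=True, p=p)   # importance sample (step 3)
    w = np.zeros(len(sigma_n))
    np.add.at(w, idx, sigma / (M * sigma_n[idx]))               # invert weights (step 4)
    return w
```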

Thm sketch (CB). Fix δ ∈ (0,1). With probability ≥ 1 − δ, after M iterations,
$$\|\mathcal{L}(w) - \mathcal{L}\| \le \frac{\sigma \bar\eta}{\sqrt{M}} \left(1 + \sqrt{2 \log \tfrac{1}{\delta}}\right).$$
• Still noisy estimates
[Figure: posterior approximations from importance sampling at M = 10, 100, 1000]

Hilbert coresets
• Want a good coreset: $\min_{w \in \mathbb{R}^N} \|\mathcal{L}(w) - \mathcal{L}\|^2$ s.t. $w \ge 0$, $\|w\|_0 \le M$
[Figure: per-datum likelihoods $\exp(\mathcal{L}_n(\theta))$ and the total likelihood $\exp(\mathcal{L}(\theta))$ viewed as vectors]
• Need to consider (residual) error direction
• Sparse optimization
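One way to see why the residual direction matters (a small expansion in my notation, under the vector/Hilbert-norm view above): the objective depends on the $\mathcal{L}_n$ only through their inner products,
$$\|\mathcal{L}(w) - \mathcal{L}\|^2 = \Big\| \sum_{n=1}^N (w_n - 1)\,\mathcal{L}_n \Big\|^2 = \sum_{m=1}^N \sum_{n=1}^N (w_m - 1)(w_n - 1)\,\langle \mathcal{L}_m, \mathcal{L}_n \rangle,$$
so choosing which points to keep is a sparse quadratic optimization over $w$, not just a matter of the individual norms $\|\mathcal{L}_n\|$.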


Frank-Wolfe: convex optimization on a polytope $D$ [Jaggi 2013]
• Repeat:
  1. Find gradient
  2. Find argmin point on plane in $D$
  3. Do line search between current point and argmin point
• Convex combination of $M$ vertices after $M - 1$ steps
• Our problem: $\min_{w \in \mathbb{R}^N} \|\mathcal{L}(w) - \mathcal{L}\|^2$ over the polytope
$$D := \left\{ w \in \mathbb{R}^N : \sum_{n=1}^N \sigma_n w_n = \sigma,\ w \ge 0 \right\}$$
Thm sketch (CB). After $M$ iterations,
$$\|\mathcal{L}(w) - \mathcal{L}\| \le \frac{\sigma \bar\eta}{\sqrt{\alpha^{-2M} + M}}.$$
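A minimal sketch of Frank-Wolfe specialized to this objective, assuming each $\mathcal{L}_n$ has already been represented as a finite-dimensional vector ell[n] (e.g. via a projection); the finite-dimensional view and all names here are my simplification, not the authors' implementation. The vertices of $D$ are the scaled coordinate directions $(\sigma/\sigma_n) e_n$, so each iteration adds at most one data point to the coreset, and exact line search is available because the objective is quadratic.

```python
# Sketch: Frank-Wolfe for min_w ||L(w) - L||^2 over D = {w : sum_n sigma_n w_n = sigma, w >= 0}.
import numpy as np

def fw_coreset(ell, M):
    """ell: (N, K) array whose n-th row is a vector representation of L_n (nonzero norms assumed)."""
    N = ell.shape[0]
    sigma_n = np.linalg.norm(ell, axis=1)            # sigma_n = ||L_n||
    sigma = sigma_n.sum()
    L = ell.sum(axis=0)                              # full-data log-likelihood vector

    w = np.zeros(N)
    f = np.argmax(ell @ L / sigma_n)                 # initial vertex most aligned with L
    w[f] = sigma / sigma_n[f]

    for _ in range(M - 1):
        r = L - ell.T @ w                            # residual L - L(w)
        f = np.argmax(ell @ r / sigma_n)             # vertex minimizing the linearized objective
        v = np.zeros(N)
        v[f] = sigma / sigma_n[f]
        d = ell.T @ (v - w)                          # movement direction in function space
        if d @ d == 0:                               # already at that vertex
            break
        gamma = np.clip(r @ d / (d @ d), 0.0, 1.0)   # exact line search for the quadratic
        w = (1 - gamma) * w + gamma * v
    return w                                         # at most M nonzero weights after M - 1 steps
```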

Gaussian model (simulated)
• 10K pts; norms, inference: closed-form
[Figure: posterior approximations from uniform subsampling, importance sampling, and Frank-Wolfe at M = 5, 50, 500]
[Figure: approximation error for the three constructions (lower error is better)]
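For intuition on why this Gaussian example admits closed-form inference even with a weighted (coreset) likelihood, here is a one-dimensional sketch in my notation: with a conjugate normal prior (mean zero) on the unknown mean and known noise variance, the coreset weights simply enter the sufficient statistics.

```python
# Sketch: closed-form posterior for a Gaussian mean with weighted observations.
import numpy as np

def weighted_gaussian_posterior(y, w, prior_var=1.0, noise_var=1.0):
    """Posterior N(post_mean, post_var) on the mean; w = all-ones recovers the full-data posterior."""
    n_eff = w.sum()                                  # effective number of observations
    post_var = 1.0 / (1.0 / prior_var + n_eff / noise_var)
    post_mean = post_var * (w @ y) / noise_var       # weighted sufficient statistic w @ y
    return post_mean, post_var
```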

Logistic regression (simulated)
• 10K pts; general inference
[Figure: posterior approximations from uniform subsampling, importance sampling, and Frank-Wolfe at M = 10, 100, 1000]

Poisson regression (simulated)
• 10K pts; general inference
[Figure: posterior approximations from uniform subsampling, importance sampling, and Frank-Wolfe at M = 10, 100, 1000]

Real data experiments
[Figure: approximation error vs. total time for uniform subsampling, Frank-Wolfe coresets, and GIGA coresets (lower error and less total time are better)]
Data sets include:
• Phishing
• Chemical reactivity
• Bicycle trips
• Airport delays
[Campbell, Broderick 2017, 2018]


Data summarization
• Exponential family likelihood with sufficient statistics:
$$p(y_{1:N} \mid x_{1:N}, \theta) = \prod_{n=1}^N \exp\big[ T(y_n, x_n) \cdot \eta(\theta) \big] = \exp\left[ \left( \sum_{n=1}^N T(y_n, x_n) \right) \cdot \eta(\theta) \right]$$
• Scalable, single-pass, streaming, distributed, complementary to MCMC
• But: often no simple sufficient statistics
• E.g. Bayesian logistic regression; GLMs; “deeper” models
  • Likelihood: $p(y_{1:N} \mid x_{1:N}, \theta) = \prod_{n=1}^N \dfrac{1}{1 + \exp(-y_n x_n \cdot \theta)}$
• Our proposal: (polynomial) approximate sufficient statistics
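To illustrate the idea behind polynomial approximate sufficient statistics for logistic regression, here is a rough sketch in my own notation: approximate the per-datum mapping $\phi(s) = -\log(1 + e^{-s})$, with $s = y_n x_n \cdot \theta$, by a degree-2 polynomial on an interval, so that a single streaming pass over the data yields the only summaries the approximate likelihood needs. (PASS-GLM uses a Chebyshev construction; the least-squares fit below is only for illustration.)

```python
# Sketch: degree-2 polynomial approximate sufficient statistics for logistic regression.
import numpy as np

def quadratic_coeffs(R=4.0):
    """Fit phi(s) = -log(1 + exp(-s)) with a quadratic on [-R, R]."""
    s = np.linspace(-R, R, 1000)
    phi = -np.log1p(np.exp(-s))
    c2, c1, c0 = np.polyfit(s, phi, 2)           # phi(s) ~ c2*s^2 + c1*s + c0
    return c0, c1, c2

def approx_suff_stats(x, y):
    """One streaming pass: the only data summaries the approximation needs (y_n in {-1, +1})."""
    t1 = (y[:, None] * x).sum(axis=0)            # sum_n y_n x_n
    T2 = x.T @ x                                  # sum_n x_n x_n^T  (uses y_n^2 = 1)
    return len(y), t1, T2

def approx_log_lik(theta, N, t1, T2, coeffs):
    c0, c1, c2 = coeffs
    return N * c0 + c1 * (t1 @ theta) + c2 * (theta @ T2 @ theta)
```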

Data summarization results
• 6M data points, 1000 features
• Streaming, distributed; minimal communication
• 22 cores, 16 sec
• Finite-data guarantees on Wasserstein distance to exact posterior
[Huggins, Adams, Broderick 2017]

Conclusions
• Data summarization for scalable, automated approximate Bayes algorithms with error bounds on quality for finite data
• More accurate with more computational investment
  • Coresets
  • Approx. suff. stats
• A start
• Lots of potential improvements/directions
[Figure: error vs. computation (lower error is better); Campbell, Broderick 2018]

References (1/5)
• T Campbell and T Broderick. Automated scalable Bayesian inference via Hilbert coresets. Under review. arXiv:1710.05053.
• T Campbell and T Broderick. Bayesian coreset construction via greedy iterative geodesic ascent. ICML 2018.
• JH Huggins, T Campbell, and T Broderick. Coresets for scalable Bayesian logistic regression. NIPS 2016.
• JH Huggins, RP Adams, and T Broderick. PASS-GLM: Polynomial approximate sufficient statistics for scalable Bayesian GLM inference. NIPS 2017.
• JH Huggins, T Campbell, M Kasprzak, and T Broderick. Scalable Gaussian process inference with finite-data mean and variance guarantees. Under review. arXiv:1806.10234.
• JH Huggins, M Kasprzak, T Campbell, and T Broderick. Practical bounds on the error of Bayesian posterior approximations: A nonasymptotic approach. Forthcoming.

References (2/5)
• PK Agarwal, S Har-Peled, and KR Varadarajan. Geometric approximation via coresets. Combinatorial and Computational Geometry 52 (2005): 1-30.
• R Bardenet, A Doucet, and C Holmes. On Markov chain Monte Carlo methods for tall data. Journal of Machine Learning Research 18.1 (2017): 1515-1557.
• T Broderick, N Boyd, A Wibisono, AC Wilson, and MI Jordan. Streaming variational Bayes. NIPS 2013.
• CM Bishop. Pattern Recognition and Machine Learning. Springer-Verlag New York, 2006.
• W DuMouchel, C Volinsky, T Johnson, C Cortes, and D Pregibon. Squashing flat files flatter. Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 6-15. ACM, 1999.
• D Dunson. Robust and scalable approach to Bayesian inference. Talk at ISBA 2014.
• D Feldman and M Langberg. A unified framework for approximating and clustering data. Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, pp. 569-578. ACM, 2011.
• B Fosdick. Modeling Heterogeneity within and between Matrices and Arrays, Chapter 4.7. PhD thesis, University of Washington, 2013.
• RJ Giordano, T Broderick, and MI Jordan. Linear response methods for accurate covariance estimates from mean field variational Bayes. NIPS 2015.
• R Giordano, T Broderick, R Meager, J Huggins, and MI Jordan. Fast robustness quantification with variational Bayes. ICML 2016 Workshop on #Data4Good: Machine Learning in Social Good Applications, 2016.

References (3/5)
• MD Hoffman and A Gelman. The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research 15.1 (2014): 1593-1623.
• M Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. ICML 2013.
• A Kucukelbir, R Ranganath, A Gelman, and D Blei. Automatic variational inference in Stan. NIPS 2015.
• A Kucukelbir, D Tran, R Ranganath, A Gelman, and DM Blei. Automatic differentiation variational inference. Journal of Machine Learning Research 18.1 (2017): 430-474.
• DJC MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
• D Madigan, N Raghavan, W DuMouchel, M Nason, C Posse, and G Ridgeway. Likelihood-based data squashing: A modeling approach to instance construction. Data Mining and Knowledge Discovery 6.2 (2002): 173-190.
• M Opper and O Winther. Variational linear response. NIPS 2003.
• Stan (open source software). http://mc-stan.org/ Accessed: 2018.
• RE Turner and M Sahani. Two problems with variational expectation maximisation for time-series models. In D Barber, AT Cemgil, and S Chiappa, editors, Bayesian Time Series Models, 2011.
• B Wang and M Titterington. Inadequacy of interval estimates corresponding to variational Bayesian approximations. AISTATS 2004.

Application references (4/5)
• YS Chati and H Balakrishnan. A Gaussian process regression approach to model aircraft engine fuel flow rate. ICCPS 2017 (ACM/IEEE 8th International Conference on Cyber-Physical Systems). IEEE, 2017.
• R Meager. Understanding the impact of microcredit expansions: A Bayesian hierarchical analysis of 7 randomized experiments. AEJ: Applied, to appear, 2018a.
• R Meager. Aggregating distributional treatment effects: A Bayesian hierarchical analysis of the microcredit literature. Working paper, 2018b.
• S Webb, J Caverlee, and C Pu. Introducing the Webb Spam Corpus: Using email spam to identify web spam automatically. CEAS 2006.

Additional image references (5/5)
• amCharts. Visited Countries Map. https://www.amcharts.com/visited_countries/ Accessed: 2016.
• J Herzog. 3 June 2016, 17:17:30. Obtained from: https://commons.wikimedia.org/wiki/File:Airbus_A350-941_F-WWCF_MSN002_ILA_Berlin_2016_17.jpg (Creative Commons Attribution 4.0 International License).