Tamara Broderick ITT Career Development Assistant Professor, MIT

Tamara Broderick ITT Career Development Assistant Professor, MIT

Automated Scalable Bayesian Inference via Data Summarization Tamara Broderick ITT Career Development Assistant Professor, MIT With: Trevor Campbell, Jonathan H. Huggins “Core” of the data set 1 “Core” of the data set • Observe: redundancies can exist even if data isn’t “tall” 1 “Core” of the data set • Observe: redundancies can exist even if data isn’t “tall” Football Curling 1 “Core” of the data set • Observe: redundancies can exist even if data isn’t “tall” Football Curling 1 “Core” of the data set • Observe: redundancies can exist even if data isn’t “tall” Football Curling 1 “Core” of the data set • Observe: redundancies can exist even if data isn’t “tall” • Coresets: pre-process data to get a smaller, weighted data set Football Curling [Bădoiu, Har-Peled, Indyk 2002; Agarwal et al 2005; Feldman & Langberg 2011] 1 “Core” of the data set • Observe: redundancies can exist even if data isn’t “tall” • Coresets: pre-process data to get a smaller, weighted data set Football Curling • Theoretical guarantees on quality [Bădoiu, Har-Peled, Indyk 2002; Agarwal et al 2005; Feldman & Langberg 2011] 1 “Core” of the data set • Observe: redundancies can exist even if data isn’t “tall” • Coresets: pre-process data to get a smaller, weighted data set Football Curling • Theoretical guarantees on quality • How to develop coresets for diverse tasks/geometries? [Bădoiu, Har-Peled, Indyk 2002; Agarwal et al 2005; Feldman & Langberg 2011; 1 Huggins, Campbell, Broderick 2016; Campbell, Broderick 2017; Campbell, Broderick 2018] “Core” of the data set • Observe: redundancies can exist even if data isn’t “tall” • Coresets: pre-process data to get a smaller, weighted data set Football Curling • Theoretical guarantees on quality • How to develop coresets for diverse tasks/geometries? • Previous heuristics: data squashing, big data GPs [Bădoiu, Har-Peled, Indyk 2002; Agarwal et al 2005; Feldman & Langberg 2011; DuMouchel et al 1999; Madigan 1 et al 2002; Huggins, Campbell, Broderick 2016; Campbell, Broderick 2017; Campbell, Broderick 2018] “Core” of the data set • Observe: redundancies can exist even if data isn’t “tall” • Coresets: pre-process data to get a smaller, weighted data set Football Curling • Theoretical guarantees on quality • How to develop coresets for diverse tasks/geometries? • Previous heuristics: data squashing, big data GPs • Cf. subsampling [Bădoiu, Har-Peled, Indyk 2002; Agarwal et al 2005; Feldman & Langberg 2011; DuMouchel et al 1999; Madigan 1 et al 2002; Huggins, Campbell, Broderick 2016; Campbell, Broderick 2017; Campbell, Broderick 2018] “Core” of the data set • Observe: redundancies can exist even if data isn’t “tall” • Coresets: pre-process data to get a smaller, weighted data set Football Curling • Theoretical guarantees on quality • How to develop coresets for diverse tasks/geometries? • Previous heuristics: data squashing, big data GPs • Cf. subsampling [Bădoiu, Har-Peled, Indyk 2002; Agarwal et al 2005; Feldman & Langberg 2011; DuMouchel et al 1999; Madigan 1 et al 2002; Huggins, Campbell, Broderick 2016; Campbell, Broderick 2017; Campbell, Broderick 2018] “Core” of the data set • Observe: redundancies can exist even if data isn’t “tall” • Coresets: pre-process data to get a smaller, weighted data set Football Curling • Theoretical guarantees on quality • How to develop coresets for diverse tasks/geometries? • Previous heuristics: data squashing, big data GPs • Cf. subsampling [Bădoiu, Har-Peled, Indyk 2002; Agarwal et al 2005; Feldman & Langberg 2011; DuMouchel et al 1999; Madigan 1 et al 2002; Huggins, Campbell, Broderick 2016; Campbell, Broderick 2017; Campbell, Broderick 2018] Roadmap • The “core” of the data set Roadmap • The “core” of the data set • Bayes setup • Uniform data subsampling isn’t enough • Importance sampling for “coresets” • Optimization for “coresets” Roadmap • The “core” of the data set • Bayes setup • Uniform data subsampling isn’t enough • Importance sampling for “coresets” • Optimization for “coresets” Bayesian inference 1 Bayesian inference [Gillon et al 2017] [Grimm et al 2018] 2 Bayesian inference [Gillon et al 2017] [ESO/ L. Calçada/ [Grimm et al 2018] M. Kornmesser 2017] [Abbott et al 2016a,b] 2 Bayesian inference [Gillon et al 2017] [ESO/ L. Calçada/ [Grimm et al 2018] M. Kornmesser 2017] [Abbott et al 2016a,b] [amcharts.com 2016][Meager 2018a,b] 2 Bayesian inference [Gillon et al 2017] [ESO/ L. Calçada/ [Grimm et al 2018] M. Kornmesser 2017] [Abbott et al 2016a,b] [amcharts.com 2016][Meager 2018a,b] [Woodard et al 2017] 2 Bayesian inference [Gillon et al 2017] [ESO/ L. Calçada/ [Grimm et al 2018] M. Kornmesser 2017] [Abbott et al 2016a,b] [amcharts.com 2016][Meager 2018a,b] [Chati, Balakrishnan [Woodard et al 2017] [Julian Hertzog 2016] 2017] 2 Bayesian inference [Gillon et al 2017] [ESO/ L. Calçada/ [Grimm et al 2018] M. Kornmesser 2017] [Abbott et al 2016a,b] [amcharts.com 2016][Meager 2018a,b] [Chati, Balakrishnan [Woodard et al 2017] [Julian Hertzog 2016] 2017] • Goal: Report point estimates, coherent uncertainties 2 Bayesian inference [Gillon et al 2017] [ESO/ L. Calçada/ [Grimm et al 2018] M. Kornmesser 2017] [Abbott et al 2016a,b] [amcharts.com 2016][Meager 2018a,b] [Chati, Balakrishnan [Woodard et al 2017] [Julian Hertzog 2016] 2017] • Goal: Report point estimates, coherent uncertainties • Challenge: existing methods can be slow, tedious, unreliable 2 Bayesian inference [Gillon et al 2017] [ESO/ L. Calçada/ [Grimm et al 2018] M. Kornmesser 2017] [Abbott et al 2016a,b] [amcharts.com 2016][Meager 2018a,b] [Chati, Balakrishnan [Woodard et al 2017] [Julian Hertzog 2016] 2017] • Goal: Report point estimates, coherent uncertainties • Challenge: existing methods can be slow, tedious, unreliable • Our proposal: use efficient data summaries for scalable, automated algorithms with error bounds for finite data 2 Bayesian inference 3 p(✓ y) p(y ✓)p(✓) Bayesian inference | /✓ | 3 p(✓ y) p(y ✓)p(✓) Bayesian inference | /✓ | 3 p(✓ y) p(y ✓)p(✓) Bayesian inference | /✓ | 3 p(✓ y) p(y ✓)p(✓) Bayesian inference | /✓ | (xn, yn) 3 p(✓ y) p(y ✓)p(✓) Bayesian inference | /✓ | Normal Phishing (xn, yn) 3 p(✓ y) p(y ✓)p(✓) Bayesian inference | /✓ | ✓ Normal Phishing (xn, yn) 3 p(✓ y) p(y ✓)p(✓) Bayesian inference | /✓ | ✓ Normal Phishing (xn, yn) 3 p(✓ y) p(y ✓)p(✓) Bayesian inference | /✓ | ✓ ✓2 Normal Exact Phishing posterior (xn, yn) [Bishop 2006] ✓1 3 p(✓ y) p(y ✓)p(✓) Bayesian inference | /✓ | ✓ ✓2 Normal Exact Phishing posterior (xn, yn) [Bishop 2006] ✓1 [Bardenet, Doucet, • MCMC: Eventually accurate but can be slow Holmes 2017] 3 p(✓ y) p(y ✓)p(✓) Bayesian inference | /✓ | ✓ ✓2 Normal Exact Phishing posterior (xn, yn) [Bishop 2006] ✓1 [Bardenet, Doucet, • MCMC: Eventually accurate but can be slow Holmes 2017] • (Mean-field) variational Bayes: (MF)VB 3 p(✓ y) p(y ✓)p(✓) Bayesian inference | /✓ | ✓ ✓2 Normal Exact Phishing posterior (xn, yn) [Bishop 2006] ✓1 [Bardenet, Doucet, • MCMC: Eventually accurate but can be slow Holmes 2017] • (Mean-field) variational Bayes: (MF)VB • Fast 3 p(✓ y) p(y ✓)p(✓) Bayesian inference | /✓ | ✓ ✓2 Normal Exact Phishing posterior (xn, yn) [Bishop 2006] ✓1 [Bardenet, Doucet, • MCMC: Eventually accurate but can be slow Holmes 2017] • (Mean-field) variational Bayes: (MF)VB [Broderick, Boyd, Wibisono, • Fast, streaming, distributed Wilson, Jordan 2013] 3 p(✓ y) p(y ✓)p(✓) Bayesian inference | /✓ | ✓ ✓2 Normal Exact Phishing posterior (xn, yn) [Bishop 2006] ✓1 [Bardenet, Doucet, • MCMC: Eventually accurate but can be slow Holmes 2017] • (Mean-field) variational Bayes: (MF)VB [Broderick, Boyd, Wibisono, • Fast, streaming, distributed Wilson, Jordan 2013] (3.6M Wikipedia, 32 cores, ~hour) 3 p(✓ y) p(y ✓)p(✓) Bayesian inference | /✓ | ✓ ✓2 Normal Exact Phishing posterior (xn, yn) [Bishop 2006] ✓1 [Bardenet, Doucet, • MCMC: Eventually accurate but can be slow Holmes 2017] • (Mean-field) variational Bayes: (MF)VB [Broderick, Boyd, Wibisono, • Fast, streaming, distributed Wilson, Jordan 2013] (3.6M Wikipedia, 32 cores, ~hour) • Misestimation & lack of quality guarantees [MacKay 2003; Bishop 2006; Wang, Titterington 2004; Turner, Sahani 2011; Fosdick 2013; Dunson 2014; Bardenet, Doucet, Holmes 2017; Opper, Winther 2003; Giordano, Broderick, Jordan 2015, 2017; Huggins, Campbell, Kasprzak, Broderick 2018] 3 p(✓ y) p(y ✓)p(✓) Bayesian inference | /✓ | ✓ ✓2 p(✓ x) | MFVB Normal q⇤(✓) Exact Phishing posterior (xn, yn) [Bishop 2006] ✓1 [Bardenet, Doucet, • MCMC: Eventually accurate but can be slow Holmes 2017] • (Mean-field) variational Bayes: (MF)VB [Broderick, Boyd, Wibisono, • Fast, streaming, distributed Wilson, Jordan 2013] (3.6M Wikipedia, 32 cores, ~hour) • Misestimation & lack of quality guarantees [MacKay 2003; Bishop 2006; Wang, Titterington 2004; Turner, Sahani 2011; Fosdick 2013; Dunson 2014; Bardenet, Doucet, Holmes 2017; Opper, Winther 2003; Giordano, Broderick, Jordan 2015, 2017; Huggins, Campbell, Kasprzak, Broderick 2018] 3 p(✓ y) p(y ✓)p(✓) Bayesian inference | /✓ | ✓ ✓2 p(✓ x) | MFVB Normal q⇤(✓) Exact Phishing posterior (xn, yn) [Bishop 2006] ✓1 [Bardenet, Doucet, • MCMC: Eventually accurate but can be slow Holmes 2017] • (Mean-field) variational Bayes: (MF)VB [Broderick, Boyd, Wibisono, • Fast, streaming, distributed Wilson, Jordan 2013] (3.6M Wikipedia, 32 cores, ~hour) • Misestimation & lack of quality guarantees [MacKay 2003; Bishop 2006; Wang, Titterington 2004; Turner, Sahani 2011; Fosdick 2013; Dunson 2014; Bardenet, Doucet, Holmes 2017; Opper, Winther 2003; Giordano, Broderick, Jordan 2015, 2017; Huggins, Campbell, Kasprzak, Broderick 2018] • Automation: e.g.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    93 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us