Ke Zhai, Jordan Boyd-Graber, Nima Asadi, and Mohamad Alkhouja. Mr. LDA: A Flexible Large Scale Topic Modeling Package using Variational Inference in MapReduce. ACM International Conference on World Wide Web, 2012, 10 pages.

@inproceedings{Zhai:Boyd-Graber:Asadi:Alkhouja-2012,
  Author = {Ke Zhai and Jordan Boyd-Graber and Nima Asadi and Mohamad Alkhouja},
  Url = {docs/mrlda.pdf},
  Booktitle = {ACM International Conference on World Wide Web},
  Title = {{Mr. LDA}: A Flexible Large Scale Topic Modeling Package using Variational Inference in MapReduce},
  Year = {2012},
  Location = {Lyon, France},
}

Links:
• Code [http://mrlda.cc]
• Slides [http://cs.colorado.edu/~jbg/docs/2012_www_slides.pdf]

Downloaded from http://cs.colorado.edu/~jbg/docs/mrlda.pdf

Mr. LDA: A Flexible Large Scale Topic Modeling Package using Variational Inference in MapReduce

Ke Zhai
Computer Science, University of Maryland, College Park, MD, USA
[email protected]

Jordan Boyd-Graber
iSchool and UMIACS, University of Maryland, College Park, MD, USA
[email protected]

Nima Asadi
Computer Science, University of Maryland, College Park, MD, USA
[email protected]

Mohamad Alkhouja
iSchool, University of Maryland, College Park, MD, USA
[email protected]

ABSTRACT

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for exploring document collections. Because of the increasing prevalence of large datasets, there is a need to improve the scalability of inference for LDA.

In addition to being noisy, data from the web are big. The MapReduce framework for large-scale data processing [8] is simple to learn but flexible enough to be broadly applicable. Designed at Google and open-sourced by Yahoo, Hadoop MapReduce is one of the