RESEARCH ARTICLE Latent environment allocation of microbial community data Koichi Higashi1, Shinya Suzuki2, Shin Kurosawa2, Hiroshi Mori1, Ken Kurokawa1* 1 Genome Evolution Laboratory, National Institute of Genetics, Mishima, Japan, 2 Department of Biological Information, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo, Japan *
[email protected] a1111111111 a1111111111 Abstract a1111111111 a1111111111 As data for microbial community structures found in various environments has increased, a1111111111 studies have examined the relationship between environmental labels given to retrieved microbial samples and their community structures. However, because environments contin- uously change over time and space, mixed states of some environments and its effects on community formation should be considered, instead of evaluating effects of discrete envi- OPEN ACCESS ronmental categories. Here we applied a hierarchical Bayesian model to paired datasets containing more than 30,000 samples of microbial community structures and sample Citation: Higashi K, Suzuki S, Kurosawa S, Mori H, Kurokawa K (2018) Latent environment allocation description documents. From the training results, we extracted latent environmental topics of microbial community data. PLoS Comput Biol that associate co-occurring microbes with co-occurring word sets among samples. Topics 14(6): e1006143. https://doi.org/10.1371/journal. are the core elements of environmental mixtures and the visualization of topic-based sam- pcbi.1006143 ples clarifies the connections of various environments. Based on the model training results, Editor: Nicola Segata, University of Trento, ITALY we developed a web application, LEA (Latent Environment Allocation), which provides the Received: September 28, 2017 way to evaluate typicality and heterogeneity of microbial communities in newly obtained Accepted: April 16, 2018 samples without confining environmental categories to be compared.