1,520 Reference Genomes from Cultivated Human Gut Bacteria Enable Functional Microbiome Analyses
Total Page:16
File Type:pdf, Size:1020Kb
RESOURCE https://doi.org/10.1038/s41587-018-0008-8 1,520 reference genomes from cultivated human gut bacteria enable functional microbiome analyses Yuanqiang Zou 1,2,3,13, Wenbin Xue1,2,13, Guangwen Luo1,2,4,13, Ziqing Deng 1,2,13, Panpan Qin 1,2,5,13, Ruijin Guo1,2, Haipeng Sun1,2, Yan Xia1,2,5, Suisha Liang1,2,6, Ying Dai1,2, Daiwei Wan1,2, Rongrong Jiang1,2, Lili Su1,2, Qiang Feng1,2, Zhuye Jie1,2, Tongkun Guo1,2, Zhongkui Xia1,2, Chuan Liu1,2,6, Jinghong Yu1,2, Yuxiang Lin1,2, Shanmei Tang1,2, Guicheng Huo4, Xun Xu1,2, Yong Hou 1,2, Xin Liu 1,2,7, Jian Wang1,8, Huanming Yang1,8, Karsten Kristiansen 1,2,3,9, Junhua Li 1,2,10*, Huijue Jia 1,2,11* and Liang Xiao 1,2,6,9,12* Reference genomes are essential for metagenomic analyses and functional characterization of the human gut microbiota. We present the Culturable Genome Reference (CGR), a collection of 1,520 nonredundant, high-quality draft genomes generated from >6,000 bacteria cultivated from fecal samples of healthy humans. Of the 1,520 genomes, which were chosen to cover all major bacterial phyla and genera in the human gut, 264 are not represented in existing reference genome catalogs. We show that this increase in the number of reference bacterial genomes improves the rate of mapping metagenomic sequencing reads from 50% to >70%, enabling higher-resolution descriptions of the human gut microbiome. We use the CGR genomes to annotate functions of 338 bacterial species, showing the utility of this resource for functional studies. We also carry out a pan-genome analysis of 38 important human gut species, which reveals the diversity and specificity of functional enrichment between their core and dispensable genomes. he human gut microbiota refers to the all the microorganisms database belong to the human gut microbiota. Rather, the focus that inhabit the human gastrointestinal tract. Diverse roles of has been on clinically relevant pathogenic bacteria, which are over- the gut microbiota in human health and disease have been rec- represented in the microbial databases. The first catalog of 178 ref- T 1,2 ognized . Metagenomic studies have transformed our understand- erence bacterial genomes for the human microbiota was reported ing of the taxonomic and functional diversity of human microbiota, by the Human Microbiome Project (HMP)14 in 2010. To date, the but more than half of the sequencing reads from a typical human HMP has sequenced >2,000 microbial genomes cultivated from fecal metagenome cannot be mapped to existing bacterial reference human body sites, 437 of which are gut microbiota (data accessed genomes3,4. The lack of high-quality reference genomes has become an 8 September 2017). However, the number of reference gut bacterial obstacle for high-resolution analyses of the human gut microbiome. genomes is still far from saturated. Although the previously reported Integrated Gene Catalog (IGC) We present a reference catalog of genomes of cultivated human has enabled metagenomic, metatranscriptomic and metaproteomic gut bacteria (named the CGR), established by culture-based isola- analyses3,5,6, the gap between compositional and functional analy- tion of > 6,000 bacterial isolates from fecal samples of healthy indi- ses can only be filled by individual bacterial genomes. Genes that viduals. The CGR comprises 1,520 nonredundant, high-quality co-vary among samples can be clustered into metagenomic linkage draft bacterial genomes, contributing at least 264 new reference groups7, metagenomic clusters8 and metagenomic species9,10, whose genomes to the gut microbiome. After inclusion of CGR genomes, annotation depends on alignment to the limited number of exist- the mapping rate of selected metagenomic datasets improved from ing reference genomes. Other metagenomics-based analyses of the around 50% to over 70%. In addition to improving metagenomic gut microbiome—for example, single nucleotide polymorphisms analyses, the CGR will improve functional characterization and (SNPs), indels and copy number variations—rely on the coverage pan-genomic analyses of the gut microbiota at high resolution. and quality of reference genomes11–13. Despite the rapid increase in the number of sequenced bacte- Results rial and archaeal genomes, reference genomes for gut bacteria are Expanded catalog of gut bacterial genomes. We obtained underrepresented. It is estimated that < 4% of the bacterial genomes 6,487 bacterial isolates from fresh fecal samples donated by 155 in the US National Center for Biotechnology Information (NCBI) healthy volunteers by using 11 different media under anaerobic 1BGI-Shenzhen, Shenzhen, China. 2China National Genebank, BGI-Shenzhen, Shenzhen, China. 3Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, Copenhagen, Denmark. 4Key Laboratory of Dairy Science, College of Food Sciences, Northeast Agricultural University, Harbin, Heilongjiang, China. 5BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, China. 6Shenzhen Engineering Laboratory of Detection and Intervention of Human Intestinal Microbiome, Shenzhen, China. 7BGI-Qingdao, BGI-Shenzhen, Qingdao, China. 8James D. Watson Institute of Genome Sciences, Hangzhou, China. 9Qingdao-Europe Advanced Institute for Life Sciences, Qingdao, China. 10School of Bioscience and Biotechnology, South China University of Technology, Guangzhou, China. 11Macau University of Science and Technology, Taipa, Macau, China. 12Department of Digestive Diseases, Huashan Hospital of Fudan University, Shanghai, China. 13These authors contributed equally: Yuanqiang Zou, Wenbin Xue, Guangwen Luo, Ziqing Deng, Panpan Qin. *e-mail: [email protected]; [email protected]; [email protected] NATURE BIOTECHNOLOgy | VOL 37 | FEBRUARY 2019 | 179–185 | www.nature.com/naturebiotechnology 179 RESOURCE NATURE BIOTECHNOLOGY Clu 243 Clu 107 Clu 197 Phylum annotation Clu 149 Clu 126 Clu 292 Clu 14 Clu 17 9 Clu 20 Clu 65 1 Clu 73 127 u 28 Clu 184 Clu 1 u 262 7 9 Clu 28 6 Clu 248 Clu 46 2 CluC 9 u 78 CluC Clu 262 7 Clu 128 5 CluC 101 Clu 81 CluC 7 3 Clu 78 0 Clu 1 CluC 153 2 Clu 18 Clu 23 4 Clu 199 CluC 246 Clu Clu 307 Clu 222 Clu 319 62 44 CluC 60 Clu 283 4 CluC 320 92 6 Clu Clu 47 Clu 279 CluC 124 2 Clu CluC 64 3 CluC 162 Clu 75 Clu 198 Clu 32 268 Bacteroidetes CluC 84 Clu 2 8 61 7 CluC 10 Clu 133 5 CluC 77 9 9 Clu 121 02 CluC 1 Clu 27 Clu 155 94 Clu 256 CluC 25 Clu Clu 0 138 Clu 109 138 147 Clu 17 CluC Clu 103 CluC 33 Clu 45 7 Clu 220 3 Clu 115 Clu 52 Actinobacteria Clu 5 Clu 142 13 2 CluC 53 Clu Clu 331 4 CluC 131 54 4 CluC 66 Clu 22 CluC 337 Clu 249 4 CluC 51 Clu 48 9 CluC 176 Clu 223 CluC 216 Clu 26 3 CluC 50 9 Clu 0 CluC 3 Firmicutes 49 4 Clu 6 CluC 285 Clu 9 CluC 21 125 Clu CluC 42 Clu 338 6 CluC 173 Clu 8 Clu 150 24 Clu 1 4 CluC 130 Clu 2 233 CluC 135 21 Proteobacteria Clu 95 CluC 299 Clu 116 5 CluC 309 Clu 6 CluC 239 102 CluC 291 Clu 61 2 Clu 105 CluC 191 CluC 287 Clu 205 CluC 247 Clu 229 5 CluC 89 Fusobacteria 9 1 CluC 30 CluC 163 293 CluC CluC 278 3 CluC 27 Clu 57 2 15 Clu CluC 271 CluC 297 288 CluC Clu 91 Clu 83 Clu 26 26 7 CluC 2 Clu 234 4 CluC 328 Clu 16 4 CluC 323 Clu 140 86 Clu 1 Clu 196 CluC 175 Clu 3 06 Clu 257 Clu 251 1 CluC 136 Clu 252 CluC 266 Clu 21 3 CluC 201 Clu 228 8 Clu 114 Clu 32 2 CluC 110 Clu 76 CluC 275 Clu 334 Clu 36 Clu 211 Clu 141 Clu 2500 CluC 59 Clu 255 CluC 8 Clu 210 CluC 167 Clu 2988 Clu 277 Clu 87 Clu 258 Clu 588 CluC 305 Clu 2644 ClC u 300 Clu 2322 CluC 67 Clu 1466 CluC 139 Clu 1299 CluC 219 6 Clu 206 Clu 276 Clu 231 CluC 55 Clu 3188 Clu 104 Clu 156 ClC u 37 6 CluC 2 Clu 336 08 6 Clu 166 Clu 159 5 CluC 118 Clu 325 8 Clu 1 Clu 178 82 3 Clu 237 Clu 203 600 CluC 2 Clu 1 42 Clu 30 Clu 261 8 6 CluC 241 Clu 106 0 ClC u 314 Clu 170 6 Clu 96 Clu 26 5 Clu 137 Clu 19 4 CluC 100 Clu 174 CluC 111 Clu 311 Clu 233 171 Clu 2 CluC 28 Clu 312 4 CluC 3 Clu 40 30 0 Clu 190 Clu 230 9 CluC 316 179 Clu 5 CluC 2 Clu 15 74 4 Clu Clu 4 281 0 CluC 215 Clu 20 CluC Clu 3 11 7 Clu 20 9 Clu 327 8 CluC 18 4 Clu 148 3 Clu 9 217 Clu 183 5 CluC 2 09 Clu 335 4 CluC 72 Clu 254 2 Clu 80 Clu 12 Clu 212 8 CluC 99 Clu 238 2 2 Clu 22 6 CluC CluC 74 Clu 56 7 263 3 Clu Clu 207 169 6 CluC 333 Clu 113 p Clu 85 Clu 296 3 Clu 9 CluC 25 Outgroup 8 3 Clu 303 CluC 157 Clu 108 Clu Clu 11 8 CluC 18 4 324 Clu 158 2 Clu Clu 44 5 5 CluC 68 Clu 2 225 CluC 315 0 Clu 35 CluC 16 2 CluC 2 Clu 310 8 CluC 86 8 Clu 112 5 82 Clu 38 8 Clu 8 Clu Clu 295 Clu 98 Clu 18 8 63 CluC 18 9 Clu 290 CluC 265 7 Clu 321 Clu 2 Clu 79 CluC 1 1 Clu 16 CluC 31 180 4 Clu 82 Clu 14 9 Clu 32 Clu 17 Clu 34 Clu 21 7 CluC 33 Clu 29 3 3 Clu 2 Clu 28 2 Clu 145 72 Clu 41 CluC 2 4 CluC 240 151 Clu 13 2 0 CluC 43 1 Clu 280 Clu 2 Clu 3 27 6 Clu Clu 313 3 Clu 71 Clu 154 Clu 21 Clu 70 0 Clu 11 7 CluC 253 Clu 3 0 CluC CluC 188 3 120 Clu 2 04 4 Clu 165 CluC 132 Clu 19 Clu 1 Clu 24 Clu 326 4 2 9 5 Clu 193 9 44 Clu 3 5 Clu 97 8 Clu 90 7 Clu 23 7 Clu 194 2 Clu 294 5 Clu 2 Clu 302 Clu 269 Clu 235 Clu 28 Fig.