Australian Centre for Ecogenomics

Classification of the family Muribaculaceae: application of Nanopore long read sequencing to link 16S rRNA gene amplicon and metagenome assembled genome-derived taxonomies

Maximillian Lacour1, Kate Bowerman1, Antiopi Varelias2, Philip Hugenholtz1

1 Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
2 QIMR Berghofer Medical Research Institute, Brisbane, Australia

Introduction

The family Muribaculaceae, previously known as S24-7 or Homeothermaceae, is a predominant member of the mouse gut microbiome and a common microbe in other mammalian gut environments¹.

Most of what we know about Muribaculaceae has come from 16S rRNA gene studies or short read metagenomic studies.

Muribaculaceae is a large family, with ~3000 16S rRNA gene representatives in the non-redundant SILVA REF 132 16S rRNA gene database² and ~250 metagenome-assembled genomes (MAGs), representing 13 genera, in the Genome Taxonomy Database (GTDB)³.

Muribaculaceae is difficult to culture and isolate in laboratory conditions, and only six isolates, representing three genera, have been published⁴.

Short read shotgun sequencing techniques, such as Illumina, produce reads too short to associate 16S rRNA genes with MAGs when sequencing environmental samples. The result is poor taxonomic linking between metagenomic genome taxonomy and 16S rRNA gene taxonomy.

To address this poor taxonomic connectivity, we proposed two methods: (1) direct linking via a single cell approach, and (2) improving MAG quality via long read sequencing.

Figure 1. Muribaculaceae is a large family, consisting of 13 genera, within the Bacteroidales order. Top, genome tree of major families within the order Bacteroidales. Right, genome tree of the Muribaculaceae family. Trees were built with FastTree based on the alignment of 120 concatenated marker genes. Bootstrap support was derived from 100 replicates. Bootstrap values of >90% are illustrated by black diamonds and support of 70-90% by white diamonds; values <70% are not shown.

Single cell linking

Single cells were collected from laboratory mouse faecal samples via fluorescence-activated cell sorting.

DNA was amplified using multiple displacement amplification and sequenced with shotgun metagenomic sequencing.

14 single cell amplified genomes (SAGs) were obtained, three of which were Muribaculaceae that had partial or full 16S rRNA genes.

These reinforced the linking for the genus Duncaniella and provided the first genome to 16S rRNA gene taxonomic linking for CAG-485.

Figure 2. Single cell approach for linking genome and 16S rRNA gene taxonomies. Single cells collected from wild type mice were sequenced, then assembled with the SPAdes single cell pipeline. Bins were annotated using Prokka. Three single cells were identified that contain partial or full 16S rRNA genes. These genes were aligned to the SILVA database with the SINA aligner and placed into a 16S rRNA gene tree (right). Single cell genomes were placed into a genome tree (left) built from the alignment of 120 concatenated marker genes. Connected in yellow is a single cell genome and its associated 16S rRNA gene for the genus Duncaniella. Links in purple are between single cells and associated 16S rRNA genes linking the genus CAG-485. Also placed in the trees, but not linked, are the Muribaculaceae isolate representatives D. muris, M. intestinale, and P. intestinale.

Long read linking

Nanopore long read sequencing produces reads that are, on average, greater than 1000 bp in length.

Long reads can span entire 16S rRNA genes along with their flanking regions, which can be used to help infer taxonomy.

Seven laboratory mouse faecal samples were sequenced and assembled with our in house assembly pipeline, using a hybrid assembly method to polish the long reads.

Only 16S rRNA genes located in contigs greater than 5000 bp in length were selected for connecting taxonomies.

39 genome to 16S rRNA gene taxonomic links were made.

Links for the isolate represented genera Muribaculum, Duncaniella, and Paramuribaculum were reinforced by 20 links.

Further links were found for CAG-485, and new links were made for UBA7173, UBA3263, and CAG-873.

Figure 3. Advantages of long read sequencing for connecting 16S rRNA genes to genomes. Identifying and associating 16S rRNA genes with MAGs is complicated by short read length, skewed species abundance and high similarity of 16S rRNA genes. Long read sequencing overcomes these limitations by sequencing reads long enough to assemble entire 16S rRNA genes along with large flanking regions. Long reads are hybrid assembled with the assistance of short reads to overcome the high error rates typically associated with long read sequencing. Panels: a metagenome processed by short read sequencing and short read assembly yields contigs missing the 16S rRNA gene; the same metagenome processed by long read sequencing and hybrid long read assembly yields high quality long contigs with partial or full length 16S rRNA genes along with large flanking regions.

Figure 4. Linking of genome taxonomy and 16S rRNA gene taxonomy via long read sequencing. Left, trimmed maximum likelihood tree, based on the alignment of 120 concatenated marker genes, of 249 Muribaculaceae genomes sourced from GTDB r89 and 39 Nanopore long read sequenced MAGs. Bootstrap support was derived from 100 replicates. Right, trimmed 16S rRNA gene tree generated de novo with RAxML using 3003 Muribaculaceae 16S rRNA genes collected from the SILVA database and 39 16S rRNA gene sequences found within long read contigs that were >5000 bp in length. 16S rRNA genes were aligned to sequences from the SILVA database using the SINA aligner. Taxonomy of the 16S rRNA genes was identified and applied using the genome tree taxonomies as a reference. Both trees have been manually trimmed to contain mostly user sequences and isolate representatives. Coloured links between the trees indicate connectivity between genomes and 16S rRNA genes. New genera links shown in this figure are CAG-485, CAG-873, UBA3263, and UBA7173.
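The contig-length filter used for long read linking (only 16S rRNA genes on contigs greater than 5000 bp were kept) could be sketched roughly as follows. This is a minimal illustration, not the in house pipeline: the `Hit16S` record, its field names, and both helper functions are hypothetical, and gene coordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass

# Threshold stated on the poster: contigs must be greater than 5000 bp.
MIN_CONTIG_LENGTH = 5000


@dataclass(frozen=True)
class Hit16S:
    """Hypothetical record for one 16S rRNA gene found on an assembled contig."""
    contig_id: str
    contig_length: int  # bp
    gene_start: int     # 1-based, inclusive (assumed convention)
    gene_end: int       # 1-based, inclusive (assumed convention)
    bin_id: str         # MAG (bin) the contig was assigned to


def select_linkable_hits(hits, min_contig_length=MIN_CONTIG_LENGTH):
    """Keep only 16S hits located on contigs longer than the threshold."""
    return [h for h in hits if h.contig_length > min_contig_length]


def flanking_lengths(hit):
    """Return (upstream, downstream) flank sizes in bp around the gene."""
    upstream = hit.gene_start - 1
    downstream = hit.contig_length - hit.gene_end
    return upstream, downstream


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    example_hits = [
        Hit16S("contig_a", 12000, 4001, 5500, "bin.1"),
        Hit16S("contig_b", 5000, 100, 1600, "bin.2"),  # exactly 5000 bp: excluded
        Hit16S("contig_c", 800, 1, 800, "bin.3"),      # too short: excluded
    ]
    for hit in select_linkable_hits(example_hits):
        print(hit.bin_id, hit.contig_id, flanking_lengths(hit))
```

Using a strict "greater than" comparison follows the wording on the poster; the flank sizes are reported only to show how much surrounding sequence a retained contig contributes.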

Conclusion

Long read sequencing is a powerful tool for associating long and highly similar genes, such as 16S rRNA genes, with MAGs.

We were able to confirm 16S rRNA gene genus classifications for the isolate represented genera Muribaculum, Duncaniella, and Paramuribaculum via additional long read and single cell connections.

For the first time, we were able to apply taxonomy in Muribaculaceae 16S rRNA gene trees for the genera CAG-485, UBA7173, UBA3263, and CAG-873.

Potential future applications include applying the improved 16S rRNA gene taxonomy to previous 16S rRNA gene based studies.

Acknowledgments

We thank Isabelle Krippner and the ACE sequencing team for library preparation and both long read and short read sequencing, Pierre-Alain Chaumeil for advice on genome tree building and curation of the GTDB, Mitchell Sullivan for the development of our in house long read assembly pipeline, and Antiopi Varelias and the team at QIMR Bone Marrow Transplantation for supplying mouse faecal samples. This work was funded by the University of Queensland, the Australian National Health and Medical Research Council and the National Institutes of Health. Personal thanks to Oxford Nanopore Technologies Ltd. for providing a bursary.

References

[1] Kate L. Ormerod, David L. A. Wood, Nancy Lachner, Shaan L. Gellatly, Joshua N. Daly, Jeremy D. Parsons, Cristiana G. O. Dal'Molin, Robin W. Palfreyman, Lars K. Nielsen, Matthew A. Cooper, Mark Morrison, Philip M. Hansbro, Philip Hugenholtz. 2016. Microbiome; 4: 36.
[2] Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. 2013. Nucleic Acids Res; 41: D590–D596.
[3] Donovan H Parks, Maria Chuvochina, David W Waite, Christian Rinke, Adam Skarshewski, Pierre-Alain Chaumeil, Philip Hugenholtz. 2018. Nature Biotechnology; 36: 996–1004.
[4] Ilias Lagkouvardos, Till R. Lesker, Thomas C. A. Hitch, Eric J. C. Gálvez, Nathiana Smit, Klaus Neuhaus, Jun Wang, John F. Baines, Birte Abt, Bärbel Stecher, Jörg Overmann, Till Strowig, Thomas Clavel. 2019. Microbiome; 7: 28.