Evaluation of the Effects of Library Preparation Procedure and Sample

bioRxiv preprint doi: https://doi.org/10.1101/2021.04.12.439578; this version posted April 13, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. 1 TITLE: Evaluation of the effects of library preparation procedure and sample 2 characteristics on the accuracy of metagenomic profiles 3 4 Christopher A Gaulkea,b,c,#*, Emily R Schmeltzera*, Mark Dasenkod, Brett M. Tylerd,e, 5 Rebecca Vega Thurbera, Thomas J Sharptona,d,f 6 7 *These authors contributeD equally. 8 9 AfFiliations 10 aDepartment oF Microbiology, Oregon State University, Corvallis, OR 97331 11 bDepartment oF Pathobiology, University oF Illinois Urbana-Champaign, Urbana, IL 12 61802 13 cCarl R. Woese Institute For Genomic Biology, University oF Illinois Urbana- 14 Champaign, Urbana, IL 61802 15 dCenter For Genome Research anD Biocomputing, Oregon State University, Corvallis, 16 OR 97331 17 eDepartment oF Botany anD Plant Pathology, Oregon State University, Corvallis, OR 18 97331 19 fDepartment oF Statistics, Oregon State University, Corvallis, OR 97331 20 21 Corresponding Author Information 22 #Christopher A Gaulke: [email protected] 23 24 ABSTRACT 25 Shotgun metagenomic sequencing has transformeD our unDerstanDing oF microbial 26 community ecology. However, preparing metagenomic libraries For high-throughput DNA 27 sequencing remains a costly, labor-intensive, anD time-consuming proceDure, which in 28 turn limits the utility oF metagenomes. Several library preparation proceDures have 29 recently been DevelopeD to oFFset these costs, but it is unclear how these newer 30 proceDures compare to current stanDards in the FielD. In particular, it is not clear iF all such 31 proceDures perform equally well across DiFFerent types oF microbial communities, or iF 32 Features oF the biological samples being processeD (e.g., DNA amount) impact the 33 accuracy oF the approach. To aDDress these questions, we assesseD how Five DiFFerent 34 shotgun DNA sequence library preparation methoDs, incluDing the commonly useD 35 Nextera® Flex kit, perform when applieD to metagenomic DNA. We measureD each 1 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.12.439578; this version posted April 13, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. 36 methoD’s ability to proDuce metagenomic Data that accurately represents the unDerlying 37 taxonomic anD genetic Diversity oF the community. We performeD these analyses across 38 a range oF microbial community types (e.g., soil, coral-associateD, mouse-gut-associateD) 39 anD input DNA amounts. We find that the type oF community anD amount oF input DNA 40 inFluence each methoD’s performance, inDicating that careFul consiDeration may be 41 neeDeD when selecting between methoDs, especially For low complexity communities. 42 However, cost-eFFective preparation methoDs we assesseD are generally comparable to 43 the current golD stanDard Nextera® DNA Flex kit For high-complexity communities. 44 Overall, the results From this analysis will help expanD anD even Facilitate access to 45 metagenomic approaches in Future stuDies. 46 47 IMPORTANCE 48 Metagenomic library preparation methoDs anD sequencing technologies continue to 49 aDvance rapiDly, allowing researchers to characterize microbial communities in previously 50 underexploreD environmental samples anD systems. However, wiDely-accepteD 51 stanDardizeD library preparation methoDs can be cost-prohibitive. Newly available 52 approaches may be less expensive, but their eFFicacy in comparison to stanDardizeD 53 methoDs remains unknown. In this stuDy, we compareD Five DiFFerent metagenomic library 54 preparation methoDs. We evaluateD each methoD across a range oF microbial 55 communities varying in complexity and quantity of input DNA. Our FinDings Demonstrate 56 the importance oF consiDering sample properties, incluDing community type, composition, 57 anD DNA amount, when choosing the most appropriate metagenomic library preparation 58 methoD. 2 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.12.439578; this version posted April 13, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. 59 INTRODUCTION 60 Recent aDvancements in high-throughput sequencing have revolutionizeD 61 genomic Discovery anD unlockeD new insights regarding the Diversity anD Function oF 62 microbial communities (1–4). For example, shotgun metagenomic sequencing has 63 clariFieD how the Functional capacity oF the gut microbiome links to human health (5–8), 64 improveD the eFFicacy oF antibiotic resistance gene Discovery (9–12), identified beneFicial 65 soil microbes For agricultural use (13–15), and uncovereD novel, meDically relevant 66 biosynthetic gene clusters in marine microbes (16–18). However, while metagenomes 67 oFFer rich opportunity to transform Discovery, the Financial cost oF proDucing metagenomic 68 Data limits their application. Because much oF this cost is associateD with the preparation 69 oF metagenomic DNA For high-throughput sequencing, there is hope that emergent 70 economical proDucts anD proceDures can expanD the scope oF metagenomic 71 investigations. 72 Illumina’s Nextera® XT anD DNA Flex kits (the latter now known as the “Illumina® 73 DNA Prep”) have been the most wiDely useD methoDs For preparing metagenomic 74 libraries and have eFFectively serveD as inDustry stanDard approaches. InDeeD, Illumina 75 DNA sequencing platForms remain the most wiDely utilizeD For generating genomic and 76 metagenomic data, and their library preparation kits are accordingly useD to prepare 77 samples For sequencing. Due to their Frequent use, these kits are subject to extensive 78 evaluation anD reFinement. For example, Illumina recently released an upDateD version 79 oF their “golD stanDard” Nextera® XT kit, which was rebranDeD as the Nextera® DNA Flex 80 (anD now Illumina® DNA Prep). This new kit allows greater Flexibility across a wiDer range 81 oF genomes, From small genomes (microbial anD amplicons) to more complex genomes 3 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.12.439578; this version posted April 13, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. 82 FounD in eukaryotic anD human systems. The Flex kit also resolveD sequencing biases 83 identified in the Nextera® XT kit that occur in genomic regions with extreme GC-content 84 (19). These Features oF the Nextera® DNA Flex kit have contributeD to its broaD aDoption 85 in metagenomic investigations. 86 One DownsiDe to the Nextera® DNA Flex kit is its relatively high price, which 87 presently costs roughly $46 per sample. While this cost may be reasonable consiDering 88 the DemanD For the proDuct and its observeD eFFicacy, it is high enough that it limits the 89 scale oF many metagenomic investigations. For example, stuDies performing high- 90 throughput analyses on hunDreDs or thousanDs oF samples may be ForceD to utilize non- 91 metagenomic approaches (e.g., 16S rRNA gene sequencing) Due to this library 92 preparation expense. In the eFFort to circumvent this challenge, several alternative anD 93 competitive genomic library preparation methoDs have recently been DevelopeD anD 94 applieD to metagenomic investigations. These approaches fall into two categories: 95 methoDs that increase the economy oF the Illumina Nextera® by moDiFying various 96 aspects oF the manuFacturing protocols (e.g. Baym et al. 2015 (20), anD those that use 97 entirely DiFFerent technologies (e.g. seqWell plexWell™ 96). These approaches holD great 98 promise to improve the throughput oF metagenomic investigations by reDucing library 99 preparation costs. For example, the recent methoD known as ‘Hackflex’ achieves an 100 eleven-FolD Decrease in per sample reagent costs compareD to the Illumina kit protocols 101 (21). 102 Although several alternative library preparation approaches have been assesseD 103 From the perspective oF whole genome sequencing, very little is known about their 104 accuracy anD precision when applied to metagenomic investigations. It is crucial that the 4 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.12.439578; this version posted April 13, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. 105 performance oF novel library preparation proceDures be speciFically assesseD in Diverse 106 metagenomic communities as DiFFerent community types proviDe the unique sequencing 107 challenges not common to traDitional whole genome sequencing. For example, 108 metagenomic communities vary in complexity, with some communities having few Distinct 109 taxa (e.g., insect gut) while others are very highly Diverse (e.g., soil). Library preparation 110 proceDures may vary in their ability to unbiaseDly sample

Load more