TECHNICAL RELEASE

Optimizing experimental design for genome sequencing and assembly with Oxford Nanopore Technologies

John M. Sutton1, Joshua D. Millwood1, A. Case McCormack1 and Janna L. Fierst1,*

1 Department of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487-0344, USA
* Corresponding author. E-mail: [email protected]

Submitted: 17 February 2021
Accepted: 05 July 2021
Published: 13 July 2021

ABSTRACT

High-quality reference genome sequences are the core of modern genomics. Oxford Nanopore Technologies (ONT) produces inexpensive DNA sequences but has high error rates, which make sequence assembly and analysis difficult as genome size and complexity increase. Robust experimental design is necessary for ONT genome sequencing and assembly, but few studies have addressed eukaryotic organisms. Here, we present novel results using simulated and empirical ONT and DNA libraries to identify best practices for sequencing and assembly for several model species. We find that the unique error structure of ONT libraries causes errors to accumulate and assembly statistics to plateau as sequence depth increases. High-quality assembled eukaryotic sequences require high-molecular-weight DNA extractions that increase sequence read length, and computational protocols that reduce error through pre-assembly correction and read selection. Our quantitative results will be helpful for researchers seeking guidance for de novo assembly projects.

Subjects: Animal and Plant Sciences, Molecular Genetics, Bioinformatics

DATA DESCRIPTION

Background

Many factors affect the quality of a de novo genome assembly. Genome size increases the size of the “puzzle” to put together, while the size of the pieces (sequence reads) remains the same. Repetitive regions and mobile genetic elements create unique challenges because they are present in more than one location in the genome, so without contextual information it is difficult to identify how many copies exist in the complete genome.
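The effect of repeats on read placement can be illustrated with a minimal Python sketch. It is not part of the authors' pipeline: the function name `ambiguous_read_fraction`, the 300-bp repeat, and the toy genome are all hypothetical. It also assumes idealized, error-free reads taken from every position of a single strand, which real ONT reads are not. Under those assumptions, it reports the fraction of reads whose sequence occurs at more than one position in the genome, for several read lengths.

```python
import random
from collections import Counter


def ambiguous_read_fraction(genome: str, read_len: int) -> float:
    """Return the fraction of error-free reads of length read_len whose
    sequence occurs at more than one position in the genome."""
    if not 0 < read_len <= len(genome):
        raise ValueError("read_len must be between 1 and len(genome)")
    # Treat every window of the genome as one possible read.
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    # A read is ambiguous if the same sequence is found elsewhere.
    ambiguous = sum(1 for read in reads if counts[read] > 1)
    return ambiguous / len(reads)


# Hypothetical toy genome: one 300-bp repeat inserted at two locations
# within otherwise random sequence (fixed seed for reproducibility).
rng = random.Random(0)


def random_dna(n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))


repeat = random_dna(300)
genome = random_dna(2000) + repeat + random_dna(2000) + repeat + random_dna(2000)

for read_len in (100, 250, 400, 1000):
    fraction = ambiguous_read_fraction(genome, read_len)
    print(f"read length {read_len}: ambiguous fraction {fraction:.4f}")
```

Reads that fit entirely inside the repeat match both copies and cannot be placed, whereas reads long enough to extend into the flanking sequence carry the contextual information needed to tell the copies apart.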