De Novoassembly of a Chromosome-Scale Reference Genome
Total Page:16
File Type:pdf, Size:1020Kb
DE NOVO ASSEMBLY OF A CHROMOSOME-SCALE REFERENCE GENOME FOR THE NORTHERN FLICKER COLAPTES AURATUS The Texas Tech community has made this publication openly available. Please share how this access benefits you. Your story matters to us. Citation Jack P Hruska, Joseph D Manthey, De novo assembly of a chromosome-scale reference genome for the northern flicker Colaptes auratus, G3 Genes|Genomes|Genetics, Volume 11, Issue 1, January 2021, jkaa026, https://doi.org/10.1093/g3journal/jkaa026 Citable Link https://hdl.handle.net/2346/87602 Terms of Use CC BY 4.0 Title page template design credit to Harvard DASH. 2 G3, 2021, 11(1), jkaa026 DOI: 10.1093/g3journal/jkaa026 Advance Access Publication Date: 9 December 2020 Genome Report De novo assembly of a chromosome-scale reference genome for the northern flicker Colaptes auratus Downloaded from https://academic.oup.com/g3journal/article/11/1/jkaa026/6028989 by Texas Tech Univ. Libraries user on 09 August 2021 Jack P. Hruska * and Joseph D. Manthey Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409-43131, USA *Corresponding author: Department of Biological Sciences, Texas Tech University, 2901 Main Street, Lubbock, TX 79409-43131, USA. [email protected] Abstract Thenorthernflicker,Colaptes auratus, is a widely distributed North American woodpecker and a long-standing focal species for the study of ecology, behavior, phenotypic differentiation, and hybridization. We present here a highly contiguous de novo genome assembly of C. auratus, the first such assembly for the species and the first published chromosome-level assembly for woodpeckers (Picidae). The as- sembly was generated using a combination of short-read Chromium 10Â and long-read PacBio sequencing, and further scaffolded with chromatin conformation capture (Hi-C) reads. The resulting genome assembly is 1.378 Gb in size, with a scaffold N50 of 11 and a scaffold L50 of 43.948 Mb. This assembly contains 87.4–91.7% of genes present across four sets of universal single-copy orthologs found in tetra- pods and birds. We annotated the assembly both for genes and repetitive content, identifying 18,745 genes and a prevalence of 28.0% repetitive elements. Lastly, we used fourfold degenerate sites from neutrally evolving genes to estimate a mutation rate for C. auratus, which we estimated to be 4.007Â 10À9 substitutions/site/year, about 1.5Â times faster than an earlier mutation rate estimate of the family. The highly contiguous assembly and annotations we report will serve as a resource for future studies on the genomics of C. auratus and comparative evolution of woodpeckers. Keywords: Colaptes auratus; woodpeckers; PacBio; Hi-C; genome assembly Introduction differentiation echoes the genomic dynamics of other avian hy- brid zones, namely the golden-winged Vermivora chrysoptera and The northern flicker Colaptes auratus is a polytypic North blue-winged Vermivora cyanoptera complex, wherein only a few ge- American woodpecker with a distribution spanning from Alaska nomic regions associated with genes that determine plumage to northern Nicaragua, Cuba, and the Cayman Islands. Colaptes color and pattern differentiate the two species (Toews et al. 2016). auratus consists of up to 13 described subspecies (Gill et al. 2020) A chromosome-level reference genome for the complex will not and 5 morphological groups (Short 1982). Currently, the taxon- only facilitate the identification of the genetic basis of pheno- omy of C. auratus is uncertain; some authorities consider it to types (Kratochwil and Meyer 2015), a long-standing goal in evolu- form a species complex along with the gilded flicker Colaptes tionary biology research, but also provide researchers a valuable chrysoides, while others have suggested that one of the subspe- resource for the examination of emerging fields in genome biol- cies, C.auratus mexicanoides, is best considered a separate species ogy, such as the evolutionary dynamics of transposable element (del Hoyo et al. 2014). In addition, hybridization between morpho- (TE) proliferation (Manthey et al. 2018), for which woodpeckers logical groups in secondary contact is prevalent, primarily be- are especially well suited. tween the yellow-shafted and red-shafted flickers, who form a Here, we describe Caur_TTU_1.0, a de novo assembly that hybrid zone that extends from northern Texas to southern was built from a wild caught C. auratus female. We used Alaska (Wiebe and Moore 2020). The yellow-shafted/red-shafted three sequencing strategies: 10Â Chromium, PacBio, and chro- hybrid zone has become a prominent study system for the conse- matin conformation capture (Hi-C) to assemble the first pub- quences of secondary contact (e.g., Moore and Koenig 1986; lished chromosome-level genome for C. auratus and Picidae. Wiebe 2000). Despite there being marked phenotypic differentia- As whole-genome sequencing becomes more feasible and tion between red-shafted and yellow-shafted flickers, genetic di- prevalent, high-quality reference genomes will undoubtedly vergence between these groups is remarkably shallow, even serve as essential resources. We expect the chromosome-level when sampling thousands of markers across the genome assemblypresentedherewillbeofgreatusetothoseinter- (Manthey et al. 2017; Aguillon et al. 2018). The paradoxical con- ested in the genomic evolution of woodpeckers and birds, at junction of shallow genetic divergence and marked phenotypic large. Received: October 21, 2020. Accepted: November 12, 2020 VC The Author(s) 2020. Published by Oxford University Press on behalf of Genetics Society of America. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. 2|G3, 2021, Vol. 11, No. 1 Materials and methods between 10k, 30k, 60k, and 100k. After all iterations were run, the DNA extraction, library preparation, and assembly with greatest scaffold N50 and size was selected and sequencing used in subsequent rounds. Lastly, we used the Hi-C reads to fur- ther scaffold and fix mis-assemblies using the 3D-DNA pipeline We obtained breast muscle tissue from a vouchered C. auratus (Durand et al. 2016; Dudchenko et al. 2017). specimen (MSB 48083) deposited at the Museum of Southwestern To assess the spatial order of the scaffolds of the Biology (MSB). The specimen was a wild female collected on July Caur_TTU_1.0 assembly, we aligned it to the Chicken Gallus gallus 11, 2017 in Cibola County, New Mexico (see MSB database for chromosome-level assembly (GRCg6a, GCF_000002315.6, https:// complete specimen details) and exhibited the ‘red-shafted’ mor- www.ncbi.nlm.nih.gov/genome/? term¼Gallus%20) using the Downloaded from https://academic.oup.com/g3journal/article/11/1/jkaa026/6028989 by Texas Tech Univ. Libraries user on 09 August 2021 phology associated with C. auratus populations of western North nucmer module of MUMMER v 4.0.0b2 (Kurtz et al. 2004). We sub- America. We used a combination of 10Â Chromium, PacBio, and sequently filtered alignments using MUMMER’s delta-filter mod- Hi-C sequencing data for genome assembly. 10Â Chromium li- ule while setting the minimum alignment identity to 70% and brary sequencing was carried out by the HudsonAlpha Institute allowing many-to-many alignments. A tab-delimited text file for Biotechnology (Huntsville, AL, USA). They performed high- that includes information on the position, percent identity, and molecular weight DNA isolation, quality control, library prepara- length of each alignment was produced using MUMMER’s show- tion, and shotgun sequencing on one lane of an Illumina HiSeqX. coords module (Supplementary File S18). This file was used as in- For long-read PacBio sequencing, we used the services of RTL put to create a synteny plot with OmicCircos (Hu et al. 2014; R Genomics (Lubbock, TX, USA). They performed high-molecular Core Team 2018). Subsequently, the Caur_TTU_1.0 scaffolds were weight DNA isolation using Qiagen (Hilden, Germany) high- renamed according to their corresponding Chicken chromosome. molecular weight DNA extraction kits, PacBio SMRTbell library Scaffolds that did not show strong synteny to Chicken chromo- preparation, size selection using a Blue Pippin (Sage Science), and somes were not renamed. sequencing on six Pacific Biosciences Sequel SMRTcells 1M v2 Genome assembly metrics were obtained using the function with Sequencing 2.1 reagents. Hi-C library preparation was per- stats.sh from the BBMap v 38.22 package (Bushnell 2014). formed with an Arima Genomics Hi-C kit (San Diego, CA, USA) by Genome completeness was estimated using Tetrapoda and Aves the Texas A&M University Core facility. The Hi-C library was single-copy orthologous gene sets from both BUSCO v3 (Simao~ then sequenced on a partial lane of an Illumina NovaSeq S1 flow et al. 2015; Waterhouse et al. 2018) and BUSCO v4 (Seppey et al. cell at the Texas Tech University Center for Biotechnology and 2019). We submitted our genome assembly to the NCBI genome Genomics. submission portal, where a scan for contaminants detected no Genome assembly, polishing, scaffolding, and abnormalities in our assembly. quality assessment Genome annotation We generated an initial assembly using the raw PacBio long reads Repetitive element annotation and window analysis: with CANU v 1.7.1 (Koren et al. 2017). Reads were corrected, We annotated TEs and repetitive content in the Caur_TTU_1.0 as- trimmed, and assembled using CANU default parameters, while sembly using a custom de novo repeat library and RepBase verte- specifying a normal coarse sensitivity level (ÀcorMhapSensitivity brate database v 24.03 (Jurka et al. 2005). The custom repeat flag), setting the expected fraction error in an alignment of two library was constructed from the C. auratus genome assembly corrected reads to 0.065 (ÀcorrectedErrorRate flag) and setting (prior to Hi-C scaffolding) and other in-progress lab genome as- the estimated genome size to 1.6 Gb, which corresponds with pre- sembly projects in songbirds (Supplementary File S15).