First Complete Genome Sequence in Arborophila And
Total Page:16
File Type:pdf, Size:1020Kb
Zhou et al. Avian Res (2018) 9:45 https://doi.org/10.1186/s40657-018-0136-3 Avian Research RESEARCH Open Access First complete genome sequence in Arborophila and comparative genomics reveals the evolutionary adaptation of Hainan Partridge (Arborophila ardens) Chuang Zhou1†, Shuai Zheng1†, Xue Jiang2, Wei Liang3, Megan Price1, Zhenxin Fan2, Yang Meng1* and Bisong Yue1* Abstract Background: The Hainan Partridge (Arborophila ardens, Phasianidae, Galliformes) is an endemic species of Hainan Island, China, and it is classifed as globally vulnerable species. There are at least 16 species in genus Arborophila and no genome sequence is available. Methods: The whole genome of Hainan Partridge was de novo sequenced (with shotgun approach on the Illumina 2000 platform) and assembled. Results: The genome size of Arborophila ardens is about 1.05 Gb with a high N50 scafold length of 8.28 Mb and it is the frst high quality genome announced in Arborophila genus. About 9.19% of the genome was identifed as repeat sequences and about 5.88 million heterozygous SNPs were detected. A total of 17,376 protein-coding genes were predicted and their functions were annotated. The genome comparison between Hainan Partridge and Red Jungle- fowl (Gallus gallus) demonstrated a conserved genome structure. The phylogenetic analysis indicated that the Hainan Partridge possessed a basal phylogenetic position in Phasianidae and it was most likely derived from a common ancestor approximately 36.8 million years ago (Mya). We found that the Hainan Partridge population had experienced bottleneck and its efective population decreased from about 1,040,000 individuals 1.5 Mya to about 200,000 indi- viduals 0.2 Mya, and then recovered to about 460,000 individuals. The number of 1:1 orthologous genes that were predicted to have undergone positive selection in the Hainan Partridge was 504 and some environmental adaptation related categories, such as response to ultraviolet radiation were represented in GO distribution analysis. Conclusions: We announced the frst high quality genome in Arborophila genus and it will be a valuable genomic resource for the further studies such as evolution, adaption, conservation, not only on Hainan Partridge but also on Arborophila or Phasianidae species. Keywords: Hainan Partridge, Arborophila ardens, Comparative genomics, Positive selection *Correspondence: [email protected]; [email protected] †Chuang Zhou and Shuai Zheng contributed equally to this work 1 Key Laboratory of Bioresources and Ecoenvironment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu 610064, China Full list of author information is available at the end of the article © The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/ publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Zhou et al. Avian Res (2018) 9:45 Page 2 of 8 Background correctness of the assembly, we aligned it to the Red Jun- Te Hainan Partridge (Arborophila ardens, Phasianidae, glefowl (Gallus gallus) reference genome. Visualization and Galliformes) is a species endemic to Hainan Island of the Hainan Partridge/Red Junglefowl genome align- of China, distributed mainly in tropical evergreen for- ment was performed using LAST (Kiełbasa et al. 2011) ests between 600 and 1600 m above the sea level (Gao with the settings suggested by the software developers for 1998; Yang et al. 2011). During the past few decades, the similarly distantly related taxa. We used CEGMA (Parra population of this partridge was reported to have rapidly et al. 2007) and BUSCO (Simão et al. 2015) to evaluate declined owing to their poor fying ability and ground- the genome completeness. dwelling nature, making them vulnerable to human activ- ities, non-indigenous predators and rapid loss of habitat Gene prediction and annotation (Chang et al. 2012; Liang et al. 2013; Rao et al. 2017). Terefore, it was listed as a frst class protected species in We combined the de novo and homology-based pre- China (Zhang et al. 2003) and global vulnerable species diction to identify protein-coding genes (PCGs) in the (IUCN 2018). Historical and ongoing population declines genome. Te de novo prediction was performed on the (Chang et al. 2012; Chen et al. 2015a) and a suite of per- assembled genomes with repetitive sequences masked sistent and novel threats (Liang et al. 2013) have led to as “N” based on the HMM (hidden Markov model) algo- governmental protection of this species in much of their rithm. AUGUSTUS (Stanke et al. 2006) and GENSCAN range. However, the whole genome of Hainan Partridge (Burge and Karlin 1997) programs were executed to the is not currently available and few studies have been con- PCGs using appropriate parameters. For the homol- ducted for the exploration related to the genetic mecha- ogy prediction, proteins of the Red Junglefowl, Turkey nisms of the environmental adaption to Hainan Island of (Meleagris gallopavo), Zebra Finch (Taeniopygia gut- the Hainan Partridge. To provide genome-scale insights tata), and human (Homo sapiens) were mapped onto into the vulnerable Hainan Partridge, facilitate compara- the genome using TblastN (Altschul et al. 1997) with tive studies of avian genomics and further the develop- an E-value cutof of 1E−5. To obtain the best matches ment of genetic tools for Hainan Partridge research and of each alignment, the results yielded from TblastN conservation, we sequenced the genome of the Hainan were processed by SOLAR (Yu et al. 2006). Homologous Partridge. Te results will provide a better understand- sequences were successively aligned against the match- ing of the factors that shape the evolutionary history ing gene models using GeneWise (Birney et al. 2004). We of the Hainan Partridge and eventually improve its used EVidenceModeler (EVM) (Haas et al. 2008) to inte- conservation. grate the above evidence and obtained a consensus gene set. Methods Sampling and sequencing Functional annotation Muscle sample was collected from a wild dead male Functional annotation of all genes was undertaken based Hainan Partridge which was preserved in the Natural on the best match derived from the alignments to pro- History Museum of Sichuan University (NCBI Taxonomy teins annotated in Swissprot and TrEMBL databases ID: 1206065). A whole genome shotgun approach on the (Boeckmann et al. 2003) and BlastP tools with the same Illumina 2000 platform was performed to sequence the E-value cut-of of 1E−5 was applied. Descriptions of gene genome. Two paired-end libraries with insert sizes of products from Gene Ontology ID were retrieved from the 230 bp and 500 bp, as well as three mate-paired libraries results of Swissprot. We also annotated proteins against with insert sizes of 2 kb, 5 kb and 10 kb were constructed. the NCBI non-redundant (Nr) protein database. Te motifs and domains of genes were annotated using Inter- Genome size estimation, genome assembly ProScan (Hunter et al. 2008) against publicly available and completeness evaluation databases, including ProDom (Bru et al. 2005), PRINTS Before assembly, a 17-Kmer analysis was performed to (Attwood et al. 2000), PIRSF (Wu et al. 2004), Pfam (Finn estimate the genome size and the assembly was frstly et al. 2013), ProSiteProfles (Sigrist et al. 2002), PAN- performed by SOAPdenovo2 (Luo et al. 2012) with THER (Tomas et al. 2003), SUPERFAMILY (Gough the parameters set as “all -d 5 –M 3 –k 25”. After using and Chothia 2002), and SMART (Letunic et al. 2004). To SSPACE (Boetzer et al. 2011) to build super-scafolds, fnd the best match and involved pathway for each gene, intra-scafold gaps were then flled using Gapcloser all genes were uploaded to KAAS (Moriya et al. 2007), a (Tigano et al. 2018), which is distributed with SOAP, with web server for functional annotation of genes against the reads from short-insert libraries. In order to verify the manually corrected KEGG genes database by BLAST, using the bi-directional best hit (BBH) method. Zhou et al. Avian Res (2018) 9:45 Page 3 of 8 Analyses of gene family, phylogeny, and divergence Hainan Partridge is 1.08 Gb on the basis of K-mer analy- We used orthoMCL (Li et al. 2003) to defne orthologous sis and it is similar to the reported avian genomes (Cai genes from 10 avian genomes (Hainan Partridge, Red et al. 2013). Te total length of all scafolds was 1.05 Gb Junglefowl, Turkey, Chinese Monal Lophophorus lhuysii, with the scafold N50 8.28 Mb (Table 2). Te genome Japanese Quail Coturnix japonica, Rock Dove Columba completeness was evaluated using CEGMA methods livia, Mallard Anas platyrhynchos, Peregrine Falcon Falco with the results of 83.47% for completeness and 90.32% peregrinus, Zebra Finch, Ostrich Struthio camelus). Phy- for partial gene set (Additional fle 1: Table S1). A total logenetic tree of these 10 birds was constructed using of 91.4% of the eukaryotic 1:1 genes were captured nucleotide sequences of 1:1 orthologous genes. Coding according to the BUSCO evaluations (Additional fle 1: sequences from each 1:1 orthologous family were aligned Table S2). At the same time, visual inspection of the by PRANK (Nick and Ari 2010) and concatenated to one alignment of the Hainan Partridge scafolds against the sequence for each species for building the tree. Modeltest Red Junglefowl reference genome also indicated high (Posada and Crandall 1998) was used to select the best synteny and assembly correctness.