Journal of Genetics (2019) 98:90 © Indian Academy of Sciences
https://doi.org/10.1007/s12041-019-1136-8

RESEARCH NOTE

The evolution study on Oryza rufipogon. dw by whole-genome sequencing

JILIN WANG, SONG YAN, SHIYOU LUO, WEI DENG, XIANHUA SHEN, DAZHOU CHEN∗ and HONGPING CHEN∗

Rice Research Institute, Jiangxi Academy of Agricultural Sciences, Nanchang 330200, People’s Republic of China

*For correspondence: Dazhou Chen; Hongping Chen.

Jilin Wang and Song Yan contributed equally to this work.

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s12041-019-1136-8) contains supplementary material, which is available to authorized users.

Received 26 December 2018; revised 23 May 2019; accepted 17 June 2019; published online 5 September 2019

Abstract. Oryza rufipogon. dw was first discovered at Dongxiang, Jiangxi, in 1978. It is recognized as a rich genetic resource, notably for cold and insect resistance. A total of 100.15 Gb of raw data was obtained from seven paired-end libraries on the Illumina HiSeq4000 platform. A draft genome of O. rufipogon. dw was then assembled, with a final size of 422.7 Mb, a contig N50 of 15 kb and a scaffold N50 of 296.2 kb. The assembly was larger than the genome size estimated by k-mer analysis (413 Mb). Repeat sequences accounted for 40.09% of the genome, and 32,521 protein-coding genes, with an average of 4.59 exons per gene, were annotated in five databases. In a Bayesian phylogenetic analysis of 1460 single-copy genes, O. rufipogon. dw was closely related to O. rufipogon. O. rufipogon. dw was estimated to have diverged from O. rufipogon ∼0.3 million years ago (Mya) and from O. sativa ∼0.6 Mya. The draft genome of O. rufipogon. dw provides an essential resource for studying its origin and evolution.

Keywords. genome assembly; gene annotation; phylogenetic; evolution; Oryza rufipogon. dw.

Introduction

Rice accounts for more than half of human food sources worldwide, with Asia as its main growing area (Krishnan et al. 2014). The genus Oryza has attracted considerable scientific attention, and many genome sequencing studies of the genus have been carried out in recent years (Sakai et al. 2014). Oryza sativa, the common cultivated rice, is grown all over the world and is one of the most important food grains (Huang et al. 2012). Some studies have shown that O. sativa was domesticated from O. rufipogon in East Asia, and that nearly 30–40% of the genetic variation was lost during domestication (Sun et al. 2001). These findings suggest that wild germplasm is the most valuable source of genetic variation (Kovach et al. 2007), and that this useful variation can be exploited rapidly with genomic and transgenic technologies (Kovach et al. 2007).

O. rufipogon. dw was first discovered at Dongxiang, Jiangxi, in 1978 (Zhang et al. 2016). It is considered the northernmost wild rice population in the world (28°14′N) and carries abundant traits useful for improving cultivated rice, such as cold resistance, insect resistance and tolerance of biotic and abiotic stresses (Zhang et al. 2006). It is also a valuable resource for fundamental research on genetic diversity, and has the advantages of high yield and heterosis. Comparative genomics provides new insights into the evolution of genes and genomes (Bennetzen 2007). The complexity of plant genomes arises from duplication, sequence rearrangement and transposable elements (Bennetzen 2007). Even closely related species, such as those within Arabidopsis or Oryza, show significant differences in genome size, gene collinearity and gene number (Hu et al. 2011).
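The genome-size estimate (413 Mb) and the N50 values quoted in the abstract rest on two standard calculations: a k-mer-based size estimate (total number of k-mers divided by the depth of the main k-mer peak) and the N50 length (the length L such that sequences of length L or longer together cover at least half of the assembly). The Python sketch below illustrates both on toy numbers. It is not the authors' pipeline; the function names, the dictionary layout of the k-mer histogram and the hypothetical `min_depth` cut-off used to skip the low-depth sequencing-error spike are assumptions made for illustration.

```python
"""Toy sketch of two assembly summary statistics (illustrative, not the authors' code)."""
from typing import Dict, Iterable


def kmer_genome_size(histogram: Dict[int, int], min_depth: int = 2) -> float:
    """Estimate genome size as total k-mers / depth of the main histogram peak.

    `histogram` maps k-mer depth -> number of distinct k-mers observed at that
    depth (assumed layout). Depths below the hypothetical `min_depth` cut-off
    are ignored when locating the peak, as a crude way to skip the error spike;
    they still count toward the total, following the simple formula.
    """
    # Total k-mers = sum over depths of (depth x number of k-mers at that depth).
    total_kmers = sum(depth * count for depth, count in histogram.items())
    # Peak depth = depth with the most distinct k-mers, excluding the error region.
    candidates = {d: c for d, c in histogram.items() if d >= min_depth}
    if not candidates:
        raise ValueError("no depths at or above min_depth")
    peak_depth = max(candidates, key=candidates.get)
    return total_kmers / peak_depth


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the total length is covered.
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() needs at least one sequence length")


if __name__ == "__main__":
    toy_histogram = {1: 50, 2: 5, 3: 10, 4: 30, 5: 10, 6: 5}
    print(kmer_genome_size(toy_histogram))
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

On real data, the histogram would come from a k-mer counter such as Jellyfish (k = 17 in this study), and sequencing errors, heterozygosity and repeats all distort the peak, so the estimate is only approximate; this is one reason a k-mer estimate and the final assembly size can differ.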
This study presents a draft assembly and annotation of the O. rufipogon. dw genome, estimates its divergence time, and compares its genome collinearity with other rice species. By analysing the functional genes, phylogeny and unique gene families of O. rufipogon. dw, our study provides an understanding of its domestication process and evolutionary status, enriches the known diversity of wild rice populations, and offers a new resource for rice breeding.

Materials and methods

Plant materials preparation

O. rufipogon. dw seeds were provided by the Academy of Agricultural Sciences in Nanchang, Jiangxi, China. The seeds were grown in a controlled growth chamber at 30°C/26°C under a 14 h/10 h light/dark cycle. Genomic DNA was extracted from healthy, fresh leaves using a DNeasy Plant Mini kit (Qiagen, USA). DNA concentration was measured with a NanoDrop ND1000 spectrophotometer, and DNA quality was checked by 0.8% agarose gel electrophoresis.

DNA sequencing and genome size estimation

Seven paired-end (PE) libraries, with insert lengths of 200, 500 and 800 bp and 2, 5, 10 and 20 kb, were constructed from nuclear DNA according to the manufacturer’s instructions (Illumina, San Diego, USA) and sequenced on the Illumina HiSeq4000 platform (Li et al. 2009). For genome size estimation, we used PE reads with a k-mer size of 17, and the k-mer distribution was investigated using Jellyfish v1.1.6.31 (Marcais and Kingsford 2011).

Assembly of the O. rufipogon. dw genome sequences

After data filtering and error correction, the clean reads were assembled using SOAPdenovo v1.05 with a minimum contig length of 1 kb. Assembled contigs were searched against the NCBI nucleotide database (http://www.ncbi.nlm.nih.gov/), and the genome DNA sequences were released to mark the completion of genome sequencing. All clean reads were mapped back to the genome with SOAPaligner to calculate the average sequencing depth.

Gene prediction and annotation

Repetitive sequences were annotated by three methods: RepBase TEs, TE proteins and de novo. The RepBase TEs and TE proteins sets are transposon components identified against the RepBase library with RepeatMasker and RepeatProteinMask, respectively. The de novo set was obtained by building a repeat library with the de novo prediction tools LTR-FINDER and RepeatScout (Price et al. 2005) and then annotating the genome with RepeatMasker. The combined data integrate the results of all three methods, with overlaps removed.

We analysed and annotated four types of ncRNA (miRNA, tRNA, rRNA and snRNA) by homology searches across the whole-genome sequence. tRNAs were predicted with tRNAscan-SE (v.1.23) (Lowe and Eddy 1997). snRNAs and miRNAs were predicted by alignment using BLASTN (Griffiths-Jones et al. 2005), and BLASTN was also used for rRNA alignment (Stanke et al. 2006). After filtering, the predicted genes were functionally annotated against five databases, SwissProt, GO, COG, KEGG and NR (Kent 2002), at an e-value threshold of ≤ 1 × 10⁻⁵.

Phylogenetic analysis

We constructed a phylogenetic tree with the Bayesian method from a global alignment of the 1460 single-copy genes produced with PRANK. The first-phase sites in each single-copy gene family are typically used to estimate the molecular clock (substitution rate) and the divergence times among species.

Results

Genomic DNA sequencing and genome size estimation

A total of 100.15 Gb of raw data was generated for the O. rufipogon. dw genome on the Illumina HiSeq4000 platform from seven PE libraries with insert lengths of 200 bp, 500 bp, 800 bp, 2 kb, 5 kb, 10 kb and 20 kb (NCBI accession number PRJNA543004). After filtering, a total of 55,743,512,338 bp of clean reads was retained for further study (table 1 in electronic supplementary material at http://www.ias.ac.in/jgenet/). We used k-mer analysis to estimate genome size and heterozygosity. The genome size of O. rufipogon. dw was estimated at 412 Mb from 11.07 G of high-quality whole-genome shotgun sequence data, according to the formula genome size = number of k-mers / depth of the peak (k = 17) (figure 1 in electronic supplementary material).

Genome assembly

The final assembly was 422.7 Mb, with a contig N50 of 15,026 bp and a scaffold N50 of 296,224 bp (table 2 in electronic supplementary material). The gene number and scaffolds are shown in table 3 in electronic supplementary material. The assembly was larger than the genome size estimated by k-mer analysis (413 Mb). Raw reads of the O. rufipogon. dw genome were compared by BWA with the raw reads of the O. rufipogon Griff. genome (NCBI accession number SRP070627) (Zhang et al. 2016); the mapping rate was 85.41%, comprising 73.47% uniquely mapped and 11.94% repeat sequences (table 4 in electronic supplementary material).

Figure 1. The O. rufipogon. dw genome. Five concentric circles, from inner to outer: (a) segmental duplications on the rice chromosomes, with donors and acceptors connected by grey lines; (b) heat map of transposable element (TE) density; (c) heat map of gene density; (d) line plot of GC content distribution; (e) chromosomes. The chromosomes are scaled in units of 5 Mb; GC content, gene density and TE density are all computed in 1-Mb units.

The O. rufipogon. dw genome is shown in figure 1; the concentric circles reflecting different features were drawn using the Circos program.

Repeat sequences and noncoding RNA annotation

A combination of RepeatMasker and de novo analysis identified an official gene set including 169,457,333 bp