
Marine Genomics xxx (2017) xxx–xxx

Contents lists available at ScienceDirect

Marine Genomics

journal homepage: www.elsevier.com/locate/margen

De novo assembly and annotation of the Avicennia officinalis L. transcriptome

Haomin Lyu 1, Xinnian Li 1, Zixiao Guo, Ziwen He ⁎, Suhua Shi

State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Resources, Sun Yat-sen University, Guangzhou 510275, China

Article info

Article history: Received 29 April 2017; Received in revised form 3 July 2017; Accepted 10 July 2017; Available online xxxx

Keywords: Avicennia officinalis; RNA-seq data; De novo assembly; Annotation; Salt glands; Candidate genes

Abstract

Avicennia officinalis L. is a typical mangrove species that inhabits inhospitable environments at the interface between sea and land. In this study, we generated RNA-seq data to assemble the A. officinalis transcriptome de novo. We started with 36.24 million 100 bp paired-end reads and then filtered out weakly expressed and redundant transcripts. This produced 38,576 high-confidence transcripts with an average length of 834 bp. We found known protein homologs for 22,254 of these transcripts and assigned each of them to at least one of 119 gene ontology (GO) terms. In addition, we identified different copies and isoforms of three candidate genes, AoPIP, AoTIP and AoDHN1, which might be involved in salt excretion via salt glands. All these genes were highly expressed in A. officinalis leaf tissue, suggesting a complex mechanism of response to salt stress. We further identified 613 microsatellite markers for assessing genetic diversity and population differentiation in A. officinalis. The genomic resources generated in this study provide an important foundation for future research into the molecular mechanisms underlying salt and other stress tolerance. They will also support work on the evolutionary history of this mangrove species and its relatives.

© 2017 Published by Elsevier B.V.

1. Introduction

Mangroves are woody plants inhabiting intertidal zones of tropical or subtropical coasts (Tomlinson, 1986). They have evolved both morphological and physiological strategies to thrive in the inhospitable conditions of their habitats, such as high salinity, hypoxia, high sedimentation and muddy anaerobic soils (Giri et al., 2011). Avicennia officinalis L. is a widespread mangrove species distributed throughout the Indo-West Pacific region (Duke, 1991). It exhibits several adaptive traits typical of mangroves: breathing roots, salt tolerance, and viviparity (Tomlinson, 1986). This species provides important ecological and economic benefits to the areas it inhabits. Even so, it has lost about 24% of its habitat since 1980 and is threatened by coastal development, overcutting, and global climate change (Duke et al., 2010).

Mangroves tolerate high salinity through ultrafiltration, salt secretion and ion sequestration (Liang et al., 2008). Of the ~70 mangrove species, only a few (8 in Avicennia and 7 in other genera) use the unusual mechanism of salt secretion through multicellular salt glands (Tomlinson, 1986). Among them, A. officinalis shows some of the most pronounced salt secretion by salt glands in leaf tissue and deposits conspicuous salt crystals (Tomlinson, 1986; Parida and Jha, 2010). Salt glands evolved in only a few orders of halophytes and are absent in the model species Arabidopsis thaliana (Flowers et al., 2010; Flowers and Colmer, 2008). The transcriptomes of some other mangrove species have been sequenced (He et al., 2015; Li et al., 2017; Yang et al., 2015), but most of these species are weakly salt-secreting or non-secreting (Tomlinson, 1986). A. officinalis is therefore a good system for studying salt tolerance mediated by salt glands. A. marina (Huang et al., 2014) is another salt-secreting species with transcriptome data. Compared with it, A. officinalis benefits from a more solid foundation laid by structural and functional examination of its salt glands (Jyothi-Prakash et al., 2014; Tan et al., 2013). Two aquaporin genes (AoPIP, AoTIP) and one dehydrin gene (AoDHN1) are preferentially expressed in A. officinalis salt glands and might be associated with the response to increasing salt concentration (Jyothi-Prakash et al., 2014; Tan et al., 2013). Genomic resources, such as expression profiles of functional genes in A. officinalis leaves, could enable a more comprehensive molecular examination of how salt glands secrete excess salt.

An interesting additional feature of A. officinalis is among-population differentiation in leaf morphology (Duke et al., 1998; Duke, 1991). Abundant genomic sequences and genetic markers would open the way to molecular population-genetic examination of variation within the species. They could also help elucidate the significance of this polymorphism. A high-confidence phylogeny of the eight species in the genus Avicennia has already been estimated (Li et al., 2016). These population-genetic approaches will therefore fit into a robust overall evolutionary framework.

⁎ Corresponding author. E-mail address: [email protected] (Z. He).
1 HL and XL contributed equally to this paper.

http://dx.doi.org/10.1016/j.margen.2017.07.002 1874-7787/© 2017 Published by Elsevier B.V.

Please cite this article as: Lyu, H., et al., De novo assembly and annotation of the Avicennia officinalis L. transcriptome, Mar. Genomics (2017), http://dx.doi.org/10.1016/j.margen.2017.07.002

In this study, we present a de novo assembly and annotation of the A. officinalis transcriptome. This genomic resource can be leveraged in future mechanistic studies of many aspects of mangrove biology, including the unusual salt-tolerance adaptations that rely on salt glands. As an example of how this data set can be used in population genetics, we identified a set of simple sequence repeats (SSRs). These SSRs can be used for quick and inexpensive genotyping of A. officinalis population samples.

2. Data description

2.1. Sampling and sequencing

We collected Avicennia officinalis L. viviparous propagules from Cilacap (Java Island, Indonesia). The propagules were first germinated in Petri dishes in the laboratory and then cultivated in pots with nutrient soil for two weeks. The seedlings were then transported to a natural mangrove forest habitat in Dongzhai Harbor Mangrove Natural Reserve, Hainan, China, and planted there. MIxS information is presented in Table 1.

Table 1
Avicennia officinalis MIxS information.
Item: Description
Classification: Plantae; Angiosperms; ; ; ; ; Avicennia officinalis
Investigation type: Eukaryote
Project name: Avicennia officinalis transcriptome
Geographic location: Cilacap; Java Island; Indonesia
Latitude, longitude: 7°40′59.11″S, 108°49′42.91″E
Collector: Xinnian Li
Collection date: Nov-2012
Environment: Mangrove forest
Biome: ENVO: 01000181
Feature: ENVO: 00000316
Material: ENVO: 00002230
Plant height: 5–10 m
Sequencing method: Illumina HiSeq 2000
Assembly method: Trinity 2.0.6
Finishing strategy: High quality transcriptome assembly

After about one year of healthy growth in this natural environment, fresh leaves of A. officinalis were sampled and stored at −80 °C before RNA extraction. Total RNA was extracted using a modified CTAB method (Fu et al., 2004). Library preparation and sequencing were carried out at BGI (Beijing Genomics Institute) on the Illumina HiSeq 2000 platform, with an insert size of 200 base pairs (bp). Adaptor sequences were removed from the resulting data using an in-house Perl script.

2.2. De novo transcriptome assembly

We generated 36.24 million paired-end reads with a read length of 100 bp. Read quality was evaluated with FastQC v0.10.1 (Andrews, 2010). Raw paired-end reads were filtered with an in-house Perl script according to three criteria:
average read quality score > 20;
number of low-quality sites (score < 10) per read ≤ 20;
number of missing bases ('N') per read ≤ 5.
This step excluded fewer than 1000 reads (0.003%) from the raw dataset. The A. officinalis transcriptome was first assembled from the high-quality reads using Trinity v2.0.6 with default parameters (Grabherr et al., 2011). The initial assembly produced a total of 50,654 transcripts (Supplementary Table 1).

We then filtered this set to remove weakly expressed and redundant transcripts, maximizing confidence in the final transcriptome. The original short reads were re-aligned to the 50,654 newly assembled transcripts using the Burrows-Wheeler Aligner (BWA v0.7.4-r385) (Li and Durbin, 2009). RPKM (Reads Per Kilobase per Million reads) values were calculated from the alignment results. Transcripts with low expression were filtered out by applying an RPKM cut-off of 1. We removed redundant transcripts using the clustering tools implemented in CD-HIT, with a sequence identity threshold of 1.0 and a word size of 5 (“-c 1.0 -n 5”) (Fu et al., 2012). The final filtered assembly comprises 38,576 non-redundant transcripts (unigenes), with an average length of 834 bp and an N50 of 606 bp (Supplementary Table 1). Approximately 28% of the filtered transcripts are longer than 1 kb (Supplementary Table 1).

2.3. Functional annotation

First, all filtered transcripts were compared with the NCBI non-redundant (nr) protein database using BLASTX with an E-value cutoff of 1e-5 (Camacho et al., 2009). Of the 38,576 non-redundant transcripts, 22,254 (57.69%) showed significant sequence similarity to known proteins in the database (Supplementary Table 1). As expected, genomes of eight Asterid species were the most frequent matches (Supplementary Fig. S1). More than half of the annotated transcripts (17,274/22,254) matched proteins from Sesamum indicum (Supplementary Fig. S1). S. indicum is the most closely related species with whole-genome data available in the nr protein database. In addition, 2082 (9.4%) of the transcripts were annotated to Erythranthe guttata, formerly known as Mimulus guttatus (Supplementary Fig. S1).

We then assigned gene ontology (GO) terms to the transcripts with BLAST hits using Blast2GO (Conesa et al., 2005). All 22,254 transcripts were assigned to at least one of the 119 level-2 GO terms (Supplementary Table 2). Of the three level-1 ontologies, Molecular Function contained the largest number of categories (63). Cellular Component was assigned 36 categories and Biological Process 20 (Supplementary Table 2). We further plotted the distribution of the 37 GO terms that each contained >1% of the 22,254 annotated transcripts (Fig. 1). These functional annotations can be used to explore the molecular mechanisms of salt excretion by salt glands. A natural starting point is the genes associated with transporter activity in the Molecular Function set (Fig. 1, Supplementary Table 3).

The mangrove A. marina is the closest relative with leaf transcriptome data (Huang et al., 2014). We therefore identified orthologous genes between A. marina and A. officinalis by reciprocal best-hit BLAST with an E-value cutoff of 1e-5. Approximately 60% (23,320/38,576) of the A. officinalis transcripts had orthologs in the A. marina transcriptome. Of the 22,254 transcripts with functional annotations, only 12,951 (58.20%) had such leaf-expressed orthologs.

2.4. Identification of copies and isoforms of AoPIP, AoTIP, and AoDHN1

Three known genes (AoPIP, AoTIP and AoDHN1) have potential roles in salt-gland-mediated salinity tolerance. We used their previously published cDNA sequences (Jyothi-Prakash et al., 2014; Tan et al., 2013) to search our set of 38,576 A. officinalis transcripts. We required putative orthologs to have BLAST E-values below 1e-5, to cover >30% of the original cDNA, and to be at least half as long as it. This filtering left eight copies and isoforms of AoPIP, one copy but two isoforms of AoTIP, and only one copy of AoDHN1 (Fig. 2, Supplementary Table 4). To obtain the full length of each AoPIP gene, we used two bioinformatic procedures:
(1) comparing the AoPIP genes with their orthologs in A. marina (Huang et al., 2014);
(2) concatenating paired reads carrying specific polymorphisms (SNPs and indels) in this region to re-assemble and extend each incomplete AoPIP sequence.
This yielded the complete sequence of each AoPIP gene (Supplementary Fig. S2).

The mean RPKM value of all transcripts is 29.07 and the median is 8.32. Compared with these values, AoDHN1 (4936.86) and both isoforms of AoTIP (AoTIP_I1: 296.54; AoTIP_I2: 305.02) are highly expressed in A. officinalis leaf tissue. This supports their potential role in salt-gland salinity tolerance (Supplementary Table 4). Among the AoPIP isoforms and homologs, copy 5 is the most highly expressed.
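RPKM normalizes read counts by transcript length and sequencing depth. It is used above both for the cut-off of 1 and for the mean/median reference points. The text does not give the exact formula or say whether the depth term counts all reads or only mapped reads. The sketch below therefore assumes the standard definition RPKM = 10^9 × C / (N × L), where C is the number of reads mapped to a transcript, N is the total number of mapped reads, and L is the transcript length in bp. The function names and the dictionary inputs are hypothetical; for example, the counts could be tallied from the BWA alignment.

```python
from statistics import mean, median


def rpkm(read_counts, lengths):
    """RPKM per transcript: 1e9 * C / (N * L).

    read_counts: transcript ID -> number of mapped reads (C)
    lengths:     transcript ID -> transcript length in bp (L)
    N is taken as the sum of all mapped reads (an assumption).
    """
    total_mapped = sum(read_counts.values())
    return {
        tid: 1e9 * count / (lengths[tid] * total_mapped)
        for tid, count in read_counts.items()
    }


def filter_low_expression(values, cutoff=1.0):
    """Keep transcripts with RPKM >= cutoff (an inclusive cut-off is assumed)."""
    return {tid: v for tid, v in values.items() if v >= cutoff}


def summarize(values):
    """Mean and median RPKM, the reference points used to call genes highly expressed."""
    vals = list(values.values())
    return mean(vals), median(vals)


if __name__ == "__main__":
    # Toy numbers only; not data from this study.
    counts = {"t1": 3, "t2": 1, "t3": 0}
    lengths = {"t1": 1000, "t2": 1000, "t3": 1000}
    values = rpkm(counts, lengths)
    kept = filter_low_expression(values)
    print(sorted(kept))     # ['t1', 't2']
    print(summarize(kept))  # (500000.0, 500000.0)
```

For scale, the reported AoDHN1 value (4936.86) is roughly 590 times the median (8.32) and about 170 times the mean (29.07).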

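Section 2.4 accepts a transcript as a putative copy of AoPIP, AoTIP or AoDHN1 under three thresholds:
- an E-value below 1e-5;
- coverage of more than 30% of the query cDNA;
- a transcript length at least half the cDNA length.

These thresholds can be written as a filter over tabular BLAST output. This is a minimal sketch under several assumptions. It assumes the cDNAs were the queries and the transcripts the subjects. It also assumes BLAST+ was asked for the custom columns `-outfmt "6 qseqid sseqid evalue qstart qend qlen slen"`. The paper states neither its BLAST program nor its output format, and the helper names are hypothetical. Coverage is computed per HSP, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    cdna_id: str         # query: published cDNA (e.g. AoPIP)
    transcript_id: str   # subject: assembled transcript
    evalue: float
    qstart: int
    qend: int
    cdna_len: int        # qlen
    transcript_len: int  # slen


def parse_outfmt6(lines):
    """Parse lines produced with -outfmt "6 qseqid sseqid evalue qstart qend qlen slen"."""
    hits = []
    for line in lines:
        if not line.strip():
            continue
        q, s, e, qs, qe, ql, sl = line.rstrip("\n").split("\t")
        hits.append(Hit(q, s, float(e), int(qs), int(qe), int(ql), int(sl)))
    return hits


def is_putative_copy(hit, max_evalue=1e-5, min_cover=0.30, min_len_ratio=0.5):
    """Apply the three thresholds from the text to a single HSP."""
    covered = abs(hit.qend - hit.qstart) + 1
    return (
        hit.evalue < max_evalue
        and covered / hit.cdna_len > min_cover
        and hit.transcript_len >= min_len_ratio * hit.cdna_len
    )


if __name__ == "__main__":
    # Toy hits only; not results from this study.
    blast_lines = [
        "AoPIP\tt1\t1e-50\t1\t600\t900\t800",  # passes all three
        "AoPIP\tt2\t1e-50\t1\t200\t900\t800",  # covers only ~22% of the cDNA
        "AoPIP\tt3\t1e-3\t1\t600\t900\t800",   # E-value too high
        "AoPIP\tt4\t1e-20\t1\t400\t900\t300",  # transcript shorter than half the cDNA
    ]
    kept = [h.transcript_id for h in parse_outfmt6(blast_lines) if is_putative_copy(h)]
    print(kept)  # ['t1']
```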

The high expression of AoPIP copy 5 suggests that it may play a more important role than the other copies (Supplementary Table 4). To get a sense of the potential functional diversity among the AoPIP homologs, we inferred the evolutionary history of these loci. We aligned the protein sequences with MUSCLE as implemented in MEGA7 (Kumar et al., 2016). We then reconstructed a maximum likelihood (ML) phylogenetic tree in MEGA7 with 1000 bootstrap replicates under the Jones-Taylor-Thornton (JTT) substitution model (Kumar et al., 2016; Jones et al., 1992) (Fig. 2, Supplementary Fig. S3). The isoforms all cluster together. Among the homologs, copies 1 and 2 are most similar to each other, as are copies 3 and 4 (Fig. 2, Supplementary Fig. S3). We further analyzed AoPIP domain structures by searching the Pfam database (Finn et al., 2016). Three homologs have an isoform 2 (AoPIP_C1, AoPIP_C2 and AoPIP_C3), and in all three the isoform 2 protein sequence harbors insertions relative to isoform 1. This suggests that these isoform 2 transcripts might be immature mRNAs under post-transcriptional regulation (Fig. 2). All AoPIP transcripts except these three isoform 2 sequences have complete major intrinsic protein (MIP) domains (Fig. 2). Our analyses have thus uncovered complexity in AoPIP protein isoform structure and, presumably, function. This complexity was not anticipated by the earlier study (Tan et al., 2013), which lacked detailed A. officinalis mRNA expression data.

Fig. 1. GO functional classification of the A. officinalis transcriptome. Each of the 37 GO terms shown contains >1% of the 22,254 annotated transcripts.

2.5. Identification of simple sequence repeats

Simple sequence repeats (SSRs, also known as microsatellites) are tracts of repetitive DNA with motif lengths of two to five base pairs. They are commonly used as genetic markers to assess genetic diversity and distinguish varieties within species (Ellegren, 2004). We identified 613 SSRs in our 38,576 non-redundant transcripts using MISA (MIcroSAtellite identification tool) (Thiel et al., 2003) (Supplementary Table 5). The most abundant SSR type was tri-nucleotide (523, 85.32%), followed by di-nucleotide (86, 14.03%) (Supplementary Fig. S3). These A. officinalis SSRs, as well as the transcript sequences, can be used in further molecular-evolution and population-genetic studies.

3. Data accessibility

The BioProject ID of our data is PRJNA381534, and the BioSample accession number is SAMN06679993. All raw reads were deposited in the Sequence Read Archive (SRA) of NCBI with accession number SRR5412281. This Transcriptome Shotgun Assembly project has been deposited at DDBJ/EMBL/GenBank under the accession GFLY00000000. The version described in this paper is the first version, GFLY01000000.
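The 613 SSRs reported in Section 2.5 were found with MISA. Readers who want to rescan the deposited transcripts may find it useful to see the basic idea behind such a search. The Python sketch below illustrates that idea; it is not MISA and does not reproduce its output. It looks for perfect tandem repeats of 2–5 bp motifs with a regular-expression backreference. It then discards motifs that are themselves repeats of a shorter unit. The minimum repeat numbers are an assumption modeled on commonly used MISA settings: 6 units for di-nucleotides and 5 for longer motifs. The paper does not report its thresholds, and compound or imperfect SSRs are not handled.

```python
import re
from collections import Counter

# Minimum number of tandem units per motif length (2-5 bp motifs, as in the text).
# Assumed values modeled on common MISA settings; not reported in the paper.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, units) tuples for perfect SSRs; start is 0-based."""
    seq = seq.upper()
    found = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if is_primitive(motif):  # skip e.g. an (AT)n run matched again as (ATAT)n
                found.append((m.start(), motif, len(m.group(1)) // k))
    return sorted(found)


def count_by_motif_length(ssrs):
    """Tally SSRs by motif length (2 = di-nucleotide, 3 = tri-nucleotide, ...)."""
    return Counter(len(motif) for _, motif, _ in ssrs)


if __name__ == "__main__":
    # Toy sequence only; not an A. officinalis transcript.
    seq = "CCCC" + "AG" * 6 + "TTT" + "GCA" * 5 + "C"
    ssrs = find_ssrs(seq)
    print(ssrs)                         # [(4, 'AG', 6), (19, 'GCA', 5)]
    print(count_by_motif_length(ssrs))  # Counter({2: 1, 3: 1})
```

Tallying `count_by_motif_length` over all transcripts gives the same kind of motif-length breakdown as Section 2.5 (523 tri- and 86 di-nucleotide SSRs). Counts from this simplified scan need not match MISA's.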

Acknowledgements

This work was supported by the National Natural Science Foundation of China (grant numbers 91331202, 31600182, 41130208); the National Key Research and Development Plan (2017FY100705); the Fundamental Research Funds for the Central Universities (grant number 17lgpy99); and the Chang Hungta Science Foundation of Sun Yat-Sen University.

Appendix A. Supplementary data

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.margen.2017.07.002.

Fig. 2. AoPIP maximum likelihood tree and protein domains. “C” stands for copy number and “I” for isoform number in orthologous gene names; for example, AoPIP_C1;I1 indicates Copy 1 Isoform 1 of AoPIP. Right panels show protein domain structures.


References

Andrews, S., 2010. FastQC: A Quality Control Tool for High Throughput Sequence Data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc.
Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., Madden, T.L., 2009. BLAST+: architecture and applications. BMC Bioinformatics 10, 421.
Conesa, A., Götz, S., García-Gómez, J.M., Terol, J., Talón, M., Robles, M., 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676.
Duke, N.C., 1991. A systematic revision of the mangrove genus Avicennia (Avicenniaceae) in Australasia. Aust. Syst. Bot. 4, 299–324.
Duke, N., Benzie, J., Goodall, J., Ballment, E., 1998. Genetic structure and evolution of species in the mangrove genus Avicennia (Avicenniaceae) in the Indo-West Pacific. Evolution 52, 1612–1626.
Duke, N., Kathiresan, K., Salmo III, S.G., Fernando, E.S., Peras, J.R., Sukardjo, S., Miyagi, T., 2010. Avicennia officinalis. The IUCN Red List of Threatened Species 2010: e.T178820A7616950 (Downloaded on 31 March 2017).
Ellegren, H., 2004. Microsatellites: simple sequences with complex evolution. Nat. Rev. Genet. 5, 435–445.
Finn, R.D., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Mistry, J., Mitchell, A.L., Potter, S.C., Punta, M., Qureshi, M., Sangrador-Vegas, A., Salazar, G.A., Tate, J., Bateman, A., 2016. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279–D285.
Flowers, T.J., Colmer, T.D., 2008. Salinity tolerance in halophytes. New Phytol. 179, 945–963.
Flowers, T.J., Galal, H.K., Bromham, L., 2010. Evolution of halophytes: multiple origins of salt tolerance in land plants. Funct. Plant Biol. 37, 604–612.
Fu, X., Deng, S., Su, G., Zeng, Q., Shi, S., 2004. Isolating high-quality RNA from mangroves without liquid nitrogen. Plant Mol. Biol. Report. 22, 197.
Fu, L., Niu, B., Zhu, Z., Wu, S., Li, W., 2012. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152.
Giri, C., Ochieng, E., Tieszen, L.L., Zhu, Z., Singh, A., Loveland, T., Masek, J., Duke, N., 2011. Status and distribution of mangrove forests of the world using earth observation satellite data. Glob. Ecol. Biogeogr. 20, 154–159.
Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., di Palma, F., Birren, B.W., Nusbaum, C., Lindblad-Toh, K., Friedman, N., Regev, A., 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652.
He, Z., Zhang, Z., Guo, W., Zhang, Y., Zhou, R., Shi, S., 2015. De novo assembly of coding sequences of the mangrove palm (Nypa fruticans) using RNA-Seq and discovery of whole-genome duplications in the ancestor of palms. PLoS One 10, e0145385.
Huang, J., Lu, X., Zhang, W., Huang, R., Chen, S., Zheng, Y., 2014. Transcriptome sequencing and analysis of leaf tissue of Avicennia marina using the Illumina platform. PLoS One 9, e108785.
Jones, D.T., Taylor, W.R., Thornton, J.M., 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8, 275–282.
Jyothi-Prakash, P.A., Mohanty, B., Wijaya, E., Lim, T.M., Lin, Q.S., Loh, C.S., Kumar, P.P., 2014. Identification of salt gland-associated genes and characterization of a dehydrin from the salt secretor mangrove Avicennia officinalis. BMC Plant Biol. 14, 16.
Kumar, S., Stecher, G., Tamura, K., 2016. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 33, 1870–1874.
Li, H., Durbin, R., 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760.
Li, X., Duke, N.C., Yang, Y., Huang, L., Zhu, Y., Zhang, Z., Zhou, R., Zhong, C., Huang, Y., Shi, S., 2016. Re-evaluation of phylogenetic relationships among species of the mangrove genus Avicennia from Indo-West Pacific based on multilocus analyses. PLoS One 11, e0164453.
Li, L., Yang, Y., Yang, S., Zhang, Z., Chen, S., Zhong, C., Zhou, R., Shi, S., 2017. Comparative transcriptome analyses of a mangrove tree caseolaris and its non-mangrove relatives, Trapa bispinosa and Duabanga grandiflora. Mar. Genomics 31, 13–15.
Liang, S., Zhou, R., Dong, S., et al., 2008. Adaptation to salinity in mangroves: implication on the evolution of salt-tolerance. Chin. Sci. Bull. 53, 1708–1715.
Parida, A.K., Jha, B., 2010. Salt tolerance mechanisms in mangroves: a review. Trees 24, 199–217.
Tan, W.K., Lin, Q., Lim, T.M., Kumar, P., Loh, C.S., 2013. Dynamic secretion changes in the salt glands of the mangrove tree species Avicennia officinalis in response to a changing saline environment. Plant Cell Environ. 36, 1410–1422.
Thiel, T., Michalek, W., Varshney, R., Graner, A., 2003. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor. Appl. Genet. 106, 411–422.
Tomlinson, P.B., 1986. The Botany of Mangroves. Cambridge University Press.
Yang, Y., Yang, S., Li, J., Li, X., Zhong, C., Huang, Y., Zhou, R., Shi, S., 2015. De novo assembly of the transcriptomes of two yellow mangroves, Ceriops tagal and C. zippeliana, and one of their terrestrial relatives, Pellacalyx yunnanensis. Mar. Genomics 23, 33–36.
