Acta Oceanol. Sin., 2014, Vol. 33, No. 2, P. 54–62 DOI: 10.1007/s13131-014-0441-6 http://www.hyxb.org.cn E-mail: [email protected]

Comparative analysis of four essential Gracilariaceae species in China based on whole transcriptomic sequencing

XU Jiayue1,4,5†, SUN Jing1,4,5†, YIN Jinlong3†, WANG Liang1,4,5, WANG Xumin1,4, LIU Tao2, CHI Shan2, LIU Cui2, REN Lufeng1,3, WU Shuangxiu1,4*, YU Jun1,4*

1 CAS Key Laboratory of Genome Sciences and Information, Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
2 College of Marine Life Science, Ocean University of China, Qingdao 266003, China
3 Changchun University of Chinese Medicine, Changchun 130117, China
4 Beijing Key Laboratory of Functional Genomics for Dao-di Herbs, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
5 University of Chinese Academy of Sciences, Beijing 100049, China

Received 25 March 2013; accepted 6 November 2013

©The Chinese Society of Oceanography and Springer-Verlag Berlin Heidelberg 2014

Abstract Three Gracilaria species, G. chouae, G. blodgettii and G. vermiculophylla, and a close relative, Gracilariopsis lemaneiformis, which is now named Gracilaria lemaneiformis, are typical indigenous species that are important resources for the production of special proteins, phycobilisomes, special carbohydrates, and in China. In this study, de novo transcriptome sequencing of these four species was performed for the first time using next-generation sequencing technology. Functional annotation of the assembled sequences showed that the transcriptomic profiles of G. lemaneiformis differed considerably from those of the other three Gracilaria species. Comparative analysis of the differential expression of genes related to carbohydrate and phycobiliprotein metabolism also showed that the expression profiles of these essential genes differed among the four species. The genes encoding allophycocyanin, phycocyanin and phycoerythrin were further examined in the four species, and their deduced amino acid sequences were used for phylogenetic analysis. This analysis confirmed that G. lemaneiformis is closely related to genus Gracilaria and that, within genus Gracilaria, G. chouae is more closely related to G. vermiculophylla than to G. blodgettii. This de novo transcriptome study of four species provides a valuable genomic resource for further biological and evolutionary studies of marine red algae.

Key words: Gracilaria chouae, Gracilaria blodgettii, Gracilaria vermiculophylla, Gracilariopsis lemaneiformis, transcriptome sequencing, phycobiliprotein, phylogeny

Citation: Xu Jiayue, Sun Jing, Yin Jinlong, Wang Liang, Wang Xumin, Liu Tao, Chi Shan, Liu Cui, Ren Lufeng, Wu Shuangxiu, Yu Jun. 2014. Comparative analysis of four essential Gracilariaceae species in China based on whole transcriptomic sequencing. Acta Oceanologica Sinica, 33(2): 54–62, doi: 10.1007/s13131-014-0441-6

Foundation item: The National Natural Science Foundation of China under contract Nos 31140070, 31271397 and 41206116; the algal transcriptome sequencing was supported by 1KP Project (www.onekp.com). *Corresponding author, E-mail: [email protected], [email protected] †Contributed equally.

1 Introduction

Gracilaria is a genus of the Family Gracilariaceae, Class Florideophyceae, Phylum Rhodophyta (Pan and Li, 2010). Nearly 100 Gracilaria species are cultivated in , South America, and , usually growing in warm water and spreading over the tropical, subtropical and temperate zones. In China, there are more than 10 species of Gracilaria (Steentoft and Farham, 1997). Among them, three Gracilaria species, G. chouae, G. blodgettii and G. vermiculophylla, and a close relative, Gracilariopsis lemaneiformis (which has been classified in genus Gracilaria and named Gracilaria lemaneiformis because of its functional similarity to Gracilaria species (Zhang et al., 2009)), are typical indigenous species and important commercial red seaweeds in China. G. chouae and G. vermiculophylla are indigenous to Shandong and Fujian provinces and southern China. G. chouae usually grows on gravel or shells in lower littoral tide pools or in the subtropical zone, whereas G. vermiculophylla often lives in relatively calm water from the lower littoral to the upper subtropical zone. G. blodgettii is found in tide pools or shallow subtidal areas of the tropical zone, attached to rocks, pieces of shell or coral fragments (Zhang and Xia, 1992), while G. lemaneiformis commonly grows in the uppermost sublittoral zone. These species are not only significant primary producers in the ocean but also play vital roles in recycling and in maintaining the balance of nitrogen, phosphorus, carbon dioxide and oxygen in the seawater where they live (Huovinen et al., 2006).

Red algae are a group of eukaryotic photosynthetic organisms that mainly use phycobilisomes, special protein complexes of red algae and cyanobacteria, as light-harvesting antennae to capture light energy and pass it on to chlorophylls during photosynthesis (French and Young, 1952; Gantt and Conti, 1966). Red algal phycobilisomes are composed of three types of phycobiliproteins, allophycocyanin, phycocyanin and phycoerythrin (Liu et
al., 2005). Everroad and Wood (2012) confirmed that phycobiliprotein subfamilies, which are classified by their biochemical and spectroscopic properties, also form coherent evolutionary groups. In Family Gracilariaceae, phycobiliproteins have been reported to serve as a nitrogen (N) store when N is abundant and as an N source when N is limited (Lapointe and Duke, 1984). Therefore, in China, G. vermiculophylla and G. lemaneiformis have been used in many culture zones for ecological remediation of coastal eutrophication and purification of seawater (Xu et al., 2008). In addition, purified phycobiliproteins have been developed for use as colorants (Dufossé et al., 2005), fluorescent probes (Glazer, 1997), and even anti-inflammatory and antihyperalgesic medications (Shih et al., 2009).

Gracilaria species also contain high levels of novel cell-wall polysaccharides, which are important resources for the production of agar, an ideal industrial, medicinal and food-additive material with low viscosity, good solubility and absorbability. These polysaccharides have many other functions, such as antitumor, antioxidant and anti-inflammatory activities, and can be used as antitumor drugs, cardiovascular drugs, and natural additives in foods, feeds and cosmetics (Marinho, 2001; Pan and Li, 2010). However, the genetic information and molecular basis underlying these biological processes and mechanisms in Gracilaria species are poorly understood. So far, in Family Gracilariaceae, only the plastid genome of G. tenuistipitata var. liui has been sequenced (Hagopian et al., 2004). Therefore, in this study we used second-generation high-throughput sequencing (NGS) technology to sequence de novo, for the first time, the transcriptomes of three major Gracilaria species, G. chouae, G. blodgettii and G. vermiculophylla, and one Gracilariopsis species, G. lemaneiformis. We systematically examined and comparatively analyzed their transcriptomic differences, particularly for the genes encoding phycobiliproteins and those related to agar biosynthesis and carbon dioxide fixation. The results provide valuable information not only for the study of the genomes and functional genes of Gracilariaceae species but also for the study of the phylogenetic evolution of red algae.

2 Materials and methods

2.1 Sample collection

Branch samples of three red algae, G. chouae, G. blodgettii and G. vermiculophylla, were collected in 2011 from Shantou City, Guangdong Province, Rongcheng City, Shandong Province, and Qingdao City, Shandong Province, respectively (Table 1). A branch of G. lemaneiformis was collected from Lianjiang County, Fujian Province, in May 2011 and transported to the laboratory of the Culture Collection of Seaweed at the Ocean University of China, where it was grown at 22°C under 30 μmol photons m−2 s−1 irradiance in a modified seawater medium supplemented with 4 mg/L NaNO3 and 0.4 mg/L KH2PO4 (Table 1).

Table 1. Summary of sample information of four marine red algae of Family Gracilariaceae
Species | Collection date | Geographic location | Water temperature/°C | Indoor culture
G. chouae | 2011-4-28 | Shantou City, Guangdong Province | 13.6 | No
G. blodgettii | 2011-7-14 | Rongcheng City, Shandong Province | 24 | No
G. vermiculophylla | 2011-4-18 | Qingdao City, Shandong Province | 18 | No
G. lemaneiformis | 2011-5-20 | Lianjiang County, Fujian Province | 22 | Yes

2.2 RNA isolation and sequence acquisition

Samples were first immersed in liquid nitrogen and ground to a fine powder using a chilled mortar and pestle. Total RNA was extracted using an improved Trizol method (De Gasperi et al., 2012; Johnson et al., 2012) and quantified using a NanoDrop ND-1000 spectrophotometer (Labtech International Ltd, Lewes, UK). RNA quality was assessed by the RNA integrity number (RIN) using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA).

cDNA library construction and sequencing were performed by BGI (Shenzhen, China) on an Illumina HiSeq 2000 platform (San Diego, USA) in accordance with the manufacturer's instructions.

2.3 Assembly and functional annotation

First, the reads from each sample were preprocessed to remove low-quality reads (such as reads containing more than 35% “N”), redundant and duplicated reads, and poly-A/T tails. We assembled the filtered reads into contigs using SOAPdenovo-Trans V1.01 (http://soap.genomics.org.cn/soapdenovo.html), which uses the de Bruijn graph method, and filled the gaps in scaffolds with GapCloser V1.12 (http://soap.genomics.org.cn/about.html#resource2) (Li et al., 2010). Both contigs and singletons were used for further annotation. Assembled unigenes longer than 200 bp were annotated by matching against the NCBI non-redundant protein (nr) database (http://www.ncbi.nlm.nih.gov/). Based on the nr annotation, Gene Ontology (GO) terms were assigned with the BLAST2GO program (http://www.BLAST2go.org). Clusters of Orthologous Groups (COG) classification (Tatusov et al., 2001) of unigenes was performed against the COG database (Tatusov et al., 2003) using BLASTX (E-value < 10^−5). Pathway analysis was performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation service KAAS (Kanehisa and Goto, 2000; Moriya et al., 2007).

2.4 Differential expression of unigenes

Gene expression was estimated from the FPKM value (fragments per kilobase of unigene model per million mapped reads) of each unigene. Transcriptome differences among the four algae were assessed with the web tool IDEG6 (http://telethon.bio.unipd.it/bioinfo/IDEG6_form/) using the generalized chi-squared test (Romualdi et al., 2003). In addition, the expression characteristics of genes involved in agar and starch metabolism, phycobiliprotein metabolism and carbon fixation were identified.

2.5 Phylogenetic analysis

We selected phycobiliprotein genes, including those for allophycocyanin and its subunits (apcA, apcB, apcD, apcE), phycocyanin and its subunits (cpcA, cpcB, cpcG), and phycoerythrin and its subunits (cpeA, cpeB), to analyze the phylogenetic relationships of Rhodophyta and Cyanobacteria. The open reading frame (ORF) regions of these genes were examined by similarity alignment with BioEdit Sequence Alignment Editor (Hall, 1999) and CLUSTAL W (Thompson et al., 1994). The four species in this study, five red algal species of other families and five cyanobacterial species published in NCBI GenBank (Table 3) that have full ORF sequences of these genes were used to construct phylogenetic trees by the maximum likelihood method with Kimura's two-parameter model (K2P) (Kimura, 1980) in MEGA 5.05 (Tamura et al., 2011).

3 Results and discussion

3.1 Transcriptome sequencing and de novo assembly

To obtain comprehensive transcripts of genera Gracilaria and Gracilariopsis and to maximize transcript representation across a broad range of biological processes, cDNA libraries representing adult tissues of three Gracilaria species and one close relative, G. lemaneiformis, were constructed and sequenced on the Illumina HiSeq 2000 (Table 1). Sequencing (Table 2) yielded a total of 21 142 772, 30 061 560, 12 720 856 and 33 648 572 raw reads for G. chouae, G. lemaneiformis, G. blodgettii and G. vermiculophylla, respectively. After quality filtering, about 80% of the raw reads in each sample were retained as high-quality reads, and more than 93% of bases reached Q20 in all four samples (Table 2). The GC content was lower in G. chouae and G. lemaneiformis (50.96% and 51.22%, respectively) than in G. blodgettii and G. vermiculophylla (57.15% and 56.65%, respectively).

We assembled the filtered reads and obtained 12 072 unigenes for G. chouae, 25 880 for G. lemaneiformis, 20 988 for G. blodgettii and 24 841 for G. vermiculophylla, with lengths ranging from 100 bp to 20 939 bp and average lengths from 678 bp to 1 629 bp (Table 2). The N50 lengths of G. chouae, G. lemaneiformis, G. blodgettii and G. vermiculophylla were 3 816 bp, 2 920 bp, 2 372 bp and 1 493 bp, respectively. The number of raw sequencing reads and the assembly results of G. lemaneiformis showed no significant differences from those of the other three Gracilaria species.

Table 2. Statistics summary of transcriptome sequencing and assembly data of G. chouae, G. blodgettii, G. vermiculophylla and G. lemaneiformis
Item | G. chouae | G. blodgettii | G. vermiculophylla | G. lemaneiformis
Total reads | 21 142 772 | 12 720 856 | 33 648 572 | 30 061 560
Total bases/bp | 1 952 708 040 | 1 145 777 040 | 3 028 371 480 | 2 705 540 400
Q20 percentage/% | 95.85 | 93.91 | 95.42 | 97.23
GC percentage/% | 50.96 | 57.15 | 56.65 | 51.22
Filtered reads | 16 914 217 | 10 176 684 | 26 918 857 | 24 049 248
Filtered bases/bp | 1 541 455 576 | 962 327 958 | 2 612 319 462 | 2 323 363 936
Assembly
Max unigene length/bp | 20 513 | 14 612 | 15 611 | 20 939
Total bases/Mb | 19.6 | 18.3 | 16.8 | 24.6
Number of unigenes | 12 072 | 20 988 | 24 841 | 25 880
Average length/bp | 1 629 | 873 | 678 | 951
N50/bp | 3 816 | 2 372 | 1 493 | 2 920
N90/bp | 983 | 306 | 244 | 316

Table 3. The annotation results of genes encoding phycobiliproteins in the transcriptome sequencing data of G. chouae, G. blodgettii, G. vermiculophylla and G. lemaneiformis as well as the genome-sequenced species of cyanobacteria, cryptomonads, and glaucocystophytes
Species | GenBank ID | Genes annotated with full ORF sequence (of 9)
G. chouae | - | 9
G. lemaneiformis | - | 9
G. blodgettii | - | 9
G. asiatica | - | 9
C. paradoxa | NC001675 | 7
A.P. 39 | AP011615 | 7
P.S. 7803 | NC009481 | 7
S.S. 307 | NC009482 | 7
G.V. 7421 | NC005125 | 9
M.A. 843 | NC010296 | 7
P.M. 1375 | NC005042 | 2
P.S. 9311 | NC008319 | 7
C. merolae | AB002583 | 7
G. theta | AF041468 | 2
P. haitanensis | AY372218 | 9
P. purpurea | U38804 | 9
P. yezoensis | AP006715 | 9
Notes: The nine genes examined are apcA, apcB, apcD and apcE (allophycocyanin, apc), cpcA, cpcB and cpcG (phycocyanin, cpc), and cpeA and cpeB (phycoerythrin, cpe). A.P. 39 represents Arthrospira platensis NIES-39; G.V. 7421, Gloeobacter violaceus PCC 7421; M.A. 843, Microcystis aeruginosa NIES-843; P.M. 1375, Prochlorococcus marinus pastoris CCMP1375; P.S. 9311, Prochlorococcus sp. CC9311; P.S. 7803, Prochlorococcus sp. WH 7803; S.S. 307, Synechococcus sp. RCC 307.

3.2 Gene annotation

For gene annotation, we used the publicly available information on algal genes and genomes and performed a similarity search against the nr database using the BLASTx algorithm (Altschul et al., 1997) with an E-value threshold of 10^−5 and a size threshold of greater than 100 bp. In total, 4 980, 8 499, 5 042 and 7 820 unigenes of G. chouae, G. blodgettii, G. vermiculophylla and G. lemaneiformis, respectively, were assigned to taxonomic categories.

However, the compositions of the annotated unigenes differed markedly between G. lemaneiformis and the three Gracilaria species, and even among the three Gracilaria species (Fig. 1). First, in the Gracilaria species, more than 70% of unigenes matched genes of algal and plant species, including Ectocarpus siliculosus, Physcomitrella patens, Selaginella moellendorffii, Gracilaria tenuistipitata and Coccomyxa subellipsoidea. In G. lemaneiformis, by contrast, fewer than 50% of unigenes matched algal and plant genes, and 52.04% of its unigenes aligned to genes of the fungi Rhizopus oryzae and Batrachochytrium dendrobatidis, which were not annotated in the three Gracilaria species. Second, among the three Gracilaria species, G. chouae (84.66%) and G. vermiculophylla (83.70%) contained more genes aligned to algal and plant species than G. blodgettii (72.03%) did. Third, for G. chouae and G. vermiculophylla, the most abundant annotated genes were homologous to those of E. siliculosus, a brown alga whose whole genome has been sequenced and annotated (Cock et al., 2010), whereas E. siliculosus was only the 4th most abundant homologous species in G. blodgettii. Furthermore, genes of Phytophthora infestans, which were not annotated in G. chouae or G. vermiculophylla, accounted for the highest proportion (11.28%) of all unigenes in G. blodgettii. Likewise, genes of the oomycete Albugo laibachii were annotated in G. blodgettii but not found in G. chouae or G. vermiculophylla. Therefore, within genus Gracilaria, G. chouae seemed to be more closely related to G. vermiculophylla than to G. blodgettii.

Up to now, only the plastid genome of G. tenuistipitata var. liui has been sequenced (Hagopian et al., 2004). In this study, by using NGS technology to sequence the transcriptomes of these four red algal species, we obtained their whole transcript profiles for the first time and revealed significant differences in gene composition and expression profiles between G. lemaneiformis and the Gracilaria species, and even among the three Gracilaria species (Fig. 1). Such differences might reflect the specific genomic characteristics and expression features of each species, or might result from the samples having been collected at different times and locations and under different culture conditions; further experiments are needed to confirm which explanation applies.

Fig. 1. Gene annotation of transcriptomes of G. chouae, G. blodgettii, G. vermiculophylla and G. lemaneiformis.

Fig. 2. Histogram representation of clusters of orthologous groups (COG) classification of G. chouae, G. blodgettii, G. vermiculophylla and G. lemaneiformis. (x-axis: function class, COG categories grouped as 1, information storage and processing; 2, cellular processes and signaling; 3, metabolism; 4, poorly characterized. y-axis: ratio of unigenes to all unigenes)

Fig. 3. Histogram representation of the distribution of the isogroups in the term “cellular component” of the GO classification at Level 4 for G. chouae, G. blodgettii, G. vermiculophylla and G. lemaneiformis. (x-axis: Level 4 “cellular component” terms; y-axis: ratio of unigenes to all unigenes)

3.3 COG and GO classification

Based on searches against the COG database, 4 209 unigenes in G. chouae, 4 209 in G. vermiculophylla, 6 050 in G. blodgettii and 7 482 in G. lemaneiformis were assigned possible functions and classified into 25 COG categories (Fig. 2). In G. blodgettii, the cluster “translation, ribosomal structure and biogenesis” accounted for the largest proportion of unigenes (13.59%), followed by “posttranslational modification, protein turnover, chaperones” (12.84%). In G. chouae, G. lemaneiformis and G. vermiculophylla, by contrast, the cluster “general function prediction only” accounted for the highest proportion (12.73%, 12.40% and 12.73%, respectively), followed by “posttranslational modification, protein turnover, chaperones” (11.45%, 11.55% and 11.05%) and “translation, ribosomal structure and biogenesis” (10.03%, 9.20% and 10.03%). This difference between G. blodgettii and the other three species was later attributed, through the analysis of GO subcategories (Fig. 3), to ribosomal RNA contamination during cDNA library construction for the G. blodgettii sample. The cluster “cell motility” represented the smallest proportion (0.02%–0.07%) in all four species.

For GO classification, 8 745 unigenes of G. chouae, 14 766 of G. blodgettii, 8 852 of G. vermiculophylla and 14 626 of G. lemaneiformis were assigned to 47 GO terms in three domains: “biological process”, “cellular component” and “molecular function”. The distribution of unigenes among GO terms was even across the four species; genes associated with the terms “organelle part”, “virion part”, “localization” and “biological regulation” were highly represented, and few genes were assigned to the term “macromolecular complex” (data not shown). The main differences among the four species were in the terms “macromolecular complex”,
“membrane-enclosed lumen”, “molecular transducer activity”, “receptor activity” and “metabolic process”. Further analysis of differential GO categories at Level 4 showed that the major difference among the four species lay in the domain “cellular component” (Fig. 3). A noticeably higher percentage of genes was assigned to the terms “ribosome”, “intracellular organelle”, “intracellular non-membrane-bounded organelle” and “cytoplasmic part” in G. blodgettii than in the other three species, confirming that ribosomal RNA contamination occurred during its cDNA library construction.

3.4 KEGG analysis

We carried out functional and pathway analysis of each of the four species separately using the KEGG database. In total, 2 944 unigenes were assigned to 299 KEGG pathways in G. chouae, 3 213 unigenes to 292 pathways in G. blodgettii, 2 815 unigenes to 293 pathways in G. vermiculophylla, and 3 654 unigenes to 301 pathways in G. lemaneiformis. Notably, the pathways identified included “DNA replication, recombination and repair”, “cell growth and death”, “energy and reproduction (such as glycolysis, pentose phosphate pathway and starch metabolism)”, “human diseases”, “carbohydrate metabolism” and “signal transduction” (Fig. 4). In total, 1 236, 1 605, 1 317 and 1 827 unigenes were attributable to functions in these important pathways in G. chouae, G. blodgettii, G. vermiculophylla and G. lemaneiformis, respectively.

Fig. 4. Overview on the distribution of metabolism pathways of G. chouae, G. blodgettii, G. vermiculophylla and G. lemaneiformis. (x-axis: KEGG pathways grouped as 1, genetic information processing; 2, organismal systems; 3, cellular processes; 4, environmental information processing; 5, human diseases; 6, metabolism. y-axis: number of unigenes)

There has been debate over whether the substrate for floridean starch biosynthesis in the “starch and sucrose metabolism” pathway (KEGG map00500) of red algae is UDP-D-glucose or ADP-D-glucose (Nyvall et al., 2000). This study indicated that the substrate in this biosynthetic pathway is UDP-D-glucose, based on the identification of UDP-glucose 6-dehydrogenase (EC 1.1.1.22) with lengths of 305 bp, 329 bp, 320 bp and 295 bp in G. chouae, G. blodgettii, G. vermiculophylla and G. lemaneiformis, respectively, compared with the 437 bp ORF length of the reference gene, in the KEGG analyses of these sequenced transcriptomes. The results demonstrate the power of high-throughput NGS technology for identifying genes in non-model organisms and provide valuable clues for further investigation of specific or related processes, functions and pathways.

3.5 Differentially expressed gene analysis

To explore the biological functions of the significantly differentially expressed genes, we annotated a total of 1 287 KEGG orthologs expressed in all samples, 680 of which were identified as differentially expressed. These orthologs were involved in the category “energy and reproduction”, such as “glycolysis, pentose phosphate pathway and starch and sucrose metabolism” and “carbon fixation in photosynthetic organisms”, as well as in functions related to phycobiliprotein metabolism. To compare the expressed transcripts related to these pathways among the four species, the expression levels of the related genes were statistically calculated and shown in a heatmap (Fig. 5). G. vermiculophylla had the highest expression levels of the genes encoding allophycocyanin and phycocyanin among the four species, and G. lemaneiformis also had higher expression of these genes than the other two species. In contrast, the candidate genes for agarose biosynthesis were expressed at higher levels in G. chouae, G. blodgettii and G. lemaneiformis than in G. vermiculophylla. The genes in the carbon fixation pathway were expressed at much higher levels in G. lemaneiformis and G. vermiculophylla than in the other two species. These differences indicate that the four species differ in the properties of their cellular products and in their potential uses in fish feeding, agar extraction, marine ecological environment protection and restoration, and so on (Pan and Li, 2010).

Fig. 5. The heatmap of the overview on differentially-expressed genes related to some essential pathways of carbohydrate and phycobiliprotein metabolism among G. chouae, G. blodgettii, G. vermiculophylla and G. lemaneiformis. (rows: allophycocyanin (AP), phycocyanin (PC), agarose, carbon fixation; columns: G. chouae, G. blodgettii, G. lemaneiformis, G. vermiculophylla)

3.6 Phylogenetic analysis on phycobiliproteins

Phycobiliproteins are water-soluble proteins found specifically in cyanobacteria and certain algae, including rhodophytes, cryptomonads and glaucocystophytes, where they capture light energy and pass it on to chlorophylls during photosynthesis (Arnold and Oppenheimer, 1950; French and Young, 1952; Gantt and Conti, 1966). Gracilariaceae is a representative red algal family with a high phycobiliprotein content (Liu et al., 2005). Since red algal phycobilisomes are composed of allophycocyanin, phycocyanin and phycoerythrin (Liu et al., 2005), we searched for genes encoding each type of phycobiliprotein in the transcriptome sequencing data of these four species and in all published genome data of cyanobacteria, cryptomonads and glaucocystophytes (Table 3). We showed for the first time that genes encoding all three types of phycobiliproteins are expressed in these four species. We also found that, among these organisms, only rhodophytes and some species of cyanobacteria contained all three types of phycobiliproteins (allophycocyanin, phycocyanin and phycoerythrin). Hence we used the full deduced amino acid sequences of the ORFs of these genes to construct phylogenetic trees (Fig. 6).

In the phylogenetic tree (Fig. 6), red algae (Rhodophyta) and cyanobacteria clustered into two branches. This divergence supports the primary endosymbiosis hypothesis, i.e., that red algae originated from a eukaryotic cell that incorporated a cyanobacterium (Gray, 1992). In addition, the red algal branch divided into two clades, Family Bangiaceae and Family Gracilariaceae, which is consistent with established taxonomy. Therefore, phycobiliproteins might be suitable for resolving the phylogenetic relationships among different families of red algae. Furthermore, within Family Gracilariaceae, the three Gracilaria species grouped into one branch, and G. lemaneiformis clustered with genus Gracilaria, indicating that G. lemaneiformis is distinct from, but closely related to, genus Gracilaria. Within genus Gracilaria, G. chouae was more closely related to G. vermiculophylla than to G. blodgettii, in agreement with our annotation results (Fig. 1). On the other hand, when each type of phycobiliprotein was used separately to build the phylogenetic tree, Family Bangiaceae, Family Gracilariaceae, Cyanidioschyzon merolae and cryptomonads grouped into the red algal branch (data not shown), showing their common origin and close relationships to cyanobacteria.

Fig. 6. The phylogenetic relationships among rhodophytes, cryptomonads, glaucocystophytes and cyanobacteria based on the analyses of full amino acid sequences of allophycocyanin, phycoerythrin and phycocyanin using the maximum likelihood method by Kimura's two-parameter model (K2P) (Kimura, 1980). Numbers indicate the bootstrap values in the ML analysis. (Clades shown: Rhodophyta, comprising Gracilariaceae (G. lemaneiformis, G. chouae, G. vermiculophylla, G. blodgettii) and Bangiaceae (Porphyra yezoensis, P. haitanensis, P. purpurea); Cyanobacteria (Prochlorococcus sp. WH 7803, Synechococcus sp. RCC 307, Gloeobacter violaceus PCC 7421); scale bar 0.1)

4 Conclusions

In this study, the transcriptomes of three Gracilaria species, G. chouae, G. blodgettii and G. vermiculophylla, and a close relative, Gracilariopsis lemaneiformis, were sequenced, assembled and characterized using NGS technology and related software. De novo assembly of the transcriptomes generated sets of unigenes, which were annotated by BLAST searches, GO, COG and KEGG. We also focused on gene expression in several important pathways, including glycolysis/gluconeogenesis, starch and sucrose metabolism, phycobilisome and carbon fixation in photosynthetic organisms. The phylogenetic analysis of phycobiliproteins suggested that G. lemaneiformis is closely related to genus Gracilaria, but the annotation and phylogenetic analysis results showed that it differs considerably from genus Gracilaria. Within genus Gracilaria, G. chouae was more closely related to G. vermiculophylla than to G. blodgettii. The present study is a first contribution to the transcriptomic analysis of Gracilariaceae species and provides valuable information for functional genomic studies, phylogenetic studies and even the utilization of red algae of Gracilariaceae.

References

Altschul S F, Madden T L, Schaffer A A, et al. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25(17): 3389–3402
Arnold W, Oppenheimer J R. 1950. Internal conversion in the photosynthetic mechanism of blue-green algae. J Gen Physiol, 33(4): 423–435
Cock J M, Sterck L, Rouze P, et al. 2010.
related picocyanobacteria. Mol Phylogenet Evol, 64(3): 381–392
French C S, Young V K. 1952. The fluorescence spectra of red algae and the transfer of energy from phycoerythrin to phycocyanin and chlorophyll. J Gen Physiol, 35(6): 873–890
Gantt E, Conti S F. 1966. Phycobiliprotein localization in algae. Brookhaven Symposia in Biology, 19: 393–405
Glazer A N. 1997. Phycobiliproteins: a family of valuable, widely used fluorophores. Scanning, 19: 154–155
Gray M W. 1992. The endosymbiont hypothesis revisited. Int Rev Cytol, 141: 233–357
Hagopian J C, Reis M, Kitajima J P, et al. 2004. Comparative analysis of the complete plastid genome sequence of the red alga Gracilaria tenuistipitata var. liui provides insights into the evolution of rhodoplasts and their relationship to other plastids. J Mol Evol, 59(4): 464–477
Hall T A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series, 41: 95–98
Huovinen P, Gomez I, Lovengreen C. 2006. A five-year study of solar ultraviolet radiation in southern Chile (39 degrees S): potential impact on physiology of coastal marine algae? J Photochem Photobiol, 82(2): 515–522
Johnson M T, Carpenter E J, Tian Z, et al. 2012. Evaluating methods for isolating total RNA and predicting the success of sequencing phylogenetically diverse plant transcriptomes. PLoS One, 7(11): e50226
Kanehisa M, Goto S. 2000. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res, 28(1): 27–30
Kimura M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol, 16(2): 111–120
Lapointe B E, Duke C S. 1984. Biochemical strategies for growth of Gracilaria tikvahiae (Rhodophyta) in relation to light intensity and nitrogen availability. J Phycol, 20(4): 488–495
Li Ruiqiang, Zhu Hongmei, Ruan Jue, et al. 2010. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res, 20: 265–272
Liu Luning, Chen Xiulan, Zhang Xiying, et al. 2005. One-step chromatography method for efficient separation and purification of R-phycoerythrin from Polysiphonia urceolata. J Biotechnol (in Chinese), 116(1): 91–100
Marinho S E. 2001. Agar polysaccharides from Gracilaria species (Rhodophyta, Gracilariaceae). J Biotechnol, 89(1): 81–84
Moriya Y, Itoh M, Okuda S, et al. 2007. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res, 35(Web Server issue): W182–185
Nyvall P, Pedersen M, Kenne L, et al. 2000. Enzyme kinetics and chemical modification of alpha-1,4-glucan lyase from Gracilariopsis sp. Phytochemistry, 54(2): 139–145
Pan Jiangqiu, Li Sidong. 2010. Development and utilization of Gracilaria resources. Chinese Journal of Tropical Agriculture (in Chinese), 30(10): 47–50
Romualdi C, Bortoluzzi S, D'Alessi F, et al. 2003. IDEG6: a web tool for detection of differentially expressed genes in multiple tag sampling experiments. Physiol Genomics, 12(2): 159–162
Shih C M, Cheng S N, Wong C S, et al. 2009. Antiinflammatory and an-
The Ectocarpus genome and tihyperalgesic activity of C-phycocyanin. Anesth Analg, 108(4): the independent evolution of multicellularity in brown algae. 1303–1310 Nature, 465(7298): 617–621 Steentoft M, Farham W F. 1997. Northern distribution boundaries and De Gasperi R, Gama Sosa M A, Kim S H, et al. 2012. Acute blast injury thermal requirements of Gracilaria and Gracilariopsis (Graci- reduces brain abeta in two rodent species. Frontiers in Neurol- lariales, Rhodophyta) in Atlantic Europe and Scandinavia. Nord ogy, 3: 177 J Bot, 5(5): 87–93 Dufosséa L, Galaupa P, Yaronb A. 2005. Microorganisms and microal- Tamura K, Peterson D, Peterson N, et al. 2011. MEGA5: molecular evo- gae as sources of pigments for food use: a scientific oddity or an lutionary genetics analysis using maximum likelihood, evolu- industrial reality. Trends in Food Science and Technology, 16: tionary distance, and maximum parsimony methods. Mol Biol 389–406 Evol, 28(10): 2731–2739 Everroad R C, Wood A M. 2012. Phycoerythrin evolution and diversi- Tatusov R L, Fedorova N D, Jackson J D, et al. 2003. The COG database: fication of spectral phenotype in marine Synechococcus and an updated version includes eukaryotes. BMC Bioinformatics, 62 XU Jiayue et al. Acta Oceanol. Sin., 2014, Vol. 33, No. 2, P. 54–62

4: 41 of caged fish aquaculture by the red alga Gracilaria verrucosa in Tatusov R L, Natale D A, Garkavtsev I V, et al. 2001. The COG database: an integrated multi-trophic aquaculture system. Acta Ecologica new developments in phylogenetic classification of proteins Sinica (in Chinese), 967 from complete genomes. NAR, 29(1): 22–28 Zhang Xuecheng, Fei Xiugeng, Wang Guangce, et al. 2009. Genetic stud- Thompson J D, Higgins D G, Gibson T J. 1994. CLUSTAL W: improv- ies and large scale cultivation of gracilaria lemaneiformis. Peri- ing the sensitivity of progressive multiple sequence alignment odical of Ocean University of China (in Chinese), 39(5): 947–954 through sequence weighting, position-specific gap penalties Zhang Junfu, Xia Bangmei. 1992. Studies on two new Gracilaria from and weight matrix choice. NAR, 22(22): 4673–4680 South China and a summary of Gracilaria species in China. Tax- Xu Shannan, Wen Shanshan, Wu wangxin, et al. 2008. Bioremediation onomy of Economic Seaweeds, 3: 195–206