Acta Oceanol. Sin., 2014, Vol. 33, No. 2, P. 45–53 DOI: 10.1007/s13131-014-0440-7 http://www.hyxb.org.cn E-mail: [email protected]

De novo sequencing and comparative analysis of three red algal species of Family Solieriaceae to discover putative genes associated with carrageenan biosynthesis

SONG Lipu1,3,4†, WU Shuangxiu1,3†, SUN Jing1,3,4†, WANG Liang1,3,4, LIU Tao2, CHI Shan2, LIU Cui2, LI Xingang1,3, YIN Jinlong1, WANG Xumin1,3*, YU Jun1,3*

1 CAS Key Laboratory of Genome Sciences and Information, Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
2 College of Marine Life Science, Ocean University of China, Qingdao 266003, China
3 Beijing Key Laboratory of Functional Genomics for Dao-di Herbs, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
4 University of Chinese Academy of Sciences, Beijing 100049, China

Received 25 March 2013; accepted 22 July 2013

©The Chinese Society of Oceanography and Springer-Verlag Berlin Heidelberg 2014

Abstract Betaphycus gelatinus, Kappaphycus alvarezii and Eucheuma denticulatum (Family Solieriaceae, Order Gigartinales, Class Rhodophyceae) are three important carrageenan-producing red algal species, which produce different types of carrageenans: beta (β)-carrageenan, kappa (κ)-carrageenan and iota (ι)-carrageenan, respectively. The carrageenan biosynthesis pathway is not yet fully understood, and little genome or transcriptome sequence information is available for the Solieriaceae. Here, we performed de novo transcriptome sequencing, assembly, functional annotation and comparative analysis of these three commercially valuable species using the Illumina HiSeq 2000 short-read sequencing platform and bioinformatic software. We further compared the expression of unigenes involved in pathways relevant to carrageenan biosynthesis. We found 861 differentially expressed KEGG orthologs, including orthologs in four pathways of carbohydrate metabolism: glycolysis/gluconeogenesis (21 orthologs), carbon fixation in photosynthetic organisms (16 orthologs), galactose metabolism (5 orthologs), and fructose and mannose metabolism (9 orthologs). We also found 8 differentially expressed KEGG orthologs in sulfur metabolism, which might be closely related to the biosynthesis of the different types of carrageenans. The results presented in this study provide valuable resources for functional genomic annotation and for investigating the mechanisms underlying carrageenan biosynthesis in Family Solieriaceae.

Key words: Betaphycus gelatinus, Kappaphycus alvarezii, Eucheuma denticulatum, Solieriaceae, de novo transcriptome sequencing, carrageenan

Citation: Song Lipu, Wu Shuangxiu, Sun Jing, Wang Liang, Liu Tao, Chi Shan, Liu Cui, Li Xingang, Yin Jinlong, Wang Xumin, Yu Jun. 2014. De novo sequencing and comparative analysis of three red algal species of Family Solieriaceae to discover putative genes associated with carrageenan biosynthesis. Acta Oceanologica Sinica, 33(2): 45–53, doi: 10.1007/s13131-014-0440-7

Foundation item: The National Natural Science Foundation of China under contract Nos 31140070, 31271397 and 41206116; the algal transcriptome sequencing was supported by the 1KP Project (www.onekp.com). *Corresponding author, E-mail: [email protected], [email protected] †Contributed equally.

1 Introduction

Solieriaceae is a red algal family in Order Gigartinales, Class Rhodophyceae. The family contains many valuable species, which generally have multiaxial thalli and often inhabit warm seas, including Betaphycus gelatinus, Kappaphycus alvarezii and Eucheuma denticulatum. Under cultivation in the Qionghai and Wenchang districts of Hainan Province, China, B. gelatinus is most abundant about one meter below the low-tide line (Tseng et al., 1981). K. alvarezii is often cultivated in the upper part of the sublittoral zone, from just below the low-tide line to rocky substrates where water flow is slow to moderate. E. denticulatum is produced in large quantities in Taiwan and is known in China as Qilincai (meaning "unicorn vegetable"), a rather common algal food in China. These three species are the most significant carrageenan producers, and they produce different types of carrageenans (Lobban and Harrison, 1994): beta (β)-, kappa (κ)- and iota (ι)-carrageenan, which differ in chemical structure and properties and are mainly produced by B. gelatinus, K. alvarezii and E. denticulatum, respectively (Rudolph, 2000).

Carrageenans are sulfated galactans that accumulate in the cell walls of numerous red seaweeds and are especially abundant in species such as B. gelatinus, K. alvarezii and E. denticulatum (Cole and Sheath, 1990). In the food industry, carrageenans are widely used for their excellent physical properties, such as their thickening, gelling and stabilizing abilities (McHugh, 2003). They are also used in various non-food products, such as pharmaceutical, cosmetic, printing and textile formulations (Imeson, 2000). Carrageenans are linear polymers of D-galactose residues composed of alternating 3-linked β-D-galactopyranose and 4-linked α-D-galactopyranose or 4-linked 3,6-anhydrogalactose (Gordon-Mills et al., 1978). These sulfated galactans are classified according to the presence of the 3,6-anhydro bridge on the 4-linked galactose residue and the position and number of sulfate groups (Fig. 1), which influence the physical properties of carrageenans (Chiovitti et al., 1995; Gordon-Mills et al., 1978; van de Velde et al., 2002).

Carrageenans from Family Solieriaceae have attracted considerable attention because of the economically important genera Eucheuma and Kappaphycus, whose species are the main sources of ι- and κ-carrageenans (Doty, 1988; Santos, 1989). Moreover, B. gelatinus, previously known as Eucheuma gelatinae, produces a hybrid carrageenan with significant β-carrageenan character (Greer and Yaphe, 1984; Santos, 1989). However, the chemical structures of carrageenans are highly heterogeneous, depending on the algal source, life stage and extraction procedure (Cole and Sheath, 1990; Usov, 1998). Although the carrageenan biosynthesis pathway is not fully understood, it is generally accepted that the last step is the formation of the 3,6-anhydro ring found in κ- and ι-carrageenans, through enzymatic conversion of the D-galactose-6-sulfate or D-galactose-2,6-disulfate units occurring in μ- and ν-carrageenan, respectively (van de Velde et al., 2002). Given these structures, β-, κ- and ι-carrageenan differ in sulfate content (ideally none, one and two sulfate groups per disaccharide repeating unit, respectively). However, no report has yet described the genes related to carrageenan biosynthesis at the molecular level.

Over the past several years, next-generation sequencing has been widely used for transcriptome analysis because it has become more cost-effective and offers high throughput and deep coverage. Using this technology, genome-wide transcriptome studies (RNA-seq), including studies of gene structure and expression profiling, have been carried out not only on model species (Mortazavi et al., 2008; Nagalakshmi et al., 2008) but also on non-model organisms by de novo analysis (Wang et al., 2010; Yuan et al., 2012). RNA-seq has dramatically accelerated gene discovery, especially for genes with low expression levels that were missed by traditional methods (Barrero et al., 2011; Garg et al., 2011), and has rapidly broadened our knowledge of gene regulation and metabolic networks. Hence, the cost-effective RNA-seq method allows organisms without a reference genome to be studied at the genome-wide level.

In this article, we performed, for the first time, de novo transcriptome sequencing, assembly, functional annotation and comparative analysis of the three most significant carrageenan-producing species of Family Solieriaceae, B. gelatinus, K. alvarezii and E. denticulatum, collected from Hainan Province, using the Illumina HiSeq 2000 short-read sequencing platform and bioinformatic software. Furthermore, we mapped the sequencing reads back to the assembly to obtain expression profiles with RSEM (Li and Dewey, 2011) and compared the expression of unigenes possibly involved in pathways relevant to carrageenan biosynthesis. The assembled and annotated transcript sequences provide comprehensive genomic resources for further gene identification, characterization and metabolic studies of these carrageenan-producing species, their relatives in Family Solieriaceae and even other red algae.

2 Materials and methods

2.1 Algal sample collection

Branches of B. gelatinus, K. alvarezii and E. denticulatum were collected from Hainan Province, China (19°27′59″N, 108°52′59″E; 18°24′30″N, 110°03′43″E; 18°24′30″N, 110°03′43″E) in February 2012, March 2011 and October 2010, respectively. The B. gelatinus sample was collected at 19°C in the field and fixed immediately for RNA extraction, whereas the K. alvarezii and E. denticulatum samples were transported to the laboratory of the Culture Collection of Seaweed at the Ocean University of China and grown in a modified seawater medium supplemented with nutrients (NaNO3 4 mg/L and KH2PO4 0.4 mg/L) at 25°C under an irradiance of 30 μmol photons m−2 s−1.

Fig.1. Photos of B. gelatinus, K. alvarezii and E. denticulatum (a) and the structures of three different carrageenans (b). [Images not reproduced; panel b pairs B. gelatinus with β-carrageenan, K. alvarezii with κ-carrageenan and E. denticulatum with ι-carrageenan.]

2.2 Total RNA extraction and purification

Algal samples were first immersed in liquid nitrogen and ground to a fine powder using a chilled mortar and pestle. Total RNA was extracted using an improved Trizol method (Johnson et al., 2012; Li et al., 2012) and quantified using a Nanodrop ND 1000 spectrophotometer (Labtech International Ltd, Lewes, UK). RNA quality was assessed by RNA integrity number (RIN) on an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA).

2.3 Transcriptome sequencing

cDNA library construction and sequencing were performed by BGI (Shenzhen, China) on an Illumina HiSeq 2000 platform (San Diego, USA) according to the manufacturer's instructions.

2.4 Reads filtering and de novo assembly

Strict read filtering was performed before assembly. Paired-end reads containing primer or adaptor sequences were removed, as were reads in which more than 10% of bases had quality below Q20 or more than 5% of bases were unknown nucleotides (Ns). De novo assembly was carried out using SOAPdenovo-Trans (http://soap.genomics.org.cn/SOAPdenovo-Trans.html), and Gapcloser (Luo et al., 2012) was then used to fill gaps in the scaffolds.

2.5 Functional annotation

To annotate the unigenes, a BLASTX (Altschul et al., 1997) homology search was conducted against the NCBI non-redundant (nr) protein database (as of July 2012) (Sayers et al., 2012) with an E-value cutoff of 10−5. Gene Ontology (GO) (Ashburner et al., 2000) classification of the unigenes was performed with the InterProScan program (Zdobnov and Apweiler, 2001). Clusters of Orthologous Groups (COG) classification was performed against the COG database (Tatusov et al., 2003) using BLASTX (Altschul et al., 1997) (E-value < 10−5). Pathway analysis was performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation service KAAS (Kanehisa et al., 2004; Moriya et al., 2007).

2.6 Expression analysis

We estimated the expression of each unigene as FPKM (fragments per kilobase of unigene model per million mapped reads) (Trapnell et al., 2010). To calculate FPKM values, we aligned reads to the assembled unigenes with bowtie (Berrier et al., 2010) and quantified expression with RSEM (Li and Dewey, 2011). Differentially expressed unigenes were identified with the Generalized Chi-squared Test implemented in IDEG6 (Romualdi et al., 2003).

3 Results and discussion

3.1 Reads generation and de novo assembly

We used the Illumina HiSeq 2000 to sequence the three algal RNA samples, B. gelatinus, K. alvarezii and E. denticulatum. Across the three samples, we obtained 78 337 138 paired-end reads with a read length of 90 bp, a total of 7.05 gigabases, with a GC content close to 53% (Table 1). The Q20 percentage (the proportion of bases with a quality above 20, corresponding to an error rate below 0.01) exceeded 95% for each sample, indicating good sequencing quality for all three raw read sets. After filtering, a total of 66 327 212 reads containing 5.18 gigabases remained for de novo assembly (Table 1).

We used SOAPdenovo-Trans to assemble the reads into contigs and used the read-pair information to join contigs into scaffolds. Finally, we used Gapcloser (Luo et al., 2012) to fill gaps in the scaffolds, generating the unigenes. Counting unigenes longer than 100 bp, the assembly produced 24 161, 47 691 and 25 025 unigenes with N50 sizes of 2 577 bp, 1 736 bp and 2 244 bp, totaling 23.1 Mb, 41.4 Mb and 22.1 Mb, for B. gelatinus, K. alvarezii and E. denticulatum, respectively (Table 2). Although a large proportion of unigenes were between 100 and 300 bp long (data not shown), enough long unigenes were obtained for the subsequent analysis.

Table 1. Statistics of sequencing data of three red algal samples

Species           Total reads   Total bases/bp   Q20 percentage/%   GC percentage/%   Filtered reads   Filtered bases/bp
B. gelatinus      29 611 670    2 665 050 300    96.87              54.17             25 922 988       2 058 609 829
K. alvarezii      20 976 038    1 887 843 420    95.43              52.84             16 927 452       1 256 720 598
E. denticulatum   27 749 430    2 497 448 700    96.64              53.90             23 476 772       1 861 867 262
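The filtered reads in Table 1 are the reads later mapped back to the unigenes to estimate expression as FPKM (Sections 2.6 and 3.6). A minimal sketch of that normalization follows, assuming per-unigene fragment counts are already available from read alignment. The function names are hypothetical, and this is not the RSEM implementation the authors used, which works with effective rather than raw unigene lengths.

```python
import math
from typing import Dict, Tuple


def fpkm(counts: Dict[str, float], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Compute FPKM for every unigene in counts.

    FPKM_i = count_i * 10**9 / (length_i * total_mapped), where count_i is the
    number of fragments assigned to unigene i, length_i its length in bp and
    total_mapped the number of mapped fragments in the sample.
    Simplification (assumed): the raw unigene length is used, whereas RSEM
    uses an effective length adjusted for the fragment-length distribution.
    """
    total_mapped = sum(counts.values())
    if total_mapped <= 0:
        raise ValueError("no mapped fragments")
    values = {}
    for unigene, count in counts.items():
        length = lengths_bp[unigene]
        if length <= 0:
            raise ValueError(f"non-positive length for {unigene}")
        values[unigene] = count * 1e9 / (length * total_mapped)
    return values


def expressed_summary(values: Dict[str, float]) -> Tuple[int, float]:
    """Drop unigenes with FPKM == 0 (treated as unexpressed, as in Section 3.6)
    and return (number of expressed unigenes, mean FPKM of those unigenes)."""
    expressed = [v for v in values.values() if v > 0]
    if not expressed:
        return 0, 0.0
    return len(expressed), math.fsum(expressed) / len(expressed)
```

For example, two unigenes of 1 000 bp and 2 000 bp that each receive 500 of 1 000 mapped fragments get FPKM values of 500 000 and 250 000; a third unigene with no fragments is excluded from the expressed count and the mean.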

Table 2. Statistics of de novo assembly of three red algal samples

                          B. gelatinus   K. alvarezii   E. denticulatum
Max unigene length/bp     16 487         11 721         16 577
Total bases/Mb            23.1           41.4           22.1
Number of unigenes        24 161         47 691         25 025
Average length/bp         954            868            884
N50/bp                    2 577          1 736          2 244
N90/bp                    354            365            336
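The N50 and N90 values in Table 2 can be computed as sketched below, using the common definition: sort unigene lengths from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of all assembled bases. This is an illustrative sketch with a hypothetical function name, not the code used by the authors or by SOAPdenovo-Trans. To mirror Table 2, unigenes of 100 bp or shorter would be removed before calling it.

```python
from typing import Sequence


def nx(lengths: Sequence[int], x: float) -> int:
    """Return the Nx statistic (x=50 for N50, x=90 for N90) of sequence lengths.

    Lengths are sorted from longest to shortest and accumulated until the
    running total reaches x percent of the total assembly size; the length of
    the sequence at which that happens is Nx.
    """
    if not lengths:
        raise ValueError("no sequences")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    threshold = sum(lengths) * x / 100.0
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return min(lengths)  # not reached for valid input; keeps the return type explicit
```

For example, nx([2, 3, 4, 5, 6, 7, 8, 9, 10], 50) returns 8, because 10 + 9 + 8 = 27 is exactly half of the 54 bases; the same call with x=90 returns 4.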

Here we obtained more than two gigabases of raw reads and nearly two gigabases of filtered bases for each sample except K. alvarezii, which yielded somewhat fewer reads than the other two samples (Table 1), possibly because of random sampling variation. Because of the different sequencing depths, the assemblies of the three samples had different statistics. For B. gelatinus and E. denticulatum, the assemblies were of very good quality, with N50 values above 2 000 bp and nearly 25 000 unigenes each. In contrast, K. alvarezii had a larger proportion of short sequences and a larger number of assembled unigenes because it had fewer sequence reads. Under the de Bruijn graph assembly approach (Li et al., 2010), deeper read coverage and higher k-mer abundance give a better assembly. Therefore, with fewer reads, more cDNAs of K. alvarezii were split into two or more shorter unigenes, giving this species a larger proportion of short unigenes.

Although we did not deeply sequence the genomes, or multiple transcriptome libraries, of these three important carrageenan-producing species, we obtained for the first time whole-transcriptome profiles and numerous gene sequences for B. gelatinus, K. alvarezii and E. denticulatum, which previously had no genome or transcriptome information and no molecular research on their carrageenan biosynthesis pathways. These transcriptome data bring the study of these species, which lack genome sequences, to the omics level and provide a valuable basis for further molecular studies of the genomes, gene characteristics and carrageenan biosynthesis of these three and related red algal species.

3.2 Functional annotation

Because molecular resources for red algae are limited, only 25%–30% of the unigenes of each sample had a hit in the nr database (Fig. 2): 6 728 (27.8%), 14 628 (30.7%) and 6 358 (25.4%) unigenes for B. gelatinus, K. alvarezii and E. denticulatum, respectively. The remaining sequences had no homologs in the nr database at the E-value cutoff of 10−5, suggesting that they might include novel, specifically expressed genes. Alternatively, some of these sequences may correspond to untranslated regions or assembly errors.

The best-hit BLAST results were similarly distributed for the three species, and for all three samples the species with the largest proportion of best hits was the same: Ectocarpus siliculosus, a brown alga whose whole genome has been sequenced and annotated (Cock et al., 2010), which accounted for nearly 20% of the top-hit alignments in each sample. However, unigenes aligned to E. siliculosus made up only 4%–5% of all annotated unigenes, demonstrating that the E. siliculosus genome data in public databases are not sufficient for analyzing the transcriptomes of Solieriaceae species. The result also indicates that red and brown algae differ considerably in their gene and genome characteristics. The lack of a complete genomic or transcriptomic reference dataset for red algae made unigene annotation in this study more difficult.

In addition, B. gelatinus and K. alvarezii were previously classified in genus Eucheuma, as E. gelatinus and E. cottonii, respectively, and E. denticulatum was known as E. spinosum in carrageenan production (Doty, 1988). The functional annotations of the three transcriptomes were very similar in their most abundant aligned species, indicating that the three species are close relatives or at least do not differ much in certain biological functions or processes.

Fig.2. Statistics of the top 9 hit species annotated by nr database for B. gelatinus, K. alvarezii and E. denticulatum. [One pie chart per sample, not reproduced; hit species in the legend: Ectocarpus siliculosus, Griffithsia japonica, Physcomitrella patens, Volvox carteri, Capsaspora owczarzaki, Emiliania huxleyi, Glycine max, Selaginella moellendorffii, Chlamydomonas reinhardtii, Vitis vinifera, Phytophthora infestans, Oryza sativa, Phaeodactylum tricornutum, Coccomyxa subellipsoidea, Chlamydomonas variabilis and Phytophthora sojae.]

3.3 GO analysis

To organize the transcripts of the three algae into putative functional groups, Gene Ontology (GO) (Ashburner et al., 2000) terms were assigned using the InterProScan program (Zdobnov and Apweiler, 2001). For B. gelatinus, a total of 4 525 unigenes were assigned GO terms, including 2 731 under "biological process", 1 337 under "cellular component" and 3 898 under "molecular function". For K. alvarezii, 7 986 unigenes were assigned GO terms, including 4 457, 1 941 and 6 882 under "biological process", "cellular component" and "molecular function", respectively. Of the 4 030 GO-annotated unigenes of E. denticulatum, 2 304, 1 029 and 3 462 were assigned to "biological process", "cellular component" and "molecular function", respectively. Moreover, the distributions of functional annotations of the three red algal transcriptomes were similar (Fig. 3). Within "biological process", unigenes in "cellular process" (GO: 0009987) and "metabolic process" (GO: 0008152) made up the largest proportions. Within "molecular function", "binding" (GO: 0005488) was the most frequently annotated term, followed by "catalytic activity" (GO: 0003824). "Cell part" (GO: 0044464), "macromolecular complex" (GO: 0032991) and "organelle" (GO: 0043226) were the three most frequently annotated categories under "cellular component".

Importantly, B. gelatinus differed from the other two samples at the "cellular component" level: its proportions of unigenes in "cell part", "macromolecular complex" and "organelle" were larger than those of the other two samples. The same was observed for B. gelatinus in "structural molecule activity" (GO: 0005198) under "molecular function". These differences suggest that the unigenes responsible for the different types and proportions of carrageenan in the cell walls of the three species might be concentrated in these functional terms.

Fig.3. GO analysis of three red algal samples of B. gelatinus, K. alvarezii and E. denticulatum. [Bar chart, not reproduced; y axis: ratio of unigenes to all unigenes (0–0.5); x axis: GO terms grouped under Cellular component, Molecular function and Biological process.]

3.4 COG classification

We used BLASTX to align the unigenes to proteins in the COG database (Tatusov et al., 2003) to classify their predicted functions (Fig. 4). For B. gelatinus, 5 764 unigenes were aligned to 4 118 reference proteins in the COG database. For K. alvarezii, 6 276 unigenes had good hits to 4 008 reference proteins, and for E. denticulatum, 4 498 unigenes hit 3 466 reference proteins. The proportions of the COG functional categories were almost the same for the three samples. The category "general function prediction only" (about 13%) had the most alignments, followed by "posttranslational modification, protein turnover, chaperones" (nearly 11%), "signal transduction mechanisms" (around 9%), "translation, ribosomal structure and biogenesis" (nearly 5%) and "cytoskeleton" (nearly 5%). The categories "cell motility", "defense mechanisms", "extracellular structures" and "nuclear structure" had the fewest unigenes.

Fig.4. COG analysis of three red algal samples of B. gelatinus, K. alvarezii and E. denticulatum. [Bar chart, not reproduced; y axis: ratio of unigenes to all unigenes (0–0.14); x axis: COG functional categories A–Z.]

3.5 KEGG pathway analysis

Functional and pathway analyses of the unigenes of the three red algal samples were carried out using the KEGG pathway database (Kanehisa et al., 2004) and the online KEGG Automatic Annotation Server (KAAS) (Moriya et al., 2007) (Fig. 5). For B. gelatinus, 2 713 of the 24 161 unigenes had significant matches to 3 256 enzyme commission (EC) entries, assigned to 297 KEGG pathways. For K. alvarezii and E. denticulatum, 6 023 unigenes were assigned to 296 KEGG pathways via 3 369 enzymes, and 2 670 unigenes to 298 KEGG pathways via 3 188 enzymes, respectively. In all three species, the category "translation" had the largest number of unigenes, followed by "carbohydrate metabolism", "folding, sorting and degradation" and "amino acid metabolism". Because these functional annotations showed little difference among the three species, we next estimated their expression profiles.

Fig.5. KEGG pathway analysis of three red algal samples of B. gelatinus, K. alvarezii and E. denticulatum. [Bar chart, not reproduced; y axis: number of unigenes (0–250); x axis: KEGG pathway classes grouped into 1, genetic information processing; 2, organismal systems; 3, cellular processes; 4, environmental information processing; 5, human diseases; 6, metabolism.]

3.6 Gene expression analysis of carbohydrate metabolism

We estimated the expression of each unigene as FPKM (Trapnell et al., 2010) from the reads mapped to it. For each of the three samples, we used bowtie to map the filtered reads to that sample's assembled unigenes. A total of 22 995 984 (88.7%) filtered reads were aligned to the unigenes of B. gelatinus, and 14 693 288 (86.8%) and 20 997 502 (89.4%) filtered reads were mapped for K. alvarezii and E. denticulatum, respectively. In every sample, unigenes with an FPKM value of 0 were discarded as unexpressed. For B. gelatinus, 15 897 (65.8%) unigenes were expressed, with a mean FPKM of 32.68; for K. alvarezii and E. denticulatum, 33 555 (70.4%) and 16 340 (65.3%) unigenes were expressed, with mean FPKM values of 34.03 and 35.35, respectively.

Because molecular information on Family Solieriaceae is scarce and the species differ, we could not choose one species as a reference for identifying orthologs among the three. Instead, we used the KEGG annotations to identify orthologs across the three samples. In total, we found 1 560 KEGG orthologs that were annotated in all samples with FPKM values above 0 in every sample. We then used IDEG6 (Romualdi et al., 2003) to identify differentially expressed orthologs with the General Chi-squared Test at a significance threshold of 0.001. Altogether, we identified 861 differentially expressed orthologs among the three samples.

Among these 861 differentially expressed KEGG orthologs, we focused on pathways related to carbon metabolism. In the glycolysis/gluconeogenesis pathway, we found 21 differentially expressed orthologs controlling glucose and fructose metabolism, such as phosphoglucomutase (Pgm) [EC:5.4.2.2], glucose-6-phosphate isomerase (GPI) [EC:5.3.1.9], 6-phosphofructokinase I (PFK) [EC:2.7.1.11], fructose-1,6-bisphosphatase I (FBP) [EC:3.1.3.11] and fructose-bisphosphate aldolase (ALDO) [EC:4.1.2.13] (Fig. 6a). These enzymes dominate the glycolysis/gluconeogenesis pathway. Thirteen of the 861 KEGG orthologs belong to starch and sucrose metabolism; among them, UTP-glucose-1-phosphate uridylyltransferase (UGP2) [EC:2.7.7.9], beta-glucosidase (E3.2.1.21) [EC:3.2.1.21], glucokinase (GCK) [EC:2.7.1.2], 4-alpha-glucanotransferase (malQ) [EC:2.4.1.25] and other enzymes controlling D-glucose and UDP-glucose biosynthesis showed different expression levels. Many orthologs were also annotated in other pathways of carbohydrate metabolism, such as carbon fixation in photosynthetic organisms (16 orthologs), galactose metabolism (5 orthologs) and fructose and mannose metabolism (9 orthologs).

Carbohydrate metabolism is the most active metabolic process in seaweeds and, through energy and material transformation, participates in their main life processes. Therefore, although the carrageenan biosynthesis pathway is not fully understood, the large number of differentially expressed KEGG orthologs related to carbohydrate metabolism identified among B. gelatinus, K. alvarezii and E. denticulatum suggests that these genes might form the molecular basis by which each species generates the carbohydrate products for its own type of carrageenan.

Fig.6. Pathway diagrams of the differentially expressed genes related to glycolysis/gluconeogenesis (a, including 21 enzymes) and sulfur metabolism (b, including 8 enzymes) among B. gelatinus, K. alvarezii and E. denticulatum. The enzymes and their EC numbers are shown in pink rectangular boxes, and their catalyzed products in blue oval boxes. [Diagrams not reproduced. Enzymes labeled in panel a: Pgm (EC 5.4.2.2), GPI (EC 5.3.1.9), E5.1.3.15 (EC 5.1.3.15), GCK (EC 2.7.1.2), PFK (EC 2.7.1.11), FBP (EC 3.1.3.11), ALDO (EC 4.1.2.13), TPI (EC 5.3.1.1), GAPDH (EC 1.2.1.12), gap2 (EC 1.2.1.59), PGK (EC 2.7.2.3), PGAM (EC 5.4.2.1), ENO (EC 4.2.1.11), PK (EC 2.7.1.40), ace (EC 1.2.4.1), ADH1_7 (EC 1.1.1.1), E1.2.1.3 (EC 1.2.1.3) and ACSS (EC 6.2.1.1). Enzymes labeled in panel b: cysQ (EC 3.1.3.7), PAPSS (EC 2.7.7.4, 2.7.1.25), sir (EC 1.8.7.1), cysK (EC 2.5.1.47), metB (EC 2.5.1.48) and metC (EC 4.4.1.8), linking sulfate, adenylyl sulfate, 3'-phosphoadenylyl sulfate, sulfite, hydrogen sulfide, L-cysteine, L-cystathionine and L-homocysteine.]

3.7 Gene expression analysis related to sulfur metabolism pathways

Given their structures, β-, κ- and ι-carrageenan differ in sulfate content. Therefore, in addition to the carbohydrate metabolism pathways, we examined the sulfur metabolism pathway for clues about carrageenan metabolism. In this pathway, 8 differentially expressed KEGG orthologs are involved in sulfur metabolism, including the genes encoding 3'-phosphoadenosine 5'-phosphosulfate synthase (PAPSS) [EC:2.7.7.4, 2.7.1.25] and 3'(2'),5'-bisphosphate nucleotidase (cysQ) [EC:3.1.3.7] (Fig. 6b). However, owing to the lack of molecular studies on Family Solieriaceae and on red seaweeds in general, we did not identify the specific enzymes that catalyze carrageenan biosynthesis.

4 Conclusions

In this study, we sequenced, assembled and characterized the transcriptomes of three commercially valuable carrageenan-producing red algal species, B. gelatinus, K. alvarezii and E. denticulatum, using Illumina paired-end RNA sequencing. De novo assembly generated a set of unigenes for each species, of which nearly 30% were annotated by BLAST searches. GO, COG and KEGG functional classifications identified many genes related to multiple cellular components and pathways. Differential expression analyses among the three species further identified 861 KEGG orthologs, which might include the main genes regulating the biosynthesis of the different types and proportions of carrageenan.

To our knowledge, this is the first effort to assemble the transcriptomes of B. gelatinus, K. alvarezii and E. denticulatum and to analyze carrageenan biosynthesis at the whole-transcriptome level. The data presented here provide valuable resources for functional genomic studies and for investigating the mechanisms underlying carrageenan biosynthesis in B. gelatinus, K. alvarezii, E. denticulatum, Family Solieriaceae and even red algae in general.

References

Altschul S F, Madden T L, Schaffer A A, et al. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25(17): 3389–3402
Ashburner M, Ball C A, Blake J A, et al. 2000. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics, 25(1): 25–29
Barrero R A, Chapman B, Yang Y F, et al. 2011. De novo assembly of
Cock J M, Sterck L, Rouze P, et al. 2010. The Ectocarpus genome and the independent evolution of multicellularity in brown algae. Nature, 465(7298): 617–621
Cole K M, Sheath R G. 1990. Biology of the Red Algae. Cambridge, UK: Cambridge University Press
Doty M S. 1988. Prodromus ad systematica Eucheumatoideorum: a tribe of commercial seaweeds related to Eucheuma (Solieriaceae, Gigartinales). In: Abbott I A, ed. Taxonomy of Economic Seaweeds, v II. La Jolla, CA: California Sea Grant Program, University of California, 159–207
Garg R, Patel R K, Tyagi A K, et al. 2011. De novo assembly of Chickpea transcriptome using short reads for gene discovery and marker identification. DNA Res, 18(1): 53–63
Gordon-Mills E M, Johan Tas, McCandless E L. 1978. Carrageenans in the cell walls of Chondrus crispus Stack. (Rhodophyceae, Gigartinales). Phycologia, 17(1): 95–104
Greer C W, Yaphe W. 1984. Characterization of hybrid (beta-kappa-gamma) carrageenan from Eucheuma gelatinae J. Agardh (Rhodophyta, Solieriaceae) using carrageenases, infrared and 13C-nuclear magnetic resonance spectroscopy. Bot Mar, 27(10): 473–478
Imeson A P. 2000. Carrageenan. In: Phillips G O, Williams P A, eds. Handbook of Hydrocolloids. Cambridge, UK: Woodhead Publishing Limited, 87–102
Johnson M T J, Carpenter E J, Tian Z J, et al. 2012. Evaluating methods for isolating total RNA and predicting the success of sequencing phylogenetically diverse plant transcriptomes. PLoS One, 7(11): e50226
Kanehisa M, Goto S, Kawashima S, et al. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res, 32(Database issue): D277–280
Li Bo, Dewey C N. 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics, 12: 323
Li Tianyong, Ren Lei, Zhou Guan, et al. 2012. A suitable method for extracting total RNA from red algae. Transactions of Oceanology and Limnology (in Chinese), 4: 64–71
Li Ruiqiang, Zhu Hongmei, Ruan Jue, et al. 2010. De novo assembly of human genomes with massively parallel short read sequencing. Genome Research, 20: 265–272
Lobban C, Harrison P. 1994. Seaweed Ecology and Physiology. Cambridge: Cambridge University Press
Luo Ruibang, Liu Binghang, Xie Yinlong, et al. 2012. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience, 1: 18
McHugh D J. 2003. A Guide to the Seaweed Industry: FAO Fisheries Technical Paper No. 441. Rome: FAO
Moriya Y, Itoh M, Okuda S, et al. 2007. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res, 35(Web Server issue): W182–185
Mortazavi A, Williams B A, McCue K, et al. 2008. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods, 5(7): 621–628
Nagalakshmi U, Wang Z, Waern K, et al. 2008. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science, 320(5881): 1344–1349
Romualdi C, Bortoluzzi S, D'Alessi F, et al. 2003. IDEG6: a web tool for detection of differentially expressed genes in multiple tag sampling experiments. Physiological Genomics, 12(2): 159–162
Rudolph B. 2000. Seaweed products: red algae of economic significance.
In: Martin R E, Carter E P, Flick Jr G J, et al., eds. Marine Euphorbia fischeriana root transcriptome identifies prostratin and Freshwater Products Handbook. Lancaster, PA (USA): Tech- pathway related genes. BMC genomics, 12: 600 nomic Publishing Co, 519–529 Berrier A, Ulbricht R, Bonn M, et al. 2010. Ultrafast active control of lo- Santos G A. 1989. Carrageenans of species of Eucheuma J. Agardh and calized surface plasmon resonances in silicon bowtie antennas. Kappaphycus Doty (Solieriaceae, Rhodophyta). Aquat Bot, 36(1): Optics Express, 18(22): 23226–23235 55–67 Chiovitti A, Liao M L, Kraft G T, et al. 1995. Cell wall polysaccharides Sayers E W, Barrett T, Benson D A, et al. 2012. Database resources of the from Australian red algae of the family Solieriaceae (Gigartinales, National Center for Biotechnology Information. Nucleic Acids Rhodophyta): Iota/kappa/beta-carrageenans from Melanema Res, 40(Database issue): D13–25 dumosum. Phycologia, 34(6): 522–527 Tatusov R L, Fedorova N D, Jackson J D, et al. 2003. The COG database: SONG Lipu et al. Acta Oceanol. Sin., 2014, Vol. 33, No. 2, P. 45–53 53

an updated version includes eukaryotes. BMC Bioinformatics, 4: and industry. Trends Food Sci Tech, 13(3): 73–93 41 Wang Xiaowei, Luan Junbo, Li Junming, et al. 2010. De novo character- Trapnell C, Williams B A, Pertea G, et al. 2010. Transcript assembly and ization of a whitefly transcriptome and analysis of its gene ex- quantification by RNA-Seq reveals unannotated transcripts and pression during development. BMC Genomics, 11: 400 isoform switching during cell differentiation. Nature Biotechnol- Yuan Yuan, Song Lipu, Li Minhui, et al. 2012. Genetic variation and ogy, 28(5): 511–515 metabolic pathway intricacy govern the active compound con- Tseng C K, Lobban C S, Wynne M J. 1981. The Biology of Seaweeds. Ox- tent and quality of the Chinese medicinal plant Lonicera japoni- ford: Blackwell Sci Publ Usov A. 1998. Structural analysis of red seaweed galactans of agar and ca thunb. BMC Genomics, 13: 195 carrageenan groups. Food Hydrocolloids, 12(3): 301–308 Zdobnov E M, Apweiler R. 2001. InterProScan—an integration platform van de Velde F, Knutsen S H, Usov A I, et al. 2002. 1H and 13C high resolu- for the signature-recognition methods in InterPro. Bioinformat- tion NMR spectroscopy of carrageenans: application in research ics, 17(9): 847–848