<<

ARTICLE

Received 27 Jan 2015 | Accepted 20 Mar 2015 | Published 24 Apr 2015 DOI: 10.1038/ncomms7986 OPEN genomes provide novel insight into biology

Naihao Ye1,*, Xiaowen Zhang1,*, Miao Miao2,3,*, Xiao Fan1,*, Yi Zheng2,3, Dong Xu1, Jinfeng Wang2, Lin Zhou2,3, Dongsheng Wang1, Yuan Gao2,3, Yitao Wang1, Wenyu Shi2,3, Peifeng Ji2,3, Demao Li4, Zheng Guan1, Changwei Shao1, Zhimeng Zhuang1, Zhengquan Gao5,JiQi6 & Fangqing Zhao2

Seaweeds are essential for marine ecosystems and have immense economic value. Here we present a comprehensive analysis of the draft genome of , one of the most economically important . The 537-Mb assembled genomic sequence covered 98.5% of the estimated genome, and 18,733 protein-coding genes are predicted and annotated. Gene families related to cell wall synthesis, halogen concentration, development and defence systems were expanded. Functional diversification of the mannuronan C-5-epimerase and haloperoxidase gene families provides insight into the evolutionary adaptation of polysaccharide biosynthesis and iodine antioxidation. Additional sequencing of seven cultivars and nine wild individuals reveal that the genetic diversity within wild populations is greater than among cultivars. All of the cultivars are descendants of a wild S. japonica accession showing limited admixture with S. longissima. This study represents an important advance toward improving yields and economic traits in Saccharina and provides an invaluable resource for plant genome studies.

1 Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao 266071, China. 2 Computational Genomics Lab, Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China. 3 College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China. 4 Tianjin Key Laboratory for Industrial Biosystems and Bioprocessing Engineering, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China. 5 School of Life Sciences, Shandong University of Technology, Zibo 255049, China. 6 State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200433, China. * These authors contributed equally to this work. Correspondence and requests for materials should be addressed to N.Y. (email: [email protected]) or to J.Q. (email: [email protected]) or to F.Z. (email: [email protected]).

NATURE COMMUNICATIONS | 6:6986 | DOI: 10.1038/ncomms7986 | www.nature.com/naturecommunications 1 & 2015 Macmillan Publishers Limited. All rights reserved. ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms7986

rown macroalgae () belong to the phylum Strameno- Genome annotation and gene prediction. By combining piles, a phylogenetic lineage that is distantly related to homology-based and de novo approaches, we identified B40% of Bterrestrial plants and animals1. These macroalgae exhibit repetitive elements in the assembled S. japonica genome, which is differentiated tissues during development, making them distinct nearly twice as many as in E. siliculosus (22.7%) (Supplementary from unicellular . The extensive submarine kelp Fig. 3). Among these repeats, 63.1% could be classified into forests are the largest biogenic structures within benthic marine known repeat families, with long-terminal repeats (LTRs) con- communities, occupying 70% of the total biomass in cold and stituting the most abundant repeat family, representing 21.1% of temperate marine systems2. Specifically, Laminariales kelp the repetitive sequences. Long interspersed elements were the are essential for ecosystems and are economically important as a second largest family found in S. japonica and could be sub- marine crop. These kelps are cultivated in East Asia and harvested divided into Jockey, RTE, L1 and CRE elements. Compared with from natural populations in Europe and North America to the LTRs (for example, Copia and Gypsy), the L1 and the CRE produce of alginate, which is used in a wide variety of elements and the RTE-2 and the RTEX-1 retrotransposons pharmaceuticals, foods and industrial applications3. These kelps identified in S. japonica shared a high sequence similarity may also provide an important component for the future (495%), indicating that they were recently inserted into the renewable energy mix4. Furthermore, Laminariales are the S. japonica genome and are likely to be active in the amplification largest accumulators of iodine, contributing tremendously to (Fig. 1). Previous studies have demonstrated that RTEs exhibit the biogeochemical iodine cycle, and thus have a significant a mosaic distribution throughout the Animalia and impact on atmospheric chemistry5. In particular, S. japonica, one tend to spread via horizontal gene transfer (HGT)11,12. of the most common Laminariales alga along the northwest A comprehensive search of the RTE RT domains in published coasts of the Pacific Ocean6, is becoming the most economically algal genomes revealed that E. siliculosus possesses a significant important in the sea farming cultivation industry. On the fraction of RTE elements (1.6%) that share high sequence basis of artificial seedling-rearing techniques, S. japonica sea similarity with those found in S. japonica, whereas the farming has evolved rapidly to make it the most common RTEs identified in two green algal genomes (Volvox carteri seaweed in the world7, with most of this seaweed used as food and and Chlamydomonas reinhardtii) were relatively divergent. raw industrial materials. The output of S. japonica reached 7.9 A phylogenetic analysis of the RTE elements showed that they million tons (dry weight) and had a market value of more than were not consistent with the species tree, indicating the lateral US $1.3 billion in 2012 (ref. 8). gene transfer of the RTE elements in these organisms. Despite the ecological, economic and evolutionary importance (Supplementary Fig. 4). of kelps, we currently have a very limited knowledge of their To more accurately annotate the S. japonica genome, we genetic architecture and metabolism, including their iodine performed deep transcriptome sequencing of both female concentration system and alginate-producing pathway, a limita- and sporophytes, which generated 11.3 Gb of tion that hinders both genetic research and mariculture practices. RNA sequencing data. By combining homologue-based, ab initio Furthermore, years of interspecific hybridization and biomass and transcriptome-based approaches, we predicted 18,733 yield-targeted artificial selection have not only degeneratted the protein-coding genes (gene models) in the S. japonica genome, economic characteristics of these kelps but have also narrowed which is greater than the number of genes predicted in their genetic variation. The recently sequenced small filamentous E. siliculosus7 (16,256). The average gene size (exons and model brown alga siliculosus9, a close relative of kelp introns) was 9,587 bp, with 6.54 exons per gene and an average species10, greatly facilitated the functional and evolutionary intron size of 1,057 bp, which is significantly larger than in investigation of S. japonica in this study. Here we report a draft E. siliculosus (average intron size, 703 bp). More than 90.7% of the genome sequence of the female gametophytes of the artificially predicted coding sequences were supported by transcriptome cultivated S. japonica strain Ja and the resequencing of seven wild sequencing data (Z5 reads), indicating the high accuracy of populations and nine representative cultivars of Saccharina the gene predictions in the sequenced S. japonica genome species. Comparative genomic analyses of these data provide (Supplementary Fig. 5). To independently evaluate the genome novel insight into the evolutionary adaptation and the functional diversification of the polysaccharide biosynthesis and iodine concentration mechanisms of S. japonica. The Saccharina 2 5 Gypsy-1 L1-1 genomic sequence that was obtained represents an important Gypsy-2 CRE-1 4 Jockey-1 advance toward securing bioproducts and biofuels from 1.5 Copia-1 Jockey-2 macroalgae and provides an invaluable resource for plant gene Copia-2 RTE-1 and genome evolution studies. Copia-3 3 RTE-2 1 ERV1 RTE-3 Results 2 RTEX-1 Bases per 1 kb Genome sequencing and assembly. Genomic DNA was extracted 0.5 Bases per 1 kb genomic sequence from filamentous female gametophytes of S. japonica strain Ja genomic sequence 1 cultured in Qingdao, China. The DNA sequencing reads were 0 0 obtained using both Roche and Illumina technologies and were 0 6 12 18 24 30 36 42 0 6 12 18 24 30 36 42 assembled after filtering out the low-quality and duplicated reads. Substitution rate (%) Substitution rate (%) A total of 84 Gb of high-quality Illumina reads and 10 Gb of PacBio long reads (Supplementary Table 1) were generated, Figure 1 | The repetitive elements in S. japonica. (a) Age distribution of representing an B178 Â coverage of the S. japonica genome, the LTR elements in S. japonica.(b) Age distribution of the LINE elements in with an estimated size of 545 Mb based on kmer depth dis- S. japonica. The average number of substitutions per site for each tribution analyses and flow cytometry (Supplementary Figs 1 fragmented repeat was estimated using the one-parameter Jukes–Cantor and 2). Approximately 98.5% (537 Mb) of the genome was model. Insertions and deletions were excluded from the substitution rate de novo assembled, consisting of 13,327scaffolds (Z500 bp) with calculation. The per cent substitution from the consensus roughly a scaffold N50 length of 252,007 bp (longest, 1.47 Mb) and a correlates with the age of the repetitive elements. LINE, long interspersed contig N50 length of 58,867 bp (Supplementary Table 2). elements.

2 NATURE COMMUNICATIONS | 6:6986 | DOI: 10.1038/ncomms7986 | www.nature.com/naturecommunications & 2015 Macmillan Publishers Limited. All rights reserved. NATURE COMMUNICATIONS | DOI: 10.1038/ncomms7986 ARTICLE completeness, we found that 96.5% of de novo assembled each genome, respectively. The higher number and content of transcripts (25,010 out of 25,914) could be aligned to the these gene families in S. japonica leads to a hypothesis that gene assembled genome. Among the annotated genes, 86.1% of the family expansion events occurred more frequently in S. japonica encoded proteins had homologues in the NCBI non-redundant than in E. siliculosus. Among the shared gene families, 2,267 of protein database, with 70.1% of the putative proteins showing them only had one copy in each genome, whereas 863 (20%) best-hit matches to E. siliculosus. Previous studies have revealed families were found to have more gene copies in S. japonica that E. siliculosus possesses an integrated virus (EsV-1) in its (9,562 of the total genes) than in E. siliculosus (4,666), but only genome9,13. In this study, we did not find any homologous genes 652 families were identified to be the opposite (3,753 for of EsV-1 in the Saccharina genome. However, using PFAM S. japonica versus 5,406 genes for E. siliculosus). These results domain searches, we identified 121 putative proteins in indicate that the majority of the amplified genes found in S. japonica containing the FNIP repeat domain (PF05725), S. japonica resulted from recent duplication events (with an which is B22 residues long and has previously only been found average synonymous substitution rate of 0.42; Supplementary in Dictyostelium and a few double-stranded DNA viruses. Similar Fig. 9), because the mean similarity among them (79%) FNIP repeat domains have also found in the E. siliculosus was higher than the similarity observed between the two genome. A phylogenetic analysis based on the FNIP repeat species (74%). domain revealed that the closest relatives of these FNIP sequences S. japonica and E. siliculosus also accumulated hundreds of came from a giant virus (Supplementary Figs 3B and 6) that gene families independently through gain/loss events after their infects the marine zooplankton Cafeteria roenbergensis13, diversification, which further contributed to the difference in suggesting an ancient association of brown with viruses. their gene content, with 527 genes gained in S. japonica and 629 gained in E. siliculosus. A total of 574 large families (Fig. 2c) with 10 or more genes from the two Phaeophyceae genomes were Comparative genomics. To investigate the gene content of selected for further consideration. Compared with E. siliculosus, S. japonica, phylogenomic analyses of 25 genomes from S. japonica has significant gene expansion in 58 families (Fisher Chromalveolata, , Glaucophyta, Rhodophyta, Chlorophyta exact test, corrected P-valueo0.05), including those families and higher plants were performed. First, 41 orthologues of single- involved in iodine concentration and the biogenesis and copy or species-specific gene duplications were identified remodelling of cell wall polysaccharides, for example, vana- (Supplementary Data 1; Supplementary Fig. 7A) for a phyloge- dium-dependent haloperoxidases (vHPOs) that catalyse the netic tree reconstruction using concatenated super genes after the oxidation of halides14, cellulose synthase (GT2 family) that removal of redundancy to avoid the effects of loss of paralogues. catalyses the terminal step of cellulose biosynthesis, mannuronan As shown in Fig. 2a, the phylogenetic tree divided the 23 C-5-epimerases (MC5Es) that epimerize D-mannuronate residues organisms into three main phyla, Chromalveolata, Rhodophyta into L-guluronate for alginate biosynthesis and alpha-(1,6)- and Viridiplantae. A total of 19,410 gene families were then fucosyltransferase (GT23 family) that polymerize GDP-fucose predicted from the 404,604 genes of S. japonica and the other 24 into the elongated fucan chain15 (Fig. 2d). Other examples come genomes (Supplementary Data 2). Among these gene families, a from families encoding the endo-1,3-beta-glucanase (GH81 Dollo parsimony analysis based on the genomes of the seven family), leucine-rich GTPase and Imm upregulated gene heterokontic algae revealed that many large-scale amplification families, which may be related to the development and defence events in different clades: 400 gene families were gained in the systems in (Fig. 2d). Endo-1,3-beta-glucanase ancestor of , 451 in Nannochloropsis, and strikingly, 1,240 hydrolyses the glycoside linkages of laminarin, the major families were gained in the common ancestor of S. japonica and storage polysaccharide in S. japonica, into oligosaccharides that E. siliculosus (Supplementary Data 3; Supplementary Fig. 7B). respond to tissue damage and provide protection against Consistently, these amplification events were also observed in a pathogens by triggering defence responses such as an initial K-means clustering of the families based on the gene abundance oxidative burst16. They are also thought to play important roles in of each species (Fig. 2a, right panel; Supplementary Fig. 8). In diverse physiological and developmental processes in plants such addition, some clades were found to have lost many gene families, as microsporogenesis, fertilization, seed germination and somatic namely, 499 families were lost in the clade and 813 in embryogenesis17. The leucine-rich GTPases of the ROCO family, Nannochloropsis. The loss of so many gene families greatly which are excellent candidates for recognition/transduction reduced the genome sizes of these species. Conversely, because events linked to immunity in Ectocarpus18, are another more gene families were gained rather than lost, brown algae potential family involved in defence and development. The experienced a large genome expansion during evolution. Imm upregulated genes are brown algae-specific gene families After the divergence of Phaeophyceae and Eustigmatophyceae, that were first discovered in E. siliculosus, and are related to the S. japonica and E. siliculosus together exhibited a significant development of the sporophyte and generations19. increase in their gene family number (1,240 gained versus 309 Imm upregulated 3 was found to be significantly upregulated in lost). Because of insufficient evidence of protein similarities to the gametophyte generation compared with the sporophyte other algae, most of these newly gained families in Phaeophyceae generation16. It was also found to be a female-biased gene in were classified as proteins with unknown function. It should be the brown alga Fucus vesiculosus20. In addition, this gene has noted that 102 annotated gained families were found to be weak similarity to BIP2, a gene that appears to be specifically significantly enriched (Po10 À 5) in the protein kinase or HeH or associated with the acquisition of a three-dimensional peptidase CA super family domains (or clans; Fig. 2b). By architecture in Physcomitrella21. In addition to these, several contrast, the genes from the 309 families that were lost in the two super families with diverse functions, including Cupin-like species showed a relatively good annotation and were enriched proteins, Ig-like proteins, C2H2 zinc-fingrer proteins and in the clans of the cupin domain (4.5% of 309 families), the cytochrome P450 were also expanded in S. japonica. alpha/beta hydrolase (2.9%), the glycosyl hydrolase family (1.9%) and a wide range of other clans (Fig. 2b). S. japonica and E. siliculosus share 4,309 gene families, which Expansion of vHPO genes is associated with functional comprise 17,379 genes in S. japonica and 14,136 genes in diversification. Brown macroalgal species are the most well- E. siliculosus, covering 92.8% and 85.5% of the gene content of known effective iodine accumulators among all living organisms

NATURE COMMUNICATIONS | 6:6986 | DOI: 10.1038/ncomms7986 | www.nature.com/naturecommunications 3 & 2015 Macmillan Publishers Limited. All rights reserved. ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms7986

Log2(gene number) –Log10(P-value) 02468 >=10 0>=5+++– – – 100 Saccharina japonica Cupin domain CL0029 100 Ectocarpus siliculosus Bet v I family CL0209 Nannochloropsis gaditana Glycosyl transferase family A CL0110 Glutathione S-transferase C-term CL0497 100 100 Nannochloropsis oceanica Amino acid permease CL0062 100 Phaeodactylum tricornutum Alkaline phosphatase CL0088 Thalassiosira pseudonana Zinc beta ribbon domain CL0167 100 100 Aureococcus anophagefferens PAS fold domain CL0183 Phytophthora infestans Glycosyl hydrolase family CL0059 100 Peptidase family MA CL0126 100 100 Pythium ultimum Calcineurin phosphoesterase CL0163 Albugo laibachii Concanavalin A-like lectin CL0004 100 88 Saprolegnia parasitica Alpha/beta hydrolase CL0028 Leucine-rich repeat CL0022 100 PDDEXK domain CL0236 Emiliania huxleyi EF-hand domain CL0220 Guillardia theta Ubiquitin family CL0072 100 Chondrus crispus Peptidase family CA CL0125 100 Pyropia yezoensis HeH domain CL0306 100 Porphyridium purpureum Protein kinase domain CL0016 DMT domain CL0184 98 Cyanidioschyzon merolae Glycosyl transferase family B CL0113 Galdieria sulphuraria Glycosyl hydrolase family CL0058 95 100 Chlamydomonas reinhardtii von Willebrand factor type A CL0128 100 Volvox carteri Major facilitator superfamily CL0015 Tudor domain CL0049 Coccomyxa subellipsoidea 100 BTB/POZ domain CL0033 100 Arabidopsis thaliana Transcription initiation factor CL0012 Physcomitrella patens C2H2-type zinc finger CL0361 Cyanophora paradoxa Helix-turn-helix domain CL0123 K-means clustering of genefamilies Sj&Es Sj Es

C249 C305 Log2(gene number) 0>=8246 C424 C566 C75 Polymorphic outer membrane protein C23 Peptidase S8 and S53 C30 C298 Dynein heavy chain C75 C30 C252 Endo-1,3-beta-glucanase C249 C348 Glutathione S-transferase C252 C147 LRR-GTPaseof the ROCO family C348 C2413 C1358 Dynamin-related protein C274 Mono/diacylglycerolacyltransferase C305 C514 Cupin-like protein C424 Cellulose synthase C298 Mannuronan C-5-epimerase C277 Imm upregulated 3 C147 C277 Ig-like protein, group 2 C417 C2H2 zinc finger protein C566 C1409 C417 Vanadium-dependent iodoperoxidase C514 Cytochrome P450 C1358 C274 Acetyltransferase C2413 Alpha-(1,6)-fucosyltransferase C1409 S.japonica C23 E.siliculosus Sj Es Other 23 taxa

Figure 2 | The comparative genomics of S. japonica and 24 other algae and plant genomes. (a) The phylogenetic tree was constructed based on the concatenated sequences of 41 single-copy genes in all of these genomes after the removal of species-specific gene duplications using maximum-likelihood methods. The coloured squares denote Chromalveolata (brown), Rhodophyta (red), Viridiplantae (green), Rhizaria (grey) and Glaucophyta (blue). A K-means clustering of the families based on the gene abundance in each species is shown in the right panel; each column represents a family and each row represents one species. The clustering is sorted according to the order of the left tree. (b) The domain-based annotation of the gained/lost families in S. japonica or E. siliculosus or their latest common ancestor. Fisher’s exact test of the domain enrichment of the families was compared against the total number of annotated domain in both of the genomes. (c) A comparison of the gene abundances between S. japonica and E. siliculosus in their shared families. The pie charts at each node represent the ratio of S. japonica genes (blue) to E. siliculosus genes (yellow) in each gene family. (d) The copy number differences of the genes in each family within which S. japonica (the first column) underwent more gene amplifications than E. siliculosus (the second column). and are major contributors to the global biogeochemical iodine of hydrogen peroxide, leading to the halogenation of various cycle, displaying an average iodine content of 1% (up to 5%) of organic substrates14.InLaminaria, the identified vHPOs their dry weight, representing B30,000 times the concentration comprise two large multigenic families encoding vanadium- of this element in seawater22. In addition, these species are the dependent bromoperoxidases (vBPOs) and iodoperoxidases only known organisms to use inorganic iodide as an extracellular (vIPOs)24. Previously, vIPOs had mostly been found in antioxidant in a living system23. Most iodine compounds are species, and they are characterized by the novel chelated by apoplastic macromolecules and accumulate in the biochemical function of showing strict specificity for iodide apoplast of the cortical cell layer, which protects the thallus oxidation (Fig. 3a). surface from both aqueous and gaseous oxidants23. However, In the S. japonica genome, 17 vBPO and 59 vIPO genes were the mechanisms of iodine concentration and antioxidation are identified and annotated, and a phylogenetic analysis revealed not well known and are presumably linked to the presence that all of the vHPO genes form a monophyletic group sharing a of particular vHPOs. In algae, vHPOs are a particular class of common ancestor with the vCPO (vanadium-dependent peroxidases that catalyse the oxidation of halides in the presence chloroperoxidase) genes of fungi, after which they evolved

4 NATURE COMMUNICATIONS | 6:6986 | DOI: 10.1038/ncomms7986 | www.nature.com/naturecommunications & 2015 Macmillan Publishers Limited. All rights reserved. NATURE COMMUNICATIONS | DOI: 10.1038/ncomms7986 ARTICLE

11 12 13 SC750 vBPO Oxidative 14 9 10 Atmosphere O3 stress

SC122 vIPO 16 17 8 9 10 11 12 13 14 15 18 – I– I + H2O2 21 22 vBPO vIPO SC1051 vIPO 19 59 57 20 58 OH– + HIO Cell 28 29 31 SC493 vIPO 27 43 30 I– + H+ + HIO 32

SC1171 vIPO 12 3456 7 I I + H O 2 2 2 5′ 3′

0 kb Apoplasm 10 kb 20 kb 30 kb 40 kb 50 kb 60 kb 70 kb 80 kb 90 kb Seawater 100 kb110 kb120 kb130 kb140 kb150 kb160 kb170 kb180 kb190 kb200 kb210 kb220 kb230 kb240 kb250 kb260 kb (0.3 μM) (10–30 mM)

Red algae Bacteria Fungi Brown algae Laminariales Colour key vIPO53 vBPO3 0510 vIPO6 ln(fold change) vIPO1 vIPO26 vIPO5 vIPO31 vIPO8 vCPO vIPO46 P vIPO38 y A r l G en t vIPO32 e a o r Ma e n p vBPO6 u a Cu h g m r n o i a r a vBPO1 a r n v a po n u d t i Z M o l r d vBPO8 a 8 o r m i 4 a t r t y 5 6 1 b h 1 g i i m 1 1 7 Z e e y a ci 0 O n O 1 vIPO42 o c O 1 2 l O l a o i - o P e P 1 be i n r O 1 I P a p s I P O r s e I y a I O 1 o v P p v l g P vBPO13 p v I li r z g e v I a a th e o P O 8 q a v I 43 a r a v l r a a e a n a P u c v 9 g c I O O e a c a 2 i c i vBPO12 m a a a t i P ic v P O P t i c a I I l o n c V n 27 a a l s n i i n i v r 1 n i c a v IP O3 n V o O c s C o i 9 y 3 o n n t i i o c v P 4 a P 0 vBPO16 A p n i a a v z 1V p I I s V P p o o c 3 c n o a C p c v i v a o n a O O a a i O a V C j p i e P j a n r v r j p o c a P a y a C C ap a n i I o IP 1 vBPO10 o V P O j j c 3 C a a p o n a i a o n v p v P aj j ic c i r s CP P O a a n O n a a n a p o h Ac t O n j a j a i P n 8 r O i o n i a c I l ei v i a p c 2 vIPO48 r o r r j i a p v r a s I O 1 in in n a a pon ni O ce a a P a r i j n i r a r n n i j s y v 2 ar r a a P 9 O a i o r ca I h a j o l I h h r a a po i v 2 m l P h a n p a vIPO41 2 c h i a c O c a c h a n a n j a a h O c h r i a h i on c P c c c j r I G c r s 100 c h a r n c a p i M r i lor p 1 c c i a n v n a c c c a a a c a r a n j e a a .3 h a h i o a vIPO36 s c i Saccharina vIPO13 japonica a a n r c S c h a p o i M sm S S ac c i c i f C l Sa ha S a a n l a 5 S S a c c r c in j a h r B 7 S c c a a h r o 8 v o i I 100 S a c a p 4 vIPO33 iba n a C a V 100 a c h S c a n O d c r S a h i ja P 2 ct Py r h 1 in B S c a c ar a vI 4 er r u a 10 a P 0.1 S c S c h in a O vIPO52 ze op s n 1 O a a c r c IP a ia c g 7 v S S c a ni v 0 vBPO xa ris ii c B 1 a ch o ca 5 C n y V P 100 100 S p i O o th ez pu B B O ac ja on IP 22 vIPO35 ra in o s P P S a p v O C llin if en V O O 100 n ja ca vIP or a a s B 1 100 ri a ni a 9 all p cie is P in po ic O5 vIPO40 in ilu n v O 1 ha ar a on IP Co a li s B 0 c h j p v ra Of fe V P 0 c c a ja a 20 llin fic ra B O Sa ac rin na nic PO vIPO39 a in P a i o vI Brown algae V r p vIPO Co Off alis BP O S ch ha ja ica ra ici V O 1 ac cc ina on llin na BP 2 0 100 S a ar jap 19 vIPO23 a lis 0 S h a IPO Co pilul VB O2 acc arin ica v rallina ifera PO 100 S ch pon 58 pilu VBP 1 Sac ina ja vIPO vBPO17 Lauren lifera O3 char ponica cianip VBP 1 Sac ina ja O21 Alte ponicav O1 100 00 100 cchar nicavIP vIPO22 romonas BPO 100 100 Sa rina japo sp. SN2 VB 100 Saccha IPO57 Synecho PO1 arina japonica v vBPO2 coccus sp. CC9311 VBP 100 100 Sacch O2 digitata VIPO1 Synechococcus sp. WH 8020 VBPO1 vIPO54 100 Laminaria digit J VBPO3 100 ata VIPO2 sp. Heron Island Saccharina vBPO4 Leptolyngbya VBPO2 100 Sa japonica vI is marina Sa ccharin PO46 aryochlor VBPO cchar a japoni vIPO30 Ac E 5410 O13 100 Sa ina ja ca vIPO CME vBP 1 100 cch poni 52 s sp. C nica O1 100 S arin ca vIP vIPO29 chlori japo vBP 10 100 ac Sa a jap O54 ryo na a 100 c c on Aca ari nic PO 100 ha cha ica acch japo vB 12 Sa rin rin vIP vIPO14 S ina ica PO 4 cch a ja a ja O51 har pon vB O1 ar po pon acc ja ica P O9 Sa S in nic ica vIPO24 S rina on vB 7 00 S c ac a j a v vIP ha ap ca vBP PO 1 a ch ch ap IP O3 c j i 100 1 S c a o O 7 Sac na on ca VB O6 0 S a ch ri ar nic 39 vBPO11 ari ap ni a P 5 0 a cc ar na ina a h j o t B O 100 S c v cc a p ta V 4 a c ha in ja ja IPO4 Sa rin ja gi ta BP O 3 S c h r a po po vBPO5 ha a di a V P O S a c a in ja n n 0 c in it B O4 100 S a c h ri a p ic ic ac ar ia g ta V P 0 S c c a n o a a h r i a B P 0 100 S a h r a ja n v vBPO9 S c a d t a 1 a c i p v IP c in a i t V B S a c h a n ja on ic IP O a ri ig a v 8 7 S c c c a ri a p a O 4 S m d it ta a 5 S a c h n i v 3 a a g 1 O O 6 a c c h r ja o c I 5 vIPO10 L in a i a c 1 a h a in a n a P i it i O P P 2 c c a r p O m r d O 3 Sa c j i v g n B O S S c h a i a a o c 3 a a P B 1 S S S r n I n a i o P O 6 c r p n a P 4 i v O S h in L i v P a a a j vIPO50 r d B a i a a i O p 1 a a o v B O 2 h n c P c a a O5 7 r v P ac c I m a B O c p a a a c c a n a P 3 c ja a n i j v 55 a r i a c P O c o B 1 c n i r a i c V B P O c i O L P c c i j v a i c r p n c a c a h h n j a O a vIPO25 P 6 V B h I m c n v P h h i a a n n n a B O h o i P a i c B n a p i o v a a c ni t h a p B P v O L r o a v a a j on n B a a P a I a ap V m o p a t r r I a ni r o i 5 p v a r r j P t c a v r i c i a a a v v vIPO37 p i h o i a i a B i i n j n j n O a r i i c s n a L c a j t n n a p o I i i a n c c v i P j p n a a ic 1 a a c ig a c i n a a a p o n a v a a g n a O a a i c n j d i c a I n su a o n i i i a j P vIPO47 j v Brown algae n i 2 S n r i po o j j j a c d n a ja i c ap a n i v r a a o n o a a I O r a p n p c P i a l j j o p p i I a a n a p o a p h r a p a P O 4 i o o c i ni v j p o 1 vIPO34 o o h h r p o c a a p O r a o n a I j p 3 c c a cu o v P c a n n n n i o n a a j a 6 i n n i 7 c a I l a h i j p c v O i i P a j i a ac n n n i r c i c i ca I vIPO19 S c c m i a a a n si c P O 4 S a j S a r i a av i a a n c m a O 4 ac h a i r n v 4 s La i a n a v a r v v c i I 5 vIPO9 S h r v 3 u P n I L a r I I I c c ha a i v P 3 P IP P P p O h a r a c I c r O h O O O c P vBPO a h O 4 vIPO28 S a c c c 2 c O 7 h 2 2 a 5 S c ca a 2 5 c 3 6 c 3 3 S a o 4

S a vIPO56 t c 8 S

c a S

E S vBPO14 vIPO49 vIPO17 vBPO15 vBPO7 vIPO18 vIPO16 Saccharina japonica Laminaria digitata Bacteria MG FG BB MB JS HS DB Ectocarpus siliculosus Red algae Gametophyte Sporophyte Figure 3 | The vBPO and vIPO genes involved in halogen metabolism in S. japonica. (a) A schematic representation of the iodine uptake and release in Laminariales in response to oxidative stress involving vBPOs and vIPOs. The coloured squares denote the homologous genes found in red algae (red), bacteria (pink), fungi (purple), brown algae (orange) and Laminariales (blue). The vIPOs are only found in Laminariales species. (b) The genomic organization of one vBPO and four vIPO gene clusters, which are arranged in tandem order. The numbers in the different coloured boxes represents the vBPO/vIPO gene names. The hypothetical proteins are shown in grey boxes. SC, scaffold. (c) The phylogenetic analysis of the vHPO gene families. Five recent tandem duplicated gene clusters are indicated with circles of different colours. (d) The heat map of the expression profiles of the vBPO and vIPO genes in gametophytes, juvenile sporophytes and different tissues of adult sporophytes of S. japonica. Recent tandemly duplicated vHPO genes, indicated with coloured circles, show obvious expression diversification. BB, basal blade; DB, distal blade; FG, female gametophytes; HS, holdfast and stipe; JS, juvenile sporophyte; MB, middle blade; MG, male gametophytes. independently in red and brown algae (Fig. 3b; Supplementary shown to be efficient for switching to the specialized iodine Table 3), which is consistent with a previous study that found that metabolism related to antioxidative capacities26. The expression vIPOs and vBPOs were paralogues resulting from an ancestral of 59 vHPO genes was determined in S. japonica gametophytes, gene duplication24. Recently, two bacterial vIPO genes were juvenile sporophytes and in different tissues of adult sporophytes, found in the flavobacterium Zobellia galactanivorans, a marine including the holdfast, stipe, basal blades, middle blades and bacterium associated with macroalgae25. A phylogenetic analysis distal blades (Supplementary Table 4). Notably, the vHPO genes showed that these two bacterial vIPO genes evolved identified in S. japonica showed diverse expression patterns in independently from eukaryotic algal vHPO (Fig. 3b). A pairwise different tissues and during different developmental stages. In similarity comparison of vHPOs revealed at least five blocks of particular, vHPO gene expression was observed to be significantly conserved gene clusters that are expected to have derived from upregulated in the gametophytes, which is the stage that is most recent tandem duplication events (Fig. 3b,c; Supplementary vulnerable to external stress in the entire life circle of S. japonica. Fig. 10). The transcriptional regulation of vHPO has been Among the sporophyte samples, the greatest number (54) and the

NATURE COMMUNICATIONS | 6:6986 | DOI: 10.1038/ncomms7986 | www.nature.com/naturecommunications 5 & 2015 Macmillan Publishers Limited. All rights reserved. ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms7986 amount of vHPOs were expressed in the juvenile sporophytes, have been acquired by the common ancestor of brown algae and followed by the distal blades (53). These results agree well with Haptophyceae (Supplementary Fig. 14). No Actinobacteria the largest quantity of iodine elements being found in the juvenile species harbours all of the brown algal MC5E, mannitol-1P sporophytes and distal blades in L. digitata23 and S. japonica dehydrogenase and mannitol-1-phosphatase homologues, (Supplementary Fig. 11). During the cultivation of kelps, the indicating that the abundant, complex polysaccharide pathways juvenile sporophytes and distal blades are more sensitive to observed in brown algae were likely acquired through multiple environmental stresses and are more readily infected by independent HGT or other non-canonical events (Fig. 4b). pathogens27. Notably, the expression of vIPOs is more specific than the expression of vBPOs, with only 66.7% of the tested vIPOs being expressed in all samples, compared with 82.3% Population genomics. To better understand the genetic diversity of vBPOs. One possible explanation for this pattern is that of the Saccharina population, we resequenced the entire genomes recently tandem duplicated vIPO genes showing high sequence of seven cultivated individuals from China and nine wild indi- identities and distinct expression patterns in gametophyte and viduals collected from Japan, Russia and Germany (Fig. 5a). We sporophyte, exhibit functional diversification in S. japonica first compared the number of mitochondrial differences in these (Fig. 3d). 17 Saccharina individuals and found that the cultivated popula- tions exhibited relatively low levels of genetic diversity compared with the wild populations. By contrast, S. latissima (C14) and a Carbohydrate pathways in S. japonica. Compared with land wild isolate (‘E’; as shown in Fig. 5b) possessed a much higher plants and green or red algae, brown algae exhibit unique modes genetic diversity than the other 15 Saccharina individuals, of carbon storage and cell wall metabolism9,15,28,29. Cellulose and showing 1,162 and 1,214 polymorphic loci in their mitochondrial trehalose are common in both algae and land plants. However, a genomes, respectively. S. latissima is a close relative of S. japonica, striking difference in brown algae is that they utilize mannitol and and the former is considered a native European species (Fig. 5b). laminarian for carbon storage and possess alginates and sulfated However, its close relative S. sp. E is also present along Russia’s fucans as cell wall polysaccharides. By reconstructing the Far Eastern coast. After aligning the reads to the reference Ja carbohydrate metabolism pathways in S. japonica and 14 other genome, we identified an average of 0.94 M single-nucleotide algal genomes, we found that S. japonica harbours the same variations (SNVs) and 96-K-small InDels in the cultivars and an carbohydrate pathways as E. siliculosus and is distinct from average of 2.27 M SNVs and 274-K-small InDels in the wild Aureococcus and Nannochloropsis in the alginate pathway and populations. In addition, considering the large genetic distance from diatoms in the mannitol and alginate pathways. The starch among S. latissima and S. sp. E and the reference Ja, we first built and sucrose pathways are absent in all four of the stramenopiles draft genome assemblies for S. latissima and S. sp. E and then species (Fig. 4a; Supplementary Fig. 12). employed NUMMER25 to align the assembled contigs and to call Notably, in S. japonica, gene expansion and duplication events the SNVs. Phylogenetic analyses based on genome-wide SNVs were observed at particular reaction nodes of the alginate and further confirmed that the diversity between any pair of the sulfated fucan pathways. Alginate is the major matrix component collected wild individuals was greater than the diversity among all of brown algal cell walls, providing an increased rigidity to the of the cultivars and that these cultivated individuals shared the stipe and holdfasts as well as flexibility to the blades3. The same ancestor with the wild individual (C6; Fig. 5b), indicating a different matrix and physicochemical properties of alginate restricted germplasm base and a very low genetic diversity within depend on the final step of epimerization, which is a reaction the principal Saccharina cultivars. A log likelihood STRUCTURE catalysed by MC5Es27. S. japonica was found to exhibit 105 analysis15 revealed that some crossbreeds (C17, C11 and C2) were MC5E genes, whereas only 28 were identified in E. siliculosus.In likely derived from the hybridization of S. japonica and addition, 43 of the MC5E genes present in S. japonica shared high S. longissima (C5). The number of groups identified within the sequence identities (485%) and were clustered on seven scaffolds wild Saccharina individuals increased with an increasing number (Fig. 4b), indicating recent tandem duplication events. The of K, with up to four groups at K ¼ 5, which was consistent with expansion of MC5Es in S. japonica may account for the species’ the phylogenetic analysis (Fig. 5b,c). extraordinarily high alginate content (up to 45% of the dry To detect the genome-wide signatures of artificial selection in weight)28 and high bending flexibility. cultivated Saccharina, we used a sliding window strategy to A phylogenetic analysis showed that the MC5E genes of brown estimate the theta-pi and theta-w values from both the cultivated algae were similar to the MC5E genes in bacteria (Fig. 4b), and wild populations and to perform Tajima’s D test (Fig. 5d; indicating that these genes may have undergone a non-canonical Supplementary Data 4–7). The regions showing Tajima’s D values evolutionary history and then expanded into several subfamilies that were lower or higher than 5% of all of the bins were through multiple duplication events. A prior study in considered to be candidate regions. We identified 122 regions E. siliculosus hypothesized that this evolutionary history involved exhibiting strong signals of a selective sweep in the cultivars, HGT in (Supplementary Fig. 13)27. A similar phenomenon was which encompassed 828 genes. A gene ontology-enrichment observed for another alginate-specific enzyme, GDP-mannose analysis revealed that the majority of the genes affected by dehydrogenase. One GDP-mannose dehydrogenase gene was artificial selection were related to carbohydrate transporters, found to be conserved in all Chromalveolata species, whereas the enzyme inhibitors and responses to external stimuli (Fig. 5e). other three clustered in a distinct clade that comprises only Average yield is one of the most important economic and bacterial species and E. siliculosus, indicating that these three selective criteria used in S. japonica cultivation and seedling enzymes were acquired via ancient HGT from bacteria before the selection. The efficiency and regulation mechanisms of carbon last common ancestor of brown algae (Fig. 4c; Supplementary fixation and carbohydrate metabolism are important for algal Table 5). In addition, the mannitol synthesis pathway, which is growth and body size. Six genes with known functions exhibiting catalysed by mannitol-1P dehydrogenase and mannitol-1- strong selective signals were significantly over-represented in the phosphatase, is not universally present in all stramenopiles. cultivated samples. For example, fructose-1,6-bisphosphate Both of these genes are absent from diatoms and but aldolase, which catalyses the reversible aldol cleavage of were surprisingly found in the Haptophyceae alga Emiliania. fructose-1,6-bisphosphate into DHAP and glyceraldehyde-3- Phylogenetic analyses suggest that the mannitol pathway may phosphate, functions in glycolysis or gluconeogenesis and in the

6 NATURE COMMUNICATIONS | 6:6986 | DOI: 10.1038/ncomms7986 | www.nature.com/naturecommunications & 2015 Macmillan Publishers Limited. All rights reserved. NATURE COMMUNICATIONS | DOI: 10.1038/ncomms7986 ARTICLE

ChlaChlamydomonasmydoomonam s rereinhardtiiinharrdtii 01 ChlaChlamydomonasmydoomonas rereinhardtiiinharrdtii 0022 VolvoxVolvox cartericararteart ri 0011 Alginate Sulfate fucan VolvVolvoxox ccarteriararteri 0022 CoccomyxaCoccomyxyxa subellipsoideasubellipsoidea 0011 Green algae CoccomyxaCoccomyxyxa subellipsoideasubellipsoididea 0022 and plants 82 ArabidopsisArabidopsiss thalianathaliana PhyscomitrellaPhyscomimiitreltret la patenspatens Mannitol ChlaChlamydomonashlalamydom monas rereinhardtiiinhardtii 0033 GDP-mannose VolvoxVolvvoxo cartericarteartarrteteri 0033 99 CyanidioschyzonCyanidioschyzon merolaemerolae 01 ChonChondrusdrus cricrispusspus Red algae PyropiaPyropia yezoensisyezoensis Branched β-1,3-glucan PhytophthoraPhPhytytophth hhora iinfestansnfestans Fructose-6-P 1001 Esi0101_0043Esi0101_004300 (Laminarin) SJ16894SJ16894 PhaeodactylumPhaeaeodactylum tricornutumtricornutum 01 Stramenopiles ThalassiosiraThalassiosiros a pseudonanapseudonana 01 AureococcusAurereococcus anophagefferensanophagefferens NannochloropsisNannochlchloropor sis gaditanagaditana 01 UDP-glucose NNannochloropsisannochlororopsis gadigaditanatana 0022 Sucrose 86 ProProchlorococcusochlorococccus marimarinusrinnus Cyanobacteria 94 CyanCyanidioschyzonidioschyzon meromerolaelae 02 Brown algae GaldieriGaldGaldieriaierier aasu susulphurariaulphulpphurariraaria Red algae Paenibacillus curdlanolyticuscurddlanolyyticus Red algae 999 Bacteria Trehalose Cellulose BlastopirellulaBlastopirellula marinamararina Green algae Fibrobacter succsuccinogenessuccinoginnogeneenenesnes Frankiaa sp.spp EuI1cEuI1c Actinobacteria Diatoms Esi0164_0053Esi0si0164164_0053 100 SJ03911SJ03J03911 SJ11024SJ110240 Stramenopiles Esi0051_0092Esi0si0051051_0092 Esi0051_0113Esi00051_0113 (brown algae) SJ11025SJ1111025 0.05 scaffold20 80 sctg495 scaffold354 SJSJ45SJ457070 scaffold513 776767 100 SJ06823SJ06823 SJ22069SJ22069 SSJ04550J045500 Esi0113_0051Esi01113_0051 Esi0017_0043Esi00017_0043 SJ15142SJ151424

Esi0340_0006EEsi0Ei0340_340 00066 Esi0016_0047EsEsi0016_004747 100 82 Sctg466 EEsi0315_0022si0315_000022 SJ0909SJ0909 Sctg45 96 SJ8882SJ88S 82 SJ22247SJ22247

SJ03359SJ03S 359 100 Esi0071_0099Esi000071_0099 SJ02850SJ02850 95 SJ02844SJ02844 Esi0Esi0071_0098071_1__0098

WPWP_031058944.1_031058944.1 WPWP_013421368.1_013421368.11 Actinobacteria 100% 100 EEsi0241_0023si0241__0023 SJ0981SJ09811 NP_077711.1NPNP_P_0777111.1 ESESV-1-226SV-1-2226 Virus 95% Esi0052_0064Esi00052_0064 100 100 90% SJ52SJ5SJ5209209 100 YP_002798298.1YP_YP_002798298.1 85%

YP_223381.1YPYP_223381.1 Identity YP_002870646.1YP_P_002870646.1 Bacteria WP_022642435.1WPWP_022642435.1 80% 0.06

Figure 4 | Polysaccharide biosynthesis and metabolism in S. japonica. (a) A comparison of the carbohydrate metabolic pathways among algal genomes. The width of the arrows represents the number of genes involved in the metabolic pathway. (b) A phylogenetic analysis and Circos plot of MC5E genes in S. japonica and E. siliculosus. The majority of the phylogenetically clustered MC5E genes present in S. japonica are in the assembled scaffolds in a tandem order. The brown nodes and red nodes in the tree denote the MC5E genes derived from S. japonica and E. siliculosus, respectively. The abbreviated words begin with ‘Esi’ and ‘SJ’ in the tree stand for the protein ids for E. siliculosus and S. japonica, respectively. Actinobacteria, Bacteria and Virus taxon names were represented by the NCBI protein accession numbers. The S. japonica and E. siliculosus scaffolds in the Circos plot are shown in brown and red, respectively. Green and red dashed lines represent the locations of MC5E-enriched scaffolds in S. japonica and E. siliculosus, respectively. The inner curved lines in the Circos plot indicate the sequence identity between a pair of MC5E genes. (c) The phylogenetic analysis of GMD, a key enzyme in the alginate biosynthesis pathway.

Calvin cycle30. Traditionally, FbA was assumed to only exist in the wild samples showed strong selective signals that may be the genomes of bacteria and fungi, but it has now been found in related to environmental adaptation and morphological changes. some species of Chromista and Chromalveolata, including the Wild brown algae exhibit a particularly high flexible morphology, brown algae E. siliculosus and S. japonica, giving rise to the and the physical properties of the marine environment are partly possibility that these genes were independently obtained from responsible for the algal shape. Compared with cultivated kelps fungi through ancient HGT events. on artificial floating rafts with suitable and consistent environ- A population-level analysis further revealed selection in the ments (for example, sufficient water depth), wild kelps show wild samples. A total of 566 genes embedded in selected regions greater morphological plasticity in response to harsh intertidal presented significantly elevated Tajima’s D values. Compared environments31. In land plants, several types of transmembrane with the cultivars, three transmembrane receptor kinase genes in receptor kinases transmit information regarding mechanical

NATURE COMMUNICATIONS | 6:6986 | DOI: 10.1038/ncomms7986 | www.nature.com/naturecommunications 7 & 2015 Macmillan Publishers Limited. All rights reserved. ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms7986

E 100 C14 F Russia G E 100 F C17 C14 B 100 G C15 100 C15 Germany C5 B C6 100 C5 C17 C11 100 China C2 Japan C8 W C13 100 C11 C12 C8 100 C6 Ja C2 99 C3 100 100 W C13 C3 100 C12 100 Ja

0 8,000 16,000 24,000 32,000 Ribosomal RNA Coding genes Transfer RNA Non-coding VNTR

Wild Cultivar –1.742 K=2 1 <=0.001 2.220 GO:1901505 carbohydrate derivative transporter activity GO:0004857 enzyme inhibitor activity GO:0016829 lyase activity K=3 GO:0010576 metalloenzyme regulator activity 2 GO:0061134 peptidase regulator activity GO:0019208 phosphatase regulator activity GO:1901476 carbohydrate transporter activity K=4 1.055 GO:0019867 outer membrane 1 GO:0031090 organelle membrane GO:0016021 integral component of membrane GO:0009605 response to external stimulus GO:0033218 amide binding K=5 0 GO:0045156 electron transporter –0.255 GO:0008287 protein serine/threonine phosphatase complex Tajima’s D in wild GO:0016209 antioxidant activity GO:0016740 transferase activity –1 K=6 GO:0044455 mitochondrial membrane part GO:0033177 proton-transporting two-sector ATPase complex F E B G –2 –1 0123 W Ja CN CP C5 C2 C3 C6 C8 WN WP C14 C15 C17 C11 C12 C13 Tajima’s D in cultivars Figure 5 | The population genomics of cultivated and wild S. japonica populations. (a) The geographic locations of the cultivated and wild S. japonica samples used in this study. (b) The neighbour-joining tree of the genetic distances calculated using the genome-wide SNVs among 17 S. japonica individuals. The mitochondrial diversity covering SNVs is indicated with vertical lines in the seven cultivated and nine wild samples relative to the S. japonica JA reference sequence. (c) The population structure of 17 S. japonica individuals determined using the STRUCTURE program. Each individual is represented by a vertical bar, and each colour represents one population. (d) The distribution of the Tajima’s D values in both the wild and cultivated populations. The data points to the left and right of the vertical dashed lines (x ¼À1.742 and x ¼ 2.220), which correspond to the 5% left and right tails of the empirical Tajima’s D value distribution, respectively, are denoted as CN (cultivated negative) and CP (cultivated positive), respectively. The data points below and above the lower and upper horizontal dashed lines (y ¼À0.255 and y ¼ 1.055), which correspond to the 5% lower and upper tails of the empirical Tajima’s D value distribution, respectively, are denoted as WN (wild negative) and WP (wild positive), respectively. (e) The gene ontology-enrichment analysis of the genes in the selected CN, CP, WN and WP regions. forces from the cell wall to cytoplasmic effectors and are structure34. Laminarin is not only the major storage responsible for the development and morphogenesis32,33. polysaccharide in S. japonica, but it also play important roles in the defence system, in which it can be hydrolysed into Discussion oligosaccharides by endo-1,3-beta-glucanase. In addition, Brown algae are valuable economic and ecological resources and laminarin and its derived oligosaccharides can even activate the are an important evolutionary lineage because of the development defence response and protect against pathogens in terrestrial of specialized tissues and organs outside of plants and metazoans, plants16. Iodide was found to be concentrated by vHPO and was which makes their genomes exciting resources for comparative shown to bind to the cell wall polysaccharide components5,26, investigations in this field. The S. japonica genome is the providing extracellular antioxidants on the blade surfaces of S. first sequenced reference genome from kelps and the second japonica. The expansion of the endo-1,3-beta-glucanase and genome from brown algae. When compared with filamentous vHPO gene families in S. japonica may provide it with more E. siliculosus, S. japonica shows several unique characteristics, complex defence systems than Ectocarpus. Moreover, tissue such as more complex differentiation, large blades, a higher differentiation was more complex in S. japonica than in polysaccharide content and striking iodine concentrating abilities. Ectocarpus with respect to such events as the emergence of the In this study, some of the mechanisms underlying these holdfast, stipe and blades. In addition to several other expanded characteristics were identified using comparative genomics. gene families, including Cupin-like protein35, C2H2 zinc-finger S. japonica can grow up to dozens of metres with large blades, protein36 and LRR-GTPase of the ROCO family37 (Fig. 2d), and its cell wall polysaccharide content is much higher than which have been shown to have key roles in development in both Ectocarpus. We speculated that the expansion of the cellulose animals and green plant lineages, the newly identified synthase, mannuronan C-5-epimerase and alpha-(1,6)-fucosyl- development related imm upregulated 3 gene family may transferase gene families may be responsible for the formation of represent a new developmental mechanism in brown algae. the extensible cell walls of S. japonica that support its large blade Combining all of the above evidence further indicates that brown

8 NATURE COMMUNICATIONS | 6:6986 | DOI: 10.1038/ncomms7986 | www.nature.com/naturecommunications & 2015 Macmillan Publishers Limited. All rights reserved. NATURE COMMUNICATIONS | DOI: 10.1038/ncomms7986 ARTICLE algae evolved developmental complexity independently from K-mers) is a wrapper script for constructing a repeat library from sequencing reads higher plants and animals. using Jellyfish and Velvet. After the raw repeat library was constructed with these This work also presents the first comprehensive resequencing two programs, CAP3 was run on the raw data using the parameters ‘-o 20, -i 30 -p 80 -s 400 -j 31’ to assemble the TE fragments into full-length TEs and to remove data for wild and cultivated Saccharina individuals, providing the redundancy, which was removed from the raw repeat library by cd-hit-est with the basic materials for population genetic studies. The Saccharina parameter ‘-c 0.8’. However, this newly built library still included plentiful short cultivars that are currently farmed in China were mainly bred sequences (r150 bp) produced by RepARK, which may have been meaningless from the descendants of wild S. japonica that were introduced fragments that would have significantly lengthened the CPU running time of RepeatMasker (http://www.repeatmasker.org). To address this problem, we only from northern Japan. According to our results, only limited retained the sequences that were longer than 800 bp that were produced by crossbreeds were derived from the hybridization of S. japonica RepARK in the new library and used the RepeatModeler-detected short repeats and L. longissima. Therefore, the restricted germplasm base and (r800 bp) as substitute sequences. After applying RepeatMasker to mask the continuous selfing of these kelps have resulted in very low genetic S. japonica genome with this new repeat library, we further simplified the library by removing the sequences with fewer than 50 hits in the S. japonica genome. The diversity within Saccharina cultivars. The obtained genomic RepeatClassifier module in the RepeatModeler package was used to classify the information on wild Saccharina species can not only be used to identified repetitive elements based on Repbase. increase the genetic diversity through hybridization, but it can also provide a large source of candidate genes for further Insertion age of TEs in S. japonica. We first built a consensus sequence for each functional studies aimed at improving quality and yield. There are abundant TE using a combination of RepeatModeler and CAP3 (ref. 48). In total, other important issues that remain for the further exploitation of four full-length RTE elements (RTE-1, RTE-2, RTE-3 and RTEX-1), six LTR these kelps, such as the many biological features that are unique elements (Gypsy-1, Gypsy-2, Copia-1, Copia-2, Copia-3 and ERV1), two Jockey to Saccharina, that need to be studied in detail at the molecular elements (Jockey-1 and Jockey-2), one L1-1 and one CRE-1 element were reconstructed. For each TE, we aligned the matched hits identified by level. In addition, the identification of SNVs in representative RepeatMasker to the consensus sequence and calculated the proportion of pairwise wild and cultivated individuals will provide opportunities for differences between these hits (p). The average number of substitutions per site for marker-assisted breeding. each fragmented repeat was estimated using the one-parameter Jukes–Cantor model ( À 3/4ln(1 À 4/3p)). Methods Genome sequencing and assembly. DNA libraries harbouring 180-bp, 300-bp, Gene prediction and annotation. Homologue-based, ab initio and transcriptome- 500-bp, 800-bp, 3-kb and 5-kb inserts were subsequently constructed for based approaches were integrated to predict protein-coding genes in the S. japonica 49 S. japonica Ja and DNA libraries with 300-bp inserts were constructed for the other genome. For the ab initio gene predictions, Augustus (version 2.5.5) , 50 51 16 Saccharina individuals. These DNA libraries were paired-end sequenced on an GeneMarkES (version 3.0.1) and FGenesh were used to predict the protein- Illumina HiSeq 2000 sequencer. Two-large-insert 454 libraries (8 and 16 kb) were coding genes. The protein sequences of Arabidopsis thaliana, Ectocarpus siliculosus, also constructed for S. japonica Ja and were sequenced using 454 pyrosequencing. Chlamydomonas reinhardtii and Phaeodactylum tricornutum were downloaded For PacBio library construction, genomic DNA was sheared to 8 kb using an from Phytozome and used to align to the S. japonica genome with Exonerate v. ultrasonicator and was converted into the proprietary SMRTbell library format 2.2.0 (ref. 52) using a Protein2Genome model to predict gene structures. The RNA 40 53 using an RS DNA Template Preparation Kit. SMRTbell templates were subjected to sequencing data were mapped to the genome using Tophat , and Cufflinks was employed to assemble transcripts to the gene models. The gene evidence predicted standard SMRT sequencing on the PacBio RS system according to the manu- 54 facturer’s protocol. from the above three strategies was combined with EVidenceModeler into a non- We used the paired-end reads from the short-insert-size libraries (180, 300, 500 redundant consensus of gene structures. We masked all of the TEs from the and 800 bp) to assemble the genome into contigs using SOAPdenovo2 (ref. 38) genome before gene prediction and filtered out all of the short-coding ORFs with the multi-kmer option (-K 63 -m 81). We then aligned all of the available (o150 bp) supported by only ab initio methods. sequence data to these contigs and used the mate-pair information on the order of The functional annotation of protein-coding genes was achieved using BLASTp À 04 the estimated insert size (180 bp to 16 kb) to generate scaffolds using both (E-value 1E ) against the NCBI non-redundant protein sequence database. The SOAPdenovo and SSPACE32. First, the gaps that resulted from the scaffolding were annotation information on the best BLAST hit derived from the database was closed using GapCloser. Then, the PacBio long reads were mapped to the scaffold transferred to the gene set. Motifs and protein domains were determined with sequences using BLASR. PBJelly2 (ref. 39) was used to fill the gaps by generating InterProScan v5 (ref. 55) searches against the InterPro databases, including Pfam, consensus sequences of gap-spanning reads. PRINTS, PROSITE, PANTHER and SMART. The Gene Ontology IDs for each To remove potential bacterial contamination, the assembled contigs were first gene were obtained from the corresponding InterPro entries. All of the genes were subjected to BLASTX searches against the NCBI non-redundant protein database. mapped to KEGG proteins to determine the metabolic pathways. The contigs with the best-hit matches to bacteria were referred to as candidate bacterial contigs, and the contigs with the best-hit matches to E. siliculosus were Identification of S. japonica gene families and phylogenetic analysis. We referred to as authentic Saccharina contigs. For each contig, the GC content and identified gene families in S. japonica by performing an all-against-all BLASTp the open reading frame (ORF) density (the total length of ORFs in the contig search against the protein sequences of 25 algae and plants with available divided by the contig length) were calculated. The candidate bacterial contigs whole-genome information. Among these algae, 13 belonged to Chromalveolata showing an ORF density of Z60% or a biased GC content (460 or 40%) were o (S. japonica, E. siliculosus, Nannochloropsis gaditana, Nannochloropsis oceanica, filtered. The assembly statistics are shown in Supplementary Table 2. This whole Aureococcus anophagefferens, Phaeodactylum tricornutum, Thalassiosira pseudo- genome shotgun project can be accessed at http://124.16.129.28:8080/saccharina/. nana, Albugo laibachii, Phytophthora infestans, Pythium ultimum, Saprolegnia parasitica, Emiliania huxleyi and Guillardia theta CCMP2712); 1 was Rhizaria Transcriptome sequencing and analysis. RNA sequencing libraries were pre- (Bigelowiella natans); 1 was Glaucophyta (Cyanophora paradoxa); 5 were pared with RNA samples from female gametophytes and sporophytes using the Rhodophyta (Chondrus crispus, Cyanidioschyzon merolae, Galdieria sulphuraria, Illumina TruSeq RNA Sample prep Kit (Illumina). These RNA libraries were Porphyridium purpureum and Pyropia yezoensis); 3 were Chlorophyta (Chlamy- paired-end sequenced on an Illumina HiSeq 2000 sequencer. RNA sequencing domonas reinhardtii, Coccomyxa subellipsoidea and Volvox carteri); and 2 were reads were mapped against the S. japonica genome using TOPHAT v1.3.2 (ref. 40) plants (Arabidopsis thaliana and Physcomitrella patens). with annotated exons to perform transcript-guided mapping. The GFOLD v1.0.5 The global protein identities of each BLAST match were calculated using 56 (ref. 41) was employed to quantify gene expression levels with reads within or InParanoid to filter out matches exhibiting poor similarities (o20%) or poor 57 spanning exons. SOAPdenovo-Trans v 1.03 (ref. 42) was used to de novo assemble gene coverage (o50%). Gene families were identified by MCL with the ‘inflation’ the RNA sequencing reads using default parameters. The de novo assembled option as 1.2. Gene numbers were counted for each family and for each species, transcripts were BLATed against the assembled genome to independently access and were compared by using the K-means algorithm in R. To obtain a robust genome completeness. The inGAP package43,44 was used to visualize and manually species tree, redundant sequences (90% identity or more) from the same organism 58 check read mapping of target regions. were removed using CD-HIT , then, homologue clusters were predicted by comparing each pair of the 25 algal and plant genomes and further summarized by InParanoid and QuickParanoid. Subsequently, the clusters containing single-copy Detection and classification of repetitive elements. RepeatModeler (http:// genes from each organism and clusters with species-specific duplications were www.repeatmasker.org/RepeatModeler.html) and RepARK45 were employed to selected for further consideration. For each cluster, multiple alignments were then detect transposable elements (TEs) in the S. japonica genome using default performed using MUSCLE v3.8.31 (ref. 59) with the default parameters and were parameters. RepeatModeler is a de novo repeat family identification and modelling further trimmed using trimAl v1.4 (ref. 60) with the options ‘-gt 0.1 -resoverlap package containing two de novo repeat-finding programs (RECON46 and 0.75 -seqoverlap 80’. RAxML61 was employed to reconstruct a maximum- RepeatScout47). RepARK (Repetitive motif detection by Assembly of Repetitive likelihood phylogenetic tree for each cluster with an evolutionary model specified

NATURE COMMUNICATIONS | 6:6986 | DOI: 10.1038/ncomms7986 | www.nature.com/naturecommunications 9 & 2015 Macmillan Publishers Limited. All rights reserved. ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms7986

as ‘PROTCATJTT’ and to perform a bootstrap significance test with 100 replicates. 4. Adams, J., Toop, T., Donnison, I. S. & Gallagher, J. A. Seasonal variation TreSpEx62 was then applied to the tree of each cluster to evaluate taxa with long inoi4 Laminaria digitatao/i4 and its impact on biochemical conversion branches and to the trees of the 41 clusters with both long branch score routes to biofuels. Bioresour. Technol. 102, 9976–9984 (2011). heterogeneity and upper quartiles smaller than 30. The according clusters were 5. McFiggans, G. et al. Direct evidence for coastal iodine particles from Laminaria selected for concatenation of their sequences within the same species into super genes. macroalgae–linkage to emissions of molecular iodine. Atmos. Chem. Phys. 4, A Dollo analysis was conducted on the homologue clusters of the 25 algal and 701–713 (2004). plant proteomes using the Dollop tool in the PHYLIP package63 and custom Java 6. Bartsch, I. et al. The Laminaria sensu lato: recent insights and scripts (available on request). To examine the evolutionary relationships between developments. Eur. J. Phycol. 43, 1–86 (2008). the duplicated homologues in the S. japonica genome before or after its divergence 7. Tseng, C. Algal biotechnology industries and research activities in China. from E. siliculosus, synonymous (Ks) and non-synonymous (Ka) substitution rates J. Appl. Phycol. 13, 375–380 (2001). were calculated using KaKs_Calculator64. A gene function-enrichment analysis for 8. FAO. The State of Food and Agriculture 2012 (FAO, Rome, Italy, 2012). the clustered genes was performed using Fisher’s exact test to compare all of the 9. Cock, J. M. et al. The Ectocarpus genome and the independent evolution of genes in the S. japonica genome. multicellularity in brown algae. Nature 465, 617–621 (2010). 10. Silberfeld, T. et al. A multi-locus time-calibrated phylogeny of the brown algae (Heterokonta, , Phaeophyceae): investigating the evolutionary The vHPO and polysaccharide biosynthesis gene family analysis . Searches for nature of the ‘brown algal crown radiation’. Mol. Phylogenet. Evol. 56, 659–674 vHPOs in the S. japonica genome were conducted with BLASTp (E-value 1e À 10) o (2010). using 8 vBPOs and 3 vIPOs present in L. digitata and E. siliculosus as reference 11. Zhao, F., Qi, J. & Schuster, S. C. Tracking the past: interspersed repeats in an sequences. All of the hits were determined by NCBI BLASTp using the default extinct Afrotherian mammal, Mammuthus primigenius. Genome Res. 19, parameters. The phylogenetic tree was constructed by the neighbour-joining 65 1384–1392 (2009). algorithm of the MEGA 5.0 program , and a total of 1,000 bootstrap replicates ˇ were performed (Supplementary Table 3). Expression of vBPO and vIPO genes was 12. Zupunski, V., Gubensˇek, F. & Kordis, D. Evolutionary dynamics and investigated by real-time PCR in gametophytes, juvenile sporophytes and different evolutionary history in the RTE clade of non-LTR retrotransposons. Mol. Biol. tissues of adult sporophytes including holdfast, stipe, basal blades, middle blades Evol. 18, 1849–1863 (2001). and distal blades (Supplementary Table 4). The lowest expression of vBPO17 in the 13. Delaroque, N. & Boland, W. The genome of the brown alga Ectocarpus basal blade was set to 1. Average-linkage hierarchical clustering and heatmaps were siliculosus contains a series of viral DNA pieces, suggesting an ancient generated in R Bioconductor using the heat map.2 function (omitting row and association with large dsDNA viruses. BMC Evol. Biol. 8, 110 (2008). column dendrograms) in the gplots package of the R program (http://cran.r- 14. Leblanc, C. et al. Iodine transfers in the coastal marine environment: the key project.org/web/packages/gplots/index.html). The analogous set of genes involved role of brown algae and of their vanadium-dependent haloperoxidases. in the polysaccharide biosynthesis metabolism pathways for mannitol, trehalose, Biochimie 88, 1773–1785 (2006). cellulose, laminarin, alginate, sulfate fucan and sucrose in the 14 algal genomes 15. Michel, G., Tonon, T., Scornet, D., Cock, J. M. & Kloareg, B. The cell wall were identified and annotated based on KEGG and previous functional polysaccharide metabolism of the brown alga Ectocarpus siliculosus. Insights classifications9,29,66,67 (Supplementary Figs 12 and 15). To identify the CAZymes into the evolution of extracellular matrix polysaccharides in . New from S. japonica and distinguish the differences in cell wall polysaccharides from Phytol. 188, 82–97 (2010). other algae, we performed CAZymes screening in S. japonica and the other 13 algal 16. Vera, J., Castro, J., Gonzalez, A. & Moenne, A. Seaweed polysaccharides and genomes (Supplementary Tables 6–8). All of the putative proteins were searched derived oligosaccharides stimulate defense responses and protection against 68 against entries in the CAZy database using the dbCAN Web server , in which pathogens in plants. Mar. Drugs 9, 2514–2525 (2011). 69 HMMer was used to query against a collection of custom-made hidden Markov 17. Choudhury, S. R., Roy, S., Singh, S. K. & Sengupta, D. N. Molecular model profiles that were constructed for each CAZy family. characterization and differential expression of b-1, 3-glucanase during ripening in banana fruit in response to ethylene, auxin, ABA, wounding, cold and Identification of SNPs and InDels from resequenced Saccharina genomes. The light–dark cycles. Plant Cell Rep. 29, 813–828 (2010). paired-end reads from the 14 resequenced samples (C2, C3, C5, C6, C8, C11, C12, 18. Zambounis, A., Elias, M., Sterck, L., Maumus, F. & Gachon, C. M. Highly C13, C15, C17, B, F, G and W) were aligned to the draft assembly using BWA dynamic exon shuffling in candidate pathogen receptorsy What if brown v.0.7.5. We employed the default parameters to align the reads, with the exception algae were capable of adaptive immunity? Mol. Biol. Evol. 29, 1263–1276 of the ‘-q 15’ parameter, which was used to trim the low-quality regions at the (2012). 30 ends of the reads before mapping. Bcftools in the SAMtools50,70 package was 19. Peters, A. F. et al. Life-cycle-generation-specific developmental processes are employed to call all of the raw variant locations using the parameters ‘bcftools view modified in the immediate upright mutant of the brown alga Ectocarpus –vcg’. The locations of raw variants in C8, C11, C13, C15 and C17 were filtered siliculosus. Development 135, 1503–1512 (2008). using the parameters ‘vcfutils.pl varFilter -d 10 -D 100 -Q 20’, whereas the 20. Martins, M. J. F., Mota, C. F. & Pearson, G. A. Sex-biased gene expression in the remaining nine samples were filtered using the parameters ‘vcfutils.pl varFilter -d 3 brown alga Fucus vesiculosus. BMC Genomics 14, 294 (2013). -D 40 -Q 20’. Once a list of putative variant locations had been constructed, we 21. Brun, F., Gonneau, M., Laloue, M. & Nogue´, F. Identification ofoi4 used the mpileup command in SAMtools to call the genotypes for each sequenced Physcomitrella patenso/i4 genes specific of bud and gametophore formation. individual. Plant Sci. 165, 1267–1274 (2003). The paired-end reads from the two distantly related species, E and C14 were de 38 22. Verhaeghe, E. F. et al. Microchemical imaging of iodine distribution in the novo assembled separately using SOAPdenovo . The assembled contigs were brown alga Laminaria digitata suggests a new mechanism for its accumulation. scaffolded using the paired-end reads. GAPcloser was employed to close the gaps in J. Biol. Inorg. Chem. 13, 257–269 (2008). the scaffolds. NUCmer was used to align the assembled sequences to the S. japonica 23. Ku¨pper, F. C. et al. Iodide accumulation provides kelp with an inorganic Ja assembly. After filtering the multiple mapped regions, NUCmer was employed antioxidant impacting atmospheric chemistry. Proc. Natl Acad. Sci. 105, to call single-nucleotide variants and short InDels. 6954–6958 (2008). 24. Colin, C. et al. Vanadium-dependent iodoperoxidases in Laminaria digitata, a Phylogenetics and population structure of Saccharina. Homozygous single- novel biochemical function diverging from brown algal bromoperoxidases. nucleotide polymorphism (SNPs) were used to calculate the genetic distances J. Biol. Inorg. Chem. 10, 156–166 (2005). between different S. japonica accessions. The neighbour-joining method in MEGA5 25. Fournier, J.-B. et al. The Vanadium Iodoperoxidase from the Marine was applied to construct a phylogenetic tree based on the p-distance method with Flavobacteriaceae Species Zobellia galactanivorans Reveals Novel Molecular 1,000 bootstrap replicates. and Evolutionary Features of Halide Specificity in the Vanadium We used STRUCTURE software (version 2.3.2) to investigate population Haloperoxidase Enzyme Family. Appl. Environ. Microbiol. 80, 7561–7573 structure across different values of K, representing the number of putative ancestral (2014). clusters of allelic similarity. We employed an admixture model with correlated 26. Cosse, A., Potin, P. & Leblanc, C. Patterns of gene expression induced by allele frequencies to assign individuals to the K clusters. A 10,000 step burn-in oligoguluronates reveal conserved and environment-specific molecular defense period for Monte Carlo Chain searches, followed by 20,000 replicate runs, was responses in the brown alga Laminaria digitata. New Phytol. 182, 239–250 applied at each K from 2 to 6. (2009). 27. Ku¨pper, F. et al. Iodine uptake in Laminariales involves extracellular, References haloperoxidase-mediated oxidation of iodide. Planta 207, 163–171 (1998). 1. Tonon, T. et al. Toward systems biology in brown algae to explore acclimation 28. Groisillier, A. et al. Mannitol metabolism in brown algae involves a new and adaptation to the shore environment. Omics 15, 883–892 (2011). phosphatase family. J. Exp. Bot. 65, 559–570 (2014). 2. Bischof, K. et al. Ultraviolet radiation shapes seaweed communities. Life in 29. Michel, G., Tonon, T., Scornet, D., Cock, J. M. & Kloareg, B. Central and Extreme Environments (Springer, 2007). storage carbon metabolism of the brown alga Ectocarpus siliculosus: insights 3. Nyvall, P. et al. Characterization of mannuronan C-5-epimerase genes from the into the origin and evolution of storage carbohydrates in Eukaryotes. New brown alga Laminaria digitata. Plant Physiol. 133, 726–735 (2003). Phytol. 188, 67–81 (2010).

10 NATURE COMMUNICATIONS | 6:6986 | DOI: 10.1038/ncomms7986 | www.nature.com/naturecommunications & 2015 Macmillan Publishers Limited. All rights reserved. NATURE COMMUNICATIONS | DOI: 10.1038/ncomms7986 ARTICLE

30. Kloareg, B. & Quatrano, R. Structure of the cell walls of marine algae and 61. Stamatakis, A. RAxML-VI-HPC: maximum likelihood-based phylogenetic ecophysiological functions of the matrix polysaccharides. Oceanogr. Mar. Biol. analyses with thousands of taxa and mixed models. Bioinformatics 22, 26, 259–315 (1988). 2688–2690 (2006). 31. Kurtz, S. et al. Versatile and open software for comparing large genomes. 62. Struck, T. H. TreSpEx—detection of misleading signal in phylogenetic Genome Biol. 5, R12 (2004). reconstructions based on tree information. Evol. bioinform. Online 10, 51 32. Hubisz, M. J., Falush, D., Stephens, M. & Pritchard, J. K. Inferring weak (2014). population structure with the assistance of sample group information. Mol. 63. Felsenstein, J. {PHYLIP}(Phylogeny Inference Package) version 3.6 a3 Ecol. Res. 9, 1322–1332 (2009). http://evolution.genetics.washington.edu/phylip.html, (2002). 33. Berg, I. A. et al. Autotrophic carbon fixation in archaea. Nat. Rev. Microbiol. 64. Wang, D., Zhang, Y., Zhang, Z., Zhu, J. & Yu, J. KaKs_Calculator 2.0: a toolkit 8, 447–460 (2010). incorporating gamma-series methods and sliding window strategies. Genomics 34. Koehl, M. & Wainwright, S. Mechanical adaptations of a giant kelp. Limnol. Proteomics Bioinformatics 8, 77–80 (2010). Oceanogr. 22, 1067–1071 (1977). 65. Tamura, K. et al. MEGA5: molecular evolutionary genetics analysis using 35. Dunwell, J. M., Culham, A., Carter, C. E., Sosa-Aguirre, C. R. & Goodenough, maximum likelihood, evolutionary distance, and maximum parsimony P. W. Evolution of functional diversity in the cupin superfamily. Trends methods. Mol. Biol. Evol. 28, 2731–2739 (2011). Biochem. Sci. 26, 740–746 (2001). 66. Iwamoto, K. & Shiraiwa, Y. Salt-regulated mannitol metabolism in algae. Mar. 36. Klug, A. The discovery of zinc fingers and their applications in gene regulation Biotechnol. 7, 407–415 (2005). and genome manipulation. Ann. Rev. Biochem. 79, 213–231 (2010). 67. Iwamoto, K., Kawanobe, H., Ikawa, T. & Shiraiwa, Y. Characterization of 37. Marı´n, I., van Egmond, W. N. & van Haastert, P. J. The Roco protein family: salt-regulated mannitol-1-phosphate dehydrogenase in the red alga Caloglossa a functional perspective. FASEB J. 22, 3103–3110 (2008). continua. Plant Physiol. 133, 893–900 (2003). 38. Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient 68. Yin, Y. et al. dbCAN: a web resource for automated carbohydrate-active short-read de novo assembler. Gigascience 1, 18 (2012). enzyme annotation. Nucleic Acids Res. 40, W445–W451 (2012). 39. English, A. C. et al. Mind the gap: upgrading genomes with Pacific Biosciences 69. Eddy, S. R. Genome Inform (World Scientific, 2009). RS long-read sequencing technology. PloS One 7, e47768 (2012). 70. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 40. Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions 25, 2078–2079 (2009). with RNA-Seq. Bioinformatics 25, 1105–1111 (2009). 41. Feng, J. et al. GFOLD: a generalized fold change for ranking differentially Acknowledgements expressed genes from RNA-seq data. Bioinformatics 28, 2782–2788 (2012). This work was supported by the Hi-Tech Research and Development Program (863) of 42. Xie, Y. et al. SOAPdenovo-Trans: de novo transcriptome assembly with short China (2012AA052103, 2014AA022003), the National Science and Technology Pillar RNA-Seq reads. Bioinformatics 30, 1660–1666 (2014). Program (2013BAD23B01), Qingdao Municipal Leading Talent project (13-CX-27) to 43. Qi, J. & Zhao, F. inGAP-sv: a novel scheme to identify and visualize structural N.Y., the National Natural Science Foundation of China Grant (91131013, 31100952), variation from paired end mapping data. Nucleic acids Res. 39, W567–W575 the Fund of Beijing Institutes of Life Science, Chinese Academy of Sciences to F.Z., (2011). the National Natural Science Foundation of China Grant (31272285) to M.M. 44. Qi, J., Zhao, F., Buboltz, A. & Schuster, S. C. inGAP: an integrated next- and the Ministry of Sciences and Technology of China 973 Program Grant generation genome analysis pipeline. Bioinformatics 26, 127–129 (2010). (2012CB910503) to J.Q. 45. Koch, P., Platzer, M. & Downie, B. R. RepARK—de novo creation of repeat libraries from whole-genome NGS reads. Nucleic Acids Res. 42, e80 (2014). 46. Bao, Z. & Eddy, S. R. Automated de novo identification of repeat sequence Author contributions families in sequenced genomes. Genome Res. 12, 1269–1276 (2002). N.Y. and F.Z. conceived the project, designed content and managed the project; X.Z. and 47. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat D.L. coordinated the project; X.Z. and X.F. prepared DNA and RNA samples; D.X., families in large genomes. Bioinformatics 21, i351–i358 (2005). D.W., Y.W., Z.G. and Z.G. provided algal materials and conducted experiments; J.W. 48. Huang, X. & Madan, A. CAP3: a DNA sequence assembly program. Genome constructed DNA libraries and performed sequencing; F.Z. and P.J. performed raw data Res. 9, 68–877 (1999). analysis; F.Z. and W.S. performed the genome assembly; Y.Z. identified and annotated 49. Stanke, M., Scho¨ffmann, O., Morgenstern, B. & Waack, S. Gene prediction in repeats; J.Q. and F.Z. performed transcriptome analysis, gene prediction and annotation; eukaryotes with a generalized hidden Markov model that uses hints from X.F. performed mitochondrial and chloroplast genome assembly and annotation; X.Z., external sources. BMC Bioinformatics 7, 62 (2006). X.F., G.Y., L. Z., J.Q. and F.Z. performed comparative genomic analyses; F.Z. performed 50. Ter-Hovhannisyan, V., Lomsadze, A., Chernoff, Y. O. & Borodovsky, M. Gene read mapping and SNP identification; and M.M. and F.Z. performed phylogenetic and prediction in novel fungal genomes using an ab initio algorithm with population genetic analyses. X.Z., F.Z., J.Q., X.F., M.M., Y.Z. and Y.N. wrote the unsupervised training. Genome Res. 18, 1979–1990 (2008). manuscript. Z.Z. and S.C. revised the manuscript. All authors commented on the 51. Salamov, A. A. & Solovyev, V. V. Ab initio gene finding in Drosophila genomic manuscript. DNA. Genome Res. 10, 516–522 (2000). 52. Slater, G. S. & Birney, E. Automated generation of heuristics for biological Additional information sequence comparison. BMC Bioinformatics 6, 31 (2005). Accession codes. This whole genome shotgun project has been deposited in GenBank 53. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA- under the accession code JXRI00000000. seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012). 54. Haas, B. J. et al. Automated eukaryotic gene structure annotation using Supplementary Information accompanies this paper at http://www.nature.com/ EVidenceModeler and the Program to Assemble Spliced Alignments. Genome naturecommunications Biol. 9, R7 (2008). 55. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Competing financial interests: The authors declare no competing financial interests. Bioinformatics 30, 1236–1240 (2014). Reprints and permission information is available online at http://npg.nature.com/ 56. O’Brien, K. P., Remm, M. & Sonnhammer, E. L. Inparanoid: a comprehensive reprintsandpermissions/ database of eukaryotic orthologs. Nucleic Acids Res. 33, D476–D480 (2005). 57. Enright, A. J., Van Dongen, S. & Ouzounis, C. A. An efficient algorithm for How to cite this article: Ye, N. et al. Saccharina genomes provide novel insight into kelp large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 biology. Nat. Commun. 6:6986 doi: 10.1038/ncomms7986 (2015). (2002). 58. Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012). This work is licensed under a Creative Commons Attribution 4.0 59. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and International License. The images or other third party material in this high throughput. Nucleic Acids Res. 32, 1792–1797 (2004). article are included in the article’s Creative Commons license, unless indicated otherwise 60. Capella-Gutie´rrez, S., Silla-Martı´nez, J. M. & Gabaldo´n, T. trimAl: a tool for in the credit line; if the material is not included under the Creative Commons license, automated alignment trimming in large-scale phylogenetic analyses. users will need to obtain permission from the license holder to reproduce the material. Bioinformatics 25, 1972–1973 (2009). To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

NATURE COMMUNICATIONS | 6:6986 | DOI: 10.1038/ncomms7986 | www.nature.com/naturecommunications 11 & 2015 Macmillan Publishers Limited. All rights reserved.