Supporting Information

Méheust et al. 10.1073/pnas.1517551113

S-Gene Expression Analysis

Given that the analysis using dozens of algal and protist genomic datasets showed that homologs of all of the S genes are expressed, we asked whether some of them might be differentially expressed in response to stress. This question was motivated by two observations: (i) The composite genes entered nuclear genomes via primary endosymbiosis, and therefore they may still retain ancient cyanobacterial functions even when fused to novel domains, and (ii) many of the S genes are redox enzymes or encode domains involved in redox regulation, and therefore their roles may involve sensing and/or responding to cellular stress resulting from the oxygen-evolving photosynthetic organelle. To address this issue, we inspected RNA-seq data from organisms that encoded particular S genes that had either been generated under conditions of cellular stress or spanned the light–dark transition.

For genes present in Chlamydomonas reinhardtii, we used data from ref. 53 that compared triplicate transcript data from this alga grown in standard Tris-acetate-phosphate (TAP) medium (54) or TAP with the addition of 200 mM NaCl. The composite gene families 10, 24, and 18 showed significant differential expression (DE) under salt stress [up-regulation in all three cases (P = 4.14e-4, P = 0.0268, and P = 0.0077, respectively)]. These genes encode bacterial–cyanobacterial domain fusions that are involved in stress responses (i.e., rhodanese domain in 10) and in preventing protein misfolding (i.e., DnaJ/Hsp40 domain at the N terminus of 24). Interestingly, the DnaJ domain is fused to an upstream SCP superfamily region that likely forms an extracellular domain. This novel protein is found only in green algae and , and may play a role in responding to salt stress.

For S genes present in the diatom Phaeodactylum tricornutum, we used RNA-seq data from ref. 30 that compared cultures grown under control and nitrogen (N)-depleted conditions. This analysis showed that out of six families present in this alga (2, 1, 28, 20, 35, 61), three are differentially expressed under N stress (28, 35, 61). One of these is family 28, which encodes an N-terminal bacterium-derived, calcium-sensing EF-hand domain fused, intriguingly, to a cyanobacterium-derived region with similarity to the plastid inner-membrane protein import component Tic20, which acts as a translocon channel. This fused protein may use Ca2+ as a signal for protein import, and is found in several stramenopile (brown algal) species. The homolog in P. tricornutum (NCBI gi:219117465) is significantly down-regulated (P = 2.57e-23) under N depletion.

Finally, for S genes present in Arabidopsis thaliana, we used RNA-seq data reflecting light-dependent DE in seedlings, cotyledons, and roots (14). This analysis showed that four distinct S-gene families have DE in tissues in the presence of light (14, 19, 1, 18). One of the gene families showing DE is family 1, which is broadly distributed in algae and plants and is composed of a cyanobacterium-derived N-terminal hydrolase domain fused to a non-cyanobacterium-derived LPLAT family (lysophospholipid acyltransferases) domain. This gene was significantly up-regulated in all three plant tissues in the presence of light.
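The P values reported for C. reinhardtii can be tabulated and screened against a significance cutoff. The following is a minimal sketch, not the pipeline used in the study: the family numbers and P values are the ones quoted in the text, whereas the 0.05 cutoff and the names `REPORTED_P`, `ALPHA`, and `significant_families` are assumptions introduced here for illustration.

```python
# P values quoted in the text for C. reinhardtii S-gene families under salt stress.
REPORTED_P = {10: 4.14e-4, 24: 0.0268, 18: 0.0077}

# Assumed cutoff; the text does not state which threshold was applied.
ALPHA = 0.05


def significant_families(p_values, alpha=ALPHA):
    """Return family numbers with P < alpha, most significant first."""
    hits = [family for family, p in p_values.items() if p < alpha]
    return sorted(hits, key=lambda family: p_values[family])


if __name__ == "__main__":
    print(significant_families(REPORTED_P))
```

Under a stricter, equally hypothetical cutoff such as `alpha=0.01`, family 24 would no longer pass, which shows how much the list of DE families depends on the chosen threshold.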

Méheust et al. www.pnas.org/cgi/content/short/1517551113

[Figure S1 image: presence/absence matrix titled "Taxonomic distribution of the 67 S-gene families". One axis lists the S-gene families (1 to 67); the other lists the sampled taxa. Taxon-group legend: Glaucophyta, Rhodophyta, Cryptophyta, Haptophyta, Cercozoa, Ochrophyta, Bacillariophyta, Dinophyta, Euglenozoa, Unknown. Data-source legend: Transcriptomes + Genome dataset + NCBI RefSeq (349 samples).]

Fig. S1. Taxonomic distribution of the 67 S genes discovered in our study. The taxonomic distribution of the data is shown with black boxes indicating presence and white boxes indicating presumed absence in the genome or transcriptome data from the taxon.


Fig. S2. Mapping data for S genes in Picochlorum and A. thaliana. (A) S-gene family 31, which encodes RIBR + DUF1768 (PyrR) and is involved in riboflavin biosynthesis. The CDS derives from the MMETSP database and encodes this 1,848-nt S gene from P. oklahomensis CCMP2329. The 6,609 RNA-seq reads mapped to it are from the closely related species Picochlorum SE3 (55). The homologous SE3 gene is encoded on Picochlorum contig 185.g609.t1. (B) S-gene family 23, which encodes a TPR repeat/RING and an ATP-dependent protease domain. This CDS also derives from the MMETSP database, and encodes the 1,554-nt S gene from P. oklahomensis CCMP2329. The 9,253 RNA-seq reads mapped to it are from Picochlorum SE3. The homologous SE3 gene is encoded on Picochlorum contig 43.g198.t1. (C) S-gene family 14, which encodes GIY-YIG superfamily and thioredoxin superfamily domains. This genic region is from A. thaliana (AtGrxS16), and the 16,906 unique reads mapped to the exons are also from this species [see Table S2 for Sequence Read Archive (SRA) run accession numbers]. Thin blue lines indicate a spliceosomal intron. The mapping in all cases is colored in green, red, and blue for forward, reverse, and paired-end reads, respectively. The fused domains and their putative annotations are shown. These data are typical for all of the Picochlorum and Arabidopsis mappings and for many of the other plant and algal S genes when sufficient RNA-seq data are available (Table S2). This unambiguous, "deep" mapping across the region that spans the domain fusion argues strongly against misassembly of this (and other) S genes.
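The argument against misassembly rests on reads that align across the boundary between the two fused domains. As a rough illustration of that check, and not the procedure used to produce Fig. S2, the sketch below counts alignments that cover a junction coordinate with a minimum number of aligned bases on each side. The function name `count_spanning_reads`, the `min_overhang` parameter, and all coordinates are hypothetical.

```python
def count_spanning_reads(alignments, junction, min_overhang=10):
    """Count alignments that cross `junction`.

    `alignments` holds (start, end) pairs in 0-based, end-exclusive
    coordinates on the CDS. An alignment counts only if it has at least
    `min_overhang` aligned bases on each side of the junction.
    """
    count = 0
    for start, end in alignments:
        if start <= junction - min_overhang and end >= junction + min_overhang:
            count += 1
    return count


# Hypothetical alignments and junction position, for illustration only.
toy_alignments = [(0, 100), (850, 950), (895, 995), (905, 1005), (1200, 1300)]
toy_junction = 900

if __name__ == "__main__":
    print(count_spanning_reads(toy_alignments, toy_junction))
```

Requiring an overhang on both sides is a design choice: it excludes reads that touch the junction by only a few bases, which would be weak evidence that the two domains are contiguous in the transcript.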

Fig. S3. Genomic PCRs that targeted S genes identified in Picochlorum species. The contig number in the Picochlorum SE3 assembly is given for each S gene, as is the S-gene family number (see Table S2 for details). The PCR primers were complementary to regions at the 5′ and 3′ termini of the S genes to span the domain-fusion region. The sizes of these S-gene CDS fragments matched the fragment sizes resulting from PCR amplification as follows: S gene 11 (1,015 nt), S gene 23 (1,419 nt), S gene 4 (1,214 nt), S gene 34 (924 nt), and S gene 12 (1,119 nt). Sanger sequencing of these PCR fragments showed identity to the genomic region in Picochlorum SE3, and BLASTX analysis using the fragments showed that each spanned the domain-fusion region in the respective S gene. The match of the CDS size to the genomic region is explained by the paucity of spliceosomal introns in Picochlorum SE3. These data demonstrate that the tested S genes exist as intact fragments in this green alga.
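The reasoning behind the size match can be made explicit: a genomic amplicon between two exonic primers equals the CDS fragment they delimit plus any introns lying between them, so with no introns the two lengths coincide. The sketch below only restates that arithmetic. The fragment sizes are those listed in the caption; the function name `expected_genomic_amplicon` and the intron lengths in the usage note are hypothetical.

```python
def expected_genomic_amplicon(cds_fragment_nt, intron_lengths_nt=()):
    """Length of a genomic PCR product between two exonic primers.

    It equals the CDS fragment delimited by the primers plus the length
    of every intron located between them.
    """
    return cds_fragment_nt + sum(intron_lengths_nt)


# CDS fragment sizes (nt) listed in the caption, keyed by S-gene family.
CDS_FRAGMENT_NT = {11: 1015, 23: 1419, 4: 1214, 34: 924, 12: 1119}

if __name__ == "__main__":
    for family, cds_nt in CDS_FRAGMENT_NT.items():
        # With no introns between the primers, genomic and CDS sizes agree.
        print(family, expected_genomic_amplicon(cds_nt))
```

For comparison, a call such as `expected_genomic_amplicon(1015, (120, 85))`, with two invented introns, would give a product longer than the CDS fragment, which is the pattern expected in intron-rich genomes.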

Fig. S4. Maximum-likelihood (RAxML) (57) tree of species encoding S-gene family 31. This composite gene is limited to the (Fig. S1) and encodes fused RIBR + DUF1768 (PyrR) domains that are involved in riboflavin biosynthesis. This manually trimmed alignment includes a selection of taxonomically diverse green lineage species and is 524 amino acids long. The intact gene (see also mapping evidence in Fig. S2) was analyzed using the LG + Γ + I evolutionary model. The results of 100 bootstrap replicates are shown at the branches when ≥50%. The topology of this tree is consistent with the expected phylogeny of Viridiplantae (e.g., 56), indicating an ancient origin of this S gene. The NCBI gi numbers are shown after each species name, when available.
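The rule for displaying bootstrap support (values shown only when at least 50% of 100 replicates recover a branch) can be sketched as a simple filter. This is an illustration of the labeling convention, not of RAxML itself; the function name `support_labels`, the branch names, and the replicate counts are hypothetical.

```python
def support_labels(bootstrap_counts, replicates=100, min_percent=50.0):
    """Map each branch to its bootstrap percentage, keeping only branches
    recovered in at least `min_percent` of the replicates."""
    labels = {}
    for branch, count in bootstrap_counts.items():
        percent = 100.0 * count / replicates
        if percent >= min_percent:
            labels[branch] = round(percent)
    return labels


# Hypothetical counts: number of replicates (out of 100) recovering each branch.
toy_counts = {"branch_a": 100, "branch_b": 50, "branch_c": 37}

if __name__ == "__main__":
    print(support_labels(toy_counts))
```

Branches below the cutoff remain in the tree; they are simply left unlabeled, so an unlabeled branch should be read as weakly supported rather than absent.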

Other Supporting Information Files

Table S1 (DOCX)
Table S2 (DOCX)
Table S3 (DOCX)
Table S4 (DOCX)
Dataset S1 (XLSX)
