Supporting Information

Speijer et al. 10.1073/pnas.1501725112

SI Text

Previously Published Evidence for Sex in Major Eukaryotic Lineages. Here we briefly present the information used for drawing Fig. 2, specifically the quality of evidence for sex in the major eukaryotic lineages shown in the tree (see the column "Sex known?"). For several lineages, including Metazoa, Fungi, Rhodophyta, Chloroplastida, Stramenopiles, and Alveolata, the presence of sex is textbook knowledge, and no special references need to be cited here. For other lineages, including Choanoflagellata, Euglenozoa, and Haptophyta, cases of sexual behavior of their members are discussed in the main text (see the cases of Salpingoeca rosetta, Trypanosoma brucei, and prymnesiophytes). The presence of sex in […] and […] is discussed at length in ref. 1 and can be considered well documented.

For several other groups, only limited evidence for sex is available, including rare direct observations or indirect inference from genomic data. For a discussion of the occurrence of sexual cycles in Microsporidia (a subgroup of the recently proposed, more inclusive taxon Opisthosporidia), see refs. 2 and 3. Evidence for sex in Fornicata is thus far restricted to the somewhat problematic case of a single species, Giardia intestinalis (4, 5). Malik et al. (6) discussed evidence for the presence of meiosis and recombination in Trichomonas vaginalis. They also provided references to older works suggesting the presence of sexual cycles in some other parabasalids. Putative atypical sex in some oxymonads, representing Preaxostyla, was documented by Cleveland (7) in 1956, but no modern account seems to exist. Circumstantial evidence for sex in Heterolobosea is discussed by Pánek and Čepička (8). The rare reports of signs of sex in two […] species and the indirect hints for the presence of sex in […] in general are mentioned in the main text.

For the remaining lineages in the schematic phylogeny in Fig. 2, no published evidence for signs of sex is known to us. Many of these lineages are species poor: Mantamonadida, Tsukubamonadida, Palpitomonadida, and […] each currently comprise only a single formally described species. Their biology has been poorly explored in general.

Searching for Homologs of HAP2 (GCS1) and GEX1 (KAR5). The nonredundant protein sequence database at the National Center for Biotechnology Information (NCBI; www.ncbi.nlm.nih.gov/) was searched by blastp and psi-blast (9) using known HAP2 and GEX1 sequences as queries. In the case of GEX1, only the region corresponding to the conserved domain defined by Ning et al. (10) was used as the query. Different inclusion-threshold E-values were tested in the psi-blast searches. Convergence on the same set of significant hits was achieved with both a more permissive threshold (the default value of 0.005) and a more stringent one (1e−15). To survey lineages not represented by predicted proteomes, expressed sequence tags (ESTs) and transcript shotgun assemblies (TSAs) were searched by tblastn. This search allowed the identification of a HAP2 homolog in Trimastix pyriformis, a representative of Preaxostyla.

To further improve the taxonomic coverage of our analysis, we used HMMER (hmmer.janelia.org/) to search for HAP2 and GEX1 homologs in the MMETSP project (marinemicroeukaryotes.org/) (11). We searched the protein sequence set deduced from that project's assembled, deeply sequenced transcriptomes of several hundred species. For HAP2, we used the profile HMM available in the Pfam database (pfam.xfam.org/family/PF10699). For GEX1, an HMM was built anew from a multiple sequence alignment of the conserved domain (see above) from phylogenetically diverse GEX1 homologs. These homologs were identified by psi-blast searches in the protein sequence database at NCBI, and the alignment was created using MAFFT (mafft.cbrc.jp/alignment/server/).

We used HMMER and the same HMMs to search additional protein sequence datasets thus far missing from the NCBI protein database. These were the predicted proteomes of Sphaeroforma arctica (Ichthyosporea), Fonticula alba (Cristidiscoidea), and Thecamonas trahens (Apusomonadida), downloaded from the Origins of Multicellularity Database (Broad Institute; www.broadinstitute.org/annotation/genome/multicellularity_project/MultiHome.html). We also searched the predicted proteome generated by the genome sequencing project for the glaucophyte Cyanophora paradoxa (cyanophora.rutgers.edu/cyanophora/). Finally, we searched predicted proteomes derived from our draft genome assemblies for the jakobid Andalucia godoyi and the malawimonad Malawimonas californiana (M. Eliáš, V. Klimeš, Č. Vlček, B. F. Lang, deposited NCBI GenBank sequences). The identity of candidate homologs identified by HMMER was checked by blastp against the NCBI protein database.

Table S1 provides accession numbers or sequence IDs of HAP2 and GEX1 homologs identified in selected representatives of the major eukaryotic lineages shown in Fig. 2.

1. Lahr DJ, Parfrey LW, Mitchell EA, Katz LA, Lara E (2011) The chastity of amoebae: Re-evaluating evidence for sex in amoeboid organisms. Proc Biol Sci 278(1715):2081–2090.
2. Karpov SA, et al. (2014) Morphology, phylogeny, and ecology of the aphelids (Aphelidea, Opisthokonta) and proposal for the new superphylum Opisthosporidia. Front Microbiol 5:112.
3. Cuomo CA, et al. (2012) Microsporidian genome analysis reveals evolutionary strategies for obligate intracellular growth. Genome Res 22(12):2478–2488.
4. Andersson JO (2012) Double peaks reveal rare diplomonad sex. Trends Parasitol 28(2):46–52.
5. Birky CW, Jr (2010) Giardia sex? Yes, but how and how much? Trends Parasitol 26(2):70–74.
6. Malik SB, Pightling AW, Stefaniak LM, Schurko AM, Logsdon JM, Jr (2008) An expanded inventory of conserved meiotic genes provides evidence for sex in Trichomonas vaginalis. PLoS ONE 3(8):e2879.
7. Cleveland LR (1956) Brief accounts of the sexual cycles of the flagellates of Cryptocercus. J Protozool 3(4):161–180.
8. Pánek T, Čepička I (2012) Diversity of Heterolobosea. Genetic Diversity in Microorganisms, ed Caliskan M (InTech, Rijeka, Croatia), pp 3–26.
9. Altschul SF, et al. (1997) Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25(17):3389–3402.
10. Ning J, et al. (2013) Comparative genomics in Chlamydomonas and Plasmodium identifies an ancient nuclear envelope protein family essential for sexual reproduction in protists, fungi, plants, and vertebrates. Genes Dev 27(10):1198–1215.
11. Keeling PJ, et al. (2014) The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP): Illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLoS Biol 12(6):e1001889.

Speijer et al. www.pnas.org/cgi/content/short/1501725112 1 of 3

Table S1. HAP2 (GCS1) and GEX1 (KAR5) homologs in […]

Eukaryotic lineage | Species (strain) | HAP2 (GCS1) | GEX1 (KAR5) | Sequence database
Metazoa | Homo sapiens | — | — | NCBI
Metazoa | Some other vertebrates | — | + | NCBI
Metazoa | Nematostella vectensis | XP_001628495.1 | XP_001637213.1 | NCBI
Metazoa | Some other metazoans | + | + | NCBI
Choanoflagellata | Salpingoeca rosetta | XP_004989263.1 | XP_004993341.1 | NCBI
Filasterea | Capsaspora owczarzaki | XP_004343268.1 | EFW42133.2 | NCBI
Ichthyosporea | […] | — | — | NCBI
Fungi | Saccharomyces cerevisiae | — | NP_013781.1 | NCBI
Fungi | Various other fungi | — | + | NCBI
Opisthosporidia | Several Microsporidia (a number of complete genome sequences) | — | — | NCBI
Opisthosporidia | Rozella allomycis | — | — | NCBI
Cristidiscoidea | Fonticula alba | — | — | Broad Institute Origins of Multicellularity Database (www.broadinstitute.org/annotation/genome/multicellularity_project/MultiHome.html)
Apusomonadida | Thecamonas trahens | — | — | Broad Institute Origins of Multicellularity Database (www.broadinstitute.org/annotation/genome/multicellularity_project/MultiHome.html)
Breviatea | | ? | ? | No genome or comprehensive transcriptome sequence data available
Amoebozoa | Dictyostelium discoideum | XP_643321.1 | XP_637084.1 | NCBI
Amoebozoa | Various other amoebozoans | + | + | NCBI
Collodictyonida | | ? | ? | No genome or comprehensive transcriptome sequence data available
Rigifilida | | ? | ? | No genome or comprehensive transcriptome sequence data available
Mantamonadida | | ? | ? | No genome or comprehensive transcriptome sequence data available
[…] | | ? | ? | No genome or comprehensive transcriptome sequence data available
Malawimonadida | Malawimonas californiana | KR230050 | KR230051 | NCBI
Fornicata | Giardia intestinalis | — | ESU39074.1 | NCBI
Parabasalia | Trichomonas vaginalis | — | XP_001583192.1 | NCBI
Preaxostyla | Trimastix pyriformis | GAFH01001525.1 | ? | NCBI (TSA)
Heterolobosea | Naegleria gruberi | XP_002674350.1 | XP_002683391.1 | NCBI
Euglenozoa | Trypanosoma brucei | XP_823296.1 | XP_827062.1 | NCBI
Euglenozoa | Various other euglenozoans | + | + | NCBI, MMETSP
Tsukubamonadida | | ? | ? | No genome or comprehensive transcriptome sequence data available
Jakobida | Andalucia godoyi | KR230048 | KR230049 | NCBI
Rhodophyta | Cyanidioschyzon merolae | XP_005536505.1 | — | NCBI
Rhodophyta | Erythrolobus madagascarensis, strain CCMP3276 | ? | CAMPEP_0185858126 | MMETSP (transcriptome sequence assembly)
Chloroplastida | Arabidopsis thaliana | NP_192909.2 | NP_200360.2 | NCBI
Chloroplastida | Various other chloroplastids | + | + | NCBI
Glaucophyta | Cyanophora paradoxa | — | ConsensusfromContig9288-snap_masked-ConsensusfromContig9288-abinit-gene-0.0-mRNA-1:cds:2442/11–143:0:- | Cyanophora genome project (cyanophora.rutgers.edu/cyanophora/)
Alveolata | Tetrahymena thermophila | XP_001030543.1 | XP_001470809.1 | NCBI
Alveolata | Various other alveolates | + | + | NCBI
Stramenopiles | Bicosoecid sp., strain ms1 | CAMPEP_0203813596 | CAMPEP_0203808578 | MMETSP (transcriptome sequence assembly)
Stramenopiles | Ectocarpus siliculosus | — | CBN75731.1 | NCBI
Stramenopiles | Various other stramenopiles | — | + | NCBI
Rhizaria | Bigelowiella natans | — | CAMPEP_0114218144 | JGI Bigelowiella natans CCMP2755 v1.0 (genome.jgi-psf.org/Bigna1/Bigna1.home.html); MMETSP
Rhizaria | Reticulomyxa filosa | ETO22865.1+ETO22866.1 | ETN99253.1 | NCBI
Katablepharida | | ? | ? | No genome or comprehensive transcriptome sequence data available
Cryptomonadida | Guillardia theta | — | — | JGI Guillardia theta v1.0 (genome.jgi-psf.org/Guith1/Guith1.home.html)
Palpitomonadida | | ? | ? | Only a transcriptome assembly available (MMETSP)
Haptophyta | Emiliania huxleyi, strain CCMP 1516 | — | 215856jgm1.6700050 | JGI Emiliania huxleyi v1.0 (genome.jgi-psf.org/Emihu1/Emihu1.home.html)
Haptophyta | Phaeocystis antarctica, strain Caron Lab Isolate | CAMPEP_0173015642 | CAMPEP_0172959906 | MMETSP (transcriptome sequence assembly)
Picozoa | | ? | ? | No genome or comprehensive transcriptome sequence data available
[…] | | ? | ? | No genome or comprehensive transcriptome sequence data available
Centrohelida | | ? | ? | No genome or comprehensive transcriptome sequence data available

—, not detected; ?, genomic data not available; +, detected.
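The legend of Table S1 distinguishes "?" (genomic data not available) from the dash (searched but not detected). Anyone tabulating the results should therefore keep "no data" separate from "absent". The sketch below is a minimal illustration of that distinction. It assumes a hypothetical `cell_status` helper and transcribes only five rows of the table. Any non-empty cell other than the legend symbols is treated as a sequence ID and therefore as a detected homolog.

```python
from typing import Dict, Optional, Tuple

NOT_DETECTED = "\u2014"  # the dash symbol used in Table S1
NO_DATA = "?"            # genomic data not available
DETECTED = "+"           # detected (no specific ID listed)


def cell_status(cell: str) -> Optional[bool]:
    """Decode a Table S1 cell: True = detected, False = not detected, None = no data.

    Any non-empty cell other than the legend symbols is an accession or
    sequence ID, i.e. a detected homolog (hypothetical helper).
    """
    cell = cell.strip()
    if not cell:
        raise ValueError("empty cell")
    if cell == NO_DATA:
        return None
    if cell == NOT_DETECTED:
        return False
    return True  # DETECTED or an accession/sequence ID


# A few rows transcribed from Table S1: (lineage, species) -> (HAP2, GEX1)
ROWS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("Metazoa", "Homo sapiens"): (NOT_DETECTED, NOT_DETECTED),
    ("Fungi", "Saccharomyces cerevisiae"): (NOT_DETECTED, "NP_013781.1"),
    ("Amoebozoa", "Dictyostelium discoideum"): ("XP_643321.1", "XP_637084.1"),
    ("Preaxostyla", "Trimastix pyriformis"): ("GAFH01001525.1", NO_DATA),
    ("Rhodophyta", "Cyanidioschyzon merolae"): ("XP_005536505.1", NOT_DETECTED),
}


def summarize(rows: Dict[Tuple[str, str], Tuple[str, str]]
              ) -> Dict[Tuple[str, str], Tuple[Optional[bool], Optional[bool]]]:
    """Map each row to its decoded (HAP2, GEX1) statuses."""
    return {key: (cell_status(hap2), cell_status(gex1))
            for key, (hap2, gex1) in rows.items()}


for (lineage, species), (hap2, gex1) in summarize(ROWS).items():
    print(f"{lineage:12} {species:26} HAP2={hap2} GEX1={gex1}")
```

Using `None` for "?" keeps the Trimastix GEX1 cell out of any count of losses. A transcriptome-only dataset cannot establish that a gene is absent.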