Quick viewing(Text Mode)

Figures and Figure Supplements

Figures and Figure Supplements

RESEARCH ARTICLE

Figures and figure supplements

Dynamics of genomic innovation in the unicellular ancestry of

Xavier Grau-Bove´ et al

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 1 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

500bp A. Phylogeny and genome statistics  B. Phenotypic traits

Metazoa+ Metazoa Choanoflagellata Source Abreviation Genome size# scaffolds(Mb) L75 N50 (kb) # genes % BUSCOorthologs % GC Monosiga brevicollis † Mbre 41.6 218 27 1,073.6 9,172 78.6% 54.9 Choanoflagellata Single-celled flagellates, Filozoa Choano- colonial forms ( Salpingoeca ) flagellata Salpingoeca rosetta † Sros 55.4 154 25 1,519.5 11,624 80.6% 56.0 Capsaspora owczarzaki † Cowc 27.9 84 11 1,617.7 8,741 86.6% 53.8 Single-celled filopodiated , Filasterea ‡ Mvib ------aggregative stage ( Capsaspora ) Cfra 42.9 83 17 1,585.0 8,694 86.7% 40.5 Ichthyosporea * Sarc 120.9 15,618 1,442 116.1 18,319 77.15% 42.0 Coenocytic clonal growth, schizogony and propagule * dispersal (by amoebas, Ichthyo- hoferi Ihof 88.1 1,633 515 106.7 6,351 63.2% 33.4 flagellated or not) phonida parasiticum *‡ Apar ------Ichthyo- Abeoforma whisleri Awhi 101.9 51,561 25,133 2.4 17,283 11.9% 30.24 Teretosporea sporea Pirum gemmata ** Pgem 84.4 50,415 25,440 1.9 21,835 17.0% 29.11 Opisthokonta ** Chromosphaera perkinsii Cper 34.6 3,994 187 120.2 12,463 85.5% 42.13 Coenocytic clonal growth, Dermo- ** flagellated amoeboid dispersal stage cystida Sphaerothecum destruens ‡ Sdes ------ limacisporum Clim 24.1 287 86 180.5 7,535 83.4% 51.2 Corallochytrea Colonies with palintomic division, ** amoeboid dispersal stage (cryptic ? Fungi Unikonta/ flagellum hypothesized) Discicristoidea LECA Diaphoratickes+ / Bikonta

Figure 1. Evolutionary framework and genome statistics of the study. (A) Schematic phylogenetic tree of , with a focus on the Holozoa. The adjacent table summarizes genome assembly/annotation statistics. Data sources: red asterisks denote Teretosporea genomes reported here; double asterisks denote organisms sequenced for this study; † previously sequenced genomes (King et al., 2008; Fairclough et al., 2013; Suga et al., 2013); ‡ organisms for which transcriptomic data exists but no genome is available (Torruella et al., 2015). (B) Overview of the phenotypic traits of each group of unicellular Holozoa, focusing on their multicellular-like characteristics. For further details, see (Torruella et al., 2015; Mendoza et al., 2002; Marshall et al., 2008; Glockling et al., 2013). Figure 1—source data 1 and 2. DOI: 10.7554/eLife.26036.003 The following source data is available for figure 1: Source data 1. Table of genome structure statistics, from the data-set of eukaryotic genomes used in the study. DOI: 10.7554/eLife.26036.004 Source data 2. List of genome and transcriptome assemblies and annotations, including abbreviations, taxonomic classification and data sources. DOI: 10.7554/eLife.26036.005

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 2 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

Pair-wise comparisons of 1-to-1 orthologs' lenghts

Salpingoeca Monosiga Capsaspora Sphaeroforma Creolimax Abeoforma Pirum Ichthyophonus Chromosphaera Corallochytrium Salpingoeca

1000

100 Monosiga

1000

100 Capsaspora

1000

100 Sphaeroforma

1000

100 Creolimax

1000

100 length Abeoforma

1000

100

1000 Pirum

100 Ichthyophonus

1000

100 Chromosphaera

1000

100 Corallochytrium

1000

100

100 1000 100 1000 100 1000 100 1000 100 1000 100 1000 100 1000 100 1000 100 1000 100 1000 length

Choanoflagellata Density Filasterea

Ichthyosporea 10 20 30 40 50 Corallochytrium

Figure 1—figure supplement 1. Comparisons of gene length of one-to-one orthologs from pair-wise comparisons of all 10 unicellular Holozoa. Dots around the diagonal lines indicate that orthologs from both organisms have identical lengths. Note that Abeoforma and Pirum have abundant incomplete orthologous sequences. Figure 1—figure supplement 1 continued on next page

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 3 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

Figure 1—figure supplement 1 continued DOI: 10.7554/eLife.26036.006

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 4 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

Supports: Xenopus tropicalis Metazoa 99/1 ML LG+R7+C60 / Branchiostoma floridae BI LG+ 7+i Saccoglossus kowalevskii Distance: Lottia gigantea 0.2 Capitella teleta Daphnia pulex Acropora digitifera Nematostella vectensis 100/0.9 Hydra magnipapillata 95/1 Trichoplax adhaerens Oscarella carmela Amphimedon queenslandica Mnemiopsis leidyi Didymoeca costata Choanoflagellata Helgoeca nana Salpingoeca rosetta Monosiga brevicollis Capsaspora owczarzaki Filasterea Ministeria vibrans Washington Ichthyo- Ichthyosporea

Holozoa Ichthyophonus hoferi Alaska phonida Amoebidium parasiticum Creolimax fragrantissima Sphaeroforma arctica Abeoforma whisleri Teretosporea 98/1 Pirum gemmata Chromosphaera perkinsii Dermo- 93/0.85 Sphaerothecum destruens cystida Corallochytrium limacisporum Hawaii Corallochytrea Corallochytrium limacisporum India Mucor circinelloides Fungi Rhizopus oryzae 98/NA

Opisthokonta Mortierella verticillata Rhizophagus irregularis Cryptococcus neoformans Ustilago maydis Schizosaccharomyces pombe 64/1 Coemansia reversa Conidiobolus coronatus Allomyces macrogynus 84/0.95 Catenaria anguillulae Piromyces sp. E2 94/1 Gonapodya prolifera Spizellomyces punctatus

Holomycota Batrachochytrium dendrobatidis Rozella allomycis Parvularia atlantis (Nuclearia sp. ) Nucleariids alba Thecamonas trahens Apusomonadida Manchomonas bermudensis anathema Breviatea Subulatomonas tetraspora Pygsuia biforma Dictyostelium discoideum Amoebozoa Polysphondylium pallidum Physarum polycephalum Acanthamoeba castellanii

Figure 2. Phylogenomic tree of Unikonta/Amorphea. Phylogenomic analysis of the BVD57 taxa matrix. Tree topology is the consensus of two Markov chain Monte Carlo chains run for 1231 generations, saving every 20 trees and after a burn-in of 32%. Statistical supports are indicated at each node: (i) non-parametric maximum likelihood ultrafast-bootstrap (UFBS) values obtained from 1000 replicates using IQ-TREE and the LG + R7+C60 model; (ii) Bayesian posterior probabilities (BPP) under the LG+G7 + CAT model as implemented in Phylobayes. Nodes with maximum support values (BPP = 1 and UFBS = 100) are indicated by a black bullet. See Figure 2—figure supplement 1 for raw trees with complete statistical supports. Figure 2—source data 1. DOI: 10.7554/eLife.26036.007 The following source data is available for figure 2: Source data 1. BVD57 phylogenomic dataset (Torruella et al., 2015) including 87 unaligned protein domains (with PFAM accession number) per species. DOI: 10.7554/eLife.26036.008

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 5 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

A. Phylogenomic analysis: BVD57, ML (LG+R7+C60) B. Phylogenomic analysis: BVD57, ML (LG+R7+PMSF)

100/100 Xenopus tropicalis 99 Xenopus tropicalis 0 . 2 0 . 0 9 99.1/99 Branchiostoma floridae 96 Branchiostoma floridae Saccoglossus kowalevskii Saccoglossus kowalevskii 100/100 100 100/100 Lottia gigantea 100 Lottia gigantea 100/100 Capitella teleta 100 Capitella teleta 100/100 Daphnia pulex 59 Daphnia pulex 100/100 Acropora digitifera 100 Acropora digitifera 99.9/100 100/100 Nematostella vectensis 98 Nematostella vectensis 93.5/95 Hydra magnipapillata 99 43 Hydra magnipapillata Trichoplax adhaerens Trichoplax adhaerens 100/100 100 100/100 Oscarella carmela 99 Oscarella carmela Amphimedon queenslandica Amphimedon queenslandica 100/100 Mnemiopsis leidyi 100 Mnemiopsis leidyi 100/100 Didymoeca costata 100 Didymoeca costata 100/100 Helgoeca nana 100 Helgoeca nana 100/100 68 100/100 Salpingoeca rosetta 100 Salpingoeca rosetta Monosiga brevicollis Monosiga brevicollis 100/100 Capsaspora owczarzaki 100 Capsaspora owczarzaki Ministeria vibrans Ministeria vibrans 100/100 Ichthyophonus hoferi Washington 100 Ichthyophonus hoferi Washington 100/100 Ichthyophonus hoferi Alaska 100 Ichthyophonus hoferi Alaska 100/100 100 100/100 Amoebidium parasiticum 100 Amoebidium parasiticum 100/100 Creolimax fragrantissima 100 Creolimax fragrantissima 100/100 Sphaeroforma arctica 100 Sphaeroforma arctica 100/100 Abeoforma whisleri 100 Abeoforma whisleri 98.1/98 Pirum gemmata 59 Pirum gemmata 100/100 Chromoshpaera perkinsii 100 Chromoshpaera perkinsii 84.9/93 Sphaerothecum destruens 58 Sphaerothecum destruens 100/100 Corallochytrium limacisporum Hawaii 100 Corallochytrium limacisporum Hawaii Corallochytrium limacisporum India Corallochytrium limacisporum India 100/100 Mucor circinelloides 100 Mucor circinelloides 99.2/100 Rhizopus oryzae 98 Rhizopus oryzae 99.2/98 Mortierella verticillata 98 Mortierella verticillata 100/100 Rhizophagus irregularis 100 Rhizophagus irregularis 100/100 97 100/100 Cryptococcus neoformans 100 Cryptococcus neoformans 100/100 Ustilago maydis 100 Ustilago maydis 100 100/100 Schizosaccharomyces pombe Schizosaccharomyces pombe 100/100 Coemansia reversa 100 Coemansia reversa Conidiobolus coronatus 71 Conidiobolus coronatus 17.5/64 100/100 Allomyces macrogynus 98 Piromyces sp. E2 100/100 Catenaria anguillulae 100 Gonapodya prolifera 54.2/84 57 Spizellomyces punctatus 92.8/94 Piromyces sp. E2 100 100 100/100 Gonapodya prolifera Batrachochytrium dendrobatidis 100/100 100 100/100 Spizellomyces punctatus 100 Allomyces macrogynus 100/100 Batrachochytrium dendrobatidis Catenaria anguillulae Rozella allomycis 100 Rozella allomycis 100/100 100/100 Parvularia atlantica 100 Parvularia atlantica Fonticula alba Fonticula alba 100/100 Thecamonas trahens 100 Breviata anathema Manchomonas bermudensis 100 Subulatomonas tetraspora 99.9/100 Breviata anathema 94 Pygsuia biforma 100/100 Subulatomonas tetraspora 100 Thecamonas trahens Pygsuia biforma Manchomonas bermudensis 100/100 Dictyostelium discoideum 100 Dictyostelium discoideum 100/100 Polysphondylium pallidum 100 Polysphondylium pallidum 100/100 Physarum polycephalum 100 Physarum polycephalum Acanthamoeba castellanii Acanthamoeba castellanii

C. Phylogenomic analysis: BVD57, BI (LG+ 7+CAT) Metazoa Choanoflagellata Xenopus tropicalis 1 Capsaspora 0 . 3 1 Branchiostoma floridae Saccoglossus kowalevskii Ichthyosporea 1 Lottia gigantea 1 Corallochytrium 1 Capitella teleta 1 Daphnia pulex Fungi Acropora digitifera 1 Nucleariids 0.9 1 Nematostella vectensis 1 Hydra magnipapillata Apusomonadida Trichoplax adhaerens 1 Breviatea 1 Oscarella carmela Amphimedon queenslandica Amoebozoa 1 Mnemiopsis leidyi 1 Didymoeca costata 1 Helgoeca nana 1 1 Salpingoeca rosetta Monosiga brevicollis 1 Capsaspora owczarzaki Ministeria vibrans 1 Ichthyophonus hoferi Washington 1 Ichthyophonus hoferi Alaska 1 1 Amoebidium parasiticum 1 Creolimax fragrantissima 1 Sphaeroforma arctica 1 Abeoforma whisleri 1 Pirum gemmata

0.85 1 Chromoshpaera perkinsii Sphaerothecum destruens 1 Corallochytrium limacisporum Hawaii Corallochytrium limacisporum India 1 Mucor circinelloides 1 Rhizopus oryzae

1 Mortierella verticillata 1 Rhizophagus irregularis 1 1 Cryptococcus neoformans 1 1 Ustilago maydis Schizosaccharomyces pombe 1 1 Coemansia reversa Conidiobolus coronatus 1 Allomyces macrogynus 1 0.95 Catenaria anguillulae 1 Piromyces sp. E2 1 1 Gonapodya prolifera 1 Spizellomyces punctatus 1 Batrachochytrium dendrobatidis Rozella allomycis 1 1 Parvularia atlantica Fonticula alba 1 Thecamonas trahens Manchomonas bermudensis 1 Breviata anathema 1 Subulatomonas tetraspora Pygsuia biforma 1 Dictyostelium discoideum 1 Polysphondylium pallidum 1 Physarum polycephalum Acanthamoeba castellanii

Figure 2—figure supplement 1. Phylogenomic analysis of the BVD57 matrix using (A) IQ-TREE maximum likelihood and the LG + R7+C60 model (supports are SH-like approximate likelihood ratio test/UFBS, respectively); (B) IQ-TREE maximum likelihood and the LG + R7+PMSF model (fast CAT Figure 2—figure supplement 1 continued on next page

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 6 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

Figure 2—figure supplement 1 continued approximation; non-parametric bootstrap supports); and (C) Phylobayes Bayesian inference under the LG+G7 + CAT model (BPP supports). DOI: 10.7554/eLife.26036.009

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 7 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

A. Genome size and composition B. Similarity of TE complements

Salpingoeca Sphaeroforma LINE Exonic 0.6 0.6 Nc=70,630 Nc=85,556 SINE Intronic LTR Nf=843 Nf=3,064 300 Intergenic DNA P25f<0.01% P25f=0.01% ) Inferred Unknown b P =0.06% Metazoa 75f N75f=0.13%

M 0 0 ( LCA 200 Choanoflagellata h Capsaspora Abeoforma t Repetitive

g 0.6 0.6 Capsaspora fraction Nc=3,360 Nc=55,501 n Ichthyosporea e

l N =433 N =2,950 100 f f Corallochytrium P25f=0.06% P25f=0.02% P =1.43% P =0.34% 75f 0 75f 0 0 Corallochytrium Pirum s s a a a a a a a a n n x x A i l i

m m 0.6 r r l c u g o o a a N =1,470 0.6 N =83,489 s s i

l c c m m u u C o e e e c i n d r r r p t a s p i r m L p a t y o i o e t

o o N =611 N =3,601 s o o l f f o s f f h i P p g y h S a o i n o h a m o o p t n i h p

o P =0.95% P <0.01% m r o i

c e 25f 25f A s s e a i c h r z o e e p r p o b l M o y p a P =16.53% P =0.05% n m a l C 75f 75f T a t l a A h m e h 0 0 t m e a M C S o p r h N r A 100% 75% 100% 75% M o c S h I C

C sequence identity sequence identity

C. Synteny conservation in unicellular holozoans D. Coding region conservation

Chromosphaera- Chromosphaera- Trichoplax () Homo () Capsaspora Corallochytrium 142/2904 (4.9%) 72/2853 (2.5%) Monosiga ● ● ● ● ● ● ● ●

Salpingoeca ● ● ● ● ●

SphaeroformaCreolimaxPirum IchthyophonusAbeoformaCapsasporaCorallochytriumChromosphaeraSalpingoecaMonosiga Capsaspora ● ● ● ● ● ● Sphaeroforma ● ** ● ● ● ● ● Sphaeroforma ** Creolimax Creolimax ● ● Pirum Ichthyophonus ● ● ● ● ● ● Ichthyophonus

● ● ** ● ● ● ** Abeoforma Chromosphaera

** ● ** ● Capsaspora Corallochytrium † † Corallochytrium Corallochytrium- Creolimax - Amphimedon (Porifera) Nematostella () Chromosphaera Capsaspora Chromosphaera 129/2657 (4.9%) 12/2835 (0.4%) Monosiga ● ● ● ● ● ● ● Salpingoeca Salpingoeca ● ● ● ● Monosiga ● ● ● ● ● ● ● Clustering of species by collinearity Capsaspora

(Euclidean distance, Ward clustering) Sphaeroforma ● ● * ● ● ● **● ●

synteny ratio Creolimax ● ● ● ●

60 Ichthyophonus ● ●

30 Chromosphaera * ● ● ● **

* ● ** ● 0 Corallochytrium † † 0 0.4 0.8 NA 0 2 4 0 2 4 phylogenetic distance (subst/site)

Figure 3. Patterns of genome evolution across unicellular Holozoa. (A) Genome size and composition in terms of coding exonic, intronic and intergenic sequences of unicellular holozoan and selected metazoans. Percentage of repetitive sequences shown as black bars. Genome size of the Metazoa LCA (gray bar) from (Simakov and Kawashima, 2017) (exonic, intronic and intergenic composition not known). (B) Profile of TE composition for selected organisms. Density plots indicate the sequence similarity profile of the TE complement in each organism. Embedded pie-charts denote the relative abundance, in nucleotides, of the main TE superclasses in each genome: retrotransposons (SINE, LINE and LTR), DNA transposons (DNA) and unknown. Nc: total number TE copies in the genome; Nf: number of families to which these belong; P25f and P75f: percentage of most-frequent TE families that account for 25% and 75% of the total number of TE copies, respectively. (C) Heatmap of pairwise microsynteny conservation between 10 unicellular holozoan genomes. Species ordered according the number of shared syntenic genes (Euclidean distances, Ward clustering). At the right: selected pairwise comparisons of syntenic single-copy orthologs between unicellular holozoan genomes. Numbers denote number of syntenic genes, total number of single-copy orthologs, and proportions (%) of syntenic genes per the compared orthologs. Circle segments are scaffolds sharing ortholog pairs, connected by gray lines. (D) Phylogenetic distances between unicellular holozoans and four selected animals: Homo sapiens, Nematostella vectensis, Trichoplax adhaerens and Amphimedon queenslandica. Red asterisks denote organisms that have lower phylogenetic distances to metazoans than one (single asterisk) or both (double asterisks) (p value < 0.05 in Wilcoxon rank sum test). † indicates significantly higher distances between Corallochytrium and metazoans. Figure 1—source data 1, Figure 3—source data 1, 2 and 3. DOI: 10.7554/eLife.26036.010 The following source data is available for figure 3: Source data 1. Annotated repetitive sequences from 10 unicellular Holozoa genomes. DOI: 10.7554/eLife.26036.011 Source data 2. List of annotated transposable element families in 10 unicellular Holozoa genomes, with copy counts. DOI: 10.7554/eLife.26036.012 Source data 3. List of annotated transposable element families shared between the genomes of 10 unicellular holozoans and 11 animals, including the number of species where the TE family is present. DOI: 10.7554/eLife.26036.013

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 8 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

A. Salpingoeca rosetta TE profile F. Ichthyophonus hoferi TE profile

1.25 1.25 1,500 150

3,000 1.00 900 1.00

type type NFH=276 DNA DNA 0.75 0.75 NFH=239 LINE LINE LTR LTR Other count density density count RC count count RC SINE 0.50 0.50 SINE Unknown Unknown

0.25 0.25

0 0.00 0 0.00 0 0 10090 80 70 100 90 80 70 10090 80 70 100 90 80 70 % identity % identity TE list % identity % identity TE list

B. Monosiga brevicollis TE profile G. Abeoforma whisleri TE profile

1.25 1.25 12,000 3,000 60,000 600 1.00 1.00

type DNA NFH=433 type NFH=97 0.75 LINE 0.75 DNA LTR LINE Other

count LTR count

density RC density count

RC count 0.50 RNA 0.50 Unknown SINE Unknown

0.25 0.25

0 0.00 0 0.00 0 0 10090 80 70 100 90 80 70 10090 80 70 100 90 80 70 % identity % identity TE list % identity % identity TE list

C. Capsaspora owczarzaki TE profile H. Pirum gemmata TE profile

1.25 1.25 1,000 15,000

900 1.00 9,000 1.00

type NFH=66 DNA type 0.75 0.75 NFH=479 LINE DNA LTR LINE Other count density density LTR count RC count RC 0.50 0.50

count SINE Unknown

0.25 0.25

0 0.00 0 0.00 0 0 100 90 80 70 100 90 80 70 10090 80 70 100 90 80 70 % identity % identity % identity % identity TE list TE list D. Creolimax fragrantissima TE profile I. Chromosphaera perkinsii TE profile

800 1.25 60 1,500 1.25

400 1.00 1.00 NFH=129 type 0.75 type 0.75 NFH=146 DNA DNA LINE LINE LTR LTR density count count count count RC density 0.50 RC 0.50 Unknown SINE

0.25 0.25

0 0.00 0 0.00 0 0 10090 80 70 100 90 80 70 100 90 80 70 100 90 80 70 % identity % identity TE list % identity % identity TE list

E. Sphaeroforma arctica TE profile J. Corallochytrium limacisporum TE profile

1.25 1.25 600 25,000 60 1.00 30,000 1.00

type NFH=42 type DNA 0.75 NFH=656 0.75 count DNA LINE LINE LTR LTR count density RC count count density SINE SINE 0.50 0.50 Unknown Unknown

0.25 0.25

0 0.00 0 0.00 0 0 10090 80 70 100 90 80 70 10090 80 70 100 90 80 70 % identity % identity TE list % identity % identity TE list

Figure 3—figure supplement 1. Profile of TE composition of unicellular Holozoa. (A-J) Profile of transposable element (TE) composition of 10 unicellular Holozoa, including (i) distribution of sequence similarity frequencies within the TE complement obtained from BLAST alignments (minimum 70% identity and 80 bp alignment length); (ii) same data but using density-normalized plots; and (iii) raw counts of hits for each TE family, indicating the number of families with hits (NFH) for each species. Each third panel illustrates how TE complements can be biased towards a handful of families with a high number of similarity hits (e.g. Monosiga or Pirum) or, conversely, exhibit even distributions (e.g. Corallochytrium). DOI: 10.7554/eLife.26036.014

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 9 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

A. Shared TE families across Holozoa B. TE families present in >2 holozoan species (only families accounting for 25% of TE copies)

Present in both unicellular and multicellular Holozoa present in present in Present in Only present in unicellular TE family # species TE family unicellular or multicellular Holozoa presence Metazoa both Holozoa 6 Daphne-18_HM TRUE TRUE TRUE 6 IMPB_01 TRUE TRUE TRUE 5 CR1-79_HM TRUE TRUE TRUE 5 Ginger1-1_AC TRUE TRUE TRUE 4 Ag-Jock-1 TRUE TRUE TRUE 4 Gypsy-24_DEl-I TRUE TRUE TRUE 4 Novosib-6_CR TRUE TRUE TRUE 4 Penelope-1_DR TRUE TRUE TRUE 4 RTAg4 TRUE TRUE TRUE 4 TART_DV TRUE TRUE TRUE 3 Coprina_Cc1 TRUE TRUE TRUE 3 hAT-N4_ALy TRUE TRUE TRUE 3 R5-1_SM TRUE TRUE TRUE 3 REP-4_PBa TRUE TRUE TRUE 3 Sola1-1_AC TRUE TRUE TRUE 3 Sola3-1_BF TRUE TRUE TRUE 3 EnSpm-5_HM TRUE FALSE FALSE 2 DNA2-5_DR TRUE TRUE TRUE

TE families TE 2 DNA8-5_DR TRUE TRUE TRUE 2 EnSpm-34_SBi TRUE TRUE TRUE 2 hAT-N144_DR TRUE TRUE TRUE 2 Penelope-13_HM TRUE TRUE TRUE 2 Polinton-2A_NV TRUE TRUE TRUE

(Euclideancompletedistancesclustering) and 2 Polinton-2_HM TRUE TRUE TRUE 2 Polinton-8_NVi TRUE TRUE TRUE 2 DNA-N4_DR FALSE TRUE FALSE 2 ENSPM-6_DR TRUE FALSE FALSE 2 Gypsy-20_CSa-I TRUE FALSE FALSE 2 Kolobok-1_AQu TRUE FALSE FALSE 2 Kolobok-3_Aplcal TRUE FALSE FALSE

Aiptasia sp. Homo sapiens Lottia gigantea Pirum gemmata Ciona intestinalis MnemiopsisOscarella leidyi carmela MonosigaSalpingoeca brevicollis rosetta Abeoforma whisleri Trichoplax adhaerens SphaeroformaIchthyophonus arctica hoferi Nematostella vectensis CapsasporaCreolimax owczarzaki fragrantissima SaccoglossusDrosophila kowalevskii melanogaster Chromosphaera perkinsii Amphimedon queenslandica Corallochytrium limacisporum

C. Distribution of TE family presence per species D. Distribution of TE family presence per species E. Distribution of TE family presence per species (all families) (only families accounting for 75% of TE copies) (only families accounting for 25% of TE copies)

total = 18649 | present in both = 5755 total = 3587 | present in both = 402 total = 243 | present in both = 25 present in metazoa = 17252 | present in unicell holozoa = 7152 present in metazoa = 3261 | present in unicell holozoa = 728 present in metazoa = 210 | present in unicell holozoa = 58

Present in both unicellular Present in both unicellular 200 Present in both unicellular and multicellular Holozoa and multicellular Holozoa and multicellular Holozoa Only present in unicellular Only present in unicellular Only present in unicellular or multicellular Holozoa or multicellular Holozoa or multicellular Holozoa 6000

2000 150

4000 100 count count count

1000

2000 50

0 0 0

0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 # species # species # species

Figure 3—figure supplement 2. Shared TEs between unicellular Holozoa and genomes. (A) Pattern of presence/absence of TE families across Holozoa (11 animals and 10 unicellular holozoans). Dendrogram at the left represents the sorting of TE families by Euclidean distance and Ward clustering. Colored column indicates presence in both unicellular and multicellular holozoans (green) or just on unicellular or multicellular holozoans Figure 3—figure supplement 2 continued on next page

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 10 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

Figure 3—figure supplement 2 continued (light brown). (B) List of the most abundant TE families across holozoans, including only most abundant families present in >1 species and accounting for 25% of the copies in a given genome. Complete table in Figure 3—source data 3.(C–E) Distribution of the number of TE families present in unicellular holozoans per number of species (X axis). The color code indicates presence in both unicellular and multicellular holozoans (green) or just on unicellular or multicellular holozoans (light brown). Panel C includes all TE families; panel D only most abundant TEs accounting for 75% of the copies in a given genome (P75f); panel E id. for 25% of copies (P25f). DOI: 10.7554/eLife.26036.015

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 11 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

A. Synteny conservation between unicellular and multicellular Holozoa

Metazoa Hsap synteny ratio 250 Cint 125 Skow 0 Dmel 0 0.4 0.8 NA Lgig Nvec Aipt Tadh Mlei Ocar Aque Mbre Sros Cowc Sarc Cfra Teretosporea Ihof Nk52 Clim Ihof Aipt Cint Mlei Lgig Cfra Clim Sros Sarc Tadh Ocar Nvec Mbre Dmel Nk52 Aque Hsap Skow Cowc

Figure 3—figure supplement 3. Heatmap of pairwise ratios of ortholog collinearity between 10 unicellular holozoan genomes. Species are manually ordered by taxonomic classification (no clustering). DOI: 10.7554/eLife.26036.016

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 12 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

A. Intron length and count distribution B. % exon and intron sequence in genome

4.0 Metazoa 40 Cfra Sros Choanoflagellata 3.0 Hsap Capsaspora 30 Ihof Dmel Mbre Ichthyosporea 25 2.0 Corallochytrium Metazoa Sarc Other eukaryotes Hsap 20 Aque 1.5 Cowc 15 Ocar 1.0 Ihof 10 Sarc Cfra 0.6 Cper 0.3 3.3 intron length(kbp) intron Sros Clim Awhi Cper Pgem % intron in genome in intron % Pgem Cowc Mbre Awhi Dmel Aque Ocar identity Clim 0 0 0 2.5 5.0 7.5 0 3.3 10 15 25 50 75 # intron/gene % exon in genome

Figure 4. Intron abundance in eukaryotes. (A) Distribution of intron lengths and number of introns per gene in selected genomes. Dots represent median intron lengths and vertical lines delimit the first and third quartiles. Color code denotes taxonomic assignment. Species abbreviation as in Figure 1 and Figure 2—source data 1.(B) Fraction of the genome covered by introns and exons in selected eukaryotes. Dotted line represents the identity between both values. Color code denotes taxonomic assignment. Figure 1—source data 1. DOI: 10.7554/eLife.26036.017

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 13 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

A. Intron gain and loss per lineage B. Intron gain-loss % in ancestors

Deuterostomia 6.7 Homo Introns per CDS kbp Chordata 7.6 20% 7.5 4.3 Ciona Bilateria 6.5 7.7 Saccoglossus 0.032.7 4.5 6.0 7.2 9.0 Protostomia 1.5 Drosophila 6.9 0% gains 8.6 6.2 Lottia losses Cnidaria 7.4 Nematostella 8.7 7.4 Metazoa 7.3 Aiptasia -20% Variance-to- % gain - loss - gain % mean ratio 0% 4% 8% 7.8 Trichoplax (confidence) 8.7 Metazoa+ 3.8 Mnemiopsis -40% Choanoflagellata Porifera 6.2 Oscarella 7.3 8.6 6.8 Amphimedon VMR 1.0 1.81.4 Filozoa Choanoflagellata 6.0 Monosiga Mode 5.5 7.5 Gain 5% 7.1 Salpingoeca Loss -5% 3.5 Capsaspora Stasis Holozoa Creolimax+Sphaeroforma Filozoa -5% to 5% BilateriaCnidaria PoriferaMetazoa Holozoa ObazoaUnikonta 5.5 4.8 Sphaeroforma Cfra+Sarc Ichthyophonida 4.9 Eumetazoa 5.0 Creolimax Opisthokonta IchthyosporeaTeretosporea Ichthyosporea 6.9 Ichthyophonida 5.5 6.2 Ichthyophonus Eumet.+Placozoa Choanoflagellata Teretosporea Metazoa+Choanof. 5.5 0.7 Chromosphaera Opisthokonta 0.0 Corallochytrium C. 5.8 2.7 1.3 Schizosaccharomyces Conservation of NMD and SR machinery Dikarya 4.2 1.2 Neurospora NMD machinery* SR splicing factors 4.3 Cryptococcus Fungi Obazoa 3.2 Mortierella 4.8 5.3 Holomycota 4.4 Spizellomyces 5.6 Unikonta/ 4.5 Gonapodya Upf (3) SMGs RF(6) (2) EJC (4) SRSF3/7SRSF1/9SRSF4/5/6SRSF2/8TRA2 Amorphea 3.5 Fonticula Monosiga 4.9 Apusomonadida 0.5 Thecamonas Salpingoeca Dictyostelia 2.3 1.2 Dictyostelium Capsaspora Amoebozoa 4.5 2.1 Polysphondylium Creolimax 4.9 Acanthamoeba Sphaeroforma Embryophyta 6.3 Arabidopsis Ichthyophonus 6.7 5.8 Physcomitrella 5.5 Chromosphaera 6.2 Chlamydomonas 4.9 Corallochytrium LECA Diaphoratickes 5.9 Chlorella 4.9 4.3 0.3 Toxoplasma Metazoa 5.0 Alveolata 2.7 Tetrahymena Metazoa+Choanof. 3.2 Bikonta 7.4 Bigelowiella Ichthyophonida SAR 6.2 Ectocarpus Holozoa 4.9 1.6 Phytophthora Excavata Stramenopila Opisthokonta 0.4 Naegleria LECA †

Figure 5. Intron evolution. (A) Rates of intron gain and loss per lineage, including extant genomes and ancestral reconstructed nodes. Diameter and color of circles denote the number of introns per kbp of coding sequence at each ancestral node. Bolder edges mark the lines of descent between the LECA and Metazoa/Ichthyophonida, which were characterized by continued high intron densities (see text). Red and green bars represent the inferred number of intron gains (green) and losses (red) in ancestral nodes. (B) Difference between intron site gains and losses in selected ancestors, including animals (left; from Metazoa to Unikonta/Amorphea) and unicellular holozoans (right). For each ancestor, we specify the variance-to-mean ratio of the inferred number of introns from 100 bootstrap replicates (higher values, denoted by lighter purple, indicate less reliable inferences; see Methods. The color code denotes modes of intron evolution: dominance of gains (green), losses (pink) and stasis (light gray). (C) Conservation of the NMD machinery and SR splicing factors in unicellular holozoans (up) and selected ancestors (down). Black dots indicate the presence of an ortholog, and empty dots partial conservation. For the NMD machinery, each column summarizes the presence of multiple gene families (number between brackets). † denotes the ancestral eukaryotic origin of TRA2 according to (Plass et al., 2008). Complete survey at the species and gene levels available as Figure 4—figure supplements 2 and 3. Figure 5—source data 1, 2 and 3. DOI: 10.7554/eLife.26036.018 The following source data is available for figure 5: Source data 1. Rates of gain and loss of intron sites for extant and ancestral eukaryotes, calculated for a rates-across-sites Markov model for intron evo- lution with branch-specific gain and loss rates (Csuro¨s, 2008). DOI: 10.7554/eLife.26036.019 Source data 2. Reconstruction of intron site evolutionary histories, using a rates-across-sites Markov model for intron evolution, with branch-specific gain and loss rates (Csuro¨s, 2008). DOI: 10.7554/eLife.26036.020 Source data 3. Reconstruction of the evolution of the NMD machinery (He and Jacobson, 2015) and key SR splicing factors (Plass et al., 2008). DOI: 10.7554/eLife.26036.021

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 14 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

A. Intron site classification conserved lineage-specific ambiguous site intron sites intron gains (alignment gaps)

P0 P2 TGCAACTTA lost introns in TGUAACTTG unclassified site conserved site C N L (non-homologous alignment region)

Figure 5—figure supplement 1. Classification of intron sites by conservation in protein alignments, as used in (Csu˝ro¨s and Miklo´s, 2006; Csuro¨s, 2008). Grey boxes denote aligned amino acids with gaps (dashed lines). Intron sites (vertical lines) are conserved if they are present in various organisms at the same alignment position and codon phase. The method accounts for loss of intron sites (red crosses), independent gains at the same site (different codon phases), ambiguous sites (in poorly-aligned regions) and unclassifiable sites (non-homologous regions). DOI: 10.7554/eLife.26036.022

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 15 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

A. NMD machinery

Core Release Exon-junction NMD factors complex SMG5/6/7 SMG8/9 Y14 eRF1 eRF3 eIF4A3 unclear homolog SMG1 SMG5 SMG6 SMG7 SMG8 SMG9 unclear homolog Magoh MLN51 Species UPF1 UPF2 UPF3 Homo sapiens Nematostella vectensis Trichoplax adhaerens Mnemiopsis leidyi Amphimedon queenslandica Metazoa * * * ? Monosiga brevicollis Salpingoeca rosetta Choanoflagellata Metazoa+Choanoflagellata ? Capsaspora owczarzaki Ministeria vibrans Filozoa (Met+Cho+Capsaspora) ? Sphaeroforma arctica Creolimax fragrantissima Ichthyophonus hoferi Abeoforma whisleri Pirum gemmata Ichthyophonida ? Chromosphaera perkinsii Sphaerothecum destruens Dermocystida Ichthyosporea Corallochytrium limacisporum Teretosporea Holozoa ? Schizosaccharomyces pombe Spizellomyces punctatus Fungi Opisthokonta ? Obazoa (Opi+Apusomonadida) ? Unikonta/Amorphea Bikonta/Diaphoratickes LECA

SMG5/6/7 SMG8/9 Core Release Exon-junction NMD factors complex

? SMG9 is only present in vertebrartes and the chytrid S. punctatus , with Metazoa Ichthyosporea Ancestors of Metazoa unclear orthology. SMG8/9 homologs present in most eukaryotes. Choanoflagellata Corallochytrium Other ancestors * Possible serial duplications of SMG5/6/7 at the Metazoa LCA (origin of Filasterea Fungi Present in a close SMG5, 6 and 7). Ichthyosporeans have extra SMG5/6/7 metazoan-like animal relative paralogs of unclear orthology (see Fig5-Supplement 3).

B. SR splicing factors C. RNA-binding domains

0 100 200 300 400 500 600 700 800 900 1000 Homo sapiens Ciona intestinalis Branchiostoma floridae zf-CCCH/2 Saccoglossus kowalevskii RRM/1/2/3/4/6 Drosophila melanogaster Daphnia pulex KH1/2/3 Caenorhabditis elegans cwf21 Capitella teleta Lottia gigantea SRRM_C Aiptasia sp. Nematostella vectensis Acropora digitifera Hydra magnipapillata (or vulgaris) TRA2 SRSF4-5-6(SRP2) 2nd RRM_1 (SRP1) SRSF2-8 Species (SRP20/9G8) SRSF3-7 (ASF) SRSF1-9 SRSF4-5-6(SRP2) domain RRM_1 1st Trichoplax adhaerens Homo sapiens Mnemiopsis leidyi Sycon ciliatum Nematostella vectensis Leucosolenia complicata Trichoplax adhaerens Oscarella carmela Amphimedon queenslandica Mnemiopsis leidyi Monosiga brevicollis Amphimedon queenslandica Salpingoeca rosetta Ministeria vibrans Metazoa Capsaspora owczarzaki Monosiga brevicollis Sphaeroforma arctica Creolimax fragrantissima Salpingoeca rosetta Abeoforma whisleri Choanoflagellata Pirum gemmata Amoebidium parasiticum Metazoa+Choanoflagellata Ichthyophonus hoferi Capsaspora owczarzaki Sphaerothecum destruens Chromoshpaera perkinsii Ministeria vibrans Corallochytrium limacisporum Filozoa (Met+Cho+Capsaspora) Schizosaccharomyces pombe Neurospora crassa Sphaeroforma arctica Cryptococcus neoformans Creolimax fragrantissima Ustilago maydis Mortierella verticillata Ichthyophonus hoferi Rhizopus oryzae Abeoforma whisleri Allomyces macrogynus Spizellomyces punctatus Pirum gemmata Rhizophagus irregularis Rozella allomycis Ichthyophonida Gonapodya prolifera Chromosphaera perkinsii Parvularia atlantica (former Nuclearia spp.) Fonticula alba Sphaerothecum destruens Thecamonas trahens Dermocystida Pygsuia biforma N. longa : highly partial Ichthyosporea Nutomonas longa transcriptome assembly Dictyostelium* discoideum * Corallochytrium limacisporum Polysphondylium pallidum Acanthamoeba castellanii Teretosporea Arabidopsis thaliana Holozoa Selaginella moellendorffii Physcomitrella patens Schizosaccharomyces pombe Chlamydomonas reinhardtii Rhizophagus irregularis Volvox cartieri Chlorella variabilis Fungi Micromonas pusilla Opisthokonta Cyanidioschyzon merolae * Cyanophora paradoxa Obazoa (Opi+Apusomonadida) * ** Ectocarpus siliculosus Unikonta/Amorphea Phytophthora infestans * Toxoplasma gondii Bikonta/Diaphoratickes ? Tetrahymena thermophila LECA ? Bigelowiella natans Emiliania huxleyi Guillardia theta SRSF1-9 is absent in Fungi but present in ? Present in A. thaliana according Leishmania major * Parvularia atlantica (Nuclearid, Holomycota) to Plass et al. 2008 Naegleria gruberi Ancestral eukaryotic SRSF4-5-6 had one 0 100 200 300 400 500 600 700 800 900 1000 ** RRM_1 domain, which duplicated internally at the Opisthokonta LCA # domains

Figure 5—figure supplement 2. Phylogenetic distribution of the NMD machinery, SR splicing factors and RNA-binding domains in eukaryotes. (A) Phylogenetic distribution of the NMD molecular toolkit across eukaryotes, as defined in Whelan et al. (2015), with a focus on unicellular holozoans and Figure 5—figure supplement 2 continued on next page

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 16 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

Figure 5—figure supplement 2 continued selected metazoans. The analysis includes 12 gene families: the core regulatory factors Upf1, Upf2 and Upf3 (also Smg2-4); the accessory proteins Smg1, Smg5/6/7 and Smg8/9; the release factors eRF1 and eRF3; and the exon junction complex (EJC) proteins eIF4A3, Y14, Magoh and MLN51. Extant species are color-coded by taxonomic assignment. Black dots indicate reconstructed LCAs within the line of descent between Metazoa and the LECA, whereas gray dots indicate LCAs not affiliated with Metaoza. Red, hollow circles indicate that a specific gene is absent from a given animal genome, but nevertheless present in a close relative in the same lineage (e.g., poriferans other than Amphimedon). See Methods for details on ortholog identification. Complete survey at the species level available as Figure 5—source data 3. Note that the core NMD tool-kit is conserved in most post-LECA LCAs, both in the animal and ichthyophonid ancestry. Secondary losses affect extant taxa: Corallochytrium lacks the complete EJC (only eIF4A3 is conserved) and homologs of Smg5/6/7 and Smg8/9. (B) Phylogenetic distribution of the SR splicing factors involved in alternative splicing determination across eukaryotes, as defined in Plass et al. (2008), with a focus on unicellular holozoans and selected metazoans. The analysis is focused on the following RNA-binding genes: SRP20/9G8 (human paralogs SRSF3/7), ASF (human paralogs SRSF1/9), SRP2 (human paralogs SRSF4/ 5/6), SRP1 (human paralogs SRSF2/8) and TRA2 (human paralogs TRA2A/B). Extant species are color-coded by taxonomic assignment. Black dots indicate reconstructed LCAs within the line of descent between Metazoa and the LECA, whereas gray dots indicate LCAs not affiliated with Metaoza. See Methods for details on ortholog identification. Complete survey at the species level available as Figure 5—source data 3. Note that the complement of SR genes is conserved in most post-LECA LCAs, including Opisthokonta, Holozoa and Metazoa. Secondary losses are, however, frequent in some lineages: SRSF1/9 is lost in Fungi, Teretosporea and Filasterea; Corallochytrium lacks all canonical SR genes but TRA2 and a fragmentary SRSF4/5/6; choanoflagellates have lost SRSF4/5/6, etc. (C) Counts of RNA-binding protein domains in extant eukaryotic genomes. Unicellular holozoans have a rich complement of RNA-binding proteins (domains per species: average 127.8; median 120), albeit less abundant than metazoans’ (domains per species: average 225.3; median 187). DOI: 10.7554/eLife.26036.023

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 17 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

A. Synteny conservation between unicellular and multicellular Holozoa

Metazoa Hsap synteny ratio 250 Cint 125 Skow 0 Dmel 0 0.4 0.8 NA Lgig Nvec Aipt Tadh Mlei Ocar Aque Mbre Sros Cowc Sarc Cfra Teretosporea Ihof Nk52 Clim Ihof Aipt Cint Mlei Lgig Cfra Clim Sros Sarc Tadh Ocar Nvec Mbre Dmel Nk52 Aque Hsap Skow Cowc

Figure 5—figure supplement 3. Phylogenetic analysis of (A) eIF4A3, (B) Smg5/6/7, and (C) eRF3, using Maximum likelihood in IQ-TREE (supports are SH-like approximate likelihood ratio test/UFBS, respectively); including Bayesian inference supports for the ortologous groups of interest (BPP statistical supports, in red). DOI: 10.7554/eLife.26036.024

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 18 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

A. Profile of intron site presence across eukaryotes B. Phylostratigraphy

Clustering of intron site presence across species of Capsaspora introns (Spearman corr. distance, Ward clustering) Metazoa Porifera All introns Eumetazoa+Placozoa Eumetazoa (n=564) Cnidaria Metazoa+Choanoflagellata 0.9% Non-bilaterian Trichoplax 13.6% Oscarella 4.0% Metazoa Amphimedon Aiptasia 10.8% 69.4% Bilaterian Nematostella Metazoa Bilateria Deuterostomia 1.2% Protostomia Chordata Homo Saccoglossus Choano- Lottia Introns with flagellata Choanoflagellata regulatory sites (n=25) Salpingoeca Monosiga Corallochytrium 1.6% Mnemiopsis 0.8% 12.8% Chromosphaera 3.2% Secondary Drosophila 2.0% losses Ciona 79.6% Holozoa Teretosporea

Ichthyosporea clustering) Ward distance, (Spearmancorr. Filozoa presence intronsites by species Clusteringof Ichthyophonida Unikonta Unicellular LCA, Obazoa Ichthyophonus , Opisthokonta Capsaspora LECA Capsaspora Ichthyophonus Filozoa Capsaspora Creolimax +Sphaeroforma Holozoa Sphaeroforma Opisthokonta Creolimax and Creolimax Unikonta/ Sphaeroforma Amorphea LECA

Bilaterian Metazoa Ichthyosporea presence probability Non-bilaterian Metazoa Corallochytrium Choanoflagellata Unicellular animal Capsaspora ancestor 0.0 0.2 0.40.6 0.8 1.0

Figure 6. Profile of intron site presence across eukaryotes. (A) Heatmap representing presence/absence of 4312 intron sites (columns) from extant and ancestral holozoan genomes, plus the line of ascent to the LECA (rows). Intron sites and genomes have been grouped according to their respective patterns of co-occurrence (dendrogram based on Spearman correlation distances and Ward clustering algorithm; see Methods). The dendrogram of genome clusterings is shown to the left. Figure 5—source data 2.(B) Phylostratigraphic analysis of the origin of Capsaspora introns, considering all sites (left) and those with putative regulatory sites (right; after [Sebe´-Pedro´s et al., 2016b]). DOI: 10.7554/eLife.26036.025

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 19 of 28

Research article Genes and Chromosomes Genomics and Evolutionary Biology A AA Domain combination gain and loss per lineage B Ratio of domain duet gain and loss in ancestors 2,000 -2,000 -1,000 0 3,000 1,000 # domain duets TFs Signalling Protein bind. Protein Ubiquitin Homo -4 -2 0 2 4 ECM Ciona Bilateria Bilateria 1,600 3,000 6,000 9,000 Saccoglossus Cnidaria * Eumetazoa Drosophila * gains Lottia Eumetazoa * * losses Cnidaria Nematostella Eumet.+Placozoa * * 0 Aiptasia Porifera 500 1,000 2,000 Metazoa Trichoplax Metazoa * Metazoa+ Mnemiopsis * * * Choanoflagellata Metazoa+Choanofl. Porifera Oscarella Filozoa * Filozoa Amphimedon * * Choanoflagellata Monosiga Holozoa * * * * Salpingoeca Opisthokonta * Holozoa Capsaspora Obazoa Creolimax+Sphaerophorma Sphaeroforma Unikonta * * * * Ichthyophonida Creolimax Choanoflagellata Ichthyosporea Ichthyophonus Teretosporea * * Chromosphaera Cfra+Sarc * Opisthokonta Corallochytrium Ichthyophonida Ascomycota Schizosaccharomyces Ichthyosporea * Dikarya Neurospora Teretosporea Cryptococcus * * Obazoa Fungi GT/L T  1  log 10 (G T/L T) Mortierella GT/L T < 1  log 10 (1/(G T/L T)×(-1)) Holomycota Spizellomyces Unikonta/ Amorphea Gonapodya 0-2 2 log((G /L )/(G /L )) Fonticula i i T T Apusomonadida Thecamonas Amoebozoa Dictyostelium Polysphondylium Acanthamoeba Embryophyta Arabidopsis Viridiplantae Physcomitrella Chlamydomonas LECA Diaphoratickes Chlorella Toxoplasma Alveolata Tetrahymena Bikonta Bigelowiella SAR Ectocarpus Stramenopila Phytophthora Excavata Naegleria

Figure 7. Evolution of protein domain architectures. (A) Protein domain combination gain and loss per lineage, including extant genomes and ancestral reconstructed nodes. Diameter and color of circles denote the number of different domain combinations (in different gene families) in that node of the tree. Bolder edges mark the line of descent between the LCAs of Opisthokonta and Bilateria, which was generally dominated by gains of protein domain combinations (see text). Red and green bars represent the inferred number of gains and losses, respectively. (B) Gain/loss ratio of protein domain diversity in selected ancestors, including animals (upper chart; from Metazoa to Unikonta/Amorphea) and unicellular holozoans (lower). Heatmap to the right represents the log-ratio value of the diversification rate for selected sub-sets of functionally-related protein domains relevant to multicellularity: green indicates higher-than-average diversification; pink less; white asterisks indicate two-fold or more increases or decreases (all comparisons relative to the whole set of protein domains). Source Data Figure 7—source data 1, 2, 3 and 4. DOI: 10.7554/eLife.26036.026 The following source data is available for figure 7: Source data 1. Rates of gain and loss of protein domain pairs within a given orthogroup for extant and ancestral eukaryotes, calculated for a phyloge- netic birth-and-death probabilistic model that accounts for gains, losses and duplications (Csuro¨s, 2010). DOI: 10.7554/eLife.26036.027 Source data 2. Reconstruction of the evolutionary histories of protein domain pairs gains within orthogroups, using a phylogenetic birth-and-death probabilistic model that accounts for gains, losses and duplications (Csuro¨s, 2010). DOI: 10.7554/eLife.26036.028 Source data 3. Reconstruction of the evolutionary histories of individual protein domains, using Dollo parsimony and accounting for gains and losses (Csuro¨s, 2010). DOI: 10.7554/eLife.26036.029 Source data 4. Rates of gain and loss of orthogroups for extant and ancestral eukaryotes, using a phylogenetic birth-and-death probabilistic model that accounts for gains, losses and duplications. DOI: 10.7554/eLife.26036.030

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 20 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

log(gain/loss) A B -4 -3 -2 -1 0 1 2 Homo sapiens Protostomia Deuterostomia Ciona intestinalis +5 / -364 +63 / -130 Homo +878 / -142 Chordata Saccoglossus kowalevskii Bilateria Ciona +3 / -1118 Deuterostomia +137 / -261 Saccoglossus +3 / -499 Drosophila melanogaster Lottia gigantea Eumetazoa Drosophila +124 / -854 +165 / -54 Protostomia Lottia +8 / -313 Bilateria Bilateria+Cnidaria+Placozoa Cnidaria Nematostella vectensis +53 / -194 +9 / -401 Nematostella +91 / -52 Aiptasia sp. Aiptasia +35 / -389 Cnidaria Metazoa Bilateria+Cnidaria +376 / -149 Trichoplax +28 / -1447 Trichoplax adhaerens Mnemiopsis +6 / -1978 Bilateria+Cnidaria+Placozoa Metazoa+Choanoflagellata Mnemiopsis leidyi +61 / -60 Porifera +7 / -870 Oscarella +25 / -1394 Oscarella carmela Amphimedon Amphimedon queenslandica Filozoa +87 / -430 Porifera +45 / -207 Choanoflagellata +3 / -1376 Monosiga +2 / -651 Metazoa Monosiga brevicollis Salpingoeca +11 / -279 Salpingoeca rosetta Holozoa Capsaspora +6 / -1749 Choanoflagellata +154 / -161 Creolimax+Sphaerophorma +6 / -179 Metazoa+Choanoflagellata Sphaeroforma +9 / -419 Capsaspora owczarzaki Ichthyophonida +2 / -606 Creolimax +6 / -211 Filozoa (Met+Cho+ Capsaspora ) Opisthokonta Ichthyosporea +1 / -134 Sphaeroforma arctica +219 / -25 Ichthyophonus +1 / -1433 Teretosporea +4 / -883 Creolimax fragrantissima Chromosphaera +12 / -689 Sphaeroforma +Creolimax Ichthyophonus hoferi Obazoa Corallochytrium +9 / -1331 Ichthyophonida Chromosphaera perkinsii +58 / -42 +104 / -96 Holomycota Fungi Ichthyosporea Unikonta/ +5 / -987 Corallochytrium limacisporum Teretosporea Amorphea +1 / -2060 +208 / -11 Fonticula Holozoa Apusomonadida Schizosaccharomyces pombe +23 / -2004 Neurospora crassa Thecamonas Ascomycota Amoebozoa Cryptococcus neoformans +6 / -1215 Amoebozoa Dikarya Mortierella verticillata Spizellomyces punctatus +88 / -691 Gonapodya prolifera LECA Diaphoratickes Viridiplantae Fungi +140 / -113 Fonticula alba Holomycota Opisthokonta SAR Thecamonas trahens Bikonta Excavata +3 / -512 Obazoa (Opistho+Apusomonadida) +29 / -2065 Dictyostelium discoideum Naegleria Polysphondylium pallidum Dictyostelia Acanthamoeba castellanii Amoebozoa Unikonta/Amorphea Arabidopsis thaliana Physcomitrella patens Embryophyta/Land Chlamydomonas reinhardtii Chlorella variabilis Chlorophyta Viridiplantae Toxoplasma gondii Tetrahymena thermophila Alveolata Bigelowiella natans Ectocarpus siliculosus Phytophthora infestans Stramenopila/Heterokonta SAR (Stram+Alveolata+) Bikonta/Diaphoratickes Naegleria gruberi

Protein domain pairs Single protein domains

Figure 7—figure supplement 1. Gains and losses of individual protein domains across eukaryotes. (A) Ancestral reconstruction of gains (green) and losses (red) of protein domains per lineage, based on Dollo parsimony. Note that, in contrast with the evolution of protein domain combinations here most nodes are dominated by losses. The most notable exceptions are the Metazoa and Opisthokonta LCAs (dominated by gains) and its unicellular ancestors from Holozoa to Choanoflagellata + Metazoa LCAs (in which gains and losses are in a relative stasis). Figure 7—source data 3.(B) Log-ratio of gains-to-losses for single protein domains (brown bars) and protein domain pairs (blue bars), based on the respective ancestral reconstructions. Positive values thus mean that gains are greater than losses. Loss of single protein domains dominates in most nodes, but gains in protein domain combinations can nevertheless outweigh losses in many ancestors (e.g., Holozoa or Metazoa LCAs). DOI: 10.7554/eLife.26036.031

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 21 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

A. Network modularity B. Modularity & community C. Example domain (animals & ancestors) size (all eukaryotes) co-occurrence graph

1.0 1.0 ECM module: ● ● ●●●●●●●●  modularity ● ● ● ● ●●●●● ● ● ● ● ● ● ●●●● Global network ● ● ● ● ●● ●● Global network ● ● ● ● ● ●● promiscuity ● ● ● ● ● ● ● ● ●●●  ● ● ● ● ● ●●● ● ● ● ● ● ●●●●● ● ● ● 0.9 ● ● ● 0.9 ● ● ● ● ●● ● ● ● ● ● ●● ● ● ●● ● ● ● ●●● ● ●● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● 0.8 ● 0.8 ● s = -0.96 ● s = -0.93 ● p- value = 0.0 p- value = 0.0 0.7 0.7

1.0 1.0 TF modules TF subnetwork TF subnetwork  modularity promiscuity ● ●  ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● = 0.11 ● ● ● ● ● ● ●● ● s ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● p- value = 0.32 ● ● ● ● ● ● ●● ● ●●● ● ● ● ● ● ●● ● ●● ● 0.8 ● ● ● ● ● ● ● ● 0.8 ● ● ● ● ● ● ● ● modularity ● ● s = -0.05 ● ● ● ● p- value = 1.09e-05 Protein domain

● ● Global network ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● TF subnetwork 0.6 ● ● ● 0.6 ● ● ● ● ● ● ● ● ● ● ● ECM subnetwork ●

● ● ● ● ● 0.4 0.4 2.5 5.0 7.5 10.0

HomoCiona Lottia Fungi LECA Bilateria Aiptasia Porifera Filozoa Holozoa Obazoa domains/community Chordata Cnidaria Oscarella Metazoa Unikonta Drosophila Trichoplax Monosiga CreolimaxCfra+Sarc Protostomia Eumetazoa Mnemiopsis Capsaspora Holomycota Nematostella Amphimedon Salpingoeca Teretosporea Opisthokonta SaccoglossusDeuterostomia SphaeroformaIchthyophonusIchthyosporea IchthyophonidaChromosphaeraCorallochytrium Eumet.+Placozoa Choanoflagellata Choanofl.+Metazoa Original network or sub-network Bilaterian Metazoa Ichthyosporea Random networks Non-bilaterian Metazoa Corallochytrium Choanoflagellata Unicellular animal Capsaspora ancestors

Figure 8. Protein domain architecture networks. (A and B) Modularity and community size of the global network of domain pairs (upper panels) and the TF subnetwork (lower panels), with 90% probability. The modularity parameter measures the fraction of the intra-community edges in the network, minus the expected value in a random network (takes values from 0 to 1; see Materials and methods and [Newman and Girvan, 2004]). Panels at the left show the observed modularity of the protein domain (sub)networks of various genomes (Holozoa and selected ancestors; dots are taxa-colored). Purple box plots represent the distribution of simulated modularities from 100 rewirings of the original organism-specific networks, while keeping a constant vertex degree distribution. Panels to the right show the relationship between modularities and the number of domains/community, both for actual genomes (orange) and simulated rewired networks (purple density plot, see Methods). Monotonic dependence between modularity and domains/community was tested for each set of data (global, TF and their respective simulations) using Spearman’s rank correlation coefficient (s), and linear regression fits are included for clarity. Note that simulated TF subnetworks are less modular and have more domains/community than the original ones, signaling their higher-than-expected modularities. Note that the scales of the vertical axes change between upper and lower panels. (C) Example of protein domain co-occurrence network. Vertices represent domains, linked by edges if they co-occur within the same gene family. Two subnetworks are highlighted in yellow (domain pairs occurring in TF genes) or green (same for signaling genes). Figure 7—source data 1 and 2, Figure 1—source data 2. DOI: 10.7554/eLife.26036.032

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 22 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

A. Signalling: modularity & nodes/comm E. Signalling: modularity per species or ancestor

● Bilaterian Metazoa ● 1.0 ● ● Non-bilaterian Metazoa ● ● ● Choanoflagellata ●● ● ● ● ● ● ● ●● ● Capsaspora ● ● ● ● ●● ● ●● ● Ichthyosporea ●● ● ● 0.7 ● Corallochytrium ● ● ●● ● ● Other eukaryotes ● ● ● ● ● ● ● ● rho= 0.365 p=0.001 ● ●  ● ● ● ● ● ● ● ● ● ● 0.8 ● ● count ● ● ● ● ● ● ● ● ● ● ● 60 ● ● ● ●●● 0.6 ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 40 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 20 ● ● ● ● ● ● ● ● louvain modularity louvain ● ● ● ● ●● louvain modularity louvain ● ● 0.6 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.5 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● rho=0.604 p=0 ● ● ● 0.4 0.4 10 20 30 40 ) Fungi Bilateria Cnidaria Porifera Holozoa Dikarya Chordata Metazoa Creolimax Alveolata Aiptasia sp. Protostomia + Ascomycota Holomycota DictyosteliaAmoebozoa ChlorophytaViridiplantae nodes/comunity Capsaspora IchthyosporeaTeretosporea Fonticula Opisthokontaalba Homo sapiens DeuterostomiaLottia gigantea Ichthyophonida Ciona intestinalis Bilateria+Cnidaria Choanoflagellata Naegleria gruberi MnemiopsisOscarella leidyi carmela Neurospora crassa Chlorella variabilis MonosigaSalpingoeca brevicollis rosetta Gonapodya prolifera Unikonta/AmorpheaArabidopsis thaliana Toxoplasma Bigelowiellagondii natans Trichoplax adhaerens SphaeroformaIchthyophonus arctica hoferi Mortierella verticillata Thecamonas trahens Nematostella vectensis Physcomitrella patens EctocarpusPhytophthora siliculosusBikonta/Diaphoratickes infestans Capsaspora owczarzaki Drosophila melanogaster Creolimax fragrantissimaChromosphaera perkinsii Spizellomyces punctatus Dictyostelium discoideum Embryophyta/Land plants Tetrahymena thermophila Saccoglossus kowalevskii Metazoa+ChoanoflagellataSphaeroforma Cryptococcus neoformans PolysphondyliumAcanthamoeba pallidum castellanii Stramenopila/Heterokonta Bilateria+Cnidaria+PlacozoaAmphimedon queenslandica Chlamydomonas reinhardtii CorallochytriumSchizosaccharomyces limacisporum pombe Filozoa (Met+Cho+ SAR (Stram+Alveolata+Rhizaria) Obazoa (Opistho+Apusomonadida) B. Ubiquitination: modularity & nodes/comm F. Ubiquitination: modularity per species or ancestor

0.9 ● 1.0

● ● ● ●

● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● 0.8 ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● 0.8 ● ● ● ● ● ● ● ● rho= 0.713 p=0 count ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● 100 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 75 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.7 50 ● ● ● ● ● ● ● ● ● ● ● 25 ● ● ● ● ● ● ● ● ● ● louvain modularity louvain ● ● ● ● ● louvain modularity louvain 0.6 ● ● ● ● ●

0.6 ●

rho= 0.549 p=0

0.4

) 5 10 15 Fungi Bilateria Cnidaria Porifera Holozoa Dikarya Chordata Metazoa Creolimax Alveolata Aiptasia sp. Protostomia + Ascomycota Holomycota DictyosteliaAmoebozoa ChlorophytaViridiplantae nodes/comunity Capsaspora IchthyosporeaTeretosporea Fonticula Opisthokontaalba Homo sapiens DeuterostomiaLottia gigantea Ichthyophonida Ciona intestinalis Bilateria+Cnidaria Choanoflagellata Naegleria gruberi MnemiopsisOscarella leidyi carmela Neurospora crassa Chlorella variabilis MonosigaSalpingoeca brevicollis rosetta Gonapodya prolifera Unikonta/AmorpheaArabidopsis thaliana Toxoplasma Bigelowiellagondii natans Trichoplax adhaerens SphaeroformaIchthyophonus arctica hoferi Mortierella verticillata Thecamonas trahens Nematostella vectensis Physcomitrella patens EctocarpusPhytophthora siliculosusBikonta/Diaphoratickes infestans Capsaspora owczarzaki Drosophila melanogaster Creolimax fragrantissimaChromosphaera perkinsii Spizellomyces punctatus Dictyostelium discoideum Embryophyta/Land plants Tetrahymena thermophila Saccoglossus kowalevskii Metazoa+ChoanoflagellataSphaeroforma Cryptococcus neoformans PolysphondyliumAcanthamoeba pallidum castellanii Stramenopila/Heterokonta Bilateria+Cnidaria+PlacozoaAmphimedon queenslandica Chlamydomonas reinhardtii CorallochytriumSchizosaccharomyces limacisporum pombe Filozoa (Met+Cho+ SAR (Stram+Alveolata+Rhizaria) Obazoa (Opistho+Apusomonadida) C. ECM: modularity & nodes/comm G. ECM: modularity per species or ancestor

● 1.0 ●

● ● ● ● 0.8 ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● rho= 0.328 p=0.004 ● ● ● ● ●  0.8 ● ● ● ● ● ● ● ● ● ● ● ● count ● ● ● ● ● ● ● ● ● ● ● ●● 80 ● 0.6 ● ● ● ● ● ● ● ● ● ● ● 60 ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● 40 ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● 20 ● ● ● ● ● ● ● ● ●●●●●●● ● louvain modularity louvain ● ● ● ● ● ● ●●● ● ● louvain modularity louvain ● ● 0.6 ● ● ● ● ● ● ● 0.4 ● ● ● ● ● ● ● ● ● ● ●

● ● rho= 0.319 p=0 ● ● ● ● ●  ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.2 0.4

) 0 10 20 30 Fungi Bilateria Cnidaria Porifera Holozoa Dikarya Chordata Metazoa Creolimax Alveolata Aiptasia sp. Protostomia + Ascomycota Holomycota DictyosteliaAmoebozoa ChlorophytaViridiplantae nodes/comunity Capsaspora IchthyosporeaTeretosporea Fonticula Opisthokontaalba Homo sapiens DeuterostomiaLottia gigantea Ichthyophonida Ciona intestinalis Bilateria+Cnidaria Choanoflagellata Naegleria gruberi MnemiopsisOscarella leidyi carmela Neurospora crassa Chlorella variabilis MonosigaSalpingoeca brevicollis rosetta Gonapodya prolifera Unikonta/AmorpheaArabidopsis thaliana Toxoplasma Bigelowiellagondii natans Trichoplax adhaerens SphaeroformaIchthyophonus arctica hoferi Mortierella verticillata Thecamonas trahens Nematostella vectensis Physcomitrella patens EctocarpusPhytophthora siliculosusBikonta/Diaphoratickes infestans Capsaspora owczarzaki Drosophila melanogaster Creolimax fragrantissimaChromosphaera perkinsii Spizellomyces punctatus Dictyostelium discoideum Embryophyta/Land plants Tetrahymena thermophila Saccoglossus kowalevskii Metazoa+ChoanoflagellataSphaeroforma Cryptococcus neoformans PolysphondyliumAcanthamoeba pallidum castellanii Stramenopila/Heterokonta Bilateria+Cnidaria+PlacozoaAmphimedon queenslandica Chlamydomonas reinhardtii CorallochytriumSchizosaccharomyces limacisporum pombe Filozoa (Met+Cho+ SAR (Stram+Alveolata+Rhizaria) Obazoa (Opistho+Apusomonadida) D. Protein binding: modularity & nodes/comm H. Protein binding: modularity per species or ancestor

● ● 1.0 0.9 ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●● ●●● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●● 0.8 ● ●● ● ● ● 0.8 ● ● ● ● count ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● rho= 0.847 p=0 ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● 90 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 60 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 30 ● ● 0.7 ● ● ● ● ● ● louvain modularity louvain ● ● ● rho= 0.233 p=0 modularity louvain 0.6 ●

0.6

0.4

10 20 30 ) Fungi Bilateria Porifera Dikarya Chordata Cnidaria Metazoa Holozoa Alveolata Aiptasia sp. Creolimax nodes/community Protostomia + Ascomycota Holomycota DictyosteliaAmoebozoa ChlorophytaViridiplantae Capsaspora IchthyosporeaTeretosporea Fonticula Opisthokontaalba Homo sapiens DeuterostomiaLottia gigantea Ichthyophonida Ciona intestinalis Bilateria+Cnidaria Choanoflagellata Naegleria gruberi MnemiopsisOscarella leidyi carmela Neurospora crassa Chlorella variabilis MonosigaSalpingoeca brevicollis rosetta Gonapodya prolifera Unikonta/AmorpheaArabidopsis thaliana Toxoplasma Bigelowiellagondii natans Trichoplax adhaerens SphaeroformaIchthyophonus arctica hoferi Mortierella verticillata Thecamonas trahens Nematostella vectensis Physcomitrella patens EctocarpusPhytophthora siliculosusBikonta/Diaphoratickes infestans Capsaspora owczarzaki Drosophila melanogaster Creolimax fragrantissimaChromosphaera perkinsii Spizellomyces punctatus Dictyostelium discoideum Embryophyta/Land plants Tetrahymena thermophila Saccoglossus kowalevskii Metazoa+ChoanoflagellataSphaeroforma Cryptococcus neoformans PolysphondyliumAcanthamoeba pallidum castellanii Stramenopila/Heterokonta Bilateria+Cnidaria+PlacozoaAmphimedon queenslandica Chlamydomonas reinhardtii CorallochytriumSchizosaccharomyces limacisporum pombe Filozoa (Met+Cho+ SAR (Stram+Alveolata+Rhizaria) Obazoa (Opistho+Apusomonadida)

Figure 8—figure supplement 1. Mo dularity of protein domain co-occurrence networks of multicellularity-related gene sets across eukaryotes. (A–D) Modularity and community size of the functional sub-networks based on domains related to signaling (Richter and King, 2013), ubiquitination (Grau- Figure 8—figure supplement 1 continued on next page

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 23 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

Figure 8—figure supplement 1 continued Bove´ et al., 2015), ECM (Richter and King, 2013; Sebe´-Pedro´s et al., 2010; Hynes, 2012) and protein binding (Mitchell et al., 2015) functions. Blue dots indicate real genomes, and the purple density plot indicates the values of the simulated rewired networks. Monotonic dependence between modularity and domains/community was tested for each set of data using Spearman’s rank correlation coefficient (s); linear regression fits are included for clarity. All sub-networks (A–D) have the same decreasing trend as the global sub-network of Figure 8A–B, and contrast with the results for TFs. (E– H) Modularity of the functional sub-network based on domains related to signaling, ubiquitination, ECM and protein binding functions (same domains as in A-D), per species or ancestral node (colored dots). Purple box plots represent the distribution of simulated modularities from 100 rewirings of the original organism-specific networks, while keeping a constant vertex degree distribution. Note the decreased modularities shown by metazoans (red and pink) in all sub-networks. DOI: 10.7554/eLife.26036.033

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 24 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

A. TF domain co-occurrence in Metazoa B. ECM domain co-occurrence in Metazoa

Integrin , FG-GAP Selected  domain clusters Independent Collagen clusters MH1-MH2 Promiscuous C4 clusters COLFI Co-ocurrence bZIP frequency Jun, pKID EGF-CA Integrin , EGF_2, Laminin_EGF/G1/G2, PSI, Integrin  cytoplasmic... none high p300/CBP P53 zf-TAZ CUB, hEGF, cEGF, LDL P53_tetramer receptor a/b, Astacin, VWD, HAT_KAT11 SAM_2 Bromodomain NIDO, SRCR, Sushi... ZZ HMG-box EGF DUF902 SOXp KIX Delta receptor DSL, HMG_box_2 Notch, Cadherin... Rtt106 FERM proteins SSrecog FERM_C/M/N Talin domains STAT Talin_middle, FERM_C/M/N, STAT_bind VBS, IRS, FERM_f0, I_LWEQ STAT_alpha STAT_int Homeobox SH2 LIM HNF 1_N Fibronectin type III OAR Laminin_G3/N/EGF, Kringle, PAX Protein tyrosine kinase Y phosphatase, Pkinase-Tyr, PBC Fz, fn3, immunoglobulins, growth factor immunoglobulins (I-set, Ig2/3), Pou and ephrin receptors, Kringle, Focal_AT... GPCR (7tm_2, GPS)... SIX1_SD

Clustering of protein domain co-occurrences in TFs Clustering of protein domain co-occurrences in ECM proteins (Spearman corr. distance, Ward clustering) (Spearman corr. distance, Ward clustering)

C. LIM Homeobox phylogeny D. TAZ zinc finger acetylase phylogeny E. Type IV Collagen phylogeny

0.82/- 'Bilateria' 1/100 Metazoa 0.40/75 Metazoa ZfHX Homo sapiens Amoebidium Apar_comp20622 Acropora digitifera Collagen IV 2/4/6 Ichthyophonus Ihof_evm2s487 Adig_578v103049  0.99/100 Pirum, Abeoforma, Creolimax, Sphaeroforma ( ) Capsaspora 0.99/99 Homo sapiens 'Bilateria' p300/CBP Collagen IV 1/3/5 Homo sapiens (KAT3A/B) Chromosphaera Nk52_evm60s152 0.97/73 0.92/93 Domain C4-A Metazoa LIM Lhx2/9 0.57/84 0.99/100 0.93/87 0.99/99 0.47/80 ( ) Leucosolenia , Metazoa LIM Lhx6/8 Salpingoca Sros_PTSG_00169 Sycon , Oscarella , 0.82/90 1/100 Monosiga Mbre_XP_001747336 Unicellular and Trichoplax 1/100 Holozoa BA Metazoa LIM Islet Creolimax, Sphaeroforma, 0.75/93 0.99/97 n collagen Capsaspora Cowc_CAOG_10010 Icthyophonus, Pirum, Ministeria vibrans repeats Abeoforma Mvib_comp15306 0.92/95 1/100 Metazoa ZfHX 0.98/82 'Bilateria' Corallochytrium Clim_evm6s218 Supports: 0.28/- BI BPP / ML UFBS 0.99/100 Metazoa LIM Lhx3/4 0.25/- Domain C4-B 0.90/97 Corallochytrium Clim_evm142s210 Distance 0.3: Homo sapiens 0.69/91 Chromosphaera Nk52_evm98s2192 Collagen IV 2/4/6 Capsaspora Cowc_CAOG_04903 Metazoa LIM Lhx1/5 0.83/75 Prot. domains: 1/100 1/100 0.99/99 0.76/60 Supports: 0.93/92 p300/CBP-like ( ) HAT/KAT11 Metazoa LIM Lmx Arabidopsis HAC1/4/5/12 zf TAZ 0.92/97 Supports: 1/100 BI BPP / ML UFBS ZZ Homo sapiens BI BPP / ML UFBS Distance 1.0 Arabidopsis HAC2 Collagen IV 1/3/5 0.99/98 KIX/KIX2 0.77/88  Distance 0.4 1/71 Metazoa ZfHX Volvox, Micromonas, Prot. domains: 0.80/96 Chlamydomonas ( ) Bromodomain Leucosolenia , Prot. domains: 0.58/36 0.70/100 Homeobox DUF902 Stramenopila Sycon , Oscarella , C4 domain LIM 0.97/93 p300/CBP-like Creb binding and Trichoplax Collagen Metazoa POU POU 0.99/97 PHD 0.75/93 repeat 1/100 Outgroup Emiliania huxleyi 0.70/- Ministeria vibrans Mvib_comp15306

Figure 9. Phylogenetic analysis of the premetazoan gene families LIM Homeobox, CBP/p300 and type IV collagen. (A and B) Protein domain co- occurrence matrices of transcription factor (TF) (A) or extracellular matrix (ECM)-related gene families (B), inferred at the LCA of Metazoa (90% probability). Horizontal and vertical axes of the heatmap represent individual protein domains and their mutual co-occurrence frequency, and have been clustered according to the number of shared domains (dendrogram based on Spearman correlation distances and Ward clustering algorithm). Note that, for TFs, most co-occurrence clusters are located along the diagonal, indicating isolated domain communities; whereas ECM genes tend to contain promiscuous domains shared in multiple domain co-occurrence communities. Representative examples of independent and promiscuous domain clusters have been highlighted in both heat maps (orange and pink, respectively). (C) Phylogenetic tree of LIM Homeobox TFs, with mapped protein domains architectures. (D) Phylogenetic tree of CBP/p300 TFs based on HAT/KAT11 domain, with mapped consensus protein domain architectures. (E) Phylogeny of type IV collagen genes based on the C4 domain. All extant homologs, from Ministeria to animals, have a C4-C4 dual arrangement of filozoan origin (reflected in the phylogeny by two parallel clades representing the first and second domains within each gene). Ministeria (orange) and human (blue) homologs are highlighted. In C, D and E panels, bold branches represent unicellular holozoan genes and are color-coded by taxonomic assignment. All trees are Bayesian inferences (BI). Protein domain architectures and statistical supports (BPP/UFBS) are shown for selected nodes (see Figure 9—figure supplement 1 for the complete BI and ML trees with statistical supports). Figure 7—source data 1 and 2. DOI: 10.7554/eLife.26036.034

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 25 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

89.2/100Beetle_Ap2_Lhx29_LIM Human_ZFHX3_HD1_Zfhx_ZF 0/78Beetle_Ap1_Lhx29_LIM Human_ZFHX4_HD1_Zfhx_ZF 78.4/97Human_LHX2_Lhx29_LIM Amphioxus_Zfhx_HD1_Zfhx_ZF Amphioxus_Lhx29b_Lhx29_LIM 1 100/10086.3/98 Nematode_ZC1233_HD1_Zfhx_ZF Nematode_ttx3_Lhx29_LIM 0,3923 Fruitfly_zfh2_HD1_Zfhx_ZF 39.7/47 Amphioxus_Lhx29a_Lhx29_LIM Beetle_Zfh2_HD1_Zfhx_ZF 96/84 Human_LHX8_Lhx68_LIM 0,112 Pgem_Unigene35461_Pirum_gemmatam21889_226-280 Human_LHX6_Lhx68_LIM 0,0434 Awhi_Unigene35993_m23114_79-135 95.8/992.2/59 Amphioxus_Lhx68_Lhx68_LIM 0,9898 Apar_comp20622_c1_seq1m26656_165-221 74.6/94 Nematode_lim4_Lhx68_LIM Ihof_3133_195-251 75.6/72 87.2/90 Fruitfly_Awh_Lhx68_LIM Awhi_Unigene16655_m10371_50-106 0,7112 Beetle_Awh_Lhx68_LIM 0,3138 Awhi_Unigene17928_m11854_42-98 Cowc_CAOG_10010_132-188 0,0674 0,6903 Pgem_Unigene8381_Pirum_gemmatam5304_39-95 78.7/68 82.7/99 Amphioxus_Isl_Isl_LIM 0,9496 Pgem_Unigene166_Pirum_gemmatam160_70-125 52.4/99 Beetle_Tup_Isl_LIM Pgem_Unigene18219_Pirum_gemmatam11852_118-174 75.3/98 Human_ISL2_Isl_LIM 0,2604 0,9996 Cfra_3283T1_30-86 99.7/100 Human_ISL1_Isl_LIM 0,1734 Sarc_SARC_11902_24-80 Nematode_lim7_Isl_LIM 0,0948 0,2842 Pgem_Unigene35461_Pirum_gemmatam21889_129-185 A. 88.2/86 Awhi_Unigene16655_m10371_50-106 Cowc_CAOG_04648_331-386 LIM Homeobox: ML/BI 53.7/70 Awhi_Unigene17928_m11854_42-98 Nk52_evm60s152_93-149 0/17 84.1/82 Pgem_Unigene166_Pirum_gemmatam160_70-125 Amphioxus_Lhx29b_Lhx29_LIM 89.1/92 Pgem_Unigene8381_Pirum_gemmatam5304_39-95 Nematode_ttx3_Lhx29_LIM Gene Protein_domain_architecture Pgem_Unigene18219_Pirum_gemmatam11852_118-174 0,0506 Human_LHX2_Lhx29_LIM 83.2/56 0,9999 Apar_comp20622_c1_seq1m26656 LIM LIM Homeobox 99.5/100 Cfra_3283T1_30-86 Beetle_Ap2_Lhx29_LIM Awhi_Unigene16655_m10371 Homeobox 73.9/60 Sarc_SARC_11902_24-80 0,5752 Beetle_Ap1_Lhx29_LIM Awhi_Unigene35993_m23114 Homeobox 69.1/55 Cowc_CAOG_04648_331-386 Amphioxus_Lhx29a_Lhx29_LIM 0/31 Pgem_Unigene35461_Pirum_gemmatam21889_129-185 Human_LHX6_Lhx68_LIM Awhi_Unigene17928_m11854 Homeobox 78.1/3486.1/31 0,9827 Cfra_3283T1 Homeobox Nk52_evm60s152_93-149 0,9897 Human_LHX8_Lhx68_LIM Clim_evm6s218 LIM LIM Homeobox Awhi_Unigene35993_m23114_79-135 Amphioxus_Lhx68_Lhx68_LIM 76.9/74 Human_ZFHX3_HD1_Zfhx_ZF 0,8166 Nematode_lim4_Lhx68_LIM Cowc_CAOG_10010 LIM Homeobox Human_ZFHX4_HD1_Zfhx_ZF Fruitfly_Awh_Lhx68_LIM Cowc_CAOG_04648 Homeobox 41.2/67 17.9/31 75.9/77 Amphioxus_Zfhx_HD1_Zfhx_ZF 0,2614 Beetle_Awh_Lhx68_LIM Ihof_emv2s487/3133 LIM LIM Homeobox Nematode_ZC1233_HD1_Zfhx_ZF 0,0715 Amphioxus_Isl_Isl_LIM Nk52_evm60s152 LIM Homeobox 98.8/10079.3/87 Fruitfly_zfh2_HD1_Zfhx_ZF Beetle_Tup_Isl_LIM Pgem_Unigene166_Pirum_gemmatam160 Homeobox 85.1/75 Beetle_Zfh2_HD1_Zfhx_ZF 1 Human_ISL2_Isl_LIM Pgem_Unigene18219_Pirum_gemmatam11852 Homeobox Pgem_Unigene35461_Pirum_gemmatam21889_226-280 0,333 Human_ISL1_Isl_LIM Pgem_Unigene8381_Pirum_gemmatam5304 Homeobox Clim_evm6s218_183-239 Nematode_lim7_Isl_LIM Pgem_Unigene35461_Pirum_gemmatam21889 Homeobox Homeobox Human_LHX3_Lhx34_LIM Cowc_CAOG_10010_132-188 84.6/92 Human_LHX4_Lhx34_LIM Beetle_Zfh2_HD4_Zfhx_ZF Sarc_SARC_11902 Homeobox 84.5/91 Beetle_Lim3_Lhx34_LIM Nematode_ZC1233_HD3_Zfhx_ZF 10.9/32 71/9075.7/100 Fruitfly_Lim3_Lhx34_LIM 1 Human_ZFHX2_HD3_Zfhx_ZF 78.2/97 Amphioxus_Lhx34_Lhx34_LIM Human_ZFHX4_HD4_Zfhx_ZF Nematode_ceh14_Lhx34_LIM 0,2839 Amphioxus_Zfhx_HD4_Zfhx_ZF 71.1/91 86.1/98 Beetle_Lim1_Lhx15_LIM 0,3546 Human_ZFHX3_HD4_Zfhx_ZF 62.8/91 Fruitfly_Lim1_Lhx15_LIM Clim_evm6s218_183-239 0/33 Nematode_mec3_Lhx15_LIM Human_LHX3_Lhx34_LIM 99.2/99 Nematode_lin11_Lhx15_LIM Human_LHX4_Lhx34_LIM Amphioxus_Lhx15_Lhx15_LIM 0,9092 Beetle_Lim3_Lhx34_LIM Metazoa 98.2/99 91.7/93 Amphioxus_Lmx_Lmx_LIM Fruitfly_Lim3_Lhx34_LIM 85.8/89 Human_LMX1A_Lmx_LIM Amphioxus_Lhx34_Lhx34_LIM Beetle_Lmxa_Lmx_LIM 0,688 Nematode_ceh14_Lhx34_LIM Choanoflagellata 72/8481.4/100 Fruitfly_CG4328_Lmx_LIM Beetle_Lim1_Lhx15_LIM 83.7/33 57.9/86 Beetle_Lmxb_Lmx_LIM Fruitfly_Lim1_Lhx15_LIM Filasterea 98.5/100 Fruitfly_CG32105_Lmx_LIM 1 Nematode_mec3_Lhx15_LIM Nematode_lim6_Lmx_LIM 0,9882 Amphioxus_Lhx15_Lhx15_LIM Ichthyosporea 97.5/97 Beetle_Zfh2_HD3_Zfhx_ZF 1 Nematode_lin11_Lhx15_LIM 39.9/67 Fruitfly_zfh2_HD3_Zfhx_ZF Amphioxus_Lmx_Lmx_LIM Corallochytrium 0/62 Human_ZFHX2_HD2_Zfhx_ZF Human_LMX1A_Lmx_LIM Human_ZFHX3_HD3_Zfhx_ZF Beetle_Lmxa_Lmx_LIM 96.9/10079.5/76 Human_ZFHX4_HD3_Zfhx_ZF 1 Fruitfly_CG4328_Lmx_LIM Supports: 87.2/71 Amphioxus_Zfhx_HD3_Zfhx_ZF Beetle_Lmxb_Lmx_LIM 99.9/100 81.5/100 Beetle_Zfh1_Zeb_ZF Fruitfly_CG32105_Lmx_LIM ML: SH-aLRT/UFBS 81.5/76 Nematode_zag1_Zeb_ZF Nematode_lim6_Lmx_LIM BI: BPP Fruitfly_zfh1_Zfhx_ZF Human_ZFHX3_HD3_Zfhx_ZF 63.3/36 94.1/99 Beetle_Zfh2_HD4_Zfhx_ZF Human_ZFHX4_HD3_Zfhx_ZF 77.5/75 Nematode_ZC1233_HD3_Zfhx_ZF 1 Human_ZFHX2_HD2_Zfhx_ZF Distance: 0/58 Human_ZFHX2_HD3_Zfhx_ZF Beetle_Zfh2_HD3_Zfhx_ZF 64.4/83 Human_ZFHX4_HD4_Zfhx_ZF Fruitfly_zfh2_HD3_Zfhx_ZF ML: 0.5 98.6/100 Amphioxus_Zfhx_HD4_Zfhx_ZF 0,5892 Amphioxus_Zfhx_HD3_Zfhx_ZF Human_ZFHX3_HD4_Zfhx_ZF Beetle_Zfh1_Zeb_ZF BI: 0.2 Apar_comp20622_c1_seq1m26656_165-221 0,7011 Nematode_zag1_Zeb_ZF 71.2/100 Ihof_3133_195-251 Fruitfly_zfh1_Zfhx_ZF 0/89 Human_POU3F1_Pou3_POU 0,2647 Human_POU3F1_Pou3_POU 79.2/94 Human_POU3F3_Pou3_POU 0,5268 Human_POU3F3_Pou3_POU 0/36 Human_POU3F2_Pou3_POU 0,1938 Human_POU3F2_Pou3_POU 78.8/72 Human_POU3F4_Pou3_POU 0,4757 Human_POU3F4_Pou3_POU 0/6389/75 Fruitfly_vvl_Pou3_POU 0,9997 Human_POU5F1_Pou5_POU Beetle_Vvl_Pou3_POU 0,2173 Human_POU5F2_Pou5_POU 0/75 Amphioxus_POU3_Pou3_POU 0,21980,8446 Fruitfly_vvl_Pou3_POU 82.2/67 Amphioxus_POU3L_Pou3_POU 0,3488 Beetle_Vvl_Pou3_POU 99.2/99 Amphioxus_POU1_Pou1_POU Amphioxus_POU3_Pou3_POU 79.5/65 Human_POU1F1_Pou1_POU 0,1521 Amphioxus_POU3L_Pou3_POU Nematode_ceh6_Pou3_POU Nematode_ceh6_Pou3_POU 0/56 78.7/90 Beetle_Pdm3b_Pou6_POU 0,8908 Fruitfly_nub_Pou2_POU 93/90 Beetle_Pdm3a_Pou6_POU 0,3276 0,707 Fruitfly_pdm2_Pou2_POU 92.1/89 Amphioxus_POU6_Pou6_POU 0,5263 Human_POU2F1_Pou2_POU 100/100 Human_POU6F2_Pou6_POU 0,597 Human_POU2F2_Pou2_POU Human_POU6F1_Pou6_POU Beetle_Nub_Pou2_POU 85.1/77 0,3415 0,95940,4963 81.5/82 Fruitfly_nub_Pou2_POU Amphioxus_POU2_Pou2_POU 81.8/78 Fruitfly_pdm2_Pou2_POU Human_POU2F3_Pou2_POU Human_POU2F1_Pou2_POU 0,9931 Amphioxus_POU1_Pou1_POU 76.6/5641.7/69 Human_POU1F1_Pou1_POU 89.7/87 Human_POU2F2_Pou2_POU 0,8912 91.3/83 Human_POU2F3_Pou2_POU 0,9398 Beetle_Pdm3b_Pou6_POU 0/61 Beetle_Nub_Pou2_POU 0,9814 Beetle_Pdm3a_Pou6_POU Amphioxus_POU2_Pou2_POU 0,9911 Amphioxus_POU6_Pou6_POU 1 91.7/99 Human_POU5F1_Pou5_POU Human_POU6F2_Pou6_POU Human_POU5F2_Pou5_POU 0,4322 Human_POU6F1_Pou6_POU 14.7/37 0/96 Fruitfly_acj6_Pou4_POU 0,3642 Fruitfly_acj6_Pou4_POU 0/90 Nematode_unc86_Pou4_POU 0,3661 Nematode_unc86_Pou4_POU 91/96 Beetle_Acj6l_Pou4_POU 0,7068 Beetle_Acj6l_Pou4_POU 99.9/100 83.6/94 Beetle_Acj6_Pou4_POU 1 0,4373 Beetle_Acj6_Pou4_POU 79.2/88 Amphioxus_POU4_Pou4_POU 0,9928 Amphioxus_POU4_Pou4_POU 96.9/99 Human_POU4F2_Pou4_POU 0,5518 Human_POU4F2_Pou4_POU Human_POU4F3_Pou4_POU Human_POU4F3_Pou4_POU Nematode_C02F1210_Pou2_POU Nematode_C02F1210_Pou2_POU

99.9/100 Aque_XP_003388280__845-1143 1 Aque_XP_003388280__845-1143 100/100 Aque_XP_003391523__9-178 1 Aque_XP_003391523__9-178 B. p300/CBP: ML/BI 89.3/99 Aque_XP_003383601__1083-1365 0,9959 Aque_XP_003383601__1083-1365 94.5/100 Clva_128m174__285-452 1 Clva_128m174__285-452 16.5/56 Clva_31835m29468__266-486 0,5939 Clva_31835m29468__266-486 93.6/92 Ocar_g9389_t1__263-567 0,9357 Ocar_g9389_t1__263-567 76.9/37 Ccan_2623m2929__443-739 0,4616 Ccan_2623m2929__443-739 98.7/100 Adig_578v103049__1070-1244 1 Adig_578v103049__1070-1244 68/38 Nvec_XP_001632816__925-1225 0,2942 Nvec_XP_001632816__925-1225 Cele_R10E11_1b__1142-1428 Cele_R10E11_1b__1142-1428 Metazoa Scil_112574__20-121 Lcom_15013__943-1194 43.8/16 100/100 0,1151 0,4751 0/75 Scil_113510__8-112 1 Scil_44631__394-640 Choanoflagellata 100/100 Scil_44631__394-640 1 Scil_112574__20-121 88.5/43 Lcom_15013__943-1194 0,8867 Scil_113510__8-112 Filasterea 82.8/33 Hmag_XP_002156492__1216-1512 0,486 Hmag_XP_002156492__1216-1512 Cint_ENSCINP00000018331__938-1237 Cint_ENSCINP00000018331__938-1237 77/61 0,4237 Ichthyosporea 99/100 Hsap_ENSP00000263253__1306-1609 1 Hsap_ENSP00000263253__1306-1609 86.1/94 Hsap_ENSP00000441978__1372-1670 0,7894 Hsap_ENSP00000441978__1372-1670 Corallochytrium 41/42 Bflo_126580__2484-2800 0,4781 Bflo_126580__2484-2800 61.1/87 Skow_XP_002741545__73-371 0,7586 Skow_XP_002741545__73-371 82.7/7390.1/71 Spur_XP_782558__1593-1899 0,96530,5794 Spur_XP_782558__1593-1899 Supports: 85.8/98 Ctel_224799__1204-1524 0,9945 Ctel_224799__1204-1524 80.2/97 Lgig_136183__903-1202 0,8716 Lgig_136183__903-1202 ML: SH-aLRT/UFBS 95.1/100 Dmel_FBpp0291862__1970-2282 0,9998 Dmel_FBpp0291862__1970-2282 90.7/87 Dpul_64953__983-1290 0,9318 Dpul_64953__983-1290 BI: BPP Tadh_XP_002116995__416-664 Tadh_XP_002116995__416-664 Lcom_19261__815-1073 Lcom_19261__815-1073 67.2/80 97.4/100 0,4695 1 Distance: 78.8/73 Scil_8584__953-1149 0,9139 Scil_8584__953-1149 Mlei_ML274431a__735-1020 Mlei_ML274431a__735-1020 ML: 0.3 98.3/100 Mbre_XP_001747336__913-1118 1 Mbre_XP_001747336__913-1118 Sros_PTSG_00169__1244-1457 Sros_PTSG_00169__1244-1457 BI: 0.3 95.8/100 Apar_comp22018_c3_seq2m31906__341-532 0,9999 Apar_comp22018_c3_seq2m31906__341-532 46.6/77 Ihof_evm1s19__1329-1482 0,5848 Ihof_evm1s19__1329-1482 99/100 Apar_comp22018_c5_seq1m31910__1-85 1 Apar_comp22018_c5_seq1m31910__1-85 66/69 Ihof_evm1s874__632-792 0,8236 Ihof_evm1s874__632-792 75.9/49 99.8/100 Cfra_5129T1__1421-1654 1 Cfra_5129T1__1421-1654 43.4/65 Sarc_SARC_05971__1562-1803 0,9987 0,383 Sarc_SARC_05971__1562-1803 93.1/100 Apar_comp21739_c1_seq1m30766__542-694 0,9999 Apar_comp21739_c1_seq1m30766__542-694 74.7/69 99.9/100 Apar_comp21739_c1_seq2m30769__5-115 0,7032 1 Apar_comp21739_c1_seq2m30769__5-115 Ihof_evm7s240__125-316 Ihof_evm7s240__125-316 75.2/6491.8/98 Awhi_Unigene32540_m19981__82-240 0,4381 0,997 Awhi_Unigene32540_m19981__82-240 71.6/44 Pgem_Unigene32728_Pirum_gemmatam19855__168-290 Pgem_Unigene32728_Pirum_gemmatam19855__168-290 100/100 Cfra_2068T1__1370-1531 1 Cfra_2068T1__1370-1531 95.6/82 Sarc_SARC_16065_08769__6-68 0,9793 Sarc_SARC_16065_08769__6-68 99.3/100 Awhi_Unigene7112_m4152__225-423 1 Awhi_Unigene7112_m4152__225-423 95.1/98 Awhi_Unigene7113_m4154__225-424 0,9939 Awhi_Unigene7113_m4154__225-424 95.1/97 Pgem_Unigene33496_Pirum_gemmatam20228__689-851 0,2568 Pgem_Unigene33496_Pirum_gemmatam20228__689-851 91.4/99 Mvib_comp11329_c0_seq1_m15180__165-397 0,9979 Mvib_comp11329_c0_seq1_m15180__165-397 88/60 Mvib_comp15396_c1_seq1_m34759__42-221 0,7672 Mvib_comp15396_c1_seq1_m34759__42-221 Clim_evm142s210__1559-1764 Clim_evm142s210__1559-1764 0,2277 99.4/100 Nk52_evm98s2192__941-1146 1 Nk52_evm98s2192__941-1146 89.1/75 Sdes_comp15749_c0_seq1m4796__189-392 0,8301 Sdes_comp15749_c0_seq1m4796__189-392 Cowc_CAOG_04903__1143-1365 Cowc_CAOG_04903__1143-1365 99/94 Atha_AT1G55970__895-1098 0,9896 Atha_AT1G55970__895-1098 77.8/53 Atha_AT3G12980__1101-1317 0,4118 Atha_AT3G12980__1101-1317 98.7/100 Smoe_XP_002983745__995-1206 1 Smoe_XP_002983745__995-1206 6.4/38 Smoe_XP_002990513__442-673 0,1838 Smoe_XP_002990513__442-673 80.6/70 Bdis_1g34530__1086-1308 1 Smoe_XP_002964262__801-948 46.4/40 Bdis_3g03330__1137-1354 0,3541 Smoe_XP_002989547__224-436 99.8/100 Smoe_XP_002964262__801-948 0,5595 Bdis_1g34530__1086-1308 90.6/92 Smoe_XP_002989547__224-436 0,9305 Bdis_3g03330__1137-1354 93.1/100 Atha_AT1G16710__1130-1346 0,9954 Atha_AT1G16710__1130-1346 60.3/8952.9/76 Atha_AT1G79000__1121-1337 0,49480,6642 Atha_AT1G79000__1121-1337 99.3/100 Ppat_XP_001775575__1051-1271 1 Ppat_XP_001775575__1051-1271 74.2/95 Ppat_XP_001785357__517-736 0,7072 Ppat_XP_001785357__517-736 Atha_AT1G67220__821-1033 Atha_AT1G67220__821-1033 84.1/95 100/100 Bdis_1g23570__1944-2157 0,8878 1 Bdis_1g23570__1944-2157 Bdis_1g23570__717-930 Bdis_1g23570__717-930 94.4/98 Mpus_Micpu60218__698-869 0,9953 Mpus_Micpu60218__698-869 100/100 Crei_XP_001692031__554-721 1 Crei_XP_001692031__554-721 Vcar_XP_002958386__768-926 Vcar_XP_002958386__768-926 100/100Esil_CBN77397__3-126 1 Esil_CBN77397__3-126 88.3/95 Esil_CBN77398__2-91 0,8589 Esil_CBN77398__2-91 86.4/93 90.6/97 Esil_CBN77394__696-856 0,9632 0,9998 Esil_CBN77394__696-856 58/77 Tpse_XP_002292623__346-616 0,7597 Tpse_XP_002292623__346-616 99.2/100 Tpse_XP_002296425__1130-1288 1 Tpse_XP_002296425__1130-1288 77.7/82 Tpse_XP_002296560__188-343 0,8476 Tpse_XP_002296560__188-343 84.3/96 99.9/100 Alai_CCA24581__1445-1684 0,8027 1 Alai_CCA24581__1445-1684 86/93 Pinf_XP_002905234__1649-1845 0,9709 Pinf_XP_002905234__1649-1845 100/100 Alai_CCA27731__1119-1424 1 Alai_CCA27731__1119-1424 95.1/97 Pinf_XP_002998489__495-748 0,9987 Pinf_XP_002998489__495-748 79.2/89 Pinf_XP_002903949__738-864 0,9105 Pinf_XP_002903949__738-864 Pinf_XP_002997957__514-671 Pinf_XP_002997957__514-671 98.7/100 Ehux_441758__1-100 1 Ehux_441758__1-100 Ehux_469732__810-972 Ehux_469732__810-972

59.3/96 Lcom_52947_214_317 0,8645 Hsap_ENSP00000257309_1491_1594 81.6/92 Lgig_116244_3_106 0,5929 Hsap_ENSP00000379866_1467_1570 Ctel_226710_427_529 0,8284 Cint_ENSCINP00000015996_1541_1644 C. Type IV Collagen: ML/BI 79/82 50.8/85 Dpul_306823_280_383 0,368 Hsap_ENSP00000361290_1470_1572 82.1/88 Dmel_FBpp0078640_1557_1660 0,7852 Lcom_52947_214_317 85.5/85 Skow_XP_002742266_215_318 0,70720,8936 Lgig_116244_3_106 76.5/100 Hsap_ENSP00000257309_1491_1594 0,6742 Ctel_226710_427_529 86/93 92/90 Hsap_ENSP00000379866_1467_1570 Dpul_306823_280_383 Hsap_ENSP00000361290_1470_1572 0,8169 Skow_XP_002742266_215_318 69.4/51 Cint_ENSCINP00000015996_1541_1644 0,6814 Dmel_FBpp0078640_1557_1660 Spur_NP_999676_1528_1631 Metazoa 77.3/64 100/100 Nvec_XP_001620257_258_329 0,3699 Nvec_XP_001621603_477_551 0,5158 Ctel_200264_621_725 Ctel_227631_611_713 Choanoflagellata 82.7/3376.8/71 Adig_5493v122704_256_359 0,3056 1 Nvec_XP_001626265_5_108 1 Lgig_154576_1232_1335 Dpul_306783_268_373 0,2707 Lgig_229256_322_425 Filasterea Tadh_XP_002116296_1292_1395 24.5/89 Hsap_ENSP00000331902_1469_1574 84.4/37 Hsap_ENSP00000364979_1447_1552 Dpul_306783_268_373 86.3/99 Hsap_ENSP00000331902_1469_1574 Ichthyosporea 70.8/84 Hsap_ENSP00000379823_1447_1552 0,4458 Skow_XP_002742276_1459_1564 0,9944 Hsap_ENSP00000364979_1447_1552 78.2/92 0,8279 0,2772 Hsap_ENSP00000379823_1447_1552 Corallochytrium 84.2/68 Cint_ENSCINP00000016086_1525_1630 Skow_XP_002742276_1459_1564 Spur_NP_999631_1530_1635 47.5/7966.4/87 0,3455 0,657 Hmag_XP_002157001_1504_1606 Supports: 82.9/72 Hmag_XP_002157001_1504_1606 0,5428 Dmel_FBpp0078672_1517_1622 Lgig_188217_403_508 0,5825 Lgig_188217_403_508 61.7/98 Ctel_200264_621_725 Spur_NP_999631_1530_1635 ML: SH-aLRT/UFBS Ctel_227631_611_713 0,9227 0,5172 83.5/95 96.1/100 1 Nvec_XP_001620257_258_329 BI: BPP 99.8/100 Lgig_154576_1232_1335 0,8619 Nvec_XP_001621603_477_551 89.7/63 Lgig_229256_322_425 0,8043 Adig_5493v122704_256_359 Dmel_FBpp0078672_1517_1622 87.2/93 43.2/67 0,2908 Nvec_XP_001626265_5_108 Distance: Tadh_XP_002116296_1292_1395 Cint_ENSCINP00000016086_1525_1630 Spur_NP_999676_1528_1631 0,4469 1 Lcom_16924_1306_1411 ML: 0.4 100/100 Lcom_16924_1306_1411 Scil_12346_1313_1418 BI: 0.4 22.7/75 Scil_12346_1313_1418 1 Lcom_13694_1314_1418 98/100 Lcom_13694_1314_1418 0,4507 0,9921 Scil_10989_1312_1416 Scil_10989_1312_1416 Lcom_4786_1302_1406 88/99 0,9732 0,9999 79.5/77 94.6/100 Lcom_4786_1302_1406 Scil_11320_1319_1423 49.5/85 Scil_11320_1319_1423 0,7947 Ocar_g10977_t1_628_690 94.7/82 Ocar_g10977_t1_628_690 Ocar_g1079_t1_255_358 Ocar_g1079_t1_255_358 0,4072 Tadh_XP_002116198_68_171 Tadh_XP_002116198_68_171 Ocar_g4577_t1_104_221 83.4/93 0,7531 1 84.7/98 Lcom_73506_435_535 Ocar_g4577_t1_318_432 88.2/95 Lcom_102744_227_326 0,9766 Lcom_73506_435_535 40.5/72 Mvib_comp15306_c0_seq1_m34021_944_1043 0,9282 Lcom_102744_227_326 99.9/100 Ocar_g4577_t1_104_221 Mvib_comp15306_c0_seq1_m34021_944_1043 Ocar_g4577_t1_318_432 0,5975 Ctel_200264_730_837 Ctel_227631_719_824 63/99 Ctel_200264_730_837 0,9954 Ctel_227631_719_824 1 Lgig_154576_1340_1447 97.3/99 0,2623 Lgig_229256_430_537 100/100 Lgig_154576_1340_1447 69.5/75 Lgig_229256_430_537 0,3388 Ctel_226710_534_645 Ctel_226710_534_645 0,8095 Lcom_52947_322_433 50.1/47 0,7316 Lgig_116244_111_222 56.7/99 Lcom_52947_322_433 60.6/4879.4/64 Lgig_116244_111_222 Dmel_FBpp0078640_1665_1776 Dmel_FBpp0078640_1665_1776 0,4201 0,9921 Hsap_ENSP00000257309_1599_1710 Spur_NP_999676_1637_1745 0,9926 Hsap_ENSP00000379866_1575_1687 85.2/64 Hsap_ENSP00000257309_1599_1710 0,4555 Hsap_ENSP00000361290_1577_1689 88.3/100 0,40030,5102 Cint_ENSCINP00000015996_1649_1759 89.1/100 Hsap_ENSP00000379866_1575_1687 0,1815 Skow_XP_002742266_323_433 67.1/62 Hsap_ENSP00000361290_1577_1689 Spur_NP_999676_1637_1745 19.7/74 Cint_ENSCINP00000015996_1649_1759 0,2023 Dpul_306823_388_498 94.8/38 Skow_XP_002742266_323_433 0,9201 Adig_5493v122704_363_474 90.8/100 Lcom_13694_1424_1532 0,2312 0,9973 Hmag_XP_002157001_1611_1720 100/100 Scil_10989_1421_1532 Nvec_XP_001626265_113_223 74.4/37 86/98 Lcom_4786_1411_1520 Cint_ENSCINP00000016086_1635_1744 31.8/34 Scil_11320_1428_1538 1 Lcom_16924_1416_1522 Dpul_306823_388_498 0,9993 Scil_12346_1423_1529 87.5/36 85.7/96 Adig_5493v122704_363_474 0,3174 Ocar_g1079_t1_363_475 Hmag_XP_002157001_1611_1720 0,3455 97.2/99 0,8556 Dpul_306783_378_487 Nvec_XP_001626265_113_223 0,1345 Dmel_FBpp0078672_1627_1735 Cint_ENSCINP00000016086_1635_1744 0,11230,3149 Skow_XP_002742276_1569_1678 99.1/100 Lcom_16924_1416_1522 Spur_NP_999631_1640_1749 96.9/99 Scil_12346_1423_1529 Lgig_188217_513_622 94.4/88 0,7715 0,0875 69/75 Ocar_g1079_t1_363_475 0,8138 Hsap_ENSP00000364979_1556_1666 64.5/98 Dpul_306783_378_487 0,9257 Hsap_ENSP00000379823_1557_1667 36/32 Dmel_FBpp0078672_1627_1735 Hsap_ENSP00000331902_1579_1689 77.7/37 Spur_NP_999631_1640_1749 0,9959 Lcom_13694_1424_1532 51.6/35 Skow_XP_002742276_1569_1678 0,3607 0,9997 Scil_10989_1421_1532 0/40 0/23 Lgig_188217_513_622 0,9966 Lcom_4786_1411_1520 80.6/98 Hsap_ENSP00000364979_1556_1666 Scil_11320_1428_1538 93.5/97 Hsap_ENSP00000379823_1557_1667 0,341 Tadh_XP_002116296_1400_1509 Hsap_ENSP00000331902_1579_1689 0,7531 0,7094 Lcom_73506_549_648 83.4/93 0/76 Tadh_XP_002116296_1400_1509 0,2615 Tadh_XP_002116198_177_287 92.5/93 Lcom_73506_549_648 Lcom_102744_332_405 Tadh_XP_002116198_177_287 0,9994 Ocar_g4577_t1_223_296 Mvib_comp15306_c0_seq1_m34021_1050_1121 0,7022 Ocar_g4577_t1_8_84 59.2/85 Mvib_comp15306_c0_seq1_m34021_1050_1121 79.1/79 Lcom_102744_332_405 94.9/99 Ocar_g4577_t1_223_296 Ocar_g4577_t1_8_84

Figure 9—figure supplement 1. Phylogenetic analysis of the (A) LIM-Homeobox, (B) p300/CBP, and (C) Collagen Type IV, using Maximum likelihood in IQ-TREE (supports are SH-like approximate likelihood ratio test/UFBS, respectively) and Bayesian inference in Mr. Bayes (BPP statistical supports). Figure 9—figure supplement 1 continued on next page

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 26 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

Figure 9—figure supplement 1 continued DOI: 10.7554/eLife.26036.035

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 27 of 28 Research article Genes and Chromosomes Genomics and Evolutionary Biology

Transcription Metazoa Metazoa + Filozoa Holozoa Opisthokonta Unikonta/ factors Choanoflagellata Amorphea ARID 0.022 RFX_DNA_binding, 0.214 Tudor-knot 0.149 PLU-1 RBB1NT, DUF3518 bZIP_1 0.048 pKID, Jun CSD 0.254 zf-CCHC CUT 0.198 Homeobox Ets 0.104 SAM_PNT GATA 0.537 BAH, ELM_2 HLH 0.281 PAS_3, 0.222 MITF_TFEB_C_3_N 0.001 Response_reg, Hairy_orange CRAL_TRIO, PAS, PAS_9, PAS_11 HMG_box 1.000 SOXp Homeobox 0.001 OAR, SIX1_SD, 0.446 LIM Pou, PAX, PBC, CUT, HNF-1_N Homeobox_KN 0.254 Meis_PKNOX_N HTH_psq 0.168 DDE_1, HTH_Tnp_Tc5 IRF 0.036 IRF-3 IRF-3 0.036 IRF LAG1-DNAbind 0.020 BTD MH1 0.000 MH2 Myb_DNA-binding 0.664 DnaJ, 0.345 RAC_head 0.305 ZZ SWIRM-assoc_3 NDT80_PhoG 0.018 MRF_C1, Peptidase_S74 P53 0.020 SAM_2 0.044 P53_tetramer, SAM_1 RFX_DNA_binding 0.136 ARID Runt 0.030 Ank_4 SRF-TF 0.044 HJURP_C zf-BED 0.281 Dimer_Tnp_hAT zf-C2H2 0.332 SET, zf-C2H2_4, 0.105 zf-C2H2_6 zf-H2C2_5, zf-met, zf-H2C2_2 zf-C2HC 0.136 MOZ_SAS zf-C4 0.600 Hormone_recep zf-GRF 0.537 Rnase_T 0.305 DUF2439, AAA_12 zf-MIZ 0.071 PINIT 0.030 SAP 0.026 PINIT zf-TAZ 0.114 Bromodomain, DUF902, KIX

Figure 10. Domain combinations that appear in transcription factor (TF) families in unicellular premetazoans, from the LCA of Unikonta/Amorphea to the LCA of Metazoa. First and second columns indicate the TF family and its inferred evolutionary origin, respectively (from [de Mendoza et al., 2013]). Subsequent columns list (i) the p-value of a Fisher’s exact test for the relative enrichment of that TF family in that node of the tree (compared to other domains that rearrange there; p-values<0.05 in green); and (ii) the accessory domains that appear within each TF family. Figure 7—source data 2, Table 1. DOI: 10.7554/eLife.26036.036 The following source data is available for figure 10: Source data 1. Probability of emergence of protein domain combinations present in the LCA of Metazoa in previous ancestral nodes (from LCA of Met- azoa to LCA of Unikonta/Amorphea). DOI: 10.7554/eLife.26036.037

Grau-Bove´ et al. eLife 2017;6:e26036. DOI: 10.7554/eLife.26036 28 of 28