The 4-Celled Tetrabaena socialis Nuclear Genome Reveals the Essential Components for Genetic Control of Cell Number at the Origin of Multicellularity in the Volvocine Lineage
Item Type Article
Authors Featherston, Jonathan; Arakaki, Yoko; Hanschen, Erik R; Ferris, Patrick J; Michod, Richard E; Olson, Bradley J S C; Nozaki, Hisayoshi; Durand, Pierre M
Citation Jonathan Featherston, Yoko Arakaki, Erik R Hanschen, Patrick J Ferris, Richard E Michod, Bradley J S C Olson, Hisayoshi Nozaki, Pierre M Durand; The 4-Celled Tetrabaena socialis Nuclear Genome Reveals the Essential Components for Genetic Control of Cell Number at the Origin of Multicellularity in the Volvocine Lineage, Molecular Biology and Evolution, Volume 35, Issue 4, 1 April 2018, Pages 855–870, https://doi.org/10.1093/molbev/msx332
DOI 10.1093/molbev/msx332
Publisher OXFORD UNIV PRESS
Journal MOLECULAR BIOLOGY AND EVOLUTION
Rights © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.
Item License http://rightsstatements.org/vocab/InC/1.0/ Version Final accepted manuscript
Link to Item http://hdl.handle.net/10150/628254
Jonathan Featherston*1,2, Yoko Arakaki3, Erik R. Hanschen4, Patrick J. Ferris4, Richard E. Michod4, Bradley J.S.C. Olson5, Hisayoshi Nozaki3, Pierre M. Durand1
Affiliations: 1 Evolutionary Studies Institute, University of the Witwatersrand, South Africa. 2 Agricultural Research Council, Biotechnology Platform, Pretoria 0040, South Africa. 3 Department of Biological Sciences, Graduate School of Science, University of Tokyo, Hongo, Bunkyo-ku, Tokyo, Japan. 4 Department of Ecology and Evolutionary Biology, University of Arizona, United States of America. 5 Division of Biology, Kansas State University, United States of America. #Correspondence to: [email protected]
Abstract
Multicellularity is the premier example of a major evolutionary transition in individuality and was a foundational event in the evolution of macroscopic biodiversity. The volvocine chlorophyte lineage is well suited for studying this process. Extant members span unicellular, simple colonial, and obligate multicellular taxa with germ-soma differentiation. Here, we report the nuclear genome sequence of one of the simplest and most deeply rooted organisms in the lineage, the 4-celled colonial Tetrabaena socialis, and compare it to the three other complete volvocine nuclear genomes. Using conservative estimates of gene family expansions, we identified a minimal set of expanded gene families associated with the origin of multicellularity. These families are rich in genes related to developmental processes. Multiple lines of evidence associate modifications to the ubiquitin proteasomal pathway (UPP) with the beginning of coloniality. Genes undergoing positive or accelerating selection in the multicellular volvocines were found to be enriched in components of the UPP, and gene families gained at the origin of multicellularity include components of the UPP. A defining feature of colonial/multicellular life cycles is the genetic control of cell number. The genomic data presented here, which include diversification of cell cycle genes and modifications to the UPP, align the genetic components with the evolution of this trait.
Introduction
The evolution of multicellular organisms from unicellular ancestors is an example of a transition in the unit of selection and evolution, in this case, from the level of the cell to the level of the multicellular group. Explaining such a major evolutionary transition (sensu Maynard Smith and Szathmáry 1995) on evolutionary principles is a challenge, because explanatory concepts such as fitness and individuality are themselves changing during the transition. Critical in this endeavour to explain the origin of multicellular life is understanding the initial steps involved in the transition from a unicellular life cycle to a life cycle involving groups of cells (Maliet, et al. 2015; Rashidi, et al. 2015). The volvocine green algae provide a compelling model system to deconstruct these early steps (Starr 1968; Kirk and Harper 1986; Kirk 1999, 2005; Herron and Michod 2008; Herron 2009). The value of the volvocines as a model system is due to (i) the relatively recent evolution of multicellularity in the lineage (~234 million years ago (MYA)) (Herron, et al. 2009) when compared to lineages such as land plants and metazoa (600-1000 MYA) (Sharpe, et al. 2015), (ii) the availability of extant taxa with varying levels of organismal complexity and (iii) phylogenetic relationships that are sufficiently resolved for meaningful comparative analyses (Larson, et al. 1992; Coleman 1999; Nozaki, et al. 2000; Nakada, et al. 2008; Herron, et al. 2009; Nozaki, et al. 2014; Nozaki, et al. 2016).
The volvocine algae form a monophyletic lineage within the Reinhardtinia clade of the order Chlamydomonadales (Nakada, et al. 2008). Members range from single-celled species (e.g., Chlamydomonas reinhardtii), through simple colonial forms where multicellularity can be facultative (e.g. the 4-celled Tetrabaena socialis and the 16-celled Gonium pectorale), to complex multicellular taxa with germ-soma differentiation (e.g.
Volvox carteri) (Nishii and Miller 2010; Coleman 2012). Multicellular members of the lineage are divided into three families: the Tetrabaenaceae (e.g. T. socialis), the Goniaceae (e.g. G. pectorale) and the Volvocaceae (e.g. Pleodorina starrii and V. carteri) (Coleman 1999; Nozaki, et al. 2000). Comparative genomic analyses of C. reinhardtii (Merchant, et al. 2007), G. pectorale (Hanschen, et al. 2016) and V. carteri f. nagariensis (hereafter V. carteri) (Prochnik, et al. 2010) have revealed protein-coding potential to be similar across the lineage. Consequently, research into volvocine developmental complexity has largely focused on gene exaptation/co-option rather than protein-coding novelty (Miller and Kirk 1999; Kirk 2001; Nishii, et al. 2003; Duncan, et al. 2006; Hanschen, et al. 2014; Hanschen, et al. 2016; Olson and Nedelcu 2016; Grochau-Wright, et al. 2017). Genome comparisons have furthered our understanding of some of the important developmental traits. These include the association between expanded extracellular matrix (ECM) genes and morphological expansion of the ECM in V. carteri (Prochnik, et al. 2010; Hanschen, et al. 2016), and the finding that the master regulator of germ-soma differentiation in V. carteri, regA (Kirk, et al. 1999), and closely related regA-cluster genes (Duncan, et al. 2006; Hanschen, et al. 2014) originated near the split between the Goniaceae and Volvocaceae (Hanschen, et al. 2014; Hanschen, et al. 2016; Grochau-Wright, et al. 2017). In accordance with the expectation that the transition to multicellular life involves construction of a life cycle at the level of the cell group (Maliet, et al. 2015), previous work in the volvocine algae, including genomic analyses, has shown that cell cycle genes are among the earliest diversifying gene families (Prochnik, et al. 2010; Hanschen, et al. 2016). These include the expansion of CYCD1 genes relative to C. reinhardtii in G. pectorale and V. carteri (Prochnik, et al. 2010; Hanschen, et al.
2016) and a role for the retinoblastoma pathway in cell cycle modifications associated with multicellularity (Hanschen, et al. 2016). The palintomic multiple-fission life cycle of the unicellular C. reinhardtii includes a brief phase as a group where division has occurred but separation is incomplete. This brief phase is rapidly followed by separation during which there is no growth. In contrast, cell growth in T. socialis occurs in the group context. The order of cell cycle events has changed in T. socialis such that the separation stage occurs after cell growth and before division in the group life cycle instead of before growth in the unicellular cycle (Maliet, et al. 2015). Another major difference between the multicellular program of division and the unicellular program is that the maximum number of divisions (n) is genetically determined in the multicellular taxa and not in C. reinhardtii.
To date, sequenced colonial and multicellular volvocines are from the Goniaceae and Volvocaceae families. The best-studied member of the most deeply rooted Tetrabaenaceae family (fig. 1) is the 4-celled T. socialis (supplementary fig. S1), which is amongst the “simplest” of known colonial multicellular eukaryotes (Nozaki 1986; Arakaki, et al. 2013). The nuclear genome of T. socialis is analysed here (organelle genomes have been described elsewhere (Featherston, et al. 2016)). Our analyses confirm the limited differences in protein-coding potential between the unicellular C. reinhardtii and the colonial/multicellular taxa. A small set of orthologous gene families was identified that arose at the evolution of colonial living, and an even smaller set of families showed expansion in colonial/multicellular taxa. Although few families were gained or expanded, these include a number of genes with putative functions related to DNA repair or surveillance of DNA damage, or with homology to metazoan oncogenes.
We show here that this diversification of cell cycle genes began at the earliest possible stage with the evolution of a 4-celled Tetrabaena colony and that the genetic determination of colony size was likely one of the earliest group properties to evolve. Methods to detect positive and accelerating selection, as well as other findings, identified numerous genes involved at most points in the ubiquitin proteasomal pathway (UPP) and we discuss a role for the UPP in the genetic control of cell number.
Results and Discussion
Genome Assembly
A final Tetrabaena socialis (NIES-571) genome assembly was generated after combining Allpaths-LG (Gnerre, et al. 2011) and SPAdes (Bankevich, et al. 2012) assemblies from coverage in excess of 400X. The combined meta-assembly produced 20,363 contigs and 5,856 scaffolds, and totalled 133 Mb in length. The N50 values for contigs and scaffolds were 7.4 kb and 145.9 kb respectively (table 1). The T. socialis genome is highly GC-rich, with a GC content of 66%. K-mer analyses with Jellyfish (v2.0.0) (Marcais and Kingsford 2011) and GenomeScope (Vurture, et al. 2017) (e.g. supplementary fig. S2), Allpaths-LG and BBtools (http://jgi.doe.gov/data-and-tools/bbtools/) did not converge on the same estimated genome size or repeat content (supplementary table 1). The genome is estimated to be at least ~120 Mb, with repeat content in excess of 20%, which is similar to C. reinhardtii (19%) and V. carteri (20%) (Prochnik, et al. 2010).
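As an illustration of the N50 statistic reported above, the following minimal sketch computes N50 from a list of sequence lengths. It is not the code used to produce table 1; the function name and the toy lengths are hypothetical. N50 is taken here as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to half of the
        # total assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths supplied")


# Toy scaffold lengths (hypothetical, not taken from the T. socialis assembly).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Because N50 depends only on the length distribution, the same function applies to contigs and to scaffolds.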
Transcriptome Assembly and Annotation
A de novo transcriptome was generated using Bridger De Novo (r2014-12-01) (Chang, et al. 2015) from 75 million paired RNAseq reads (v4 SBS 2x125bp chemistry). TransRate (1.0) (Smith-Unna, et al. 2016) was used to evaluate assembly quality and to filter contigs (n=23,559). The TransRate Optimal Assembly Score was 0.362. By comparison, 50% of 155 publicly available de novo transcriptomes have been shown to have a TransRate Optimal Assembly Score of ≤0.35 (Smith-Unna, et al. 2016). A total of 21,823 (92%) assembled transcripts mapped to the genome assembly (90% mapped with >95% of contig length). Computationally derived annotations were generated using the MAKER2 pipeline (v2.31.3) (Holt and Yandell 2011), with evidence for computational predictions derived from de novo assembled transcripts, protein sequences from publicly available volvocine genomes (from C. reinhardtii, G. pectorale and V. carteri, supplementary table 2) and the SWISS-PROT database (Boutet, et al. 2016) of curated proteins. Automated annotations produced 15,513 protein-coding loci, a value similar to that of other volvocines (table 1). A total of 65% of proteins modelled by MAKER2 possessed at least 1 Pfam-A domain (Finn, et al. 2014), and 93.8% of Eukaryotic Clusters of Orthologous Genes (KOG) domains (Parra, et al. 2007) could be identified in the predicted proteome. Non-redundant homologs for at least 91% of C. reinhardtii KOG proteins (n=406) (Parra, et al. 2007) were identified in the T. socialis predicted proteome, indicating that at least 90% of genes were modelled. Manual annotation of gene families (e.g. pherophorins or supplementary table 24 orthologues) showed similar levels of model coverage. The average intron length in T. socialis is greater than that of other volvocines (table 1).
This is likely due to a combination of the highly repetitive nature of the genome and assembly artefacts that result from inflated average mate-pair insert distances spanning exons. Using OrthoMCL (v2.09) and OrthoVenn, 5,199 orthologous proteins shared by all volvocines were identified (supplementary fig. 3). BLASTP searches against volvocine proteins as well as UNIPROT databases identified significant matches (E = 10^-7) for 90% of T. socialis predicted proteins (supplementary table 3), and the percentage of reciprocal BLAST matches to proteins from the model organism C. reinhardtii was comparable between T. socialis, G. pectorale and V. carteri (supplementary table 4).
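The E-value filtering step described above can be sketched as follows. This is a hedged illustration rather than the pipeline used in this study: it assumes BLAST output in the default 12-column tabular layout (in which the E-value is the eleventh column), treats the threshold as "E-value at or below 10^-7", and uses hypothetical sequence identifiers and a hypothetical function name.

```python
# Index of the E-value column in BLAST's default 12-column tabular output
# (assumed layout; adjust if a custom column order was requested).
EVALUE_COLUMN = 10


def queries_with_significant_hit(tabular_lines, max_evalue=1e-7):
    """Return the set of query IDs with at least one hit at or below max_evalue."""
    hits = set()
    for line in tabular_lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[EVALUE_COLUMN]) <= max_evalue:
            hits.add(fields[0])
    return hits


# Toy records with hypothetical identifiers (not real T. socialis results).
toy_hits = [
    "tso_0001\tcre_A\t78.2\t310\t60\t3\t1\t310\t5\t314\t2e-150\t432",
    "tso_0002\tcre_B\t35.0\t80\t50\t2\t10\t89\t40\t119\t0.003\t38.1",
    "tso_0002\tvca_C\t61.4\t220\t80\t4\t1\t220\t1\t218\t4e-60\t201",
    "tso_0003\tgpe_D\t29.9\t67\t45\t1\t5\t71\t12\t78\t1.2\t27.3",
]
matched = queries_with_significant_hit(toy_hits)
all_queries = {line.split("\t")[0] for line in toy_hits}
print(sorted(matched), len(matched) / len(all_queries))
```

Dividing the number of matched queries by the number of predicted proteins gives the kind of percentage reported in supplementary table 3.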
Domain content
Comparing domains in C. reinhardtii with those of the multicellular taxa (T. socialis, G. pectorale, and V. carteri) in a 2-way comparison identified only 61 domains as significantly enriched (p=0.05). Of these, only 18 were enriched in the multicellular taxa (the other 43 were enriched in C. reinhardtii). A further 30 domains were identified as present in T. socialis, G. pectorale and V. carteri but absent from the proteome of C. reinhardtii (supplementary table 5). Domains that were significantly enriched (n=61) and domains unique to the multicellular taxa (n=30) were combined (total n=91) and examined for their presence in selected eukaryotic model species (supplementary table 2, supplementary fig. S4). The majority of these domains were not volvocine-specific or indeed chlorophyte-specific (supplementary fig. S5); rather, most (n=59) were of a more universal eukaryotic origin (supplementary fig. S5). These findings are consistent with analyses of previous volvocine genomes, which suggested that extensive domain novelty was not a major feature of the evolution of multicellularity in the volvocines (Prochnik, et al. 2010; Hanschen, et al. 2016). With the exception of the gametolysin domain (Hanschen, et al. 2016), domains enriched between C. reinhardtii and V. carteri (Prochnik, et al. 2010) mostly show loss in V. carteri (supplementary table 6), and this is particularly evident for ankyrin domains (supplementary table 7).
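A 2-way enrichment comparison of the kind described above can be illustrated with a one-sided Fisher's exact test on a 2x2 table of domain counts. The passage does not name the statistical test that was applied, so the choice of Fisher's exact test is an assumption made purely for illustration, and the function name and counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: proteins with the domain in group 1, b: without it in group 1,
    c: proteins with the domain in group 2, d: without it in group 2.
    Returns P(X >= a) under the hypergeometric null, i.e. the probability
    of seeing at least `a` domain-bearing proteins in group 1 by chance.
    """
    group1 = a + b          # number of proteins in group 1
    with_domain = a + c     # domain-bearing proteins in both groups
    total = a + b + c + d
    upper = min(group1, with_domain)
    tail = sum(
        comb(with_domain, x) * comb(total - with_domain, group1 - x)
        for x in range(a, upper + 1)
    )
    return tail / comb(total, group1)


# Hypothetical counts: the domain occurs in 3 of 3 proteins in group 1
# and in 0 of 3 proteins in group 2.
print(fisher_enrichment_p(3, 0, 0, 3))
```

In a proteome-wide scan the test would be repeated for every domain, so the resulting p-values would normally be corrected for multiple testing before a threshold is applied.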
Phylogenetic reconstruction of gene family expansion and contraction
OrthoMCL (2.09) (Li, et al. 2003) was used to generate gene family clusters from the proteomes of all available genome-sequenced chlorophyta (n=13) (supplementary table 2). Following Hanschen et al. (2016), parsimony reconstruction of gene family expansion and contraction was performed using the Count software package (v10.04) (Csuros 2010) with Dollo and Wagner’s parsimony (symmetric and asymmetric). We focused on the results of Wagner’s symmetric parsimony as this produced the most conservative set of gene families expanded at the origin of multicellularity (supplementary table 8). The largest expansion of gene families (n=2657) in the phylogeny was for the Chlorophyceae (at the split between the chlorophyte classes Chlorophyceae and Trebouxiophyceae), which points towards substantial protein-coding diversification and increasing complexity in this class since separating from other lineages (fig. 2) (Hanschen, et al. 2016). However, it remains unknown whether this increased proteome complexity was important for the evolution of multicellularity. Using results from Wagner’s symmetric parsimony, a set of 131 gene families was found to have been gained at the origin of volvocine coloniality/multicellularity (fig. 2). For simplicity these genes were labelled as the “origin of multicellularity” (ORIMC) gene set (supplementary table 9). Argot2 (http://www.medcomp.medicina.unipd.it/Argot2/) (Falda, et al. 2012), InterProScan (Quevillon, et al. 2005) and BLASTP (Altschul, et al. 1997) homology to C. reinhardtii proteins were used to annotate the ORIMC gene set. A number of families that originate at the ORIMC are candidates for further exploration, many of which have putative functions related to developmental or regulatory processes.
Of particular interest may be those homologous to DNA repair, DNA damage surveillance and cancer genes, including: a breast cancer susceptibility 1-like (BRCA1-like) gene, a colon cancer-associated protein Mic1-like gene, RECQ-helicases, structural maintenance of chromosome 3 proteins (with a recF/recN domain), MutL C-terminal dimerization domain containing proteins and a family of proliferating cell nuclear antigen domain containing proteins (Strzalka and Ziemienowicz 2011). In relation to the Ubiquitin Proteasomal Pathway (UPP), a family of AAA-type ATPases with a possible role in proteasomal degradation was gained (Bar-Nun and Glickman 2012). Of the 131 ORIMC families, homology (E = 10^-10) to C. reinhardtii proteins was not identified for proteins from 30 families. Using TBLASTN to search the C. reinhardtii genome with proteins from these 30 families, it was found that 16 gene families had no homology at E = 10^-5 while a further 5 showed no homology at E = 10^-10. There were 2 instances where alignments indicated that an orthologue is present in C. reinhardtii, while all other BLAST alignments (E < 10^-10) were partial or due to repetitive motifs. Overall, only 8 gene ontology (GO) terms could be assigned to the 30 families by Argot2 and, furthermore, BLASTP searches against the RefSeq non-redundant (O'Leary, et al. 2016) and UNIPROT-TREMBL databases (Schneider, et al. 2009) identified only 3 families where matches were not to volvocine proteins, suggesting that most of the 30 families were unlikely to have been gained by horizontal transfer events. Families with no homology to C. reinhardtii proteins were likely either lost in C. reinhardtii or arose de novo near the origin of multicellularity. Homology to C. reinhardtii proteins could be identified for proteins from the remaining ORIMC families (n=101); however, it was not determined for how many of these a C. reinhardtii orthologue might be present. The exclusion of C.
reinhardtii proteins from these 101 OrthoMCL Markov family clusters is most likely accounted for by protein diversification, including paralogue divergence following gene duplication events, and is not suggestive of de novo gene formation. Gene families expanded in the multicellular taxa relative to C. reinhardtii numbered only 27 (supplementary table 10). Amongst the expanded families was a family of MutL-related genes and, furthermore, enrichment analysis of expanded families using C. reinhardtii v5 annotations (Lopez, et al. 2011; Blaby, et al. 2014) identified the KEGG pathway “Mismatch repair”. In terms of gene function, families of protein kinases and DNA repair-related genes were identified both amongst families gained and amongst families expanded at the ORIMC. Because protein kinases are molecules with diverse functions, their diversification is unsurprising (Francino 2005), and protein kinase expansions have previously been associated with the evolution of multicellularity or increasing organismal complexity in other lineages (Rensing, et al. 2008; Stajich, et al. 2010; de Mendoza and Ruiz-Trillo 2011; Miller 2012; Schultheiss, et al. 2012; Glockner, et al. 2016). The expansion or diversification of DNA repair genes (Rad51-type) has been identified in the multicellular brown alga Ectocarpus siliculosus (Cock, et al. 2010), although its importance is not known, and the significance of gained/expanded families of DNA repair-related genes in the multicellular volvocines requires further investigation. There is currently no evidence to suggest that replication fidelity is improved in multicellular volvocines, and the general trend across life suggests that mutation rates rise with increased organismal complexity (Lynch, et al. 2016). This does not, however, preclude specific exceptions to this general trend.
Given that increased organismal complexity potentially increases the risk of harm from mutations and mutator genotypes (Lynch 2008), gains in DNA repair-related genes in the multicellular taxa might reflect increased redundancy in repair mechanisms. Both protein kinase signalling and DNA repair pathways may play a role in regulating the cell cycle (Haring, et al. 1995; Branzei and Foiani 2008).
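The logic of placing a gene family gain on a phylogeny can be illustrated with Dollo parsimony, the simplest of the criteria mentioned in this section. Under Dollo parsimony a family is gained exactly once, at the most recent common ancestor of every taxon that carries it. The sketch below is not the Count software and does not implement Wagner parsimony; the four-taxon tree is a simplified assumption consistent with the relationships described in this manuscript, and all function and variable names are hypothetical.

```python
def leaves(node):
    """Return the set of leaf names under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def dollo_gain_clade(tree, present):
    """Return the leaf set of the smallest clade containing every taxon in `present`.

    Under Dollo parsimony the family is inferred to have been gained once,
    on the branch leading to this clade.
    """
    present = set(present)
    if not present or not present <= leaves(tree):
        raise ValueError("`present` must be a non-empty subset of the tree's leaves")
    node = tree
    while not isinstance(node, str):
        # Descend while a single child clade still holds every carrier taxon.
        containing = [child for child in node if present <= leaves(child)]
        if not containing:
            break
        node = containing[0]
    return leaves(node)


# Simplified, assumed topology of the four sequenced volvocines.
TREE = ("C_reinhardtii", ("T_socialis", ("G_pectorale", "V_carteri")))

# A family found in T. socialis and V. carteri maps to the ancestor of all
# three multicellular taxa, implying a later loss in G. pectorale.
print(sorted(dollo_gain_clade(TREE, {"T_socialis", "V_carteri"})))
```

Wagner parsimony differs in that it allows repeated gains and penalises gains and losses explicitly, which is why the two criteria can yield gene sets of different sizes.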
Protein orthologues with evidence of codon-level positive selection
Codeml (Yang 2007) site models of positive selection were applied within the Potion pipeline (v1.1.2) (Hongo, et al. 2015) to identify evidence for positive selection amongst orthologue families. Analysis was restricted to OrthoMCL families without paralogues, each with a single protein from each of the four volvocine taxa. In total, 73 family clusters with evidence of positive selection were identified (q=0.05; M7/M8 models; supplementary table 11). It is expected that substantially more families with evidence of positive selection would be detected if paralogues were included. GO enrichment analysis applied to C. reinhardtii v5.5 annotations (Blaby, et al. 2014) from the 73 families identified 82 enriched terms. Most enriched terms were related to metabolic processes (supplementary table 12). Selection acting upon metabolic processes might be evidence of metabolic adaptations to group living, since the initial step of group formation is stressful to organisms due to the challenges of dealing with dying cells and with nutrient and waste exchange (Durand, et al. 2016; Sathe and Durand 2016). In addition, terms associated with developmental processes were amongst the enriched terms. A number of enriched terms were related to the UPP, ranging from targeting for proteolysis by ubiquitination to proteasomal components. In addition, the MapMan annotation term (Klie and Nikoloski 2012) “protein degradation ubiquitin” and the KEGG pathway “Proteasome” were enriched. Genes from the 73 families with putative functions of interest include genes related to mRNA processing/splicing and polymerization, protein translation, protein shuttling, Ca2+ sensing and signalling, UPP-related genes, chromatin-remodelling and FIZZY-related kinase activity.
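The M7 versus M8 comparison mentioned above is conventionally evaluated with a likelihood ratio test (LRT) in which twice the difference in log-likelihood is compared against a chi-square distribution with 2 degrees of freedom. The sketch below shows that calculation only; it is not taken from the Potion pipeline, the log-likelihood values are hypothetical, and the function name is an illustrative assumption. For 2 degrees of freedom the chi-square survival function has the closed form exp(-x/2), so no statistics library is needed.

```python
from math import exp


def lrt_m7_m8(lnl_m7, lnl_m8):
    """Likelihood ratio test of codeml site model M8 (selection) against M7 (null).

    Returns the test statistic 2 * (lnL_M8 - lnL_M7) and its p-value under a
    chi-square distribution with 2 degrees of freedom.
    """
    # M8 nests M7, so a negative difference can only arise from optimisation
    # noise; it is clamped to zero here.
    statistic = max(0.0, 2.0 * (lnl_m8 - lnl_m7))
    # Survival function of chi-square with 2 degrees of freedom.
    p_value = exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical log-likelihoods for one orthologue family.
statistic, p_value = lrt_m7_m8(lnl_m7=-1234.5, lnl_m8=-1228.5)
print(statistic, p_value)
```

The per-family p-values from such tests would then be adjusted for multiple testing to give the q-values quoted in the text.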
Genome-level scans for conservation and accelerating selection
A whole-genome multiple sequence alignment of C. reinhardtii, T. socialis, G. pectorale, and V. carteri was generated, referenced to C. reinhardtii and examined for signatures of selection using the PHAST suite of algorithms (v1.1) (Hubisz, et al. 2011). Highly conserved non-coding elements identified with PhastCons in the volvocine genomes were remarkably sparse: only 101 elements of average length 58.9 bp were identified (5,951 bp in total) (supplementary table 13 and supplementary table 14). This was almost entirely due to the minimal alignment stemming from non-coding sequences, which amounted to only 57,898 bp. The paucity of alignments stemming from non-coding regions can be explained by a high degree of non-coding genome diversification, simple repeat masking (including removal of duplicate alignments) and the gap content of volvocine genome assemblies, which is highest in the T. socialis (27%) and G. pectorale (21%) assemblies. PhyloP (Pollard, et al. 2010) was used to identify features that, relative to C. reinhardtii, showed evidence of accelerating selection in the multicellular taxa. Features were divided according to C. reinhardtii v5.5 annotations into genes (whole genes including untranslated regions), CDS, 3’UTR, 5’UTR and intergenic regions. With false discovery rate correction (q=0.05), 129 genes were identified as accelerating in the multicellular taxa relative to C. reinhardtii (supplementary table 15), of which 24 were accelerating faster than the neutral rate (the remaining 105 genes were accelerating in the multicellular taxa relative to C. reinhardtii but not faster than the neutral rate). Amongst the 24 more rapidly evolving genes were the retinoblastoma (MAT3/RB) gene and an SPC97/SPC98 member of the spindle pole body component (also known as Gamma tubulin interacting protein 2 (GCP2)).
GCP genes in plants are involved in microtubule cytoskeleton organisation and spindle integrity (Janski, et al. 2012). In addition, the tubulin beta chain 1 gene was accelerating relative to C. reinhardtii. Both genes may be of interest for further investigation because microtubule interactions are important for volvocine developmental traits such as embryo inversion in V. carteri (Nishii, et al. 2003). GO enrichment analysis of the 129 genes identified enriched annotations such as “generative cell differentiation”, “regulation of cell cycle” and “actin filament-based movement”. Enriched molecular functions included “cyclin binding” and “ubiquitin activating enzyme activity”. Enriched MapMan terms included “RNA transcription”, “cell organisation”, “protein degradation ubiquitin E1” and “cell division” (supplementary table 16). The term “accelerating selection” implies that loci are evolving more rapidly than the neutral rate or, for branch tests, that loci are evolving more rapidly on a particular phylogenetic branch than on the rest of the tree. Acceleration might indicate positive selection but may also indicate relaxation of purifying selection and/or even ultimately pseudogenisation. Therefore, to determine whether genes accelerating in the multicellular taxa showed evidence of positive selection, Potion analysis was repeated specifically for families containing accelerating genes, but with slightly relaxed filters (see supplementary text) and with the inclusion of families containing paralogues. The 129 genes were associated with 124 OrthoMCL families, of which 67 were valid within the Potion pipeline and 25 (37%) displayed evidence of positive selection (q-value=0.05). We considered this to be a substantial proportion because accelerating genes without evidence for positive selection might be undergoing pseudogenisation, or acceleration may be occurring in non-coding parts of gene sequences rather than in CDS.
A total of 42 CDS features were identified (from 38 genes; supplementary table 17) as accelerating in the multicellular taxa relative to C. reinhardtii, with 16 accelerating faster than the neutral rate and 6 that were also identified as accelerating in the whole-gene analysis. Accelerating CDS features of interest include a putative chromatin remodelling gene and a cytoplasmic dynein heavy chain. The ubiquitin activating enzyme 2, ubiquitin ligase 1, GCP2, tubulin beta chain 2, a putative transcription regulator and a putative histone protein corroborate the whole-gene analysis and indicate that the coding regions of these genes are accelerating. GO enrichment analysis identified the cellular compartment terms “microtubule organising centre”, “SWI/SNF complex” and “SWI/SNF-type complex” and the molecular function term “ubiquitin activating enzyme activity” as enriched. As per the gene-level analysis, accelerating CDS regions were examined for positive selection using Potion for 34 OrthoMCL families, which were filtered to 14 and of which 11 were significant for positive selection (79%). This indicates a good correlation, albeit from a limited number of families, between phyloP results and protein-level Codeml analysis, suggesting that the majority of accelerating CDS are under positive selection rather than undergoing pseudogenisation. Perhaps due to ~234 MYA of separation and low levels of alignment stemming from non-coding regions, no 3’UTRs, 5’UTRs or intergenic regions were identified as accelerating significantly faster than the neutral rate of non-coding DNA evolution. This is similar to the non-coding regions of volvocine organelle genomes, which are replete with extensive tracts of non-coding regions intractable to alignment (Smith and Lee 2009; Smith, et al. 2013).
Diversification of non-coding regions also suggests extensive divergence of regulatory elements and gene regulatory mechanisms (Kianianmomeni 2015), which in other lineages have been associated with the evolution of multicellularity (Sebe-Pedros, et al. 2016). As per findings from proteome-wide scans of positive selection amongst orthologues, accelerating genes and CDS with evidence of positive selection include genes with functions related to the UPP (e.g. ubiquitin-activating enzyme 2 and ubiquitin ligase 1), Ca2+ sensing and signalling (e.g. calmodulin 4) and protein shuttling (e.g. heat shock protein 70), which is further evidence of selection acting on these processes. Genes with these functions are likely to be important for basic cellular functions and regulatory processes and have been associated with cell cycle regulatory functions (Goldknopf, et al. 1980; Ciechanover, et al. 1984; Kao, et al. 1985; Goebl, et al. 1988; von Kampen and Wettern 1991; Cheng, et al. 2005; Choi and Husain 2006).
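The false discovery rate correction applied to the phyloP results in this section can be illustrated with the Benjamini-Hochberg procedure. The text specifies only a q-value threshold of 0.05, so the use of Benjamini-Hochberg here is an assumption, and the p-values and function name are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues


# Hypothetical per-gene p-values from an acceleration test.
toy_pvalues = [0.001, 0.008, 0.039, 0.041, 0.20]
toy_qvalues = benjamini_hochberg(toy_pvalues)
significant = [i for i, q in enumerate(toy_qvalues) if q <= 0.05]
print(toy_qvalues, significant)
```

In this toy example two raw p-values below 0.05 are no longer significant after correction, which shows why counts such as the 129 accelerating genes depend on the correction applied.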
Cyclins and the retinoblastoma gene
Theoretical models (Maliet, et al. 2015; Rashidi, et al. 2015) and observations of traits in extant taxa such as T. socialis (Arakaki, et al. 2013) suggest that alterations to the cell cycle were among the primary and contingent steps to emerge at the origin of group living (Kirk 2005; Herron and Michod 2008). The underlying genetic components that control the cell cycle are therefore of special interest. The primary genetic components of the cell cycle regulation pathway are similar throughout the volvocine lineage (Bisova, et al. 2005; Merchant, et al. 2007; Olson, et al. 2010; Prochnik, et al. 2010; Cross and Umen 2015) and are generally similar to those found in most eukaryotes. However, the cyclin D1 (CYCD1) gene has expanded through tandem duplication from a single copy in C. reinhardtii to four copies in V. carteri and G. pectorale (Prochnik, et al. 2010; Hanschen, et al. 2016). The expansion of CYCD1 was also identified in the T. socialis genome: three copies of CYCD1 were identified and confirmed using rapid amplification of cDNA ends (RACE) and Sanger sequencing (fig. 3, supplementary table 18 and supplementary table 19). Two of the CYCD1 copies are located in close proximity on the same scaffold while the third copy is found on a separate scaffold (supplementary fig. S6). The expansion of CYCD1 genes in T. socialis suggests that CYCD1 expansion traces back to the formation of colonial multicellularity and that an additional duplication event (resulting in four copies) most probably occurred later, after the G. pectorale and V. carteri lineage split from the Tetrabaena lineage. CYCD1 expansion in the simple colonial T. socialis supports the hypothesis that the expansion was integral to the development of colonial multicellularity. As in other volvocines (Merchant, et al. 2007; Prochnik, et al. 2010; Hanschen, et al. 2016), the repertoire of cyclin-dependent kinases in T. socialis is similar (supplementary fig. S7).
In most eukaryotic lineages, including the volvocines, D-type cyclins interact with the retinoblastoma (RB) gene (a notable exception being yeast) (Muller, et al. 1994; Umen and Goodenough 2001; Olson, et al. 2010; Desvoyes, et al. 2014; Narasimha, et al. 2014; Li, et al. 2016). The volvocine RB homolog (also known as MAT3) is important for the formation of colonial multicellularity (Hanschen, et al. 2016) and may be involved in oogamous sexual reproduction in V. carteri (Kianianmomeni, et al. 2008; Ferris, et al. 2010; Hiraide, et al. 2013). Transformation of a MAT3/RB negative strain of C. reinhardtii with the G. pectorale RB gene was found to cause colony formation in the unicellular C. reinhardtii (Hanschen, et al. 2016). The transformative effect of the G. pectorale RB gene was proposed to be due to specific differences in RB protein structure observed between C. reinhardtii RB and the G. pectorale and V. carteri RB proteins. Including T. socialis, G. pectorale, V. carteri and other published RB protein sequences from Yamagishiella, Eudorina, and Pleodorina (Hiraide, et al. 2013), the colonial/multicellular taxa were confirmed to possess a shortened linker region between the RB-A and RB-B domains (supplementary table 20 and fig. S8). Phylogenetic analysis of MAT3/RB proteins was in accordance with Hiraide et al. (2015) (supplementary fig. 9). Following Hanschen et al. (2016), cyclin-dependent kinase (CDK) phosphorylation targets within the linker region between RB-B and the carboxy (C)-terminal domain were identified in T. socialis, Eudorina spp. and Yamagishiella spp. (fig. 4). In addition, the PlantPhos server was used to computationally identify additional targets within this region in all examined taxa. Findings show that phosphorylation targets within this region are not an exclusive feature of the C. reinhardtii MAT3/RB protein and do not correlate with the evolution of multicellularity.
Results from the PlantPhos server, in particular, show extensive variation in putative phosphorylation targets amongst MAT3/RB protein sequences and, therefore, altered MAT3/RB phosphorylation may be of significance. However, the absence of phosphorylation targets between the RB-B and RB-C domains cannot be associated with the emergence of multicellularity. To further investigate the RB gene, the exon/intron boundaries of RB genes from C. reinhardtii, T. socialis, G. pectorale and V. carteri were examined (supplementary fig. S10). Aligned gene structures identified three introns that are present in T. socialis, G. pectorale and V. carteri but absent in C. reinhardtii. The exact intron positions and sequence composition in these regions differ between species and, thus, the gene structures are not identical. However, the intron gains are at least indicative of a predisposition for intron gain at these specific locations and, of more interest, introns gained in these three regions are limited to the multicellular taxa. It is unknown whether these introns are important for transcript regulation or whether the intron gains have no influence on function. If the former is true, however, the introns might contribute towards the ability of G. pectorale MAT3/RB to induce colonies in a strain of C. reinhardtii lacking MAT3/RB. From analysis of cell cycle genes in T. socialis we can conclude that the expansion of D1-type cyclins and alterations to the MAT3/RB gene (shortened RB-A to RB-B linker region) trace to near the origin of colonial group living, which further supports a role for these genetic alterations in a multicellular program of cell division.
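The search for CDK phosphorylation targets within a linker region, as described above, can be illustrated with a simple motif scan. The sketch below is not the procedure used in this study or by the PlantPhos server. It assumes the commonly cited CDK consensus motifs (minimal [ST]P and full [ST]Px[KR]), and the linker sequence and the function name find_cdk_sites are hypothetical.

```python
import re

# Commonly cited CDK consensus motifs (an assumption, not the study's criteria):
# minimal = Ser/Thr followed by Pro; full = [ST]-P-x-[KR].
# Lookaheads are used so that overlapping sites are all reported.
MINIMAL = re.compile(r"(?=([ST]P))")
FULL = re.compile(r"(?=([ST]P.[KR]))")


def find_cdk_sites(protein, start=0, end=None):
    """Return 1-based positions of candidate CDK sites in protein[start:end]."""
    region = protein[start:end]
    full = {m.start() + start + 1 for m in FULL.finditer(region)}
    minimal = {m.start() + start + 1 for m in MINIMAL.finditer(region)}
    # Sites matching the full motif also match the minimal one; report them once.
    return {"full": sorted(full), "minimal_only": sorted(minimal - full)}


# Hypothetical linker sequence, for illustration only.
linker = "MASPAKRGGTPLSLSPNRAAT"
sites = find_cdk_sites(linker)
print(sites)
```

Applied to aligned linker regions from each species, a scan of this kind would give a per-taxon list of candidate sites that can be compared against the presence or absence of multicellularity.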
The Ubiquitin Proteasomal Pathway
Although other classes of genes identified in the current investigation are of interest in relation to the genetic control of division number (e.g. protein kinases), the ubiquitin proteasomal pathway (UPP) is of particular interest because our various analyses identified genes related to most steps in this pathway (supplementary table 21). Targeted degradation of proteins via the UPP is integral to various cellular processes including, but not limited to, the removal of misfolded proteins, deflagellation and the regulation of cell cycle proteins (Teixeira and Reed 2013). The basic components of the pathway include a series of enzymatic steps that bind ubiquitins (or ubiquitin-like molecules) to proteins, whereupon the tagged proteins are shuttled to the proteasome and degraded, releasing free ubiquitin molecules and degraded protein products. Specialised ligases (E3 ligases) that target specific proteins take the form of RING-U-box complexes, which vary but may consist of cullins, transducins, E2 conjugators and various other molecules (e.g. SKP1 in the Skp1-Cullin-F-box complex) (Bai, et al. 1996; Hershko 2005). Complexes such as the Skp1-Cullin-F-box complex form the basis of key cell cycle checkpoints. Migration to the proteasome may be facilitated and regulated by chaperones such as heat-shock proteins (Noga, et al. 1997; Park, et al. 2007; Shiber, et al. 2013). Aspects of the UPP identified in our analyses (supplementary table 21, fig. 5) include: ubiquitin activation, conjugation and ligation enzymes (E1, E2 and E3), shuttling and regulation of ubiquitinated targets to the proteasome (HSP70), proteasomal subunits/ATPases, as well as components of the cullin-RING ubiquitin ligase-type complexes. The scope and varying nature (e.g. positive selection versus gained families) of UPP-related modifications identified in our analyses render targeted functional characterisation challenging.
Together, however, they point to systemic modifications of this pathway in the multicellular taxa. Regulation via the UPP is important for many aspects of cell biology, and modifications of this pathway will have far-reaching and unpredictable consequences. Hence, it cannot be stated with certainty that the UPP-related genes identified in our analyses have specifically contributed towards increased developmental complexity in the lineage. Nevertheless, based on the well-known role of the UPP in cell cycle regulation in many organisms, where it has been shown to be an integral means of regulating cell cycle progression (Goldknopf, et al. 1980; Ciechanover, et al. 1984; Pagano 1997; Nakayama and Nakayama 2005, 2006a, 2006b; Tu, et al. 2012; Vlachostergios, et al. 2012; Teixeira and Reed 2013), the modifications of the UPP are likely to impact cell cycle regulation in the volvocines. The UPP is not well characterized in C. reinhardtii (von Kampen, et al. 1995; Vallentine, et al. 2014), although a relationship between the UPP and the C. reinhardtii cell cycle has been demonstrated (von Kampen and Wettern 1991; von Kampen, et al. 1995). Specifically, the expression of ubiquitin-encoding genes is associated with stress responses in C. reinhardtii and has furthermore been associated with progression through the cell cycle (von Kampen and Wettern 1991; von Kampen, et al. 1995). In C. reinhardtii the number of divisions is highly correlated with the size of the mother cell entering the division phase of the life cycle (Craigie and Cavalier-Smith 1982; Cross and Umen 2015). A key step that evolved early in the evolution of volvocine multicellularity was the modification of the genetic program to control the number of rounds of cell division (Kirk 2005; Herron and Michod 2008). Specifically, the maximum number of divisions is genetically controlled in all colonial volvocines, and this feature is present even in the comparatively simple 4-celled T. socialis (Arakaki, et al. 2013).
An intriguing feature of the multicellular volvocines is that the maximum number of cell divisions is a characteristic feature of each species; for example, 8-celled T. socialis colonies (which require three divisions) are rare. This suggests that the genetic control program is differentially regulated. Such regulation may occur through various mechanisms, not the least being alternative regulation of key cell cycle regulatory molecules such as RB, which has been shown to be a key regulatory molecule at specific points in the cell cycle (Umen and Goodenough 2001; Olson, et al. 2010; Hanschen, et al. 2016). Another molecule of interest is CDKG1, because CDKG1 concentration has been correlated with the number of cell divisions in C. reinhardtii, and after the final round of palintomic division in C. reinhardtii CDKG1 is undetectable (Li, et al. 2016). Evidence suggests that CDKG1 undergoes degradation between rounds of palintomic division (Li, et al. 2016) and hence serves as an example of how targeted degradation may control cell division in the volvocines. Modifications to the UPP identified in the current investigation are associated with even the deeply rooted T. socialis. Therefore, although we do not present specific evidence that these modifications are associated with a genetic program to control the number of divisions in the volvocines, we propose that, because of the known role of the UPP in cell cycle regulation, including in C. reinhardtii, the modifications are likely to be important for this key evolutionary development. The role of the UPP might conceivably be through regulated degradation of other determinants of the number of divisions, such as CDKG1 or MAT3/RB, or of other cell cycle genes such as the CYCD1 genes.
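The relationship between colony cell number and rounds of division referred to above is simple arithmetic: n rounds of binary division yield 2^n cells. The following sketch makes that explicit under the simplifying assumption that every cell divides in every round. The function name divisions_required is hypothetical.

```python
def divisions_required(cell_number):
    """Rounds of binary division needed to produce cell_number cells.

    Assumes all cells divide synchronously in every round, so the
    cell number must be a power of two.
    """
    if cell_number < 1 or cell_number & (cell_number - 1):
        raise ValueError("cell number must be a power of two")
    return cell_number.bit_length() - 1


# A 4-celled colony needs two rounds; an 8-celled colony needs three.
for cells in (4, 8):
    print(cells, "cells:", divisions_required(cells), "divisions")
```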
Additional analysis of volvocine families associated with developmental complexity
Extracellular matrix proteins
The two major protein families found in the extracellular matrix (ECM) of volvocines are the pherophorins and matrix metalloproteases (Godl, et al. 1995; Godl, et al. 1997; Sumper and Hallmann 1998; Hallmann 1999). A total of 36 pherophorin and 34 matrix metalloprotease (MMP) genes were identified in T. socialis. Separate phylogenies of manually curated pherophorins and MMPs that included proteins from C. reinhardtii (Merchant, et al. 2007; Blaby, et al. 2014), T. socialis, G. pectorale (Hanschen, et al. 2016) and V. carteri (Prochnik, et al. 2010; Hanschen, et al. 2016) highlight orthologues shared throughout the lineage as well as species-specific expansions (supplementary fig. S11 and fig. S12). In agreement with previous findings (Hanschen, et al. 2016), widespread expansion of pherophorin and MMP genes is not a feature of colonial taxa such as T. socialis or G. pectorale but is limited to V. carteri. It is, therefore, best associated with the expansion of the extracellular matrix rather than with the initial transformation of the cell wall into an extracellular matrix. The C. reinhardtii cell wall hydroxyproline-rich glycoprotein 1 (GP1) gene (Adair, et al. 1987) is absent from the genomes of T. socialis, G. pectorale and V. carteri (supplementary table 11). The significance of this is unknown; however, it might be important for understanding the initial transformation of cell wall components into an ECM, which is a feature that all multicellular volvocines share.
RegA-related sequences. The evolution of germ-soma division of labour is a key development in complex multicellular volvocines such as Pleodorina and Volvox spp. The somatic regenerator (regA) is a putative transcription factor that is expressed in Volvox carteri somatic cells, where it is thought to suppress cell growth via suppression of chloroplast biogenesis. Homologues of regA (regA-like sequence (rls) genes) are found throughout the lineage (Harper, et al. 1987; Kirk 1994; Choi, et al. 1996; Kirk 1997; Meissner, et al. 1999; Duncan, et al. 2006; Duncan, et al. 2007; Hanschen, et al. 2014; Hanschen, et al. 2016; Grochau-Wright, et al. 2017). The genes rlsA, rlsB, rlsC and rlsN/O have been identified as syntenically linked to regA in a number of Volvocaceae and show a closer phylogenetic relationship to regA than other rls genes (Hanschen, et al. 2014; Grochau-Wright, et al. 2017). Together with regA, these genes have been labelled regA-cluster genes and are absent in C. reinhardtii and G. pectorale. A total of 8 RLS genes were identified in T. socialis, which is the same number as in G. pectorale but fewer than in C. reinhardtii (n=12) and V. carteri (n=14). Phylogenetic relationships (supplementary fig. S13) of RLS proteins from C. reinhardtii, T. socialis, G. pectorale and V. carteri f. nagariensis show that regA-cluster genes are absent in T. socialis and confirm the close relationship of RLS1/RlsD proteins to RegA-cluster proteins (Duncan, et al. 2006; Duncan, et al. 2007; Hanschen, et al. 2014) (supplementary fig. S8), consistent with the hypothesis that the regA-cluster genes evolved from an ancestral RLS1/rlsD gene (Hanschen, et al. 2014; Hanschen, et al. 2016).
Fasciclin-domain containing proteins
A V. carteri fasciclin-domain containing protein known as the algal-cell adhesion molecule (algal-CAM) has previously been shown to be required for V. carteri embryogenesis and embryo inversion (Huber and Sumper 1994). Protein phylogenetic relationships of fasciclin-domain containing proteins from C. reinhardtii (n=21), T. socialis (n=18), G. pectorale (n=17) and V. carteri (n=17) show that 2 V. carteri proteins (Jgi|Volca1|127393 and Jgi|Volca1|108097) cluster separately with the annotated V. carteri algal-CAM protein (Jgi|Volca1|127389) (fig. 6). Orthologues of algal-CAM were not found in C. reinhardtii, T. socialis or G. pectorale. While G. pectorale undergoes a form of partial inversion, V. carteri undergoes complete embryo inversion. The absence of algal-CAM orthologues in C. reinhardtii, T. socialis and G. pectorale supports the hypothesis that algal-CAMs are required for complete embryo inversion in V. carteri (Huber and Sumper 1994). It is not yet known how CAMs are involved in inversion. One possibility is that CAMs are required for the reconstitution of cellular connectivity after cellular inversion around cytoplasmic bridges. They are therefore candidates for further exploration as cell adhesion molecules, which may play an important role in volvocine developmental biology.
Conclusions
We present findings from the nuclear genome of one of the simplest eukaryotic colonial multicellular organisms, the 4-celled volvocine T. socialis, as well as comparative genomic analyses with the three other available volvocine genomes. In keeping with previous findings, extensive increases in proteome complexity are not associated with the origin of coloniality in this lineage (Prochnik, et al. 2010; Hanschen, et al. 2016). However, gene families gained near the origin of coloniality were enriched in genes related to developmental processes such as DNA repair, protein kinase activity, cell adhesion and extracellular functions. In addition, over 200 genes with evidence of either positive or accelerating selection trace back to T. socialis, which suggests that the evolution of volvocine multicellularity has been shaped by numerous genetic loci. Diversification of cell cycle genes as well as modification to components of the UPP were identified that emerged at the beginning of colonial living and associate with the evolution of a genetic program for the control of cell number.
Materials and Methods
Culturing, nucleic acid extraction and sequencing. An axenic preparation of T. socialis NIES-571 was cultured in tris-acetate-phosphate medium (Gorman and Levine 1965) as per Arakaki et al. (2013). Total DNA was extracted using a standard Volvox DNA isolation method (Miller, et al. 1993) and RNA was extracted 2 h into the dark phase of the life cycle with Thermo Fisher TRI Reagent (Waltham, MA, USA). Paired-end sequencing libraries for DNA were prepared using the Illumina TruSeq DNA library preparation kit (San Diego, CA, USA) and libraries with two insert sizes were generated (~180bp and ~450bp). A 6kb insert mate-pair library was generated using the Illumina Nextera Mate-Pair Library Kit (San Diego, CA, USA) and automated pulsed-field size selection on the Sage BioScience BluePippin (0.75% agarose cassette, Beverly, MA, USA). An RNA-seq library was prepared with the TruSeq stranded-mRNA kit (cat # RS-122-2101). Sequencing was performed on an Illumina HiScanSQ (2x100bp), HiSeq 2500 (2x100bp and 2x125bp) and MiSeq (2x300bp).
Genome Assembly. An Allpaths-LG assembly (v48057) (Gnerre, et al. 2011) was generated using mate-pair data and paired-end data from short reads (2x100bp) as well as longer MiSeq reads that were trimmed to 150bp. A SPAdes (v3.1.1) assembly (Bankevich, et al. 2012) was generated using only MiSeq data (2x300bp) with the multi-kmer approach (maximum kmer of 127), and subsequent scaffolding with mate-pair data was performed with Opera (v1) (Gao, et al. 2011). The Allpaths-LG and SPAdes assemblies were combined using Metassembler (Wences and Schatz 2015) with the Allpaths-LG assembly selected as the “master” assembly. Further improvements to the assembly in terms of gap content and accuracy were made using GapFiller (Boetzer and Pirovano 2012) and Pilon (Walker, et al. 2014). Multi-kmer analysis was performed with Jellyfish (v2.0.0) (Vurture, et al. 2017), Allpaths-LG and BBtools (http://jgi.doe.gov/data-and-tools/bbtools/).
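The k-mer analysis mentioned above rests on counting every k-length substring in the reads and summarising how many distinct k-mers occur at each multiplicity. The sketch below illustrates that idea in plain Python. It does not call Jellyfish or reproduce its algorithm, counting canonical k-mers (a k-mer and its reverse complement treated as one) is an assumption about the settings, and the reads are toy data.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers across reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def multiplicity_histogram(counts):
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    return Counter(counts.values())


# Toy reads, for illustration only.
reads = ["ACGTAC", "GTACGT"]
counts = count_kmers(reads, 3)
histogram = multiplicity_histogram(counts)
print(dict(counts), dict(histogram))
```

In a real data set the multiplicity histogram is what is inspected to judge sequencing depth and repeat content; a dedicated counter is needed at that scale because an in-memory dictionary does not fit hundreds of millions of reads.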
Transcriptome Assembly and De Novo gene prediction. Transcriptome assembly was performed from ~150 million reads using Bridger de novo (r2014-12-01) (Chang, et al. 2015) (k=25) and was examined for quality using TransRate (v1.0) (Smith-Unna, et al. 2016). Automated gene predictions were performed using MAKER2 (v2.31.3) (Holt and Yandell 2011) with protein homology data and de novo transcripts generated by the Bridger assembly. Consensus models were generated by MAKER2 from genes predicted by Augustus (v3.02) (Stanke, et al. 2004), Genemark (v4.10) (Borodovsky, et al. 2003) and SNAP (v2013-11-29) (Korf 2004). Functional annotations of proteins were derived using HMMER3 (v3.1b) (Zhang and Wood 2003), InterProScan v5 (Quevillon, et al. 2005) and BLASTP (Altschul, et al. 1997) searches against the UniProt TREMBL database (Schneider, et al. 2009), as well as by identifying homology to proteins from sequenced volvocines.
Global Coding Content. Proteomes were collected for all available genome-sequenced chlorophyta (supplementary table 2), additional transcriptome-derived proteomes from chlorophyta in the Chlamydomonadales order, yeast, and selected model metazoans and plants. Domains were identified using hmmsearch (HMMER v3.1) against Pfam-A. Domain content was compared between C. reinhardtii and multicellular taxa with a χ2-test. Domains with a significant difference in abundance (significance threshold p=0.05), as well as domains shared by multicellular volvocines but absent in C. reinhardtii, were examined for presence in opisthokonta (yeast and selected metazoa), chlorophyta and plantae. Orthologue protein families for all available genome-sequenced chlorophyta were generated with OrthoMCL (v2.09) (Li, et al. 2003). An RAxML (v8.2.0) phylogeny (Stamatakis 2006) was first generated from 1:1 orthologue families (singletons, n=927). The expansion and contraction of Markov families were examined within the Count environment (v10.04) (Csuros 2010) using Dollo and Wagner parsimony (symmetric and asymmetric, with a gain cost of 2). Families gained in the multicellular volvocines and families with expanded protein numbers in the multicellular volvocines were further annotated by manual curation of BLAST, InterProScan and Argot2 (Falda, et al. 2012) results.
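A χ2 comparison of domain content between two proteomes can be sketched as a 2x2 contingency test: copies of one domain versus all other domain hits, in each of the two taxa. The text does not state how the contingency tables were built, so this layout is an assumption, as are the counts and the function name chi2_2x2. The p-value uses the exact relation between the χ2 distribution with one degree of freedom and the complementary error function.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("degenerate table: a row or column sums to zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 10 copies of a domain among 1000 domain hits in one
# proteome versus 30 copies among 1000 domain hits in another.
stat, p_value = chi2_2x2(10, 990, 30, 970)
print(round(stat, 3), p_value < 0.05)
```

Because one such test is run per domain, the resulting p-values would normally need a multiple-testing correction before domains are called significant.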
Positive selection in orthologue protein families. OrthoMCL family clustering was repeated with only volvocine proteomes included. Identification of families with evidence of positive selection was performed using the Potion pipeline (v1.1.2) (Codeml with site-models) (Yang 2007; Hongo, et al. 2015). Only singleton families were examined. Families with evidence for positive selection (q-value threshold of 0.05) were examined for Gene Ontology, MapMan and KEGG pathway enrichment via the Algal Functional Annotation Tool (http://pathways.mcdb.ucla.edu/algal/index.html) (Lopez, et al. 2011).
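Site-model tests for positive selection are usually evaluated with a likelihood ratio test between a null model that disallows positively selected sites and an alternative that allows them. The sketch below assumes a nested pair that differs by two free parameters (as in the M1a versus M2a or M7 versus M8 comparisons in codeml). Which pair the pipeline used for each family is not stated here, and the log-likelihood values are hypothetical.

```python
import math


def lrt_two_df(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by 2 free parameters.

    Returns (statistic, p_value). For 2 degrees of freedom the chi-square
    survival function has the closed form exp(-x / 2).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical log-likelihoods for one orthologue family.
stat, p_value = lrt_two_df(lnl_null=-2345.67, lnl_alt=-2340.12)
print(round(stat, 2), round(p_value, 5))
```

The per-family p-values from such tests are what a false discovery rate procedure then converts into the q-values used for thresholding.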
Genomic elements with evidence of conservation or acceleration. Whole genome alignments for volvocine genomes were performed using LAST (Kielbasa, et al. 2011) and TBA-Multiz (Blanchette, et al. 2004), and multiple sequence alignments (MSA) were referenced to the C. reinhardtii genome, thereby removing alignments that did not include C. reinhardtii sequences. The PHAST suite of software (v1.1) was used to analyse elements within the MSA for evidence of conservation and acceleration. After the PhastCons (Pollard, et al. 2010) parameters Gamma and Omega were optimised (supplementary table 22), PhastCons was used to identify discrete highly conserved regions. Genes (whole genes including untranslated regions), coding sequences, intergenic regions and untranslated gene regions (3’ and 5’ ends of genes) were examined in separate analyses for evidence of acceleration in the multicellular taxa using phyloP, with Benjamini-Hochberg false discovery rate correction applied (Hochberg and Benjamini 1990). Enrichment analyses were performed as for families with evidence of positive selection (Lopez, et al. 2011). Potion was used to examine accelerating genes (identified by either gene-level analysis or CDS-level analysis) for evidence of selection, this time with families containing paralogues included.
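The false discovery rate correction applied to the phyloP results can be illustrated with the Benjamini-Hochberg step-up procedure, which converts raw p-values into adjusted values that can be compared directly against the chosen threshold. This is a generic sketch of the procedure, not the code used in the analysis, and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genomic elements.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
accelerated = [i for i, q in enumerate(adjusted) if q <= 0.05]
print(adjusted, accelerated)
```

In this toy example only the first element passes a 0.05 threshold after correction, although three of the four raw p-values are below 0.05.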
Specific Gene Families. Tetrabaena socialis orthologues of cell cycle genes, matrix metalloproteases, pherophorins and regA-related genes were identified. D-type cyclins and the MAT3/RB gene were sequenced using capillary sequencing with primers designed from the genome sequence (supplementary table and supplementary table 23). Furthermore, fasciclin-domain containing proteins from all volvocines were identified. Protein phylogenies of these families were generated that included T. socialis proteins and where possible curated sets of proteins from C. reinhardtii, G. pectorale and V. carteri from Hanschen et al (2016) were used. Orthologues of the “Prochnik Table” genes (Prochnik, et al. 2010; Hanschen, et al. 2016) were identified in T. socialis (supplementary table 24).
References: Adair WS, Steinmetz SA, Mattson DM, Goodenough UW, Heuser JE. 1987. Nucleated assembly of Chlamydomonas and Volvox cell walls. J Cell Biol 105:2373-2382. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-3402. Arakaki Y, Kawai-Toyooka H, Hamamura Y, Higashiyama T, Noga A, Hirono M, Olson BJ, Nozaki H. 2013. The simplest integrated multicellular organism unveiled. PLoS One 8:e81641. Bai C, Sen P, Hofmann K, Ma L, Goebl M, Harper JW, Elledge SJ. 1996. SKP1 connects cell cycle regulators to the ubiquitin proteolysis machinery through a novel motif, the F-box. Cell 86:263-274. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, et al. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455- 477. Bar-Nun S, Glickman MH. 2012. Proteasomal AAA-ATPases: structure and function. Biochim Biophys Acta 1823:67-82. Bisova K, Krylov DM, Umen JG. 2005. Genome-wide annotation and expression profiling of cell cycle regulatory genes in Chlamydomonas reinhardtii. Plant Physiol 137:475-491. Blaby IK, Blaby-Haas CE, Tourasse N, Hom EF, Lopez D, Aksoy M, Grossman A, Umen J, Dutcher S, Porter M, et al. 2014. The Chlamydomonas genome project: a decade on. Trends Plant Sci 19:672-680. Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. 2004. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res 14:708-715. Boetzer M, Pirovano W. 2012. Toward almost closed genomes with GapFiller. Genome Biol 13:R56. Borodovsky M, Lomsadze A, Ivanov N, Mills R. 2003. Eukaryotic gene prediction using GeneMark.hmm. Curr Protoc Bioinformatics Chapter 4:Unit4 6. 
Boutet E, Lieberherr D, Tognolli M, Schneider M, Bansal P, Bridge AJ, Poux S, Bougueleret L, Xenarios I. 2016. UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View. Methods Mol Biol 1374:23-54. Branzei D, Foiani M. 2008. Regulation of DNA repair throughout the cell cycle. Nature Reviews Molecular Cell Biology 9:297-308. Chang Z, Li G, Liu J, Zhang Y, Ashby C, Liu D, Cramer CL, Huang X. 2015. Bridger: a new framework for de novo transcriptome assembly using RNA-seq data. Genome Biol 16:30. Cheng Q, Pappas V, Hallmann A, Miller SM. 2005. Hsp70A and GlsA interact as partner chaperones to regulate asymmetric division in Volvox. Dev Biol 286:537- 548. Choi G, Przybylska M, Straus D. 1996. Three abundant germ line-specific transcripts in Volvox carteri encode photosynthetic proteins. Curr Genet 30:347- 355. Choi J, Husain M. 2006. Calmodulin-Mediated Cell Cycle Regulation: New Mechanisms for Old Observations. Cell Cycle 5:2183-2186. Ciechanover A, Finley D, Varshavsky A. 1984. Ubiquitin dependence of selective protein degradation demonstrated in the mammalian cell cycle mutant ts85. Cell 37:57-66. Cock JM, Sterck L, Rouze P, Scornet D, Allen AE, Amoutzias G, Anthouard V, Artiguenave F, Aury JM, Badger JH, et al. 2010. The Ectocarpus genome and the independent evolution of multicellularity in brown algae. Nature 465:617-621. Coleman AW. 2012. A Comparative Analysis of the Volvocaceae (Chlorophyta)(1). J Phycol 48:491-513. Coleman AW. 1999. Phylogenetic analysis of "Volvocacae" for comparative genetic studies. Proc Natl Acad Sci U S A 96:13892-13897. Craigie RA, Cavalier-Smith T. 1982. Cell Volume and the control of the Chlamydomonas Cell Cycle. J Cell Sci 54:173-191. Cross FR, Umen JG. 2015. The Chlamydomonas cell cycle. Plant J 82:370-392. Csuros M. 2010. Count: evolutionary analysis of phylogenetic profiles with parsimony and likelihood. Bioinformatics 26:1910-1912. de Mendoza A, Ruiz-Trillo I. 2011. 
The mysterious evolutionary origin for the GNE gene and the root of bilateria. Mol Biol Evol 28:2987-2991. Desvoyes B, de Mendoza A, Ruiz-Trillo I, Gutierrez C. 2014. Novel roles of plant RETINOBLASTOMA-RELATED (RBR) protein in cell proliferation and asymmetric cell division. J Exp Bot 65:2657-2666. Duncan L, Nishii I, Harryman A, Buckley S, Howard A, Friedman NR, Miller SM. 2007. The VARL gene family and the evolutionary origins of the master cell-type regulatory gene, regA, in Volvox carteri. J Mol Evol 65:1-11. Duncan L, Nishii I, Howard A, Kirk D, Miller SM. 2006. Orthologs and paralogs of regA, a master cell-type regulatory gene in Volvox carteri. Curr Genet 50:61-72. Durand PM, Sym S, Michod RE. 2016. Programmed Cell Death and Complexity in Microbial Systems. Curr Biol 26:R587-593. Falda M, Toppo S, Pescarolo A, Lavezzo E, Di Camillo B, Facchinetti A, Cilia E, Velasco R, Fontana P. 2012. Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms. BMC Bioinformatics 13 Suppl 4:S14. Featherston J, Arakaki Y, Nozaki H, Durand PM, Smith DR. 2016. Inflated organelle genomes and circular-mapping mtDNA probably existed at the origin of coloniality in volvocine green algae. European Journal of Phycology:1-9. Ferris P, Olson BJ, De Hoff PL, Douglass S, Casero D, Prochnik S, Geng S, Rai R, Grimwood J, Schmutz J, et al. 2010. Evolution of an expanded sex-determining locus in Volvox. Science 328:351-354. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, et al. 2014. Pfam: the protein families database. Nucleic Acids Res 42:D222-230. Francino MP. 2005. An adaptive radiation model for the origin of new gene functions. Nat Genet 37:573-577. Gao S, Sung WK, Nagarajan N. 2011. Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. J Comput Biol 18:1681- 1691. Glockner G, Lawal HM, Felder M, Singh R, Singer G, Weijer CJ, Schaap P. 
2016. The multicellularity genes of dictyostelid social amoebas. Nat Commun 7:12085. Gnerre S, Maccallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ, Sharpe T, Hall G, Shea TP, Sykes S, et al. 2011. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci U S A 108:1513-1518. Godl K, Hallmann A, Rappel A, Sumper M. 1995. Pherophorins: a family of extracellular matrix glycoproteins from Volvox structurally related to the sex- inducing pheromone. Planta 196:781-787. Godl K, Hallmann A, Wenzl S, Sumper M. 1997. Differential targeting of closely related ECM glycoproteins: the pherophorin family from Volvox. EMBO J 16:25- 34. Goebl MG, Yochem J, Jentsch S, McGrath JP, Varshavsky A, Byers B. 1988. The yeast cell cycle gene CDC34 encodes a ubiquitin-conjugating enzyme. Science 241:1331-1335. Goldknopf IL, Sudhakar S, Rosenbaum F, Busch H. 1980. Timing of ubiquitin synthesis and conjugation into protein A24 during the HeLa cell cycle. Biochem Biophys Res Commun 95:1253-1260. Gorman DS, Levine RP. 1965. Cytochrome f and plastocyanin: their sequence in the photosynthetic eletron transport chain of Chlamydomonas reinhardtii. Proc Natl Acad Sci U S A 54:1665-1669. Grochau-Wright ZI, Hanschen ER, Ferris PJ, Hamaji T, Nozaki H, Olson BJSC, Michod RE. 2017. Genetic Basis for Soma is Present in Undifferentiated Volvocine Green Algae. J Evol Biol doi: 10.1111/jeb.13100. Hallmann A. 1999. Enzymes in the extracellular matrix of Volvox: an inducible, calcium-dependent phosphatase with a modular composition. J Biol Chem 274:1691-1697. Hanschen ER, Ferris PJ, Michod RE. 2014. Early evolution of the genetic basis for soma in the volvocaceae. Evolution 68:2014-2025. Hanschen ER, Marriage TN, Ferris PJ, Hamaji T, Toyoda A, Fujiyama A, Neme R, Noguchi H, Minakuchi Y, Suzuki M, et al. 2016. The Gonium pectorale genome demonstrates co-option of cell cycle regulation during the evolution of multicellularity. Nat Commun 7:11370. 
Haring MA, Siderius M, Jonak C, Hirt H, Walton KM, Musgrave A. 1995. Tyrosine phosphatase signalling in a lower plant: cell-cycle and oxidative stress-regulated expression of the Chlamydomonas eugametos VH-PTP13 gene. Plant J 7:981-988. Harper JF, Huson KS, Kirk DL. 1987. Use of repetitive sequences to identify DNA polymorphisms linked to regA, a developmentally important locus in Volvox. Genes Dev 1:573-584. Herron MD. 2009. Many from one: Lessons from the volvocine algae on the evolution of multicellularity. Commun Integr Biol 2:368-370. Herron MD, Hackett JD, Aylward FO, Michod RE. 2009. Triassic origin and early radiation of multicellular volvocine algae. Proc Natl Acad Sci U S A 106:3254- 3258. Herron MD, Michod RE. 2008. Evolution of complexity in the volvocine algae: transitions in individuality through Darwin's eye. Evolution 62:436-451. Hershko A. 2005. The ubiquitin system for protein degradation and some of its roles in the control of the cell-division cycle (Nobel lecture). Angew Chem Int Ed Engl 44:5932-5943. Hiraide R, Kawai-Toyooka H, Hamaji T, Matsuzaki R, Kawafune K, Abe J, Sekimoto H, Umen J, Nozaki H. 2013. The evolution of male-female sexual dimorphism predates the gender-based divergence of the mating locus gene MAT3/RB. Mol Biol Evol 30:1038-1040. Hochberg Y, Benjamini Y. 1990. More powerful procedures for multiple significance testing. Stat Med 9:811-818. Holt C, Yandell M. 2011. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12:491. Hongo JA, de Castro GM, Cintra LC, Zerlotini A, Lobo FP. 2015. POTION: an end- to-end pipeline for positive Darwinian selection detection in genome-scale data through phylogenetic comparison of protein-coding genes. BMC Genomics 16:567. Huber O, Sumper M. 1994. Algal-CAMs: isoforms of a cell adhesion molecule in embryos of the alga Volvox with homology to Drosophila fasciclin I. EMBO J 13:4212-4222. Hubisz MJ, Pollard KS, Siepel A. 
2011. PHAST and RPHAST: phylogenetic analysis with space/time models. Brief Bioinform 12:41-51.
Janski N, Masoud K, Batzenschlager M, Herzog E, Evrard JL, Houlne G, Bourge M, Chaboute ME, Schmit AC. 2012. The GCP3-interacting proteins GIP1 and GIP2 are required for gamma-tubulin complex protein localization, spindle integrity, and chromosomal stability. Plant Cell 24:1171-1187.
Kao HT, Capasso O, Heintz N, Nevins JR. 1985. Cell cycle control of the human HSP70 gene: implications for the role of a cellular E1A-like function. Mol Cell Biol 5:628-633.
Kianianmomeni A. 2015. Potential impact of gene regulatory mechanisms on the evolution of multicellularity in the volvocine algae. Commun Integr Biol 8:e1017175.
Kianianmomeni A, Nematollahi G, Hallmann A. 2008. A gender-specific retinoblastoma-related protein in Volvox carteri implies a role for the retinoblastoma protein family in sexual development. Plant Cell 20:2399-2419.
Kielbasa SM, Wan R, Sato K, Horton P, Frith MC. 2011. Adaptive seeds tame genomic sequence comparison. Genome Res 21:487-493.
Kirk DL. 1999. Evolution of multicellularity in the volvocine algae. Curr Opin Plant Biol 2:496-501.
Kirk DL. 1997. The genetic program for germ-soma differentiation in Volvox. Annu Rev Genet 31:359-380.
Kirk DL. 1994. Germ cell specification in Volvox carteri. Ciba Found Symp 182:2-15; discussion 15-30.
Kirk DL. 2001. Germ-soma differentiation in volvox. Dev Biol 238:213-223.
Kirk DL. 2005. A twelve-step program for evolving multicellularity and a division of labor. Bioessays 27:299-310.
Kirk DL, Harper JF. 1986. Genetic, biochemical, and molecular approaches to Volvox development and evolution. Int Rev Cytol 99:217-293.
Kirk MM, Stark K, Miller SM, Muller W, Taillon BE, Gruber H, Schmitt R, Kirk DL. 1999. regA, a Volvox gene that plays a central role in germ-soma differentiation, encodes a novel regulatory protein. Development 126:639-647.
Klie S, Nikoloski Z. 2012. The Choice between MapMan and Gene Ontology for Automated Gene Function Prediction in Plant Science. Front Genet 3:115.
Korf I. 2004. Gene finding in novel genomes. BMC Bioinformatics 5:59.
Larson A, Kirk MM, Kirk DL. 1992. Molecular phylogeny of the volvocine flagellates. Mol Biol Evol 9:85-105.
Li L, Stoeckert CJ, Jr., Roos DS. 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13:2178-2189.
Li Y, Liu D, Lopez-Paz C, Olson BJSC, Umen JG. 2016. A new class of cyclin dependent kinase in Chlamydomonas is required for coupling cell size to cell division. Elife 10767.
Lopez D, Casero D, Cokus SJ, Merchant SS, Pellegrini M. 2011. Algal Functional Annotation Tool: a web-based analysis suite to functionally interpret large gene lists using integrated annotation and expression data. BMC Bioinformatics 12:282.
Lynch M. 2008. The cellular, developmental and population-genetic determinants of mutation-rate evolution. Genetics 180:933-943.
Lynch M, Ackerman MS, Gout JF, Long H, Sung W, Thomas WK, Foster PL. 2016. Genetic drift, selection and the evolution of the mutation rate. Nat Rev Genet 17:704-714.
Maliet O, Shelton DE, Michod RE. 2015. A model for the origin of group reproduction during the evolutionary transition to multicellularity. Biol Lett 11:20150157.
Marcais G, Kingsford C. 2011. A fast, lock-free approach for efficient parallel counting of occurences of k-mers. Bioinformatics 27:764-770.
Maynard Smith J, Szathmáry E. 1995. The major transitions in evolution. Oxford; New York: W.H. Freeman Spektrum.
Meissner M, Stark K, Cresnar B, Kirk DL, Schmitt R. 1999. Volvox germline-specific genes that are putative targets of RegA repression encode chloroplast proteins. Curr Genet 36:363-370.
Merchant SS, Prochnik SE, Vallon O, Harris EH, Karpowicz SJ, Witman GB, Terry A, Salamov A, Fritz-Laylin LK, Marechal-Drouard L, et al. 2007. The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science 318:245-250.
Miller SM, Kirk DL. 1999. glsA, a Volvox gene required for asymmetric division and germ cell specification, encodes a chaperone-like protein. Development 126:649-658.
Miller SM, Schmitt R, Kirk DL. 1993. Jordan, an active Volvox transposable element similar to higher plant transposons. Plant Cell 5:1125-1138.
Miller WT. 2012. Tyrosine kinase signaling and the emergence of multicellularity. Biochim Biophys Acta 1823:1053-1057.
Muller H, Lukas J, Schneider A, Warthoe P, Bartek J, Eilers M, Strauss M. 1994. Cyclin D1 expression is regulated by the retinoblastoma protein. Proc Natl Acad Sci U S A 91:2945-2949.
Nakada T, Misawa K, Nozaki H. 2008. Molecular systematics of Volvocales (Chlorophyceae, Chlorophyta) based on exhaustive 18S rRNA phylogenetic analyses. Mol Phylogenet Evol 48:281-291.
Nakayama KI, Nakayama K. 2005. Regulation of the cell cycle by SCF-type ubiquitin ligases. Semin Cell Dev Biol 16:323-333.
Nakayama KI, Nakayama K. 2006a. Ubiquitin ligases: cell-cycle control and cancer. Nat Rev Cancer 6:369-381.
Nakayama KI, Nakayama K. 2006b. [Ubiquitin system regulating G1 and S phases of cell cycle]. Tanpakushitsu Kakusan Koso 51:1362-1369.
Narasimha AM, Kaulich M, Shapiro GS, Choi YJ, Sicinski P, Dowdy SF. 2014. Cyclin D activates the Rb tumor suppressor by mono-phosphorylation. Elife 3.
Nishii I, Miller SM. 2010. Volvox: simple steps to developmental complexity? Curr Opin Plant Biol 13:646-653.
Nishii I, Ogihara S, Kirk DL. 2003. A kinesin, invA, plays an essential role in volvox morphogenesis. Cell 113:743-753.
Noga M, Hayashi T, Tanaka J. 1997. Gene expressions of ubiquitin and hsp70 following focal ischaemia in rat brain. Neuroreport 8:1239-1241.
Nozaki H. 1986. Sexual reproduction in Gonium sociale (Chlorophyta, Volvocales). Phycologia 25:29-35.
Nozaki H, Misawa K, Kajita T, Kato M, Nohara S, Watanabe MM. 2000. Origin and evolution of the colonial volvocales (Chlorophyceae) as inferred from multiple, chloroplast gene sequences. Mol Phylogenet Evol 17:256-268.
Nozaki H, Ueki N, Isaka N, Saigo T, Yamamoto K, Matsuzaki R, Takahashi F, Wakabayashi KI, Kawachi M. 2016. A New Morphological Type of Volvox from Japanese Large Lakes and Recent Divergence of this Type and V. ferrisii in Two Different Freshwater Habitats. PLoS One 11:e0167148.
Nozaki H, Yamada TK, Takahashi F, Matsuzaki R, Nakada T. 2014. New "missing link" genus of the colonial volvocine green algae gives insights into the evolution of oogamy. BMC Evol Biol 14:37.
O'Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B, Smith-White B, Ako-Adjei D, et al. 2016. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res 44:D733-745.
Olson BJ, Nedelcu AM. 2016. Co-option during the evolution of multicellular and developmental complexity in the volvocine green algae. Curr Opin Genet Dev 39:107-115.
Olson BJ, Oberholzer M, Li Y, Zones JM, Kohli HS, Bisova K, Fang SC, Meisenhelder J, Hunter T, Umen JG. 2010. Regulation of the Chlamydomonas cell cycle by a stable, chromatin-associated retinoblastoma tumor suppressor complex. Plant Cell 22:3331-3347.
Pagano M. 1997. Cell cycle regulation by the ubiquitin pathway. FASEB J 11:1067-1075.
Park SH, Bolender N, Eisele F, Kostova Z, Takeuchi J, Coffino P, Wolf DH. 2007. The cytoplasmic Hsp70 chaperone machinery subjects misfolded and endoplasmic reticulum import-incompetent proteins to degradation via the ubiquitin-proteasome system. Mol Biol Cell 18:153-165.
Parra G, Bradnam K, Korf I. 2007. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23:1061-1067.
Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. 2010. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res 20:110-121.
Prochnik SE, Umen J, Nedelcu AM, Hallmann A, Miller SM, Nishii I, Ferris P, Kuo A, Mitros T, Fritz-Laylin LK, et al. 2010. Genomic analysis of organismal complexity in the multicellular green alga Volvox carteri. Science 329:223-226.
Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, Lopez R. 2005. InterProScan: protein domains identifier. Nucleic Acids Res 33:W116-120.
Rashidi A, Shelton DE, Michod RE. 2015. A Darwinian approach to the origin of life cycles with group properties. Theor Popul Biol 102:76-84.
Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, Nishiyama T, Perroud PF, Lindquist EA, Kamisugi Y, et al. 2008. The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 319:64-69.
Sathe S, Durand PM. 2016. Cellular aggregation in Chlamydomonas (Chlorophyceae) is chimaeric and depends on traits like cell size and motility. European Journal of Phycology 51:129-138.
Schneider M, Lane L, Boutet E, Lieberherr D, Tognolli M, Bougueleret L, Bairoch A. 2009. The UniProtKB/Swiss-Prot knowledgebase and its Plant Proteome Annotation Program. J Proteomics 72:567-573.
Schultheiss KP, Suga H, Ruiz-Trillo I, Miller WT. 2012. Lack of Csk-mediated negative regulation in a unicellular SRC kinase. Biochemistry 51:8267-8277.
Sebe-Pedros A, Ballare C, Parra-Acero H, Chiva C, Tena JJ, Sabido E, Gomez-Skarmeta JL, Di Croce L, Ruiz-Trillo I. 2016. The Dynamic Regulatory Genome of Capsaspora and the Origin of Animal Multicellularity. Cell 165:1224-1237.
Sharpe SC, Eme L, Brown MW, Roger AJ. 2015. Timing the Origins of Multicellular Eukaryotes Through Phylogenomics and Relaxed Molecular Clock Analysis. In: Trillo IR, Nedelcu AM, editors. Evolutionary Transitions to Multicellular Life. Netherlands: Springer. p. 3-29.
Shiber A, Breuer W, Brandeis M, Ravid T. 2013. Ubiquitin conjugation triggers misfolded protein sequestration into quality control foci when Hsp70 chaperone levels are limiting. Mol Biol Cell 24:2076-2087.
Smith DR, Hamaji T, Olson BJ, Durand PM, Ferris P, Michod RE, Featherston J, Nozaki H, Keeling PJ. 2013. Organelle genome complexity scales positively with organism size in volvocine green algae. Mol Biol Evol 30:793-797.
Smith DR, Lee RW. 2009. The mitochondrial and plastid genomes of Volvox carteri: bloated molecules rich in repetitive DNA. BMC Genomics 10:132.
Smith-Unna R, Boursnell C, Patro R, Hibberd JM, Kelly S. 2016. TransRate: reference-free quality assessment of de novo transcriptome assemblies. Genome Res 26:1134-1144.
Stajich JE, Wilke SK, Ahren D, Au CH, Birren BW, Borodovsky M, Burns C, Canback B, Casselton LA, Cheng CK, et al. 2010. Insights into evolution of multicellular fungi from the assembled chromosomes of the mushroom Coprinopsis cinerea (Coprinus cinereus). Proc Natl Acad Sci U S A 107:11889-11894.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688-2690.
Stanke M, Steinkamp R, Waack S, Morgenstern B. 2004. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res 32:W309-312.
Starr RC. 1968. Cellular differentiation in Volvox. Proc Natl Acad Sci U S A 59:1082-1088.
Strzalka W, Ziemienowicz A. 2011. Proliferating cell nuclear antigen (PCNA): a key factor in DNA replication and cell cycle regulation. Ann Bot 107:1127-1140.
Sumper M, Hallmann A. 1998. Biochemistry of the extracellular matrix of Volvox. Int Rev Cytol 180:51-85.
Teixeira LK, Reed SI. 2013. Ubiquitin ligases and cell cycle control. Annu Rev Biochem 82:387-414.
Tu Y, Chen C, Pan J, Xu J, Zhou ZG, Wang CY. 2012. The Ubiquitin Proteasome Pathway (UPP) in the regulation of cell cycle control and DNA damage repair and its implication in tumorigenesis. Int J Clin Exp Pathol 5:726-738.
Umen JG, Goodenough UW. 2001. Control of cell division by a retinoblastoma protein homolog in Chlamydomonas. Genes Dev 15:1652-1661.
Vallentine P, Hung C, Xie J, van Hoewyk D. 2014. The ubiquitin-proteasome pathway protects Chlamydomonas reinhardtii against selenite toxicity, but is impaired as reactive oxygen species accumulate. AoB PLANTS 6.
Vlachostergios PJ, Voutsadakis IA, Papandreou CN. 2012. The ubiquitin-proteasome system in glioma cell cycle control. Cell Div 7:18.
von Kampen J, Nielander U, Wettern M. 1995. Expression of ubiquitin genes in Chlamydomonas reinhardtii: involvement in stress response and cell cycle. Planta 197:528-534.
von Kampen J, Wettern M. 1991. Ubiquitin-encoding mRNA and mRNA recognized by genes encoding ubiquitin-conjugating enzymes are differentially expressed in division-synchronized cultures of Chlamydomonas reinhardtii. Eur J Cell Biochem 55:312-317.
Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, Schatz MC. 2017. GenomeScope: Fast reference-free genome profiling from short reads. Bioinformatics.
Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, et al. 2014. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9:e112963.
Wences AH, Schatz MC. 2015. Metassembler: merging and optimizing de novo genome assemblies. Genome Biol 16:207.
Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24:1586-1591.
Zhang Z, Wood WI. 2003. A profile hidden Markov model for signal peptides generated by HMMER. Bioinformatics 19:307-308.
Tables

Table 1: Summary Statistics of Volvocine Genomes Used in this Analysis

                            Chlamydomonas   Tetrabaena   Gonium      Volvox
                            reinhardtii     socialis     pectorale   carteri V1*
Assembly Size               111.1MB         138MB        148.8MB     137.8MB
Contig N50                  90.6KB          7.4KB        16.2KB      43 981
Scaffold N50                6 617KB         146KB        1 276KB     1 491KB
Number of Scaffolds         88              5 858        2 373       1 265
GC Content (%)              64.1            66.0         64.5        56.0
Number of Coding Genes      17 737          15 513       17 984      14 520
Average Intron Length       279.17          635          349.83      496.67
Genes with Pfam-A Domains   10 889**        9 803        10 029      8 456

*For this analysis Volvox carteri version 1 data were used (see supplementary text)
** Including alternative splice variants
Figures
Fig. 1. Traits relevant to the current investigation and phylogenetic relationships of genome-sequenced volvocines.
Fig. 2. Wagner’s symmetrical parsimony reconstruction of gene family history for whole genome sequenced chlorophytes. A) Gene families gained and lost. B) Gene families expanded and contracted. Phylogeny based on 927 single orthologue protein families (100% bootstrap support for all branches).
[Figure 3 tree: cyclin phylogeny; node labels are ML/NJ bootstrap values; scale bar 0.5.]
Fig. 3. Maximum likelihood and Neighbour-joining phylogeny of cyclins from Chlamydomonas reinhardtii (green), Tetrabaena socialis (magenta), Gonium pectorale (blue) and Volvox carteri (black). The phylogeny shows the expansion of CYCD1 from a single copy in Chlamydomonas reinhardtii to three copies in Tetrabaena socialis and further expansion to 4 copies in Gonium pectorale and Volvox carteri.
Fig. 4. Putative phosphorylation targets in volvocine retinoblastoma protein sequences. Putative phosphorylation targets were identified manually and using the PlantPhos server.
Fig. 5. The core aspects of the ubiquitin proteasomal pathway (UPP) and components highlighted in the current study. E1 activators, E2 conjugators, E3 ligases and the chaperone HSP70, a 26S proteasomal regulatory subunit, 3 AAA-ATPase putative components of the proteasome and 2 transducins similar to Cullin-RING E3 ligase complex transducins showed evidence of positive/accelerating selection. Furthermore, a family of RING/U-box E3 type ligases, a family of RPM1 interacting proteins, and an FTSH AAA-type protease were gained at the origin of multicellularity. Together these data point towards modification to multiple aspects of the UPP in the multicellular volvocines relative to Chlamydomonas reinhardtii.
Fig. 6. Phylogeny of fasciclin-domain containing volvocine proteins. Included in the phylogeny are proteins from Chlamydomonas reinhardtii (green), Tetrabaena socialis (magenta), Gonium pectorale (blue) and Volvox carteri (black). Highlighted in yellow is a branch identified by Count to have been gained at the origin of multicellularity. Highlighted in purple are Volvox algal-CAM proteins.
Supplementary Text
Table of Contents
Tetrabaena socialis
Methods
Culturing and nucleic acid extraction
Library preparation and sequencing
Genome Assembly
Allpaths-LG
SPAdes
Gapfilling, assembly merger and error correction
Assembly description and quality
Transcriptome sequencing and assembly
Volvox carteri annotations
Genome annotation
Functional annotation
Phylogenomics
Domain Analysis
Gene family expansions and contractions
Positive selection
Genome level evolution
Cell cycle genes
Cyclins
Cyclin-dependent kinases
Retinoblastoma homologue
Extracellular matrix genes and the cell wall
Pherophorins
Matrix metalloprotease proteins
Cell wall proteins
RegA-related genes
Fasciclin-domain containing proteins
Prochnik table genes
Tetrabaena socialis
Tetrabaena socialis (Figure S1) is the “type-species” for the Tetrabaenaceae family of volvocines. Other taxa from this family include the genus Basichlamys. The Tetrabaenaceae are the most deep-branching of the multicellular volvocines and ancestors of this family gave rise to the Goniaceae and the Volvocaceae. Tetrabaena socialis was initially classified as Gonium sociale (Nozaki 1986) and until recently an in-depth morphological description of the 4-celled T. socialis was lacking. Phylogenetic methods identified the Tetrabaenaceae as a separate family from the Goniaceae and the Volvocaceae (e.g. Nozaki, et al. 2000). Recently, Arakaki et al. (2013) provided an ultrastructural morphological description of T. socialis, which included an assessment of morphological traits associated with Kirk’s 12 steps (Kirk 2005) and identified at least 4 of Kirk’s steps as present in the basal T. socialis (Arakaki, et al. 2013). These are: “incomplete cytokinesis”, “rotation of basal bodies”, “transformation of cell walls into an ECM” and “genetic modulation of cell number”. Cytoplasmic bridges connecting T. socialis cells were confirmed using transmission electron microscopy (TEM) and their formation within dividing T. socialis embryos was observed. Morphological and phylogenetic features place T. socialis amongst the simplest of extant eukaryotic “integrated multicellular organisms” and make it key for understanding basal developments in the volvocine lineage (Arakaki, et al. 2013).
Methods
Culturing and nucleic acid extraction
An axenic culture of Tetrabaena socialis strain NIES-571 was generated through multiple rounds of isolating and transferring single cells onto sterile solid media (Pringsheim 1946). The culturing conditions were the same as those followed by Arakaki et al. (2013), where an axenic culture of T. socialis was grown at 25°C in tris-acetate-phosphate medium (Gorman and Levine 1965) with aeration. Cool-white fluorescent lamps were used for lighting with a 12:12 light-dark cycle. Total DNA was extracted using the standard Volvox CTAB extraction method (Miller, et al. 1993). RNA was isolated with Thermo Fisher (Waltham, MA, USA) TRI reagent from cultures that were two hours into the dark phase of the life cycle, when both dividing and non-dividing cells were observed.
Library preparation and sequencing
The Illumina TruSeq DNA sample preparation kit (v2; Illumina, San Diego, CA, USA) was used to prepare paired-end (PE) libraries from total DNA for Illumina sequencing. Two libraries with PE insert sizes of ~180bp and ~450bp were prepared. A ~6kb insert mate-pair (MP) library was prepared using the Illumina Nextera Mate-Pair library preparation kit, with automated pulsed-field size-selection performed using the Sage Science BluePippin (Beverly, MA, USA; 0.75% agarose cassette). Preparation of total RNA for RNAseq was performed using the TruSeq Stranded mRNA library preparation kit (Illumina). Sequencing was variously performed on the Illumina HiScanSQ (2x100bp), Illumina HiSeq 2500 (2x100bp and 2x125bp) or Illumina MiSeq (2x300bp). The ~180bp PE and ~6kb MP insert libraries were sequenced using 2x100bp chemistry, RNAseq was performed using 2x125bp chemistry, and the ~450bp library was sequenced using both 2x100bp and 2x300bp chemistries.
Genome assembly
Allpaths-LG
For Allpaths-LG assembly all data were trimmed only for adaptor presence with Trimmomatic (0.32) (Bolger, et al. 2014), as recommended by the Allpaths-LG developer (Gnerre, et al. 2011). In order to utilise MiSeq data for Allpaths-LG assembly a custom python script was used to extract overlapping reads that produced a merged fragment size of between 220 and 270bp from MiSeq 2x300bp data. Thus, 2 libraries were used for Allpaths-LG “fragment” data: a ~180bp PE insert library and MiSeq data that were overlapped using Flash (Magoc and Salzberg 2011), further size selected in silico (for a 220-270bp fragment size) and finally “hard trimmed” to 2x150bp. Allpaths-LG jump libraries were of three types: ~6kb insert MP data, 2x100bp PE data (~450bp insert size) and non-overlapping MiSeq data (with an insert size greater than ~550bp). Allpaths-LG assembly was performed without modification with a ploidy of 1 defined as a parameter. For assembly Allpaths-LG calculated the read-separation distance of the 6kb library as 7.1kb, which may have resulted in inflated gap estimation.
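The custom script itself is not reproduced in the text. As a rough illustration only, the in silico size selection and hard trimming could be sketched as below, assuming that merged fragment lengths have already been obtained from the overlapping step; the function name, argument names and data layout are hypothetical and are not taken from the authors' script.

```python
def select_and_trim(pairs, merged_lengths, min_len=220, max_len=270, trim_to=150):
    """Keep read pairs whose merged fragment length lies in [min_len, max_len]
    and hard-trim both mates to trim_to bases.

    pairs: iterable of (read_name, forward_seq, reverse_seq)       (hypothetical layout)
    merged_lengths: dict mapping read_name -> merged fragment length;
                    pairs that did not overlap are simply absent.
    """
    kept = []
    for name, fwd, rev in pairs:
        frag_len = merged_lengths.get(name)
        if frag_len is None:
            # Non-overlapping pair: not usable as "fragment" data.
            continue
        if min_len <= frag_len <= max_len:
            kept.append((name, fwd[:trim_to], rev[:trim_to]))
    return kept
```

Pairs outside the size window are dropped rather than trimmed, which keeps the resulting library close to a single nominal fragment size.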
SPAdes assembly
A contig-level SPAdes assembly (v3.1.1) (Bankevich, et al. 2012) was performed using only MiSeq 2x300bp data that were trimmed to remove adaptors and low quality sequence (quality filtering below Q20 and PCR-duplicate removal of 35bp) with Fastq-mcf (https://github.com/ExpressionAnalysis/ea-utils/blob/wiki/FastqMcf.md). The assembly pipeline was performed in “careful mode” with error correction and with the multi-kmer assembly protocol optimised for MiSeq data (k of 21, 33, 55, 77, 99 and 127). Scaffolding using the ~6kb insert mate-pair data was performed after contig-level assembly using Opera (v1) (Gao, et al. 2011) with bwa (v0.7.5a) mapping.
Gapfilling, assembly merger and error correction
Prior to assembly merger, gap filling was performed for both assemblies with GapFiller (v1.11) (Boetzer and Pirovano 2012) with default parameters (-m 30 -o 2 -r 0.7 -n 10 -d 50 -t 10 -i 10). To improve assembly contiguity and accuracy, Metassembler (v1) (Wences and Schatz 2015) was used to merge the two assemblies with the Allpaths-LG assembly as the “master” assembly and the SPAdes assembly as the “slave” assembly. After assembly merger, 3 rounds of error correction and further gap filling were performed using Pilon (Walker, et al. 2014).
Assembly description and quality
The final assembly was 66% GC and 135.9MB in length; half of the assembly size was made up of 263 scaffolds (the L50 value) and the largest scaffold was 697kb in length. The scaffold N50 was 145.9kb and the contig N50 was 7.5kb. The C-value of T. socialis has not been established. In order to assess the genome size three kmer based approaches were used. Initially, Allpaths-LG automatically calculated the estimated genome size and repeat content based on a 25mer profile. Allpaths-LG estimated the genome to be approximately 179MB in size and 43% repetitive. These values are substantially larger than the assembly and the genomes of other volvocines. Therefore additional efforts were made to estimate the genome size using Jellyfish (Marcais and Kingsford 2011) with GenomeScope (Vurture, et al. 2017) and with BBTools (http://jgi.doe.gov/data-and-tools/bbtools/). Jellyfish was used to count kmers of various lengths (K=19, 21, 25, 33, 45, 55 and 77). BBTools was tested at K=31. Jellyfish with GenomeScope kmer analysis indicated that the genome size ranged from 101MB (K=19) to 118MB (K=77) while repeat length ranged from 20.7MB to 28.0MB (Supplementary table 1 and Figure S2). BBTools estimated the genome size to be 114.9MB with a 10% repeat content. These estimates are more in line with expectation than the values derived by Allpaths-LG, and it is most likely that Allpaths-LG overestimated the genome size and may have over-predicted the distance between gaps. Improvements in the C. reinhardtii and V. carteri assemblies have resulted in the total genome size being reduced once a higher percentage of repeats had been filled or greater scaffolding resolution had been achieved. The same is likely to occur for T. socialis. Conversely, initial C-value estimates for V. carteri were 180MB (Kirk and Harper 1986) while the v2 genome assembly is only 130MB, and therefore another possibility is that some volvocine genomes are replete with large stretches of highly-repetitive heterochromatic regions that are not amenable to typical sequencing approaches. The repeat content of the genome as well as the high GC content were found to affect assembly contiguity and the assembly contains 27.8% ambiguous sequence content (denoted by N’s), which is greater than that of G. pectorale (20.9%).
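For readers less familiar with the contiguity statistics quoted above, the following minimal sketch shows how N50 and L50 are conventionally computed from a list of scaffold (or contig) lengths. It is a generic illustration with a hypothetical function name, not the tool used to produce the reported values.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total
    assembly length; L50 is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("at least one sequence length is required")
```

Under this definition, the statement that half of the assembly is contained in 263 scaffolds corresponds to an L50 of 263, and the length of the 263rd-longest scaffold is the scaffold N50.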
Transcriptome sequencing and assembly
Approximately 75 million paired reads (2x125bp) were used to produce a de novo transcriptome for gene modelling. Quality and adaptor-content trimming was performed using Trimmomatic. Quality trimming was performed to exclude terminal sequence data that fell below a minimum quality of 5, and reads shorter than 36bp were removed. Bridger de novo (Chang, et al. 2015) was used for transcriptome assembly using a stranded library type and a kmer of 25. A total of 27 172 transcripts were produced with an N50 of 670bp. TransRate (v1.0) (Smith-Unna, et al. 2016) was used to evaluate the quality of the assembly as well as to filter transcripts with evidence of errors (4 153 transcripts were removed). The TransRate optimal assembly score was 0.362. Transcripts were aligned to the T. socialis genome and 92% (n=21 823) of transcripts aligned. All transcripts that aligned did so with greater than 82% identity and 90% mapped with >95% identity.
Volvox carteri annotations
For all analyses, except where proteins have been manually annotated or curated, v1 V. carteri proteins were used. This was due to previously reported inconsistencies regarding the domain content of the v2 predicted proteome (Hanschen, et al. 2016). For example, from version 2 proteins only 7795 Pfam-A domains could be identified whereas 13 160 were identified in version 1 (Hanschen, et al. 2016). A newer version of V. carteri annotations is now available (v2.1) at JGI-DOE Phytozome (http://www.phytozome.net/); however, we considered it prudent to continue with v1 annotations for the purposes of comparing these data with previous publications (Hanschen, et al. 2016).
Genome annotation
Automated gene model annotations for T. socialis were generated using the MAKER2 pipeline (v2.31.3) (Holt and Yandell 2011). Evidence for modelling included the de novo transcriptome assembly, proteome data from C. reinhardtii (v5.5) (Merchant, et al. 2007; Blaby, et al. 2014), G. pectorale (Hanschen, et al. 2016) and V. carteri (v1) (Prochnik, et al. 2010) and the curated collection of UNIPROT-SWISSPROT (Boutet, et al. 2016) proteins (accessed 25 May 2015). The T. socialis genome was masked for simple repeats and for C. reinhardtii RepBase repeats. All repeats were soft masked, which allowed annotations to extend into repetitive regions, although masked regions could not seed annotations. Soft masking was employed to mitigate excessive masking of genic regions caused by GC bias, which has been reported in C. reinhardtii (Blaby, et al. 2014). Within the MAKER2 pipeline three gene finders were used: SNAP (v2013-11-29) (Korf 2004), GeneMark (v4.10) (Besemer and Borodovsky 2005) and Augustus (v3.02) (Stanke, et al. 2006). For Augustus and GeneMark, refined C. reinhardtii models were utilised without further training, whereas a T. socialis specific model was generated for SNAP through two rounds of training. The MAKER pipeline for “gene rescue” was followed (Campbell, et al. 2014) such that models derived exclusively by ab initio predictions (gene models with an annotation edit distance (AED) score of 1) were retained if they possessed a Pfam-A domain (Finn, et al. 2014). Pfam-A domains were identified using InterProScan 5 (Jones, et al. 2014). Single-exon gene models were retained if >250bp and a maximum intron size of 10kb was selected. A total of 33 030 unfiltered (for AED or domains) gene models were generated by MAKER2, of which 10 734 (32.5%) had AED scores <1. A total of 15 513 gene models were retained with an AED <1 or with a domain. A total of 93.8% of KOG domains (n=458) were identified within the MAKER2 derived T. socialis proteins (HMMER E-value threshold <1x10^-5), which is less than but still comparable to the KOG content of C. reinhardtii (v5.5: 98.7%) and V. carteri (v1: 96.3%). Complete and manually curated C. reinhardtii KOG orthologues (n=406) were obtained from the CEGMA website (http://korflab.ucdavis.edu/datasets/cegma/). Tetrabaena socialis proteins (from MAKER2 annotations) were aligned to C. reinhardtii KOG proteins using BLASTP (Altschul, et al. 1997) and subsequently sorted to retain only the highest scoring matching T. socialis homologues. A total of 382 proteins (94%) were identified with significant homology (E=1x10^-10) to C. reinhardtii KOG proteins, of which a total of 367 unique T. socialis proteins (90%) were identified. In addition, 393 (96.7%) C. reinhardtii KOG proteins aligned to the T. socialis genome (E-value=1x10^-10).
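The retention rule described above (keep a model if its AED is below 1 or if it carries a Pfam-A domain) can be illustrated with a short sketch. The record layout, field names and the way the single-exon length cut-off is combined with the AED/domain rule are assumptions made for illustration; MAKER2 applies its own filtering internally and this is not its API.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    # Hypothetical record; field names are illustrative only.
    name: str
    aed: float             # annotation edit distance, 1 = ab initio support only
    has_pfam_domain: bool  # True if a Pfam-A domain was detected
    n_exons: int
    length_bp: int


def retain(model, min_single_exon_bp=250):
    """Apply the retention rule sketched in the text to one gene model."""
    # Assumption: single-exon models must exceed the length cut-off
    # before the AED/domain rule is considered.
    if model.n_exons == 1 and model.length_bp <= min_single_exon_bp:
        return False
    # "Gene rescue": evidence-supported models (AED < 1) are kept, and
    # purely ab initio models are kept only if they carry a domain.
    return model.aed < 1 or model.has_pfam_domain


def filter_models(models):
    """Return the models that pass the retention rule, in input order."""
    return [m for m in models if retain(m)]
```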
Functional annotation
Putative protein functions were assigned by using BLASTP to query T. socialis proteins against the UNIPROT-TREMBL database with an E-value threshold of 1x10^-10. Pfam-A domains and GO terms were assigned using InterProScan (Jones, et al. 2014) while HMMER3 (Eddy 2011) was used for later analysis of domain content. InterProScan identified 8072 GO terms, 9803 proteins were found to contain Pfam-A domains and 14 041 total domains were identified, of which 2998 were unique. BLASTP queries to the C. reinhardtii, G. pectorale and V. carteri proteomes were used for further analysis of specific genes and to inspect the coding content of T. socialis in relation to other volvocines. The outcome of homology searches against other volvocine proteomes as well as against the UNIPROT databases TREMBL and SWISSPROT are summarised in Supplementary Table 3. OrthoMCL (v2.09) (Li, et al. 2003) was used to generate family clusters in 2 separate analyses: firstly, including only volvocine proteomes and, secondly, including the proteomes of available genome-sequenced chlorophytes (n=13) from sources summarised in Supplementary Table 2. For both analyses a Markov clustering (MCL) inflation value of 1.5 and an E-value of 1x10^-5 were used as OrthoMCL parameters. A Venn diagram of volvocine-specific family clusters was generated and visualized using OrthoVENN (Wang, et al. 2015) (Figure S3). The Venn diagram showed fewer orthologues for T. socialis. However, reciprocal blast orthologues identified by OrthoMCL to the reference-quality C. reinhardtii proteome showed that the number of T. socialis orthologues was similar to that of G. pectorale and V. carteri (Supplementary Table 4). Results from functional annotations do suggest that some gene content is missing from the T. socialis genome and/or annotations, although automated gene modelling most likely captured >90% of gene content.
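The idea of reciprocal best BLAST hits mentioned above can be sketched generically as follows. This is only an illustration of the concept with hypothetical function names and a simplified (query, subject, bit score) input; OrthoMCL additionally normalises scores and applies Markov clustering, which is not reproduced here.

```python
def best_hits(hits):
    """Return the highest-scoring subject for each query.

    hits: iterable of (query, subject, bit_score) tuples.
    Ties keep the first hit encountered.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

A pair is reported only when the best hit of protein a in proteome B is b and the best hit of b in proteome A is a, which is why the count of such pairs against a reference-quality proteome is a useful check on annotation completeness.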
Phylogenomics
A phylogenomic “supermatrix” protein phylogeny (Delsuc, et al. 2005) approach was followed to generate a phylogeny of whole-genome sequenced chlorophytes for down-stream analyses. A total of 927 “singleton” families (clusters that contained 1 protein per species) were identified from OrthoMCL clusters of 13 genome-sequenced chlorophytes (see Functional Annotation, fig. 2). Proteins from each family were separately aligned using MUSCLE (3.8.31) (Edgar 2004), alignments were trimmed with trimal (1.3) (Capella-Gutierrez, et al. 2009) and concatenated using fastconcat (Kuck and Meusemann 2010). After testing for the most appropriate phylogenetic model of amino acid substitution with ProtTest (Abascal, et al. 2005), RAxML (8.2.0) (Stamatakis 2014) was used to generate a phylogeny with 100 bootstraps and the model parameters of “gamma”, “invariable sites”, “WAG” and “empirical frequency estimates”. Bootstrap support of 100% was obtained for all nodes. The phylogeny is in agreement with previous phylogenies, which have variously been produced from whole proteomes (Hanschen, et al. 2016) or from phylogenetically informative genes (e.g. ITS2 (Buchheim, et al. 2011)).
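The concatenation step of a supermatrix approach can be illustrated with the sketch below. It is not the fastconcat implementation; the function name and the dictionary-based alignment layout are hypothetical, and padding absent taxa with gaps is an assumed fallback that should not be needed for singleton families.

```python
def concatenate_alignments(alignments, taxa):
    """Concatenate per-family alignments into one supermatrix.

    alignments: list of dicts mapping taxon -> aligned sequence; all
                sequences within one dict must have equal length.
    taxa: list of taxon names defining the rows of the supermatrix.
    """
    parts = {taxon: [] for taxon in taxa}
    for alignment in alignments:
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError("sequences in an alignment must have equal length")
        width = widths.pop()
        for taxon in taxa:
            # Assumed fallback: a taxon absent from a family is padded with gaps.
            parts[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}
```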
Domain analysis
The development of novel domain architecture has been identified as a feature of the evolution of plant and animal multicellularity (Putnam, et al.
2007; Srivastava, et al. 2008; Srivastava, et al. 2010). Previous analyses that have compared the proteomes of V. carteri and G. pectorale with that of C. reinhardtii have not identified the same degree of domain novelty associated with the evolution of multicellularity in animals and plants (Prochnik, et al.
2010; Hanschen, et al. 2016). Here, domains identified in the proteomes of T. socialis, G. pectorale and V. carteri were compared to those of C. reinhardtii.
Other unicellular chlorophytes were not directly compared with those of the multicellular taxa for enrichment analysis because we were interested in exploring lineage specific differences. Pfam-A domains (Pfam-A 27.0) were identified in volvocine proteomes using HMMER (v3.1, E-value=1x10-5) (Eddy
2011). Domain content between the proteomes of the multicellular volvocines with that of C. reinhardtii proteome was tested for enrichment using a chi- squared distribution test (p=0.05). A total of 61 domains were significantly different at an alpha of 0.05, of which only 18 were enriched in the multicellular volvocines. At an alpha of 0.0001 only 16 domains were significantly different and the inclusion of the T. socialis proteome here is in agreement with previous findings, which found that increased domain complexity does not correlate with the evolution of multicellularity. A further 30 domains were identified as absent in C. reinhardtii but present in T. socialis,
G. pectorale and V. carteri (multicellular domains). The phylogenetic origin of enriched domains as well as multicellular domains was explored. These
“domains of interest” were combined into a list of 91 domains (Supplementary Table 5). As before, HMMER was used to identify Pfam-A domains in a broader collection of eukaryotic taxa (n=26) (Supplementary Table 2 and
Figure S4), including: all whole-genome sequenced chlorophytes, transcriptome derived proteomes of taxa from the Chlamydomonadales order, model animals, model plants and yeast (Supplementary Table 2 and Figure
S4). In order to assess if domains of interest were lineage specific or common to most eukaryotes, the domains were categorised by shared presence in other major eukaryotic phyla. A total of 93 domains were identified that are unique to the volvocine lineage (C. reinhardtii, T. socialis, G. pectorale and V. carteri). Only 1 domain of interest (PF08707) was found amongst the 93 volvocine-specific domains, showing that the domains of interest were not enriched in volvocine-specific domains. Subsequently, domains identified in all taxa were classified in a three-way Venn diagram that compared domain content between (i) chlorophyta, (ii) plants and (iii) opisthokonta (here, metazoa and yeast) (Figure S5). Finally, domains of interest were compared with the results of the three-way Venn diagram, which revealed that the majority of significant domains
(n=59/91) are common to all examined phyla. Furthermore, besides 1 domain, all domains absent from C. reinhardtii but present in T. socialis, G. pectorale and V. carteri are present in other chlorophytes, plants and opisthokonta, which suggests that the absence of these domains in C. reinhardtii may be due to loss, either real or artefactual.
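The per-domain enrichment test described above can be illustrated with a minimal sketch. It assumes a 2x2 contingency table (proteins with and without a given domain, in the multicellular proteomes versus C. reinhardtii) and a Pearson chi-squared test with 1 degree of freedom and no continuity correction. The exact table layout used in the analysis is not stated, so the function name and the example counts are hypothetical.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (1 d.f., no continuity correction)
    for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical):
      a = proteins WITH the domain in the multicellular proteomes
      b = proteins WITHOUT the domain in the multicellular proteomes
      c = proteins WITH the domain in C. reinhardtii
      d = proteins WITHOUT the domain in C. reinhardtii
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Invented counts, for illustration only.
stat, p = chi2_2x2(20, 80, 5, 95)
print(f"chi2 = {stat:.3f}, p = {p:.5f}, significant at 0.05: {p < 0.05}")
```

Because one test is run per domain, the stricter alpha of 0.0001 reported above acts as a guard against false positives from multiple testing.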
The 5 domains reported by Prochnik et al (2010) as differentially present between C. reinhardtii and V. carteri were examined in T. socialis.
Counts for C. reinhardtii, G. pectorale and V. carteri were obtained from Hanschen et al (2016) (Supplementary Table 6). The domain content of T. socialis was relatively similar to those of C. reinhardtii and G. pectorale reported by Hanschen et al (2016). The outlier in terms of organisms for these
5 domains remains V. carteri, which shows reduced content of histones
(PF00125) and the ankyrin repeat (PF00023) as well as expansion of the gametolysin domain (PF05548), which is found in extracellular matrix proteins. Relative to C. reinhardtii leucine-rich repeats are reduced in G. pectorale and V. carteri but not in T. socialis. Other ankyrin domains were examined and all showed reduction in V. carteri (Supplementary Table 7).
Gene family expansions and contractions
The expansion and contraction of gene families as well as the phylogenetic origin of new gene families were identified using the Count software package (v10.04) (Csuros 2010). Proteome data from 13 chlorophyte taxa were obtained from sources listed in Supplementary Table 2.
Numerical profiles of OrthoMCL (v2.09) clusters (n=24 949) generated for phylogenetic analysis were used together with the phylogeny generated from
927 singleton families as input for Count. Bias in the rate of gene loss or gain was assessed for branches and significant variations were not identified.
Dollo’s and Wagner’s parsimony (symmetric and asymmetric) were evaluated as alternative parsimony implementations (Supplementary Table 8).
Asymmetric parsimony analysis was performed with a gain cost of 2.
Symmetric Wagner’s parsimony yielded the most conservative estimate of gains tracing to the origin of the multicellular volvocines and, because this node was of particular interest, a conservative collection of gene family gains was selected for further analysis. At this node a balance between gains and losses was obtained with symmetric Wagner’s parsimony while asymmetric
Wagner’s parsimony and Dollo’s parsimony favoured gains. Technical differences between alternative parsimony methods as well as technical differences in clustering of gene families may affect the outcome of reconstruction efforts (Demuth and Hahn 2009). Here, we chose to focus on the most conservative results for genes gained at the origin of multicellularity
(fig. 2). Families gained in the branch leading to the colonial/multicellular volvocines (ORIMC families) were annotated using a combination of C. reinhardtii homology, V. carteri annotations, Argot2
(http://www.medcomp.medicina.unipd.it/Argot2/) (Falda, et al. 2012) and
InterProScan (v5) (Jones, et al. 2014). Where possible a consensus annotation was assigned. These are listed in Supplementary Table 9 and described in the main text.
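To make the idea of tracing a family gain to a node concrete, the following is a minimal sketch of Dollo parsimony on presence/absence data. Under Dollo's assumption a family is gained only once, at the most recent common ancestor of all taxa that carry it. This is not the Count implementation, and it does not cover Wagner parsimony, which weighs gains against losses on copy numbers. The four-taxon tree, the taxon labels and the function names are illustrative assumptions.

```python
# Assumed toy topology: C. reinhardtii as outgroup, T. socialis sister to
# the Gonium/Volvox clade. Leaves are strings, internal nodes are tuples.
TREE = ("C_reinhardtii", ("T_socialis", ("G_pectorale", "V_carteri")))

def leaves(node):
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found

def dollo_gain_clade(tree, present):
    """Return the leaf set of the smallest clade containing every taxon in
    `present`; under Dollo parsimony the single gain is placed on the branch
    leading to that clade."""
    present = set(present)
    if not present:
        return None
    if not present <= leaves(tree):
        raise ValueError("unknown taxon in presence set")
    node = tree
    while not isinstance(node, str):
        for child in node:
            if present <= leaves(child):
                node = child  # descend while one child still holds all taxa
                break
        else:
            break  # taxa are split across children: this node is the MRCA
    return frozenset(leaves(node))

# A family found in T. socialis and V. carteri but missing from G. pectorale
# is still placed at the origin of the colonial/multicellular clade, with a
# secondary loss implied in G. pectorale.
clade = dollo_gain_clade(TREE, {"T_socialis", "V_carteri"})
print(sorted(clade))
```

Dollo parsimony pushes gains towards deeper nodes and explains absences as losses. This is consistent with the observation above that it favoured gains at the node of interest compared with symmetric Wagner parsimony.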
Families gained at the Gonium/Volvox branch were examined and filtered to include only families present in both G. pectorale and V. carteri.
Amongst 1248 families gained at the Gonium/Volvox branch, 274 families were gained in both G. pectorale and V. carteri; the remaining families were species specific. These 274 families include protein kinases, ankyrin-repeat containing proteins, AAA-domain containing proteins (with domains: AAA,
AAA_2, AAA_5, AAA_11, AAA_15, AAA_16, AAA_17, AAA_18, AAA_19,
AAA_21, AAA_22, AAA_23, AAA_29, AAA_30, AAA_33), EF-hand domain containing proteins, a family of dynein heavy chain related proteins, regulator of chromatin condensation domain proteins as well as proteins with possible
DNA interacting domains such as Myb DNA-binding, zinc finger (zf)-MYND and zf-C3HC4. Families of particular interest include a family of proteins with anaphase-promoting complex subunit 3 domains (the V. carteri protein is annotated as a kinesin), a family of histone H3-11 proteins and the
Kif4A-type kinesin protein. Furthermore, two families of Volvox metalloproteases and a family of pherophorin proteins were gained at this node.
Families expanded at the origin of multicellularity were identified from
Wagner’s asymmetrical parsimony outputs (fig. 2, Supplementary Table 10).
For expanded families a further filter was applied so that only families with at least one protein member present in each of C. reinhardtii, V. carteri, G. pectorale and T. socialis were considered. Expanded families were further considered only if expanded in all colonial/multicellular taxa relative to C. reinhardtii. In addition, families expanded in G. pectorale and V. carteri relative to C. reinhardtii and T. socialis were examined. The annotation of expanded families was based on C. reinhardtii annotations. Only 27 families demonstrated expansion relative to C. reinhardtii while a further 53 families were expanded in G. pectorale and V. carteri relative to T. socialis and C. reinhardtii. A total of 8 families were contracted at the ORIMC.
Positive selection
Family clusters generated with OrthoMCL were tested for evidence of positive selection using the POTION pipeline (Hongo, et al. 2015). OrthoMCL was repeated with the same methodology as described under the heading
“Phylogenetics”, except that only the proteomes of C. reinhardtii, T. socialis,
G. pectorale and V. carteri were included (12 614 families). OrthoMCL family clusters were filtered within the POTION package according to several stipulated criteria. Ortholog groups/families that contained paralogues were removed, so that analysis was restricted to groups consisting of only 1 protein per taxon (singleton families). Although a more permissive approach that included paralogues would have identified more families with evidence of positive selection, only singleton families were retained in order to reduce errors associated with assigning orthologues from amongst families containing paralogues. Groups containing one or more proteins without valid stop codons, groups containing CDS sequences with lengths not divisible by 3 and groups containing non-ACGT nucleotides (in CDS sequences) were removed. A minimum group identity of 70% was applied.
PhiPack (Bruen, et al. 2006) was used to identify families with significant evidence (q=0.05) of gene recombination and these families were removed.
After filtering 3 610 families were retained. Multiple sequence alignments of proteins were generated with MUSCLE (v3.8.31) for each ortholog family and used to inform codon alignments of CDS sequences. Protein-informed CDS alignments were generated, trimmed (using trimal) (Capella-Gutierrez, et al. 2009) and used to create a phylogeny (using Phylip (Retief
2000)) for each orthologue group. Chlamydomonas reinhardtii was selected as the “reference” organism, which produced rooted trees for each family.
Potion utilized the PAML (Yang 2007) package with statistical testing and false-discovery rate correction with codeml site model evaluations of positive selection (Supplementary Table 11). Enrichment analyses of Gene ontology
(GO), KEGG pathways and MapMan for families undergoing positive selection was performed via the Algal Functional Annotation portal (http://pathways.mcdb.ucla.edu/algal/) using C. reinhardtii annotations (Lopez, et al. 2011). GO enrichment analysis identified 82 enriched GO terms
(Supplementary Table 12).
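The sequence-level filters applied before the selection tests can be sketched as follows. This is an illustrative reimplementation, not the POTION code. It assumes that a "valid stop codon" means a terminal stop with no premature in-frame stop, and that a singleton family has exactly one CDS per taxon. The identity threshold and the PhiPack recombination filter are not reproduced, and all function names and taxon labels are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def cds_is_valid(cds):
    """Check one coding sequence: length divisible by 3, only A/C/G/T,
    a terminal stop codon and no premature in-frame stop (assumption)."""
    seq = cds.upper()
    if not seq or len(seq) % 3 != 0:
        return False
    if set(seq) - set("ACGT"):
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[-1] not in STOP_CODONS:
        return False
    return not any(codon in STOP_CODONS for codon in codons[:-1])

def keep_family(members, taxa):
    """Keep a family only if it is a singleton family (exactly one CDS per
    taxon in `taxa`) and every CDS passes `cds_is_valid`.

    `members` is a list of (taxon, cds) pairs.
    """
    seen = sorted(taxon for taxon, _ in members)
    if seen != sorted(taxa):
        return False  # a paralogue or a missing taxon
    return all(cds_is_valid(cds) for _, cds in members)

# Toy example with invented sequences.
taxa = ["C_reinhardtii", "T_socialis", "G_pectorale", "V_carteri"]
family = [(taxon, "ATGGCCTAA") for taxon in taxa]
print(keep_family(family, taxa))
```

Restricting the scan to singleton families trades sensitivity for cleaner orthology assignments, as noted above.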
Genome-level evolution
In order to extend the analysis of selection beyond orthologous protein families, signatures of selection were explored from whole-genome alignments using the PHAST software suite (Hubisz, et al. 2011). PHAST was used to identify genomic regions that were highly conserved as well as those under accelerating selection from both coding and non-coding regions of the genome. Pair-wise alignments for the genome sequences of C. reinhardtii
(v5), T. socialis, G. pectorale and V. carteri (v1) were performed using LAST
(Kielbasa, et al. 2011). Simple repeat regions were masked (cR01 command option). Using the single coverage functionality of threaded-blockset aligner
(TBA) (Blanchette, et al. 2004) duplicate alignments were filtered such that a genomic region from one organism did not align to multiple regions in the other organism. Threaded-blockset aligner with multiz was used to create a multiple sequence alignment (MSA) in maf-format from pair-wise alignments
(Blanchette, et al. 2004). The MSA was subsequently mapped to C. reinhardtii
- thereby only retaining alignments that included a C. reinhardtii sequence.
To provide a guide tree for PhastCons a simple phylogeny of C. reinhardtii, T. socialis, G. pectorale and V. carteri was generated with PhyML
(Guindon, et al. 2009) from 7 mitochondrial genes (cob, cox1, nad1, nad2, nad4, nad5 and nad6). Chlamydomonas reinhardtii annotations (v5.5) in GFF format were used to define coding and non-coding regions of genome alignments. PhastCons (Siepel and Haussler 2004; Hubisz, et al. 2011) was used to generate a neutral model of substitution rate for non-coding regions while a model of substitution rates from conserved regions was generated from first codon positions using PhyloFit (part of the PHAST package). Using both models PhastCons was used to identify “highly-conserved” regions. The
PhastCons parameters of “expected length of conserved regions” (omega) and “target coverage” (gamma) were first optimised to minimize the PIT value
(PIT- Phylogenetic Information Threshold, entropy of the phylogenetic model multiplied by the expected minimum number of conserved sites required to predict a conserved element), while increasing the coverage of conserved
CDS sequences. The consEntropy program of PHAST was used to calculate
PIT values at varying levels of omega and gamma. Highly conserved sequences for each permutation of omega and gamma were intersected with
CDS annotations from C. reinhardtii chromosome 1 using bedtools (v2-2.20.1)
(Quinlan and Hall 2010; Quinlan 2014) intersect and the percentage of conserved CDS bases was calculated. The most successful parameters were omega of 12 and gamma of 25 (PIT of 11 and CDS coverage
~2.5%)(Supplementary Table 22). Conserved regions were subsequently categorised into annotation features using bedtools (Supplementary Table
13). Highly conserved intergenic regions were examined for proximity to coding regions using bedtools “nearest” (Supplementary Table 14).
Conserved non-coding elements near the aforementioned genes may have been conserved for >200 million years, which suggests a regulatory role for these elements. In order to identify accelerating selection the phyloP tool of the PHAST package was used (Hubisz, et al. 2011). Annotation features from C. reinhardtii v5.5 were divided per chromosome (PHAST algorithms examine one chromosome/scaffold at a time). Separate C. reinhardtii annotation features were tested for conservation and acceleration using phyloP in
CONservationACCeleration mode (CONACC). The features “genes”, “CDS”,
“3’UTR” and “5’UTR” were used in separate analyses. The likelihood ratio test (LRT) in CONACC mode was used to identify features under accelerating selection within the multicellular lineage (the sub-tree or branch that included T. socialis, G. pectorale and V. carteri). Benjamini-Hochberg False-Discovery
Rate (FDR) correction was used to account for multiple testing (at an alpha of
0.05) (Hochberg and Benjamini 1990). Furthermore, regions accelerating relative to C. reinhardtii were tested for overall evidence of conservation by repeating the phyloP analysis without the “sub-tree” test. Regions accelerating relative to C. reinhardtii and that did not demonstrate evidence of conservation in the “whole tree” analysis were identified and may be considered to be more rapidly evolving (faster than the neutral rate of evolution) than features that are only accelerating relative to C. reinhardtii.
Results with and without this filtering step were inspected. A total of 129
(Supplementary Table 15) whole genes were identified as experiencing accelerating selection relative to C. reinhardtii (FDR corrected q-value <0.05), however, 105 were evolving more slowly than the neutral rate of evolution
(leaving 24 evolving faster than the neutral rate). Annotations were derived from C. reinhardtii v5.5 annotations. Whole genes and CDS regions undergoing acceleration were tested for enrichment of Gene ontology (GO) terms, KEGG pathways and MapMan annotation via the Algal Functional
Annotation website (http://pathways.mcdb.ucla.edu/algal/) using C. reinhardtii annotations (Lopez, et al. 2011)(Supplementary Table 16). A total of 42 CDS regions were identified (associated with 38 genes) as accelerating of which 16 were accelerating faster than the neutral rate (Supplementary Table 17). Of the 42 accelerating CDS, 6 CDS overlap with gene-level analysis.
Corroborating whole gene analysis and indicating that the coding regions of genes are under acceleration are the ubiquitin activating enzyme 2, ubiquitin ligase 1, Spc97/Spc98 spindle body protein, tubulin beta chain 2, a transcription regulator and a histone superfamily protein. GO enrichment analysis identified the cellular compartment terms “microtubule organising centre”, “SWI/SNF complex” and “SWI/SNF-type complex” and the molecular function term “ubiquitin activating enzyme activity” as enriched. Enriched
MapMan terms and KEGG pathways respectively included “signalling phosphoinositides” and “Cell adhesion molecules”.
The POTION pipeline was repeated to determine whether accelerating whole genes and CDS sequences showed evidence of positive selection. Parameters differed from those used for the proteome-wide scan of positive selection. OrthoMCL families that contained the accelerating gene or CDS were identified and further analysis with POTION was limited to these families. Paralogues were included in this analysis and the minimum sequence identity, minimum group identity, and relative minimum and maximum sequence size were set to 10%, 20%, 0.1 and 3 respectively, thereby retaining more homologous clusters/families for evaluation than were used for global analyses of positive selection in orthologous protein families. To identify acceleration in volvocine intergenic regions, several approaches were assessed for how best to divide intergenic regions into smaller coordinates, thereby balancing statistical constraints. These approaches included: (i) dividing all C. reinhardtii intergenic regions into
100bp windows, (ii) applying phyloP to “highly-conserved elements” identified by PhastCONS and (iii) selecting flanking regions upstream and downstream of coding loci. However, the shortage of alignments stemming from intergenic regions hampered attempts to analyse intergenic regions. Ultimately, the lack of alignable regions from intergenic coordinates was used as the basis for creating coordinates for intergenic regions by simply extracting all aligned regions from the MSA that derived from C. reinhardtii intergenic regions (with bedtools). These coordinates were then directly inputted into phyloP and tested for evidence of acceleration. However, no intergenic regions were identified that were significant after Benjamini-Hochberg FDR correction was applied. The same was found for 3’UTRs and 5’UTRs.
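The Benjamini-Hochberg correction applied to the phyloP results can be sketched in a few lines. This is a generic step-up implementation that converts raw p-values to adjusted q-values. It is not the code used in the analysis, and the example p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values),
    in the same order as the input list."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each q-value is
    # the minimum of p * m / rank over all equal or larger ranks.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues

# Invented p-values for four hypothetical features.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```

The adjustment scales with the number of tests, so a feature class with many tested regions and few alignable bases, such as the intergenic regions above, can yield no significant q-values even when some raw p-values are small.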
Cell cycle genes
Cyclins
The observation of cyclin D1 expansion in G. pectorale and V. carteri relative to C. reinhardtii is important for understanding modifications to the cell cycle in the multicellular taxa because otherwise the repertoire of core cell cycle genes such as cyclins and cyclin-dependent kinases is similar between the multicellular taxa and C. reinhardtii. Tetrabaena socialis cyclins
(Supplementary Table 19, fig. 3) were explored to identify additional CYCD1 genes. OrthoMCL, BLAST, InterProScan, HMMER3 and protein phylogenies were used to identify and characterize cyclins (Bisova et al., 2005; Hanschen et al., 2016; Olson et al., 2010; Prochnik et al., 2010). Cyclin protein sequences from C. reinhardtii, G. pectorale and V. carteri were used as queries against the genome of T. socialis with TBLASTN and as queries against the T. socialis predicted proteins using BLASTP (E-value=10⁻⁵). In addition to BLAST-based homology searches, HMMER3.1 was used to search for un-annotated cyclin D1 genes. A nucleotide sequence alignment of C. reinhardtii, G. pectorale and V. carteri cyclin D1 CDS sequences was generated with MAFFT (v7.045) and used to create a cyclin D1 HMM that was subsequently scanned against the T. socialis genome with
HMMER3.1.
Duplicates of CYCD1 (3 copies) were identified from homology searches and were used to generate primer sequences (Supplementary Table
18). To obtain complete sequences, cDNA sequencing and rapid amplification of cDNA ends (RACE) were performed, with sequencing on an
ABI sequencer (Thermo Fisher Scientific). Reverse transcribed (Superscript III reverse transcriptase, Thermo Fisher Scientific) cDNA was used for PCR as follows: 94°C for 1 minute, 35 cycles of 94°C for 30 seconds, 60°C for 30 seconds, and 72°C for 2 minutes, followed by 72°C for 5 minutes with
TaKaRa LA-Taq with GC Buffer (Takara Bio Inc., Otsu, Japan). Amplicons were purified with the illustra GFX PCR DNA and Gel Band Purification Kit
(GE Healthcare, Buckinghamshire, UK), and additionally some samples were
TA subcloned with TOPO TA Cloning Kit for sequencing (Thermo Fisher
Scientific) according to the manufacturer’s protocol. Sequencing was performed directly or after TA cloning on an ABI PRISM 3100 Genetic Analyzer (Thermo Fisher Scientific) using the BigDye Terminator cycle sequencing ready reaction kit, v.3.1 (Thermo Fisher Scientific). The 5’ and 3’ ends of T. socialis cyclin D genes were determined by RACE using the
GeneRacer kit (Thermo Fisher Scientific) according to the manufacturer’s protocol. RACE products were amplified with KOD FX Neo DNA polymerase
(TOYOBO, Osaka, Japan) and PCR was carried out as follows: 94°C for 2 minutes, 5 cycles of 98°C for 10 seconds and 74°C for 1.5 minutes, 5 cycles of 98°C for 10 seconds and 72°C for 1.5 minutes, 5 cycles of 98°C for 10 seconds and 70°C for 1.5 minutes, 15 cycles of 98°C for 10 seconds and
68°C for 1.5 minutes, followed by 68°C for 7 minutes. Amplified DNA was purified and sequenced as described above. The amino acid sequences of the CYC family from C. reinhardtii, G. pectorale and V. carteri were aligned with newly determined CYC proteins of T. socialis with MAFFT (v7) and poorly aligned regions were manually trimmed. Phylogenetic analyses of the CYC family were performed using the maximum-likelihood (ML) and the neighbour joining (NJ) methods with PhyML (Guindon, et al. 2009) and MEGA 5.2.2
(Tamura, et al. 2011) with WAG+I+G+F model selected by ProtTest (Abascal, et al. 2005). The bootstrap analyses were performed with 1000 replicates.
The sequences of CYCA1, CYCB1, and CYCAB1 were used as outgroup proteins based on previous CYC phylogenetic analyses (Prochnik, et al. 2010;
Hanschen, et al. 2016). Based on the present phylogenetic analyses, volvocine CYCD proteins (CYCD1-4) formed a robust monophyletic group with 99% and 89% bootstrap values in ML and NJ analysis, respectively (fig.
3). Within the CYCD clade, CYCD1 of C. reinhardtii, T. socialis, G. pectorale, and V. carteri formed a monophyletic group though it was supported with low bootstrap values (49% and 18% bootstrap values in ML and NJ analysis, respectively). Although supported with low bootstrap values, expanded
CYCD1s in each volvocine species formed a clade (fig. 3). These results were consistent with those of Hanschen et al. (2016). As described above and in the main text 3 CYCD1 genes were identified in T. socialis (at least 2 found as tandem repeats – Figure S6), which indicates that CYCD1 expansion traces to the beginning of multicellularity in the volvocines and that a further duplication event occurred after the Gonium/Volvox split from Tetrabaena.
Together with interactions with the retinoblastoma protein, CYCD1 expansion correlates with a modified program of cell division in the multicellular volvocines. Although we show here that the initial expansion traces to the origin of multicellularity, a number of questions regarding CYCD1 expansion remain unresolved. As yet the functional consequences of this expansion have not been demonstrated and it remains unknown if the expansion is a form of functional divergence (e.g. subfunctionalization or neofunctionalization) or if the expansion is related to dosage. Previously, it was found that, compared to other cell cycle genes, D-type cyclins show elevated rates of non-synonymous over synonymous substitutions
(dN/dS) (Hanschen, et al. 2016). However, median dN/dS values were below
1, which is typically indicative of purifying or near-neutral selection and elevated rates relative to other cyclins may reflect a relaxation of purifying selection rather than positive selection (Yang and Bielawski 2000). The phylogeny of cyclins presented in this work (and the previous analysis by
Hanschen et al. (2016)) shows that D1-type cyclins cluster by species rather than by subtype, which might suggest that D1-type cyclin paralogues have not functionally diverged in a form that is conserved throughout the lineage. Better statistical support for a volvocine CYCD protein phylogeny is required, which we were not able to generate. A dosage effect may better explain the second duplication event leading to 4 copies rather than functional divergence because 3 copies is sufficient for enabling a multicellular program of division in T. socialis. Another aspect to consider relates to the retinoblastoma-like
(RB/MAT3) gene, which alone appears able to alter the program of division in
C. reinhardtii to activate a multicellular program even though only 1 copy of
CYCD1 is present in C. reinhardtii (Hanschen, et al. 2016).
Cyclin-dependent kinases
To identify putative T. socialis cyclin-dependent kinase (CDK) and rhodanese/cell-cycle control protein orthologues, annotated proteins from C. reinhardtii, G. pectorale and V. carteri were searched against the T. socialis predicted proteome and genome using BLASTP and TBLASTN (E-value=10⁻⁵). ProtTest was used to identify the most suitable model for phylogenetic analysis. A phylogeny was generated using RAxML with the WAG model, gamma distribution and 1000 bootstrap iterations. In addition, orthologs were confirmed by inspecting OrthoMCL families. As per previous analyses
(Prochnik, et al. 2010; Hanschen, et al. 2016) CDKs do not display extensive differences across the volvocine lineage (Figure S7).
Retinoblastoma homologue (MAT3/RB)
Retinoblastoma protein sequences from C. reinhardtii, G. pectorale and V. carteri were aligned to the T. socialis proteome using BLASTP and a computationally predicted orthologue was identified. Following the same methodology as for cyclins the model of the T. socialis MAT3/RB gene was determined through additional capillary sequencing with primers designed from the genome assembly (Supplementary Table 23). The putative binding partners of MAT3/RB – E2F transcription factor (E2F) and dimerization protein
(DP) - were also identified in T. socialis (Supplementary Table 24).
Chlamydomonas reinhardtii, G. pectorale and V. carteri MAT3/RB protein sequences as well as additional MAT3/RB protein sequences from Hiraide et al. (2015) were used to inspect features of MAT3/RB. Using MAFFT (v7) a protein alignment was generated from available MAT3/RB protein sequences
(Figure S8): C. reinhardtii (plus and minus strains), T. socialis (homothallic),
Yamagishiella unicocca (plus), Eudorina spp (female), Gonium pectorale
(plus), Pleodorina starrii (female strains), V. carteri f. nagariensis (male and female strains). The RB/MAT3 sequences were aligned with newly determined TsRB/MAT3, based on the alignment of Hanschen et al. (2016).
The alignment of MAT3/RB proteins was inspected for the length of the region between the RB-A and RB-B domains as well as for putative serine/threonine kinase phosphorylation targets within the region linking the RB-B domain to the C-terminal domain (Hanschen et al., 2016). The alignment showed that all colonial/multicellular taxa examined possess a shortened linker region between the RB-A and RB-B domains (Supplementary Table 20). In addition to the linker length between the RB-A and RB-B domains the linker region between RB-B and the carboxy (C)-terminal domain was previously found to be devoid of putative cyclin dependent kinase phosphorylation targets in G. pectorale and V. carteri (male and female) whereas two such sites were identified in C. reinhardtii (Hanschen, et al. 2016). Within the linker region between the RB-B and C domains typical CDK phosphorylation targets (with the (S/T)PX(K/R) motif) were identified manually and, in addition, the
PlantPhos server (http://csb.yzu.edu.tw/PlantPhos) was used to identify additional targets (Lee, et al. 2011). The first C. reinhardtii MAT3 target within the RB-B/C terminal identified by Hanschen et al. (2016) possesses an SPRG motif, which is atypical (or lineage-specific), and the second target contains a more typical SPTK motif. Atypical targets and targets predicted by PlantPhos for phosphorylation were identified within the RB-B to C-terminal domain linker-region of MAT3/RB in all multicellular volvocines (fig 4). Typical phosphorylation targets were present in T. socialis, G. pectorale, V. africanus and V. carteri f. nagariensis (female - TPGK). The remaining putative phosphorylation targets were atypical (where the 4th amino acid position is not a K/R).
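The manual motif scan described above can be approximated with a short script. The sketch below reports every (S/T)P site in a linker sequence and labels it "typical" when the fourth position is K or R, matching the (S/T)PX(K/R) definition, and "atypical" otherwise. It does not reproduce PlantPhos predictions. The function name and the toy sequence are hypothetical, and the toy sequence is not a real MAT3/RB linker.

```python
def find_cdk_sites(protein):
    """Scan a protein sequence for minimal CDK target sites.

    Returns a list of (position, motif, kind) tuples where position is the
    1-based index of the S/T residue, motif is the four-residue window
    starting there (shorter at the C-terminus) and kind is "typical" for
    (S/T)PX(K/R) or "atypical" for any other (S/T)P site.
    """
    seq = protein.upper()
    sites = []
    for i in range(len(seq) - 1):
        if seq[i] in "ST" and seq[i + 1] == "P":
            motif = seq[i:i + 4]
            is_typical = len(motif) == 4 and motif[3] in "KR"
            sites.append((i + 1, motif, "typical" if is_typical else "atypical"))
    return sites

# Toy sequence embedding the three motifs named in the text.
toy_linker = "AASPRGAASPTKAATPGK"
for position, motif, kind in find_cdk_sites(toy_linker):
    print(position, motif, kind)
```

On the toy sequence the SPRG site is labelled atypical, while SPTK and TPGK are labelled typical, in line with the classification used above.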
Phylogenetic analyses of MAT3/RB were performed using the ML and the NJ methods with the PhyML and MEGA 5.2.2 programs, respectively, with the JTT+G+F model selected by ProtTest using the
Akaike information criterion. As per Hiraide et al. (2010) the MAT3/RB proteins of Chlamydomonas globosa and C. reinhardtii were included as outgroups in a new MAFFT alignment. The bootstrap analyses were performed with 1000 replicates. Broadly, phylogenetic relationships were in accordance with those of
Hiraide et al. (2010) (Figure S9).
MAT3/RB gene structures for C. reinhardtii, T. socialis, G. pectorale and both sexes of V. carteri were generated and inspected using WebScipio
(Odronitz, et al. 2008) and GenePainter (Muhlhausen, et al. 2015). WebScipio employed blat (Kent 2002) to align proteins to genomic sequences and subsequently created gene structures in YAML file format. A protein multiple sequence alignment was generated using MAFFT (v7.045) and submitted online, together with the YAML gene structure files, to GenePainter
(http://www.motorprotein.de/genepainter/) to map aligned MAT3/RB proteins to gene structures. The aligned gene structures identified 3 regions where introns are present in T. socialis, G. pectorale and V. carteri but absent in C. reinhardtii (Figure S10).
Extracellular matrix genes and the cell wall
The biochemical and molecular characteristics of volvocine extracellular matrices (ECM) and cell walls have been investigated (Goodenough and
Heuser 1985; Kirk, et al. 1986; Ertl, et al. 1989; Woessner and Goodenough
1989; Woessner, et al. 1994; Godl, et al. 1995; Godl, et al. 1997; Hallmann, et al. 1998; Sumper and Hallmann 1998; Hallmann and Kirk 2000; Hallmann
2003; Ishida 2007), in particular for C. reinhardtii and V. carteri. Two important innovations in the lineage are related to the ECM and/or cell wall: (i) the transformation of the cell wall into an extracellular matrix and (ii) the expansion of the ECM (Kirk 2005). The ECM forms as little as 1% of a C. reinhardtii cell (Kirk 2005) and in smaller colonial volvocines (e.g. G. pectorale) the ECM contribution to organismal volume varies but overall contributes significantly less to organismal volume than in V. carteri and other larger taxa where the ECM can constitute >95% of organismal volume
(Coleman 2012). The major protein constituents of the volvocine ECM are pherophorin proteins and matrix metalloproteases (MMPs). Comparative genomic investigations of C. reinhardtii, G. pectorale and V. carteri have identified that pherophorin and MMPs gene expansion associates with the expansion of the ECM in V. carteri (Prochnik, et al. 2010; Hanschen, et al.
2016) but not in G. pectorale (Prochnik, et al. 2010; Hanschen, et al. 2016).
The pherophorins are hydroxy-proline rich glycoproteins that characteristically possess at least 1 pherophorin domain (PF12499) (Tschochner, et al. 1987;
Mages, et al. 1988; Godl, et al. 1995; Godl, et al. 1997; Merchant, et al. 2007;
Prochnik, et al. 2010; Hanschen, et al. 2016). Pherophorins cross-link and together with simple carbohydrates create nets that support the gelatinous
ECM and maintain cellular connectivity (Goodenough and Heuser 1985; Kirk, et al. 1986; Ender, et al. 2002). The expression of certain pherophorins has been shown to be upregulated by sex inducers in V. carteri and by nitrogen starvation in C. reinhardtii (Sumper, et al. 1993; Amon, et al. 1998; Rodriguez, et al. 1999). Pherophorin encoding genes are expanded in V. carteri (n=78) relative to C. reinhardtii (n=35) and G. pectorale (n=31) and thus the expansions are more appropriately associated with the expansion of the ECM than with the transformation of the cell wall into an ECM (Prochnik, et al. 2010;
Hanschen, et al. 2016). The matrix metalloproteases (MMPs) are another major constituent of proteins found in the volvocine ECM (Hallmann, et al.
2001). Metalloprotease proteins typically possess a metalloprotease metal binding domain (for copper or zinc binding) and a proline rich domain (HR domain) (Heitzer and Hallmann 2002). The specific substrates of MMPs are not well characterised, but MMPs have been shown to be involved in degrading ECM components of the C. reinhardtii cell wall during gamete release and hence are also referred to as gametolysins (Goodenough and Heuser 1985).
Wounding can also trigger MMP expression, which suggests a role in ECM remodelling response (Hallmann, et al. 2001). The timing of MMP expression possibly indicates that a major role of MMPs is to degrade pherophorins
(Hallmann, et al. 2001; Heitzer and Hallmann 2002), which may form part of a signal transduction cascade that activates sexual reproduction in V. carteri
(Sumper, et al. 1993). It has been suggested that specific MMPs might exist to target specific pherophorins, however, this remains unconfirmed. As per pherophorins, relative to C. reinhardtii and G. pectorale, the expansion of
MMP coding genes is an exclusive feature of V. carteri with 44 genes in C. reinhardtii, 36 genes in G. pectorale and 98 genes in V. carteri (Hanschen, et al. 2016).
The C. reinhardtii cell wall is composed of 3 layers (Goodenough and
Heuser 1985) and each cell of the simpler colonials such as T. socialis and G. pectorale is surrounded at least by a layer akin to the central layer of C. reinhardtii (the tripartite boundary layer) while in the larger Volvocaceae the central layer surrounds the colony as opposed to each cell (Kirk 2005;
Coleman 2012). In the smaller taxa such as T. socialis and G. pectorale, ECM material is deposited between cells, surrounding each cell, and works together with other structures such as cytoplasmic bridges to maintain cellular connectivity (Nozaki 1990; Arakaki, et al. 2013). This form of ECM deposition and cell wall modifications that connect cells has been described as a form (or intermediate form) of cell wall transformation into an ECM (Kirk 2005). As yet the transformation of the cell wall into an ECM is not well understood at the molecular level. Proteins characterized from the outer layers of the C. reinhardtii cell wall include glycoproteins GP1, GP2 and GP3 (Adair, et al.
1987) and notably, orthologs of GP2 and GP3 are present in both V. carteri and G. pectorale but GP1 is absent from both taxa (Prochnik, et al. 2010;
Hanschen, et al. 2016) (Supplementary Table 24).
Pherophorins
T. socialis gene models were searched for pherophorins (E-value = 10⁻²) using the C. reinhardtii pherophorin genes phC1 and GAS28 as queries. The identified collection of proteins was examined for the presence of the DUF3707 domain
(the protein domain present in pherophorins, up to three domains per protein) using HMMER3 (E-value = 10⁻⁵). Computer-generated annotations were manually modified to improve the pherophorin domains flanking a proline-rich repeat. The intergenic regions of the T. socialis genome were also searched (E-value = 10⁻³) and pherophorin models were built manually. The presence of DUF3707 domains was verified by direct Pfam submission (E-value = 10⁻⁵). A protein phylogenetic tree was constructed using the automated pipeline (Hanschen et al. 2016) described for MMPs. A total of 36 pherophorin proteins were identified in T. socialis. The RAxML (v8.0.20) phylogeny of curated pherophorins demonstrated both the close relationship of many pherophorins throughout the volvocine lineage and species-specific expansions
(Figure S11). The most prominent of species-specific expansions are found in
V. carteri. Pherophorins have been associated with the development of the expanded extracellular matrix of V. carteri. The deepest branching proteins in the phylogeny include the C. reinhardtii hydroxyproline-rich glycoproteins
(GAS proteins) that are upregulated during N-starvation; putative orthologs of GAS proteins are found in all taxa. Consistent with the findings of Hanschen et al. (2016) for G. pectorale pherophorins, extensive expansions of pherophorins are not a feature of the colonial T. socialis.
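The domain-filtering step described above can be illustrated with a minimal sketch. It assumes the HMMER3 results were written in the per-domain table format produced by hmmscan with --domtblout, and that the threshold is applied to the per-domain independent E-value; neither detail is stated in the text. The function name and the protein names are hypothetical, and the rows are synthetic, not study output.

```python
from collections import Counter


def count_domain_hits(domtbl_lines, domain="DUF3707", max_evalue=1e-5):
    """Count passing hits of one Pfam domain per protein.

    Assumes hmmscan --domtblout rows: column 0 is the Pfam domain name,
    column 3 is the query protein, and column 12 is the per-domain
    independent E-value (i-Evalue).
    """
    counts = Counter()
    for line in domtbl_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HMMER comment lines
        fields = line.split()
        if fields[0] == domain and float(fields[12]) <= max_evalue:
            counts[fields[3]] += 1
    return counts


# Synthetic rows with hypothetical protein names (not real search output).
ROW = ("{dom} - 150 {prot} - 400 1e-30 100.0 0.1 1 1 1e-20 {iev} "
       "60.0 0.0 1 150 10 160 8 162 0.95 -")
example = [
    "# synthetic domtblout-style table",
    ROW.format(dom="DUF3707", prot="protA", iev="1e-18"),
    ROW.format(dom="DUF3707", prot="protA", iev="3e-09"),
    ROW.format(dom="DUF3707", prot="protB", iev="1e-03"),    # fails threshold
    ROW.format(dom="Fasciclin", prot="protC", iev="1e-40"),  # other domain
]
hits = count_domain_hits(example)
print(dict(hits))
```

Counting hits per protein, rather than only recording presence, matches the observation that a pherophorin may carry up to three DUF3707 domains.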
Matrix metalloprotease proteins
A curated and published collection of matrix metalloproteases (MMPs) from C. reinhardtii, G. pectorale and V. carteri, as compiled by Hanschen et al.
(2016), was used to search for T. socialis MMPs amongst MAKER2-predicted proteins (E-value = 10⁻¹). These proteins were queried for the
Peptidase M11 domain (PF05548; E-value = 10⁻⁵) using HMMER3. In addition, the published collection of MMPs was searched against the genome of T. socialis using TBLASTN (E-value = 10⁻⁵) to identify MMPs that were not automatically modelled. Four additional genes that contained the PF05548 domain were manually annotated, bringing the total number of T. socialis MMP genes to 34. For comparison, C. reinhardtii has 44 MMP genes, G. pectorale has 36 MMP genes, and V. carteri has 98 MMP genes. A phylogenetic tree of
Peptidase M11 proteins was constructed using a custom pipeline
(Hanschen et al. 2016) of SATe version 2.2.7 (Liu et al. 2009) coupled with
RAxML (v8.0.20) (Stamatakis 2014). Full protein sequences were passed to
SATe using a FastTree tree estimation, a RAxML (Liu, et al. 2011) search after initial tree formation, a maximum limit of 10 iterations and the “longest” decomposition strategy. Bootstrap replicates were generated from the SATe output alignment and tree using RAxML with automatic model selection, a rapid hill-climbing algorithm (-f d) and 100 bootstrap partitions. Bipartition information (-f a) was obtained using the SATe output tree and RAxML bootstrap tree. As is the case for pherophorins, the phylogeny of MMPs shows lineage-wide orthologues as well as V. carteri-specific expansions (Figure S12). The V. carteri expansions include numerous clades that are devoid of proteins from other taxa, such as a branch of 46 proteins that consists only of V. carteri proteins. Sister to the large clade of
46 V. carteri proteins is a clade of proteins from C. reinhardtii, T. socialis and
G. pectorale, which includes the MMP4 C. reinhardtii protein. The inclusion of
T. socialis proteins strengthens the correlation between the number of pherophorin and MMP proteins that has previously been reported (Hanschen, et al. 2016) and thus it may be concluded that genomic evidence to date does not contradict the hypothesis that specific MMPs exist to target specific pherophorins.
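As a small worked comparison, the MMP gene counts quoted above can be expressed relative to the unicellular C. reinhardtii. The sketch below uses only the four counts reported in the text; choosing C. reinhardtii as the reference is an illustrative assumption, not part of the original analysis.

```python
# MMP gene counts as reported in the text.
mmp_genes = {
    "C. reinhardtii": 44,
    "T. socialis": 34,
    "G. pectorale": 36,
    "V. carteri": 98,
}

# Express each count as a fold change relative to C. reinhardtii.
reference = mmp_genes["C. reinhardtii"]
fold_vs_unicell = {sp: n / reference for sp, n in mmp_genes.items()}

for species, fold in fold_vs_unicell.items():
    print(f"{species}: {mmp_genes[species]} genes, {fold:.2f}x C. reinhardtii")
```

On these figures only V. carteri exceeds the C. reinhardtii count, while both small colonial taxa fall slightly below it.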
Cell wall proteins
As in G. pectorale and V. carteri, the GP1 gene was not identified in T. socialis (Supplementary Table 24). It is unknown whether GP1 is unique to C. reinhardtii or whether the absence of GP1 in the multicellular volvocines was due to loss. If gene loss occurred, it likely occurred near the origin of volvocine multicellularity, which raises the question of whether the loss of GP1 was of consequence for cell wall modifications. To further explore the origin of the C. reinhardtii GP1 gene, the GP1 protein was used as a BLASTP query against available transcriptome-derived proteomes of other unicellular members of the order Chlamydomonadales (Supplementary Table 2). No homologs were identified (E-value = 10⁻⁵), so the evolutionary history of GP1 remains unknown. The GP2 gene was identified in T. socialis as a partial sequence but was not modelled.
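The E-value cut-off applied to homology searches such as the one above can be sketched as a filter over BLAST results. This assumes the searches were saved in the standard 12-column tabular format (-outfmt 6), which the text does not state. The function name, query name and subject names are hypothetical, and the rows are synthetic.

```python
import csv
import io


def significant_hits(tabular_text, max_evalue=1e-5):
    """Return (query, subject, evalue) for rows at or below the threshold.

    Assumes the 12 standard tab-separated columns of BLAST tabular
    output (-outfmt 6); the E-value is the 11th column (index 10).
    """
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    hits = []
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue <= max_evalue:
            hits.append((row[0], row[1], evalue))
    return hits


# Synthetic rows (hypothetical identifiers, not results from the study).
example = "\n".join([
    "\t".join(["queryX", "subjectA", "31.2", "120", "80", "3",
               "1", "118", "5", "122", "2e-01", "28.5"]),
    "\t".join(["queryX", "subjectB", "45.0", "200", "105", "5",
               "10", "205", "1", "198", "4e-12", "65.1"]),
])
print(significant_hits(example))
```

An empty list from such a filter corresponds to the "no homologs identified" outcome reported for GP1.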
Reg-A related genes
Reg-A related sequences (rls) genes (also known as the Volvocine
Algal RegA-Like (VARL) genes) were identified in T. socialis by following previous approaches (Hanschen et al. 2016). All published VARL gene sequences from C. reinhardtii, G. pectorale and V. carteri (Duncan et al. 2007;
Merchant et al. 2007; Hanschen et al. 2014, 2016) were searched against both the predicted genes and assembly of T. socialis. The presence of a
SAND domain in computationally predicted and manually constructed VARL proteins was verified using HMMER3 to query Pfam-A version 26.0 (Finn et al. 2010) (E-value = 10⁻⁵). There was one instance (RLS12 in C. reinhardtii) where a hypothesized SAND domain did not pass the E-value threshold. An alignment of RLS proteins from T. socialis, C. reinhardtii, G. pectorale and V. carteri was generated but was limited to the VARL domain because this is the only conserved region in RLS proteins (Duncan et al. 2007; Hanschen et al.
2014). MAFFT (v6.859b) with the L-INS-i option (Katoh et al. 2005) was used to generate a protein alignment. A phylogenetic tree was produced using
RAxML (v8.0.20) (Stamatakis 2014) with the Protein Gamma distribution and automatic model selection (Figure S13). Rapid bootstrapping analysis to search for the best-scoring maximum-likelihood (ML) tree was run with an automatically selected number of bootstraps. A total of 8 REGA-related (RLS) proteins were identified in T. socialis, which is the same number as in G. pectorale. As is the case with G. pectorale, REGA-cluster proteins (regA, rlsA, rlsB, rlsC and rlsN/O) were not identified in T. socialis.
Fasciclin domain-containing proteins
The molecular mechanisms that underlie incomplete cytokinesis in the volvocines are as yet unknown; however, an important morphological feature of multicellular volvocines is the presence of cytoplasmic bridges that maintain cellular connections. Cellular adhesion has been recognized as an important feature for the early development of multicellularity, and adhesion molecules have been characterized in a number of lineages (e.g. pectins in plants, glycoproteins in fungi, phlorotannins in brown algae and cadherins in metazoa) (Niklas 2014). Besides ECM components that envelop cells and thereby maintain connectivity, adhesive molecules are not well characterized in the volvocine lineage. One candidate cell adhesion molecule is the fasciclin-domain containing protein named algal-cell adhesion molecule
(Algal-CAM1) (Huber and Sumper 1994), which has to date only been identified in V. carteri, where it may play a role in cell adhesion as well as embryo inversion. Treating synchronised V. carteri cultures with antibodies targeting these proteins prior to embryogenesis resulted in the formation of misshapen “doughnut”-shaped blastomeres, while later treatment with antibodies resulted in normally shaped blastomeres except that embryological inversion was inhibited (Huber and Sumper 1994). Furthermore, a family of fasciclin-domain containing proteins was identified as gained at the ORIMC in the current investigation. To explore fasciclin-domain containing proteins throughout the volvocine lineage, fasciclin domain (PF02469) containing proteins in C. reinhardtii, T. socialis, G. pectorale and V. carteri were identified
using HMMER3 scans against Pfam-A. Proteins were aligned using MAFFT (v7.045), model testing was performed with ProtTest, and RAxML (v8.2.0) (Blosum62, invariant sites, gamma distribution, empirical frequencies and 1000 bootstraps) was used to generate a phylogenetic tree (fig. 5). The V. carteri algal cell adhesion molecule (Algal-CAM) protein (XP_002958363.1) was identified in the V. carteri v1 proteome and located in the protein phylogeny of fasciclin-domain containing proteins (Huber & Sumper, 1994).
The protein phylogeny identified 3 putative paralogues of V. carteri Algal-CAM
(fig. 5). Other fasciclin-domain containing proteins from C. reinhardtii, T. socialis, G. pectorale and V. carteri did not cluster with Algal-CAM paralogues. The family identified as originating at the ORIMC contains two T. socialis proteins (TSOC_001029-RA and TSOC_001027-RA) and one protein each from G. pectorale (scaffold00001.g603t.1) and V. carteri (96911) (fig. 5).
Prochnik table genes
The Prochnik et al (2010) genes “involved in processes that are associated with increased developmental complexity in V. carteri relate to C. reinhardtii” (hereafter Prochnik genes/proteins) were identified in T. socialis.
Orthologues of Prochnik genes have been identified in G. pectorale and revisions have been made that include orthologues of updated C. reinhardtii and V. carteri proteomes (Hanschen et al., 2016). Results from various informatics tools were manually inspected to identify putative T. socialis orthologues of Prochnik table proteins (Supplementary Table 24). Protein sequences from C. reinhardtii, G. pectorale and V. carteri were used to identify T. socialis orthologues using homology searches (TBLASTN and
BLASTP). Furthermore, OrthoMCL protein-family clusters were inspected and, where required, protein phylogenies were generated to identify T. socialis orthologues. Additional efforts were taken to identify and classify matrix metalloproteases (MMPs), pherophorins and cell cycle proteins and are discussed separately below. Besides pherophorins and MMPs, approximately
138 orthologues were identified from the predicted proteome of T. socialis. A further 4 genes that were not generated by automated gene prediction were identified from TBLASTN searches against the T. socialis genome assembly.
This demonstrated that gene models expected to be present in the T. socialis genome/annotations were generally well captured.
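The claim that expected gene models were generally well captured can be made concrete with a short calculation from the two counts given above. This is only an illustrative sketch: the text describes 138 as approximate, so the resulting fraction is approximate as well.

```python
predicted = 138   # approximate count found among predicted proteins
genome_only = 4   # additional genes found only by searching the assembly

total = predicted + genome_only
captured_fraction = predicted / total
print(f"{predicted}/{total} = {captured_fraction:.1%} "
      "captured by automated prediction")
```

On these figures, roughly 97% of the identified orthologues were already present in the automated gene predictions.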
Supplementary references
Abascal F, Zardoya R, Posada D. 2005. ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21:2104-2105.
Adair WS, Steinmetz SA, Mattson DM, Goodenough UW, Heuser JE. 1987. Nucleated assembly of Chlamydomonas and Volvox cell walls. J Cell Biol 105:2373-2382.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-3402.
Amon P, Haas E, Sumper M. 1998. The sex-inducing pheromone and wounding trigger the same set of genes in the multicellular green alga Volvox. Plant Cell 10:781-789.
Arakaki Y, Kawai-Toyooka H, Hamamura Y, Higashiyama T, Noga A, Hirono M, Olson BJ, Nozaki H. 2013. The simplest integrated multicellular organism unveiled. PLoS One 8:e81641.
Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, et al. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455-477.
Besemer J, Borodovsky M. 2005. GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res 33:W451-454.
Blaby IK, Blaby-Haas CE, Tourasse N, Hom EF, Lopez D, Aksoy M, Grossman A, Umen J, Dutcher S, Porter M, et al. 2014. The Chlamydomonas genome project: a decade on. Trends Plant Sci 19:672-680.
Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. 2004. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res 14:708-715.
Boetzer M, Pirovano W. 2012. Toward almost closed genomes with GapFiller. Genome Biol 13:R56.
Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114-2120.
Boutet E, Lieberherr D, Tognolli M, Schneider M, Bansal P, Bridge AJ, Poux S, Bougueleret L, Xenarios I. 2016. UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View. Methods Mol Biol 1374:23-54.
Bruen TC, Philippe H, Bryant D. 2006. A simple and robust statistical test for detecting the presence of recombination. Genetics 172:2665-2681.
Buchheim MA, Keller A, Koetschan C, Forster F, Merget B, Wolf M. 2011. Internal transcribed spacer 2 (nu ITS2 rRNA) sequence-structure phylogenetics: towards an automated reconstruction of the green algal tree of life. PLoS One 6:e16931.
Campbell MS, Holt C, Moore B, Yandell M. 2014. Genome Annotation and Curation Using MAKER and MAKER-P. Curr Protoc Bioinformatics 48:4 11 11-39.
Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T. 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972-1973.
Chang Z, Li G, Liu J, Zhang Y, Ashby C, Liu D, Cramer CL, Huang X. 2015. Bridger: a new framework for de novo transcriptome assembly using RNA-seq data. Genome Biol 16:30.
Coleman AW. 2012. A Comparative Analysis of the Volvocaceae (Chlorophyta)(1). J Phycol 48:491-513.
Csuros M. 2010. Count: evolutionary analysis of phylogenetic profiles with parsimony and likelihood. Bioinformatics 26:1910-1912.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet 6:361-375.
Demuth JP, Hahn MW. 2009. The life and death of gene families. Bioessays 31:29-39.
Eddy SR. 2011. Accelerated Profile HMM Searches. PLoS Comput Biol 7:e1002195.
Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792-1797.
Ender F, Godl K, Wenzl S, Sumper M. 2002. Evidence for autocatalytic cross-linking of hydroxyproline-rich glycoproteins during extracellular matrix assembly in Volvox. Plant Cell 14:1147-1160.
Ertl H, Mengele R, Wenzl S, Engel J, Sumper M. 1989. The extracellular matrix of Volvox carteri: molecular structure of the cellular compartment. J Cell Biol 109:3493-3501.
Falda M, Toppo S, Pescarolo A, Lavezzo E, Di Camillo B, Facchinetti A, Cilia E, Velasco R, Fontana P. 2012. Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms. BMC Bioinformatics 13 Suppl 4:S14.
Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, et al. 2014. Pfam: the protein families database. Nucleic Acids Res 42:D222-230.
Gao S, Sung WK, Nagarajan N. 2011. Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. J Comput Biol 18:1681-1691.
Gnerre S, Maccallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ, Sharpe T, Hall G, Shea TP, Sykes S, et al. 2011. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci U S A 108:1513-1518.
Godl K, Hallmann A, Rappel A, Sumper M. 1995. Pherophorins: a family of extracellular matrix glycoproteins from Volvox structurally related to the sex-inducing pheromone. Planta 196:781-787.
Godl K, Hallmann A, Wenzl S, Sumper M. 1997. Differential targeting of closely related ECM glycoproteins: the pherophorin family from Volvox. EMBO J 16:25-34.
Goodenough UW, Heuser JE. 1985. The Chlamydomonas cell wall and its constituent glycoproteins analyzed by the quick-freeze, deep-etch technique. J Cell Biol 101:1550-1568.
Gorman DS, Levine RP. 1965. Cytochrome f and plastocyanin: their sequence in the photosynthetic electron transport chain of Chlamydomonas reinhardtii. Proc Natl Acad Sci U S A 54:1665-1669.
Guindon S, Delsuc F, Dufayard JF, Gascuel O. 2009. Estimating maximum likelihood phylogenies with PhyML. Methods Mol Biol 537:113-137.
Hallmann A. 2003. Extracellular matrix and sex-inducing pheromone in Volvox. Int Rev Cytol 227:131-182.
Hallmann A, Amon P, Godl K, Heitzer M, Sumper M. 2001. Transcriptional activation by the sexual pheromone and wounding: a new gene family from Volvox encoding modular proteins with (hydroxy)proline-rich and metalloproteinase homology domains. Plant J 26:583-593.
Hallmann A, Godl K, Wenzl S, Sumper M. 1998. The highly efficient sex-inducing pheromone system of Volvox. Trends Microbiol 6:185-189.
Hallmann A, Kirk DL. 2000. The developmentally regulated ECM glycoprotein ISG plays an essential role in organizing the ECM and orienting the cells of Volvox. J Cell Sci 113 Pt 24:4605-4617.
Hanschen ER, Marriage TN, Ferris PJ, Hamaji T, Toyoda A, Fujiyama A, Neme R, Noguchi H, Minakuchi Y, Suzuki M, et al. 2016. The Gonium pectorale genome demonstrates co-option of cell cycle regulation during the evolution of multicellularity. Nat Commun 7:11370.
Heitzer M, Hallmann A. 2002. An extracellular matrix-localized metalloproteinase with an exceptional QEXXH metal binding site prefers copper for catalytic activity. J Biol Chem 277:28280-28286.
Hochberg Y, Benjamini Y. 1990. More powerful procedures for multiple significance testing. Stat Med 9:811-818.
Holt C, Yandell M. 2011. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12:491.
Hongo JA, de Castro GM, Cintra LC, Zerlotini A, Lobo FP. 2015. POTION: an end-to-end pipeline for positive Darwinian selection detection in genome-scale data through phylogenetic comparison of protein-coding genes. BMC Genomics 16:567.
Huber O, Sumper M. 1994. Algal-CAMs: isoforms of a cell adhesion molecule in embryos of the alga Volvox with homology to Drosophila fasciclin I. EMBO J 13:4212-4222.
Hubisz MJ, Pollard KS, Siepel A. 2011. PHAST and RPHAST: phylogenetic analysis with space/time models. Brief Bioinform 12:41-51.
Ishida K. 2007. Sexual pheromone induces diffusion of the pheromone-homologous polypeptide in the extracellular matrix of Volvox carteri. Eukaryot Cell 6:2157-2162.
Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, et al. 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236-1240.
Kent WJ. 2002. BLAT--the BLAST-like alignment tool. Genome Res 12:656-664.
Kielbasa SM, Wan R, Sato K, Horton P, Frith MC. 2011. Adaptive seeds tame genomic sequence comparison. Genome Res 21:487-493.
Kirk DL. 2005. A twelve-step program for evolving multicellularity and a division of labor. Bioessays 27:299-310.
Kirk DL, Birchem R, King N. 1986. The extracellular matrix of Volvox: a comparative study and proposed system of nomenclature. J Cell Sci 80:207-231.
Kirk DL, Harper JF. 1986. Genetic, biochemical, and molecular approaches to Volvox development and evolution. Int Rev Cytol 99:217-293.
Korf I. 2004. Gene finding in novel genomes. BMC Bioinformatics 5:59.
Kuck P, Meusemann K. 2010. FASconCAT: Convenient handling of data matrices. Mol Phylogenet Evol 56:1115-1118.
Lee TY, Bretana NA, Lu CT. 2011. PlantPhos: using maximal dependence decomposition to identify plant phosphorylation sites with substrate site specificity. BMC Bioinformatics 12:261.
Li L, Stoeckert CJ, Jr., Roos DS. 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13:2178-2189.
Liu K, Linder CR, Warnow T. 2011. RAxML and FastTree: comparing two methods for large-scale maximum likelihood phylogeny estimation. PLoS One 6:e27731.
Lopez D, Casero D, Cokus SJ, Merchant SS, Pellegrini M. 2011. Algal Functional Annotation Tool: a web-based analysis suite to functionally interpret large gene lists using integrated annotation and expression data. BMC Bioinformatics 12:282.
Mages HW, Tschochner H, Sumper M. 1988. The sexual inducer of Volvox carteri. Primary structure deduced from cDNA sequence. FEBS Lett 234:407-410.
Magoc T, Salzberg SL. 2011. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27:2957-2963.
Marcais G, Kingsford C. 2011. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27:764-770.
Merchant SS, Prochnik SE, Vallon O, Harris EH, Karpowicz SJ, Witman GB, Terry A, Salamov A, Fritz-Laylin LK, Marechal-Drouard L, et al. 2007. The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science 318:245-250.
Miller SM, Schmitt R, Kirk DL. 1993. Jordan, an active Volvox transposable element similar to higher plant transposons. Plant Cell 5:1125-1138.
Muhlhausen S, Hellkamp M, Kollmar M. 2015. GenePainter v. 2.0 resolves the taxonomic distribution of intron positions. Bioinformatics 31:1302-1304.
Niklas KJ. 2014. The evolutionary-developmental origins of multicellularity. Am J Bot 101:6-25.
Nozaki H. 1986. Sexual reproduction in Gonium sociale (Chlorophyta, Volvocales). Phycologia 25:29-35.
Nozaki H. 1990. Ultrastructure of the extracellular matrix of Gonium (Volvocales, Chlorophyta). Phycologia 29:1-8.
Nozaki H, Misawa K, Kajita T, Kato M, Nohara S, Watanabe MM. 2000. Origin and evolution of the colonial volvocales (Chlorophyceae) as inferred from multiple, chloroplast gene sequences. Mol Phylogenet Evol 17:256-268.
Odronitz F, Pillmann H, Keller O, Waack S, Kollmar M. 2008. WebScipio: an online tool for the determination of gene structures using protein sequences. BMC Genomics 9:422.
Prochnik SE, Umen J, Nedelcu AM, Hallmann A, Miller SM, Nishii I, Ferris P, Kuo A, Mitros T, Fritz-Laylin LK, et al. 2010. Genomic analysis of organismal complexity in the multicellular green alga Volvox carteri. Science 329:223-226.
Putnam NH, Srivastava M, Hellsten U, Dirks B, Chapman J, Salamov A, Terry A, Shapiro H, Lindquist E, Kapitonov VV, et al. 2007. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317:86-94.
Quinlan AR. 2014. BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Curr Protoc Bioinformatics 47:11 12 11-34.
Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841-842.
Retief JD. 2000. Phylogenetic analysis using PHYLIP. Methods Mol Biol 132:243-258.
Rodriguez H, Haring MA, Beck CF. 1999. Molecular characterization of two light-induced, gamete-specific genes from Chlamydomonas reinhardtii that encode hydroxyproline-rich proteins. Mol Gen Genet 261:267-274.
Siepel A, Haussler D. 2004. Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Mol Biol Evol 21:468-488.
Smith-Unna R, Boursnell C, Patro R, Hibberd JM, Kelly S. 2016. TransRate: reference-free quality assessment of de novo transcriptome assemblies. Genome Res 26:1134-1144.
Srivastava M, Begovic E, Chapman J, Putnam NH, Hellsten U, Kawashima T, Kuo A, Mitros T, Salamov A, Carpenter ML, et al. 2008. The Trichoplax genome and the nature of placozoans. Nature 454:955-960.
Srivastava M, Simakov O, Chapman J, Fahey B, Gauthier ME, Mitros T, Richards GS, Conaco C, Dacre M, Hellsten U, et al. 2010. The Amphimedon queenslandica genome and the evolution of animal complexity. Nature 466:720-726.
Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312-1313.
Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B. 2006. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34:W435-439.
Sumper M, Berg E, Wenzl S, Godl K. 1993. How a sex pheromone might act at a concentration below 10(-16) M. EMBO J 12:831-836.
Sumper M, Hallmann A. 1998. Biochemistry of the extracellular matrix of Volvox. Int Rev Cytol 180:51-85.
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolution Distance and Maximum Parsimony Methods. Mol Biol Evol 28:2731-2739.
Tschochner H, Lottspeich F, Sumper M. 1987. The sexual inducer of Volvox carteri: purification, chemical characterization and identification of its gene. EMBO J 6:2203-2207.
Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, Schatz MC. 2017. GenomeScope: Fast reference-free genome profiling from short reads. Bioinformatics.
Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, et al. 2014. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9:e112963.
Wang Y, Coleman-Derr D, Chen G, Gu YQ. 2015. OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species. Nucleic Acids Res 43:W78-84.
Wences AH, Schatz MC. 2015. Metassembler: merging and optimizing de novo genome assemblies. Genome Biol 16:207.
Woessner JP, Goodenough UW. 1989. Molecular characterization of a zygote wall protein: an extensin-like molecule in Chlamydomonas reinhardtii. Plant Cell 1:901-911.
Woessner JP, Molendijk AJ, van Egmond P, Klis FM, Goodenough UW, Haring MA. 1994. Domain conservation in several volvocalean cell wall proteins. Plant Mol Biol 26:947-960.
Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24:1586-1591.
Yang Z, Bielawski JP. 2000. Statistical methods for detecting molecular adaptation. Trends Ecol Evol 15:496-503.
Supplementary Figures
Supplementary Figure 1: A light microscope image of Tetrabaena socialis.
The extracellular matrix surrounding each cell is faintly visible.
Supplementary Figure 2: Kmer frequency plot.
Kmer counting was performed using Jellyfish and the outputs were used to estimate genome size using GenomeScope. Presented here is a k of 77.
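For readers unfamiliar with k-mer frequency plots, the following toy sketch shows what a k-mer counter tabulates. It is not the Jellyfish or GenomeScope implementation: it counts k-mers on the forward strand only and ignores reverse complements, and the function names and reads are invented for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-length substring across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_spectrum(counts):
    """Map multiplicity -> number of distinct k-mers with that multiplicity."""
    return Counter(counts.values())


# Three overlapping toy reads (synthetic, for illustration only).
reads = ["ACGTAC", "CGTACG", "GTACGT"]
counts = count_kmers(reads, k=4)
spectrum = kmer_spectrum(counts)
print(sorted(spectrum.items()))
```

The spectrum, plotted as multiplicity against the number of distinct k-mers, is the kind of histogram that genome-size estimators take as input.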
Supplementary Figure 3: Venn diagram of orthologous genes amongst volvocine algae. The abbreviations Crei, Tsoc, Gpec and Vcar denote
Chlamydomonas reinhardtii, Tetrabaena socialis, Gonium pectorale and
Volvox carteri. Image generated using OrthoVenn with OrthoMCL families as an input.
Supplementary Figure 4: The number of Pfam-A domains identified in the proteomes of each species examined using HMMER3.
Supplementary Figure 5. Multicellular and enriched domains as categorised by presence in selected chlorophytes, plants and opisthokonta. In the left
Venn diagram identified Pfam-A domains were sorted for presence amongst selected chlorophytes, plants and opisthokonta. In the right Venn diagram enriched domains between Chlamydomonas reinhardtii and the multicellular taxa as well as multicellular domains were sorted according to the distribution of domains amongst selected chlorophyta, plantae and opisthokonta as represented in the left Venn diagram.
Supplementary Figure 6: Synteny relationships of CYCD1 genes
[Tree figure: phylogeny of RDP1-RDP3 and CDKA1, CDKA2, CDKB1, CDKC1, CDKD1, CDKE1, CDKG1, CDKG2, CDKH1 and CDKI1 proteins from Chlamydomonas, Tetrabaena, Gonium and Volvox, with bootstrap values at nodes; scale bar = 1.0.]
Supplementary Figure 7: Midpoint-rooted phylogeny of cyclin-dependent kinases and rhodanese/cell-cycle phosphatases (RDPs). Cyclin-dependent kinases from
Chlamydomonas reinhardtii (green), Tetrabaena socialis (magenta), Gonium pectorale (blue) and Volvox carteri (black).
[Alignment figure: regions N1, N2, N3, RB-A, RB-B, C1 and C4 are labelled. Legend: conserved CDK phosphorylation sites indicated in Hanschen et al. (2016, Nat. Commun.); degenerate or species-specific CDK phosphorylation sites indicated in Hanschen et al. (2016, Nat. Commun.); CDK phosphorylation sites predicted by the PlantPhos server.]
Supplementary Figure 8: MAT3/RB alignment and putative cyclin-dependent kinase phosphorylation targets.
Supplementary Figure 9: Maximum-likelihood (ML) tree of RB/MAT3 of volvocine algae.
Bootstrap values (≥50%) for the ML and neighbor-joining (NJ) analyses are indicated on the left and right sides, respectively. The scale bar corresponds to 0.1 amino acid substitutions per position.
Supplementary Figure 10: Retinoblastoma gene structures of genome sequenced volvocines. Black bars underline regions where introns have been gained in the colonial/multicellular taxa.
Supplementary Figure 11: Midpoint-rooted phylogeny of pherophorins from
Chlamydomonas reinhardtii, Tetrabaena socialis, Gonium pectorale and
Volvox carteri.
Proteins were from Hanschen et al (2016) collection of curated pherophorins with the addition of Tetrabaena socialis proteins. Nodes where proteins from all taxa were present were collapsed in order to simplify the image.
Highlighted in green are nodes and branches that consist exclusively of
Volvox carteri proteins, in red are branches where Volvox carteri proteins are absent and in purple are branches that contain Chlamydomonas reinhardtii hydroxyproline-rich glycoproteins upregulated during N-starvation (GAS proteins) as well as homologues from Gonium pectorale, Tetrabaena socialis and Volvox carteri.
Supplementary Figure 12: Midpoint-rooted phylogeny of Matrix
Metalloproteases from Chlamydomonas reinhardtii, Tetrabaena socialis,
Gonium pectorale and Volvox carteri.
Proteins were from Hanschen et al (2016) collection of curated MMPs with the addition of Tetrabaena socialis proteins. Nodes where proteins from three or more taxa were present were collapsed in order to simplify the image.
Highlighted in green are nodes and branches that consist exclusively of
Volvox carteri proteins, in red Chlamydomonas reinhardtii proteins and in blue a branch with only Chlamydomonas reinhardtii and Tetrabaena socialis proteins.
Supplementary Figure 13: Midpoint-rooted phylogeny of RegA-related sequences from Chlamydomonas reinhardtii, Tetrabaena socialis, Gonium pectorale and
Volvox carteri.
RegA-related sequences from Chlamydomonas reinhardtii (green),
Tetrabaena socialis (magenta), Gonium pectorale (blue) and Volvox carteri
(black). The phylogeny was generated from an alignment of only the conserved VARL domain region. The phylogeny confirms that the regA-cluster of proteins (denoted by red branches that include regA, rlsA, rlsB and rlsC) is not present in Tetrabaena socialis. RLS1/rlsD proteins are shown as sister to the regA-cluster but without significant support.
Supplementary Table 1: Kmer analysis with Jellyfish, Allpaths-LG and BBtools. A substantial variance was observed between different tools or kmer sizes and a consensus for genome size or repeat content was not obtained.
Kmer/algorithm | Estimated genome size (Mb) | Repeat content (%)
Jellyfish/GenomeScope, K=19 | 101.475 | 20.4
Jellyfish/GenomeScope, K=21 | 102.175 | 17.9
Jellyfish/GenomeScope, K=25 | 103.308 | 16.7
Jellyfish/GenomeScope, K=33 | 105.230 | 16.1
Jellyfish/GenomeScope, K=45 | 108.311 | 16.9
Jellyfish/GenomeScope, K=55 | 111.586 | 18.9
Jellyfish/GenomeScope, K=77 | 117.848 | 23.7
Allpaths-LG, K=25 | 178.9 | 43
BBtools, K=31 | 114.954 | 10.8
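The variance noted for Supplementary Table 1 can be quantified with a short calculation. The sketch below uses the genome size estimates from the table, reading the decimal commas as decimal points (an interpretation, since a roughly 100 Mb genome is implied); the variable names are illustrative.

```python
# GenomeScope genome size estimates (Mb) from Supplementary Table 1,
# keyed by k-mer size; decimal commas are read as decimal points.
genomescope_mb = {
    19: 101.475, 21: 102.175, 25: 103.308, 33: 105.230,
    45: 108.311, 55: 111.586, 77: 117.848,
}
other_mb = {"Allpaths-LG (k=25)": 178.9, "BBtools (k=31)": 114.954}

gs_min = min(genomescope_mb.values())
gs_max = max(genomescope_mb.values())
gs_spread = gs_max - gs_min

all_values = list(genomescope_mb.values()) + list(other_mb.values())
overall_ratio = max(all_values) / min(all_values)

print(f"GenomeScope range: {gs_min:.3f}-{gs_max:.3f} Mb "
      f"(spread {gs_spread:.3f} Mb)")
print(f"Largest/smallest estimate across tools: {overall_ratio:.2f}x")
```

The GenomeScope estimates rise steadily with k, and the Allpaths-LG figure is well above every other estimate, which is consistent with the statement that no consensus was obtained.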
Supplementary Table 2: Sources of data used in comparative genome analyses.
Organism (group) | Version | Source | Reference
Chlamydomonas reinhardtii (Chlorophyte: Chlamydomonadales) | Version 5.5 | http://phytozome.jgi.doe.gov/ | (Blaby et al., 2014; S. S. Merchant et al., 2007)
Volvox carteri (Chlorophyte: Chlamydomonadales) | Version 1, Version 2.0, Version 2.X | http://genome.jgi.doe.gov/ and http://phytozome.jgi.doe.gov/ | (Prochnik et al., 2010)
Gonium pectorale (Chlorophyte: Chlamydomonadales) | Version 1 | Volvocales Genome Consortium, now available at http://www.ncbi.nlm.nih.gov/genome/?term=gonium+pectorale | (Hanschen et al., 2016)
Ostreococcus tauri (Chlorophyte) | Version 2 | http://genome.jgi.doe.gov/ | (Hindle et al., 2014)
Ostreococcus RCC809 (Chlorophyte) | Version 2 | http://genome.jgi.doe.gov/ | (Palenik et al., 2007)
Ostreococcus lucimarinus (Chlorophyte) | Version 2 | http://genome.jgi.doe.gov/ | (Palenik et al., 2007)
Chlorella variabilis (NC64) (Chlorophyte) | Version 1 | http://genome.jgi.doe.gov/ | (Blanc et al., 2010)
Micromonas pusilla CCMP1545 (Chlorophyte) | Version 3 | http://genome.jgi.doe.gov/ | (Worden et al., 2009)
Micromonas pusilla RCC299 (Chlorophyte) | Version 3 | http://genome.jgi.doe.gov/ | (Worden et al., 2009)
Coccomyxa subellipsoidea (Chlorophyte) | Version 2 | http://genome.jgi.doe.gov/ | (Blanc et al., 2012)
Dunaliella tertiolecta (Chlorophyte: Chlamydomonadales) | Version 1 (Transcriptome) | http://marinemicroeukaryotes.org/ | (Rismani-Yazdi et al., 2011)
Polytomella parva (Chlorophyte: Chlamydomonadales) | Version 1 (Transcriptome) | http://marinemicroeukaryotes.org/ | (Smith & Lee, 2014)
Chlamydomonas leiostraca (Chlorophyte: Chlamydomonadales) | Version 1 (Transcriptome) | http://marinemicroeukaryotes.org/ | (Keeling et al., 2014)
Chlamydomonas chlamydogama (Chlorophyte: Chlamydomonadales) | Version 1 (Transcriptome) | http://marinemicroeukaryotes.org/ | (Keeling et al., 2014)
Chlamydomonas euryale (Chlorophyte: Chlamydomonadales) | Version 1 (Transcriptome) | http://marinemicroeukaryotes.org/ | (Keeling et al., 2014)
Bathycoccus prasinos (Chlorophyte: Chlamydomonadales) | Version 1 | http://bioinformatics.psb.ugent.be/ | (Moreau et al., 2012)
Auxenochlorella protothecoides (Chlorophyte) | Version 1 | www.ncbi.nlm.nih.gov/ | (C. Gao et al., n.d.)
Caenorhabditis elegans (Metazoan) | Version 84.245 | http://www.wormbase.org/ | (Equence & Iology, 2012)
Drosophila melanogaster (Metazoan) | Version 6 | http://www.flybase.net/ | (Ketchum et al., 2000)
Selaginella moellendorffii (Plant) | Version 1 | http://phytozome.jgi.doe.gov/ | (Banks et al., 2007)
Arabidopsis thaliana (Plant) | TAIR10 | https://www.arabidopsis.org/download/index-auto.jsp?dir=/download_files/Proteins | -
Physcomitrella patens (Plant) | Version 3.3 | http://phytozome.jgi.doe.gov/ | (Shapiro et al., 2008)
Monosiga brevicollis (Metazoan) | Version 1 | http://genome.jgi.doe.gov/ | (King et al., 2008)
Saccharomyces cerevisiae (Fungi) | Version R64.2.1 | http://www.yeastgenome.org/ | (Mewes et al., 1997)
Homo sapiens (Metazoan) | Version 84.20 | http://www.ensembl.org/ | (Venter et al., 2001)
Supplementary Table 3: BLAST results of Tetrabaena socialis proteins against UNIPROT and the proteomes of other volvocines. Note that results include only top BLAST hits (E-values < 1 × 10^-7).
Query proteomes | Tetrabaena socialis matches (15 153 proteins)
C. reinhardtii | 12 954 (83.5%)
G. pectorale | 12 740 (82.1%)
V. carteri | 12 218 (78.7%)
Volvocines | 13 712 (88.3%)
UNIPROT-SWISSPROT | 7 838 (50.5%)
UNIPROT-TREMBL | 13 091 (84.3%)
All BLAST hits | 13 982 (90%)
Supplementary Table 4: The number of Chlamydomonas reinhardtii orthologues for multicellular volvocines. Percentages reflect the percentage of the multicellular volvocine proteins with Chlamydomonas reinhardtii orthologues.
Taxon | Chlamydomonas reinhardtii orthologues | Proteins from multicelled taxa with Chlamydomonas reinhardtii orthologues
Tetrabaena socialis | 8931 | 57,5%
Gonium pectorale | 9384 | 55,8%
Volvox carteri | 9384 | 64%
Supplementary Table 5: Domains that are significantly different (p<0,05) between Chlamydomonas reinhardtii and the multicellular volvocines.
Domains are grouped by their presence among the major eukaryotic lineages examined. A (+) symbol in column 3 indicates that a domain is enriched in multicellular volvocines. An (X) in column 2 indicates that a domain is absent from Chlamydomonas reinhardtii but present in the proteomes of all 3 multicellular volvocines (Tetrabaena socialis, Gonium pectorale and Volvox carteri).
PFAM Domain   P-value   Enriched Multicellular (+/-)   Annotation of Domain
Common to all examined Eukaryotic lineages
PF00331 X Glycosyl hydrolase family 10
PF00832 X Ribosomal L39 protein
PF02320 X Ubiquinol-cytochrome C reductase hinge protein
PF03671 X Ubiquitin fold modifier 1 protein
PF05557 X Mitotic checkpoint protein
PF06331 X Transcription factor TFIIH complex subunit Tfb5
PF06645 X Microsomal signal peptidase 12 kDa subunit (SPC12)
PF07145 X Ataxin-2 C-terminal region
PF07720 X Tetratricopeptide repeat
PF09202 X Rio2, N-terminal
PF10138 X vWA found in TerF C terminus
PF10537 X ATP-utilising chromatin assembly and remodelling N-terminal
PF11559 X Afadin- and alpha-actinin-binding
PF12627 X Probable RNA and SrmB-binding site of polymerase A
PF13465 X Zinc-finger double domain
PF14811 X Protein of unknown function TPD sequence-motif
PF14822 X Vasohibin
PF13857 1,11E-16 (+) Ankyrin repeats (many copies)
PF13637 1,11E-16 (+) Ankyrin repeats (many copies)
PF00078 2,55E-15 (-) Reverse transcriptase (RNA-dependent DNA polymerase)
PF12796 1,40E-14 (+) Ankyrin repeats (3 copies)
PF13606 1,67E-10 (+) Ankyrin repeat
PF00023 9,29E-08 (+) Ankyrin repeat
PF01011 3,24E-05 (-) PQQ enzyme repeat
PF13360 1,44E-04 (-) PQQ-like domain
PF00652 3,63E-04 (+) Ricin-type beta-trefoil lectin domain
PF14291 4,11E-04 (-) Domain of unknown function (DUF4371)
PF01753 7,84E-04 (+) MYND finger
PF00010 9,45E-04 (-) Helix-loop-helix DNA-binding domain
PF00665 1,08E-03 (+) Integrase core domain
PF05686 1,16E-03 (-) Glycosyl transferase family 90
PF14200 1,41E-03 (+) Ricin-type beta-trefoil lectin domain-like
PF07645 1,44E-03 (-) Calcium-binding EGF domain
PF01453 2,50E-03 (+) D-mannose binding lectin
PF00233 2,80E-03 (-) 3'5'-cyclic nucleotide phosphodiesterase
PF13570 3,71E-03 (-) PQQ-like domain
PF01663 4,69E-03 (-) Type I phosphodiesterase / nucleotide pyrophosphatase
PF07731 6,19E-03 (-) Multicopper oxidase
PF13371 8,49E-03 (-) Tetratricopeptide repeat
PF00059 1,26E-02 (-) Lectin C-type domain
PF08373 1,86E-02 (-) RAP domain
PF00394 1,94E-02 (-) Multicopper oxidase
PF13563 1,94E-02 (-) 2'-5' RNA ligase superfamily
PF00704 2,14E-02 (-) Glycosyl hydrolases family 18
PF13738 2,51E-02 (-) Pyridine nucleotide-disulphide oxidoreductase
PF05729 2,53E-02 (-) NACHT domain
PF10275 2,53E-02 (-) Peptidase C65 Otubain
PF00884 2,60E-02 (-) Sulfatase
PF01541 2,63E-02 (-) GIY-YIG catalytic domain
PF07732 3,01E-02 (-) Multicopper oxidase
PF13405 3,03E-02 (-) EF-hand domain
PF00581 3,04E-02 (-) Rhodanese-like domain
PF13469 3,07E-02 (-) Sulfotransferase family
PF13499 3,35E-02 (-) EF-hand domain pair
PF13401 3,49E-02 (-) AAA domain
PF07676 3,55E-02 (+) WD40-like Beta Propeller Repeat
PF00188 4,20E-02 (+) Cysteine-rich secretory protein family
PF01636 4,56E-02 (+) Phosphotransferase enzyme family
PF13481 4,65E-02 (-) AAA domain
Unique to Chlorophytes and Opisthokonta
PF02607 X B12 binding domain
PF07221 X N-acylglucosamine 2-epimerase (GlcNAc 2-epimerase)
PF09594 X Protein of unknown function (DUF2029)
PF12051 X Protein of unknown function (DUF3533)
PF13358 X DDE superfamily endonuclease
PF13688 X Metallo-peptidase family M12
PF14740 X Domain of unknown function (DUF4471)
PF14864 X Alkyl sulfatase C-terminal
PF05887 2,10E-06 (-) Procyclic acidic repetitive protein (PARP)
PF12851 1,19E-04 (-) Oxygenase domain of the 2OGFeDO superfamily
PF16588 6,17E-03 (-) C2H2 zinc-finger
PF07699 1,20E-02 (-) GCC2 and GCC3
PF07562 1,49E-02 (-) Nine Cysteines Domain of family 3 GPCR
PF06701 1,73E-02 (+) Mib_herc2
PF03762 2,83E-02 (+) Vitelline membrane outer layer protein I (VOMI)
PF02494 3,13E-02 (+) HYR domain
PF01822 3,47E-02 (-) WSC domain
PF01607 4,49E-02 (-) Chitin binding Peritrophin-A domain
Unique to Chlorophytes and Plants
PF07123 X Photosystem II reaction centre W protein (PsbW)
PF08706 X D5 N terminal like
PF07460 2,44E-07 (-) NUMOD3 motif (2 copies)
PF00759 1,41E-03 (+) Glycosyl hydrolase family 9
PF09118 1,09E-02 (-) Domain of unknown function (DUF1929)
PF07250 1,39E-02 (-) Glyoxal oxidase N-terminus
PF03924 2,32E-02 (-) CHASE domain
Unique to Chlorophytes
PF02415 X Chlamydia PMP repeat
PF04230 X Polysaccharide pyruvyl transferase
PF08707 X Primase C terminal 2 (PriCT-2)
PF07282 1,55E-15 (-) Putative transposase DNA-binding domain
PF12499 1,89E-02 (+) Pherophorin
PF06863 1,44E-03 (-) Protein of unknown function (DUF1254)
PF06742 1,44E-03 (-) Protein of unknown function (DUF1214)
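The P-values in Supplementary Table 5 are written with a decimal comma (for example 1,11E-16), which most numeric parsers will not read directly. The sketch below converts such values and applies the p<0,05 threshold to three rows transcribed from the table. The helper name `parse_p_value` is hypothetical, and the statistical test that produced the P-values is not reproduced here.

```python
def parse_p_value(text):
    """Convert a decimal-comma string such as '1,11E-16' to a float."""
    return float(text.replace(",", "."))

# (PFAM domain, P-value as printed, enrichment sign) from Supplementary Table 5.
ROWS = [
    ("PF13857", "1,11E-16", "+"),
    ("PF00078", "2,55E-15", "-"),
    ("PF13481", "4,65E-02", "-"),
]

ALPHA = 0.05  # significance threshold stated in the table caption

# Keep rows below the threshold, then list those enriched in multicellular taxa.
significant = [(d, parse_p_value(p), s) for d, p, s in ROWS if parse_p_value(p) < ALPHA]
enriched_in_multicellular = [d for d, _, s in significant if s == "+"]

for domain, p, sign in significant:
    print(f"{domain}\t{p:.3g}\t{sign}")
```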
Supplementary Table 6: Prochnik outlier domains identified in
Chlamydomonas reinhardtii, Tetrabaena socialis, Gonium pectorale and
Volvox carteri.
Domain | Annotation | Chlamydomonas reinhardtii | Tetrabaena socialis | Gonium pectorale | Volvox carteri
PF00125 | Histones | 124 | 170 | 134 | 56
PF00023 | Ankyrin Repeats | 80 | 135 | 143 | 31
PF00112 | Cysteine Protease | 24 | 17 | 18 | 16
PF05548 | Gametolysin | 50 | 38 | 41 | 109
PF00560 | Leucine-Rich Repeats | 36 | 39 | 21 | 23
Supplementary Table 7: HMMER3 detected Ankyrin repeat domains in
Chlamydomonas reinhardtii, Tetrabaena socialis, Gonium pectorale and
Volvox carteri.
Ankyrin Domains | Chlamydomonas reinhardtii | Tetrabaena socialis | Gonium pectorale | Volvox carteri
PF00023 | 104 | 203 | 253 | 61
PF12796 | 208 | 416 | 518 | 111
PF13606 | 112 | 247 | 286 | 70
PF13637 | 158 | 383 | 476 | 99
PF13857 | 150 | 365 | 403 | 105
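As a reading aid for Supplementary Table 7, the sketch below expresses each species' ankyrin domain count as a ratio to the Chlamydomonas reinhardtii count. The counts are transcribed from the table; the ratios are computed on raw counts and are not normalised for proteome size, which is a simplifying assumption of this example rather than the authors' method.

```python
# Ankyrin repeat domain counts from Supplementary Table 7, ordered as
# (C. reinhardtii, T. socialis, G. pectorale, V. carteri).
SPECIES = ("C. reinhardtii", "T. socialis", "G. pectorale", "V. carteri")
COUNTS = {
    "PF00023": (104, 203, 253, 61),
    "PF12796": (208, 416, 518, 111),
    "PF13606": (112, 247, 286, 70),
    "PF13637": (158, 383, 476, 99),
    "PF13857": (150, 365, 403, 105),
}

def ratios_to_reference(counts):
    """Divide every count by the first entry (C. reinhardtii)."""
    reference = counts[0]
    return tuple(c / reference for c in counts)

for domain, counts in COUNTS.items():
    ratios = ratios_to_reference(counts)
    summary = ", ".join(f"{s}: {r:.2f}" for s, r in zip(SPECIES[1:], ratios[1:]))
    print(f"{domain}  {summary}")
```

On these raw counts, every ankyrin domain is more abundant in Tetrabaena socialis and Gonium pectorale than in Chlamydomonas reinhardtii, and less abundant in Volvox carteri.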
Supplementary Table 8: Evaluation of parsimony implementations at 3 nodes
in the phylogeny for family gain and loss
Node | Wagner’s symmetric (gain cost=1) | Wagner’s asymmetric (gain cost=2) | Dollo
Gonium/Volvox | 1248 gain, 67 loss | 261 gain, 317 loss | 234 gain, 406 loss
Multicellular volvocines | 131 gain, 137 loss | 517 gain, 70 loss | 467 gain, 111 loss
Chlorophyceae/Trebouxiophyceae | 2657 gain, 203 loss | 3099 gain, 389 loss | 3455 gain, 601 loss
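The effect of the gain cost in weighted parsimony can be illustrated with a small example. The sketch below applies the standard Sankoff dynamic programme to a single presence/absence character on a hypothetical four-leaf tree. The tree, leaf names and function name are assumptions made for illustration; this is not the software or the phylogeny used to produce Supplementary Table 8, and Dollo parsimony is not covered.

```python
from math import inf

def min_parsimony_cost(tree, states, gain_cost=1.0, loss_cost=1.0):
    """Minimum weighted-parsimony cost for one presence/absence character.

    tree   : nested tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states : dict mapping leaf name -> 0 (family absent) or 1 (family present)
    """
    def step(parent, child):
        # Cost of a state change along one branch.
        if parent == child:
            return 0.0
        return gain_cost if (parent, child) == (0, 1) else loss_cost

    def subtree_cost(node):
        # Returns (cost if node is absent, cost if node is present).
        if isinstance(node, str):
            observed = states[node]
            return tuple(0.0 if s == observed else inf for s in (0, 1))
        totals = [0.0, 0.0]
        for child in node:
            child_cost = subtree_cost(child)
            for s in (0, 1):
                totals[s] += min(step(s, t) + child_cost[t] for t in (0, 1))
        return tuple(totals)

    return min(subtree_cost(tree))

# Hypothetical example: the family is present in leaf A only.
TREE = (("A", "B"), ("C", "D"))
STATES = {"A": 1, "B": 0, "C": 0, "D": 0}

symmetric = min_parsimony_cost(TREE, STATES, gain_cost=1.0)
asymmetric = min_parsimony_cost(TREE, STATES, gain_cost=2.0)
print(symmetric, asymmetric)
```

With a gain cost of 1, the cheapest explanation is a single gain on the branch to A. With a gain cost of 2, that scenario ties with an ancestral presence followed by two losses, which shows how penalising gains shifts reconstructions towards more losses.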
Supplementary Table 9: Gene Families gained at the ORIMC. The most
common domain found is included as well as a description of the domain.
Consensus annotations derived from Chlamydomonas reinhardtii v5.5 and
Volvox carteri v2 annotations as well as from argot2 are provided for each
family.
Family ID | Common Domain/s | Domain/s Description | Description
OGFLEX_1019 | PF00069 | Protein kinase domain | STELAR K+ outward rectifier
OGFLEX_1034 | PF02389 | Cornifin (SPRR) family | Putative nucleic acid binding family
OGFLEX_1035 | PF12796 | Ankyrin repeats (3 copies) | Ankyrin repeat family protein
OGFLEX_1046 | PF14295 / PF07645 | PAN domain / Calcium-binding EGF domain | Cysteine proteinases superfamily protein
OGFLEX_1063 | PF00008 | EGF-like domain | Cysteine proteinases superfamily protein
OGFLEX_1079 | - | - | Unknown
OGFLEX_1126 | PF00211 | Adenylate and Guanylate cyclase catalytic domain | Unknown putative signalling functions
OGFLEX_1172 | PF00651 | BTB/POZ domain | BTB-POZ and MATH domain 6
OGFLEX_1215 | PF03016 | Exostosin family | Exostosin family protein
OGFLEX_1256 | - | - | Unknown
OGFLEX_1271 | PF00098 | Zinc knuckle | General control non-repressible
OGFLEX_1382 | PF00188 | Cysteine-rich secretory protein family | Unknown
OGFLEX_1467 | PF07707 | BTB And C-terminal Kelch | Unknown
OGFLEX_1610 | PF14295 | PAN domain | Cysteine proteinases superfamily
OGFLEX_1619 | PF02892 | BED zinc finger | Unknown
OGFLEX_2177 | - | - | Unknown
OGFLEX_4318 | - | - | Unknown
OGFLEX_4319 | PF05869 | DNA N-6-adenine-methyltransferase (Dam) | Unknown putative methyltransferase activity
OGFLEX_4334 | PF06645 | Microsomal signal peptidase 12 kDa subunit (SPC12) | Putative proteolytic signal peptidase
OGFLEX_5276 | - | - | Unknown
OGFLEX_5317 | PF00188 | Cysteine-rich secretory protein family | CAP (Cysteine-rich secretory proteins, Antigen 5, and Pathogenesis-related 1 protein) superfamily
OGFLEX_5320 | PF01549 | ShK domain-like | Similar to bacterial putative methyl transferase
OGFLEX_5717 | PF00149 | Calcineurin-like phosphoesterase | Unknown hydrolase putative activity
OGFLEX_5804 | - | - | Unknown glycine-rich family
OGFLEX_6272 | PF07714 | Protein tyrosine kinase | ACT-like protein tyrosine kinase family protein
OGFLEX_7103 | - | - | Unknown
OGFLEX_7221 | - | - | Unknown
OGFLEX_7232 | - | - | Unknown
OGFLEX_7343 | PF03671 | Ubiquitin fold modifier 1 protein | Putative ubiquitin fold modifier 1 protein
OGFLEX_7355 | PF00270 | DEAD/DEAH box RNA helicase family protein | RECQ helicase SIM
OGFLEX_7359 | PF13639 | Ring finger domain | RPM1 interacting protein 2
OGFLEX_7429 | PF01762 | Galactosyltransferase | Galactosyltransferase family protein
OGFLEX_7511 | PF07035 | Colon cancer-associated protein Mic1-like | Unknown putative microtubule kinesin
OGFLEX_7538 | PF02535 | ZIP Zinc transporter | Zinc/iron permease
OGFLEX_8628 | PF00484 | Carbonic anhydrase | Unknown
OGFLEX_8632 | PF00076 | RNA recognition motif. (a.k.a. RRM, RBD, or RNP domain) | Unknown putative nucleotide binding
OGFLEX_8798 | PF14159 | CAAD domains of cyanobacterial aminoacyl-tRNA synthetase | Unknown
OGFLEX_8819 | PF00331 | Glycosyl hydrolase family | Glycosyl hydrolase family
OGFLEX_8913 | PF00535 | Glycosyl transferase family | Glycosyl transferase family
OGFLEX_8916 | - | - | Unknown
OGFLEX_8987 | PF00504 | Chlorophyll A-B binding protein | Chlorophyll A-B binding family
OGFLEX_9949 | - | - | Protein phosphatase 2C family
OGFLEX_9953 | PF02181 | Formin Homology 2 Domain | Putative formin like family
OGFLEX_9954 | - | - | MAPKKK-related family
OGFLEX_9968 | - | - | Unknown
OGFLEX_9969 | PF11822 | Domain of unknown function (DUF3342) | Unknown
OGFLEX_10455 | PF02469 | Fasciclin domain | Fasciclin domain containing family
OGFLEX_10485 | PF03016 | Exostosin family | Exostosin family protein
OGFLEX_10494 | - | - | Unknown
OGFLEX_10512 | PF14494 | Domain of unknown function (DUF4436) | Unknown
OGFLEX_10556 | PF05903 | PPPDE putative peptidase domain | Unknown
OGFLEX_10589 | - | - | Unknown
OGFLEX_10729 | SM00184 | Ring finger | Breast cancer susceptibility1
OGFLEX_12088 | PF01145 | SPFH domain / Band 7 family | SPFH/Band 7/PHB domain-containing membrane-associated protein family
OGFLEX_12096 | - | - | Unknown
OGFLEX_12100 | - | - | Recovery protein 3 family
OGFLEX_12102 | - | - | Unknown
OGFLEX_12103 | PF00027 | Cyclic nucleotide-binding domain | Unknown
OGFLEX_12106 | PF08676 | MutL C terminal dimerisation domain | DNA mismatch repair protein, putative
OGFLEX_12109 | - | - | Unknown
OGFLEX_12898 | PF01189 | NOL1/NOP2/sun family | NOL1/NOP2/sun family protein
OGFLEX_13054 | PF04857 | CAF1 family ribonuclease | Polynucleotidyl transferase, ribonuclease H-like superfamily protein
OGFLEX_13080 | PF00069 | Protein kinase domain | Unknown putative protein kinase family
OGFLEX_13153 | PF10184 | Uncharacterized conserved protein (DUF2358) | Unknown
OGFLEX_13170 | - | - | Ribonuclease III family protein
OGFLEX_13210 | PF00160 | Cyclophilin type peptidyl-prolyl cis-trans isomerase/CLD | Peptidyl-prolyl cis-trans isomerases
OGFLEX_13554 | - | - | Unknown
OGFLEX_13619 | PF07059 | Protein of unknown function (DUF1336) | Unknown
OGFLEX_13648 | SM00044 | Adenylyl- / guanylyl cyclase, catalytic domain | Unknown putative signalling functions
OGFLEX_14038 | SM00368 | Leucine rich repeat, ribonuclease inhibitor type | Unknown
OGFLEX_14319 | PF00258 | Flavodoxin | Putative P450 family
OGFLEX_14401 | SM00380 | DNA-binding domain in plant proteins such as APETALA2 and EREBPs | AINTEGUMENTA-like 5
OGFLEX_14433 | - | - | Unknown
OGFLEX_14435 | PF12624 | N-terminal region of Chorein, a TM vesicle-mediated sorter | Autophagy 2
OGFLEX_14464 | PF13532 | 2OG-Fe(II) oxygenase superfamily | 2OG-Fe(II) oxygenase superfamily
OGFLEX_14477 | PF04117 | Mpv17 / PMP22 family | Peroxisomal membrane 22 kDa (Mpv17/PMP22) family
OGFLEX_14502 | PF03265 | Deoxyribonuclease II | Unknown
OGFLEX_14506 | PF02747 | Proliferating cell nuclear antigen, C-terminal domain | Unknown
OGFLEX_14507 | PF04720 | Protein of unknown function (DUF506) | Protein of unknown function (DUF506)
OGFLEX_14508 | PF00023 | Ankyrin repeat | Unknown
OGFLEX_14509 | - | - | Unknown
OGFLEX_14510 | PF02181 | Formin Homology 2 Domain | Actin-binding FH2 protein
OGFLEX_14511 | PF00458 | WHEP-TRS domain | Unknown
OGFLEX_14513 | - | - | Unknown
OGFLEX_14514 | - | - | Unknown
OGFLEX_14515 | - | - | Unknown
OGFLEX_14516 | PF09405 | CASC3/Barentsz eIF4AIII binding | CASC3/Barentsz eIF4AIII binding
OGFLEX_14517 | PF02463 | RecF/RecN/SMC N terminal domain | Structural maintenance of chromosome 3
OGFLEX_14518 | PF13229 | Right handed beta helix region | ATPase, AAA-type, CDC48 family
OGFLEX_14519 | - | - | Unknown
OGFLEX_14521 | PF00069 | Protein kinase domain | Unknown putative protein kinase family
OGFLEX_14522 | PF12906 | RING-variant domain | Unknown
OGFLEX_14523 | - | - | Unknown
OGFLEX_14525 | - | - | Unknown
OGFLEX_14526 | PF10049 | Protein of unknown function (DUF2283) | Unknown
OGFLEX_14527 | - | - | Unknown
OGFLEX_14528 | - | - | Unknown
OGFLEX_14529 | PF13414 | TPR repeat | FKBP-type peptidyl-prolyl cis-trans isomerase family protein
OGFLEX_14530 | - | - | Unknown
OGFLEX_14532 | - | - | Homeodomain-like/winged-helix DNA-binding family protein
OGFLEX_14533 | - | - | Unknown
OGFLEX_14534 | - | - | Unknown
OGFLEX_14535 | - | - | Unknown
OGFLEX_14536 | - | - | Unknown
OGFLEX_14537 | PF08373 | RAP domain | RAP family
OGFLEX_14538 | PF07221 | N-acylglucosamine 2-epimerase (GlcNAc 2-epimerase) | Putative cellobiose epimerase
OGFLEX_14539 | - | - | Unknown
OGFLEX_14541 | - | - | Unknown
OGFLEX_14542 | - | - | Unknown
OGFLEX_14545 | - | - | Unknown
OGFLEX_14546 | - | - | Unknown
OGFLEX_14547 | - | - | Unknown
OGFLEX_14548 | PF04050 | Up-frameshift suppressor | Unknown RNA binding protein family
OGFLEX_14549 | - | - | Unknown
OGFLEX_14550 | PF03982 | Diacylglycerol acyltransferase | Diacylglycerol acyltransferase family
OGFLEX_14551 | PF00400 | WD domain, G-beta repeat | WD40/YVTN repeat-like-containing domain family
OGFLEX_14552 | - | - | FMN binding family
OGFLEX_14553 | PF00069 | Protein kinase domain | PAS domain-containing protein tyrosine kinase family
OGFLEX_14554 | PF01814 | Hemerythrin HHE cation binding domain | Unknown
OGFLEX_14555 | PF12051 | Protein of unknown function (DUF3533) | Unknown
OGFLEX_14556 | PF08574 | Protein of unknown function (DUF1762) | Unknown
OGFLEX_14557 | PF03600 | Citrate transporter | Unknown
OGFLEX_14559 | - | - | Unknown
OGFLEX_14560 | PF09359 | VTC domain | Unknown
OGFLEX_14561 | - | - | Unknown
OGFLEX_14562 | PF01434 | Peptidase family M41 | FTSH protease 10 family
OGFLEX_14563 | PF01544 | CorA-like Mg2+ transporter protein | Magnesium transporter 9
OGFLEX_14564 | PF00400 | WD domain, G-beta repeat | Transducin family protein / WD-40 repeat family
OGFLEX_14566 | PF03016 | Exostosin family | Exostosin family protein
OGFLEX_14567 | PF01124 | MAPEG family | Unknown
OGFLEX_14568 | - | - | Unknown
Supplementary Table 10: Gene families expanded in multicellular volvocines.
For column 3 only the most common domain within a family is listed. Column
4 includes Chlamydomonas reinhardtii v5.5 annotations.
Gene Family | PFAM domain | Description of Domain | Chlamydomonas reinhardtii derived annotation
OGFLEX_1000 | PF12796 | Ankyrin repeats (3 copies) | Unknown
OGFLEX_1010 | PF03372 | Endonuclease/Exonuclease/phosphatase family | Unknown
OGFLEX_1022 | PF00078 | Reverse transcriptase (RNA-dependent DNA polymerase) | Unknown
OGFLEX_1097 | PF02889 | Sec63 Brl domain | U5 small nuclear ribonucleoprotein helicase, putative
OGFLEX_1274 | PF09359 | VTC domain | Major Facilitator Superfamily with SPX (SYG1/Pho81/XPR1)
OGFLEX_1334 | PF03054 | tRNA methyl transferase | tRNA (5-methylaminomethyl-2-thiouridylate)-methyltransferases
OGFLEX_1436 | PF02676 | Methyltransferase TYW3 | Met-10+ like family protein / kelch repeat-containing protein
OGFLEX_1480 | PF00069 | Protein kinase domain | casein kinase 1-like protein 2
OGFLEX_1509 | PF08029 | HisG, C-terminal domain | ATP phosphoribosyl transferase 2
OGFLEX_1575 | PF00069 | Protein kinase domain | Protein kinase superfamily protein/mixed-lineage kinase and Raf protein kinase-like
OGFLEX_1581 | PF00580 | UvrD/REP helicase N-terminal domain | P-loop containing nucleoside triphosphate hydrolases superfamily protein
OGFLEX_1614 | PF00504 | Chlorophyll A-B binding protein | photosystem I light harvesting complex gene 3
OGFLEX_1895 | PF02922 | Carbohydrate-binding module 48 (Isoamylase N-terminal domain) | limit dextrinase
OGFLEX_1966 | PF00462 | Glutaredoxin | Glutaredoxin family protein
OGFLEX_2166 | PF00112 | Papain family cysteine protease | Cysteine proteinases superfamily protein
OGFLEX_2171 | PF00106 | short chain dehydrogenase | NAD(P)-binding Rossmann-fold superfamily protein
OGFLEX_3446 | - | - | Unknown
OGFLEX_3451 | PF13589 | Histidine kinase-, DNA gyrase B-, and HSP90-like ATPase | MUTL protein homolog 3
OGFLEX_4309 | PF00232 | Glycosyl hydrolase family 1 | beta glucosidase 29
OGFLEX_4312 | PF03992 | Antibiotic biosynthesis monooxygenase | Unknown
OGFLEX_4363 | PF12499 | Pherophorin | Unknown
OGFLEX_4885 | PF01494 | FAD binding domain | zeaxanthin epoxidase (ZEP)
OGFLEX_4887 | PF01485 | IBR domain | RING/U-box superfamily protein
OGFLEX_4915 | PF06650 | Protein of unknown function (DUF1162) | Unknown
OGFLEX_5737 | PF03351 | DOMON domain | Auxin-responsive family protein
OGFLEX_6377 | PF12499 | Pherophorin | Pherophorin
OGFLEX_6403 | - | - | Unknown
Supplementary Table 11: Genes identified using the Potion pipeline and
Codeml M7/8 model with significant (q=0.05) evidence of positive selection.
Descriptions are from Chlamydomonas reinhardtii v5.5 annotations. Blank cells indicate that the gene has not been characterised.
Chlamydomonas Reference Gene   Description
Cre01.g004500 isopropyl malate isomerase large subunit 1
Cre01.g016050 Pre-mRNA-processing-splicing factor
Cre01.g034400 NagB/RpiA/CoA transferase-like superfamily protein
Cre01.g043350 Pheophorbide a oxygenase family protein with Rieske [2Fe-2S] domain
Cre01.g055420 Protein phosphatase 2A, regulatory subunit PR55
Cre02.g080200 Transketolase
Cre02.g091050 Aldolase superfamily protein
Cre02.g098450 eukaryotic translation initiation factor 2 gamma subunit
Cre02.g113400
Cre02.g118900 Metal-dependent protein hydrolase
Cre02.g143200 Alanyl-tRNA synthetase
Cre03.g168450 TCP-1/cpn60 chaperonin family protein
Cre03.g178150 calmodulin 6
Cre03.g180750 methionine synthase 3
Cre03.g183200 adenosine monophosphate kinase
Cre03.g185550 sedoheptulose-bisphosphatase
Cre03.g193850 Succinyl-CoA ligase, alpha subunit
Cre04.g214500 isocitrate dehydrogenase
Cre04.g216600 regulatory particle triple-A ATPase 6A
Cre04.g229300 rubisco activase
Cre05.g233800 glycyl-tRNA synthetase / glycine--tRNA ligase
Cre05.g234639 TCP-1/cpn60 chaperonin family protein
Cre05.g234652 plastid transcriptionally active 17
Cre06.g267350
Cre06.g269950 ATPase, AAA-type, CDC48 protein
Cre06.g278142 Methylthiotransferase
Cre06.g281250 Cyclopropane-fatty-acyl-phospholipid synthase
Cre06.g294950 NAD(P)-binding Rossmann-fold superfamily protein
Cre06.g308350 flower-specific, phytochrome-associated protein phosphatase 3
Cre06.g308500 carbamoyl phosphate synthetase A
Cre07.g325500 magnesium-chelatase subunit chlH, chloroplast, putative / Mg-protoporphyrin IX chelatase, putative (CHLH)
Cre07.g329700 regulatory particle AAA-ATPase 2A
Cre07.g340350
Cre08.g359350 acetyl Co-enzyme a carboxylase biotin carboxylase subunit
Cre08.g375500 glutamine-fructose-6-phosphate transaminase (isomerizing)s;sugar binding;transaminases
Cre08.g378850 Phosphoribosyltransferase family protein
Cre09.g394436 Inorganic H pyrophosphatase family protein
Cre09.g405850 NADH dehydrogenase subunit 7
Cre09.g410950 nitrate reductase 2
Cre09.g415550
Cre10.g430050 ubiquitin-conjugating enzyme 22
Cre10.g433000 glycine-tRNA ligases
Cre10.g434750 ketol-acid reductoisomerase
Cre10.g439150 regulatory particle triple-A ATPase 5A
Cre10.g449250
Cre10.g461050 vacuolar ATP synthase subunit A
Cre11.g476850
Cre11.g481450 ATPase, F0 complex, subunit B/B', bacterial/chloroplast
Cre12.g483950 Lactate/malate dehydrogenase family protein
Cre12.g484000 acetyl-CoA carboxylase carboxyl transferase subunit beta
Cre12.g490350 4-hydroxy-3-methylbut-2-enyl diphosphate synthase
Cre12.g494900 protein phosphatase X 2
Cre12.g508150 chromatin-remodeling protein 11
Cre12.g514200 Glucose-methanol-choline (GMC) oxidoreductase family protein
Cre12.g560900 FAD/NAD(P)-binding oxidoreductase family protein
Cre13.g564250 eukaryotic translation initiation factor 3A
Cre13.g568650 Ribosomal protein S3Ae
Cre13.g572700 FIZZY-related 3
Cre13.g573351 Ribosomal protein S5 domain 2-like superfamily protein
Cre13.g582201 nudix hydrolase homolog 24
Cre13.g584400
Cre13.g587050 eukaryotic release factor 1-3
Cre13.g606050 Tyrosyl-tRNA synthetase, class Ib, bacterial/mitochondrial
Cre14.g612700 Galactose oxidase/kelch repeat superfamily protein
Cre14.g625400 regulatory particle triple-A 1A
Cre16.g666150
Cre16.g672385 histidinol phosphate aminotransferase 1
Cre16.g676197 26S proteasome regulatory subunit S2 1B
Cre17.g697450 RNA polymerase I-associated factor PAF67
Cre17.g703900 Transducin/WD40 repeat-like superfamily protein
Cre17.g734200 Pyridoxal phosphate (PLP)-dependent transferases superfamily protein
Cre17.g734450 Ribosomal protein L19 family protein
Supplementary Table 12: Gene ontology (GO) enrichment analysis of genes identified by Potion as positively selected (p<0.05). Gene ontology terms were derived from Chlamydomonas reinhardtii v5.5 annotations.
Enriched GO term | P-value
glycine-tRNA ligase activity | 0,00089293
response to bacterium | 0,0013308
defense response to bacterium | 0,0029941
embryo sac development | 0,0035238
carboxylic acid metabolic process | 0,0035564
organic acid metabolic process | 0,0035564
oxoacid metabolic process | 0,0035564
ligase activity | 0,0041871
cellular ketone metabolic process | 0,0044649
response to biotic stimulus | 0,0049168
glycoside biosynthetic process | 0,0049527
proteasomal protein catabolic process | 0,0049527
gametophyte development | 0,0050684
acetyl-CoA carboxylase activity | 0,0051529
response to cadmium ion | 0,0078005
multicellular organismal development | 0,010828
ligase activity, forming carbon-carbon bonds | 0,012392
CoA carboxylase activity | 0,012392
response to other organism | 0,014293
amino acid activation | 0,016176
tRNA aminoacylation | 0,016176
tRNA aminoacylation for protein translation | 0,016176
porphyrin biosynthetic process | 0,016176
tetrapyrrole biosynthetic process | 0,01774
fatty acid omega-oxidation | 0,018821
glycyl-tRNA aminoacylation | 0,018821
sucrose biosynthetic process | 0,018821
meristem growth | 0,018821
isocitrate metabolic process | 0,018821
cellular response to redox state | 0,018821
cell-cell signaling | 0,018821
response to redox state | 0,018821
response to metal ion | 0,019039
multi-organism process | 0,020574
ligase activity, forming aminoacyl-tRNA and related compounds | 0,023233
ligase activity, forming carbon-oxygen bonds | 0,023233
aminoacyl-tRNA ligase activity | 0,023233
glycoside metabolic process | 0,023685
defense response | 0,024101
5-methyltetrahydropteroyltriglutamate-homocysteine S-methyltransferase activity | 0,030189
chlorophyllide a oxygenase activity | 0,030189
6-phosphogluconolactonase activity | 0,030189
sedoheptulose-bisphosphatase activity | 0,030189
transketolase activity | 0,030189
oxidoreductase activity, acting on single donors with incorporation of molecular oxygen, incorporation of one atom of oxygen (internal monooxygenases or internal mixed function oxidases) | 0,030189
ADP binding | 0,030189
sugar binding | 0,030189
ribulose-1,5-bisphosphate carboxylase/oxygenase activator activity | 0,030189
isocitrate dehydrogenase (NADP+) activity | 0,030189
mandelonitrile lyase activity | 0,030189
5-methyltetrahydrofolate-dependent methyltransferase activity | 0,030189
5-methyltetrahydropteroyltri-L-glutamate-dependent methyltransferase activity | 0,030189
enoyl-[acyl-carrier-protein] reductase (NADH) activity | 0,030189
enoyl-[acyl-carrier-protein] reductase activity | 0,030189
nitrate reductase activity | 0,030189
ketol-acid reductoisomerase activity | 0,030189
carbamoyl-phosphate synthase (glutamine-hydrolyzing) activity | 0,030189
methionine synthase activity | 0,030189
porphobilinogen synthase activity | 0,030189
glutamine-fructose-6-phosphate transaminase (isomerizing) activity | 0,030189
intramolecular transferase activity, transferring hydroxy groups | 0,030189
fatty acid metabolic process | 0,030597
modification-dependent macromolecule catabolic process | 0,032164
modification-dependent protein catabolic process | 0,032164
ubiquitin-dependent protein catabolic process | 0,032164
monocarboxylic acid metabolic process | 0,032197
proteolysis involved in cellular protein catabolic process | 0,033778
proteasome regulatory particle, base subcomplex | 0,034436
porphyrin metabolic process | 0,035517
alanyl-tRNA aminoacylation | 0,037295
root cap development | 0,037295
proteasome core complex assembly | 0,037295
root morphogenesis | 0,037295
maintenance of root meristem identity | 0,037295
nitric oxide biosynthetic process | 0,037295
nitric oxide metabolic process | 0,037295
fatty acid biosynthetic process | 0,037902
tetrapyrrole metabolic process | 0,037902
response to inorganic substance | 0,039546
carbohydrate biosynthetic process | 0,040367
cellular nitrogen compound biosynthetic process | 0,041151
chlorophyll biosynthetic process | 0,04378
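The statistical test behind the P-values in Supplementary Table 12 is not stated in the table. One common choice for GO term enrichment is a one-sided hypergeometric test, and the sketch below shows that calculation. The function name and the example counts are hypothetical and are not taken from the study.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided hypergeometric P-value, P(X >= hits).

    population : number of genes in the background set
    annotated  : background genes carrying the GO term
    sample     : number of genes in the test set (e.g. positively selected)
    hits       : test-set genes carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probabilities of observing `hits` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical counts: 10 background genes, 3 with the term,
# 4 genes in the test set, 2 of which carry the term.
p_value = hypergeom_enrichment_p(population=10, annotated=3, sample=4, hits=2)
print(f"{p_value:.4f}")
```

In practice such P-values are usually corrected for the number of GO terms tested; whether and how that was done for this table is not stated here.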
Supplementary Table 13: Highly conserved elements identified by
PhastCONS sorted by annotation feature. The features were derived from
Chlamydomonas reinhardtii v5.5 annotations.
Annotation Feature | Percentage of Conserved Bases (811 033 bp)
Gene | 99,27%
Intergenic | 0,73%
3 prime | 0,28%
5 prime | 0,22%
Exon | 70,89%
CDS | 70,52%
Intron | 28,37%
*Note: In alternative transcripts 3’ and 5’ regions may overlap with CDS regions from another alternative transcript and thus a degree of redundancy is included within these calculations. Highly conserved elements from transcribed but un-translated regions are estimated to amount to 0,37%.
Supplementary Table 14: Intergenic regions identified as highly conserved in the volvocine lineage, with the nearest gene to each. Conserved regions are referenced to the Chlamydomonas reinhardtii genome. Column 4 gives the distance (in base pairs) from the conserved region to the coding locus: a (-) symbol indicates that the region is upstream of the coding locus, and no symbol indicates that it is downstream.
Chromosom Conserv Nearest Distance Description e ed Gene from
Region Gene
(bp)
Chr1 1148- Cre01.g000 -17596
1170 017
Chr1 1958- Cre01.g000 -16777
1989 017
Chr1 2004- Cre01.g000 -16616
2150 017
Chr1 2164- Cre01.g000 -16577
2189 017
Chr1 2395- Cre01.g000 -16325
2441 017
Chr1 2648- Cre01.g000 -15864
2902 017
Chr1 2933- Cre01.g000 -15620
3146 017
Chr1 3250- Cre01.g000 -15467
3299 017 Chr1 16629- Cre01.g000 -2040
16726 017
Chr1 17000- Cre01.g000 -1700
17066 017
Chr1 17222- Cre01.g000 -1485
17281 017
Chr1 17301- Cre01.g000 -1418
17348 017
Chr1 17425- Cre01.g000 -1269
17497 017
Chr1 17512- Cre01.g000 -1084
17682 017
Chr1 17699- Cre01.g000 -1004
17762 017
Chr1 17885- Cre01.g000 -679
18087 017
Chr1 18348- Cre01.g000 -385
18381 017
Chr1 18427- Cre01.g000 -272
18494 017
Chr1 18534- Cre01.g000 -127
18639 017
Chr1 3126208- Cre01.g019 298 sumo conjugation
3126237 450 enzyme 1 Chr1 3126243- Cre01.g019 333
3126274 450
Chr1 3266665- Cre01.g020 1498
3266702 650
Chr1 4858662- Cre01.g033 -6758 maternal effect embryo
4858762 700 arrest 18
Chr1 6221036- Cre01.g044 3082
6221054 450
Chr1 6711070- Cre01.g048 904 RNA helicase, putative
6711145 200
Chr10 1629927- Cre10.g429 -2906
1629966 750
Chr10 1629995- Cre10.g429 -2840
1630032 750
Chr10 1630207- Cre10.g429 -2645
1630227 750
Chr10 1631353- Cre10.g429 -1500
1631372 750
Chr11 479595- Cre11.g467 1159 Calcium-dependent
479627 595 lipid-binding (CaLB
domain) family protein
Chr11 1888175- Cre11.g468 345
1888203 100
Chr11 3521801- Cre11.g481 -952 3521854 375
Chr12 365823- Cre12.g494 -155
365849 252
Chr12 5562351- Cre12.g531 -1269 eukaryotic translation
5562410 550 initiation factor 2 beta
subunit
Chr12 5562763- Cre12.g531 -891
5562788 550
Chr12 6219399- Cre12.g536 211
6219432 683
Chr12 6219506- Cre12.g536 81
6219562 683
Chr12 9716381- Cre12.g557 1832
9716416 503
Chr13 2602283- Cre13.g581 2164
2602367 200
Chr13 3005242- Cre13.g584 -1120
3005263 250
Chr13 3005398- Cre13.g584 -1276
3005435 250
Chr13 3005560- Cre13.g584 -1438
3005628 250
Chr13 3005740- Cre13.g584 -1618
3005766 250 Chr13 3005806- Cre13.g584 -1684
3005867 250
Chr13 3005944- Cre13.g584 -1822
3005972 250
Chr14 1881883- Cre14.g620 -4860
1881971 702
Chr14 1882292- Cre14.g620 -5269
1882320 702
Chr14 1882933- Cre14.g620 -5910
1882952 702
Chr14 1883022- Cre14.g620 -5999
1883045 702
Chr14 4124838- Cre14.g634 -611
4124905 193
Chr15 1754425- Cre15.g643 1361
1754438 515
Chr16 1072640- Cre16.g649 -1196
1072668 500
Chr16 1074963- Cre16.g649 935
1075035 550
Chr16 2491796- Cre16.g660 -449
2491852 850
Chr16 2492003- Cre16.g660 -656
2492046 850 Chr16 6228369- Cre16.g675 -375
6228387 050
Chr16 6228409- Cre16.g675 -415
6228427 050
Chr16 6228581- Cre16.g675 -587
6228605 050
Chr17 4547861- Cre17.g732 2751
4547933 700
Chr17 4885028- Cre17.g734 -257 translocon outer
4885089 300 complex protein 120
Chr2 6161018- Cre02.g113 -1144 ataurora3
6161173 600
Chr2 6161454- Cre02.g113 -1580
6161502 600
Chr2 6162350- Cre02.g113 1752
6162368 626
Chr2 9199259- Cre02.g143 255
9199274 647
Chr4 1819109- Cre04.g211 1087 Chlorophyll A-B binding
1819129 850 family protein
Chr4 1880001- Cre04.g211 1252
1880082 850
Chr4 1881351- Cre04.g217 -1471
1881432 900
Chr4 2179600- Cre04.g219 1117
2179636 800
Chr4 2180593- Cre04.g219 1591
2180643 851
Chr4 2270007- Cre04.g220 -941 K+ efflux antiporter 2
2270043 200
Chr4 2824610- Cre04.g224 1114 Actin-like ATPase
2824693 150 superfamily protein
Chr5 685313- Cre05.g230 1440 SNF2 domain-
685348 800 containing protein /
helicase domain-
containing protein / F-
box family protein
Chr5 714533- Cre05.g230 264 Regulator of
714606 600 chromosome
condensation (RCC1)
family protein
Chr5 715213- Cre05.g230 944
715284 600
Chr5 3482251- Cre05.g243 848
3482286 358
Chr5 3482295- Cre05.g243 788
3482346 358
Chr6 3272450- Cre06.g276 19 histone H2A 10
3272470 950
Chr6 6607837- Cre06.g276 243
6607911 950
Chr7 2239598- Cre07.g327 186
2239705 317
Chr7 5851183- Cre07.g353 285
5851255 600
Chr7 6205920- Cre07.g356 508
6205979 450
Chr8 533541- Cre08.g359 -30
533581 166
Chr8 3473462- Cre08.g374 -3949
3473558 950
Chr8 3761889- Cre08.g377 606 chromatin remodeling
3761931 200 factor CHD3 (PICKLE)
Chr8 4810011- Cre08.g384 622 P-loop containing
4810056 355 nucleoside triphosphate
hydrolases superfamily
protein
Chr8 4810082- Cre08.g384 577
4810101 355
Chr8 5030323- Cre08.g386 -6689
5030388 100
Chr8 5030444- Cre08.g386 -6810
5030656 100
Chr8 5030729- Cre08.g386 -7095
5030752 100
Chr8 5030778- Cre08.g386 -7144
5030875 100
Chr8 5030934- Cre08.g386 -7300
5031004 100
Chr9 584478- Cre09.g403 944
584579 700
Chr9 2191673- Cre09.g393 3
2191696 100
Chr9 2344681- Cre09.g392 89
2344706 105
Chr9 2573667- Cre09.g390 867 Protein phosphatase 2C
2573687 678 family protein
Chr9 3997546- Cre09.g392 -2250
3997661 840
Chr9 3997748- Cre09.g392 -2112
3997799 840
Chr9 6120006- Cre09.g405 535 Pseudouridine synthase
6120075 150 family protein
Chr9 6429892- Cre09.g407 -1051 Ribosomal protein S5
6429918 100 family protein
Chr9 6475030- Cre09.g407 2135 alpha/beta-Hydrolases
6475058 300 superfamily protein
Chr9 6817306- Cre09.g409 -1469
6817396 728
Supplementary Table 15: Genes evolving more rapidly in the multicellular volvocines than in Chlamydomonas reinhardtii, but not necessarily faster than the neutral rate. Genes are grouped by putative function: signalling, ubiquitination (the ubiquitin proteasomal pathway), membrane trafficking, cell cycle regulation, chaperone activity, integral components of the cytoskeleton, microtubule-interacting kinesins, and DNA-interacting proteins.
Gene Name/symbol/Description Chlamydomonas reinhardtii identifier
Signalling
Calcium-Ion Dependent or related:
Calmodulin 4 (CAM4) Cre06.g292150
Centrin 2 (CEN2) Cre01.g055428
Kinesin-like calmodulin-binding (zwickel) Cre06.g278125
Membrane Trafficking:
RAB GTPase homolog B1C member of RAB gene family (RABB1C) Cre09.g390763
Endomembrane protein 70 protein family Cre01.g024350
Major facilitator superfamily protein Cre16.g688850
Alpha-adaptin (alpha-ADR) Cre12.g488850
Sorting nexin 2B (SNX2b) Cre07.g326450
Importin alpha isoform 1 (IMP) Cre04.g215850
Importin alpha isoform 9 (IMPA-9) Cre03.g161100
Transporter associated with antigen processing protein 1 (TAP1) Cre02.g083354
Exostosin family protein Cre02.g116600
Serine/Threonine Kinases:
Leucine-rich repeat receptor-like protein kinase family protein (FLS2) Cre06.g269200
Protein kinase family proteins: Cre06.g281100
Protein kinase superfamily protein (OST1) Cre13.g568050
Protein kinase superfamily protein Cre16.g673821
PAS-domain containing protein tyrosine kinase family protein Cre13.g563300
Serine/threonine protein kinase Rio1 Cre13.g580900
Cytoskeleton/Microtubule/Microtubule motor activity
Myosin 1 (MYO1) Cre09.g416250
Tubulin beta chain 2 (TUB2) Cre12.g542250
Tubulin beta chain 2 (TUB2) Cre13.g549550
Spc97 / Spc98 family of spindle pole body (SBP) Cre12.g525500
P-loop containing nucleoside triphosphate hydrolases superfamily protein Cre10.g427700
P-loop containing nucleoside triphosphate hydrolases superfamily protein Cre13.g568450
Cell Cycle
Retinoblastoma-related 1 (RB/MAT3) Cre06.g255450
Regulator of chromosome condensation family with FYVE zinc finger domain (RCC1) Cre16.g691050
Cell differentiation Rcd1-like protein Cre12.g540400
Cyclin-dependent kinase B1 (CDKB1) Cre08.g372550
Ataurora1 (AUR1) Cre17.g719450
Transducin/WD40 repeat-like superfamily protein Cre06.g255400
Transducin/WD40 repeat-like superfamily protein (ubiquitin-related) Cre12.g522450
Transducin family protein / WD-40 repeat family protein (CUL4 RING ubiquitin ligase complex) Cre17.g703900
Ubiquitination
Ubiquitin activating enzyme 2 (UBA2) Cre09.g386400
Ubiquitin-protein ligase 1 (UPL1) Cre10.g433900
Chaperonins
Heat shock protein 70A (HSP70A) Cre08.g372100
Heat shock protein 70 family member (BIP2) Cre02.g080700
DNA/RNA interacting
Topoisomerase II (TOPII) Cre01.g009250
DNA-directed RNA polymerase family protein (NRPB2) Cre02.g075150
RNA polymerase II large subunit (NRPB1) Cre16.g680900
MEI2-like protein 5 RNA binding protein (AML5) Cre07.g338150
DNA/RNA-binding protein Kin17 Cre16.g654350
Squamosa promoter binding protein-like 1 (SPL1) Cre09.g390023
LA RNA-binding protein Cre10.g441200
Nuclear RNA polymerase C2 (NRPC2) Cre12.g542800
Decapping 2 (DCP2) Cre07.g344850
Myb domain protein 3r-5 (MYB3R-5) Cre01.g034350
Supplementary Table 16: Significantly enriched annotations associated with accelerating genes. Included are Gene Ontology terms, MapMan annotations and KEGG pathways.
Annotation Category/Level P-value
Gene Ontology terms
deoxyribonucleotide metabolic process Biological process (BP) 0.0035976
response to high light intensity BP 0.014487
nucleobase, nucleoside and nucleotide metabolic process BP 0.01457
nucleoside phosphate metabolic process BP 0.020628
nucleotide metabolic process BP 0.020628
acetate metabolic process BP 0.025094
cellular monovalent inorganic cation homeostasis BP 0.025094
deoxyribonucleoside triphosphate biosynthetic process BP 0.025094
deoxyribonucleoside triphosphate metabolic process BP 0.025094
ER-associated protein catabolic process BP 0.025094
generative cell differentiation BP 0.025094
L-phenylalanine biosynthetic process BP 0.025094
maltose biosynthetic process BP 0.025094
photosynthetic electron transport in cytochrome b6/f BP 0.025094
regulation of cellular pH BP 0.025094
regulation of intracellular pH BP 0.025094
response to hydroperoxide BP 0.025094
response to lipid hydroperoxide BP 0.025094
response to non-ionic osmotic stress BP 0.025094
S-adenosylmethionine biosynthetic process BP 0.025094
S-adenosylmethionine metabolic process BP 0.025094
response to light intensity BP 0.034549
response to temperature stimulus BP 0.041941
regulation of cell cycle BP 0.048085
aromatic amino acid family biosynthetic process, prephenate pathway BP 0.049569
actin filament-based movement BP 0.049569
cytoskeleton-dependent intracellular transport BP 0.049569
deoxyribonucleoside diphosphate metabolic process BP 0.049569
deoxyribonucleotide biosynthetic process BP 0.049569
L-phenylalanine metabolic process BP 0.049569
monovalent inorganic cation homeostasis BP 0.049569
nucleoside diphosphate metabolic process BP 0.049569
nucleotide transport BP 0.049569
phosphoinositide-mediated signaling BP 0.049569
purine nucleotide transport BP 0.049569
regulation of pH BP 0.049569
ribonucleoside-diphosphate reductase activity Molecular function (MF) 0.0016465
oxidoreductase activity, acting on CH or CH2 groups, disulfide as acceptor MF 0.0093643
acid-thiol ligase activity MF 0.0093643
CoA-ligase activity MF 0.0093643
ligase activity, forming carbon-sulfur bonds MF 0.015197
oxidoreductase activity, acting on CH or CH2 groups MF 0.015197
motor activity MF 0.020867
copper ion binding MF 0.028122
D-arabinono-1,4-lactone oxidase activity MF 0.040881
electron transporter, transferring electrons from cytochrome b6/f complex of photosystem II activity MF 0.040881
GDP-mannose 3,5-epimerase activity MF 0.040881
oxidoreductase activity, acting on the CH-OH group of donors, oxygen as acceptor MF 0.040881
acetate-CoA ligase activity MF 0.040881
acireductone synthase activity MF 0.040881
aconitate hydratase activity MF 0.040881
acyl-[acyl-carrier-protein] hydrolase activity MF 0.040881
anion:cation symporter activity MF 0.040881
arogenate dehydratase activity MF 0.040881
beta-amylase activity MF 0.040881
cyclin binding MF 0.040881
GDP binding MF 0.040881
homoserine dehydrogenase activity MF 0.040881
methionine adenosyltransferase activity MF 0.040881
prephenate dehydratase activity MF 0.040881
sodium:dicarboxylate symporter activity MF 0.040881
ubiquitin activating enzyme activity MF 0.040881
MapMan Term Level
cell 1 0.0034654
nucleotide metabolism 1 0.023417
transport 1 0.045198
RNA.transcription 2 0.01709
cell.organisation 2 0.022647
cell.division 2 0.026768
transport.metabolite transporters at the mitochondrial membrane 2 0.033262
nucleotide metabolism.deoxynucleotide metabolism.ribonucleoside-diphosphate reductase 3 0.00023808
protein.targeting.nucleus 3 0.0024086
lipid metabolism.FA synthesis and FA elongation.ACP thioesterase 3 0.015743
amino acid metabolism.synthesis.aspartate family 3 0.02801
nucleotide metabolism.phosphotransfer and pyrophosphatases.guanylate kinase 3 0.031248
nucleotide metabolism.synthesis.PRS-PP 3 0.031248
PS.lightreaction.cytochrome b6/f 3 0.046518
amino acid metabolism.synthesis.aromatic aa.phenylalanine 4 0.011686
redox.ascorbate and glutathione.ascorbate.GME 4 0.011686
major CHO metabolism.degradation.starch.starch cleavage 4 0.023255
amino acid metabolism.synthesis.aspartate family.misc 4 0.046045
protein.degradation.ubiquitin.E1 4 0.046045
Supplementary table 17: CDS sequences accelerating in the multicellular volvocines relative to Chlamydomonas reinhardtii.
Gene Description Chlamydomonas ID
Endomembrane protein 70 protein family Cre01.g024350
ARM repeat superfamily protein Cre01.g034000
Cre01.g053450
DNA-directed RNA polymerase family protein Cre02.g075150
Cre02.g098300
Phosphatidylinositol 3- and 4-kinase family protein Cre02.g100300
Cre03.g155300
Cre03.g167550
Cre04.g217700
Cre04.g217911
Cre06.g250300
Cre06.g278212
Cre07.g324200
magnesium-chelatase subunit chlH, chloroplast, putative / Mg-protoporphyrin IX chelatase, putative (CHLH) Cre07.g325500
Cre07.g330750
Cre08.g358150
nucleotide transporter 1 Cre08.g358526
ubiquitin activating enzyme 2 Cre09.g386400
Cre09.g391050
ribosomal protein S1 Cre09.g394750
vacuolar proton ATPase A2 Cre09.g402500
ubiquitin-protein ligase 1 Cre10.g433900
Root hair defective 3 GTP-binding protein (RHD3) Cre10.g439600
Cre11.g481750
ribonucleotide reductase 1 Cre12.g492950
Cre12.g512000
Ribosomal protein S5/Elongation factor G/III/V family protein Cre12.g516200
Cre12.g520450
Spc97 / Spc98 family of spindle pole body (SBP) component Cre12.g525500
tubulin beta chain 1 Cre12.g542250
nuclear RNA polymerase C2 Cre12.g542800
SNF2 domain-containing protein / helicase domain-containing protein Cre14.g618860
transcription regulators Cre14.g632950
two-pore channel 1 Cre16.g665050
Cre16.g691753
Transducin/WD40 repeat-like superfamily protein Cre17.g703900
Histone superfamily protein Cre17.g711800
Cre17.g715300
Supplementary Table 18. Primers used for amplifications and sequencing of cyclin D genes of Tetrabaena socialis (TsCYCD1.3, TsCYCD2, TsCYCD3 and
TsCYCD4).
Primer Designation Sequences (5’-3’)
TsCYCD1.3_F1 CATGTCCATTGCGGTCAAGTA
TsCYCD1.3_R2 AGCTGCTGCAGGCTGAGG
TsCYCD1.3_3'RACE_F1 CGACGTCCATCCTAGACCGCTTCAC
TsCYCD1.3_5'RACE_R1 ACCTCCTCGTACTTGACCGCAATGG
TsCYCD2_F1 GGATGGTGGAGGTGGTGACT
TsCYCD2_R2 TGCTGTAGCCGTACGATAGGAA
TsCYCD2_3'RACE_F1 GTTGACTCGGACAACAACCCGCTCTACCTG
TsCYCD2_3'RACE_F2 AAGGGTCTCGTGTATGGTACAGCCCGTGAG
TsCYCD2_5'RACE_R1 GTACATGAGCGACAGCTCCGCCAGGAAGTT
TsCYCD2_5'RACE_R2 CAGTTCCAGGCTCGACTCATCCTCATCG
TsCYCD3_F1 CGTTCCTCGATCGCTTCCTA
TsCYCD3_R2 GAGTCCATGAGGGCAAGCTC
TsCYCD3_3'RACE_F1 GTCGCTTCTGCAGCTCCTGGCTCTAACCTG
TsCYCD3_3'RACE_F2 GACTCCGAGCTCACAGGCGTCAGCTACA
TsCYCD3_5'RACE_R1 TGCAGGAACATGGAGAGGAAGGTGTAGATGG
TsCYCD3_IC_F2 CGTCGCTTCTGCAGCTCCTGGCTCT
TsCYCD3_IC_R3 CGATGCGGTGCAGGAACATGGAGAG
TsCYCD4_F1 AGACTCTCTTCGCCGCCACTT
TsCYCD4_R3 AAGCGCTGCAGGAAGGTGAG
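When working with a primer list such as Supplementary Table 18, it can help to check how reverse primers relate to forward primers on the same template. The following is a minimal Python sketch, not part of the original study: the helper names `reverse_complement` and `gc_fraction` are illustrative, and only two primer sequences from the table are used. It shows that the reverse complement of TsCYCD1.3_5'RACE_R1 shares a 16-nt stretch with TsCYCD1.3_F1.

```python
# Illustrative helpers (hypothetical names, not from the original study).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Two primers copied from Supplementary Table 18.
primers = {
    "TsCYCD1.3_F1": "CATGTCCATTGCGGTCAAGTA",
    "TsCYCD1.3_5'RACE_R1": "ACCTCCTCGTACTTGACCGCAATGG",
}

# Reverse complement of the 5' RACE primer, read on the forward strand.
race_rc = reverse_complement(primers["TsCYCD1.3_5'RACE_R1"])

# The first 16 nt of race_rc also occur at the 3' end of the F1 primer.
shared = race_rc[:16]
overlaps_f1 = primers["TsCYCD1.3_F1"].endswith(shared)
```

The overlap check is only a string comparison; it says nothing about annealing temperature or primer specificity, which would need dedicated primer-design tools.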
Supplementary Table 19. Protein IDs, LXCXE motifs, and abbreviations of volvocine cyclin proteins used in this study.
Protein Name Species Name Protein IDs LXCXE Motif Abbreviation
CYCD1.1 Chlamydomonas reinhardtii Cre11.g467772a LICTE CrCYCD1
Tetrabaena socialis scaffold_357b LQCLE TsCYCD1.1
Gonium pectorale GPECTOR_47g299a N. D GpCYCD1.1
Volvox carteri Vocar20010063ma LTCTE VcCYCD1.1
CYCD1.2 T. socialis scaffold_357b LQCLE TsCYCD1.2
G. pectorale GPECTOR_47g300a LLCDE GpCYCD1.2
V. carteri Vocar20010067ma N. D VcCYCD1.2
CYCD1.3 T. socialis scaffold_221b N. D TsCYCD1.3
G. pectorale GPECTOR_47g301a N. D GpCYCD1.3
V. carteri Vocar20010127ma N. D VcCYCD1.3
CYCD1.4 G. pectorale GPECTOR_100g16a LLCTE GpCYCD1.4
V. carteri Vocar20013188ma LLCTE VcCYCD1.4
CYCD2 C. reinhardtii Cre06.g289750a LQCDE CrCYCD2
T. socialis scaffold_137b LACDE TsCYCD2
G. pectorale GPECTOR_41g668a LECEE GpCYCD2
V. carteri Vocar20013437ma LICEE VcCYCD2
CYCD3 C. reinhardtii Cre06.g298750a LFCGE CrCYCD3
T. socialis scaffold_137b LACDE TsCYCD3
G. pectorale GPECTOR_44g48a LECED GpCYCD3
V. carteri Vocar20007422ma LHCED VcCYCD3
CYCD4 C. reinhardtii Cre06.g259500a LDCTE CrCYCD4
T. socialis scaffold_357b LGCTE TsCYCD4
V. carteri Vocar20007145ma LECSE VcCYCD4
CYCD5 G. pectorale GPECTOR_11g188a N. D GpCYCD5
CYCA1 C. reinhardtii Cre03.g207900a N. D CrCYCA1
V. carteri Vocar20005704ma N. D VcCYCA1
CYCB1 C. reinhardtii Cre08.g370401a N. D CrCYCB1
V. carteri Vocar20013188ma N. D VcCYCB1
CYCAB1 C. reinhardtii Cre10.g466200a N. D CrCYCAB1
V. carteri Vocar20015027ma N. D VcCYCAB1
N. D.: Not detected. aBased on Hanschen et al. (2016). bDetermined in this study.
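The LXCXE column above records a short retinoblastoma-binding motif: leucine, any residue, cysteine, any residue, glutamate. A minimal Python sketch of how such a motif could be located in a protein sequence is given here. It is an illustration only and not the method used in the study; the function name `find_motifs`, the relaxed pattern and the toy peptide strings are all assumptions. Note that two table entries (LECED, LHCED) end in aspartate rather than glutamate, so a strict L.C.E pattern would not report them, and a relaxed pattern accepting D or E in the last position is included for comparison.

```python
import re

# Strict LXCXE: L, any residue, C, any residue, E.
STRICT_LXCXE = re.compile(r"L.C.E")
# Assumption: a relaxed variant that also accepts D in the final position,
# to cover table entries such as LECED and LHCED.
RELAXED_LXCXE = re.compile(r"L.C.[ED]")


def find_motifs(protein: str, pattern: re.Pattern = STRICT_LXCXE):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein.upper())]


# Toy peptides (hypothetical, not real cyclin sequences) embedding table motifs.
toy_with_licte = "MKLICTEAA"
toy_with_leced = "AALECEDKK"

strict_hits = find_motifs(toy_with_licte)
strict_miss = find_motifs(toy_with_leced)
relaxed_hits = find_motifs(toy_with_leced, RELAXED_LXCXE)
```

Because `finditer` reports non-overlapping matches, overlapping candidate motifs in a real sequence would need a lookahead-based pattern instead.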
Supplementary Table 20: The length of the linker region between the retinoblastoma A and B domain in Chlorophyte orthologues of MAT3/RB.
Species Length of Linker Region
Chlamydomonas reinhardtii 130
Yamagishiella unicocca 106
Tetrabaena socialis 76
Gonium pectorale 47
Eudorina spp 93
Pleodorina starrii 96
Volvox carteri f. nagariensis male 83
Volvox carteri f. nagariensis female 47
Supplementary Table 21: Genes identified in this study related to the ubiquitin proteasomal pathway
Chlamydomonas reinhardtii identifier (where appropriate) | Description | Role in the ubiquitin proteasomal pathway | Comparative Analysis
Cre09.g386400 | Ubiquitin activating enzyme 2 | Ubiquitin activation E1-type | PhyloP Whole Gene / Potion
Cre10.g430050 | Ubiquitin-conjugating enzyme 22 | Ubiquitin conjugation E2-type | Potion
Cre10.g433900 | Ubiquitin-protein ligase 1 | Ubiquitin ligation to target protein E3-type | PhyloP Whole Gene & CDS
Cre03.g150850 | RING/U-box superfamily protein | RING protein similar to ubiquitin ligase E3. | Family expanded at the ORIMC
Cre05.g231050 | Related to RPM1-interacting protein 2 | Ubiquitin ligation to target protein E3-type | Family gained at ORIMC
Cre12.g522450 | WD40 repeat transducin | Similar to Cullin-Ring ubiquitin ligase complex transducin. | PhyloP Whole Gene
Cre06.g255400 | WD40 repeat transducin | Similar to Cullin-Ring ubiquitin ligase complex transducin. | PhyloP Whole Gene
Cre16.g672350* | breast cancer susceptibility1 | Ubiquitin ligation E3-type | Family gained at ORIMC
 | Mib-herc2 domain | Common in metazoan E3 ligases. | Domain expanded in multicelled taxa
Cre08.g372100 | Heat shock protein 70 (Hsp 70) family protein | Interplay with UPP through regulating protein entry into the UPP | PhyloP Whole Gene
Cre02.g080700 | heat shock protein 70 (Hsp70) family protein | Interplay with UPP through regulating protein entry into the UPP | PhyloP Whole Gene & CDS
Cre01.g019850 | Similar to FTSH protease 10 | Triple-A ATPase type function | Family gained at ORIMC
Cre10.g439150 | Regulatory particle triple-A ATPase 5A | Triple-A ATPase of proteasome | Potion
Cre04.g216600 | Regulatory particle triple-A ATPase 6A | Triple-A ATPase of proteasome | Potion
Cre07.g329700 | Regulatory particle AAA-ATPase 2A | Triple-A ATPase of proteasome | Potion
Cre16.g676197 | 26S Proteasome regulatory subunit S2 1B | Component of Proteasome | Potion
*An orthologue is found in Chlamydomonas reinhardtii.
Supplementary Table 22: Optimising Gamma and Omega values based on
the phylogenetic information threshold (PIT) and highly conserved CDS
bases from chromosome 1 of Chlamydomonas reinhardtii.
Gamma | 25 | 20 | 15 | 10
Omega | 12 | 15 | 30 | 30
PIT | 11,11 | 12,14 | 13,74 | 14,64
CDS bases (bp) | 72 224 | 62 419 | 64 561 | 61 209
Supplementary Table 23. Primers used for amplification and sequencing of
RB/MAT3 gene of Tetrabaena socialis (TsRB/MAT3).
Primer Designation Sequences (5’-3’)
TsMAT3_F1 GACTGCGATGAGACTGTTTCTG
TsMAT3_F3 CGAGCAAGACGAAGACGTCAAG
TsMAT3_F5 GTAATGGGGTTGCTGGCAAAAA
TsMAT3_F8 CAGTGCAGCAGCTCAGTCGT
TsMAT3_F17 AATTGGGCAGCAAGTTGGTCGG
TsMAT3_R2 GGCACAATAGACATACGCAAAA
TsMAT3_R6 GTCGAAGGCCTTGATGTGAAGC
TsMAT3_R7 TGCTCAATGGCCTCGTAAACCT
TsMAT3_3'RACE_F1 CAGCCGCAGTCACAGCAATCCATTT
TsMAT3_3'RACE_F2 AGCACAGGCATATCCCCCTTGAGAC
TsMAT3_3'RACE_F3 ACAGCCAGCAGAGTGCAGACAACGAAGAAG
TsMAT3_5'RACE_R1 TTTTGCCAGCAACCCCATTACCACCAC
TsMAT3_5'RACE_R2 TACGACGTTGGCCTCGCGAAAGAAGTC
Supplementary Table 24. Prochnik Table of Genes
Chla Chla Chlam Vol mydo mydo Goni ydom vox mona Chlamy mona um onas Goniu Volvox Volvox prot s v4 Volvox v1 JGI Tetrabaen domona s protei v5.5 m v2 Defline from ein JGI protein ID(s) a (ID)s s JGI protei n transc ID(s) ID(s) JGI nam protei defline n name ript e n name ID(s) ID(s) Protei n secre tion and mem brane traffic king Rab-related scaffol small Au9.Cr Vocar2 GTPase; RABA RABA Rab 19551 d0000 Rab- e03.g1 60379 000568 Rab-related 1 1 A 9 1.g861 related 89250 6m GTPase; .t1 GTPase TSOC_01 RabA/Rab11 2290-RA Ypt scaffol small small G- Au9.Cr Vocar2 RABB RABB V4/ 14883 d0000 TSOC_00 Rab- protein e02.g1 106534 001038 1 1 Rab 6 5.g307 6167-RA related (yptV4); 26100 4m B .t1 GTPase RabB/Rab2
scaffol small Au9.Cr Vocar2 Rab-related RABC RABC Rab 19551 d0000 Rab- e09.g3 66052 000372 GTPase; 1 1 C1 8 5.g394 related 86900 9m RabC/Rab18 .t1 TSOC_00 GTPase 6791-RA scaffol small Au9.Cr Vocar2 Rab-related RABC RABC Rab 19587 d0001 TSOC_01 Rab- e13.g5 78758 000246 GTPase; 2 2 C2 2 0.g942 1166-RA related 95450 6m RabC/Rab18 .t1 GTPase
Ypt scaffol small small G- Au9.Cr Vocar2 RABC RABC V3/ d0001 Rab- protein e11.g4 108490 001445 3 3 Rab 0.g942 related (yptV3); 82900 8m C3 .t1 TSOC_00 GTPase RabC/Rab18 4957-RA Ypt scaffol small small G- Au9.Cr Vocar2 RABD RABD V1/ d0007 Rab- protein 60490 e12.g5 104295 000850 1 1 Rab 5.g726 related (yptV1); 60150 7m D .t1 TSOC_00 GTPase RabD/Rab1 0131-RA Ypt scaffol small small G- Au9.Cr Vocar2 RABE RABE V2/ 19552 d0006 Rab- protein e15.g6 83299 000605 1 1 Rab 0 2.g924 related (yptV2); 41800 6m E .t1 TSOC_00 GTPase RabE/Rab8 9328-RA scaffol small Au9.Cr Vocar2 RABF RABF Rab d0000 Rab- 81259 e12.g5 108963 001435 RabF/Rab5 1 1 F 8.g301 TSOC_01 related 17400 0m .t1 0627-RA GTPase RABG1, small Rab- Ypt scaffol small G- Vocar2 related RABG RABG V5/ d0000 protein 79428 000791 GTPase; 1 1 Rab 4.g531 (yptV5); 3m small G .t1 RabG/Rab7 Rab- TSOC_00 related 6651-RA GTPase scaffol small Au9.Cr Vocar2 Rab-related RABH RABH Rab 19552 d0000 Rab- e01.g0 73845 000956 GTPase; 1 1 H 2 3.g233 TSOC_00 related 47550 3m RabH/Rab6 .t1 9040-RA GTPase scaffol small Rab-related RABL Au9.Cr Vocar2 RABL Rabl 19244 d0020 Rab- GTPase; 1/RA e17.g7 102687 000137 1 2A 1 9.g407 related Rab-related BL2A 22350 2m .t1 TSOC_01 GTPase GTPase 2595-RA RAB23, small Rab- scaffol Vocar2 related Rab-related Rab2 Rab2 Rab d0001 115722 000256 GTPase; GTPase; 3 3 23 0.g849 6m small Rab23 .t1 Rab- TSOC_00 related 9161-RA GTPase scaffol small Au9.Cr Vocar2 FAP1 FAP1 Fap 12919 d0000 Rab- Rab-related e01.g0 80291 000826 56 56 156 3 3.g206 TSOC_00 related GTPase 47950 2m .t1 2912-RA GTPase scaffol small Au9.Cr Vocar2 Rab-related RAB2 RAB2 Rab 19552 d0000 Rab- e17.g7 62839 001521 GTPase; 8 8 28 3 5.g394 TSOC_00 related 15050 7m Rab28-like .t1 8030-RA GTPase Qa- scaffol Au9.Cr Vocar2 SNARE Qa-SNARE, Syp 19596 d0005 SYP1 SYP1 e13.g5 89085 000227 Sso1/Sy Sso1/Syntaxi 1 9 9.g666 88550 0m ntaxin1, n1 (PM)-type .t1 Manual Build PM-type Qa- scaffol SNARE Au9.Cr Vocar2 Qa-SNARE, Syp 19541 d0002 protein, SYP3 SYP3 e16.g6 81752 
000226 Sed5/Syntaxi 3 2 3.g53.t Sed5/Sy 92050 7m n5-family 1 TSOC_00 ntaxin5- 9357-RA family Qa- scaffol SNARE Au9.Cr Vocar2 Qa-SNARE, Syp 19540 d0000 protein, SYP4 SYP4 e12.g5 103767 000102 Tlg2/Syntaxi 4 1 2.g148 Tlg2/Syn 07450 3m n16-family 3.t1 TSOC_00 taxin16- 3932-RA family Qc- scaffol SNARE Au9.Cr Vocar2 Qc-SNARE, Syp 18901 d0022 protein, SYP5 SYP5 e17.g7 91993 000324 Syn8/Syntaxi 5 6 2.g484 Syn8/Sy 09350 9m n8-family .t1 TSOC_01 ntaxin8- 2559-RA family Qc- scaffol SNARE Au9.Cr Vocar2 Qc-SNARE, Syp 13055 d0002 protein, SYP6 SYP6 e06.g2 55748 000934 Tlg1/Syntaxi 6 9 1.g597 Tlg1/Syn 90100 1m n 6-family .t1 TSOC_01 taxin 6- 3462-RA family Qc- SNARE, SYP7- family; Au9.Cr Qc- SYP7 19540 e02.g0 scaffol SNARE Vocar2 1; SYP7 Syp 5; 99050; d0020 protein, Qc-SNARE, 91067 000606 SYP7 1 71 19540 Au9.Cr 0.g359 SYP7- SYP7-family 0m 2 6 e02.g0 .t1 family; 98950 TSOC_00 Qc- 5813- SNARE RA/TSOC protein, _010988- SYP7- RA family Qa- SNARE scaffol Au9.Cr Vocar2 protein, Syp 19541 d0002 SYP8 SYP8 e17.g7 96826 001171 Ufe1/Sy 8 1 4.g207 11450 8m ntaxin .t1 TSOC_00 18 7239-RA family Qc- scaffol Au9.Cr Vocar2 SNARE Qc-SNARE, SYP6 SYP6 Syp 16002 d0002 e06.g2 55748 000934 SYP6- Tlg1/Syntaxi L L 6L 5 1.g597 93100 1m TSOC_01 like n 6-family .t1 3462-RA protein Qa- scaffol SNARE Au9.Cr Vocar2 Syp 40624 d0000 protein, SYP2 SYP2 e01.g0 86761 000998 2 7 7.g130 Pep12/S 10700 6m 6.t1 TSOC_00 yntaxin 9496-RA 7-family R- scaffol VAM5 VAM5 Au9.Cr Vocar2 SNARE Vam 19540 d0022 /VAM /VAM e13.g5 54976 000299 protein, p75 9 0.g470 P75 P75 96900 8m TSOC_00 VAMP71 .t1 6464-RA -family R- scaffol Au9.Cr Vocar2 SNARE SEC2 SEC2 Sec d0002 R-SNARE, 57076 e12.g5 80612 000652 protein, 2 2 22 1.g733 Sec22-family 54700 0m TSOC_01 Sec22- .t1 2066-RA family R- scaffol Au9.Cr SNARE 19546 d0000 Vocar2 YKT6 YKT6 Ykt6 e17.g7 75062 protein, 5 6.g510 001313 28150 TSOC_00 YKT6- .t1 6m 6634-RA family R- scaffol VAM1 VAM1 Au9.Cr Vocar2 SNARE Vam 12877 d0001 /VAM /VAM e04.g2 70315 001182 protein, p71 7 6.g606 P71 P71 24750 
3m TSOC_00 VAMP72 .t1 7676-RA -family R- scaffol VAM2 VAM2 Au9.Cr Vocar2 SNARE Vam 19540 d0001 /VAM /VAM e04.g2 70298 001180 protein, p72 3 6.g625 P72 P72 25900 2m VAMP72 .t1 -family R- scaffol VAM3 VAM3 Au9.Cr Vocar2 SNARE Vam 19540 d0001 /VAM /VAM e04.g2 78158 001183 protein, p73 4 6.g606 P73 P73 25850 6m TSOC_00 VAMP72 .t1 7676-RA -family R- VAM4 VAM4 Au9.Cr SNARE 13618 /VAM /VAM e04.g2 protein, 8 P74 P74 24800 VAMP72 -family Qc- scaffol SNARE Au9.Cr Vocar2 Qc-SNARE, 18390 d0000 protein, BET1 BET1 Bet1 e03.g1 120875 000540 Bet1/mBET1 4 1.g916 Bet1/mB 98100 0m family .t1 TSOC_00 ET1 9557-RA family Bet3 scaffol Au9.Cr Vocar2 .1; 18461 d0004 BET3 BET3 e01.g0 76384; 76381 000770 Bet3 3 0.g592 59600 2m .2 .t1 R- scaffol SNARE Au9.Cr Vocar2 R-SNARE, SNR7 SNR7 Tml 19546 d0007 protein, e12.g5 86534 000899 Tomsyn-like /TML1 /TML1 1 8 5.g772 Tomsyn- 58250 9m family .t1 TSOC_00 like 0104-RA family Qb+c- SNARE, SNAP25 scaffol Au9.Cr Vocar2 -family; SNAP SNAP Sna 19540 d0006 e02.g1 104786 000589 Qb+c- 34 34 p34 8 2.g897 00250 1m SNARE .t1 protein, TSOC_00 SNAP25 1539-RA -family Qb- scaffol SNARE Au9.Cr Vocar2 Qb-SNARE, MEM MEM Me 19546 d0002 protein, e12.g5 104271 000639 Bos1/Membri B1 B1 mb1 2 1.g740 Bos1/Me 54200 9m n family .t1 TSOC_00 mbrin 4534-RA family scaffol Au9.Cr Vocar2 SNAP SNAP Sna 17371 d0008 gamma e08.g3 95241 000473 G1 G1 pG1 0 7.g421 TSOC_00 SNAP 85700 1m .t1 8997-RA scaffol Au9.Cr Vocar2 SNAP SNAP Sna d0059 alpha- 23974 e02.g0 81196 000411 A1 A1 pA1 7.g669 TSOC_00 SNAP 99150 7m .t1 8196-RA Protein involved in ubiquitin scaffol Au9.Cr Vocar2 - CDC4 CDC4 Cdc 13417 d0002 e06.g2 78972 000291 depende 8 8 48 1 7.g630 69950 7m nt .t1 degradat ion of TSOC_00 ER- 7546-RA bound substrat es
Au9.Cr scaffol Vocar2 VPS4 VPS4 Vps 19547 e07.g3 d0007 77223 001394 TSOC_00 5 5 45 1 33950 7.g4.t1 8m 7955-RA scaffol Au9.Cr Vocar2 VPS3 VPS3 Vps 19547 d0009 e12.g5 103690 001427 3 3 33 0 0.g527 TSOC_00 15550 6m .t1 4500-RA Actin cytos kelet on scaffol d0000 MYO1 9.g540 myosin type XI Au9.Cr Vocar2 A; Myo 18510 .t1; TSOC_01 heavy myosin MYO1 e16.g6 82838 001052 MYO1 A 4 scaffol 0099- chain, heavy chain 58650 2m B d0000 RA/TSOC class XI MyoA 9.g542 _012467- .t1 RA scaffol myosin type XI Au9.Cr Vocar2 Myo 11931 d0001 heavy myosin MYO2 MYO2 e13.g5 57247 001474 B 7 3.g634 chain, heavy chain 63800 0m .t1 TSOC_00 class XI MyoB 2407-RA myosin scaffol type VIII Au9.Cr Vocar2 heavy Myo 11376 d0005 myosin MYO3 MYO3 e09.g4 107085 000551 chain, C 0 5.g312 heavy chain 16250 6m class .t1 TSOC_00 MyoC 8396-RA VIII scaffol class XI Vocar2 Myo d0000 myosin 103833 000100 D 9.g540 heavy chain 0m .t1 TSOC_01 MyoD 2467-RA scaffol myosin putative Au9.Cr Vocar2 Myo 37786 d0000 heavy class XI MYO5 MYO5 e03.g1 90836 000540 E 3 1.g921 chain, myosin 97800 9m .t1 TSOC_01 class XI MyoE 5383-RA fgenes h4_pg. Myo type XI 95387 C_scaf F myosin MyoF fold_43 000114 scaffol Au9.Cr Vocar2 Act d0009 IDA5 IDA5 24392 e13.g6 109972 001110 actin A 7.g773 TSOC_01 03700 1m .t1 2574-RA scaffol d0000 1.g510 NAP1 Nap .t1; Vocar2 novel actin- NAP1 ; 127374 1 scaffol 000810 like protein NAP2 d0009 4m 7.g773 TSOC_01 .t1 2574-RA scaffol Aug.Cr Vocar2 Actin- Arp d0000 actin-related ARP2 ARP2 24114 e16.g6 107669 000157 related 2 4.g946 TSOC_01 protein Arp2 87500 6m protein .t1 3115-RA scaffol Au9.Cr Vocar2 Actin- Arp 11874 d0009 actin-related ARP3 ARP3 e16.g6 101864 000022 related 3 5 7.g773 TSOC_01 protein Arp3 76050 6m protein .t1 2574-RA Au9.Cr scaffol Vocar2 Actin- Arp 11237 actin-related ARP5 ARP5 e06.g2 d0000 69220 000348 TSOC_00 related 4 8 protein Arp4 49300 5.g126 2m 8038-RA protein .t1
scaffol Vocar2 Actin- Arp d0000 actin-related 121135 001101 related 6 2.g104 protein Arp6 5m protein 3.t1 scaffol Au9.Cr Vocar2 Actin- Arp 10306 d0000 actin-related ARP7 ARP7 e12.g5 94505 000641 related 7 7 7.g119 TSOC_00 protein Arp7 45000 6m protein 0.t1 1376-RA ARP2/3 scaffol Au9.Cr Vocar2 (actin- actin-related ARPC ARPC Arp 17720 d0000 e16.g6 58274 000942 related protein 1 1 C1 3 3.g208 48100 6m TSOC_00 protein) ArpC1 .t1 8086-RA complex scaffol Vocar2 Actin- actin-related ARPC ARPC Arp 51833 d0000 119936 000397 related protein 2 2 C2 2 9.g713 TSOC_00 2m protein ArpC2 .t1 6552-RA scaffol Vocar2 Actin- actin-related ARPC ARPC Arp 39956 d0001 65632 000456 related protein 3 3 C3 3 8.g105 TSOC_00 3m protein ArpC3 .t1 3210-RA scaffol Au9.Cr Vocar2 Actin- actin-related ARPC ARPC Arp 40084 d0001 e06.g2 72447 000245 related protein 4 4 C4 8 0.g757 TSOC_01 60800 8m protein ArpC4 .t1 5482-RA
putative scaffol Au9.Cr Vocar2 protein For 14402 d0000 FOR1 FOR1 e03.g1 107471 000684 formin containing a A 7 1.g640 66700 4m FH2 domain; .t1 TSOC_00 formin 0371-RA scaffol actin- actin- Au9.Cr Vocar2 Adf 37887 d0000 depolym depolymerizi e07.g3 68164 000153 A 7 4.g845 erizing ng factor 39050 7m .t1 TSOC_00 factor AdfA 7706-RA scaffol Au9.Cr Vocar2 actin-binding gelsoli gelsoli gels 19092 d0005 e12.g5 108308 000867 gelsolin protein n n olin 5 6.g375 TSOC_01 55600 5m Gelsolin .t1 5472-RA scaffol Au9.Cr Vocar2 18188 d0000 PRO1 PRO1 PrfA e10.g4 106260 001158 profilin profilin 7 1.g67.t TSOC_00 27250 8m 1 4650-RA actin actin scaffol monome Au9.Cr Vocar2 monomer- Cap 40205 d0027 r-binding CAP1 CAP1 e06.g3 79641 000221 binding A 6 5.g721 protein 04100 0m protein .t1 Cap/Srv TSOC_00 Cap/Srv2p 1558-RA 2p scaffol TSOC_01 f-actin- Au9.Cr Vocar2 20621 d0001 3691- binding, villin-like VIL1 VIL1 VilA e12.g4 97468 001190 5 1.g292 RA/TSOC Viilin-like protein 93700 3m .t1 _009069- protein RA Micro tubul e cytos kelet on scaffol TSOC_00 Au9.Cr d0010 Vocar2 3975- Tub 12852 e03.g1 9.g204 000340 RA/TSOC alpha TUA1; TUA1; A1; 3; 90950; .t1; 1m; tubulin alpha tubulin 77526; 109786 _003980- TUA2 TUA2 Tub 18602 Au9.Cr scaffol Vocar2 RA/TSOC 1; alpha (tubA2) A2 3 e04.g2 d0017 000737 _003982- tubulin 2 16850 5.g216 7m RA/TSOC .t1 _003984- RA scaffol Au9.Cr d0025 Vocar2 Tub 12987 e12.g5 4.g636 000658 beta TUB1; TUB1; B1; 6; 42250; .t1; 0m; TSOC_01 tubulin beta tubulin 75910; 77081 TUB2 TUB2 Tub 12986 Au9.Cr scaffol Vocar2 0371- 1; beta (tubB2) B2 8 e12.g5 d0000 000654 RA/TSOC tubulin 2 49550 7.g111 6m _011286- 6.t1 RA scaffol Gamma Au9.Cr Vocar2 Tub 18893 d0002 tubulin; Gamma TUG1 TUG1 e06.g2 80553 000922 G 3 2.g770 TSOC_01 was tubulin 99300 7m .t1 0899-RA TUG scaffol Au9.Cr Vocar2 Tub 13608 d0000 delta UNI3 UNI3 e03.g1 60395 000566 Delta tubulin D 2 1.g844 TSOC_00 tubulin 87350 2m .t1 8734-RA scaffol Eta Au9.Cr Vocar2 Tub 15437 d0003 tubulin; TUH1 TUH1 e12.g5 116596 001430 Eta tubulin H 6 
5.g836 TSOC_00 was 13450 2m .t1 8569-RA TUH scaffol Epsilon Au9.Cr Vocar2 Tub 18819 d0000 tubulin; Epsilon TUE1 TUE1 e03.g1 56250 000789 E 5 1.g530 TSOC_00 was tubulin 72650 4m .t1 7487-RA TUE scafffol d0030 Vocar2 cent 5.g832 000616 CEN1 Au9.Cr VFL2/ rin; 15955 .t1; 6m; putative ; e11.g4 109845; 66997 CEN1 Cnr 4 scaffol Vocar2 centrin CEN2 68450 A d0030 001186 5.g832 8m TSOC_00 .t1 7964-RA scaffol katanin katanin Au9.Cr Vocar2 Kat d0000 catalytic catalytic KAT1 KAT1 53314 e10.g4 82654 001129 A 1.g65.t subunit, subunit, 60 27600 0m 1 TSOC_00 60 kDa kDa 4648-RA katanin scaffol p60 Au9.Cr Vocar2 katanin p60 Kat d0003 catalytic VPS4 VPS4 98650 e10.g4 76999 001381 catalytic B 4.g691 subunit; 46400 4m subunit .t1 TSOC_00 was 4336-RA KAT2
microtubule scaffol Au9.Cr Vocar2 severing Kat d0000 PF15 PF15 80954 e03.g1 94919 001134 protein C 1.g457 60450 3m katanin, p80 .t1 TSOC_00 subunit 4336-RA scaffol microtubule Vocar2 Map d0074 associated 99065 000391 1 9.g916 protein 9m .t1 TSOC_00 (MAP) 1a 3008-RA Microtub scaffol Au9.Cr Vocar2 ule microtubule Mor 17514 d0004 TOG1 TOG1 e03.g1 121331 000757 associat organizing A 3 0.g529 49800 5m TSOC_00 ed protein MorA .t1 0455-RA protein scaffol microtubule- Au9.Cr Vocar2 MAP6 MAP6 map 39446 d0008 associated e14.g6 120144 001443 5 5 65 2 20.g49 protein 14050 8m .t1 TSOC_00 MAP65 1728-RA microtub scaffol microtubule Au9.Cr Vocar2 ule plus- EBP1/ EBP1/ 19420 d0005 plus-end EB1 e17.g7 107043 001003 end EB1 EB1 9 3.g116 binding 41200 6m binding .t1 TSOC_01 protein EB1 1100-RA protein scaffol CLIP- Au9.Cr Vocar2 CLIP- CLAS CLAS Clas 20620 d0001 associati e09.g4 120833 000067 associating P P p 6 2.g410 TSOC_00 ng 15700 2m protein .t1 1489-RA protein
putative scaffol cortical Au9.Cr Vocar2 Spr 41618 d0006 microtubule SPR1 SPR1 e02.g1 103536 000509 1 8 2.g875 associated 30150 4m .t1 protein TSOC_00 SPIRAL1 2968-RA Gamma scaffol gamma Au9.Cr Vocar2 tubulin Gcp 15071 d0000 tubulin GCP2 GCP2 e12.g5 105911 000386 interacti 2 2 8.g158 interacting 25500 6m ng .t1 TSOC_00 protein 9859-RA protein Gamma scaffol gamma Au9.Cr Vocar2 tubulin Gcp 15258 d0001 tubulin GCP3 GCP3 e22.g7 97867 001469 interacti 3 5 8.g57.t interacting 64400 5m ng 1 TSOC_00 protein 7776-RA protein Gamma scaffol gamma Au9.Cr Vocar2 tubulin Gcp 14632 d0002 tubulin GCP4 GCP4 e01.g0 118030 001017 interacti 4 3 5.g308 interacting 19150 5m ng .t1 TSOC_01 protein 4290-RA protein
putative putative MT scaffol MT associated Au9.Cr Vocar2 Pld 19040 d0001 associat signaling PLD1 PLD1 e13.g5 99461 001346 A 3 7.g787 ed protein 91900 9m .t1 signaling phospholipas TSOC_00 protein e D 2024-RA Cytoplas mic Cytoplasmic scaffol dynein dynein 1b Au9.Cr Vocar2 D1BLI D1b 13039 d0002 1b light light e02.g1 92778 001239 C LIC 4 7.g622 intermed intermediate 35900 6m .t1 iate chain, TSOC_00 chain, D1bLIC 6086-RA D1bLIC
DHC1B | DHC12/DHC1B | DHC1b | Au9.Cre06.g250300 | 24009 | scaffold00049.g493.t1 | 64869 | Vocar20003534m | TSOC_003946-RA/TSOC_005462-RA/TSOC_009812-RA/TSOC_010820-RA | Cytoplasmic dynein 1b heavy chain
ASP | ASP | Asp | Au9.Cre06.g281150 | 306828 | scaffold00061.g829.t1 | 121726 | Vocar20002504m | TSOC_003050-RA | abnormal spindle protein | microtubule-associated protein Asp
Au9.Cre06.g304100 | 402056 | scaffold00275.g721.t1 | 79641 | Vocar20002210m | TSOC_001558-RA
TTL1 | TtlA | Au9.Cre06.g300250 | 401790 | scaffold00022.g769.t1 | 59059 | e_gw1.13.31.1 | Manual Build | Tubulin tyrosine ligase | tubulin tyrosine ligase
TTL2/FAP267 | TTL2/FAP267 | TtlB | Au9.Cre17.g699500 | 100760 | scaffold00014.g277.t1 | 30444 | Vocar20010820m | TSOC_015101-RA | Tubulin tyrosine ligase | tubulin tyrosine ligase
TTL3 | TTL3 | TtlC | Au9.Cre01.g059200 | 119250 | scaffold00040.g581.t1 | 65051 | Vocar20007439m | TSOC_001227-RA | Tubulin tyrosine ligase
TTL4 | TtlD | scaffold00031.g393.t1 | 105441 | Vocar20012311m | TSOC_000326-RA
TTL5 | TTL5 | TtlE | Au9.Cre12.g547700 | 513852 | scaffold00007.g1141.t1 | 107180 | Vocar20006344m | TSOC_000073-RA | Tubulin tyrosine ligase
TTL6 | TtlF | Au9.Cre01.g050451 | 146893 | scaffold00003.g176.t1 | 69077 | e_gw1.86.15.1 | TSOC_011047-RA | Tubulin tyrosine ligase
CYG40/TTL7 | CYG40/TTL7 | TtlG | Au9.Cre13.g591700 | 190398 | scaffold00017.g783.t1 | 99465 | Vocar20013486m | TSOC_002020-RA | Tubulin tyrosine ligase
TTL8 | TTL8 | TtlH | Au9.Cre02.g120750 | 417267 | scaffold00058.g590.t1 | 108470 | Vocar20014382m | TSOC_002685-RA | Tubulin tyrosine ligase
Basal Body Proteins
BBS5 | BBS5 | bbs5 | Au9.Cre06.g267550 | 182299 | scaffold00030.g279.t1 | 78967 | Vocar20002910m | TSOC_007731-RA | Bardet-Biedl syndrome 5 protein | Bardet-Biedl syndrome 5
SFA | SFA | SFA | Au9.Cre07.g332950 | 127995 | scaffold00054.g252.t1 | 84701 | Vocar20004998m | TSOC_006732-RA | SF-assemblin | SF-assemblin
BBS8 | BBS8 | bbs8 | Au9.Cre16.g666500 | 140113 | scaffold00130.g573.t1 | 78311 | Vocar20006783m | TSOC_005276-RA | TRP protein for ciliary function | putative TRP protein for flagellar function
BBS7 | Au9.Cre01.g043750 | 190054 | scaffold00003.g298.t1 | TSOC_004733-RA | Bardet-Biedl syndrome 7 protein
BBS4 | BBS4 | Bbs4 | Au9.Cre12.g548650 | 129948 | scaffold00007.g1126.t1 | 66874 | Vocar20006733m | TSOC_015012-RA | Bardet Biedl syndrome 4 protein | Bardet-Biedl syndrome protein 4
BBS3B | BBS3B | arf8 | Au9.Cre16.g664500 | 24475 | scaffold00009.g460.t1 | 70270 | Vocar20010224m | TSOC_001462-RA | Bardet-Biedl syndrome protein 3B | small Arf-related GTPase
BBS2 | Bbs2 | Au9.Cre06.g257250 | 126758 | 121664 | TSOC_012520-RA | Bardet-Biedl syndrome protein 2 | Bardet-Biedl syndrome protein 2
BBS1 | BBS1 | Bbs1 | Au9.Cre17.g741950 | 132537 | scaffold00053.g86.t1 | 84205 | Vocar20012554m | TSOC_008472-RA | Bardet-Biedl syndrome protein 1 | Bardet-Biedl syndrome protein 1
OFD1 | Ofd1 | Au9.Cre17.g703600 | 31640 | 90133 | Vocar20010634m | TSOC_009165-RA | basal body protein | basal body protein
VFL3 | VFL3 | Vfl3 | Au9.Cre06.g279900 | 130542 | scaffold00061.g821.t1 | 84510 | Vocar20002522m | TSOC_003059-RA | protein required for templated centriole assembly | protein conserved only in organisms with motile cilia
BLD10 | BLD10 | bldJ | Au9.Cre10.g418250 | 166062 | scaffold00036.g112.t1 | 57967 | Vocar20001034m | Manual Build | basal body protein | basal body protein
BBS9 | BBS9; BBS10 | bbs9 | 101137 | scaffold00300.g813.t1; scaffold00300.g814.t1 | 119731 | Vocar20001892m | TSOC_010849-RA | Bardet-Biedl syndrome 9; Bardet-Biedl syndrome 9 protein | Bardet-Biedl syndrome 9
Kinesin Motor Proteins
93532 | kinesin-like protein
FLA10 | FLA10 | flaJ | Au9.Cre17.g730950 | 185750 | scaffold00055.g287.t1 | 107307 | Vocar20005007m | TSOC_007372-RA | Kinesin-II Motor Protein
FLA8 | FLA8 | flaH/klpA | Au9.Cre12.g522550 | 150766 | scaffold00002.g1306.t1 | 103736 | Vocar20014186m | Kinesin-II motor subunit | kinesin-like protein
Au9.Cre09.g415450 | 345354 | scaffold00015.g280.t1 | 97023 | Vocar20005653m | TSOC_010691-RA | putative Kif3C kinesin
KLP1 | KLP1 | Au9.Cre02.g073750 | 186414 | scaffold00003.g471.t1 | 58470 | Vocar20013677m | TSOC_003895-RA/TSOC_003893-RA | Kinesin-like protein | Kif9 type kinesin, similar to C. reinhardtii KLP1
94697
Au9.Cre01.g036800 | scaffold00005.g430.t1 | 64266 | Vocar20008333m | TSOC_014361-RA | Kif6 type kinesin-like protein
76107 | estExt_Genewise1.C_400039 | putative Kif9 kinesin
IAR1 | IAR1 | invA | Au9.Cre10.g418950 | 126081 | scaffold00036.g129.t1 | 127192 | HAL_estExt_fgenesh5_synt.C_90108 | Manual Build | kinesin family protein | kinesin invA
127420 | kinesin-like protein
Au9.Cre09.g386700 | 149713 | 127421 | TSOC_004599-RA | kinesin motor family protein | kinesin-like protein
Au9.Cre03.g202000 | 13743 | 127422 | TSOC_001315-RA | Kinesin family member heavy chain | kinesin-like protein
127423 | kinesin-like protein
105173 | kinesin-like protein
99650 | subfamily 14A kinesin
95520 | kinesin-like protein
scaffold00076.g809.t1 | 127417 | Vocar20010941m | TSOC_004316-RA | kinesin-like protein
scaffold00040.g569.t1 | 95391 | Vocar20010813m | TSOC_010103-RA | kinesin related to Arabidopsis ATK5
scaffold00009.g588.t1 | 127355 | Vocar20002434m | TSOC_011162-RA | Kar3 member kinesin-like protein
127424 | kinesin-like protein
93886 | kinesin-like protein
127418 | kinesin-like protein
scaffold00028.g750.t1 | 106268 | Vocar20011695m | TSOC_008189-RA | kinesin-like protein
FAP125 | FAP125 | Au9.Cre12.g546100 | 190816 | scaffold00034.g710.t1 | 94482 | Vocar20006755m | TSOC_011506-RA | kinesin-like protein | similar to kinesin FAP125
scaffold00122.g462.t1 | 93168 | Vocar20014115m | TSOC_003022-RA | appears to contain kinesin motor domain
96903 | kinesin-like protein
98132 | fgenesh4_pg.C_scaffold_68000053 | TSOC_000006-RA | putative Kif21-type kinesin
127414 | STM_fgenesh4_pg.C_scaffold_11000145 | TSOC_003249-RA | kinesin-like protein
Au9.Cre17.g735200 | 408759 | scaffold00008.g90.t1 | 31481 | Vocar20000498m | TSOC_003800-RA/TSOC_014900-RA
127427
scaffold00001.g269.t1 | 44127 | Vocar20012989m | Kif4A type kinesin
127415 | STM_fgenesh4_pg.C_scaffold_16000212 | kinesin-like protein
Cell cycle
RDP1 | RPD1 | rdp1 | Au9.Cre05.g247400 | scaffold00004.g624 | 98123 | TSOC_005421-RA | Putative rhodanese domain phosphatase
RDP2 | RDP2 | rpd2 | Cre01.g024850 | scaffold00021.g722 | Vocar20006718m | TSOC_004527-RA | Putative rhodanese domain phosphatase
RDP3 | RPD3 | rdp3 | Cre07.g352550 | scaffold00006.g640 | 120993 | TSOC_000732-RA | Putative rhodanese domain phosphatase
CDKA1 | CDKA1 | cdka1 | Au9.Cre10.g465900 | 127285 | scaffold00047.g387.t1 | 127504 | Vocar20015085m | TSOC_007286-RA | cyclin dependent kinase
CDKB1 | CDKB1 | cdkb1 | Au9.Cre08.g372550 | 59842 | scaffold00079.g124.t1 | 103386 | Vocar20013545m | TSOC_006010-RA | plant specific cyclin dependent kinase
CDKC1 | CDKC1 | cdkc1 | Au9.Cre08.g385850 | 148395 | scaffold00087.g425.t1 | 82776 | Vocar20004488m | TSOC_009000-RA | cyclin dependent kinase
CDKD1 | CDKD1 | cdkd1 | Au9.Cre09.g388000 | 137457 | scaffold00005.g340.t1 | 65162 | Vocar20003575m | TSOC_009378-RA
CDKE1 | CDKE1 | cdke1 | Au9.Cre04.g213850 | 120881 | scaffold00054.g202.t1 | 68336 | Vocar20002074m | cyclin dependent kinase
CDKG1 | CDKG1 | cdkg1 | Au9.Cre06.g271100 | 126776 | scaffold00010.g1058.t1 | 127266 | Vocar20002754m | TSOC_004551-RA | cyclin dependent kinase | Cyclin dependent kinase; G1 subfamily
CDKG2 | CDKG2 | cdkg2 | Au9.Cre17.g742250 | 139908 | 127318 | cyclin dependent kinase | cyclin dependent kinase.
CDKH1 | CDKH1 | cdkh1 | Au9.Cre07.g355400 | 153970 | scaffold00006.g738.t1 | 83876 | Vocar20006848m | cyclin dependent kinase | cyclin dependent kinase
CDKI1 | CDKI1 | cdki1/lf2 | Au9.Cre12.g494500 | 195781 | scaffold00019.g245.t1 | 119542 | Vocar20013243m | TSOC_007933-RA | cyclin dependent kinase | cyclin dependent kinase
CYCA1 | CYCA1; CYCA2 | cyca1 | Au9.Cre03.g207900 | 147453 | scaffold00001.g929.t1; scaffold00903.g158.t1 | 127267 | Vocar20005704m | TSOC_009562-RA | A-type cyclin | A type cyclin.
CYCB1 | CYCB1 | cycb1 | Au9.Cre06.g284350 | 206115 | scaffold00008.g51.t1 | 127276 | Vocar20003768m | TSOC_012370-RA | B-type cyclin | B type mitotic cyclin
CYCAB1 | CYCAB1; CYCAB2; CYCAB3 | cycab1 | 206112 | scaffold00047.g388.t1; scaffold00047.g389.t1; scaffold00047.g390.t1 | 127293 | Vocar20015027m | TSOC_013800-RA | Related to A and B type cyclins.
CYCC1 | CYCC1 | cycc1 | scaffold00001.g491.t1 | 127505 | Vocar20006279m | TSOC_010052-RA | C type cyclin
CYCD1 | CYCD1.1 | cycd1.1 | Au9.Cre11.g467772 | 195780 | scaffold00047.g299.t1 | 127281 | Vocar20010063m | TSOC_008914-RA | D-type cyclin | D type cyclin.
CYCD1.2 | cycd1.2 | scaffold00047.g300.t1 | 127284 | Vocar20010067m | TSOC_008919-RA | D-type cyclin | D type cyclin
CYCD1.3 | cycd1.3 | scaffold00047.g301.t1 | 127282 | Vocar20010127m | TSOC_006929-RA | D-type cyclin | D type cyclin
CYCD1.4 | cycd1.4 | scaffold00100.g16.t1 | 127283 | Vocar20013188m | D-type cyclin | D type cyclin
CYCD2 | CYCD2 | cycd2 | Au9.Cre06.g289750 | 191762 | scaffold00041.g668.t1 | 127277 | Vocar20013437m | TSOC_005199-RA | D-type cyclin | D type cyclin
CYCD3 | CYCD3 | cycd3 | Au9.Cre06.g284350 | 206110 | scaffold00044.g48.t1 | 127287 | Vocar20007422m | TSOC_005193-RA | D-type cyclin | D type cyclin
CYCD4 | CYCD4 | cycd4 | Au9.Cre09.g414416 | 206166 | scaffold00041.g668.t1 | 127321 | Vocar20007145m | TSOC_008922-RA | D-type cyclin | D type cyclin
CYCD5 | scaffold00011.g188.t1 | D-type cyclin | D-type cyclin
CYCL1 | CYCL1 | cycl1 | Aug9.Cre07.g321650 | scaffold00001.g226.t1 | 120598 | Vocar20003178m | TSOC_010026-RA | L type cyclin
CYCM1 | CYCM1 | cycm1 | Au9.Cre10.g453050 | scaffold00669.g798.t1 | 107638 | Vocar20010480m | TSOC_009098-RA | cyclin
CYCU1 | CYCU1 | cycu1 | Au9.Cre02.g118050 | scaffold00015.g335.t1 | 127509 | Vocar20005157m | conserved expressed protein of unknown function | cyclin
CYCT1 | CYCT1 | cyct1 | Au9.Cre14.g613900 | 193461 | scaffold00453.g353.t1; scaffold00453.g354.t1 | 120142 | Vocar20014399m | TSOC_001716-RA/TSOC_001717-RA
WEE1 | WEE1 | wee1 | Au9.Cre07.g355250 | 194589 | scaffold00006.g736.t1 | 127274 | Vocar20006845m | TSOC_003531-RA | CDK inhibitory kinase | wee1 kinase ortholog
CKS1 | CKS1; CKS2 | cks1 | Au9.Cre03.g180350 | 182779 | scaffold00017.g809.t1; scaffold00006.g542.t1 | 127315 | Vocar20008026m | TSOC_007834-RA | CKS1 homolog
MAT3 | MAT3 | mat3 | Au9.Cre06.g255450 | 187248 | Minus_MT.g1278.t1 | 127376 | Vocar20003526m | TSOC_012621-RA | retinoblastoma protein
E2F1 | E2F1 | e2f1 | Au9.Cre01.g052300 | scaffold00003.g108.t1 | 127253 | Vocar20009308m | TSOC_008865-RA | E2F transcription factor family homolog.
DP1 | DP1 | dp1 | Au9.Cre07.g323000 | scaffold00001.g256.t1 | 121369 | Vocar20012742m | TSOC_004326-RA | putative DP transcription factor
E2FR1 | E2FR1 | e2fr1 | Au9.Cre13.g573000 | 168563 | scaffold00022.g868.t1 | 127270 | Vocar20009839m | related to E2F and DP transcription factors, Chlamydomonas specific; transcription factor, E2F and DP-related | related to E2F and DP transcription factors
Cell wall proteins
GP1 | Cre09.g398900 | Hydroxyproline-rich glycoprotein component of the outer cell wall
GP2 | GP2 | gp2 | Cre06.g258800 | scaffold00010.g752 | gb_EFJ52729.1 | Hydroxyproline-rich glycoprotein component of the outer cell wall
GP3 | GP3 | gp3 | gb_CAJ98721.1 | scaffold00026.g494 | 105161 | TSOC_004978-RA | Hydroxyproline-rich glycoprotein component of the outer cell wall