Revised MS (EVOB-D-15-00320R2) Click here to download Manuscript Revised_MS.docx

Click here to view linked References 1 2 3 4 5 6 7 8 9 10 Chloroplast phylogenomic analysis of chlorophyte green 11 identifies a novel lineage sister to the Sphaeropleales 12 13 (Chlorophyceae) 14 15 16 17 18 Research Article 19 20 21 22 23 Claude Lemieux*, Antony T. Vincent, Aurélie Labarre, Christian Otis, and 24 Monique Turmel 25 26 27 28 Institut de Biologie Intégrative et des Systèmes, Département de biochimie, de 29 microbiologie et de bio-informatique, Université Laval, Québec (Québec) Canada 30 31 32 33 34 35 36 37 38 39 40 Authors’ email addresses: [email protected] 41 [email protected] 42 [email protected] 43 44 [email protected] 45 [email protected] 46 47 48 49 50 51 52 *Corresponding Author: Claude Lemieux 53 Institut de Biologie Intégrative et des Systèmes 54 55 1030 ave de la médecine, Pavillon Marchand 56 Université Laval, Québec (QC) Canada G1V 0A6; 57 phone: 418-656-2131 ext. 5171; Fax: 418-656-7176. 58 59 60 61 62 63 64 65 1 2 2 3 4 5 Abstract 6 7 Background: The class Chlorophyceae () includes morphologically and 8 9 10 ecologically diverse green algae. Most of the documented species belong to the clade formed by 11 12 the Chlamydomonadales (also called Volvocales) and Sphaeropleales. Although studies based on 13 14 the nuclear 18S rRNA gene or a few combined genes have shed light on the diversity and 15 16 17 phylogenetic structure of the Chlamydomonadales, the positions of many of the monophyletic 18 19 groups identified remain uncertain. Here, we used a chloroplast phylogenomic approach to 20 21 22 delineate the relationships among these lineages. 23 24 25 Results: To generate the analyzed amino acid and nucleotide data sets, we sequenced the 26 27 28 chloroplast DNAs (cpDNAs) of 24 chlorophycean taxa; these included representatives from 16 29 30 of the 21 primary clades previously recognized in the Chlamydomonadales, two taxa from a 31 32 33 coccoid lineage (Jenufa) that was suspected to be sister to the Golenkiniaceae, and two 34 35 sphaeroplealeans. Using Bayesian and/or maximum likelihood inference methods, we analyzed 36 37 an amino acid data set that was assembled from 69 cpDNA-encoded proteins of 73 core 38 39 40 chlorophyte (including 33 chlorophyceans), as well as two nucleotide data sets that were 41 42 generated from the 69 genes coding for these proteins and 29 RNA-coding genes. The protein 43 44 45 and gene phylogenies were congruent and robustly resolved the branching order of most of the 46 47 investigated lineages. Within the Chlamydomonadales, 22 taxa formed an assemblage of five 48 49 50 major clades/lineages. The earliest-diverging clade displayed Hafniomonas laevis and the 51 52 Crucicarteria, and was followed by the Radicarteria and then by the Chloromonadinia. The 53 54 latter lineage was sister to two superclades, one consisting of the Oogamochlamydinia and 55 56 57 Reinhardtinia and the other of the Caudivolvoxa and Xenovolvoxa. To our surprise, the Jenufa 58 59 species and the two spine-bearing green algae belonging to the Golenkinia and Treubaria genera 60 61 62 63 64 65 1 3 2 3 4 were recovered in a highly supported monophyletic group that also included three taxa 5 6 7 representing distinct families of the Sphaeropleales (Bracteacoccaceae, Mychonastaceae, and 8 9 Scenedesmaceae). 10 11 12 Conclusions: Our phylogenomic study advances our knowledge regarding the circumscription 13 14 and internal structure of the Chlamydomonadales, suggesting that a previously unrecognized 15 16 lineage is sister to the Sphaeropleales. In addition, it offers new insights into the flagellar 17 18 19 structures of the founding members of both the Chlamydomonadales and Sphaeropleales. 20 21 22 Key Words: Chlorophyceae, Chlamydomonadales, Golenkiniaceae, Treubarinia, Jenufa genus, 23 24 25 Plastid genome, Flagellar apparatus 26 27 28 29 30 Background 31 32 33 The Chlorophyceae occupies the crown of the Chlorophyta, one of the two divisions of the 34 35 36 [1]. This monophyletic class of green algae comprises five orders or main clades 37 38 [2, 3] that form two major lineages [4, 5]: the Chlamydomonadales (or Volvocales) + 39 40 Sphaeropleales (CS clade) and the Oedogoniales + Chaetophorales + Chaetopeltidales (OCC 41 42 43 clade). Members of the Chlorophyceae are found in a wide range of habitats and display diverse 44 45 cell organizations (unicells, coccoids, colonies, simple flattened thalli, unbranched and branched 46 47 48 filaments) [1, 6]. Their motile cells also exhibit variability at the level of the flagellar apparatus. 49 50 The flagellar basal bodies of most chlorophycean green algae are displaced in a clockwise (CW, 51 52 53 1-7 o'clock) direction or are directly opposed (DO, 12-6 o'clock). In the Chlamydomonadales 54 55 (often designated as the CW clade), biflagellates display a CW orientation of basal bodies, 56 57 whereas quadriflagellates harbor distinct and more complex flagellar apparatus ultrastructures [7, 58 59 60 8]. The vegetatively nonmotile unicellular or colonial taxa found in the Sphaeropleales (DO 61 62 63 64 65 1 4 2 3 4 clade) produce zoospores with two flagella arranged in a DO configuration [9]. Quadriflagellates 5 6 7 with the perfect DO configuration of flagellar bodies characterize the Chaetopeltidales [10], 8 9 whereas quadriflagellates from the Chaetophorales display a polymorphic arrangement in which 10 11 12 one pair of basal bodies has the DO configuration and the other is slightly displaced in a 13 14 clockwise orientation [11, 12]. The members of the Oedogoniales have an unusual flagellar 15 16 apparatus that is characterized by a stephanokont arrangement of flagella [13]. 17 18 19 The Chlamydomonadales, the largest order of the Chlorophyceae, contains about half of 20 21 the 3,336 species currently described in this class [14]. Phylogenetic studies based on the nuclear 22 23 24 18S rRNA gene [7, 15-22] and/or a few chloroplast genes (e.g. atpB, psaB, rbcL) [23-25] as well 25 26 as combined 18S and 26S rDNAs [26] have highlighted numerous chlamydomonadalean 27 28 29 lineages. In an exhaustive phylogenetic analysis of the 449 chlamydomonadalean 18S rDNA 30 31 sequences available in GenBank at the time, Nakada et al. [16] uncovered 21 primary clades 32 33 34 following the rules of PhyloCode. The sequences of the spine-bearing Golenkinia species were 35 36 used to root the phylogeny because a sister relationship between the Golenkiniaceae and the 37 38 Chlamydomonadales had been reported earlier [27] and also because the motile cells of these 39 40 41 two groups have a CW basal body configuration [28]. Nakada et al. [16] identified a sister 42 43 relationship for the strongly supported Xenovolvoxa and Caudivolvoxa superclades, which are 44 45 46 composed of four and six primary clades, respectively. All remaining clades were found to be 47 48 basal relative to these superclades, with the four deepest-branching lineages displaying 49 50 51 quadriflagellates (Hafnionomas, Treubarinia, Radicarteria, and Crucicarteria clades) as 52 53 originally described by Nozaki et al. [24]. The interrelationships between most of the primary 54 55 clades, however, received low statistical support and were influenced by the method used for 56 57 58 phylogenetic inference. Phylogenetic studies based on a combination of two or three genes 59 60 61 62 63 64 65 1 5 2 3 4 (atpB, psaB, rbcL, 18S and 26S rDNAs) yielded trees with improved statistical support for some 5 6 7 of the clades, but their topologies were variable depending on the gene data sets employed and 8 9 were generally in conflict with 18S rDNA phylogenies. One study based on 18S and 26S rDNAs 10 11 12 even called into question the alliance of the deep-branching Treubarinia clade with the 13 14 Chlamydomonadales [2]. 15 16 In the present investigation, we used a chloroplast phylogenomic approach to resolve 17 18 19 problematic relationships among the major chlamydomonadalean clades proposed by Nakada et 20 21 al. [16]. Our taxon sampling included representatives from 16 of the 21 primary clades as well as 22 23 24 two taxa from a coccoid lineage (Jenufa) that was suspected to be sister to the Golenkiniaceae 25 26 [19]. We undertook the partial or complete sequencing of 24 chlorophycean chloroplast genomes 27 28 29 in order to generate the analyzed amino acid and nucleotide data sets. The results of our 30 31 phylogenomic analyses enabled us to resolve the branching order of most of the investigated 32 33 34 clades. Unexpectedly, the two Jenufa species and the representatives of the Golenkiniaceae and 35 36 Treubarinia were recovered in a highly supported monophyletic group that also included the taxa 37 38 belonging to the Sphaeropleales. 39 40 41 42 43 44 Results 45 46 47 The 24 chlorophycean taxa that were selected for chloroplast DNA (cpDNA) sequencing are 48 49 listed in Table 1. As mentioned earlier, they represent 16 of the 21 primary clades proposed by 50 51 52 Nakada et al. [16] and also include two bona fide sphaeroplealean taxa representing the 53 54 Bracteacoccaceae and Mychonastaceae. When we undertook our study, the partial or complete 55 56 cpDNA sequences of only five taxa from the CS clade were available: Chlamydomonas 57 58 59 reinhardtii [29], Volvox carteri f. nagariensis [30], Chlamydomonas moewusii [5], Dunaliella 60 61 62 63 64 65 1 6 2 3 4 salina [31] and Scenedesmus obliquus [32]. We used the Roche 454 or Illumina platform to 5 6 7 sequence the chloroplast genomes of the examined taxa and obtained complete genome 8 9 sequences for 13 taxa (Table 1). The contigs making up the assemblies of the remaining 10 11 12 chloroplast genomes (7 to 111 contigs) typically exhibited repeats at their extremities and one or 13 14 more genes in their internal portions, suggesting that the presence of abundant repeats in 15 16 intergenic regions prevented the assembly of complete genome sequences. Despite this problem, 17 18 19 most if not all genes were recovered from the chloroplast genome of each examined taxon. We 20 21 present here our phylogenetic analyses of concatenated chloroplast genes and proteins; the 22 23 24 salient features of the newly sequenced chloroplast genomes will be reported in a separate 25 26 article. 27 28 29 Phylogenomic analyses of 69 cpDNA-encoded proteins 30 31 32 We initiated our phylogenomic study by analyzing an amino acid data set (PCG-AA) that was 33 34 35 assembled from 69 cpDNA-encoded proteins of 73 core chlorophytes (total of 14,209 sites; see 36 37 Methods for the list of corresponding genes). Missing data were allowed but their proportion 38 39 accounted for only 1.6% of the total data set. This data set was analyzed with PhyloBayes using 40 41 42 the site-heterogeneous CATGTR+Γ4 model and also with RAxML using the site-homogeneous 43 44 GTR+Γ4 model. In the latter analysis, the data set was partitioned by gene, with the model 45 46 47 applied to each partition. 48 49 The majority-rule consensus trees inferred using maximum likelihood (ML) and Bayesian 50 51 52 inference methods display essentially the same topology, with high bootstrap support (BS) 53 54 values found at most of the nodes (Figure 1). As expected, the relationships observed for the 55 56 57 pedinophyceans, trebouxiophyceans and ulvophyceans that were used as outgroup taxa are 58 59 essentially identical to those reported by Lemieux et al. [33]. In addition, the strongly supported 60 61 62 63 64 65 1 7 2 3 4 clade formed by the algae in the OCC lineage is sister to the strongly supported clade uniting the 5 6 7 Chlamydomonadales and the Sphaeropleales (CS clade). The three representatives of recognized 8 9 families within the Sphaeropleales (Bracteacoccaceae, Mychonastaceae, and Scenedesmaceae) 10 11 12 form a robust clade. In the Bayesian tree, this clade occupies a sister position relative to that 13 14 containing the two Jenufa species, Golenkinia longispicula and Treubaria triappendiculata. The 15 16 node coinciding with the ancestor of these seven species received maximal support in both the 17 18 19 Bayesian and ML analyses. However, Treubaria is positioned differently in these analyses: 20 21 instead of being sister to the Golenkinia and Jenufa lineages as in the Bayesian tree, it branches 22 23 24 at the base of the lineages containing Bracteococcus giganteus, Mychonastes jurisii and 25 26 Scenedesmus obliquus in the ML tree (BS = 54%). 27 28 29 The 22 taxa within the Chlamydomonadales form an assemblage of five major 30 31 clades/lineages, all of which received very strong support in both the Bayesian and ML trees, 32 33 34 with the exception of the lineage leading to Carteria sp. SAG 8-5 (BS = 75%). The earliest- 35 36 diverging clade, which includes Hafniomonas laevis and the Crucicarteria (Carteria crucifera 37 38 and Carteria cerasiformis), is followed by the Radicarteria (Carteria sp. SAG 8-5). The next 39 40 41 lineage, which is occupied by the representatives of the Chloromonadinia (two Chloromonas 42 43 species), is sister to an assemblage formed by two major clades, one consisting of the 44 45 46 Oogamochlamydinia (Oogamochlamys and two Lobochlamys species) and Reinhardtinia 47 48 (Volvox and two Chlamydomonas species) and the other of the Caudivolvoxa and Xenovolvoxa. 49 50 51 The Caudivolvoxa contains representatives of the Characiosiphonia (Characiochloris), 52 53 Chlorogonia (Chlorogonium and Haematococcus), Dunaliellinia (Dunaliella), Polytomia 54 55 (Chlamydomonas applanata) and Stephanosphaerinia (Stephanosphaera and Chloromonas 56 57 58 59 60 61 62 63 64 65 1 8 2 3 4 perforata), while the Xenovolvoxa contains representatives of Moewusinia (Chlamydomonas 5 6 7 moewusii), Monadinia (Microglena) and Phacotinia (Phacotus). 8 9 10 Phylogenomic analyses of 98 chloroplast genes 11 12 13 We also wished to infer trees using the 69 genes corresponding to the proteins represented in the 14 15 amino acid data set as well as 29 RNA-coding genes (three rRNA genes and 26 tRNA genes). 16 17 18 Before undertaking these analyses, we evaluated the phylogenetic performance of the third 19 20 codon positions using the saturation test of Xia et al. [34]. We found that the index of 21 22 substitution saturation (Iss = 0.624) was significantly higher (P < 0.001) than the critical value of 23 24 25 the index of saturation (IssAsym = 0.573), implying that third codon positions experienced a 26 27 high level of saturation and are thus useless for phylogenetic reconstructions. Furthermore, the 28 29 30 AT- and GC-skew calculations carried out with DAMBE [35] using the PCG12RNA (34,121 31 32 sites) and PCG123RNA (48,172 sites) data sets, which differ only by the presence/absence of 33 34 35 third codon positions, indicated that inclusion of the third codon positions induced nucleotide 36 37 compositional bias. As shown in Figure 2, the AT-skew values turned negative for many taxa in 38 39 the data set containing these codon positions and both the AT- and GC-skew values became 40 41 42 more scattered. To reduce the saturation level and compositional bias contributed by the third 43 44 codon positions, we assembled the PCG123degenRNA data set (48,172 sites) in which all codon 45 46 47 positions of the 69 protein-coding genes were fully degenerated using degen1 [36]. This script 48 49 operates by degenerating nucleotides at all sites that can potentially undergo synonymous change 50 51 52 in all pairwise comparisons of sequences in the data matrix, thereby making synonymous 53 54 changes largely invisible and reducing compositional heterogeneity but leaving the inference of 55 56 57 nonsynonymous changes largely intact. 58 59 60 61 62 63 64 65 1 9 2 3 4 The PCG12RNA and PCG123degenRNA data sets were analyzed with RAxML using the 5 6 7 GTR+Γ4 model of sequence evolution. These data sets, which contained only 1.4% of missing 8 9 data, were partitioned into 71 groups with the model applied to each partition. The partitions 10 11 12 included the 69 individual protein-coding genes, the concatenated rRNA genes and the 13 14 concatenated tRNA genes. The inferred majority-rule consensus gene trees are essentially 15 16 congruent with the protein trees (Additional file 1); moreover, the same lineages received weak 17 18 19 support in both the gene and protein trees (i.e. the Treubaria and Carteria sp. SAG 8-5 lineages). 20 21 The only notable deviation with the protein trees is that the affiliation of Hafniomonas with the 22 23 24 Crucicarteria suffered from low statistical support in both nucleotide-based phylogenetic 25 26 analyses. As observed in the protein trees, Treubaria is found either at the base of the clade 27 28 29 containing Golenkinia and the two Jenufa species (in the tree inferred from the PCG12RNA data 30 31 set) or at the base of the clade containing Bracteococcus, Mychonastes and Scenedesmus (in the 32 33 34 tree inferred from the PCG123degenRNA). 35 36 37 38 39 Discussion 40 41 In this study, we used a phylogenomic approach to resolve ambiguous nodes in the phylogeny of 42 43 44 the Chlamydomonadales and to better delineate this chlorophycean order from its sister lineage, 45 46 the Sphaeropleales. The chloroplast genome sequences we gathered from 24 taxa allowed us to 47 48 49 increase taxon sampling for these chlorophycean lineages by a factor of 6-fold compared to 50 51 earlier phylogenomic analyses. We examined a total of 29 chlamydomonadalean and 52 53 sphaeroplealean taxa that represent 16 of the 21 primary clades that Nakada et al. [16] recovered 54 55 56 for the Chlamydomonadales, three of the major lineages recognized for the Sphaeropleales 57 58 (Scenedesmaceae, Mychonastaceae and Bracteacoccaceae) as well as the Jenufa lineage, which 59 60 61 62 63 64 65 1 10 2 3 4 was suspected to be sister to the Golenkinia outgroup used by Nakada et al. [16]. The trees 5 6 7 inferred from both the amino acid and nucleotide data sets received strong statistical support for 8 9 the majority of branches (Figures 1 and 3), providing important insights into the phylogenies of 10 11 12 the Chlamydomonadales and Sphaeropleales. 13 14 15 A newly identified lineage sister to the Sphaeropleales 16 17 18 Our results suggest that a previously unrecognized lineage of the Chlorophyceae is sister to the 19 20 Sphaeropleales (Figures 1 and 3). This lineage is detected as a strongly supported clade that 21 22 unites the spine-bearing Golenkinia longispicula with the two Jenufa species (Figures 1 and 3). 23 24 25 This clade may also comprise the Treubarinia; the latter multi-genera lineage, which includes 26 27 Cylindrocapsa, Trochiscia and Elakatothrix in addition to Treubaria [2, 16, 37], is represented in 28 29 30 our study by the spine-bearing Treubaria triappendiculata. While the latter alga is sister to the 31 32 Golenkinia + Jenufa lineages in the Bayesian protein tree (PP = 0.98) and the ML gene tree 33 34 35 inferred with the PCG12RNA data set (BS = 85%), it branches at the base of the lineages formed 36 37 by the representatives of the Scenedesmaceae, Mychonastaceae and Bracteacoccace in the ML 38 39 protein tree (BS = 54%) and the gene tree inferred with the PCG123degenRNA data set (BS = 40 41 42 54%). Like the Golenkinia species, some members of the Treubarinia (Treubaria and Trochiscia 43 44 species) bear spines, but spine morphology and composition differ in the two lineages [28]. 45 46 47 Among the previously published phylogenies of chlorophyceans, only the 18S + 28S rDNA 48 49 study of Shoup and Lewis [37] recovered the Treubarinia within the Sphaeropleales. These 50 51 52 authors sampled diverse lineages of the Chlamydomonadales and Sphaeropleales but no 53 54 representatives of the Golenkinia were examined. In both the Bayesian and maximum parsimony 55 56 57 trees they inferred, Cylindrocapsa geminella and Trochiscia hystrix were found to be sister to the 58 59 Sphaeropleaceae, but support was weak (BS < 50% and PP < 0.50). In contrast, the previously 60 61 62 63 64 65 1 11 2 3 4 reported 18S + 28S rDNA analyses of Buchheim et al. [2] revealed, with weak support, an 5 6 7 affinity between the Treubaria + Cylindrocapsa lineage and the Chlamydomonadales. More 8 9 recently, consistent with the 18S rDNA trees inferred by Gerloff-Elias et al. [15], the 18S rDNA 10 11 12 analyses of Němcová et al. [19] recovered the Treubarinia as sister to the Hafniomonas lineage 13 14 (Chlamydomonadales) but again with no statistical support. 15 16 The taxonomic history of the Golenkinia genus is confusing because it underwent several 17 18 19 revisions. Ettl and Komárek [38] included Golenkinia together with Chlorotetraedron and 20 21 Polyedriopsis in the Neochloridaceae, a family of the Sphaeropleales containing aquatic coccoid 22 23 24 algae that are mostly multinucleate. Subsequently, Komárek and Fott [39] erected the 25 26 Golenkiniaceae to accommodate unicellular algae exhibiting spherical cells with spiny 27 28 29 projections on their cells walls, including Golenkinia and Polyedriopsis. However, an affinity 30 31 between these two genera could not be confirmed by 18S rDNA analyses: it was found that 32 33 34 Polyedriopsis is closely related to members of the Neochloridaceae, in particular Neochloris and 35 36 Chorotetraedron [40] but that Golenkinia is sister to the Chlamydomonadales [27, 41]. Using 37 38 also the 18S rDNA marker, Němcová et al. [19] identified a loose relationship between the 39 40 41 Jenufa and Golenkinia genera, which is in agreement with our study, but this clade could not be 42 43 assigned to a specific chlorophycean order. 44 45 46 The novel lineage identified here as sister to the Sphaeropleales possibly represents one of 47 48 the the deepest branch of this order. To elucidate its relationships to the major clades and 49 50 51 families recognized in this order, phylogenomic analyses with a broader taxon sampling 52 53 including all major sphaeroplealean lineages as well as additional representatives of the 54 55 Treubarinia will be required. Fučíková et al. [42] recently proposed an updated family-level 56 57 58 taxonomy comprising ten new families based on a study of seven genes (three nuclear and four 59 60 61 62 63 64 65 1 12 2 3 4 chloroplast genes) from taxa sampled across the Sphaeropleales, but the relationships among 5 6 7 most of the 17 recognized families could not be resolved. The inferred trees suggested that the 8 9 genus Mychonastes, which contains aquatic uninucleate coccoid algae, may be the deepest- 10 11 12 diverging lineage. This observation is not consistent with our finding that the soil multinucleate 13 14 coccoid alga Bracteococcus (Bracteacoccaceae) is sister to the clade formed by the unicleate 15 16 Scenedesmus (Scenedesmaceae) and Mychonastes (Mychonastaceae) (Figures 1 and 3). The 17 18 19 latter relationships, however, are compatible with the 18S rDNA tree of Němcová et al. [19]. 20 21 22 Relationships within the Chlamydomonadales 23 24 25 Prior to our investigation, multiple clades had been delineated for the Chlamydomonadales but 26 27 their inter-relationships remained ambiguous. We present here for the first time a robust 28 29 30 phylogeny of the Chlamydomonadales that resolves with confidence the branching order of most 31 32 of the main lineages investigated (Figures 1 and 3). In addition to the Xenovolvoxa and 33 34 35 Caudivolvoxa, we recovered an additional superclade that unites the Oogamochlamydinia and 36 37 Reinhardtinia; this superclade is sister to the Caudivolvoxa + Xenovolvoxa. 38 39 The internal structures of the Caudivolvoxa and Xenovolvoxa superclades were fully resolved 40 41 42 in our trees. The topology observed for the Caudivolvoxa was identical to that found in the 18S 43 44 rDNA phylogeny of Nakada et al. [16] and the three-gene phylogeny of Nakada and Tomita [18], 45 46 47 but the clade containing the Polytominia and Dunaliellinia received little or no support in the 48 49 latter phylogenies. As observed previously [16-19, 26], we found that the Chlorogonia and 50 51 52 Stephanosphaerinia form a clade and that the Characiosiphonia is the most basal lineage of the 53 54 Caudivolvoxa. For the Xenovolvoxa, we identified the Phacotinia as sister to the Monadinia + 55 56 57 Moewusinia clade. In contrast, the three-gene phylogeny of Nakada and Tomita [18] and the 18S 58 59 60 61 62 63 64 65 1 13 2 3 4 rDNA tree inferred by Nakada et al. [16] placed the Moewusinia at the base of this clade with 5 6 7 little or no support. 8 9 The relationships we observed among the basal chlamydomonadalean lineages 10 11 12 (Crucicarteria + Hafniomonas, Radicarteria and Chloromonadinia) were congruent with the 13 14 three-gene (18S rDNA, atpB and psaB) phylogenies inferred by Nozaki et al. [25] and Matsuzaki 15 16 et al. [23]. The representative of the Radicarteria (Carteria sp. SAG 8-5) was not positioned 17 18 19 with high confidence in both our protein and gene trees and the sister-relationship between the 20 21 Hafniomonas lineage and the Crucicarteria was poorly supported in the gene tree (Figures 1 and 22 23 24 3). The Tetraflagellochloris and Spermatozopsis lineages, which were resolved as deep branches 25 26 in 18S rDNA trees [7, 16, 17, 19, 22], will need to be sampled in future phylogenomic studies in 27 28 29 order to clarify the branching order of the earliest-diverging lineages of the Chlamydomonadales. 30 31 In the phylogeny inferred by Barsanti et al. [7], Tetraflagellochloris mauritanica, a 32 33 34 quadriflagellate recently isolated from the desert, occupies the deepest position within this order 35 36 and consistent with some other 18S rDNA studies [16, 17, 20], the Spermatozopsis lineage 37 38 occupies a sister position relative to the Radicarteria; but, in contrast to our study and all other 39 40 41 phylogenies reported so far, the Hafniomonas lineage was found to be allied with the 42 43 Reinhardtinia. 44 45 46 Evolution of flagellar apparatus structure 47 48 49 Considering that the novel clade reported here as sister to the Sphaeropleales contains at least 50 51 52 one lineage with quadriflagellate motile cells (Golenkinia; G. radiata, the type species, is 53 54 quadriflagellate while G. longispicula is biflagellate), it is reasonable to propose, as hypothesized 55 56 57 by Nozaki et al. [24] for the Chlamydomonadales, that quadriflagellates also gave rise to the 58 59 biflagellate motile cells found in the Sphaeropleales. Interestingly, the biflagellate motile cells of 60 61 62 63 64 65 1 14 2 3 4 G. longispicula exhibits a CW orientation of basal bodies [27, 28], whereas all sphaeroplealean 5 6 7 taxa that have been investigated for their flagellar apparatus have a DO configuration [9]. This 8 9 phylogenetic distribution of flagellar architecture suggests that the quadriflagellate ancestor of 10 11 12 these algae possessed a CW + DO organization and that loss of the flagellar pair exhibiting the 13 14 CW organization gave rise to the DO flagellar apparatus in sphaeroplealeans. To substantiate this 15 16 hypothesis, it would be important to examine the flagellar apparatus of quadriflagellate motile 17 18 19 cells from genera belonging to both the Golenkinia and Treubarinia. In this connection, it is 20 21 worth mentioning that two pairs of basal bodies with a unusual arrangement (diagonally 22 23 24 opposed) have been reported for Cylindrocapsa, a member of the Treubarinia [43]. 25 26 Given the recent finding of a CW + DO flagellar architecture in the deeply branching 27 28 29 chlamydomonadalean Tetraflagellochloris mauritanica [7], it appears that not only the 30 31 architecture characteristic of the Sphaeropleales (DO) but also that characteristic of the 32 33 34 Chlamydomonadales (CW) were derived from a CW + DO quadriflagellate ancestor. Actually, 35 36 mapping of character states for the flagellar apparatus on the topology of the Chlorophyceae 37 38 reveals that the quadriflagellate ancestor of all chlorophyceans also exhibited the CW + DO 39 40 41 flagellar architecture (Figure 4). In the predicted scenario, the DO condition changed to CW 42 43 during the evolution of the Chlamydomonadales (CW + DO → CW + CW) and the CW 44 45 46 condition changed to DO during the evolution of the Chaetopeltidales (CW + DO → DO + DO). 47 48 The latter shift is not consistent with the hypothesis that O’Kelly and Floyd [44] proposed for the 49 50 51 evolution of the flagellar apparatus in the green algae. According to this hypothesis, the CCW 52 53 configuration displayed by the members of the Ulvophyceae and was 54 55 converted to the CW configuration by progressive clockwise rotation of flagellar components, 56 57 58 with the unidirectional sequence of arrangements proposed being CCW → DO → CW. The 59 60 61 62 63 64 65 1 15 2 3 4 previous models based on phylogenetic and flagellar ultrastructural data were in agreement with 5 6 7 the hypothesis of O’Kelly and Floyd [44]: the flagellar architecture of the ancestor of all 8 9 chlorophyceans was the DO + DO configuration and convergent shifts from the DO to CW 10 11 12 condition marked the evolution of the Chlamydomonadales and Chaetophorales [2, 5]. 13 14 15 Conclusions 16 17 18 Our chloroplast phylogenomic study advances our understanding regarding the circumscription 19 20 of both the Chlamydomonadales and Sphaeropleales and the relationships of major lineages 21 22 within these orders. We inferred robust protein and gene trees using the newly determined 23 24 25 chloroplast genome sequences of 24 taxa representing 16 of the 21 primary clades previously 26 27 recognized in the Chlamydomonadales and of two Jenufa species that belonged to a lineage of 28 29 30 uncertain affiliation suspected to be sister to the Golenkiniaceae. Our most surprising discovery 31 32 is the placement of the Jenufa, Golenkinia and Treubaria genera in a clade sister to the 33 34 35 Sphaeropleales. Whether this new clade should be considered as part of the Sphaeropleales will 36 37 await future analyses with a better representation of deep-branching lineages from both the 38 39 Chlamydomonales and Sphaeropleales. Character state reconstruction of basal body orientation 40 41 42 on the topology reported here for the Chlorophyceae also enabled us to refine the model for the 43 44 evolution of the flagellar apparatus in this class. 45 46 47 The chloroplast genome sequences generated in this study constitute a highly valuable 48 49 resource for future studies on the phylogeny of chlorophyceans and the evolution of their 50 51 52 chloroplast genome. The 13 fully sequenced and annotated genomes more than double the 53 54 number of chloroplast genomes publicly available for the Chlamydomonadales and 55 56 57 Sphaeropleales. In a forthcoming article, we will show that these newly acquired sequences 58 59 greatly improve our understanding of chloroplast genome evolution in the CS clade. 60 61 62 63 64 65 1 16 2 3 4 5 6 7 Methods 8 9 10 Strains and culture conditions 11 12 13 The 24 green algal strains that were selected for chloroplast genome sequencing are listed in 14 15 Table 1. All of these strains, except Carteria sp. SAG 8-5, were obtained from the culture 16 17 18 collections of algae at the University of Goettingen (SAG), the University of Texas at Austin 19 20 (UTEX), the National Institute of Environmental Studies in Tsukuba (NIES), and Charles 21 22 23 University in Prague (CAUP). Carteria sp. SAG 8-5 (=UTEX 2) was a gift of Dr. Mark 24 25 Buchheim (University of Tulsa). All strains were grown in C medium [45] at 18°C under 26 27 28 alternating 12h-light/12h-dark periods. 29 30 31 Genome sequencing, assembly and annotation 32 33 34 As indicated in Table 1, 15 of the genomes analyzed were sequenced using the Roche 454 35 36 method and the remaining nine using the Illumina method. For 454 sequencing, shotgun libraries 37 38 (700-bp fragments) of A+T-rich DNA fractions obtained as described previously [46] were 39 40 41 constructed using the GS-FLX Titanium Rapid Library Preparation Kit of Roche 454 Life 42 43 Sciences (Branford, CT, USA). Library construction and 454 GS-FLX DNA Titanium 44 45 46 pyrosequencing were carried out by the “Plateforme d’Analyses Génomiques de l’Université 47 48 Laval” [47]. Reads were assembled using Newbler v2.5 [48] with default parameters, and contigs 49 50 51 were visualized, linked and edited using the CONSED 22 package [49]. Contigs of chloroplast 52 53 origin were identified by BLAST searches against a local database of organelle genomes. 54 55 56 Regions spanning gaps in the cpDNA assemblies were amplified by polymerase chain reaction 57 58 (PCR) with primers specific to the flanking sequences. Purified PCR products were sequenced 59 60 61 62 63 64 65 1 17 2 3 4 using Sanger chemistry with the PRISM BigDye Terminator Ready Reaction Cycle Sequencing 5 6 7 Kit (Applied Biosystems, Foster City, CA, USA). 8 9 For Illumina sequencing, total cellular DNA was isolated using the EZNA HP Plant Mini 10 11 12 Kit of Omega Bio-Tek (Norcross, GA, USA). Libraries of 700-bp fragments were constructed 13 14 using the TrueSeq DNA Sample Prep Kit (Illumina, San Diego, CA, USA) and paired-end reads 15 16 were generated on the Illumina HiSeq 2000 (100-bp reads) or the MiSeq (300-bp reads) 17 18 19 sequencing platforms by the Innovation Centre of McGill University and Genome Quebec [50] 20 21 and the “Plateforme d’Analyses Génomiques de l’Université Laval” [47], respectively. Reads 22 23 24 were assembled using Ray 2.3.1 [51] and contigs were visualized, linked and edited using the 25 26 CONSED 22 package [49]. Identification of cpDNA contigs and gap filling were performed as 27 28 29 described above for 454 sequence assemblies. 30 31 Genes were identified on the final assemblies using a custom-built suite of bioinformatics 32 33 34 tools [52]. Genes coding for rRNAs and tRNAs were localized using RNAmmer [53] and 35 36 tRNAscan-SE [54], respectively. Intron boundaries were determined by modeling intron 37 38 secondary structures [55, 56] and by comparing intron-containing genes with intronless 39 40 41 homologs. 42 43 44 Phylogenomic analyses of the amino acid data set 45 46 47 The chloroplast genomes of 73 core chlorophyte taxa were used to generate the analyzed amino 48 49 acid and nucleotide data sets. The GenBank accession numbers of the genomes sequenced in this 50 51 52 study are presented in Table 1; those of the remaining genomes are as follows: Pedinomonas 53 54 minor UTEX LB 1350, [GenBank:NC_016733]; Pedinomonas tuberculata SAG 42.84, 55 56 57 [GenBank:KM462867]; Marsupiomonas sp. NIES 1824, [GenBank:KM462870]; Pseudochloris 58 59 wilhelmii SAG 1.80, [GenBank:KM462886]; Chlorella variabilis NC64A, 60 61 62 63 64 65 1 18 2 3 4 [GenBank:NC_015359]; Chlorella vulgaris C-27, [GenBank:NC_001865]; Dicloster acuatus 5 6 7 SAG 41.98, [GenBank:KM462885]; Marvania geminata SAG 12.88, [GenBank:KM462888]; 8 9 Parachlorella kessleri SAG 211-11g, [GenBank:NC_012978]; Botryococcus braunii SAG 807- 10 11 12 1, [GenBank:KM462884]; Choricystis minor SAG 17.98, [GenBank:KM462878]; Coccomyxa 13 14 subellipsoidea NIES 2166, [GenBank:NC_015084]; Elliptochloris bilobata CAUP H7103, 15 16 [GenBank:KM462887]; Paradoxia multiseta SAG 18.84, [GenBank:KM462879]; 17 18 19 Trebouxiophyceae sp. MX-AZ01, [GenBank:NC_018569]; Geminella minor SAG 22.88, 20 21 [GenBank:KM462883]; Geminella terricola SAG 20.91, [GenBank:KM462881]; Gloeotilopsis 22 23 24 sterilis UTEX 1704, [GenBank:KM462877]; Fusochloris perforata SAG 28.85, 25 26 [GenBank:KM462882]; Microthamnion kuetzingianum UTEX 318, [GenBank:KM462876]; 27 28 29 Oocystis solitaria SAG 83.80, [GenBank:FJ968739]; Planctonema lauterbornii SAG 68.94, 30 31 [GenBank:KM462880]; “Chlorella” mirabilis SAG 38.88, [GenBank:KM462865]; Koliella 32 33 34 longiseta UTEX 339, [GenBank:KM462868]; Pabia signiensis SAG 7.90, 35 36 [GenBank:KM462866]; Stichococcus bacillaris UTEX 176, [GenBank:KM462864]; 37 38 Prasiolopsis sp. SAG 84.81, [GenBank:KM462862]; Myrmecia israelensis UTEX 1181, 39 40 41 [GenBank:KM462861]; Trebouxia aggregata SAG 219-1D, [GenBank:EU123962-EU124002]; 42 43 Dictyochloropsis reticulata SAG 2150, [GenBank:KM462860]; Watanabea reniformis SAG 44 45 46 211-9b, [GenBank:KM462863]; Pleurastrosarcina brevispinosa UTEX 1176, 47 48 [GenBank:KM462875]; “Koliella” corcontica SAG 24.84, [GenBank:KM462874]; Leptosira 49 50 51 terrestris UTEX 333, [GenBank:NC_009681]; incisa SAG 2007, 52 53 [GenBank:KM462871]; Neocystis brevis CAUP D802, [GenBank:KM462873]; Parietochloris 54 55 pseudoalveolaris UTEX 975, [GenBank:KM462869]; Xylochloris irregularis CAUP H7801, 56 57 58 [GenBank:KM462872]; Oltmannsiellopsis viridis NIES 360, [GenBank:NC_008099]; 59 60 61 62 63 64 65 1 19 2 3 4 Pseudendoclonium akinetum UTEX 1912, [GenBank:NC_008114]; Oedogonium cardiacum 5 6 7 SAG 575-1b, [GenBank:NC_011031]; Floydiella terrestris UTEX 1709, 8 9 [GenBank:NC_014346]; Stigeoclonium helveticum UTEX 441, [GenBank:NC_008372]; 10 11 12 Schizomeris leibleinii UTEX LB 1228, [GenBank:NC_015645]; Scenedesmus obliquus UTEX 13 14 393, [GenBank:NC_008101]; Chlamydomonas moewusii UTEX 97, [GenBank:EF587443- 15 16 EF587503]; Dunaliella salina CCAP 19/18, [GenBank:NC_016732]; Volvox carteri f. 17 18 19 nagariensis UTEX 2908, [GenBank:GU084820]; and Chlamydomonas reinhardtii, 20 21 [GenBank:NC_005353]. 22 23 24 A total of 69 protein-coding genes were used to construct the amino acid data set (PCG- 25 26 AA): atpA, B, E, F, H, I, ccsA, cemA, chlB, L, N, clpP, ftsH, infA, petA, B, D, G, L, psaA, B, C, J, 27 28 29 M, psbA, B, C, D, E, F, H, I, J, K, L, M, N, T, Z, rbcL, rpl2, 5, 12, 14, 16, 20, 23, 32, 36, rpoA, B, 30 31 C1, C2, rps2, 3, 4, 7, 8, 9, 11, 12, 14, 18, 19, tufA, ycf1, 3, 4, 12. This data set was prepared as 32 33 34 follows: the deduced amino acid sequences from the 69 individual genes were aligned using 35 36 MUSCLE 3.7 [57], the ambiguously aligned regions in each alignment were removed using 37 38 TRIMAL 1.3 [58] with the options block=6, gt=0.7, st=0.005 and sw=3, and the protein 39 40 41 alignments were concatenated using Phyutility 2.2.6 [59]. 42 43 Phylogenies were inferred from the PCG-AA data set using the ML and Bayesian methods. 44 45 46 ML analyses were carried out using RAxML 8.1.14 [60] and the GTR+Γ4 model of sequence 47 48 evolution; in these analyses, the data set was partitioned by gene, with the model applied to each 49 50 51 partition. Confidence of branch points was estimated by fast-bootstrap analysis (f=a) with 500 52 53 replicates. Bayesian analyses were performed with PhyloBayes 3.3f [61] using the site- 54 55 heterogeneous CATGTR+Γ4 model [62]. Five independent chains were run for 2,000 cycles and 56 57 58 consensus topologies were calculated from the saved trees using the BPCOMP program of 59 60 61 62 63 64 65 1 20 2 3 4 PhyloBayes after a burn-in of 500 cycles. Under these conditions, the largest discrepancy 5 6 7 observed across all bipartitions in the consensus topologies (maxdiff) was lower than 0.15, 8 9 indicating that convergence between the chains was achieved. 10 11 12 Phylogenomic analyses of nucleotide data sets 13 14 15 Two DNA datasets were constructed: PCG123degenRNA (all degenerated codon positions of 69 16 17 18 protein-coding genes plus three rRNA genes and 26 tRNA genes) and PCG12RNA (first and 19 20 second codon positions of the 69 protein-coding genes plus three rRNA genes and 26 tRNA 21 22 genes). The PCG123degenRNA data set was prepared as follows. The multiple sequence 23 24 25 alignment of each protein was converted into a codon alignment, the poorly aligned and 26 27 divergent regions in each codon alignment were excluded using Gblocks 0.91b [63] with the - 28 29 30 t=c, -b3=5, -b4=5 and -b5=half options, and the individual gene alignments were concatenated 31 32 using Phyutility 2.2.6 [59]. The Degen1.pl 1.2 script of Regier et al. [36] was applied to the 33 34 35 resulting concatenated alignment (PCG123) and finally, the degenerated matrix was combined 36 37 with the concatenated alignment of the following RNA genes: rrf, rrl, rrs, trnA(ugc), C(gca), 38 39 D(guc), E(uuc), F(gaa), G(gcc), G(ucc), H(gug), I(cau), I(gau), K(uuu), L(uaa), L(uag), Me(cau), 40 41 42 Mf(cau), N(guu), P(ugg), Q(uug), R(acg), R(ucu), S(gcu), S(uga), T(ugu), V(uac), W(cca), Y(gua). 43 44 The latter genes were aligned using MUSCLE 3.7 [57], the ambiguously aligned regions in each 45 46 47 alignment were removed using TRIMAL 1.3 [58] with the options block=6, gt=0.9, st=0.4 and 48 49 sw=3, and the individual alignments were concatenated using Phyutility 2.2.6 [59]. To obtain the 50 51 52 PCG12RNA data set, the third codon positions of the PCG123 alignment were excluded using 53 54 Mesquite 3.03 [64] and the resulting alignment was merged with the filtered RNA gene 55 56 57 alignment. 58 59 60 61 62 63 64 65 1 21 2 3 4 ML analyses of the PCG12RNA and PCG123degenRNA nucleotide data sets were carried 5 6 7 out using RAxML 8.1.14 [60] and the GTR+Γ4 model of sequence evolution. In these analyses, 8 9 the data sets were partitioned into 71 groups, with the model applied to each partition. The 10 11 12 partitions included the 69 individual protein-coding genes, the concatenated rRNA genes and the 13 14 concatenated tRNA genes. Confidence of branch points was estimated by fast-bootstrap analysis 15 16 (f=a) with 500 replicates. 17 18 19 Nucleotide substitution saturation for each of the three codon positions of concatenated 20 21 chlorophycean protein coding genes was assessed using the test of Xia et al. [34] implemented in 22 23 24 DAMBE [35]. This program was also employed to calculate AT-skew and GC-skew of 25 26 chlorophycean sequences within the PCG12RNA and PCG123RNA data sets as a measure of 27 28 29 nucleotide compositional differences. 30 31 32 Availability of supporting data 33 34 35 The sequence data generated in this study are available in GenBank under the accession numbers 36 37 KT624630 - KT625422 (see Table 1). The data sets supporting the results of this article are 38 39 40 available in the Dryad Digital Repository (doi: 10.5061/dryad.nh149) [65]. 41 42 43 44 Additional file 45 46 Additional file 1: Phylogeny of chlorophycean taxa inferred using nucleotide data sets 47 48 49 assembled from 69 protein-coding and 29 RNA-coding genes. The tree presented in this figure is 50 51 the best-scoring ML tree inferred using the PCG12RNA data set under the GTR+Γ4 model. BS 52 53 54 support values are reported for the ML analyses of the PCG12RNA and PCG123degenRNA data 55 56 sets. 57 58 59 60 Abbreviations 61 62 63 64 65 1 22 2 3 4 BS: bootstrap support; CCW: counterclockwise; cpDNA: Chloroplast DNA; CS: 5 6 7 Chlamydomonadales and Sphaeropleales; CW: clockwise; DO: directly opposed; OCC: 8 9 Oedogoniales, Chaetophorales and Chaetopeltidales; PCR: polymerase chain reaction; PP: 10 11 12 posterior probability. 13 14 15 Competing interests 16 17 18 The authors declare that they have no competing interests. 19 20 21 22 Authors’ contributions 23 24 CL and MT conceived the study and wrote the manuscript. ATV, AL and CO performed the 25 26 27 experimental work. ATV, AL, CO and CL carried out the genome assemblies and annotations. 28 29 CL performed the phylogenetic analyses and generated the figures. MT and CL analyzed the 30 31 32 phylogenetic data. All authors read and approved the final manuscript. 33 34 35 Acknowledgments 36 37 38 This work was supported by a Discovery grant from the Natural Sciences and Engineering 39 40 41 Research Council of Canada (to C.L. and M.T.). We thank Jeff Gauthier for his participation in 42 43 the assemblies of the Lobochlamys culleus, Lobochlamys segnis and Chlamydomonas applanata 44 45 46 chloroplast genomes. We are also grateful to Dr. Hisayoshi Nozaki and two anonymous 47 48 reviewers for their helpful suggestions. 49 50 51 52 References 53 54 55 1. Leliaert F, Smith DR, Moreau H, Herron MD, Verbruggen H, Delwiche CF, De Clerck 56 57 O: Phylogeny and molecular evolution of the green algae. CRC Crit Rev Plant Sci 58 59 2012, 31:1-46. 60 61 62 63 64 65 1 23 2 3 4 2. Buchheim MA, Michalopulos EA, Buchheim JA: Phylogeny of the Chlorophyceae with 5 6 7 special reference to the Sphaeropleales: a study of 18S and 26S rDNA data. J Phycol 8 9 2001, 37(5):819-835. 10 11 12 3. Wolf M, Buchheim M, Hegewald E, Krienitz L, Hepperle D: Phylogenetic position of 13 14 the Sphaeropleaceae (Chlorophyta). Plant Syst Evol 2002, 230(3-4):161-171. 15 16 4. Brouard JS, Otis C, Lemieux C, Turmel M: The exceptionally large chloroplast 17 18 19 genome of the green alga Floydiella terrestris illuminates the evolutionary history of 20 21 the Chlorophyceae. Genome Biol Evol 2010, 2:240-256. 22 23 24 5. Turmel M, Brouard J-S, Gagnon C, Otis C, Lemieux C: Deep division in the 25 26 Chlorophyceae (chlorophyta) revealed by chloroplast phylogenomic analyses. J 27 28 29 Phycol 2008, 44(3):739-750. 30 31 6. Lewis LA, McCourt RM: Green algae and the origin of land plants. Am J Bot 2004, 32 33 34 91(10):1535-1556. 35 36 7. Barsanti L, Frassanito AM, Passarelli V, Evangelista V, Etebari M, Paccagnini E, Lupetti 37 38 P, Lenzi P, Verni F, Gualtieri P: Tetraflagellochloris mauritanica gen. et sp. nov. 39 40 41 (Chlorophyceae), a new flagellated alga from the mauritanian desert: morphology, 42 43 ultrastructure, and phylogenetic framing. J Phycol 2013, 49(1):178-193. 44 45 46 8. Lembi CA: The fine structure of the flagellar apparatus of Carteria. J Phycol 1975, 47 48 11(1):1-9. 49 50 51 9. Deason TR, Silva PC, Watanabe S, Floyd GL: Taxonomic status of the species of the 52 53 green algal genus Neochloris. Plant Syst Evol 1991, 177(3-4):213-219. 54 55 56 57 58 59 60 61 62 63 64 65 1 24 2 3 4 10. O'Kelly CJ, Watanabe S, Floyd GL: Ultrastructure and phylogenetic relationships of 5 6 7 Chaetopeltidales ord nov (Chlorophyta, Chlorophyceae). J Phycol 1994, 30(1):118- 8 9 128. 10 11 12 11. Bakker ME, Lokhorst GM: Ultrastructure of Draparnaldia glomerata 13 14 (Chaetophorales, Chlorophyceae). I. The flagellar apparatus of the zoospore. Nord J 15 16 Bot 1984, 4(2):261-273. 17 18 19 12. Watanabe S, Floyd GL: Ultrastructure of the quadriflagellate zoospores of the 20 21 filamentous green algae Chaetophora incrassata and Pseudoschizomeris caudata 22 23 24 (Chaetophorales, Chlorophyceae) with emphasis on the flagellar apparatus. Bot Mag 25 26 Tokyo 1989, 102:533-546. 27 28 29 13. Pickett-Heaps J: Green Algae: Structure, Reproduction and Evolution in Selected 30 31 Genera. Sunderland, Massachusetts: Sinauer Associates, Inc.; 1975. 32 33 34 14. Guiry MD, Guiry GM: AlgaeBase. World-wide electronic publication. National 35 36 University of Ireland, Galway. In. http://www.algaebase.org; 2015. 37 38 15. Gerloff-Elias A, Spijkerman E, Pröschold T: Effect of external pH on the growth, 39 40 41 photosynthesis and photosynthetic electron transport of Chlamydomonas acidophila 42 43 Negoro, isolated from an extremely acidic lake (pH 2.6). Plant, Cell & Environment 44 45 46 2005, 28(10):1218-1229. 47 48 16. Nakada T, Misawa K, Nozaki H: Molecular systematics of Volvocales 49 50 51 (Chlorophyceae, Chlorophyta) based on exhaustive 18S rRNA phylogenetic 52 53 analyses. Mol Phylogenet Evol 2008, 48(1):281-291. 54 55 56 57 58 59 60 61 62 63 64 65 1 25 2 3 4 17. Nakada T, Nozaki H: Taxonomic study of two new genera of fusiform green 5 6 7 flagellates, Tabris gen. nov. and Hamakko gen. nov. (Volvocales, Chlorophyceae). J 8 9 Phycol 2009, 45(2):482-492. 10 11 12 18. Nakada T, Tomita M: Chlamydomonas neoplanoconvexa nom. nov. and its unique 13 14 phylogenetic position within Volvocales (Chlorophyceae). Phycol Res 2011, 15 16 59(3):194-199. 17 18 19 19. Němcová Y, Eliáš M, Škaloud P, Hodač L, Neustupa J: Jenufa gen. nov.: a new genus 20 21 of coccoid green algae (Chlorophyceae, incertae sedis) previously recorded by 22 23 24 environmental sequencing. J Phycol 2011, 47(4):928-938. 25 26 20. Pröschold T, Marin B, Schlösser UG, Melkonian M: Molecular phylogeny and 27 28 29 taxonomic revision of Chlamydomonas (Chlorophyta). I. Emendation of 30 31 Chlamydomonas Ehrenberg and Chloromonas Gobi, and description of 32 33 34 Oogamochlamys gen. nov. and Lobochlamys gen. nov. Protist 2001, 152(4):265-300. 35 36 21. Watanabe S, Mitsui K, Nakayama T, Inouye I: Phylogenetic relationships and 37 38 taxonomy of sarcinoid green algae: Chlorosarcinopsis, Desmotetra, Sarcinochlamys 39 40 41 gen. nov., Neochlorosarcina, and Chlorosphaeropsis (Chlorophyceae, Chlorophyta). 42 43 J Phycol 2006, 42(3):679-695. 44 45 46 22. Watanabe S, Tsujimura S, Misono T, Nakamura S, Inoue H: Hemiflagellochloris 47 48 kazakhstanica gen. et sp. nov.: a new coccoid green alga with flagella of considerably 49 50 51 unequal lengths from a saline irrigation land in Kazakhstan (Chlorophyceae, 52 53 Chlorophyta). J Phycol 2006, 42(3):696-706. 54 55 56 57 58 59 60 61 62 63 64 65 1 26 2 3 4 23. Matsuzaki R, Nakada T, Hara Y, Nozaki H: Light and electron microscopy and 5 6 7 molecular phylogenetic analyses of Chloromonas pseudoplatyrhyncha (Volvocales, 8 9 Chlorophyceae). Phycol Res 2010, 58(3):202-209. 10 11 12 24. Nozaki H, Misumi O, Kuroiwa T: Phylogeny of the quadriflagellate Volvocales 13 14 (Chlorophyceae) based on chloroplast multigene sequences. Mol Phylogenet Evol 15 16 2003, 29(1):58-66. 17 18 19 25. Nozaki H, Nakada T, Watanabe S: Evolutionary origin of Gloeomonas (Volvocales, 20 21 Chlorophyceae), based on ultrastructure of chloroplasts and molecular phylogeny. J 22 23 24 Phycol 2010, 46(1):195-201. 25 26 26. Buchheim MA, Sutherland DM, Buchheim JA, Wolf M: The blood alga: phylogeny of 27 28 29 Haematococcus (Chlorophyceae) inferred from ribosomal RNA gene sequence data. 30 31 Eur J Phycol 2013, 48(3):318-329. 32 33 34 27. Wolf M, Hegewald E, Hepperle D, Krienitz L: Phylogenetic position of the 35 36 Golenkiniaceae (Chlorophyta) as inferred from 18S rDNA sequence data. Biologia, 37 38 Bratislava 2003, 58(4):433-436. 39 40 41 28. Hegewald E, Schnepf E: Zur Struktur und Taxonomie bestachelter Chlorococcales 42 43 (Micractiniaceae, Golenkiniaceae, Siderocystopsis). Nova Hedwigia 1984, 39:297-383. 44 45 46 29. Maul JE, Lilly JW, Cui L, dePamphilis CW, Miller W, Harris EH, Stern DB: The 47 48 Chlamydomonas reinhardtii plastid chromosome: islands of genes in a sea of repeats. 49 50 51 Plant Cell 2002, 14(11):2659-2679. 52 53 30. Smith DR, Lee RW: The mitochondrial and plastid genomes of Volvox carteri: 54 55 bloated molecules rich in repetitive DNA. BMC Genomics 2009, 10:132. 56 57 58 59 60 61 62 63 64 65 1 27 2 3 4 31. Smith DR, Lee RW, Cushman JC, Magnuson JK, Tran D, Polle JE: The Dunaliella 5 6 7 salina organelle genomes: large sequences, inflated with intronic and intergenic 8 9 DNA. BMC Plant Biol 2010, 10:83. 10 11 12 32. de Cambiaire J-C, Otis C, Lemieux C, Turmel M: The complete chloroplast genome 13 14 sequence of the chlorophycean green alga Scenedesmus obliquus reveals a compact 15 16 gene organization and a biased distribution of genes on the two DNA strands. BMC 17 18 19 Evol Biol 2006, 6:37. 20 21 33. Lemieux C, Otis C, Turmel M: Chloroplast phylogenomic analysis resolves deep-level 22 23 24 relationships within the green algal class Trebouxiophyceae. BMC Evol Biol 2014, 25 26 14:211. 27 28 29 34. Xia X, Xie Z, Salemi M, Chen L, Wang Y: An index of substitution saturation and its 30 31 application. Mol Phylogenet Evol 2003, 26(1):1-7. 32 33 34 35. Xia X: DAMBE5: a comprehensive software package for data analysis in molecular 35 36 biology and evolution. Mol Biol Evol 2013, 30(7):1720-1728. 37 38 36. Regier JC, Shultz JW, Zwick A, Hussey A, Ball B, Wetzer R, Martin JW, Cunningham 39 40 41 CW: Arthropod relationships revealed by phylogenomic analysis of nuclear protein- 42 43 coding sequences. Nature 2010, 463(7284):1079-1083. 44 45 46 37. Shoup S, Lewis LA: Polyphyletic origin of parallel basal bodies in swimming cells of 47 48 chlorophycean green algae (Chlorophyta). J Phycol 2003, 39(4):789-796. 49 50 51 38. Ettl H, Komárek J: Was versteht man unter dem Begriff "coccale Grünalgen"? 52 53 Algological Studies/Archiv für Hydrobiologie, Supplement Volumes 1982, 29:345-374. 54 55 56 57 58 59 60 61 62 63 64 65 1 28 2 3 4 39. Komárek J, Fott B: Chlorophyceae (Grünalgen). Ordnung Chlorococcales. In: Das 5 6 7 Phytoplankton des Süßwassers. Edited by Huber-Pestalozzi G, vol. 7. Stuttgart: E. 8 9 Schweizerbart'sche Verlagsbuchhandlung; 1983: 1-1044. 10 11 12 40. Hegewald E, Hepperle D, Wolf M, Krienitz L: Phylogenetic placement of 13 14 Chlorotetraedron incus, C. polymorphum and Polyedriopsis spinulosa 15 16 (Neochloridaceae, Chlorophyta). Phycologia 2001, 40(5):399-402. 17 18 19 41. Caisová L, Marin B, Sausen N, Pröschold T, Melkonian M: Polyphyly of Chaetophora 20 21 and Stigeoclonium within the Chaetophorales (Chlorophyceae), revealed by 22 23 24 sequence comparisons of nuclear-encoded SSU rRNA genes. J Phycol 2011, 25 26 47(1):164-177. 27 28 29 42. Fučíková K, Lewis PO, Lewis LA: Putting incertae sedis taxa in their place: a 30 31 proposal for ten new families and three new genera in Sphaeropleales 32 33 34 (Chlorophyceae, Chlorophyta). J Phycol 2014, 50(1):14-25. 35 36 43. Hoffman LR: Fine structure of Cylindrocapsa zoospores. Protoplasma 1976, 87(1- 37 38 3):191-219. 39 40 41 44. O'Kelly CJ, Floyd GL: Flagellar apparatus absolute orientations and the phylogeny 42 43 of the green algae. Biosystems 1984, 16(3–4):227-251. 44 45 46 45. Andersen RA: Algal culturing techniques. Boston, Mass.: Elsevier/Academic Press; 47 48 2005. 49 50 51 46. Turmel M, Otis C, Lemieux C: Tracing the evolution of streptophyte algae and their 52 53 mitochondrial genome. Genome Biol Evol 2013, 5(10):1817-1835. 54 55 47. Plateforme d’Analyses Génomiques de l’Université Laval 56 57 58 [http://pag.ibis.ulaval.ca/seq/en/index.php] 59 60 61 62 63 64 65 1 29 2 3 4 48. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, 5 6 7 Braverman MS, Chen YJ, Chen ZT et al: Genome sequencing in microfabricated high- 8 9 density picolitre reactors. Nature 2005, 437(7057):376-380. 10 11 12 49. Gordon D, Abajian C, Green P: Consed: a graphical tool for sequence finishing. 13 14 Genome Res 1998, 8:195-202. 15 16 50. Innovation Centre of McGill University and Genome Quebec 17 18 19 [http://www.gqinnovationcenter.com/index.aspx] 20 21 51. Boisvert S, Laviolette F, Corbeil J: Ray: simultaneous assembly of reads from a mix of 22 23 24 high-throughput sequencing technologies. J Comput Biol 2010, 17(11):1519-1533. 25 26 52. Pombert JF, Otis C, Lemieux C, Turmel M: The chloroplast genome sequence of the 27 28 29 green alga Pseudendoclonium akinetum (Ulvophyceae) reveals unusual structural 30 31 features and new insights into the branching order of chlorophyte lineages. Mol Biol 32 33 34 Evol 2005, 22(9):1903-1918. 35 36 53. Lagesen K, Hallin P, Rodland EA, Staerfeldt HH, Rognes T, Ussery DW: RNAmmer: 37 38 consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 2007, 39 40 41 35(9):3100-3108. 42 43 54. Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer 44 45 46 RNA genes in genomic sequence. Nucleic Acids Res 1997, 25(5):955-964. 47 48 55. Michel F, Umesono K, Ozeki H: Comparative and functional anatomy of group II 49 50 51 catalytic introns - a review. Gene 1989, 82(1):5-30. 52 53 56. Michel F, Westhof E: Modelling of the three-dimensional architecture of group I 54 55 catalytic introns based on comparative sequence analysis. J Mol Biol 1990, 216:585- 56 57 58 610. 59 60 61 62 63 64 65 1 30 2 3 4 57. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high 5 6 7 throughput. Nucleic Acids Res 2004, 32(5):1792-1797. 8 9 58. Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T: trimAl: a tool for automated 10 11 12 alignment trimming in large-scale phylogenetic analyses. Bioinformatics 2009, 13 14 25(15):1972-1973. 15 16 59. Smith SA, Dunn CW: Phyutility: a phyloinformatics tool for trees, alignments and 17 18 19 molecular data. Bioinformatics 2008, 24(5):715-716. 20 21 60. Stamatakis A: RAxML version 8: a tool for phylogenetic analysis and post-analysis 22 23 24 of large phylogenies. Bioinformatics 2014, 30(9):1312-1313. 25 26 61. Lartillot N, Lepage T, Blanquart S: PhyloBayes 3: a Bayesian software package for 27 28 29 phylogenetic reconstruction and molecular dating. Bioinformatics 2009, 25(17):2286- 30 31 2288. 32 33 34 62. Lartillot N, Philippe H: A Bayesian mixture model for across-site heterogeneities in 35 36 the amino-acid replacement process. Mol Biol Evol 2004, 21(6):1095-1109. 37 38 63. Castresana J: Selection of conserved blocks from multiple alignments for their use in 39 40 41 phylogenetic analysis. Mol Biol Evol 2000, 17(4):540-552. 42 43 64. Maddison WP, Maddison DR: Mesquite: a modular system for evolutionary analysis. 44 45 46 Version 3.03. In. http://mesquiteproject.org; 2015. 47 48 65. Lemieux C, Vincent AT, Labarre A, Otis C, Turmel M: Data from: Chloroplast 49 50 51 phylogenomic analysis of chlorophyte green algae identifies a novel sister lineage of 52 53 the Sphaeropleales (Chlorophyceae). In. Dyad Digital Repository 54 55 (http://dx.doi.org/10.5061/dryad.nh149). 56 57 58 59 60 61 62 63 64 65 1 31 2 3 4 66. Culture Collection of Algae at the University of Goettingen [http://www.uni- 5 6 7 goettingen.de/en/45175.html] 8 9 67. The Culture Collection of Algae at The University of Texas at Austin 10 11 12 [http://web.biosci.utexas.edu/utex/default.aspx] 13 14 68. Microbial Culture Collection at the National Institute of Environmental Studies 15 16 [http://mcc.nies.go.jp] 17 18 19 69. Culture Collection of Algae of Charles University in Prague 20 21 [http://botany.natur.cuni.cz/algo/caup-list.html] 22 23 24 70. Škaloud P, Nedbalová L, Elster J, Komárek J: A curious occurrence of Hazenia broadyi 25 26 spec. nova in Antarctica and the review of the genus Hazenia (Ulotrichales, 27 28 29 Chlorophyceae). Polar Biol 2013, 36(9):1281-1291. 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 1 32 2 3 4 Table 1 Chlorophycean taxa whose chloroplast genomes were sequenced in this study 5 6 Taxa Source a Clade Accession no. b Sequencing 7 method 8 9 Chlorophyceae incertae sedis 10 11 Jenufa minuta CAUP H8102 [GenBank:KT625414] * Roche 454 12 Jenufa perforata CAUP H8101 [GenBank:KT625413] * Illumina 13 Sphaeropleales 14 Bracteacoccus giganteus UTEX 1251 Bracteacoccaceae [GenBank:KT625421] * Roche 454 15 Mychonastes jurisii SAG 37.98 Mychonastaceae [GenBank:KT625411] * Roche 454 16 17 Chlamydomonadales c 18 Golenkinia longispicula SAG 73.80 Golenkinia [GenBank:KT625092 - KT625150] Roche 454 19 Treubaria triappendiculata SAG 38.83 Treubarinia [GenBank:KT625410] * Roche 454 20 Hafniomonas laevis NIES 257 Hafniomonas [GenBank:KT625415] * Roche 454 21 Carteria cerasiformis NIES 425 Crucicarteria [GenBank:KT625420] * Roche 454 22 Carteria crucifera UTEX 432 Crucicarteria [GenBank:KT624870 - KT624932] Roche 454 23 Carteria sp. SAG 8-5 Radicarteria [GenBank:KT625419] * Roche 454 24 Chloromonas typhlos d UTEX LB 1969 Chloromonadinia [GenBank:KT624630 - KT624716] Roche 454 25 Chloromonas radiata UTEX 966 Chloromonadinia [GenBank:KT625008 - KT625084] Roche 454 26 Oogamochlamys gigantea SAG 44.91 Oogamochlamydinia [GenBank:KT625412] * Illumina Lobochlamys segnis SAG 9.83 Oogamochlamydinia [GenBank:KT624806 - KT624869] Roche 454 27 Lobochlamys culleus SAG 19.72 Oogamochlamydinia [GenBank:KT625151 - KT625204] Roche 454 28 Chlamydomonas asymmetrica SAG 70.72 Reinhardtinia [GenBank:KT624933 - KT625007] Roche 454 29 Phacotus lenticularis SAG 61-1 Phacotinia [GenBank:KT625422] * Illumina 30 Microglena monadina e SAG 31.72 Monadinia [GenBank:KT624717 - KT624805] Illumina 31 Characiochloris acuminata SAG 31.95 Characiosiphonia [GenBank:KT625418] * Illumina 32 Chlamydomonas applanata SAG 11-9 Polytominia [GenBank:KT625417] * Roche 454 33 Stephanosphaera pluvialis SAG 78-1a Stephanosphaerinia [GenBank:KT625299 - KT625409] Illumina 34 Chloromonas perforata SAG 11-43 Stephanosphaerinia [GenBank:KT625416] * Illumina 35 Haematococcus lacustris SAG 34-1b Chlorogonia [GenBank:KT625205 - KT625298] Illumina 36 Chlorogonium capillatum UTEX 11 Chlorogonia [GenBank:KT625085 - KT625091] Illumina 37 38 a 39 The taxa originate from the culture collections of algae at the University of Goettingen (SAG, [66]), the University 40 of Texas at Austin (UTEX, [67]), the National Institute of Environmental Studies in Tsukuba (NIES, [68]), and 41 Charles University in Prague (CAUP, [69]). b 42 The GenBank accession number of the chloroplast genome is given for each taxon. The asterisks denote the 43 genomes that were completely sequenced. 44 c The clade designation of the chlamydomonadelean taxa follows the PhyloCode classification scheme of Nakada et 45 al. [16]. 46 d Previously designated as Chlamydomonas nivalis. 47 e Previously designated as Chlamydomonas monadina. 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 1 33 2 3 4 Figure 1 Phylogeny of 73 core chlorophytes inferred using the PCG-AA data set assembled 5 6 7 from 69 cpDNA-encoded proteins. The tree presented here is the majority-rule posterior 8 9 consensus tree inferred under the CATGTR+Γ4 model. Support values are reported on the nodes: 10 11 12 from left to right, are shown the posterior probability (PP) values for the PhyloBayes 13 14 CATGTR+Γ4 analyses and the BS values for the RAxML GTR+Γ4 analyses. Black dots indicate 15 16 that the corresponding branches received PP values of 1.00 and BS values ≥ 95% in the analyses; 17 18 19 a dash denotes a BS value < 50%. The scale bar denotes the estimated number of amino acid 20 21 substitutions per site. Note that the genus Pseudendoclonium (Ulvophyceae) is polyphyletic and 22 23 24 that P. akinetum (Ulotrichales), a close relative of Trichosarcina species, was wrongly classified 25 26 in this genus [70]. 27 28 29 30 Figure 2 GC/AT-skew plots of the PCG12RNA (A) and PCG123RNA (B) nucleotide data 31 32 sets. The nucleotide skew values were calculated using DAMBE [35]; each point corresponds to 33 34 35 one of the 33 chlorophycean taxa included in the data sets. 36 37 38 Figure 3 Phylogeny of chlorophycean taxa inferred using nucleotide data sets assembled 39 40 41 from 69 protein-coding and 29 RNA-coding genes. The tree presented here is the best-scoring 42 43 ML tree inferred using the PCG12RNA data set under the GTR+Γ4 model. Note that the portion 44 45 of the tree containing the pedinophycean and trebouxiophycean outgroup taxa is not shown (see 46 47 48 Additional file 1 for the complete topology). Support values are reported on the nodes: from left 49 50 to right, are shown the BS values for the analyses of the PCG12RNA and PCG123degenRNA 51 52 53 data sets. Black dots indicate that the corresponding branches received BS values of 100% in the 54 55 two analyses; a dash denotes a BS support lower than 50%. Shaded areas identify the clades that 56 57 58 are well supported in 18S rDNA phylogenies. The scale bar denotes the estimated number of 59 60 nucleotide substitutions per site. 61 62 63 64 65 1 34 2 3 4 Figure 4 Evolution of the flagellar apparatus in the Chlorophyceae. The ancestral states of 5 6 7 the absolute orientation of the flagellar apparatus were reconstructed using Mesquite 3.03 [64]. 8 9 The most parsimonious scenario of character states is shown, with colored lines denoting the 10 11 12 orientation patterns observed for biflagellate and quadriflagellate motile cells within the 13 14 chlorophycean lineages. In the case of the Oedogoniales stephanokonts, the orientation pattern is 15 16 ambiguous. 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 Figure 1 Click here to download Figure Figure_1.eps Figure 2 Click here to download Figure Figure_2.eps Figure 3 Click here to download Figure Figure_3.eps Figure 4 Click here to download Figure Figure_4.eps