Table S1. Taxa and sequences used in the phylogenetic analysis (Additional file 5: Number of taxa and sequences used in the phylogenetic analysis)
Protein databases       Num taxa    Num seqs
Alveolate
  Ciliophora                   7      66,495
  Perkinsea                    1      23,654
  Apicomplexa                 26      76,675
Stramenopiles                 32     171,687
Rhizaria                       2      22,919
Plantae
  Glaucocystophyceae           1         149
  Rhodophyta                  11       1,168
  Viridiplantae              167     590,694
Haptophyceae                   1      39,265
Cryptophyta                    4      26,259
Excavate
  Euglenozoa                   8      52,587
  Fornicata                    1       6,502
  Heterolobosea                1      15,801
  Jakobida                     1          67
  Malawimonadidae              1          49
  Parabasalia                  1      59,679
Amoebozoa                     11      42,905
Opisthokont
  Choanoflagellida             1      18,399
  Fungi                      113     459,222
  Metazoa                  1,856     759,023
Bacteria
  Chlorobi                    11      27,326
  Chloroflexi                 10      36,755
  Cyanobacteria               28     193,842
  Deferribacteres              1       3,115
  Deinococci                   5      13,118
  Dictyoglomi                  2       3,652
  Acidobacteria                3      15,845
  Elusimicrobia                2       2,294
  Fibrobacteres                1       3,081
  Firmicutes                 243     932,573
  Actinobacteria             161     604,647
  Fusobacteria                12      45,397
  Gemmatimonadetes             1       3,929
  Nitrospirae                  2       2,044
  Planctomycetes               6      36,770
  Proteobacteria             526   2,341,088
  Spirochaetes                18      39,221
  Synergistetes                5      11,122
  Tenericutes                 29      20,021
  Thermotogae                 12      22,392
  Thermus                      3       5,258
  Verrucomicrobia              8      38,409
  Aquificae                    9      18,801
  Bacteria-other               7      19,118
  Bacteroidetes               77     316,572
  Chlamydiae                   9      11,232
Archaea                       84     174,912
Virus                      2,372      66,922
Other                         39       1,031

Nucleotide databases    Num taxa    Num seqs
Alveolate
  Perkinsea                    2      24,884
  Chromerida                   1       2,856
  Apicomplexa                  1       1,844
  Ciliophora*                 22     287,638
  Dinophyceae*                22     614,141
Stramenopiles                 35     624,095
Rhizaria                       7       9,472
Plantae
  Glaucocystophyceae           3      27,409
  Rhodophyta                  10      48,138
  Viridiplantae                5      28,565
Haptophyceae                   5     133,743
Cryptophyta*                   3      60,187
Excavate
  Heterolobosea                3      37,017
  Jakobida                     5      34,041
  Malawimonadidae              2      12,998
  Parabasalia                  3      30,751
  Euglenozoa                  15      91,788
  Fornicata                    3      36,304
Amoebozoa                     11     184,511
Opisthokont
  Choanoflagellida             3      73,213

* New next-generation assemblies added to the NCBI TSA database:
Dinophysis acuminata (dinoflagellate) GAIJ01000000;
Alexandrium tamarense Group IV CCMP1598 (dinoflagellate) GAII01000000;
Alexandrium tamarense Group I SPE10-03 (dinoflagellate) GAIK01000000;
Alexandrium tamarense Group I (dinoflagellate) GAIV01000000;
Alexandrium tamarense Group III (dinoflagellate) GAIU01000000;
Gymnodinium catenatum GC744 (dinoflagellate) GAIL01000000;
Pyrodinium bahamense CCFW293-B5 (dinoflagellate) GAIO01000000;
Geminigera cryophila CCMP2564 (cryptophyte) GAIM01000000;
Myrionecta rubra CCMP2563 (ciliate) GAIN01000000

Table S3. Summary of KEGG-annotated sequences in the A. tamarense Group IV transcriptome and 16 other eukaryotic genomes.
Species                     Total sequences  Genes annotated     KOs
Dinoflagellate
  A. tamarense Group IV              85,163            5,347   2,959
Apicomplexans
  B. bovis                            3,706            1,085     998
  C. parvum                           3,805            1,002     950
  P. falciparum                       5,337            1,460   1,090
  T. annulata                         3,795            1,037     941
  T. gondii                           7,987            1,414   1,244
Ciliates
  P. tetraurelia                     39,580            3,252   1,471
  T. thermophila                     24,770            2,439   1,532
Stramenopiles
  A. anophagefferens                 11,501            3,075   2,195
  E. siliculosus                     16,580            3,404   2,651
  P. tricornutum                     10,402            2,194   1,818
  T. pseudonana                      11,776            2,381   1,899
Haptophyte
  E. huxleyi                         39,125            5,060   2,705
Cryptophyte
  G. theta                           24,840            3,768   2,749
Green algae
  M. pusilla                         10,672            2,824   2,356
  O. lucimarinus                      7,651            2,392   2,014
  O. tauri                            7,725            2,193   1,902
Total sequences: total number of gene sequences per genome. Genes annotated: number of genes per genome with a KEGG annotation assigned by KAAS. KOs: total number of unique KOs assigned per genome.

SUPPLEMENTAL FIGURES
Supplementary Figure S1: Rarefaction curve assessment of de novo transcriptome assembly quality. Seven single-k-mer assemblies (k = 23–47) were compared with a multi-k-mer merged assembly (orange). The merged assembly maximized assembly length (number of base pairs) while minimizing the number of contigs and Oases loci.
Supplementary Figure S2: Distribution of dinoflagellate-prokaryote associations recovered in the phylogenomic analysis. The number of gene trees that support dinoflagellate sequences grouping with each prokaryotic phylum is given in parentheses.
a Bacteria: the sister clade to the dinoflagellates contained more than one bacterial phylum.
b Archaea: the sister subtree contained one or more archaeal phyla.
c Other Bacteria: the sister clade to the dinoflagellates consisted of bacteria not assigned to a bacterial phylum.

Supplementary Figure S3: Distribution of KEGG pathways in A. tamarense sequence sets.

Supplementary Figure S4: Color-coded KEGG pathway maps showing gene gain/loss transitions in A. tamarense, apicomplexans, and ciliates relative to the alveolate common ancestor, inferred using Dollo parsimony. Genes classified as retained from the alveolate ancestor in only one of the three major lineages (e.g., retained in A. tamarense) were, by definition, lost in the other two (e.g., lost in ciliates and apicomplexans).
A) Photosynthesis, ko00195
B) Arginine and proline metabolism, ko00330
C) Phenylalanine, tyrosine, and tryptophan biosynthesis, ko00400
D) Histidine metabolism, ko00340
E) Fatty acid biosynthesis, ko00061
F) Fatty acid metabolism, ko00071
G) Oxidative phosphorylation, ko00190

Supplementary Figure S5: Phylogenetic trees of A. tamarense proteins. Taxon names are formatted as Group-Genus_species_GeneID. Dinoflagellate taxa are in bold. Branch numbers indicate bootstrap support. When bootstrap values are ≤ 1, the tree was taken directly from the automated phylogenetic pipeline; pipeline trees were constructed using FastTree with 1,000 bootstrap replicates. When bootstrap values are ≥ 50, the taxa and alignments were manually curated and the tree was built using RAxML with 100 bootstrap replicates. Numbers in parentheses to the right of clade names indicate the number of taxa within the collapsed node. Trees S5B, S5F, S5H, S5I, S5L, and S5M are rooted on bacteria; all other trees are midpoint rooted.
A) Alternative NADH dehydrogenase
B) FAD-dependent glycerol-3-phosphate dehydrogenase
C) Malate:quinone oxidoreductase
D) Pyruvate dehydrogenase, AceE
E) Pyruvate:NADP+ oxidoreductase, PNO (N-terminal domain)
F) Pyruvate:NADP+ oxidoreductase, PNO (C-terminal domain)
G) Isocitrate dehydrogenase form 1
H) Isocitrate dehydrogenase form 2
I) Fumarase class I
J) Fumarase class II
K) Valine:pyruvate aminotransferase
L) Branched-chain amino acid aminotransferase
M) Molecular chaperone, grpE (alveolate-like)
N) Molecular chaperone, grpE (algal-like)
O) Molecular chaperone, grpE (bacterial-like)
P) Phosphatidylserine synthase
Q) Citrate lyase

Supplementary Figure S2 (figure labels): Proteobacteria (55); Bacteria (27)a; Firmicutes (15); Actinobacteria (8); Verrucomicrobia (6); Bacteroidetes (5); Chloroflexi (3); Archaea (1)b; Other Bacteria (1)c; Fusobacteria (1); Planctomycetes (1); Tenericutes (1); Thermotogae (1).

Supplementary Figure S3 (figure panels): All A. tamarense KOs; KOs unique to A. tamarense; A. tamarense KOs missing in other alveolates; KOs of putative bacterial origin in A. tamarense.
Legend (KEGG categories):
Metabolism: 1 - Carbohydrate Metabolism; 2 - Amino Acid Metabolism; 3 - Lipid Metabolism; 4 - Energy Metabolism; 5 - Nucleotide Metabolism; 6 - Metabolism of Cofactors and Vitamins; 7 - Glycan Biosynthesis and Metabolism; 8 - Xenobiotics Biodegradation and Metabolism; 9 - Metabolism of Other Amino Acids; 10 - Metabolism of Terpenoids and Polyketides; 11 - Biosynthesis of Other Secondary Metabolites
Genetic information processing: 12 - Translation; 13 - Folding, Sorting and Degradation; 14 - Replication and Repair; 15 - Transcription
Cellular processes: 16 - Cell Growth and Death; 17 - Transport and Catabolism; 18 - Cell Communication; 19 - Cell Motility
Environmental information processing: 20 - Signal Transduction; 21 - Membrane Transport; 22 - Signaling Molecules and Interaction
Not mapped to pathway

Supplementary Figure S4A – Photosynthesis (ko00195)
Supplementary Figure S4B – Arginine and proline metabolism (ko00330)
Supplementary Figure S4C – Phenylalanine, tyrosine, and tryptophan biosynthesis (ko00400)
Supplementary Figure S4D – Histidine metabolism (ko00340)
Supplementary Figure S4E – Fatty acid biosynthesis (ko00061)
Supplementary Figure S4F – Fatty acid metabolism (ko00071)
Supplementary Figure S4G – Oxidative phosphorylation (ko00190)
Color legend categories used across panels S4A–S4G: Gained in A. tamarense; Retained in A. tamarense; In all alveolates; Retained in ciliates; Lost in ciliates; Lost in apicomplexans; Gained in DALCA (S4G).

Supplementary Figure S5A – Alternative NADH dehydrogenase (tree labels: Viridiplantae (19); Euglenozoa-Leishmania_major_157877617)