bioRxiv preprint doi: https://doi.org/10.1101/257790; this version posted January 31, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Mitochondrial targeting of glycolysis in a major lineage of eukaryotes.


Carolina Río Bártulos1,9, Matthew B. Rogers2$, Tom A. Williams3, Eleni Gentekaki4&, Henner Brinkmann5%, Rüdiger Cerff1, Marie-Françoise Liaud1, Adrian B. Hehl7, Nigel R. Yarlett8, Ansgar Gruber9, Peter G. Kroth9*, Mark van der Giezen2*


1Institute of Genetics, University of Braunschweig, Spielmannstr. 7, D-38106 Braunschweig, Germany. 2Biosciences, University of Exeter, Stocker Road, Exeter EX4 4QD, UK. 3School of Biological Sciences, University of Bristol, BS8 1TH, United Kingdom. 4Dalhousie University, Department of Biochemistry and Molecular Biology, Halifax, Canada, B3H 1X5. 5Département de Biochimie, Université de Montréal C.P. 6128, Montréal, Quebec, Canada. 7Institute of Parasitology, University of Zürich, Switzerland. 8Department of Chemistry and Physical Sciences, Pace University, New York, NY 10038, USA. 9Fachbereich Biologie, Universität Konstanz, 78457 Konstanz, Germany. *Corresponding authors.

$Current address: Rangos Research Center, University of Pittsburgh, Children's Hospital, Pittsburgh, PA 15224, U.S.A.

&Current address: School of Science and Human Gut Microbiome for Health Research Unit, Mae Fah Luang University, Chiang Rai 57100, Thailand.

%Current address: Henner?


Glycolysis is a major cytosolic catabolic pathway that provides ATP for many organisms1. Mitochondria play an even more important role in the provision of additional cellular ATP for eukaryotes2. Here, we show that in many stramenopiles, the C3 part of glycolysis is localised in mitochondria. We discovered genuine mitochondrial targeting signals on the last six enzymes of glycolysis. These targeting signals are recognised and sufficient to import GFP into mitochondria of a heterologous host. Analysis of eukaryotic genomes identified these targeting signals on many glycolytic C3 enzymes in a large group of eukaryotes found in the SAR supergroup3, in particular the stramenopiles. Stramenopiles, or heterokonts, are a large group of ecologically important eukaryotes that includes multi- and unicellular algae such as kelp and diatoms, but also economically important oomycete pathogens such as Phytophthora infestans. Confocal immunomicroscopy confirmed the mitochondrial location of glycolytic enzymes for the human parasite Blastocystis. Enzyme assays on cellular fractions confirmed the presence of the C3 part of glycolysis in Blastocystis mitochondria. These activities are sensitive to treatment with proteases and Triton X-100 but not proteases alone. Our work clearly shows that core cellular metabolism is more plastic than previously imagined and suggests new strategies to combat pathogens such as the causative agent of late potato blight, P. infestans.

Mitochondria provide the bulk of cellular ATP for eukaryotes by reoxidising reduced NAD via the electron transport chain and oxidative phosphorylation2. In addition, mitochondria are essential for the production of iron-sulfur clusters4, and play roles in heme synthesis, fatty acid and amino acid metabolism5. Cytosolic pyruvate is decarboxylated by mitochondrial pyruvate dehydrogenase into acetyl-CoA, which enters the citric acid cycle, subsequently producing one GTP (or ATP) and precursors for several anabolic pathways. More importantly, the reduction of NAD+ to NADH and production of succinate power the electron transport pathway and oxidative phosphorylation, being responsible for the majority of cellular ATP synthesis. The pyruvate is produced by glycolysis, a widespread cytosolic pathway that converts the six-carbon sugar glucose via a series of ten reactions into the three-carbon compound pyruvate. Glycolysis is present in the cytosol of nearly all eukaryotes but is also found in specialised microbodies known as glycosomes, originally described in trypanosomatids6. More recently, two glycolytic enzymes were also found to be targeted to peroxisomes in fungi due to post-transcriptional processes7.

When analysing the genome of the intestinal parasite Blastocystis8, we discovered putative mitochondrial targeting signals on phosphoglycerate kinase (PGK) and on a fusion protein of triose phosphate isomerase (TPI) and glyceraldehyde phosphate dehydrogenase (GAPDH). The amino-terminal sequences conform to typical mitochondrial targeting signals9 and are easily predicted by programmes such as MitoProt10. Analysis of the Blastocystis TPI-GAPDH and PGK sequences predicts a mitochondrial localisation with high probabilities (P = 0.99 and 0.97, respectively). The predicted cleavage sites coincide with the start of the cytosolic enzymes from other organisms (Supplementary Fig. S1A), suggesting that these amino-terminal sequences might target both proteins to the mitochondrial organelle in this parasite11. We confirmed the functionality and sufficiency of these putative targeting signals by targeting GFP fused to these signals to mitochondria of a heterologous stramenopile host (Supplementary Fig. S2). Homologous antibodies were raised against Blastocystis TPI-GAPDH and PGK to test whether these proteins show an organellar localisation. Compartmentalised distribution of both TPI-GAPDH and PGK was clearly demonstrated using confocal microscopy and three-dimensional rendering of optical sections (Fig. 1 A-D). Both proteins co-localise with the mitochondrial marker dye MitoTracker and in addition with DAPI, which labels the organellar genomic DNA11,12 (Fig. 1 E-I).
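As a rough illustration of what "conforming to a typical mitochondrial targeting signal" means, the sketch below scores the amino terminus of a protein for two textbook properties of presequences: an excess of basic residues and a lack of acidic ones. It is not the MitoProt algorithm and is not the analysis used here; the window size, the threshold and both example sequences are hypothetical values chosen only for illustration.

```python
from collections import Counter

BASIC = "RK"   # positively charged residues
ACIDIC = "DE"  # negatively charged residues


def presequence_features(protein: str, window: int = 25) -> dict:
    """Summarise simple properties of the first `window` residues."""
    n_term = protein[:window].upper()
    counts = Counter(n_term)
    basic = sum(counts[aa] for aa in BASIC)
    acidic = sum(counts[aa] for aa in ACIDIC)
    return {
        "length": len(n_term),
        "basic": basic,
        "acidic": acidic,
        "net_charge": basic - acidic,
        "hydroxylated": counts["S"] + counts["T"],
    }


def looks_like_presequence(protein: str, window: int = 25,
                           min_net_charge: int = 3) -> bool:
    """Crude heuristic: no acidic residues and a clear positive net charge.

    ASSUMPTION: the threshold is illustrative, not a validated cut-off.
    """
    f = presequence_features(protein, window)
    return f["acidic"] == 0 and f["net_charge"] >= min_net_charge


# Hypothetical sequences, NOT taken from Blastocystis or any real protein.
EXAMPLE_WITH_EXTENSION = "MLSRAFRSTLRAARPLSSAAKRTFSTSMKVAVLGAAGG"
EXAMPLE_WITHOUT_EXTENSION = "MSDKEEVLAELDGKTVDEAKAIVEKLG"

if __name__ == "__main__":
    for name, seq in [("with extension", EXAMPLE_WITH_EXTENSION),
                      ("without extension", EXAMPLE_WITHOUT_EXTENSION)]:
        print(name, presequence_features(seq), looks_like_presequence(seq))
```

Dedicated predictors combine many more features (for example amphiphilicity and cleavage-site motifs), so a heuristic like this can only flag candidates for proper prediction and experimental testing.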

The unexpected mitochondrial localisation of three glycolytic enzymes in Blastocystis prompted the analysis of all glycolytic enzymes in this intestinal parasite. Interestingly, targeting signals were only observed on the enzymes of the pay-off phase of glycolysis but not the investment phase (Fig. 2). Although three-dimensional reconstruction of our confocal microscopy data strongly indicated that these enzymes are indeed localised inside Blastocystis mitochondria (Fig. 1), we decided to confirm these findings using classical enzyme assays following cellular fractionation. These assays clearly showed that five C3 enzymes are found in the mitochondrial pellet while the five upstream enzymes are all confined to the soluble fraction (Table 1). To assess whether the putative mitochondrial enzymes were only laterally attached to the organelles, as in the case of hexokinase to VDAC in tumours1, we tested the latency of enzymatic activities in the presence or absence of Triton X-100. The increase of measurable activity of the C3 enzymes (not shown) suggests they are retained within a membranous compartment. The addition of proteolytic enzymes only affected the measured activity in the presence of the detergent Triton X-100 (Supplementary Table S1), clearly demonstrating that the five C3 glycolytic enzymes in Blastocystis are protected by a membrane and reside inside the mitochondria and not on the outside of the organelle, as observed in certain tumours1 or as in some proteomics studies13,14.
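The protease-protection argument can be made concrete by expressing each treated activity as a percentage of the control. The sketch below uses the mean values listed in Supplementary Table S1; it assumes that the three values per enzyme follow the order given in the table legend (Control, Protease, Protease + Triton), which the extracted table does not state explicitly, and it ignores the reported standard deviations.

```python
# Mean activities from Supplementary Table S1 (units as in the table).
# ASSUMPTION: values are ordered (control, protease, protease + Triton),
# following the order of the conditions in the table legend.
ACTIVITIES = {
    "glyceraldehyde phosphate dehydrogenase": (71, 61, 15),
    "phosphoglycerate kinase": (36, 33, 6.3),
    "phosphoglycerate mutase": (4.0, 3.1, 0.2),
    "enolase": (1.8, 1.4, 0.3),
    "phosphoenolpyruvate synthase": (28, 25, 7.6),
}


def percent_remaining(control: float, treated: float) -> float:
    """Activity after treatment as a percentage of the control."""
    return 100.0 * treated / control


def protection_summary(activities: dict) -> dict:
    """Map enzyme -> (% left after protease, % left after protease + Triton)."""
    summary = {}
    for enzyme, (control, protease, protease_triton) in activities.items():
        summary[enzyme] = (
            round(percent_remaining(control, protease), 1),
            round(percent_remaining(control, protease_triton), 1),
        )
    return summary


if __name__ == "__main__":
    for enzyme, (alone, with_triton) in protection_summary(ACTIVITIES).items():
        print(f"{enzyme}: {alone}% (protease), {with_triton}% (protease + Triton)")
```

Under that assumption, every enzyme retains more than three quarters of its activity with protease alone and less than a third once the membrane is solubilised, which is the pattern expected for enzymes enclosed by a membrane.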

As some of us previously reported the mitochondrial localisation of the TPI-GAPDH fusion protein in a related stramenopile15, we wondered whether mitochondrial targeting of glycolytic enzymes is more widespread in this group of organisms. When querying all available stramenopile genomes, we noticed the widespread presence of mitochondrial targeting signals on glycolytic enzymes within the whole group. Here, as with Blastocystis, only enzymes of the C3 part of glycolysis seem to contain mitochondrial targeting signals (Fig. 2 and Supplementary Fig. S1B). To test for functionality, all mitochondrial targeting signals from Phaeodactylum C3 glycolytic enzymes were fused to GFP and their cellular location was determined (Supplementary Fig. S2). As with Blastocystis, all constructs were targeted to the mitochondrion, suggesting these are genuine mitochondrial targeting signals in vivo. In addition, we tested mitochondrial targeting signals found on glycolytic enzymes of the oomycete pathogen Phytophthora infestans, the water mould Achlya bisexualis and the multicellular brown alga Saccharina latissima, commonly known as kelp (Supplementary Fig. S2). In all cases, these targeting signals targeted GFP into mitochondria. However, some organisms also contain glycolytic enzymes that lack a targeting signal, suggesting that these cells likely have a branched glycolysis (see Supplementary Fig. S4 for P. tricornutum).

Previously, some bioinformatics studies16,17 had hinted at the possible mitochondrial location of several glycolytic enzymes. Here, using molecular, biochemical and cell biological methods, we clearly demonstrate the mitochondrial location of glycolytic enzymes of the pay-off phase of glycolysis in a major group of eukaryotes comprising both microbial and multicellular forms. The mitochondrial proteome has a complex and contested evolutionary past18,19, and we wondered if glycolytic enzymes targeted to mitochondria might have different evolutionary origins than those that operate in the cytosol. Phylogenetic analysis of all glycolytic enzymes provided no support for this hypothesis because stramenopile glycolytic enzymes cluster with the cytosolic forms of other eukaryotes in phylogenetic trees (Supplementary Figure S3 A-F). This result suggests that the canonical, cytosolic enzymes of glycolysis were targeted to the mitochondrion during stramenopile evolution.

It is difficult to conclusively determine the selective rationale, if any, for the retargeting of glycolysis to stramenopile mitochondria. In Blastocystis, as in many parasitic eukaryotes20, two key glycolytic enzymes have been replaced by pyrophosphate-using versions. Normally, the reactions catalysed by phosphofructokinase and pyruvate kinase are virtually irreversible. However, the reactions performed by diphosphate-fructose-6-phosphate 1-phosphotransferase and phosphoenolpyruvate synthase (pyruvate, water dikinase) are reversible, due to the smaller free-energy change in the reaction. As Blastocystis is an anaerobe and does not contain normal mitochondrial oxidative phosphorylation11, any ATP not invested during glycolysis might be a selective advantage. However, in the absence of these irreversible control points there is a risk of uncontrolled glycolytic oscillations21. Separating the investment phase from the pay-off phase by the mitochondrial membrane might therefore prevent futile cycling. However, as not all stramenopiles use pyrophosphate enzymes, this cannot be the whole explanation.
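The link between a small free-energy change and reversibility can be written with the standard thermodynamic relation below. This is textbook material rather than a result of this study; the symbols $Q$ (the mass-action ratio of the reaction in the cell) and $K'_{\mathrm{eq}}$ (its apparent equilibrium constant) are introduced here only for illustration.

```latex
\Delta G \;=\; \Delta G^{\circ\prime} + RT \ln Q \;=\; RT \ln \frac{Q}{K'_{\mathrm{eq}}}
```

When $Q$ is held far below $K'_{\mathrm{eq}}$, $\Delta G$ is strongly negative and the step is effectively one-way, which is the usual description of the ATP-dependent phosphofructokinase and pyruvate kinase reactions. When $Q \approx K'_{\mathrm{eq}}$, $\Delta G \approx 0$ and modest changes in metabolite concentrations can reverse the net flux.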

The end-product of glycolysis, pyruvate, is transported into mitochondria via a specific mitochondrial transporter that has only recently been identified22 and that is absent from the Blastocystis genome8. The translocation of the C3 part of glycolysis into mitochondria would necessitate a novel transporter (presumably for triose phosphates). The identification and characterisation of such a transporter would open up new possible drug targets against important pathogens. Examples include Phytophthora infestans, the causative agent of late potato blight, which has a devastating effect on food security, but also fish parasites such as Saprolegnia parasitica and Aphanomyces invadans. Both have serious consequences for aquaculture and the latter causes epizootic ulcerative syndrome, an OIE-listed disease23,24. Our recent genome analysis of Blastocystis identified several putative candidate transporters lacking clear homology to non-stramenopile organisms8. Such a unique transporter would not be present in the host (including humans) and could be exploited to prevent, or control, disease outbreaks that currently affect food production while the world population continues to increase25.


Acknowledgements. The authors wish to thank Professors John F. Allen and Nick Lane (both UCL, UK) for fruitful discussions and criticism. Furthermore, we want to thank Ulrike Brand (TU-BS) and Doris Ballert (Uni KN) for technical assistance. TAW is supported by a Royal Society University Research Fellowship. Work in the lab of MvdG was supported by Wellcome Trust grant 078566/A/05/Z. PGK wishes to acknowledge support by the German Research Foundation (DFG, grant KR 1661/6-1), the Gordon and Betty Moore Foundation GBMF 4966 (grant DiaEdit), and the BioImaging Center of the University of Konstanz as well as Professor Mendel and his group (TU-BS) for using their equipment.


1 Lunt, S. Y. & Vander Heiden, M. G. Aerobic glycolysis: meeting the metabolic requirements of cell proliferation. Annu Rev Cell Dev Biol 27, 441-464, doi:10.1146/annurev-cellbio-092910-154237 (2011).
2 Müller, M. et al. Biochemistry and evolution of anaerobic energy metabolism in eukaryotes. Microbiol Mol Biol Rev 76, 444-495 (2012).
3 Adl, S. M. et al. The revised classification of eukaryotes. J Eukaryot Microbiol 59, 429-493, doi:10.1111/j.1550-7408.2012.00644.x (2012).
4 Lill, R. et al. The essential role of mitochondria in the biogenesis of cellular iron-sulfur proteins. Biol Chem 380, 1157-1166 (1999).
5 Scheffler, I. E. Mitochondria. 2nd edn, (J. Wiley and Sons, Inc., 2008).
6 Opperdoes, F. R. & Borst, P. Localization of nine glycolytic enzymes in a microbody-like organelle in Trypanosoma brucei: the glycosome. FEBS Lett 80, 360-364 (1980).
7 Freitag, J., Ast, J. & Bolker, M. Cryptic peroxisomal targeting via alternative splicing and stop codon read-through in fungi. Nature 485, 522-525 (2012).
8 Gentekaki, E. et al. Extreme genome diversity in the hyper-prevalent parasitic eukaryote Blastocystis. PLoS Biol 15, e2003769 (2017).
9 Neupert, W. & Herrmann, J. M. Translocation of proteins into mitochondria. Annu Rev Biochem 76, 723-749 (2007).
10 Claros, M. G. & Vincens, P. Computational method to predict mitochondrially imported proteins and their targeting sequences. Eur J Biochem 241, 779-786 (1996).
11 Stechmann, A. et al. Organelles in Blastocystis that blur the distinction between mitochondria and hydrogenosomes. Curr Biol 18, 580-585 (2008).
12 Pérez-Brocal, V. & Clark, C. G. Analysis of two genomes from the mitochondrion-like organelle of the intestinal parasite Blastocystis: complete sequences, gene content and genome organization. Mol Biol Evol 25, 2475-2482 (2008).
13 Smith, D. G. et al. Exploring the mitochondrial proteome of the protozoon Tetrahymena thermophila: direct analysis by tandem mass spectrometry. J Mol Biol 374, 837-863 (2007).
14 Giege, P. et al. Enzymes of glycolysis are functionally associated with the mitochondrion in Arabidopsis cells. Plant Cell 15, 2140-2151 (2003).
15 Liaud, M. F., Lichtle, C., Apt, K., Martin, W. & Cerff, R. Compartment-specific isoforms of TPI and GAPDH are imported into mitochondria as a fusion protein: evidence in favor of a mitochondrial origin of the eukaryotic glycolytic pathway. Mol Biol Evol 17, 213-223 (2000).
16 Kroth, P. G. et al. A model for carbohydrate metabolism in the diatom Phaeodactylum tricornutum deduced from comparative whole genome analysis. PLoS ONE 3, e1426, doi:10.1371/journal.pone.0001426 (2008).
17 Nakayama, T., Ishida, K. & Archibald, J. M. Broad distribution of TPI-GAPDH fusion proteins among eukaryotes: evidence for glycolytic reactions in the mitochondrion? PLoS ONE 7, e52340 (2012).
18 Pittis, A. A. & Gabaldon, T. Late acquisition of mitochondria by a host with chimaeric prokaryotic ancestry. Nature 531, 101-104, doi:10.1038/nature16941 (2016).
19 Ku, C. et al. Endosymbiotic origin and differential loss of eukaryotic genes. Nature 524, 427-432, doi:10.1038/nature14963 (2015).
20 Mertens, E. ATP versus pyrophosphate: glycolysis revisited in parasitic protists. Parasitol Today 9, 122-126 (1993).
21 Chandra, F. A., Buzi, G. & Doyle, J. C. Glycolytic oscillations and limits on robust efficiency. Science 333, 187-192, doi:10.1126/science.1200705 (2011).
22 Herzig, S. et al. Identification and functional expression of the mitochondrial pyruvate carrier. Science 337, 93-96 (2012).
23 Jiang, R. H. & Tyler, B. M. Mechanisms and evolution of virulence in oomycetes. Annu Rev Phytopathol 50, 295-318, doi:10.1146/annurev-phyto-081211-172912 (2012).
24 Stentiford, G. D., Feist, S. W., Stone, D. M., Peeler, E. J. & Bass, D. Policy, phylogeny, and the parasite. Trends Parasitol 30, 274-281, doi:10.1016/j.pt.2014.04.004 (2014).
25 FAO. How to feed the world in 2050. (Food and Agriculture Organization of the United Nations (FAO), 2009).


Tables and Figures

Table 1. Pay-off phase glycolytic enzymes in Blastocystis are found in the pellet. Activities of glycolytic enzymes from whole cell-free extracts (c.f.e.) of Blastocystis suspended in phosphate buffered isotonic sucrose solution (pH 7.2). Cells were mixed at a ratio of two volumes of cells: three volumes of 0.5 mm glass beads and broken by three shakes of one minute each at maximum speed on a bead beater (VWR mini bead mill homogenizer (Atlanta, GA, USA)). Cell-free extracts were subjected to increasing centrifugal force producing nuclear, mitochondrial (pellet), lysosomal and cytosolic (supernatant) fractions at 1,912 RCFav for 5 min, 6,723 RCFav for 15 min, 26,892 RCFav for 30 min, respectively. Enzyme activities are the average of three determinations ± SD. *1 enzyme unit (EU) is the amount of enzyme that converts 1 µmole substrate to product per minute. The yellow box indicates the site of major activity (or in the case of triosephosphate isomerase, the dual localization).

Figure 1. The glycolytic enzymes TPI-GAPDH and PGK localize to mitochondria in the human parasite Blastocystis. Three-dimensional immunoconfocal microscopy reconstruction of optical sections (volume rendering) showing representative subcellular localization of PGK (blue) and TPI (red) in trophozoites (A-D). PGK (A) and TPI (B) volume signals show distinct distributions, consistent with localization within mitochondria, with considerable overlap. The merged image (C) provides a qualitative and the scatterplot (inset) a quantitative measure of signal overlap. Co-localisation of MitoTracker (red), PGK (green) and DAPI (blue) in trophozoites (E-I). Merged images MitoTracker/DAPI (E), PGK/DAPI (F) and TPI (G), and all three markers together (H), show considerable overlap. Scatterplots (inset) give a quantitative measure of signal overlap for each merged pair of markers (E-G). The DAPI signals (blue) representing nuclear DNA are indicated by asterisks (E). Scale bar 3 µm (A-D) or 2 µm (E-I).

Figure 2. Stramenopile glycolytic enzymes contain mitochondrial-like amino-terminal targeting sequences. Representative stramenopiles with whole genome data known are shown. Presence of a mitochondrial-like targeting signal is shown with a filled circle while an open circle indicates no mitochondrial-like targeting signal. Where multiple isoforms with and without targeting signal exist, a half-filled circle is shown.

hexokinase    9.3 ± 2.6    21.1 ± 3.6    2.3 ± 0.6
phosphoglucose isomerase    3.6 ± 0.8    5.8 ± 1.2    1.7 ± 0.4
(pyrophosphate-dependent) phosphofructokinase    11.2 ± 1.8    27.1 ± 3.2    2.4 ± 1.6
fructose bisphosphate aldolase    3.8 ± 1.3    10.9 ± 1.3    0.62 ± 0.3
triosephosphate isomerase    18.3 ± 3.1    36 ± 5.1    18.0 ± 2.8
glyceraldehyde phosphate dehydrogenase    9.2 ± 1.4    0.5 ± 0.1    54.7 ± 5.8
phosphoglycerate kinase    9.8 ± 1.7    5.3 ± 1.3    36.2 ± 4.1
phosphoglycerate mutase    1.2 ± 0.5    0.04 ± 0.01    2.6 ± 0.7
enolase    0.42 ± 0.1    0.37 ± 0.07    1.6 ± 0.4
pyruvate kinase/phosphoenolpyruvate synthase    2.8 ± 0.7    3.3 ± 0.10    15.4 ± 2.3

Table 1
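A quick way to read the fractionation data is to compare, for each enzyme, the activity in the pellet with that in the supernatant. The sketch below does this with the mean values of Table 1. The extracted table carries no column headings, so the column assignment used here (cell-free extract, supernatant, pellet) is an assumption inferred from the legend and from the statement that the upstream enzymes are confined to the soluble fraction; standard deviations are ignored.

```python
# Mean activities from Table 1.
# ASSUMPTION: the three values per enzyme are
# (cell-free extract, supernatant, pellet); the headings are not in the text.
TABLE_1 = [
    ("hexokinase", 9.3, 21.1, 2.3),
    ("phosphoglucose isomerase", 3.6, 5.8, 1.7),
    ("(pyrophosphate-dependent) phosphofructokinase", 11.2, 27.1, 2.4),
    ("fructose bisphosphate aldolase", 3.8, 10.9, 0.62),
    ("triosephosphate isomerase", 18.3, 36.0, 18.0),
    ("glyceraldehyde phosphate dehydrogenase", 9.2, 0.5, 54.7),
    ("phosphoglycerate kinase", 9.8, 5.3, 36.2),
    ("phosphoglycerate mutase", 1.2, 0.04, 2.6),
    ("enolase", 0.42, 0.37, 1.6),
    ("pyruvate kinase/phosphoenolpyruvate synthase", 2.8, 3.3, 15.4),
]


def pellet_to_supernatant(rows):
    """Return {enzyme: pellet activity / supernatant activity}."""
    return {name: pellet / supernatant
            for name, _extract, supernatant, pellet in rows}


def pellet_enriched(rows, threshold: float = 1.0):
    """Enzymes whose pellet activity exceeds `threshold` x supernatant activity."""
    ratios = pellet_to_supernatant(rows)
    return [name for name, ratio in ratios.items() if ratio > threshold]


if __name__ == "__main__":
    for name, ratio in pellet_to_supernatant(TABLE_1).items():
        print(f"{name}: pellet/supernatant = {ratio:.2f}")
    print("pellet-enriched:", pellet_enriched(TABLE_1))
```

With this column assignment, the five enzymes from glyceraldehyde phosphate dehydrogenase onwards have ratios above one, the four enzymes upstream of triosephosphate isomerase have ratios well below one, and triosephosphate isomerase sits in between, in line with the dual localization noted in the legend.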

Figure 1 (panels A-I)

Figure 2 (rows: hexokinase; phosphoglucose isomerase; (pyrophosphate-dependent) phosphofructokinase; fructose bisphosphate aldolase; triosephosphate isomerase; glyceraldehyde phosphate dehydrogenase; phosphoglycerate kinase; phosphoglycerate mutase; enolase; pyruvate kinase/phosphoenolpyruvate synthase)

Supplementary Tables and Figures

Supplementary Table 1. Blastocystis glycolytic enzymes are protected by a membrane. Control: Mitochondrial fractions incubated without proteolytic enzymes. Protease: Mitochondria incubated in 225 mM sucrose buffer at 25 °C containing 500 U bovine pancreas trypsin, 10 U papaya latex papain and 250 U porcine pepsin for 15 minutes. Protease + Triton: Mitochondrial fractions containing proteolytic enzymes and 1% Triton X-100 incubated for 15 min at 25 °C. Samples were centrifuged (14,000 g) for 2 min and resuspended in fresh sucrose buffer without proteolytic enzymes prior to assay.

Figure S1. Stramenopile glycolytic C3 enzymes contain amino-terminal targeting signals. A. Comparison of Blastocystis amino-terminal sequences for TPI-GAPDH, PGK, PGM, and enolase with homologs from other organisms showing the mitochondrial-like targeting signals. B. Phaeodactylum tricornutum glycolytic C3 enzyme amino-termini of TPI-GAPDH, PGK, PGM, enolase, and pyruvate kinase compared to yeast homologs demonstrate mitochondrial-like targeting signals.

Figure S2. Stramenopile glycolytic enzyme amino-terminal mitochondrial-like targeting signals are sufficient to target GFP to mitochondria in the diatom Phaeodactylum tricornutum. A. The Blastocystis glycolytic enzymes TPI-GAPDH and PGK contain amino-terminal targeting signals that can target GFP to P. tricornutum mitochondria. B. Amino-terminal extensions on TPI-GAPDH, PGK, phosphoglycerate mutase (PGM), enolase and pyruvate kinase (PK) from the diatom P. tricornutum were cloned in front of GFP and the constructs were used to transform P. tricornutum. C. Amino-terminal extensions on TPI-GAPDH and PGM from Phytophthora infestans, PK from Achlya bisexualis and TPI-GAPDH from Saccharina latissima were used as above to test for functionality of targeting information in P. tricornutum. DIC, Differential interference contrast microscopy. Chl, Chlorophyll a autofluorescence. GFP, Green fluorescent protein. Chl+GFP, Merged image showing the discrete (mitochondrial) localization of GFP. MitoTracker, MitoTracker Orange stain. MitoTracker+GFP, Merged image showing considerable overlap of MitoTracker stain and GFP fluorescence. For the corresponding amino acid sequences used for GFP targeting, see Supplementary File 1. Scale bar 5 µm.

Figure S3. Phylogenetic analysis of glycolytic enzymes of the pay-off phase. A. Triosephosphate isomerase (TPI), 77 sequences and 167 amino acid positions were used to calculate the tree. Bacterial sequences were used as outgroup. B. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH), only C-type GAPDH sequences are illustrated in the tree. 96 sequences and 266 amino acid positions were used to calculate the tree. One branch caused a long branch attraction (LBA) artifact and is therefore shortened in the figure. C. Phosphoglycerate kinase (PGK), 66 sequences and 360 amino acid positions were used to calculate the tree. D. Phosphoglycerate mutase (PGM), 96 sequences and 199 amino acid positions were used to calculate the tree. E. Enolase (ENO), 80 sequences and 345 amino acid positions were used to calculate the tree. F. Pyruvate kinase (PK), 80 sequences and 287 amino acid positions were used to calculate the tree. Values at nodes: posterior probabilities (P.P. >0.5) / rapid bootstrap values (BS>30%). Species name in bold = localization experimental proof (arrow = this study, star = Liaud et al. (2000)). TargetP analysis: M = mTP = mitochondria (bold indicates scores > 0.700), O = other, SP = signal peptide, C = cTP = chloroplast; only for Viridiplantae, Rhodophyta and Glaucocystophyceae were plant results taken if non-plant results differ. N.D.: sequences not analyzed; not complete at N-terminus or start methionine is missing (proof by an alignment). Colour code: Viridiplantae, Stramenopiles, Alveolata, Rhizaria, Rhodophyta, Cryptophyta, Haptophyceae, Bacteria, other Eukaryota, Archaea, Cyanobacteria, Euglenozoa.

Figure S4. Phaeodactylum tricornutum contains, similar to some other stramenopiles, multiple isoforms for the C3 part of glycolysis. The localization for all isoforms was tested via GFP-fusion constructs. A "pre" prefix means that the predicted targeting signal was used; if the prefix is missing, the full length of the respective sequence was fused to GFP. The number is the JGI Protein ID and the result of each localization is mentioned. A star (*) marks images where a maximum intensity projection from a Z-stack was used. "Unclear" indicates that the localization could not be identified. TPI = Triosephosphate isomerase, GAPDH = Glyceraldehyde-3-phosphate dehydrogenase, PGK = Phosphoglycerate kinase, PGM = Phosphoglycerate mutase, ENO = Enolase, PK = Pyruvate kinase. DIC, Differential interference contrast microscopy. Chl, Chlorophyll a autofluorescence. GFP, Green fluorescent protein. Chl+GFP, Merged image showing the discrete localization of GFP compared to Chlorophyll autofluorescence. For the corresponding amino acid sequences used for GFP targeting, see Supplementary File 2. Scale bar 5 µm.

enzyme    Control    Protease    Protease + Triton
glyceraldehyde phosphate dehydrogenase    71 ± 14    61 ± 9.1    15 ± 4.8
phosphoglycerate kinase    36 ± 8.5    33 ± 4.5    6.3 ± 2.2
phosphoglycerate mutase    4.0 ± 1.2    3.1 ± 0.5    0.2 ± 0.1
enolase    1.8 ± 0.9    1.4 ± 1.0    0.3 ± 0.1
phosphoenolpyruvate synthase    28 ± 5.1    25 ± 4.9    7.6 ± 2.0

(1 µmol substrate to product per minute)

Supplementary Table S1

Supplementary Figure S1 (panels A and B)

DIC    Chl    GFP    Chl + GFP    MitoTracker    MitoTracker + GFP

A preTPI-GAPDH-GFP (Blastocystis)

prePGK-GFP (Blastocystis)

B preTPI-GAPDH-GFP (Phaeodactylum)

prePGK-GFP (Phaeodactylum)


prePGM-GFP (Phaeodactylum)

preENO-GFP (Phaeodactylum)

prePK-GFP (Phaeodactylum)

C preTPI-GAPDH-GFP (Phytophthora infestans)

prePGM-GFP (Phytophthora infestans)

prePK-GFP (Achlya bisexualis)

preTPI-GAPDH-GFP (Saccharina latissima)

Supplementary Figure S2

Supplementary Figure S3. Phylogenetic trees of the glycolytic enzymes: A. TPI; B. GAPDH; C. PGK; D. PGM; E. ENO; F. PK. [Tree images not reproduced. Clades are labelled cytosol, mitochondria or plastid; each taxon carries a letter code (M, S, C, O or N.D.) with a score, and branches carry support values; scale bars, 0.2.]

DIC Chl GFP Chl + GFP

preTPI_50738 plastid

preTPI_18228 plastid

TPI_54738 cytosol *

Gapdh3_23598 cytosol

prePGK_29157 plastid

PGK_51125 cytosol

PGM_43812 unclear *

PGM_43253 mitochondria

PGM_26201 unclear *

PGM_42857 plastid *

PGM_35164 mitochondria *

PGM_51298 cytosol

preENO_56468 cytosol *

ENO_56468 plastid

PK_49098 cytosol *

PK_56445 cytosol *

PK_45997 cytosol *

PK_56172 mitochondria *

Supplementary Figure S4

Materials and methods

Sources of cDNA and genomic DNA

DNA and cDNA from Blastocystis ST1 strain NandII, obtained from a symptomatic human (strain obtained from the American Type Culture Collection, ATCC 50177), were used in this study. Genomic and cDNA libraries of Phaeodactylum tricornutum (culture from SAG strain 1090-1a, Göttingen) were constructed with the “Lambda ZAP II XR Library Construction Kit” from Stratagene and the lambda vector EMBL3, respectively. P. tricornutum Bohlin (strain 646; University of Texas Culture Collection, Austin) RNA was isolated using TRIzol following the manufacturer’s protocol (Thermo Fisher, Germany), and cDNA synthesis was performed with the Reverse Transcription System (A3500, Promega, Germany). An Achlya bisexualis cDNA library¹ was kindly provided by D. Bhattacharya (Rutgers University). Screening of libraries, sequencing of positive clones and RACE analyses were performed as described². Phytophthora infestans RNA was extracted from P. infestans mycelia with the RNeasy Plant Kit from Qiagen, and cDNA was synthesized with the Thermo-RT Kit (Display Systems, England). Sequences were also obtained from the EST/genome sequencing programmes for Phaeodactylum tricornutum³ and http://genome.jgi-psf.org/Phatr2/Phatr2.home.html (JGI)⁴, for Phytophthora infestans (http://www.pfgd.org⁵) and for P. sojae and P. ramorum (http://www.jgi.doe.gov⁶).

GFP constructs for the stable transformation of Phaeodactylum tricornutum

Standard cloning procedures were applied⁷. Polymerase chain reaction (PCR) was performed with a Master Cycler Gradient (Eppendorf) using Taq DNA Polymerase (Q BIOgene) according to the manufacturer’s instructions. cDNA from Blastocystis ST1 strain NandII (Bl), Phaeodactylum tricornutum (Pt), Phytophthora infestans (Pi) and Achlya bisexualis (Ab) was used as template for the PCRs. For Saccharina latissima (Sl), a cDNA clone (ABU96661) was used as template.

Table 1: PCR primers for production of GFP fusion constructs.

Construct | Forward primer | Reverse primer
BlTPI | AGGCCTATGCTTTCCCGTTCCTCCGTCATTGCCCGTTCCTTCGGCTCCGCCGCTCGCAAGCTC | GAGCTTGCGAGCGGCGGAGCCGAAGGAACGGGCAATGACGGAGGAACGGGAAAGCATAGGCCT
BlPGK | AGGCCTATGCTTTCCGCCTTCTCGAAGCGTCTCTTTTCCACCGGCCGTACCGTCAAC | GTTGACGGTACGGCCGGTGGAAAAGAGACGCTTCGAGAAGGCGGAAAGCATAGGCCT
PtTIG | CCGAATTCTTGTTCTTCCTTG | GGCCATGGAACCGTTGCATTTCCAG
PiTIG | CCGAATTCTGAGCTCATTCTCG | GGCCATGGCCGTGTTAAGCATTCC
SlTIG | GAATTCATGTTCTCCGCAGC | CGCCATGGACCCGTTGCACTTCCAG
PtPGK | GCCCCGGGAGCAAATACCGCAACTCC | GCCCATGGCTTGCTGGGCCAGTTG
PtPGM | CCGAATTCTCTCCCCAAGGTGC | GGCCATGGGGCAGTCGTACCATC
PtENO | CCGAATTCCGTATTGTTCGTATTCG | GGCCATGGTTCCCTGCGCCGTCG
PtPK | CCGAATTCCTATCGGCGCAGTAA | GGCCATGGCTCCGGTAACGGGC
PiPGM | GCGAATTCGTCTCAGCATGG | CGCATGGCCTCCTTATTG
AbPK | CCGGATCCAACCAGTCGTGTGTTG | GGCCATGGTCATCGAAAAAGCATC
PtTPI50738 | ATTGAGCTTCCAGTTTCCCG | TTCTCCACCGATGACTGGTG
PtTPI18228 | CTGTACAACGCTCACCATG | ATTCAGTTTCCAGTTGCCC
PtTPI54738 | CGAACAATGCCTCGC | CTGTTCTAGCAGATCTTTGGC
PtGAPDH3_23598 | ATCACCATGCCCGTGAAATGC | CTCATACTTAAAAGAGGGTTC
PtPGK29157 | CCCACCATGAAATTCGTTC | GGCAAAGCTACGGACACC
PtPGK51125 | AGTCACACCATGGCTTCCGAC | TAGCAAGGCGGGCACTCC
PtPGM43812 | GAATGGGCAGGAGAACCAC | GGGAATGGGCGGTAAAG
PtPGM43253 | CCAAGCATTGCGAAAATGGCATC | GGTGGAGCGGGATCGC
PtPGM26201 | CAGGCATGTTGGTTCCTCATC | TATTGCAGCCCCGCCAG
PtPGM42857 | GGTCATCATGGCAATGGACG | CTGGGGAGGCTTCCAAAG
PtPGM35164 | CTGTCGCCGACCATGAG | GGGCTCCTCCACGGCCG
PtPGM51298 | GAGACGATGTGCGACGAATCACG | GCAGTCAACCCAACCAGTG
PtENO56468pre | ACCGTTGTCATGCTTTTCAAG | ATTGCCGCGGGAATC
PtENO56468 | ACCGTTGTCATGCTTTTCAAG | GAAAGTCACCTTGCCTGC
PtPK49098 | TTCACCATGACAGCGTCTC | GGTGGCCCCACGCTG
PtPK56445 | CAGAATGTCCTTGTCCCAG | AATTTCGTCCGCGGCGTATAC
PtPK45997 | AAAGTCCAACAGACG | CTAGTGTTGTTACAACGT
PtPK56172 | CAGATATAATGTTCCGTCGAGCCG | CAGTCGCATGATGCGCATCCC

PCR products were cloned into the TA vector pCR2.1 (Invitrogen) or blunt-end cloned into pBluescript II SK+ (Stratagene). The primers (Table 1) allowed insertion of restriction enzyme recognition sites (EcoRI/NcoI or SmaI/NcoI) that were used to clone the presequences in frame to eGFP within pBluescript-GFP. The presequence-GFP fusions were cut out with appropriate restriction enzymes (EcoRI/HindIII or SmaI/HindIII) and cloned into the Phaeodactylum tricornutum transformation vector pPha-T1⁸,⁹, either into the corresponding sites or, in the case of SmaI, into the EcoRV site. For the constructs with a Protein ID (Fig. S4), a slightly different cloning approach was used. The corresponding fragments were amplified from cDNA by PCR with a proofreading polymerase (Pfu or Kapa HiFi) and blunt-end cloned into a modified pPha-T1 vector. These vectors include an eGFP with a StuI or KspAI restriction site, allowing a one-step cloning procedure with subsequent screening for the correct orientation of the fragment at the N-terminus of eGFP. The Blastocystis presequences were produced by phosphorylating the primers with T4 polynucleotide kinase according to the manufacturer’s instructions and subsequently annealing them in a thermal cycler, after which they were cloned into the diatom expression vector equipped with eGFP and the StuI restriction site.
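As an illustrative check that is not part of the original protocol, the annealing of the Blastocystis presequence oligonucleotides can be sketched in Python. The sketch assumes that the BlTPI and BlPGK primer pairs in Table 1 are meant to form fully complementary, blunt-ended duplexes; the helper names `reverse_complement` and `forms_blunt_duplex` are hypothetical and chosen only for this example.

```python
# Sketch: test whether two oligos can anneal into a blunt-ended duplex.
# Sequences are copied from Table 1; the function names are illustrative only.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def forms_blunt_duplex(forward: str, reverse: str) -> bool:
    """True if `reverse` is exactly the reverse complement of `forward`."""
    return reverse_complement(forward) == reverse


# Adjacent string literals are concatenated by Python; each oligo is split
# only to keep the lines short.
OLIGOS = {
    "BlTPI": (
        "AGGCCTATGCTTTCCCGTTCCTCCGTCATTGCCCGTTCCTTC" "GGCTCCGCCGCTCGCAAGCTC",
        "GAGCTTGCGAGCGGCGGAGCCGAAGGAACGGGCAATGA" "CGGAGGAACGGGAAAGCATAGGCCT",
    ),
    "BlPGK": (
        "AGGCCTATGCTTTCCGCCTTCTCGAAGCGTCTC" "TTTTCCACCGGCCGTACCGTCAAC",
        "GTTGACGGTACGGCCGGTGGAAAAGAGACG" "CTTCGAGAAGGCGGAAAGCATAGGCCT",
    ),
}

if __name__ == "__main__":
    for name, (forward, reverse) in OLIGOS.items():
        # Print the construct name, oligo length and the duplex check.
        print(name, len(forward), forms_blunt_duplex(forward, reverse))
```

With the sequences as listed in Table 1, both pairs satisfy this check, which is consistent with direct blunt-end cloning of the annealed oligonucleotides.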

Transformation of Phaeodactylum tricornutum

Phaeodactylum tricornutum Bohlin (UTEX, strain 646) was grown at 22 °C under continuous light of 75 µE in artificial seawater (Tropic Marin) at 0.5× concentration. Transformations were performed as described by Zaslavskaia et al.⁸,¹⁰. For each transformation, tungsten particles M10 (0.7 μm median diameter) covered with 7-20 μg DNA were used to bombard cells with the Particle Delivery System PDS-1000 (Bio-Rad, HE-System) fitted with 650, 900, 1100 or 1350 psi rupture discs.

Microscopic analysis of transformed Phaeodactylum tricornutum

Reporter gene expression was visualized by confocal laser scanning microscopy (cLSM-510META, Carl Zeiss, Jena, Germany) using a Plan-Neofluar 40x/1.3 Oil DIC objective. The eGFP fusion proteins were excited with an argon laser at 488 nm at 8-10% of laser capacity. Excited fluorophores were detected with a GFP bandpass filter (505-530 nm) using a photomultiplier. Chlorophyll a autofluorescence was simultaneously detected with a META-channel (644-719 nm). MitoTracker Orange CM-H2TMRos (Molecular Probes) was applied for fluorescence staining of mitochondria. P. tricornutum cells were stained according to the manufacturer’s protocol: cells were incubated with 100 nM dye solution for 30 minutes, washed and observed. Images were recorded using the Multitracking mode with the following parameters: laser lines T1 = 488 nm at 10% and T2 = 543 nm at 100%; primary beam-splitting mirrors UV/488/543/633 nm; emitted light was detected with the META-channel.

Protein production and antibody generation

Blastocystis TPI-GAPDH was amplified from cDNA using primers TPI-GAPDH pET F: aga aga CAT ATG TTC GTC GGT GGC AAT TGG AAG TGC AA and TPI-GAPDH pET R: tct tct GGA TCC TTA AGA GCG ATC CAC CTT CGC CA, adding NdeI and BamHI restriction sites, respectively, to facilitate cloning into the gene expression vector pET14b (Novagen, Merck, Watford, UK). The Blastocystis PGK was amplified from cDNA using PGK pET F: aga aga CAT ATG AAG CTG GGA GTT GCT GCC TAC G and PGK pET R: tct tct CAT ATG TCA CGC GTC CGT CAG AGC GGC CAC ACC C, which added NdeI restriction sites for pET14b cloning. The mitochondrial targeting signals were not amplified as these would not be part of the mature processed protein. All constructs were confirmed by sequencing. The in-frame His-tag allowed for affinity chromatography purification of the recombinant protein. Recombinant Blastocystis TPI-GAPDH and PGK were used to immunise guinea pigs and rabbits, respectively, for polyclonal antibody generation at Eurogentec (Seraing, Belgium).
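The restriction sites carried by the pET14b cloning primers can be located with a short Python sketch. It assumes the standard recognition sequences CATATG (NdeI) and GGATCC (BamHI); the primer sequences are those given above, while the names `SITES`, `normalise` and `find_sites` are hypothetical helpers introduced for illustration.

```python
# Sketch: locate restriction enzyme recognition sites in the pET14b primers.
# Recognition sequences are the standard ones for NdeI and BamHI.

SITES = {"NdeI": "CATATG", "BamHI": "GGATCC"}

# Primer sequences as written in the text (spaces and case preserved).
PRIMERS = {
    "TPI-GAPDH pET F": "aga aga CAT ATG TTC GTC GGT GGC AAT TGG AAG TGC AA",
    "TPI-GAPDH pET R": "tct tct GGA TCC TTA AGA GCG ATC CAC CTT CGC CA",
    "PGK pET F": "aga aga CAT ATG AAG CTG GGA GTT GCT GCC TAC G",
    "PGK pET R": "tct tct CAT ATG TCA CGC GTC CGT CAG AGC GGC CAC ACC C",
}


def normalise(primer: str) -> str:
    """Remove spaces and convert to upper case."""
    return primer.replace(" ", "").upper()


def find_sites(primer: str) -> dict:
    """Map each enzyme found in the primer to the 0-based site position."""
    seq = normalise(primer)
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, find_sites(primer))
```

In each primer the site follows the six lower-case leader bases, so the reported position is 6 in all four cases.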

Culture conditions for Blastocystis

Blastocystis isolate B (originally designated Blastocystis sp. group VII¹¹, now called ST7¹²) was used. The parasite was grown in 10 ml pre-reduced Iscove’s modified Dulbecco’s medium (IMDM) supplemented with 10% heat-inactivated horse serum. Cultures were incubated for 48 h at 37 °C in anaerobic jars using an Oxoid AnaeroGen pack. Two-day-old cultures were centrifuged at 1,600 g for 10 min, washed once in a buffer consisting of 30 mM potassium phosphate, 74 mM NaCl, 0.6 mM CaCl2 and 1.6 mM KCl, pH 7.4, and resuspended in a nitrogen-gassed isotonic buffer consisting of 200 mM sucrose containing 30 mM phosphate, 15 mM mercaptoethanol, 30 mM NaCl, 0.6 mM CaCl2 and 0.6 mM KCl (pH 7.2).

Subcellular fractionation of Blastocystis

Blastocystis cells were broken by mixing 2 volumes of the cell suspension with 3 volumes of 0.5 mm beads and shaking three times for one minute at maximum speed on a bead breaker (VWR mini bead mill homogenizer, Atlanta, GA, USA), with one-minute pauses on ice. Cell-free extracts were subjected to increasing centrifugal force, producing nuclear (N, 1,912 RCFav for 5 min), mitochondria-like (ML, 6,723 RCFav for 15 min), lysosomal (L, 26,892 RCFav for 30 min) and cytosolic (S) fractions, respectively, using a Sorvall RC-2B centrifuge fitted with an SS-34 rotor.
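To reproduce the fractionation on a different rotor, the stated RCF values have to be converted to rotor speeds. The following Python sketch uses the general relation RCF = ω²r/g; the radius `ASSUMED_RADIUS_CM` is a hypothetical placeholder, because the text does not state the radius at which RCFav was calculated, and the function names are illustrative.

```python
# Sketch: convert between relative centrifugal force (RCF) and rotor speed.
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """RCF (multiples of g) at a given speed (rpm) and radius (cm)."""
    omega = 2.0 * math.pi * rpm / 60.0  # angular velocity in rad/s
    return omega ** 2 * (radius_cm / 100.0) / STANDARD_GRAVITY


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given RCF at a given radius (cm)."""
    omega = math.sqrt(rcf * STANDARD_GRAVITY / (radius_cm / 100.0))
    return omega * 60.0 / (2.0 * math.pi)


# RCFav values for the pelleted fractions, taken from the text.
FRACTION_RCF = {"N": 1912, "ML": 6723, "L": 26892}

# Hypothetical radius in cm; replace with the value from the rotor manual.
ASSUMED_RADIUS_CM = 10.7

if __name__ == "__main__":
    for fraction, rcf in FRACTION_RCF.items():
        print(fraction, round(rpm_from_rcf(rcf, ASSUMED_RADIUS_CM)))
```

Because RCF scales with the square of the rotor speed, the fourfold step from the ML to the L fraction corresponds to a doubling of speed, whatever radius is assumed.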

Enzyme assays

Hexokinase was assayed by measuring the reduction of NAD+ at 340 nm in a coupled reaction with Leuconostoc mesenteroides glucose-6-phosphate dehydrogenase (3 EU); the assay contained 38 mM Tris-HCl pH 7.6, 115 mM D-glucose, 10 mM MgCl2, 0.5 mM ATP, 0.2 mM NAD+ and 0.05 mL of Blastocystis cell-free extract (0.08-0.12 mg) or fraction (N, 0.15-0.18 mg; ML, 0.12-0.17 mg; L, 0.08-0.11 mg; S, 0.09-0.05 mg) in a final volume of 1 mL at 25 °C.

Phosphoglucose isomerase was assayed by measuring the reduction of NADP+ at 340 nm in a coupled reaction with Leuconostoc mesenteroides glucose-6-phosphate dehydrogenase (2 EU); the assay contained 38 mM Tris-HCl pH 7.6, 3.3 mM D-fructose-6-phosphate, 0.66 mM β-NADP+, 3.3 mM MgCl2 and 0.05 mL of Blastocystis cell-free extract or fraction in a final volume of 3 mL at 25 °C.

Phosphofructokinase was assayed using the standard coupled assay containing 38 mM Tris-HCl pH 7.6, 5 mM dithiothreitol, 5 mM MgCl2, 0.28 mM NADH, 0.1 mM ATP, 0.1 mM AMP, 0.8 mM fructose-6-phosphate, 0.4 mM (NH4)2SO4, 0.05 EU each of rabbit muscle aldolase, rabbit muscle glycerophosphate dehydrogenase and rabbit muscle triosephosphate isomerase, and 0.05 mL of Blastocystis cell-free extract or fraction in a final volume of 3 mL at 25 °C.

Aldolase was assayed using a modification of the hydrazine method, in which 3-phosphoglyceraldehyde reacts with hydrazine to form a hydrazone that absorbs at 240 nm; the assay contained 12 mM fructose-1,6-bisphosphate, pH 7.6, 0.1 mM EDTA, 3.5 mM hydrazine sulfate and 0.05 mL of Blastocystis cell-free extract or fraction in a final volume of 3 mL at 25 °C.

Triosephosphate isomerase was assayed by measuring the oxidation of NADH using a linked reaction with glycerol-3-phosphate dehydrogenase; the assay contained 220 mM triethanolamine pH 7.6, 0.20 mM DL-glyceraldehyde-3-phosphate, 0.27 mM NADH, 1.7 EU glycerol-3-phosphate dehydrogenase and 0.05 mL of Blastocystis cell-free extract or fraction in a final volume of 3 mL at 25 °C.

Glyceraldehyde-3-phosphate dehydrogenase was assayed by measuring the initial reduction of NAD+ at 340 nm; the assay contained 13 mM sodium pyrophosphate pH 8.0, 26 mM sodium arsenate, 0.25 mM NAD+, 3.3 mM dithiothreitol and 0.05 mL of Blastocystis cell-free extract or fraction in a final volume of 3 mL at 25 °C.

Phosphoglycerate kinase was assayed by measuring the 3-phosphoglycerate-dependent oxidation of NADH at 340 nm; the assay contained 40 mM Tris-HCl pH 8.0, 0.5 mM MgCl2, 0.26 mM NADH, 0.1 mM ATP, 2 EU S. cerevisiae glyceraldehydephosphate dehydrogenase and 0.05 mL of B. hominis cell-free extract or fraction in a final volume of 3 mL at 25 °C.

Phosphoglycerate mutase was measured using the standard coupled assay and measuring the decrease in absorbance at 340 nm; the assay contained 76 mM triethanolamine pH 8.0, 7 mM D(-) 3-phosphoglyceric acid, 0.7 mM ADP, 1.4 mM 2,3-diphosphoglyceric acid, 0.16 mM NADH, 2.6 mM MgSO4, 100 mM KCl, 5 EU pyruvate kinase/8 EU lactate dehydrogenase from rabbit muscle, 5 EU rabbit muscle enolase and 0.05 mL of Blastocystis cell-free extract or fraction in a final volume of 3 mL at 25 °C.

Enolase was determined using the standard coupled assay and measuring the decrease in absorbance at 340 nm; the assay contained 80 mM triethanolamine pH 8.0, 1.8 mM D(+) 2-phosphoglycerate, 0.1 mM NADH, 25 mM MgSO4, 100 mM KCl, 1.3 mM ADP, 5 EU pyruvate kinase/8 EU lactate dehydrogenase from rabbit muscle and 0.05 mL of Blastocystis cell-free extract or fraction in a final volume of 3 mL at 25 °C.

Pyruvate kinase was determined by measuring the oxidation of NADH at 340 nm using the following mixture: 45 mM imidazole-HCl pH 8.0, 1.5 mM ADP, 0.2 mM NADH, 1.5 mM phosphoenolpyruvate, 5 EU rabbit muscle lactate dehydrogenase and 0.05 mL of Blastocystis cell-free extract or fraction in a final volume of 3 mL at 25 °C.
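For the NADH-linked assays above, an absorbance change at 340 nm can be converted into a specific activity with the Beer-Lambert law. The Python sketch below assumes the commonly used millimolar extinction coefficient of NADH at 340 nm (6.22 mM⁻¹ cm⁻¹) and a 1 cm light path; neither value nor the calculation itself is stated in the text, and the function name and the example reading are hypothetical.

```python
# Sketch: specific activity from the rate of absorbance change at 340 nm.
# Assumes the NADH extinction coefficient and a 1 cm path length.

EPSILON_NADH_340 = 6.22  # mM^-1 cm^-1, commonly used value for NADH at 340 nm


def specific_activity(delta_a340_per_min: float,
                      assay_volume_ml: float,
                      protein_mg: float,
                      path_cm: float = 1.0) -> float:
    """Return activity in micromol NADH converted per min per mg protein."""
    # Beer-Lambert: concentration change (mM/min) = dA/min / (epsilon * path).
    rate_mm_per_min = abs(delta_a340_per_min) / (EPSILON_NADH_340 * path_cm)
    # mM multiplied by mL gives micromol in the cuvette.
    umol_per_min = rate_mm_per_min * assay_volume_ml
    return umol_per_min / protein_mg


if __name__ == "__main__":
    # Hypothetical reading: -0.0622 A/min in a 3 mL assay with 0.1 mg protein.
    print(specific_activity(-0.0622, 3.0, 0.1))
```

The absolute value is taken so that the same function serves assays that follow NADH oxidation (falling absorbance) and those that follow NAD+ or NADP+ reduction (rising absorbance).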

Pyruvate phosphate dikinase was assayed spectrophotometrically by measuring the oxidation of NADH at 340 nm in 3 mL cuvettes. The reaction contained HEPES buffer (pH 8.0), 6 mM MgSO4, 25 mM NH4Cl, 5 mM dithiothreitol, 0.1 mM disodium pyrophosphate, 0.25 mM AMP, 0.1 mM phosphoenolpyruvate and 0.05-0.25 mg of Blastocystis cell-free extract or fraction. The rate of pyruvate production was determined by the addition of 2 U of lactate dehydrogenase and 0.25 mM NADH, and compared to controls containing phosphoenolpyruvate but lacking AMP, and controls containing AMP but lacking phosphoenolpyruvate. The concentrations of AMP, pyrophosphate and phosphoenolpyruvate used in the assay were selected from preliminary assays using concentrations varying from 0.025 to 1.0 mM. The generation of ATP from AMP by pyruvate phosphate dikinase was confirmed by measuring the ATP formed using a luciferin/luciferase assay (Molecular Probes, Invitrogen, Eugene, OR, USA). The assay was performed as described above but lacking lactate dehydrogenase and NADH; after 0, 15, 30, 45 and 60 min, 0.1 mL of the assay was removed and added to one well of a 96-well plate containing 0.1 mL of 0.25 μg firefly luciferase and 0.5 mM luciferin, and the luminescence was recorded using a SpectraMax M2 plate reader (Molecular Devices, Sunnyvale, CA).
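One way to summarise such a luminescence time course is to fit a straight line through the readings and take the slope as the rate of ATP formation. The Python sketch below is an assumption about how the data could be analysed, not the authors' stated procedure; the sampling times are those given in the text, whereas the readings and the function name `least_squares_slope` are hypothetical.

```python
# Sketch: ordinary least-squares slope of a signal sampled over time.

# Sampling times in minutes, as given in the text.
TIME_POINTS_MIN = [0, 15, 30, 45, 60]


def least_squares_slope(times, values):
    """Return the least-squares slope of `values` against `times`."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    # Sum of squares of time and cross-product of time and signal deviations.
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical luminescence readings (arbitrary units), for illustration.
    readings = [100.0, 400.0, 710.0, 990.0, 1300.0]
    print(least_squares_slope(TIME_POINTS_MIN, readings))
```

Converting the slope from luminescence units to an amount of ATP would additionally require an ATP standard curve measured under the same conditions.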

The activity of pyrophosphate-dependent phosphofructokinase* in the direction of fructose-1,6-bisphosphate formation (forward reaction) was determined in 1 mL assay volumes containing 0.1 M HEPES-HCl, pH 7.8; 20 mM fructose-6-phosphate; 2 mM Na pyrophosphate; 5 mM MgCl2; 0.25 mM NADH; 0.2 U of aldolase (from rabbit muscle); 0.3 U each of glycerophosphate dehydrogenase (from rabbit muscle) and triosephosphate isomerase (from rabbit muscle); and 10 μM fructose 2,6-diphosphate. The reaction was initiated by addition of 0.05-0.25 mg of Blastocystis cell-free extract or fraction, and the rate of NADH oxidation was followed at 340 nm on a Beckman DU 640 spectrophotometer (Indianapolis, IN, USA). The activity of the reverse reaction was determined by measuring orthophosphate-dependent formation of fructose-6-phosphate from fructose-1,6-bisphosphate. The reaction mixture (1 mL) contained 0.1 M HEPES-HCl, pH 7.8; 2 mM fructose-1,6-bisphosphate; 15 mM NaH2PO4; 5 mM MgCl2; 0.3 mM NADP+; 0.12 U glucose-6-phosphate dehydrogenase; and 0.24 U glucose phosphate isomerase. The reaction was initiated by addition of 1 mg of pyrophosphate-dependent phosphofructokinase and monitored at 340 nm.

*Pyrophosphate fructose-6-phosphate 1-phosphotransferase (PFP).

Confocal microscopy of Blastocystis

Blastocystis trophozoites were treated with MitoTracker Red (Molecular Probes), washed, fixed in 10% formalin, incubated in ice-cold acetone for 15 minutes and air-dried.

Slides with fixed parasites were rehydrated in phosphate-buffered saline (PBS) for 30 minutes and blocked with 2% BSA in PBS for 1 hour at room temperature. All antibody incubations were performed at room temperature in 2% BSA in PBS, 0.1% Triton X-100. Slides were washed 5 times in 0.2% BSA in PBS, 0.01% Triton X-100 between incubations to remove unbound antibodies.

Primary antibodies: rabbit anti-PGK and guinea pig anti-TPI-GAPDH (Eurogentec, Seraing, Belgium) were used at dilutions of 1:500 and 1:300, respectively, in 2% BSA in PBS, 0.1% Triton X-100.

Secondary antibodies: Alexa Fluor 488-conjugated goat anti-rabbit (Invitrogen, Eugene, OR, USA), Alexa Fluor 405-conjugated goat anti-rabbit (Invitrogen, Eugene, OR, USA) and TRITC-conjugated goat anti-guinea pig were each used at a 1:200 dilution in 2% BSA in PBS, 0.1% Triton X-100.

The DNA intercalating agent 4',6-diamidino-2-phenylindole (DAPI), used for detection of nuclear and mitochondrial DNA, was added to the penultimate washing solution at a concentration of 1 µg·ml⁻¹. The labeled samples were embedded in Dako Glycergel Mounting Medium (DAKO, Carpinteria, CA, USA) and stored at 4 °C.

Immunofluorescence analysis and image data collection were performed on a Leica SP2 AOBS confocal laser-scanning microscope (Leica Microsystems, Wetzlar, Germany) using a glycerol immersion objective lens (Leica, HCX PL APO CS 63x 1.3 Corr). Image z-stacks were collected with a pinhole setting of Airy 1 and twofold oversampling. Image stacks of optical sections were further processed using the Huygens deconvolution software package version 2.7 (Scientific Volume Imaging, Hilversum, NL). Three-dimensional reconstruction, volume and surface rendering, and quantification of signal overlap in the 3D volume model were performed with the Imaris software suite (Version 7.2.1, Bitplane, Zurich, Switzerland). The degree of signal overlap in the 3D volume model is depicted graphically as scatterplots: the intensities of two fluorescent signals in each voxel of the 3D model are measured and plotted, so that voxels with similar intensity for both signals appear near the diagonal. All image stacks were corrected for spectral shift before rendering and signal colocalization analysis.
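The scatterplot idea described above can be summarised numerically by a Pearson correlation coefficient over paired voxel intensities: voxels lying near the diagonal push the coefficient towards 1. The sketch below illustrates that principle only. It does not reproduce the Imaris computation, and the function name and intensity values are hypothetical.

```python
import math


def pearson(channel_a, channel_b):
    """Pearson correlation of two equally long lists of voxel intensities."""
    n = len(channel_a)
    if n != len(channel_b) or n < 2:
        raise ValueError("channels must have the same length (at least 2)")
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no correlation")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical flattened voxel intensities for two fluorescence channels.
green = [10, 50, 120, 200, 30]
red = [12, 48, 118, 205, 28]
r = pearson(green, red)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to 1 corresponds to a scatterplot concentrated along the diagonal; background voxels would normally be thresholded out before such a calculation, which this sketch omits.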


Phylogenetic analyses

Sequences of all glycolytic enzymes from Phaeodactylum tricornutum and Blastocystis ST1, strain NandII, were used as seeds in BlastP searches in the non-redundant database at the NCBI13. We were especially interested in identifying all sequences in the SAR supergroup14 (Stramenopiles, Alveolates and Rhizaria). In addition, representatives from other eukaryotic groups were added and, if required, closely related bacterial sequences. Sequences were automatically added to pre-existing alignments and subsequently manually refined using the Edit option of the MUST package15. Final datasets were generated after elimination of highly variable regions and positions with more than 50% gaps by G-blocks16. All datasets were first analysed with a maximum likelihood (ML) method under two different models. PhyML v2.317 was used with the SPR moves option and the LG+F+4G model18, and PhyML v3 (with SPR moves) was used with the C20+4G model, corresponding to 20 pre-calculated fixed profiles of positional amino-acid substitution18. Based on the log-likelihood values (l), the number of parameters (K) and alignment positions (n), the AIC (AIC = -2l + 2K) and the corrected AIC (AICc = AIC + 2K(K+1)/(n-K-1)) were calculated19. The lowest AICc value corresponds to the best model. If the AICc of the C20 analysis was better than that of the LG analysis, a second ML analysis under the C40+4G model was performed and its AICc estimated; if the AICc of C40 was better than that of C20, C60 was then tested, until the overall best model was found. Once the best model was estimated for all six datasets, a rapid bootstrap analysis with 100 replicates in RAxML v7 under the LG model was performed20, and an additional analysis was run in PhyloBayes v3 with the CATfix C20 model in all cases or, alternatively, the best C-model. Two independent chains were run for 10,000 points and trees were sampled at every tenth point21. Trees obtained with the best model are presented and both posterior probabilities (PP) and rapid bootstrap values (BS) are indicated on trees if PP>0.5 or BS>30%, respectively.
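The model comparison described above can be sketched in a few lines. This is an illustration of the AIC and AICc formulas as stated in the text, not the authors' script: the function names, the log-likelihoods, the parameter counts and the alignment length are all hypothetical.

```python
def aic(log_likelihood, k):
    """Akaike information criterion: AIC = -2l + 2K."""
    return -2.0 * log_likelihood + 2.0 * k


def aicc(log_likelihood, k, n):
    """Corrected AIC: AICc = AIC + 2K(K+1)/(n-K-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= K + 1")
    return aic(log_likelihood, k) + (2.0 * k * (k + 1)) / (n - k - 1)


def best_model(fits, n):
    """Return (name of the lowest-AICc model, all scores).

    fits maps a model name to (log-likelihood, number of parameters K).
    """
    scores = {name: aicc(lnl, k, n) for name, (lnl, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Hypothetical fits for one dataset with n = 400 alignment positions.
fits = {
    "LG+F+4G": (-12500.0, 120),
    "C20+4G": (-12380.0, 139),
}
winner, scores = best_model(fits, n=400)
print(winner, {name: round(value, 2) for name, value in scores.items()})
```

In the procedure described in the text, a C20 win would trigger a further fit under C40, and so on; that would amount to adding the new entry to `fits` and calling `best_model` again.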

Cellular localisation predictions

TargetP22 and MitoProt23 were used to predict putative subcellular localization, using the non-plant and no cut-offs settings. For Viridiplantae, Rhodophyta and Glaucocystophyta, the plant results were used where they differed from the non-plant results.

References

1 Bhattacharya, D., Stickel, S. K. & Sogin, M. L. Molecular phylogenetic analysis of actin genic regions from Achlya bisexualis (Oomycota) and Costaria costata (Chromophyta). J Mol Evol 33, 525-536 (1991).
2 Liaud, M. F., Brandt, U., Scherzinger, M. & Cerff, R. Evolutionary origin of cryptomonad microalgae: two novel chloroplast/cytosol-specific GAPDH genes as potential markers of ancestral endosymbiont and host cell components. J Mol Evol 44 Suppl 1, S28-37 (1997).
3 Maheswari, U. et al. The Diatom EST Database. Nucleic Acids Res 33, D344-347, doi:10.1093/nar/gki121 (2005).
4 Bowler, C. et al. The Phaeodactylum genome reveals the evolutionary history of diatom genomes. Nature 456, 239-244 (2008).
5 Tripathy, S., Pandey, V. N., Fang, B., Salas, F. & Tyler, B. M. VMD: a community annotation database for oomycetes and microbial genomes. Nucleic Acids Res 34, D379-381, doi:10.1093/nar/gkj042 (2006).
6 Tyler, B. M. et al. Phytophthora genome sequences uncover evolutionary origins and mechanisms of pathogenesis. Science 313, 1261-1266, doi:10.1126/science.1128796 (2006).
7 Sambrook, J., Fritsch, E. & Maniatis, T. Molecular cloning, a laboratory manual. (Cold Spring Harbor Laboratory Press, 1989).
8 Zaslavskaia, L. A., Lippmeier, J. C., Kroth, P. G., Grossman, A. R. & Apt, K. E. Transformation of the diatom Phaeodactylum tricornutum (Bacillariophyceae) with a variety of selectable marker and reporter genes. J Phycol 36, 379-386 (2000).
9 Gruber, A. et al. Protein targeting into complex diatom plastids: functional characterisation of a specific targeting motif. Plant Mol Biol 64, 519-530, doi:10.1007/s11103-007-9171-x (2007).
10 Kroth, P. Genetic transformation - a tool to study protein targeting in diatoms. Methods Mol Biol 390, 257-268 (2007).
11 Noel, C. et al. Molecular phylogenies of Blastocystis isolates from different hosts: implications for genetic diversity, identification of species, and zoonosis. J Clin Microbiol 43, 348-355, doi:10.1128/JCM.43.1.348-355.2005 (2005).
12 Stensvold, C. R. et al. Terminology for Blastocystis subtypes - a consensus. Trends Parasitol 23, 93-96 (2007).
13 Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J Mol Biol 215, 403-410 (1990).
14 Adl, S. M. et al. The revised classification of eukaryotes. J Eukaryot Microbiol 59, 429-493, doi:10.1111/j.1550-7408.2012.00644.x (2012).
15 Philippe, H. MUST, a computer package of management utilities for sequences and trees. Nucleic Acids Res 21, 5264-5272 (1993).
16 Talavera, G. & Castresana, J. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol 56, 564-577, doi:10.1080/10635150701472164 (2007).
17 Guindon, S. & Gascuel, O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52, 696-704 (2003).
18 Le, S. Q., Lartillot, N. & Gascuel, O. Phylogenetic mixture models for proteins. Philos Trans R Soc Lond B Biol Sci 363, 3965-3976, doi:10.1098/rstb.2008.0180 (2008).
19 Posada, D. & Buckley, T. R. Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Syst Biol 53, 793-808, doi:10.1080/10635150490522304 (2004).
20 Stamatakis, A., Hoover, P. & Rougemont, J. A Rapid Bootstrap Algorithm for the RAxML Web Servers. Syst Biol 57, 758-771, doi:10.1080/10635150802429642 (2008).
21 Lartillot, N., Lepage, T. & Blanquart, S. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25, 2286-2288, doi:10.1093/bioinformatics/btp368 (2009).

22 Emanuelsson, O., Brunak, S., von Heijne, G. & Nielsen, H. Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc 2, 953-971, doi:10.1038/nprot.2007.131 (2007).
23 Claros, M. G. & Vincens, P. Computational method to predict mitochondrially imported proteins and their targeting sequences. Eur J Biochem 241, 779-786 (1996).


Amino acid sequences of mitochondrial targeting sequences used in GFP targeting experiments as seen in Supplementary Figure S2.

A.
>preTPI-GAPDH-GFP (Blastocystis) (OAO12326)
MLSRSSVIARSFGSAARKL

>prePGK-GFP (Blastocystis) (OAO15536)
MLSAFSKRLFSTGRTVN

B.
>preTPI-GAPDH-GFP (Phaeodactylum) (NCBI AF063804)
MLASSRTAAASVQRMSSRAFHASSLTEARKFFVGGNWKCNGS

>prePGK-GFP (Phaeodactylum) (JGI 48983)
MLFRMLTSTALRRSPVTTSLTCCCKANAFAVRIRSFHAAPVIQAKMTVEQLAQQ

>prePGM-GFP (Phaeodactylum) (JGI 33839)
MFAVSRSSFLLATRVKTLRSFAAVQAADKHTLVLLRHGESTWNLENKFTGWYDCP

>preENO-GFP (Phaeodactylum) (JGI 1572)
MMWSRPVLRRNISTTRASSSSRRFLSAITGVHGREIDSRGNPTVEVDVTTAQGT

>prePK-GFP (Phaeodactylum) (JGI 49002)
MMRSFLRHAQGRACAQHLRTIGTLRLNQMPVTGA

C.
>preTPI-GAPDH-GFP (Phytophthora infestans) (NCBI X64537)
MSFRQVFKTQARHMSSSSRKFFVGGNWKCNGSLGQAQELVGMLNTA

>prePGM-GFP (Phytophthora infestans) (PfGD Pi_011_55705_Feb05.seq)
MVLALRRPLAISSRVANRSLGMLRQQQKAMKHTHTLVLIRHGESEWNKKNLFTGWYDVQLSEKGNKEA

>prePK-GFP (Achlya bisexualis) (NCBI AAU81895)
MLARSLRSRAVRSFARGLSNKPSKNDAFSMT

>preTPI-GAPDH-GFP (Saccharina latissima) (NCBI ABU96661)
MFSAALSAAGAKAPSAARGFASSASRMSGRKFFVGGNWKCNGS
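Mitochondrial targeting presequences are typically enriched in basic and hydroxylated residues and largely lack acidic residues. As a rough illustration only, the sketch below counts these residue classes in the two Blastocystis presequences listed under A. The function name is hypothetical, and the "net charge" is a deliberate simplification (Arg and Lys minus Asp and Glu, ignoring His and the termini); it is not a substitute for the TargetP and MitoProt predictions used in this study.

```python
# Presequences copied from the Blastocystis constructs listed under A.
PRESEQUENCES = {
    "preTPI-GAPDH-GFP": "MLSRSSVIARSFGSAARKL",
    "prePGK-GFP": "MLSAFSKRLFSTGRTVN",
}


def charge_summary(seq):
    """Count basic (R, K), acidic (D, E) and hydroxylated (S, T) residues."""
    basic = sum(seq.count(aa) for aa in "RK")
    acidic = sum(seq.count(aa) for aa in "DE")
    hydroxylated = sum(seq.count(aa) for aa in "ST")
    return {
        "basic": basic,
        "acidic": acidic,
        "hydroxylated": hydroxylated,
        # Simplified net charge: basic minus acidic residues only.
        "net_charge": basic - acidic,
    }


for name, seq in PRESEQUENCES.items():
    print(name, len(seq), charge_summary(seq))
```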

Phaeodactylum tricornutum amino acid sequences used in GFP targeting experiments as seen in Supplementary Figure S4.

>preTPI_50738 plastid (pre-sequence) (JGI 50738) plastid
MTGDSTSLLDLISPDRERPQRKEPSRWIAFSVFPFVRFIPEAFATRLPYSIVMKFLALSVAALISSATAFAPTFR
GSPASTTASTTSLAARKPFISGNWKLN

>preTPI_18228 plastid (pre-sequence) (JGI 18228) plastid
MKFLALSVAALISSATAFAPTFRGSPASTTASTTSLAARKPFISGNWKLN

>TPI_54738 cytosol (242 Amino acids) (JGI 54738) cytosol
MPRPDGSSTPAAEGERKYLVAGNWKCNGTLASNEELVKTFNEAGPIPSNVEVAICCPSLYLPQLLSSLRDDIQIG
AQDCGVNDKNGAFTGEIGAFQIKDIGCDWVIIGHSERRDGFEMPGETPDLCAKKTRVAIDAGLKVMFCIGEKKEQ
REDGTTMDVCASQLEPLAAVLTESDWSSIAIAYEPVWAIGTGLTATPEMAQETHASIRDWISQNVSADVAGKVRI
QYGGSMKGANAKDLLEQ

>Gapdh3_23598 cytosol (full length) (JGI 23598) cytosol
MPVKCLVNGFGRIGRLCFRYAWDDPELEIVHVNDVCSCESAAYLVQYDSVHGTWSKSVVAAEDSQSFTVDGKLVT
FSQEKDFTKIDFASLGVDMVMECTGKFLTVKTLQPYFGMGVKQVVVSAPVKEDGALNVVLGCNHQKLTTDHTLVT
NASCTTNCLAPVVKVIQENFGIKHGCITTIHDVTGTQTLVDMPNTKKSDLRRARSGMTNLCPTSTGSATAIVEIY
PELKGKLNGLAVRVPLLNASLTDCVFEVNKEVTVEEVNAALKKASESGPLKGILGYETKPLVSTDYTNDTRSSII
DALSTQVIDKTMIKIYAWYDNEAGYSKRMAELCNIVAAMNITGQEPSFKYE

>prePGK_29157 plastid (pre-sequence) (JGI 29157) plastid
MKFVQAAIFALAASASTTAAFAPAKTFGVRSFAP

>PGK_51125 cytosol (193 Amino acids) (JGI 51125) cytosol
MASDMPKLAPGATRKRNVFDVIEALQKQSAKTILVRVDFNVPMNSDGKITDDSRIRGALPTIKAVVNAKCNAVLV
SHMGRPKLVQKAADDEETRQQRHELSLKPVADHLAKLLDQEVLFGDDCLHAQSTIRELPAEGGGVCLLENLRFYK
EEEKNGEDFRKTLASYADGYVNDAFGTSHRAHASVAGVPALLP

>PGM_43812 unclear (130 Amino acids) (JGI 43812) unclear localization, cytosol plus ER or mitochondria
MGRRTTHRRLFPALALIFAELIMSTAYSLAWRTSAACWTTTTGTACSRSRIATTRKVRRSRPNPCNPWHPVAFSF
FGTSSRRCRSSGSLYGEIDADAEGPDSPSADDRSVPTPSTTSSLSRSETLPPIPP

>PGM_43253 mitochondria (112 Amino acids) (JGI 43253) mitochondria
MASITLNRSRFTMITAIGMSHPRSHGTPRSVLLLLLRQFSSKDWNSKGTDSASRSGPVLIKKTPRSAAAAKLRST
APSLNGSTTDSTTGAVKHHPAHHYINGGTPCDPAPPP

>PGM_26201 unclear (408 Amino acids) (JGI 26201) unclear, mitochondria or ER
MLVPHPSGKAMRGLREEACRFLSSRSFGATLDATHARMGGNFVNSVQACNNGKRVCWHQRNRRTFSVVATQRNGI
GHRTTQGETEAVPRRHFTSLNQSTPFQLCFLRHGQSTWNRDNIFIGWTDTPLTDDGVLEARVAGKMLHKSGIRFD
EVHTSLLRRSIRTTNLALMELGQEYLPVHKHWRLNERCYGDLVGKNKKEVVMQHGADQVKRWRRSYDEPPPPMSD
DHPYHPARDPRYQNILDELPKSESLKNTVERSSLYWDEVLAPALREGKTLLVVGHENNLRSLLMRLEDIAPEDII
NLSLPRAVPLAYRLDENLKPLPREDGKLDEATGFLKGTWLGGDQAVSEILDRDHKQVYDTAITTNLEIGQDREKW
NNWMEFIMGKPSAKQKRIGGDKQNGFAGGAAIP

>PGM_42857 plastid (175 Amino acids) (JGI 42857) plastid
MAMDAITMRKLTLTMAVLLIVSGCEALLVFLPRRSPFTVISTRSSTNSAGLLHLHSKANESDGLEGKWIKVSSAL
DEGVDAANEEKEGAFLSSDYNSMNGYNTDLNRYHTMLRERGTFVEALFGQRRSFVIAKRDGDENEDGWRDMRRQR
RPLWKHLLRLPISVAKNVLWKPPQP

>PGM_35164 mitochondria (351 Amino acids) (JGI 35164) mitochondria
MRIPCRRLHPQLSAKGTRRPFQYSSSNSIDDQHRSSHLDASPGRHIVVRHGQSVWNKGSNQLERFTGWTNVGLSE
NGQRQAVQAARKLHGYSIDCAYVSLLQRSQATLRLMLEELNDQGRRSEGYDDLTTDIPVISSWRLNERHYGALTG
QSKLQAEQLFGKAQLDLWRYSYKIPPPPMDPDTFSSWKHQAHCQMATYIHHRHNRSRVIEKGNSVWDSSRAVMPR
SEAFFDVLQRIVPLWKYGIAPRLARGETVLLVGHANSVKALLCLLDPHTVTPTSIGALKIPNTTPLVYQLIRDYP
GASTSVPASFPVLGDLRVVIPPSNSTRYPLSGTWLEDPPVARDAGTAVEEP

>PGM_51298 cytosol (131 Amino acids) (JGI 51298) cytosol. If a shorter version, starting from the second Methionine is used, a localization at the plastid as a blob like structure is the result (data not shown).
MCDESRQTATPMIHFEIFRFSDPLVRQDRQAPHLSLTSTVKILSDSNLHKLFIMMLRSLVLALSWTVASAFTHQS
TFWGRTAVTNSRILSLSPPTDASSSALCMKYMLVLVRHGESTWNKENRFTGWVDCP

>preENO_56468 cytosol (66 Amino acids) (JGI56468) cytosol. Start Methionine from GFP was not included in the construct.
MLFKPSTLLALFAVAGTTLAFAPRSTTTPLTSTTRGSASSSVTTLAMSGITGVLAREILDSRGNPV

>ENO_56468 plastid (443 Amino acids) (JGI56468) plastid
MLFKPSTLLALFAVAGTTLAFAPRSTTTPLTSTTRGSASSSVTTLAMSGITGVLAREILDSRGNPTVEVEVTTAD
GVFRASVPSGASTDAYEAVELRDGGDRYMGKGVLQAVQNVNDILGPAVMGMDPVGQGSVDDVMLELDGTPNKANL
GANAILGVSLAVAKAGAAAKKVPLYRHFADLAGNNLDTYTMPVPCFNVINGGSHAGNKLAFQEYFVIPTGAKSFA
EAMQIGCEVYHTLGKIIKAKFGGDATLIGDEGGFAPPCDNREGCELIMEAISKAGYDGKCKIGLDVAASEFKVKG
KDEYDLDFKYDGDIVSGEELGNLYQSLAADFPIVTIEDPFDEDDWENWSKFTTKNGATFQVVGDDLTVTNIEKIE
RAIDEKACTCLLLKVNQIGSISESIAAVTKAKKAGWGVMTSHRSGETEDTYIADLAVGLCTGQIKTGA

>PK_49098 cytosol (507 Amino acids) (JGI: 49098) cytosol
MTASQTKITASGPELRGANITLDTIMKKTDVSTRQTKIVCTLGPACWEVEQLESLIDAGLSIARFNFSHGDHEGH
KACLDRLRQAADHKKKHVAVMLDTKGPEIRSGFFADGAKKISLVKGETIVLTSDYSFKGDKHKLACSYPVLAKSV
TPGQQILVADGSLVLTVLSCDEAAGEVSCRIENNAGIGERKNMNLPGVIVDLPTLTDKDIDDIQNWGIVNDIDFI
AASFVRKASDVHKIREVLGEKGKGIKIICKIENQEGMDNYDEILEATDAIMVARGDLGMEIPPEKVFLAQKMMIR

QANIAGKPVVTATQMLESMITNPRPTRAECSDVANAVLDGTDCVMLSGETANGEYPTAAVTIMSETCCEAEGAQN
TNMLYQAVRNSTLSQYGILSTSESIASSAAKTAIDVGAKAIIVCSESGMTATQVAKFRPGRPIHVLTHDVRVARQ
CSGYLRGASVEVISSMDQMDPAIDAYIERCKANGKAVAGDAFVVVTGTVAQRGVTNA

>PK_56445 cytosol (538 Amino acids) (JGI 56445) cytosol
MSLSQSSDVPILAGGFITLDTVKHPTNTINRRTKIVCTIGPACWNVDQLEILIESGMNVARFNFSHGDHAGHGAV
LERVRQAAQNKGRNIAILLDTKGPEIRTGFFANGASKIELVKGETIVLTSDYKFKGDQHKLACSYPALAQSVTQG
QQILVADGSLVLTVLQTDEAAGEVSCRIDNNASMGERKNMNLPGVKVDLPTFTEKDVDDIVNFGIKHKVDFIAAS
FVRKQSDVANLRQLLAENGGQQIKICCKIENQEGLENYDEILQATDSIMVARGDLGMEIPPAKVFLAQKMMIREA
NIAGKPVITATQMLESMINNPRPTRAECSDVANAVLDGTDCVMLSGETANGPYFEEAVKVMARTCCEAENSRNYN
SLYSAVRSSVMAKYGSVPPEESLASSAVKTAIDVNARLILVLSESGMTAGYVSKFRPERAIVCLTPSDAVARQTG
GILKGVHSYVVDNLDNTEELIAETGVEAVKAGIASVGDLMVVVSGTLYGIGKNNQVRVSVIEAPEGTVKETPAAM
KRLVSFVYAADEI

>PK_45997 cytosol (533 Amino acids) (JGI 45997) cytosol
MLSSTSTIPKLDGEVVTLSIIKKPTETKKRRTKIICTLGPACWSEEGLGQLMDAGMNVARFNFSHGDHEGHGKVL
ERLRKVAKEKKRNIAVLLDTKGPEIRTGFFADGIDKINLSKGDTIVLTTDYDFKGDSKRLACSYPTLAKSVTQGQ
AILIADGSLVLTVLSIDTANNEVQCRVENNASIGERKNMNLPGVVVDLPTFTERDVNDIVNFGIKSKVDFIAASF
VRKGSDVTNLRKLLADNGGPQIKIICKIENQEGLENYGDILEHTDAIMVARGDLGMEIPSSKVFLAQKYMIREAN
VAGKPVVTATQMLESMVTNPRPTRAECSDVANAVYDGTDAVMLSGETANGPHFEKAVLVMARTCCEAESSRNYNL
LFQSVRNSIVIARGGLSTGESMASSAVKSALDIEAKLIVVMSETGKMGNYVAKFRPGLSVLCMTPNETAARQASG
LLLGMHTVVVDSLEKSEELVEELNYELVQSNFLKPGDKMVVIAGRMAGMKEQLRIVTLDEGKSYGHIVSGTSFFF
ERTRLLDF

>PK_56172 mitochondria (86 Amino acids) (JGI: 56172) mitochondria
MFRRAVLSLSTRAIRTPVPCSVARGDASQVRSLAQTTFYLPDPADRSQDVHNRGNLQLSKIVATIGPTSEQEEPL
RLVTDAGMRIM