Accepted Manuscript

Title: The Cytoskeleton of Parabasalian Parasites Comprises Proteins that Share Properties Common to Intermediate Filament Proteins

Author: Harald Preisner Eli Levy Karin Gereon Poschmann Kai St¨uhler Tal Pupko Sven B. Gould

PII: S1434-4610(16)30051-7 DOI: http://dx.doi.org/doi:10.1016/j.protis.2016.09.001 Reference: PROTIS 25547

To appear in:

Received date: 10-2-2016 Revised date: 25-8-2016 Accepted date: 2-9-2016

Please cite this article as: Preisner, H., Karin, E.L., Poschmann, G., St¨uhler, K., Pupko, T., Gould, S.B.,The Cytoskeleton of Parabasalian Parasites Comprises Proteins that Share Properties Common to Intermediate Filament Proteins, Protist (2016), http://dx.doi.org/10.1016/j.protis.2016.09.001

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. Manuscript

1 ORIGINAL PAPER 1 2 2 3 4 5 3 The Cytoskeleton of Parabasalian Parasites Comprises Proteins that Share Properties 6 4 Common to Intermediate Filament Proteins 7 8 9 5 10 a b c c b a,1 11 6 Harald Preisner , Eli Levy Karin , Gereon Poschmann , Kai Stühler , Tal Pupko , and Sven B. Gould 12 13 7 14 15 8 aInstitute for Molecular Evolution, Heinrich-Heine-University, Düsseldorf, Germany 16 17 9 bDepartment of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, 18 19 10 Tel-Aviv, Israel 20 c 21 11 Molecular Proteomics Laboratory (MPL), BMFZ, Heinrich-Heine-University, Düsseldorf, Germany 22 23 12 24 25 13 Submitted February 10, 2016; Accepted September 2, 2016 26 27 14 Monitoring Editor: George B. Witman 28 29 15 30 31 16 32 33 17 34 35 18 36 37 19 38 39 20 Runnng title: Cytoskeleton of Parabasalian Parasites 40 41

42 21 43 44 22 45 46 23 47 48 24 Accepted Manuscript 49 50 25 51 52 53 26 54 55 27 56 57 28 58 59 29 60 61 30 Corresponding author; e-mail [email protected] (S. B. Gould). 62 63 1 64 Page 1 of 33 65 31 1 2 32 Certain protist lineages bear cytoskeletal structures that are germane to them and define their 3 4 33 individual group. are excavate parasites united by a unique cytoskeletal 5 34 framework, which includes tubulin-based structures such as the pelta and axostyle, but also 6 7 35 other filaments such as the striated costa whose protein composition remains unknown. We 8 9 36 determined the proteome of the detergent-resistant cytoskeleton of Tetratrichomonas gallinarum. 10 37 203 proteins with homology to vaginalis were identified, which contain significantly 11 12 38 more long coiled-coil regions than control protein sets. Five candidates were shown to associate 13 14 39 with previously described cytoskeletal structures including the costa and the expression of a 15 40 single T. vaginalis protein in T. gallinarum induced the formation of accumulated, striated 16 17 41 filaments. Our data suggests that filament-forming proteins of protists other than actin and 18 19 42 tubulin share common structural properties with metazoan intermediate filament proteins, 20 43 while not being homologous. These filament-forming proteins might have evolved many times 21 22 44 independently in , or simultaneously in a common ancestor but with different 23 24 45 evolutionary trajectories downstream in different phyla. The broad variety of filament-forming 25 46 proteins uncovered, and with no homologs outside of the Trichomonadida, once more highlights 26 27 47 the diverse nature of eukaryotic proteins with the ability to form unique cytoskeletal filaments. 28 29 48 30 31 49 Key words: Cytoskeleton; intermediate filament proteins; coiled-coils; convergent evolution; 32 33 50 Trichomonadida; . 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Accepted Manuscript 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 2 64 Page 2 of 33 65 51 Introduction 1 2 52 3 4 53 The compartmentalization of the eukaryotic cell and the dynamic nature of the endomembrane system 5 6 54 rest upon an elaborate cytoskeleton (Koonin 2010). Actin and tubulin, the major components of the 7 8 55 cytoskeleton, are well conserved in sequence and function among all eukaryotic supergroups (Kusdian 9 56 et al. 2013; Wickstead and Gull 2011). In addition to actin and tubulin, the eukaryotic cytoskeleton 10 11 57 includes an extended suite of proteins that form a large range of morphologically distinct filaments 12 13 58 that in metazoa are collectively referred to as intermediate filament (IF) proteins (Coulombe and 14 59 Wong 2004; Goldman et al. 2012; Herrmann and Aebi 2004). Individual IF proteins can assemble into 15 16 60 higher-order homo- or heteromultimers through long repetitive core regions (Herrmann et al. 2007). 17 18 61 These regions act as a kind of cellular Velcro (Rose et al. 2005), which mediate the formation of 19 62 dimeric and trimeric coiled-coils. The evolutionary origin and radiation of metazoan IF proteins is not 20 21 63 well understood (Herrmann and Strelkov 2011), especially due to a poor conservation in terms of 22 23 64 primary sequence. In protists the situation is more involved. 24 25 65 Many proteins have been identified in a variety of protists that share common structural 26 27 66 properties, but no homology, with metazoan IF proteins (Fleury-Aubusson 2003; Roberts 1987). An 28 67 early identified example are the articulins, the most abundant proteins of the euglenoid membrane 29 30 68 skeleton that are characterized by a specific and highly-repetitive VPV-motif (Huttenlauch et al. 1998; 31 32 69 Marrs and Bouck 1992). Another group of protists, the alveolates, encode the alveolin proteins (Gould 33 70 et al. 2008). Similar to articulins, these proteins are characterized by a repetitive and charged motif, 34 35 71 EKIVEVPV, and the proteins themselves form a filamentous network that supports the alveolar sacs 36 37 72 residing underneath the plasma membrane of all alveolates (Anderson-White et al. 2011; Bullen et al. 38 73 2009; El-Haddad et al. 2013; Mann and Beckers 2001). A third example are the epiplasmins of ciliates 39 40 74 that form a large multigene family. This filament-forming protein family is very diverse, reflecting a 41 42 75 rapid evolution fueled by whole genome duplications (Damaj et al. 2009). Epiplasmins are 43 76 characterized by a conserved central region consisting of a repetitive consensus motif that is 44 45 77 [ERK]xx[VILT]EY[VIY], and which is flanked by additional repetitive sequence motifs such as PVQ- 46 47 78 and Y-rich domains (Coffe et al. 1996; Damaj et al. 2009). Giardins are filament-forming proteins of 48 79 the excavate parasiteAccepted lamblia, which are associated Manuscript with the microribbons (Crossley and 49 50 80 Holberton 1983). This excavate parasite has an elaborate actin cytoskeleton that, surprisingly, is 51 52 81 organized in the absence of canonical actin-binding proteins (Paredez et al. 2011). Others include 53 82 centrin and spasmin (Gogendeau et al. 2007; Vigues et al. 1984) or the tetrins (Brimmer and Weber 54 55 83 2000; Clerot et al. 2001). None of the mentioned protist proteins that form or associate with filaments 56 57 84 share any obvious sequence homology with metazoan IFs. These proteins are, however, characterized 58 85 by repetitive motifs that are biased in terms of their amino acid composition, a property also known 59 60 86 from metazoan intermediate filament proteins (Fleury-Aubusson 2003; Gould et al. 2011). Such 61 62 63 3 64 Page 3 of 33 65 87 proteins are usually restricted to a specific sub-cellular structure and often the eukaryotic lineage in 1 88 which they are found. 2 3 4 89 Metazoan genomes encode a large variety of filament-forming proteins of the IF family, 5 90 including the well-known examples lamin, vimentin and keratin (Coulombe and Wong 2004; Fuchs 6 7 91 and Weber 1994). It is common perception that IF proteins are restricted to metazoa (Herrmann and 8 9 92 Strelkov 2011; Peter and Stick 2015) and that an early duplication and modification of the lamin gene 10 93 in the bilateria, and the loss of the nuclear localization signal in one of the copies, might be the origin 11 12 94 of many extant cytosolic IF proteins in these organisms (Erber et al. 1999; Kollmar 2015; Peter and 13 14 95 Stick 2015). The identification of lamin-encoding genes in a range of phylogenetically distant 15 96 eukaryotic groups (Kollmar 2015) and an earlier hydrophobic cluster analysis (Bouchard et al. 2001), 16 17 97 however, indicate an evolutionary early origin of at least some kind of “proto-IF protein“ in 18 19 98 eukaryotes. A few prokaryotic IF proteins have been identified, too, such as CreS from Caulobacter, 20 99 CfpA and Scc from spirochetes, and AglZ from Myxococcus xanthus, but with their evolutionary 21 22 100 origin not always fully resolved. True homologs of IF proteins other than lamin appear absent from 23 24 101 sequenced protist genomes. Hence, the question about the nature of proteins that support the elaborate 25 102 cytoskeleton of protists, which in terms of morphological and species diversity outnumber metazoa to 26 27 103 a considerable degree, remains open. 28 29 104 The Parabasalia form a monophyletic group within the excavates and they are characterized by 30 31 105 unique lineage-specific cytoskeletal structures (Brugerolle 1991; Cepicka et al. 2010; Noda et al. 32 33 106 2012). Parabasalia include a long list of ecologically and medically important protists, which, among 34 107 others, include the majority of flagellated gut symbionts of termites (Brune and Dietrich 2015) and 35 36 108 human parasites such as Trichomonas vaginalis or (Hirt and Sherrard 2015; 37 38 109 Kusdian and Gould 2014). The parabasalian cytoskeleton typically consists of several characteristic 39 110 features that include the pelta, the axostyle, the costa and the karyomastigont (Cepicka et al. 2010). 40 41 111 Both the axostyle and the pelta are formed by highly organized microtubules (Benchimol 2004; Rosa 42 43 112 et al. 2013). The karyomastigont comprises the basal bodies, the microtubule-organizing center, the 44 113 flagella and the filaments that connect these to the nucleus (Brugerolle 1991). The costa is a rod-like 45 46 114 rigid structure exclusively found in the Parabasalia, albeit some parabasalian species appear to lack it 47 48 115 (Cepicka et al. 2010)Accepted. This striated filament (or striated Manuscript fiber) protects the plasma membrane against 49 116 the sheering forces generated by the beating of the recurrent flagella that is attached to the undulating 50 51 117 membrane (Kulda et al. 1986; Viscogliosi and Brugerolle 1994). In comparison to other parabasal 52 53 118 filaments, the costa has been described as the longest and thickest of the striated filaments of 54 119 T. vaginalis (Lee et al. 2009) and Tritrichomonas foetus (Rosa et al. 2013). Based on different striation 55 56 120 patterns, the costa was divided into two sub-types, the A- and B-type (Honigberg et al. 1971). While 57 58 121 morphologically distinct, it appears that the striated filaments of both types of costa are formed by the 59 122 same proteins that are, however, unique to Trichomonadidae and whose identity remains unknown 60 61 123 (Viscogliosi and Brugerolle 1994). 62 63 4 64 Page 4 of 33 65 124 A proteome study of the detergent-resistant pellicle of the ciliate Tetrahymena showed that 1 125 many of the identified proteins shared characteristics with metazoan IF proteins, albeit, again, without 2 3 126 any significant sequence similarity to them (Gould et al. 2011). Many of the identified pellicle proteins 4 5 127 were united by charged repetitive motifs with a potential to form long coiled-coil regions. Subsequent 6 128 characterization of individual candidates with similar characteristics from related apicomplexans 7 8 129 confirmed the association of such proteins with unique cytoskeletal structures (Katris et al. 2014; 9 10 130 Suvorova et al. 2015; Tran et al. 2012). Those studies motivated us to test whether proteins with 11 131 similar characteristics could be observed for the unique cytoskeleton of Trichomonadidae. If so, it 12 13 132 would (i) provide evidence for the presence of cytoskeletal proteins that share properties with 14 15 133 metazoan IF proteins in yet another eukaryotic group — next to alveolates — and (ii) offer a more 16 134 exhaustive set of proteins to study the evolution of cytoskeletal proteins, which form and associate 17 18 135 with non actin and tubulin-based filaments. 19 20 136 Here we present the proteomic profiling of the detergent-resistant components of the 21 22 137 cytoskeleton of T. gallinarum (Fig. 1A). We demonstrate that the majority of identified proteins lack 23 24 138 homologs outside of the Trichomonadidae and are unique to this group of protists. Based on a 25 139 transcriptome assembly of T. gallinarum and mass spectrometry analyses of its extracted cytoskeleton, 26 27 140 we identified homologs of T. vaginalis and localized five such homologs in the parasite that invades 28 29 141 the human urogenital tract. Cytoskeleton proteins contain significantly more repetitive elements, such 30 142 as coiled-coil motifs, than cytoskeleton-unrelated proteins. Most importantly, the unique cytoskeletal 31 32 143 structure of the parabasalian lineage comprises proteins that share features described for other 33 34 144 filament-forming proteins of different protists. Together they share features reminiscent of metazoan 35 145 IF proteins, but elude themselves from phylogenetic analysis due to virtually no sequence 36 37 146 conservation. 38 39 147 40 41 42 148 Results 43 44 149 45 46 150 Proteomic Profiling of the Tetratrichomonas gallinarum Cytoskeleton 47 48 151 Protocols for the Acceptedisolation of the detergent-resistant cytoskeleton Manuscript already exist for T. gallinarum, T. 49 50 152 vaginalis, Pentatrichomonas hominis and Tritrichomonas foetus (de Souza and da Cunha-e-Silva 51 52 153 2003; Rosa et al. 2013; Viscogliosi and Brugerolle 1994). However, even after several rounds of 53 154 attempts trying to optimize and adapt the protocols to isolate the cytoskeleton of T. vaginalis, we were 54 55 155 unable to extract a sufficient amount of sample that, based on visual inspection and Western blot 56 57 156 analysis, we deemed suitable for electrospray ionization tandem mass spectrometry (ESI-MS) analysis. 58 157 Similar issues regarding T. vaginalis were encountered in previous studies (Viscogliosi and Brugerolle 59 60 158 1994) and it was hence decided to isolate the cytoskeleton of T. gallinarum instead. 61 62 63 5 64 Page 5 of 33 65 159 The isolation of the T. gallinarum cytoskeleton leads to a highly-enriched fraction of forked 1 160 structures that resemble the pelta (at the top of the fork), from which the costa and the axostyle 2 3 161 individually emerge (Fig. 1B; Supplementary Material Fig. S1A). In their appearance, these structures 4 5 162 look identical to those described by Viscogliosi and Brugerolle (1994). A silver-stained SDS-PAGE of 6 163 the sample reveals a rich proteinaceous complexity (Fig. 1D). The most abundant protein migrating at 7 8 164 55 kDa, which most likely represents tubulin with a predicted mass of 50.1 kDa (for TVAG_467840) 9 10 165 (Fig. 1D). There are several more intense and distinct bands within the range of 100-140 kDa, whose 11 166 migration pattern largely matches that of a previous analysis (Viscogliosi and Brugerolle 1994). That 12 13 167 study had isolated the cytoskeleton, too, but in the absence of genome data at that time the identity of 14 15 168 the proteins remained unknown. The most abundant proteins in Trichomonas species are those of the 16 169 substrate-level phosphorylation pathway of the hydrogenosomes and they are a common source of 17 18 170 contamination (Garg et al. 2015; Rada et al. 2011; Twu et al. 2013). A multiplex fluorescent blot 19 20 171 confirmed the absence of the hydrogenosomal marker protein SCS (succinyl-CoA synthetase) in the 21 172 isolated fraction and the presence of tubulin as a marker for the main component of the cytoskeleton 22 23 173 (Fig. 1C). By liquid chromatography ESI-MS, three individual replicates of cytoskeleton isolated from 24 25 174 three separate T. gallinarum cultures were analyzed. For the identification of the proteins through ESI- 26 175 MS, however, it was first necessary to generate a de novo transcriptome library of T. gallinarum, 27 28 176 because no sufficient amount of sequence data for this organism were yet available. 29 30 177 RNA sequencing of T. gallinarum M3 generated a total of 20,982,889 reads, of which 31 32 178 20,638,776 passed quality filtering. The subsequent assembly yielded 64,756 contigs (N50 length of 33 34 179 694), which were filtered for isoforms and then screened for open reading frames (ORFs). A total of 35 180 37,740 ORFs were retrieved and of these, 26,130 share homology to genes (and with 11,268 unique 36 37 181 matches) of the sequenced genome of T. vaginalis strain G3 (TrichDB, v2.0) (Aurrecoechea et al. 38 39 182 2009). 40 41 183 Using the translated transcriptome assembly of T. gallinarum as a source (Supplementary 42 43 184 Material Table S1), we identified 582 proteins present with a minimum of two peptides per protein in 44 185 each of the three replicate cytoskeleton samples analyzed (Supplementary Material Table S2). 45 46 186 Homologs for these proteins were identified in T. vaginalis using reciprocal best BLAST search. 47 48 187 Results were manuallyAccepted curated by filtering them according Manuscript to their TrichDB annotation to omit 49 188 obvious cytoskeleton-unrelated candidates such as ribosomal proteins (Supplementary Material Table 50 51 189 S2). This yielded 203 potential cytoskeleton proteins in T. vaginalis. The 203 proteins were first sorted 52 53 190 according to their EuKaryotic Orthologous Groups association (KOGs; Koonin et al. 2004; Tatusov et 54 191 al. 2003). For 113 proteins no class was available and for an additional 30, the predicted function was 55 56 192 either of unknown nature or just a general prediction of function (Supplementary Material Fig. S4). 57 58 193 The remaining proteins could be assigned to three major categories: ‘cellular processes and signalling’ 59 194 (47 proteins including actin and tubulin), ‘information storage and processing’ (8 proteins) and 60 61 62 63 6 64 Page 6 of 33 65 195 ‘metabolism’ (7 proteins). Note that three proteins were assigned to more than one sub-category (Fig. 1 196 S4). 2 3 4 197 Proteins that were distinctly identified with a high number of peptide spectrum matches (PSM) 5 198 were, as already indicated by the silver stained SDS-PAGE (Fig. 1D), the tubulin beta- (TEGb007706; 6 7 199 corresponding to TVAG_467840) and epsilon chain (TEGb007357; corresponding to TVAG_008680) 8 9 200 with PSM values of 1,228 and 456, respectively, and a total coverage of 75% for each protein. Next to 10 201 tubulin, the proteins TEGb005933 (TVAG_339450), TEGb003426 (TVAG_474360), TEGb019317 11 12 202 (TVAG_117060), TEGb017573 (TVAG_030160) and TEGb012599 (TVAG_059360), were detected 13 14 203 with PSM values ranging between 111 and 330 and sequence coverage between 63 and 76% (Table 1; 15 204 Supplementary Material Table S2). To find homologs, a broader BLAST search outside of the 16 17 205 Trichomonadidae against RefSeq was performed. For TVAG_339450, TVAG_474360 and 18 -10 19 206 TVAG_117060 no significant (cutoff e-value of ≤ 1e ) homologs could be found. For 20 207 TVAG_030160, homologs encoding WD40 domains were identified in a range of organisms and with 21 22 208 e-values ≤ 1e-138. Among them were uncharacterized protist proteins such as TTHERM_01094880 23 -141 24 209 (Tetrahymena thermophila; e-value 8e ), but also ones of Angomonas deanei, a trypanosomatid 25 210 parasite (EPY27992.1, e-value 2e-134) or the alga Chlamydomonas reinhardtii (XP_001690930; e- 26 27 211 value 2e-137) that are all annotated as “flagella-associated” proteins. Also for TVAG_059360, BLAST 28 -150 29 212 hits to proteins from diverse organisms were found (all e-values ≤ 2e ), many of which were 30 213 annotated as “sperm-associated antigen”. It was peculiar that all five proteins were predicted to harbor 31 32 214 many repetitive elements as predicted by SMART (Letunic et al. 2014) (Fig. 2). In addition, for 33 34 215 TVAG_339450, TVAG_474360 and TVAG_117060, coiled-coil regions were widely detected 35 216 throughout their amino acid sequences by COILS (Lupas et al. 1991) (Supplementary Material Fig. 36 37 217 S2). This prompted us to further analyze the sequence characteristics, such as amino acid composition 38 39 218 and the potential to encode coiled-coils regions, of the identified cytoskeleton proteins. 40 41 219 42 43 220 Cytoskeleton-associated Proteins Harbor an Elevated Number of Long Coiled-coil Regions 44 45 221 Elongated coiled-coil regions that occupy much of the central region of proteins are a characteristic 46 47 222 feature of most metazoan IF proteins (Herrmann et al. 2007; Rose et al. 2005) and, more generally, an 48 223 enrichment for repetitiveAccepted charged sequence motifs hasManuscript been found among cytoskeletal scaffold 49 50 224 proteins of excavates and ciliates (Elmendorf et al. 2003; Gould et al. 2011; Kloetzel et al. 2003). We 51 52 225 tested whether proteins of our isolated detergent-resistant cytoskeleton might exhibit similar sequence 53 54 226 features. We compared our core set of 203 cytoskeleton-associated proteins (CYT) to (i) a set of 301 55 227 hydrogenosome-associated proteins (HYD; Garg et al. 2015) and to (ii) a size-equivalent set of 203 56 57 228 randomly selected proteins of T. vaginalis (RDM) (Fig. 3; Supplementary Material Fig. S3). Figure 58 59 229 3A shows the distributions of the number of detected coiled-coils per protein in the cytoskeleton and 60 230 hydrogenosomal sets; these two distributions were found to be statistically different (Mann-Whitney 61 62 63 7 64 Page 7 of 33 65 231 test (MWt), p-value < 2.1e-6). The same was true for the comparison with the random set 1 232 (Supplementary Material Fig. S3A), which was found to be significant, too (MWt, p-value < 3e-12). 2 3 233 Isolated cytoskeleton proteins also differ with respect to the length of the sequence predicted to form 4 5 234 coiled-coils. The length of each coiled-coil unit within the CYT set was found to be significantly 6 235 increased as compared to the HYD dataset (MWt, p-value < 1.9e-5, Fig. 3B) and with respect to the 7 8 236 RDM dataset (MWt, p-value < 0.022, Fig. S3B). Moreover, CYT proteins were found to contain 9 10 237 significantly more repetitive motifs per protein when compared to the HYD dataset (MWt, p-value < 11 238 0.0003, Fig. 3C) and the RDM dataset (MWt, p-value < 4.8e-5, Supplementary Material Fig. S3). To 12 13 239 consider the possibility that long proteins tend to have more coiled-coils and repetitive motifs than 14 15 240 smaller ones, we conducted an additional analysis and filtered the two control datasets (RDM and 16 241 HYD) for proteins similar in length to the CYT dataset. Here, the comparison between the CYT and 17 18 242 HYD dataset was still significantly different for all three parameters tested: number of coiled-coils 19 -6 -9 20 243 (MWt, p-value < 4.8 ), length of coiled-coils (MWt, p-value < 1.3 ) and number of repetitive motifs 21 244 (MWt, p-value < 0.0001). The re-analysis between the CYT and RDM dataset confirmed the primary 22 23 245 result, except for the number of repetitive motifs, which now was no longer found to be significant 24 -10 25 246 (number of coiled-coils, MWt, p-value < 1.8 ; length of coiled-coils, MWt, p-value < 0. 0004; 26 247 number of repetitive motifs, MWt, p-value < 0.73). 27 28 29 248 30 31 249 Homologous Proteins of T. vaginalis Localize to Defined Filamentous Structures 32 33 250 We tested whether homologs of proteins isolated from the T. gallinarum cytoskeleton and identified 34 35 251 through the ESI-MS analysis are homologous and associated with the cytoskeletal framework of T. 36 252 vaginalis. We chose the five proteins with the highest PSM scores (Table 1) and expressed them as 37 38 253 hemagglutinin (HA)-fusion proteins in T. vaginalis. The first protein, TVAG_339450, labels a single 39 40 254 long filament-like structure that runs along the cell’s periphery, suggesting it is a costa-related protein 41 255 (Figs 4, S6-S8). As the protein does not co-localize with the tubulin marker, it can be ruled out that the 42 43 256 long filament resembles the axostyle or the recurrent flagellum that is associated with the undulating 44 45 257 membrane. 46 47 258 The second candidate, TVAG_474360, has a more complex localization pattern. It was 48 259 observed to clusterAccepted towards the apical end of the nucleus Manuscript (i.e. in the region of the pelta) and from there 49 50 260 to form or associate with filaments that appeared to envelope the nucleus (Fig. 4; Supplementary 51 52 261 Material Fig. S6). In addition, TVAG_474360 was found to cluster inside the cytosol and at the end of 53 54 262 the axostyle, perhaps as a result of overexpression, although this was not observed for any other 55 263 construct. Also, the native expression of TVAG_474360 is the highest among the selected candidates 56 57 264 (Table 1). In any case, the pelta-associated labeling is resilient and also observed among the isolated 58 59 265 cytoskeletons (Supplementary Material Fig. S7), suggesting it is associated with the centrosome-like 60 266 structure of T. vaginalis also known as the atractophore (Bricheux et al. 2007). Intriguingly, the 61 62 63 8 64 Page 8 of 33 65 267 heterologous expression of TVAG_474360 in T. gallinarum induces the formation of additional 1 268 striated filaments. A single straight and thick rod-like structure traverses the cells centrally, which 2 3 269 protrudes from the anterior- and posterior ends of the cell (Fig. 5A). Transmission electron 4 5 270 microscopy of the mutants reveal this structure to consist of an accumulation of individual striated and 6 271 thinner filaments that resemble, but occur independently, of the actual wildtype forms of the costa and 7 8 272 the parabasal filaments (Fig. 5B). 9 10 273 The third protein, TVAG_117060, labels a fork-like structure just beneath the pelta and in 11 12 274 close proximity of the nucleus and could possibly represent two parabasal filaments of the parasite 13 14 275 (Fig. 4; Supplementary Material Fig. S6). The fourth protein, TVAG_030160, co-localizes with the 15 276 axostyle, spanning from the anterior pelta end to almost the posterior end, but is excluded from the 16 17 277 axostyle’s most terminal end that protrudes the cell. Furthermore, it appears to accumulate in the pelta 18 19 278 region and somehow envelope the nucleus (Fig. 4; Supplementary Material Fig. S6). Conspicuously, 20 279 Western blot analysis indicated that this protein is present in two different forms, with a faint signal at 21 22 280 66 kDa (in accordance with its mass prediction of 55.7 kDa based on the TrichDB annotated ORF) 23 24 281 and with a dominant signal of about 110 kDa, suggesting it may dimerize (Supplementary Material 25 282 Fig. S5). The fifth protein, TVAG_059350, does not co-localize with the axostyle and appears to form 26 27 283 some sort of filaments around the nucleus and additionally some kind of thin filaments that span the 28 29 284 entire length of the cell (Fig. 4; Supplementary Material Fig. S6). 30 285 31 32 33 286 Discussion 34 35 287 36 37 288 Specialized cytoskeletal structures are often unique to individual protist groups. The apomorphic and 38 39 289 eponymous structure of the Parabasalia is the parabasal apparatus. It is part of the more intricate 40 41 290 cytoskeletal scaffold that includes the pelta, axostyle, parts of the karyomastigont system and several 42 291 other accessory rootlet filaments. A main component of the eukaryotic cytoskeleton is tubulin, but 43 44 292 microtubules — together with other well-known accessory proteins we also identified — account only 45 46 293 for the composition of the flagella, pelta and axostyle. Our study, based on the proteomic profiling of 47 294 the detergent-resistant cytoskeleton of T. gallinarum and subsequent verification in T. vaginalis, 48 Accepted Manuscript 49 295 identified dozens of proteins of (previously) unknown function that share some features characteristic 50 51 296 for metazoan IFs, albeit sharing no sequence homology with the latter. 52 53 297 Proteome profiling of the T. gallinarum cytoskeleton revealed 203 proteins that share 54 55 298 reciprocal best BLAST hit homology to proteins of T. vaginalis. Among these were homologs of well- 56 299 known cytoskeleton proteins such as actin, tubulin, centrin and dynein, all of which had high peptide 57 58 300 spectrum matches (PSM; Supplementary Material Table S2), but also many new potential cytoskeletal 59 60 301 protein candidates with previously unknown function. This observation provides indirect support for 61 302 the reliability of the protocol used for the isolation of the cytoskeleton. For the five proteins of the 62 63 9 64 Page 9 of 33 65 303 cytoskeleton with the highest PSM values next to the canonical cytoskeletal proteins (listed in Table 1), 1 304 we could show them to label filamentous structures that are, or associate with, the T. vaginalis 2 3 305 cytoskeleton. 4 5 306 TVAG_339450 is likely a major, detergent-resistant component of the costa. As a rod-like 6 7 307 structure, the costa can be clearly distinguished from the recurrent flagellum of the undulating 8 9 308 membrane that constitutes the typical microtubule organization (Benchimol et al. 2000; Delgado- 10 309 Viscogliosi et al. 1996; Rosa et al. 2013). TVAG_339450 does not co-localize with tubulin, the main 11 12 310 component of pelta, axostyle and flagella (Fig. 4; Supplementary Material Fig. S6). In comparison to 13 14 311 the parabasal filaments, the costa has been described as the longest and thickest striated filament of 15 312 T. vaginalis (Lee et al., 2009) and T. foetus (Rosa et al., 2013). This is in line with the observed 16 17 313 labeling pattern of TVAG_339450 (Fig. 4; Supplementary Material Fig. S6). Western blotting 18 19 314 confirmed the predicted mass of around 118 kDa for TVAG_339450 (Fig. S5), which is furthermore 20 315 coherent with a previous study that generated antibodies against the isolated cytoskeleton of 21 22 316 T. vaginalis (Viscogliosi and Brugerolle 1994). The antibodies decorated the isolated costa and a band 23 24 317 of 118 kDa in the accompanied Western blot analysis. This very confined localization to the costa is 25 318 also observed for our HA-tagged TVAG_339450 among the isolated cytoskeleton structures 26 27 319 (Supplementary Material Fig. S7). The main function of the costa, a semi-rigid rod, is the stabilization 28 29 320 of the undulating membrane to which the recurrent flagellum is attached (Cepicka et al. 2010; Rosa et 30 321 al. 2013). Proteins known to be specialized in buffering shearing forces and to protect against 31 32 322 mechanical stress are IF proteins (Goldman et al. 2012; Herrmann et al. 2007). IF proteins can differ 33 34 323 with regard to their sequence and biochemical properties, but are generally united by a common 35 324 architecture that is defined through long coiled-coil motifs (Coulombe and Wong 2004; Herrmann and 36 37 325 Aebi 2004). The same is true for TVAG_339450, which shows this costa protein to unite some 38 39 326 features that are characteristic for metazoan IF proteins, albeit not being homologous to any known 40 327 protein of the metazoan IF family. 41 42 43 328 The coiled-coil containing cytoskeleton proteins TVAG_474360 and TVAG_117060 did not 44 329 localize to the costa, but to other filamentous structures (Fig. 4; Supplementary Material Figs S6, S8). 45 46 330 In the case of TVAG_117060 these might be the parabasal filaments, although the localization in 47 48 331 isolated cytoskeletonAccepted structures is more punctuate in closeManuscript proximity to the pelta (Supplementary 49 332 Material Fig. S7). For TVAG_474360 the situation is more complicated. In several independent 50 51 333 experiments the protein always associated with the pelta and several defined filaments embracing the 52 53 334 entire nucleus, almost resembling a thin ring, suggesting it could act as a crosslinker between the 54 335 microtubule-based cytoskeleton and a structure of unknown nature, a function similar to that reported 55 56 336 for some IF-proteins (Eckert et al. 1982; Kalnins et al. 1985). In some cases, such crosslinking IF 57 58 337 proteins were observed in the close proximity of the nucleus (Goldman et al. 1985; Trevor et al. 1995) 59 338 and a localization similar observed to that of TVAG_474360. With a similar pattern TVAG_030160 60 61 339 co-localized with the pelta, most of the axostyle and accumulated in close proximity of the nucleus 62 63 10 64 Page 10 of 33 65 340 (Fig. 4; Supplementary Material Figs S6, S8). This protein also appears to form a stable dimer 1 341 (Supplementary Material Fig. S5), but this observation requires an analysis dedicated to the protein in 2 3 342 question. In contrast, TVAG_059360 does neither co-localize with the axostyle nor the nucleus 4 5 343 distinctly. Instead, it appears net-like around the nucleus and branches from there through the cell (Fig. 6 344 4; Supplementary Material Figs S6, S8). The latter two proteins are both not predicted to form long 7 8 345 coiled-coils, but WD40 or ARM domains (armadillo/beta-catenin like repeat), respectively (Fig. 2). 9 10 346 The functions of these two domains are rather versatile, but both are also found in proteins associated 11 347 with the cytoskeleton (Stirnimann et al. 2010; Tewari et al. 2010). 12 13 14 348 Based on their association with the nucleus, TVAG_474360, TVAG_030160 and 15 349 TVAG_059360, might be relevant for the positioning of the nucleus and its anchoring to the 16 17 350 karyomastigont system (Brugerolle 1991; Cepicka et al. 2010), but a dedicated study to unravel their 18 19 351 function, and the reason for the potential dimerization of TVAG_030160, is required. In any case, the 20 352 heterologous expression of the coiled-coil protein TVAG_474360 in T. gallinarum, and the formation 21 22 353 of many additional striated filaments as a result (Fig. 5), suggest this protein (and its homologs) to be 23 24 354 responsible for the formation of striated filaments in Trichomonadidae. Most likely this protein itself 25 355 forms those striated filaments, but it might also only act as an accessory protein which recruits others. 26 27 28 356 Independent of their function within the cytoskeleton, there is something peculiar about a few 29 357 dozen of the proteins identified. Based on a search algorithm specific for charged repeat motif proteins 30 31 358 that were found enriched among a ciliate pellicle (Gould et al. 2011), we found a similar tendency 32 33 359 among the proteins of the cytoskeleton albeit not as dominant (Fig. 3A; Supplementary Material Fig. 34 360 S3A). Such motifs, sometimes predicted to also form long coiled-coils, have been routinely identified 35 36 361 among structural scaffolding proteins of protists that form filaments, thin and thick, and these include: 37 38 362 the euglenoid articulins (Marrs and Bouck 1992), the autoantigen I/6 of brucei (Detmer 39 363 et al. 1997), SF-assemblin of the basal apparatus of green algae and many other protists (Weber et al. 40 41 364 1993), the FAZ1 (a Flagellum Attachment Zone related protein) of T. brucei (Vaughan et al. 2008), 42 43 365 the alveolins of T. thermophila (El-Haddad et al. 2013; Gould et al. 2011), the H49/calpain protein of 44 366 T. cruzi (Galetovic et al. 2011) and the 477 kDa centrosome-associated protein of T. vaginalis 45 46 367 (Bricheux et al. 2007). This is to name just a few and plenty more must be present (Dawson and 47 48 368 Paredez 2013; FleuryAccepted-Aubusson 2003; Roberts 1987 )Manuscript. While coiled-coil containing proteins are 49 369 involved in diverse processes, long and centrally-located coiled-coil regions are characteristic for IF 50 51 370 proteins (Herrmann and Strelkov 2011; Mason and Arndt 2004; Rose et al. 2005) and exactly these 52 53 371 were found enriched among proteins of the isolated cytoskeleton (Fig. 3B; Supplementary Material 54 372 Fig. S3B). 55 56 57 373 The identification of the first IF proteins stems from research on the cytoskeleton of vertebrates. 58 374 The first intermediate filaments were found in muscle cells of chick embryos (Ishikawa et al. 1968). 59 60 375 Later the name ‘intermediate filaments’ became more commonly used to indicate their general, but not 61 62 376 obligate, intermediate width between actin filaments and microtubules (Fuchs and Weber 1994). Ever 63 11 64 Page 11 of 33 65 377 since, the definition of IF proteins orbits around those families that have been identified in vertebrates. 1 378 Now consider two things: First, already among vertebrates the IF protein families are diverse and not 2 3 379 ubiquitously present, also because of the flexible architecture of coiled-coils that can deviate a lot from 4 5 380 the canonical IF heptad repeat pattern (Hicks et al. 1997; Holberton et al. 1988; Mason and Arndt 6 381 2004). Second, vertebrates make up only a very small portion of extant eukaryotic diversity (Baldauf 7 8 382 2008; Burki 2014). It is quite possible that vertebrate IF protein families represent only a small 9 10 383 fraction of the proteins that can form filamentous structures in eukaryotes, and only those that 11 384 originate from an ancient duplication of lamin genes are recognizable as a family of homologous 12 13 385 proteins inside metazoa. IF protein evolution is difficult to track across phylogenetically distant groups 14 15 386 (Bouchard et al. 2001; Fleury-Aubusson 2003; Gould et al. 2011). If so, the two case examples of the 16 387 alveolate pellicle proteome (Gould et al. 2011) and the cytoskeleton proteome of an excavate (this 17 18 388 study) — two phylogenetically independent supergroups and independent of opisthokont metazoans 19 20 389 — suggest that we can expect to find a similar diversity among the majority, if not all, individual 21 390 eukaryotic supergroups. 22 23 24 391 Our data show that several of the identified detergent-resistant cytoskeleton proteins share 25 392 features that are considered a trade mark of metazoan IF proteins. This includes the length of 26 27 393 individual coiled-coil forming motifs, the number of total coiled-coil sequences and the cytoskeleton- 28 29 394 associated scaffolding nature of the localization pattern we observe, that is their function inside the 30 395 eukaryotic cell. More than 50% (113) of the 203 proteins identified have no significant sequence 31 32 396 similarity (E-value of 10e-10) to proteins of any other organism outside of the Trichomonadida and 33 34 397 none to canonical metazoan IF protein families. The presented data cannot claim that it has uncovered 35 398 dozens of new IF proteins similar to those of metazoa per se, but it encourages to dig deeper and it 36 37 399 provides a source from where to start. Considering the morphological complexity of protists, it is hard 38 39 400 to imagine how these single cells would realize their many unique scaffolding structures, if not also 40 401 with the support of proteins that are analog to metazoan IFs. Future studies will now need to 41 42 402 characterize individual candidates and for instance demonstrate that some of these proteins can form 43 44 403 filaments autonomously in vitro and induce such in other heterologous systems like yeast. 45 46 404 We conclude that many of the discussed cytoskeleton proteins share properties described for 47 48 405 metazoan IFs, butAccepted that are at the same time specific to Manuscriptthe parabasalian lineage. They evolved either 49 50 406 rapidly, the primary sequence no longer serving as a reliable source to screen for phylogenetic 51 407 relationships, or independently. A combination of the two options is plausible and would complicate 52 53 408 the matter of identification and classification even further. Rapid evolution for the coiled-coil forming 54 55 409 domains of such proteins has been observed for syntenic genes of apicomplexan parasites (Gould et al. 56 410 2011), and the limited amount of interactions with other cytosolic proteins is thought to lift the 57 58 411 sequence constrains of such proteins in general (Fleury-Aubusson 2003). On the contrary, if the many 59 60 412 cytoskeleton proteins of these phylogenetically distant eukaryotic groups are of multiple evolutionary 61 62 63 12 64 Page 12 of 33 65 413 origins, then they provide an excellent example of extensive convergent evolution. Either way, such 1 414 observations require us to discuss the restrictive use of the term IF protein only for metazoa. 2 3 4 415 5 6 416 Methods 7 8 417 9 10 Cell cultivation: Trichomonas vaginalis FMV-1 (kindly provided by M. Benchimol, University Santa 11 418 12 419 Ursula, Rio de Janeiro, Brazil) was cultivated in Tryptone Yeast extract maltose Medium (TYM) 13 14 420 containing 2.22% (w/v) tryptose, 1.11% (w/v) yeast extract, 15 mM maltose, 9.16 mM L-cysteine, 15 421 1.25 mM L(+) ascorbic acid, 0.77 mM KH PO , 3.86 mM K HPO , 10% (v/v) heat-inactivated horse 16 2 4 2 4 17 422 serum, 0.71% (v/v) iron solution [= 1% (w/v) Fe(NH ) (SO ) x 6H O, 0.1% (w/v) 5-sulfosalicylacid)] 18 4 2 4 2 19 423 adjusted to pH 6.2 and incubated anoxically at 37°C. Tetratrichomonas gallinarum M3 isolated from 20 21 424 the caecum of Melaegris gallopavo (kindly provided by Prof. Tachezy, Department of Parasitology, 22 425 Charles University of Prague, Czech Republic) was cultivated anoxically in TYM with pH 7.2 at 23 24 426 37 °C. 25 26 427 Cytoskeleton extraction of T. gallinarum: Based on different cytoskeleton extraction 27 28 428 protocols (de Souza and da Cunha-e-Silva 2003; Palm et al. 2005; Viscogliosi and Brugerolle 1994), 29 6 30 429 five hundred ml cell culture of T. gallinarum with approx. 7 x 10 cell/ml were collected by 31 430 centrifugation at 1,500 x g for 10 min at room temperature (RT). The cell pellet was washed twice in 32 33 431 Ringer solution: 0.12 M NaCl, 3.5 mM KCl, 2.0 mM CaCl2, 2.5 mM NaHCO3, pH 7.2 with 34 35 432 centrifugations at 999 x g, 10 min, RT. Washed cells were then resuspended in 30 ml ice-cold Triton 36 433 solution: 10 mM Tris base, 2 mM EDTA, 2 mM DTT, 1 mM ATP, 2 mM MgSO , 200 mM KCl, 1.5% 37 4 38 434 Triton X-100, pH 7.8 including one 1x complete Mini protease inhibitor tablet (Roche). The solution 39 40 435 was vortexed strongly for 2 min, incubated on ice for 2 min and this was repeated three times. After 41 436 cell lysis was confirmed via light microscopy the cell extract was centrifuged at 277 x g for 15 min, 42 43 437 RT. For increasing the output and purity of the isolated cytoskeleton complex, a sucrose gradient 44 45 438 centrifugation was applied. Therefore, the resulting pellet was transferred on top of 1 ml 2 M sucrose 46 439 and centrifuged at 1,000 x g, 15 min, 4 °C resulting in an upper band and a lower band. The upper 47 48 440 band was transferredAccepted on top of 1 ml 1.5 M sucrose and Manuscriptcentrifuged at 10,000 x g, 15 min, 4 °C. The 49 50 441 pellet was transferred on top of a four-step gradient containing 1 M, 1.5 M, 2 M and 2.5 M sucrose 51 442 (from top to bottom) within a 2 ml tube and centrifuged at 19,000 x g, 1 h, 4 °C. This led to three 52 53 443 different pellets along the tube. The purest and most highly concentrated cytoskeleton fraction could 54 55 444 be obtained from the upper-most pellet in the interface of 1 M and 1.5 M sucrose. This pellet was 56 57 445 washed twice in PBS with centrifugations at 12,000 x g, 5 min, 4 °C, yielding a highly concentrated 58 446 cytoskeleton fraction as evident through microscopy and Western blotting (Fig. 1). 59 60 61 62 63 13 64 Page 13 of 33 65 447 The transcriptome of T. gallinarum: RNA-Seq reads were obtained using Illumina sequencing 1 448 based on T. gallinarum M3 RNA (NCBI, accession SRA318841), which was isolated as described for 2 3 449 T. vaginalis (Woehle et al. 2014). A quality filtering step was applied to the reads so that the first nine 4 5 450 nucleotide (nt) positions were rejected according to a FastQC analysis 6 451 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc) that showed low quality for the first 9 7 8 452 base calls. Subsequently, only reads with a minimum of 25 nt were retained. In addition, all reads 9 10 453 containing 25% of low quality bases (25% of all bases with values ≤ Q15) identified by a self-written 11 454 Perl script were rejected as well. The reads were assembled via Trinity assembler (version r20131110) 12 13 455 (Grabherr et al. 2011). From all assembled contigs only the longest isoform of a candidate was 14 15 456 selected. Open reading frames (ORFs) were identified and translated into the corresponding amino 16 457 acid (aa) sequences by getorf from EMBOSS 6.6.0 (Rice et al. 2000) and a self-written Perl script was 17 18 458 used to select only the longest ORF per candidate. To define an ORF, only stop codons were 19 20 459 considered (option-find 0). Furthermore, only sequences with a minimum of 100 aa as a minimum for 21 460 protein identification were used. For those sequences the best matches with T. vaginalis annotated 22 23 461 genes were determined by using the BLAST program (version 2.2.28) (Altschul et al. 1997) in 24 25 462 combination with the database TrichDB (version 1.3) (Aurrecoechea et al. 2009) based on an e value 26 463 cutoff at ≤ 1e-10. 27 28 29 464 Protein identification by liquid chromatography electrospray ionization tandem mass 30 465 spectrometry: Cytoskeletal fractions of T. gallinarum were separated in a polyacrylamide gel (~ 4 31 32 466 mm running distance). Protein containing bands from the silver stained gel were cut out, destained, 33 34 467 reduced, alkylated with iodoacetamide and digested with trypsin (1:50 w/w Serva, Heidelberg, 35 468 Germany) overnight at 37 °C as described (Poschmann et al. 2014). After that, resulting peptides were 36 37 469 extracted from the gel and subjected to liquid chromatography in 0.1% trifluoroacetic acid. 38 39 470 Before mass spectrometric peptide identification, peptides were separated by an Ultimate 3000 40 41 471 Rapid Separation liquid chromatography system (Thermo Scientific, Dreieich, Germany) on an 42 43 472 analytical column (Acclaim PepMapRSLC, 2 µm C18 particle size, 100 Å pore size, 75 µm inner 44 473 diameter, 25 cm length, Thermo Scientific, Dreieich, Germany) over a two h gradient as described 45 46 474 earlier (Hartwig et al. 2014). Using a nano electrospray ionization source, peptides were transferred to 47 48 475 an Orbitrap Elite Acceptedhigh resolution mass spectrometer (Thermo Manuscript Scientific, Bremen, Germany) operated 49 476 in positive mode with capillary temperature set to 275 °C and a source voltage of 1.4 kV. The orbitrap 50 51 477 analyzer of the instrument was used for survey scans over a mass range from 350 – 1,700 m/z. A 52 53 478 resolution of 60,000 (at 40 m/z) was used and the target value for the automatic gain control was set to 54 479 1,000,000 and the maximum fill time to 200 ms. Fragment spectra of the 20 most intense 2+ and 3+ 55 56 480 charged peptide ions (minimal signal intensity 500) were recorded in the linear ion trap part of the 57 58 481 instrument after collision induced dissociation based fragmentation using an available mass range of 59 482 200-2,000 m/z and at a resolution of 5,400 (at 400 m/z). A maximal fill time of 300 ms and an 60 61 62 63 14 64 Page 14 of 33 65 483 automatic gain control target value of 10,000 were used for the analysis of peptide fragments and 1 484 already fragmented ions were excluded from fragmentation for 45 seconds. 2 3 4 485 Protein identification from mass spectrometric data was carried out using the MASCOT search 5 486 engine (version 2.4.1, Matrix Science, London, UK) embedded in the Proteome Discoverer 6 7 487 environment (version 1.4.1.14, Thermo Scientific, Dreieich, Germany) with standard parameters for 8 9 488 spectrum selection. Searches were carried out in a T. gallinarum specific database containing 37,740 10 489 ORFs (obtained from the transcriptome analysis) with tryptic cleavage specificity allowing a 11 12 490 maximum of one missed cleavage site. The precursor mass tolerance was set to 10 ppm, the fragment 13 14 491 mass tolerance to 0.4 Da, carbamidomethyl at cysteine as static modification and methionine oxidation 15 492 and N-terminal acetylation as variable modification. For peptide evaluation, the percolator node was 16 17 493 used with standard parameters (strict target false discovery rate 1%, validation based on q-value). Only 18 19 494 peptides passing the “high confidence” filter (1% false discovery rate) were used for protein assembly 20 495 and only proteins reported with a minimum of two peptides were considered. The mass spectrometry 21 22 496 proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner 23 24 497 repository (Vizcaino et al. 2014) with the dataset identifier PRIDE: PXD003212. 25 26 498 Protein datasets: Out of all the proteins identified by mass spectrometry, we retained only 27 28 499 those proteins supported by at least two peptides. This resulted in a dataset of 582 proteins 29 500 (Supplementary Material Table S2). Homologous proteins in T. vaginalis were next identified using 30 -10 31 501 reciprocal blast: best BLASTp hit with 25% identity and a minimum e value of ≤ 1e , without low 32 33 502 complexity filter against the latest version of TrichDB (version 2.0, March 2015) (Aurrecooechea et al. 34 503 2009). This resulted in 271 pairs of putative orthologous proteins. Subsequently, this list was manually 35 36 504 filtered to exclude cytoskeleton unrelated proteins, such as several ribosomal subunits. This filtering 37 38 505 was performed based on TrichDB annotations. A final set of 203 cytoskeleton associated proteins was 39 506 obtained. 40 41 507 For statistical purposes we established two control datasets. The first was composed of 301 42 43 508 hydrogenosomal proteins. This set was based on manually filtering the 359 proteins comprising the 44 45 509 core hydrogenosomal proteome (Garg et al. 2015) to exclude hydrogenosomal unrelated proteins, such 46 as diverse ribosomal subunits. The second set was obtained by randomly selecting 203 proteins from 47 510 48 511 TrichDB. In an additionalAccepted run, the random control datasets Manuscript was normalized for their length to match 49 50 512 those of the cytoskeleton dataset. 51 52 513 The number and length of coiled-coils within proteins were detected by NCOILS (Lupas et al., 53 54 514 1991). Repetitive motifs were detected by RADAR (Heger and Holm 2000). A repetitive motif was 55 515 considered to be at least three repetitive segments, each with a minimal length of ten amino acids, as 56 57 516 used before for identifying cytoskeleton related proteins (Gould et al. 2011). Functional annotation 58 59 517 was assigned to the 203 cytoskeleton associated cytoskeleton proteins, based on the KOG database 60 518 (Koonin et al. 2004; Tatusov et al. 2003). Protein domains of the five candidate proteins 61 62 63 15 64 Page 15 of 33 65 519 (TVAG_339450, TVAG_474360, TVAG_117060, TVAG_030160 and TVAG_059360) were 1 520 analyzed using SMART (Letunic et al., 2015). Coiled-coil motifs of TVAG_339450, TVAG_474360, 2 3 521 TVAG_117060 were identified using COILS (Lupas et al. 1991). 4 5 522 Cloning and localization analysis: The gene sequences of all analyzed proteins were amplified 6 7 523 by specific primers (Supplementary Material Table S3) with a proof-reading polymerase. PCR 8 9 524 products were ligated into the expression vector pTagVag2 (Zimorski et al. 2013) to label the proteins 10 525 with a C-terminal hemagglutinin (HA) tag. The final expression constructs were verified by standard 11 12 526 sequencing. Transfection using electroporation (Delgadillo et al. 1997) was done using approx. 2.5 x 13 8 14 527 10 cells of T. vaginalis and 30 µg expression vector. After four h of anoxic incubation at 37 °C, 15 528 geneticin (G418) was added to a final concentration of 100 µg/ml to initiate selection. 16 17 Protein probes were separated through a 8% or a 12% SDS-PAGE and blotted onto a Porablot 18 529 19 530 nitrocellulose membrane (Macherey-Nagel) via the standard preset of the Trans-Blot Turbo Transfer 20 21 531 System (BioRad). Membranes were blocked in blocking buffer: 5% milk powder + Tris-Buffered 22 23 532 Saline (TBS) for 1 h, RT. Membranes of chemiluminescence blots were incubated with a primary 24 533 monoclonal mouse anti-HA antibody (Sigma-Aldrich H9658) in blocking buffer (1:5,000) over night 25 26 534 (ON) at 4°C. After three washes, each 10 min in TBS-T (TBS + 0.1% Tween 20), membranes were 27 28 535 incubated with a secondary anti-mouse and horseradish peroxidase-conjugated antibody (produced in 29 536 rat, ImmunoPure, Pierce, ThermoFisher Scientific) in TBS-T (1:10,000)) for a minimum of 1 h at RT, 30 31 537 followed by three washes of 10 min. Prior to detection, membranes were treated with WesternBright 32 33 538 ECL spray (Advansta). The chemiluminescence reaction was analyzed with the ChemiDoc MP 34 539 System (BioRad). Membranes of multiplex fluorescent blots were incubated overnight with primary 35 36 540 antibodies (anti-HA (produced in mouse, H9658, Sigma-Aldrich) and anti-SCS (produced in rabbit, 37 38 541 Eurogentec)) in blocking buffer (1:2,000) at 4 °C. After five washes for 5 min in TBS-T, membranes 39 542 were incubated with secondary antibodies [(Alexa Fluor®488 goat anti-Rabbit IgG H+L, A-11008); 40 41 543 Alexa Fluor®594 donkey anti-Mouse IgG H+L, A-21203 (ThermoFischer Scientific)] in TBS-T 42 43 544 (1:2,000) for 2 h at RT followed by six washes of 5 min prior to immunofluorescence detection using 44 545 the ChemiDoc MP System (BioRad). 45 46 For immunofluorescence microscopy a 12 ml culture of T. vaginalis with approximately 7 x 47 546 48 547 106 cell/ml was centrifugedAccepted at 914 x g for 8 min, RT. Manuscript The cell pellet was washed twice in 1 ml 49 50 548 Phosphate Buffer Saline (PBS) and centrifuged at 550 x g for 2 min, RT. The latter centrifugation 51 52 549 setup was then maintained throughout the procedure. Cells were fixed and permeablized in 1 ml fix- 53 550 perm-solution (3.5% paraformaldehyd, 0.5% Triton X-100 in PBS) and incubated for 20 min on a tube 54 55 551 rotator, RT. Cells were collected by centrifugation and resuspended in 1 ml blocking-PBS (0.1% BSA 56 57 552 in PBS) and blocked on a tube rotator for 20 min, RT. Cells were collected by centrifugation and 58 553 resuspended in 0.5 ml primary antibody solution [(anti-HA (produced in rabbit, H6908, Sigma- 59 60 554 Aldrich) and anti-tubulin (IG10, produced in mouse, from Bricheux et al. 2007) diluted in blocking- 61 62 555 PBS, 1:1,000)] and incubated on a tube rotator, 2 h, RT. Cells were washed twice in PBS, resuspended 63 16 64 Page 16 of 33 65 556 in 0.5 ml secondary antibody solution [(Alexa Fluor®488 goat anti-Rabbit IgG H+L, A-11008 and 1 557 Alexa Fluor®594 donkey anti-Mouse IgG H+L, A-21203, ThermoFischer Scientific) diluted in 2 3 558 blocking-PBS 1:1,000)] and incubated on a tube rotator, 2 h, RT, in the dark. Cells were washed twice 4 5 559 in PBS. Washed cells were mounted in DAPI solution (Fluoroshield, Sigma) plus PBS, 1:1. Imaging 6 560 was conducted via a Nikon ECLIPSE Ti immunofluorescence microscope. 7 8 9 561 For TEM cells were pelleted at 1000 g for 10 min and washed three times with PBS [0.8% 10 562 (w/v) NaCl, 0.02% (w/v) KCl, 0.144% Na2- HPO4, 0.024% (w/v) KH2PO4; pH 7.4]. After fixation 11 12 563 over night at 4 °C in fixation buffer [2.5% (v/v) glutaraldehyde in 0.1 M Na-cacodylate buffer pH 7.3] 13 14 564 the cells were washed four times for 10 min with 0.1 M Na-cacodylate buffer pH 7.3. Post fixation 15 565 was done within two hours incubation with 2% (w/v) osmiumtetroxide diluted in 0.1 M Na-cacodylate 16 17 566 buffer pH 7.3 containing 0.8% (w/v) potassium ferricyanide III. The cells were washed again four 18 19 567 times like before and were then dehydrated by incubation with 50% (v/v), 70% (v/v), 80% (v/v), 90% 20 568 (v/v) and absolute acetone (15 min each). Impregnation was done overnight in 1:1 acetone–epon 21 22 569 mixture. The samples were polymerized in pure epon within 48 h at 60 °C. Ultra-thin sections of 23 24 570 embedded samples were collected on formvar coated nickel grids (400 square mesh) and contrasted 25 571 with subsequent incubation with saturated uranyl acetate solution and with 1% lead citrate for 5 min. 26 27 572 Pictures were obtained using a Zeiss CEM 902 operated at 80 kV equipped with a wide-angle Dual 28 29 573 Speed 2K CCD camera (TRS, Moorenweis Germany). 30 31 574 32 33 575 34 35 576 Acknowledgements 36 37 38 577 ELK is a fellow of the Edmond J. Safra Center for Bioinformatics at the Tel-Aviv University. TP and 39 578 SBG wish to thank the German-Israeli Foundation for joint funding (grant ID I-1207-264.13/2012) 40 41 579 and SBG and KS the Deutsche Forschungsgemeinschaft for additional support (CRC1208). HP and 42 43 580 SBG wish to thank Ramona Stach and Maria Handrich for their help in the laboratory. We also thank 44 581 Jan Tachezy and Marlene Benchimol for providing the strains. 45 46 47 582 48 Accepted Manuscript 49 583 References 50 51 584 Adl SM, Simpson AG, Lane CE, Lukes J, Bass D, Bowser SS, Brown MW, Burki F, Dunthorn M, 52 53 585 Hampl V, Heiss A, Hoppenrath M, Lara E, Le Gall L, Lynn DH, McManus H, Mitchell EA, 54 55 586 Mozley-Stanridge SE, Parfrey LW, Pawlowski J, Rueckert S, Shadwick RS, Schoch CL, 56 587 Smirnov A, Spiegel FW (2012) The revised classification of eukaryotes. J Eukaryot Microbiol 59: 57 58 588 429-493 59 60 61 62 63 17 64 Page 17 of 33 65 589 Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped 1 590 BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 2 3 591 25:3389-3402 4 5 6 592 Anderson-White BR, Ivey FD, Cheng K, Szatanek T, Lorestani A, Beckers CJ, Ferguson DJ, 7 593 Sahoo N, Gubbels MJ (2011) A family of intermediate filament-like proteins is sequentially 8 9 594 assembled into the cytoskeleton of Toxoplasma gondii. Cellul Microbiol 13:18-31 10 11 12 595 Aurrecoechea C, Brestelli J, Brunk BP, Carlton JM, Dommer J, Fischer S, Gajria B, Gao X, 13 14 596 Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer E, Li W, 15 597 Miller JA, Morrison HG, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Stoeckert CJ, 16 17 598 Jr., Sullivan S, Treatman C, Wang H (2009) GiardiaDB and TrichDB: integrated genomic resources 18 19 599 for the eukaryotic protist pathogens Giardia lamblia and Trichomonas vaginalis. Nucleic Acids Res 20 600 37:D526-D530 21 22 23 601 Ausmees N, Kuhn JR, Jacobs-Wagner C (2003) The bacterial cytoskeleton: an intermediate 24 25 602 filament-like function in cell shape. Cell 115:705-713 26 27 603 Baldauf SL (2008) An overview of the phylogeny and diversity of eukaryotes. J Syst Evol 46:263- 28 29 604 273 30 31 32 605 Bagchi S, Tomenius H, Belova LM, Ausmees N (2008) Intermediate filament-like proteins in 33 606 bacteria and a cytoskeletal function in Streptomyces. Mol Microbiol 70:1037-1050 34 35 36 607 Benchimol M (2004) Trichomonads under microscopy. Microsc Microanal 10:528-550 37 38 39 608 Benchimol M, Diniz JA, Ribeiro K (2000) The fine structure of the axostyle and its associations with 40 41 609 organelles in Trichomonads. Tissue Cell 32:178-187 42 43 610 Bouchard P, Chomilier J, Ravet V, Mornon JP, Vigues B (2001) Molecular characterization of the 44 45 611 major membrane skeletal protein in the ciliate Tetrahymena pyriformis suggests n-plication of an early 46 47 612 evolutionary intermediate filament protein subdomain. J Cell Sci 114:101-110 48 Accepted Manuscript 49 613 Bricheux G, Coffe G, Brugerolle G (2007) Identification of a new protein in the centrosome-like 50 51 614 "atractophore" of Trichomonas vaginalis. Mol Biochem Parasitol 153:133-140 52 53 54 615 Brimmer A, Weber K (2000) The cDNA sequences of three tetrins, the structural proteins of the 55 Tetrahymena oral filaments, show that they are novel cytoskeletal proteins. Protist 151:171-180 56 616 57 58 617 Brugerolle G (1991) Flagellar and cytoskeletal systems in amitochondrial flagellates: Archamoeba, 59 60 618 Metamonada and Parabasalia. Protoplasma 164:70-90 61 62 63 18 64 Page 18 of 33 65 619 Brune A, Dietrich C (2015) The gut microbiota of termites: Digesting the diversity in the light of 1 620 ecology and evolution. Annu Rev Microbiol 69:145-166 2 3 4 621 Bullen HE, Tonkin CJ, O'Donnell RA, Tham WH, Papenfuss AT, Gould S, Cowman AF, Crabb 5 6 622 BS, Gilson PR (2009) A novel family of apicomplexan glideosome-associated proteins with an inner 7 623 membrane-anchoring role. J Biol Chem 284:25353-25363 8 9 10 624 Burki F. (2014) The eukaryotic tree of life from a global phylogenomic perspective. Cold Spring 11 12 625 Harb Persp Biol 6:a016147-a016147 13 14 15 626 Cepicka I, Hampl V, Kulda J (2010) Critical taxonomic revision of with description of 16 627 one new genus and three new species. Protist 161:400-433 17 18 19 628 Clerot JC, Iftode F, Budin K, Jeanmaire-Wolf R, Coffe G, Fleury-Aubusson A (2001) Fine oral 20 21 629 filaments in Paramecium: a biochemical and immunological analysis. J Eukaryot Microbiol 48:234- 22 630 245 23 24 25 631 Coffe G, Le Caer JP, Lima O, Adoutte A (1996) Purification, in vitro reassembly, and preliminary 26 27 632 sequence analysis of epiplasmins, the major constituent of the membrane skeleton of Paramecium. 28 633 Cell Motil Cytoskeleton 34:137-151 29 30 31 634 Coulombe PA, Wong P (2004) Cytoplasmic intermediate filaments revealed as dynamic and 32 33 635 multipurpose scaffolds. Nat Cell Biol 6:699-706 34 35 36 636 Crossley R, Holberton DV (1983) Characterization of proteins from the cytoskeleton of Giardia 37 637 Lamblia. J Cell Sci 59:81-103 38 39 40 638 Damaj R, Pomel S, Bricheux G, Coffe G, Vigues B, Ravet V, Bouchard P (2009) Cross-study 41 42 639 analysis of genomic data defines the ciliate multigenic epiplasmin family: strategies for functional 43 640 analysis in Paramecium tetraurelia. BMC Evol Biol 9:125 44 45 46 641 Dawson SC, Paredez AR (2013) Alternative cytoskeletal landscapes: cytoskeletal novelty and 47 48 642 evolution in basalAccepted excavate protists. Curr Opin Cell Biol Manuscript25:134-141 49 50 643 de Souza W, da Cunha-e-Silva NL (2003) Cell fractionation of parasitic : a review. Mem 51 52 644 Inst Oswaldo Cruz 98:151-170 53 54 55 645 Delgadillo MG, Liston DR, Niazi K, Johnson PJ (1997) Transient and selectable transformation of 56 57 646 the parasitic protist Trichomonas vaginalis. Proc Natl Acad Sci USA 94:4716-4720 58 59 647 Delgado-Viscogliosi P, Brugerolle G, Viscogliosi E (1996) Tubulin post-translational modifications 60 61 648 in the primitive protist Trichomonas vaginalis. Cell Motil Cytoskeleton 33:288-297 62 63 19 64 Page 19 of 33 65 649 Detmer E, Hemphill A, Muller N, Seebeck T (1997) The Trypanosoma brucei autoantigen I/6 is an 1 650 internally repetitive cytoskeletal protein. Eur J Cell Biol 72:378-384 2 3 4 651 Eckert BS, Daley RA, Parysek LM (1982) Assembly of keratin onto Ptk1 cytoskeletons - Evidence 5 6 652 for an intermediate filament organizing center. J Cell Biol 92:575-578 7 8 El-Haddad H, Przyborski JM, Kraft LGK, McFadden GI, Waller RF, Gould SB (2013) 9 653 10 654 Characterization of TtALV2, an essential charged repeat motif protein of the Tetrahymena 11 12 655 thermophila membrane skeleton. Eukaryot Cell 12:932-940 13 14 15 656 Elmendorf HG, Dawson SC, McCaffery M (2003) The cytoskeleton of Giardia lamblia. Int J 16 657 Parasitol 33:3-28 17 18 19 658 Erber A, Riemer D, Hofemeister H, Bovenschulte M, Stick R, Panopoulou G, Lehrach H, Weber 20 21 659 K (1999) Characterization of the Hydra lamin and its gene: A molecular phylogeny of metazoan 22 660 lamins. J Mol Evol 49:260-271 23 24 25 661 Fleury-Aubusson A (2003) "Novel cytoskeletal proteins in protists": Introductory remarks. J 26 27 662 Eukaryot Microbiol 50:3-8 28 29 Fuchs E, Weber K (1994) Intermediate filaments - Structure, dynamics, function, and disease. Annu 30 663 31 664 Rev Biochem 63:345-382 32 33 34 665 Galetovic A, Souza RT, Santos MRM, Cordero EM, Bastos IMD, Santana JM, Ruiz JC, Lima 35 36 666 FM, Marini MM, Mortara RA, da Silveira JF (2011) The repetitive cytoskeletal protein H49 of 37 667 Trypanosoma cruzi Is a calpain-like protein located at the flagellum attachment zone. PloS ONE 38 39 668 6(11):e27634 40 41 42 669 Garg S, Stoelting J, Zimorski V, Rada P, Tachezy J, Martin W, Gould S (2015) Conservation of 43 670 transit peptide-independent protein import into the mitochondrial and hydrogenosomal matrix. 44 45 671 Genome Biol Evol 7:2716-2726 46 47 48 672 Gogendeau D, KlotzAccepted C, Arnaiz O, Malinowska A, DadlezManuscript M, de Loubresse NG, Ruiz F, Koll F, 49 673 Beisson J (2008) Functional diversification of centrins and cell morphological complexity. J Cell Sci 50 51 674 121:65-74 52 53 54 675 Goldman R, Goldman A, Green K, Jones J, Lieska N, Yang HY (1985) Intermediate filaments - 55 Possible functions as cytoskeletal connecting links between the nucleus and the cell-surface. Ann NY 56 676 57 677 Acad Sci 455:1-17 58 59 60 678 Goldman RD, Cleland MM, Murthy SNP, Mahammad S, Kuczmarski ER (2012) Inroads into the 61 62 679 structure and function of intermediate filament networks. J Struct Biol 177:14-23 63 20 64 Page 20 of 33 65 680 Gould SB, Tham WH, Cowman AF, McFadden GI, Waller RF (2008) Alveolins, a new family of 1 681 cortical proteins that define the protist infrakingdom Alveolata. Mol Biol Evol 25:1219-1230 2 3 4 682 Gould SB, Woehle C, Kusdian G, Landan G, Tachezy J, Zimorski V, Martin WF (2013) Deep 5 6 683 sequencing of Trichomonas vaginalis during the early infection of vaginal epithelial cells and 7 684 amoeboid transition. Int J Parasitol 43:707-719 8 9 10 685 Gould SB, Kraft LGK, van Dooren GG, Goodman CD, Ford KL, Cassin AM, Bacic A, 11 12 686 McFadden GI, Waller RF (2011) Ciliate pellicular proteome identifies novel protein families with 13 14 687 characteristic repeat motifs that are common to alveolates. Mol Biol Evol 28:1319-1331 15 16 688 Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, 17 18 689 Raychowdhury R, Zeng QD, Chen ZH, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, 19 20 690 Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A (2011) Full-length transcriptome 21 691 assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29:644-652 22 23 24 692 Hartwig S, Goeddeke S, Poschmann G, Dicken HD, Jacob S, Nitzgen U, Passlack W, Stühler K, 25 26 693 Ouwens DM, Al-Hasani H, Knebel B, Kotzka J, Lehr S (2014) Identification of novel adipokines 27 694 differential regulated in C57BL/Ks and C57BL/6. Arch Physiol Biochem 120:208-215 28 29 30 695 Heger A, Holm L (2000) Rapid automatic detection and alignment of repeats in protein sequences. 31 32 696 Proteins 41:224-237 33 34 35 697 Herrmann H, Aebi U (2004) Intermediate filaments: Molecular structure, assembly mechanism, and 36 698 integration into functionally distinct intracellular scaffolds. Annu Rev Biochem 73:749-789 37 38 39 699 Herrmann H, Strelkov SV (2011) History and phylogeny of intermediate filaments: Now in insects. 40 41 700 BMC Biology 9:16 42 43 701 Herrmann H, Bar H, Kreplak L, Strelkov SV, Aebi U (2007) Intermediate filaments: from cell 44 45 702 architecture to nanomechanics. Nat Rev Mol Cell Biol 8:562-573 46 47 48 703 Hicks MR, HolbertonAccepted DV, Kowalczyk C, Woolfson ManuscriptDN (1997) Coiled-coil assembly by peptides 49 704 with non-heptad sequence motifs. Fold Des 2:149-158 50 51 52 705 Hirt RP, Sherrard J (2015) Trichomonas vaginalis origins, molecular pathobiology and clinical 53 54 706 considerations. Curr Opin Infect Dis 28:72-79 55 56 57 707 Holberton D, Baker DA, Marshall J (1988) Segmented alpha-helical coiled-coil structure of the 58 708 protein giardin from the Giardia cytoskeleton. J Mol Biol 204:789-795 59 60 61 62 63 21 64 Page 21 of 33 65 709 Honigberg BM Mattern CFT, C, Daniel WA (1971) Fine structure of the mastigont system in 1 710 Tritrichomonas foetus (Riedmueller). J Protozool 18:183-198 2 3 4 711 Huttenlauch I, Peck RK, Stick R (1998) Articulins and epiplasmins: two distinct classes of 5 6 712 cytoskeletal proteins of the membrane skeleton in protists. J Cell Sci 111:3367-3378 7 8 Ishikawa H, Bischoff R, Holtzer H (1968) Mitosis and intermediate-sized filaments in developing 9 713 10 714 skeletal muscle. J Cell Biol 38:538-555 11 12 13 715 Kalnins VI, Subrahmanyan L, Fedoroff S (1985) Assembly of glial intermediate filament protein Is 14 15 716 initiated in the centriolar region. Brain Res 345:322-326 16 17 717 Katris NJ, van Dooren GG, McMillan PJ, Hanssen E, Tilley L, Waller RF (2014) The Apical 18 19 718 Complex Provides a Regulated Gateway for Secretion of Invasion Factors in Toxoplasma. PloS 20 21 719 Pathog 10(4):e1004074 22 23 720 Kloetzel JA, Baroin-Tourancheau A, Miceli C, Barchetta S, Farmar J, Banerjee D, Fleury- 24 25 721 Aubusson A (2003) Plateins: A novel family of signal peptide-containing articulins in euplotid ciliates. 26 27 722 J Eukaryot Microbiol 50:19-33 28 29 Kollmar M (2015) Polyphyly of nuclear lamin genes indicates an early eukaryotic origin of the 30 723 31 724 metazoan-type intermediate filament proteins. Sci Rep 5:10652 32 33 34 725 Koonin EV (2010) The origin and early evolution of eukaryotes in the light of phylogenomics. 35 36 726 Genome Biol 11:209 37 38 727 Koonin EV, Fedorova ND, Jackson JD, Jacobs AR, Krylov DM, Makarova KS, Mazumder R, 39 40 728 Mekhedov SL, Nikolskaya AN, Rao BS, Rogozin IB, Smirnov S, Sorokin AV, Sverdlov AV, 41 42 729 Vasudevan S, Wolf YI, Yin JJ, Natale DA (2004) A comprehensive evolutionary classification of 43 730 proteins encoded in complete eukaryotic genomes. Genome Biol 5:R7 44 45 46 731 Koreny L, Field MC (2016) Ancient eukaryotic origin and evolutionary plasticity of nuclear lamina. 47 48 732 Genome Biol EvolAccepted In press. doi: 10.1093/gbe/evw087 Manuscript 49 50 733 Kulda J, Nohýnková E, Ludvík J (1986) Basic structure and function of the trichomonad cell. Acta 51 52 734 Univ Carol Biol 30:181-198 53 54 55 735 Kusdian G, Gould SB (2014) The biology of Trichomonas vaginalis in the light of urogenital tract 56 57 736 infection. Mol Biochem Parasitol 198:92-99 58 59 60 61 62 63 22 64 Page 22 of 33 65 737 Kusdian G, Woehle C, Martin WF, Gould SB (2013) The actin-based machinery of Trichomonas 1 738 vaginalis mediates flagellate-amoeboid transition and migration across host tissue. Cellul Microbiol 2 3 739 15:1707-1721 4 5 6 740 Lee KE, Kim JH, Jung MK, Arii T, Ryu JS, Han SS (2009) Three-dimensional structure of the 7 741 cytoskeleton in Trichomonas vaginalis revealed new features. J Electron Microsc 58:305-313 8 9 10 742 Letunic I, Doerks T, Bork P (2015) SMART: recent updates, new developments and status in 2015. 11 12 743 Nucleic Acids Res 43:D257-D260 13 14 15 744 Lupas A, Vandyke M, Stock J (1991) Predicting coiled coils from protein sequences. Science 16 745 252:1162-1164 17 18 19 746 Mann T, Beckers C (2001) Characterization of the subpellicular network, a filamentous membrane 20 21 747 skeletal component in the parasite Toxoplasma gondii. Mol Biochem Parasitol 115:257-268 22 23 748 Marrs JA, Bouck GB (1992) The two major membrane skeletal proteins (articulins) of 24 25 749 gracilis define a novel class of cytoskeletal proteins. J Cell Biol 118:1465-1475 26 27 28 750 Mason JM, Arndt KM (2004) Coiled coil domains: Stability, specificity, and biological implications. 29 ChemBioChem 5:170-176 30 751 31 32 752 Mazouni K, Pehau-Arnaudet G, England P, Bourhy P, Saint Girons I, Picardeau M (2006) The 33 34 753 scc spirochetal coiled-coil protein forms helix-like filaments and binds to nucleic acids generating 35 36 754 nucleoprotein structures. J Bacteriol 188:469-476 37 38 755 Noda S, Mantini C, Meloni D, Inoue JI, Kitade O, Viscogliosi E, Ohkuma M (2012) Molecular 39 40 756 phylogeny and evolution of Parabasalia with improved taxon sampling and new protein markers of 41 42 757 actin and elongation factor-1 alpha. PloS ONE 7(1):e29938 43 44 758 Palm D, Weiland M, McArthur AG, Winiecka-Krusnell J, Cipriano MJ, Birkeland SR, Pacocha 45 46 759 SE, Davids B, Gillin F, Linder E, Svard S (2005) Developmental changes in the adhesive disk 47 48 760 during Giardia differentiation.Accepted Mol Biochem Parasitol 141Manuscript:199-207 49 50 761 Paredez AR, Assaf ZJ, Sept D, Timofejeva L, Dawson SC, Wang CJR, Cande WZ (2011) An 51 52 762 actin cytoskeleton with evolutionarily conserved functions in the absence of canonical actin-binding 53 54 763 proteins. Proc Natl Acad Sci USA 108:6151-6156 55 56 57 764 Peter A, Stick R (2015) Evolutionary aspects in intermediate filament proteins. Curr Opin Cell Biol 58 765 32:48-55 59 60 61 62 63 23 64 Page 23 of 33 65 766 Poschmann G, Seyfarth K, Besong Agbo D, Klafki HW, Rozman J, Wurst W, Wiltfang J, Meyer 1 767 HE, Klingenspor M, Stühler K (2014) High-fat diet induced isoform changes of the Parkinson's 2 3 768 disease protein DJ-1. J Proteome Res 13:2339-2351 4 5 6 769 Rada P, Dolezal P, Jedelsky PL, Bursac D, Perry AJ, Sedinova M, Smiskova K, Novotny M, 7 770 Beltran NC, Hrdy I, Lithgow T, Tachezy J (2011) The core components of organelle biogenesis and 8 9 771 membrane transport in the hydrogenosomes of Trichomonas vaginalis. Plos One 6(9):e24428 10 11 12 772 Rice P, Longden I, Bleasby A (2000) EMBOSS: The European molecular biology open software 13 14 773 suite. Trends Genet 16:276-277 15 16 774 Roberts TM (1987) Fine (2-5-Nm) Filaments - New types of cytoskeletal structures - Invited review. 17 18 775 Cell Motil Cytoskeleton 8:130-142 19 20 21 776 Rosa ID, de Souza W, Benchimol M (2013) High-resolution scanning electron microscopy of the 22 777 cytoskeleton of Tritrichomonas foetus. J Struct Biol 183:412-418 23 24 25 778 Rose A, Schraegle SJ, Stahlberg EA, Meier I (2005) Coiled-coil protein composition of 22 26 27 779 proteomes - differences and common themes in subcellular infrastructure and traffic control. BMC 28 780 Evol Biol 5:66 29 30 31 781 Stirnimann CU, Petsalaki E, Russell RB, Muller CW (2010) WD40 proteins propel cellular 32 33 782 networks. Trends Biochem Sci 35:565-574 34 35 36 783 Suvorova ES, Francia M, Striepen B, White MW (2015) A Novel Bipartite Centrosome 37 784 Coordinates the Apicomplexan Cell Cycle. PloS Biology 13(3):e1002093 38 39 40 785 Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, 41 42 786 Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, 43 787 Wolf YI, Yin JJ, Natale DA (2003) The COG database: an updated version includes eukaryotes. 44 45 788 BMC Bioinformatics 4:41 46 47 48 789 Tewari R, BailesAccepted E, Bunting KA, Coates JC (2010) ManuscriptArmadillo-repeat protein functions: questions 49 790 for little creatures. Trends Cell Biol 20:470-481 50 51 52 791 Tran JQ, Li C, Chyan A, Chung L, Morrissette NS (2012) SPM1 Stabilizes Subpellicular 53 54 792 Microtubules in Toxoplasma gondii. Eukaryot Cell 11:206-216 55 56 57 793 Trevor KT, Mcguire JG, Leonova EV (1995) Association of vimentin intermediate filaments with 58 794 the centrosome. J Cell Sci 108:343-356 59 60 61 62 63 24 64 Page 24 of 33 65 795 Twu O, de Miguel N, Lustig G, Stevens GC, Vashisht AA, Wohlschlegel JA, Johnson PJ (2013) 1 796 Trichomonas vaginalis exosomes deliver cargo to host cells and mediate host:parasite interactions. 2 3 797 PloS Pathog 9(7):e1003482 4 5 6 798 Vaughan S, Kohl L, Ngai I, Wheeler RJ, Gull K (2008) A repetitive protein essential for the 7 799 flagellum attachment zone filament structure and function in Trypanosoma brucei. Protist 159:127- 8 9 800 136 10 11 12 801 Viguès B, Méténier G, Sénaud J (1984) The sub-surface cytoskeleton of the ciliate Polyplastron 13 14 802 multivesiculatum: isolation and major protein components. Eur J Cell Biol 35:336-342 15 16 803 Viscogliosi E, Brugerolle G (1994) Striated fibers in trichomonads - Costa proteins represent a new 17 18 804 class of proteins forming striated roots. Cell Motil Cytoskeleton 29:82-93 19 20 21 805 Vizcaino JA, Deutsch EW, Wang R, Csordas A, Reisinger F, Rios D, Dianes JA, Sun Z, Farrah 22 806 T, Bandeira N, Binz PA, Xenarios I, Eisenacher M, Mayer G, Gatto L, Campos A, Chalkley RJ, 23 24 807 Kraus HJ, Albar JP, Martinez-Bartolome S, Apweiler R, Omenn GS, Martens L, Jones AR, 25 26 808 Hermjakob H (2014) ProteomeXchange provides globally coordinated proteomics data submission 27 809 and dissemination. Nat Biotechnol 32:223-226 28 29 30 810 Weber K, Geisler N, Plessmann U, Bremerich A, Lechtreck KF, Melkonian M (1993) Sf- 31 32 811 assemblin, the structural protein of the 2-nm filaments from striated microtubule-associated fibers of 33 812 algal flagellar roots, forms a segmented coiled-coil. J Cell Biol 121:837-845 34 35 36 813 Wickstead B, Gull K (2011) The evolution of the cytoskeleton. J Cell Biol 194:513-525 37 38 39 814 Woehle C, Kusdian G, Radine C, Graur D, Landan G, Gould SB (2014) The parasite Trichomonas 40 41 815 vaginalis expresses thousands of pseudogenes and long non-coding RNAs independently from 42 816 functional neighbouring genes. BMC Genomics 15:906 43 44 45 817 Yang R, Bartle S, Otto R, Stassinopoulos A, Rogers M, Plamann L, Hartzell P (2004) AglZ is a 46 47 818 filament-forming coiled-coil protein required for adventurous gliding motility of Myxococcus xanthus. 48 819 J Bacteriol 186:6168Accepted-6178 Manuscript 49 50 51 820 You Y, Elmore S, Colton LL, Mackenzie C, Stoops JK, Weinstock GM, Norris SJ (1996) 52 53 821 Characterization of the cytoplasmic filament protein gene (cfpA) of Treponema pallidum subsp. 54 822 pallidum. J Bacteriol 178:3177-3187 55 56 57 823 Zimorski V, Major P, Hoffmann K, Bras XP, Martin WF, Gould SB (2013) The N-terminal 58 59 824 sequences of four major hydrogenosomal proteins are not essential for import into hydrogenosomes of 60 61 825 Trichomonas vaginalis. J Eukaryot Microbiol 60:89-97 62 63 25 64 Page 25 of 33 65 826 1 2 827 3 4 828 Table legend 5 6 829 7 8 Table 1. Properties of the five cytoskeleton proteins localized and in comparison to actin and tubulin 9 830 10 831 as references. Five candidates were chosen for immunofluorescence localization studies according to 11 12 832 the PSM (peptide spectrum matches) value and the molecular weight. The tubulin beta chain 13 14 833 (TVAG_467840) and actin (TVAG_172680) were also identified as part of the cytoskeleton proteome 15 834 and are listed here for comparison. Database values are adopted from TrichDB v2.0. In addition, based 16 17 835 on the transcriptome of T. vaginalis (Gould et al. 2013), the absolute expression values of the proteins 18 19 836 are listed. Cov %, coverage of predicted protein sequence; AA, amino acids; MM, molecular weight; 20 837 IEP, isoelectric point. 21 22 838 23 24 25 839 Figure Legends 26 27 840 Figure 1. The cytoskeleton of T. gallinarum and T. vaginalis. (A) The cartoon illustrates the pelta- 28 29 841 axostyle-costa-karyomastigont complex and its associated structures that represent the main 30 842 cytoskeletal structures of the parasite. The karyomastigont system in this case comprises nucleus 31 32 843 associated filaments and all five kinetosomes. The extracted cytoskeletal fraction (B) showing the 33 34 844 cytoskeleton was analyzed by the multiplex fluorescent blot (C), separated on the silver stained gel 35 845 (D) and used for mass spectrometry analysis (ESI-MS). The multiplex fluorescent blot visualizes the 36 37 846 detection of tubulin (red signal) and SCSalpha (green signal) within the fractions total cell lysate 38 39 847 (TCL), cytoskeleton (CYT) and hydrogenosomes (HYD) indicating the absence of hydrogenosomes as 40 848 a likely contaminant in the cytoskeleton fraction. The two bands of SCSalpha in the hydrogenosomal 41 42 849 fraction are frequently observed (Woehle et al. 2014; Zimorski et al, 2013). To verify the data, 43 44 850 triplicates were applied for ESI-MS, see Supplementary Material Figure S1. Scale bar: 10 µm. 45 46 851 47 48 852 Figure 2. DistributionAccepted of recognizable and conserved domainsManuscript among the five selected T. vaginalis 49 50 853 proteins. The motifs shown are based on the prediction by the SMART algorithm (Simple Modular 51 854 Architecture Research Tool; Letunic et al. 2014). 52 53 54 855 55 56 856 Figure 3. Coiled-coils and other repetitive units are increased among proteins of the cytoskeleton 57 857 proteome. (A) The box blot displays the number of predicted coiled-coils within 203 proteins present 58 59 858 in the extracted cytoskeleton (CYT) fraction compared to 301 hydrogenosomal proteins (HYD; based 60 61 859 on Garg et al., 2015) as a control. (B) The box blot indicates the length of coiled-coils in amino acids 62 63 26 64 Page 26 of 33 65 860 (aa) based on the same dataset. (C) The bar diagram shows the distribution of the number of repetitive 1 861 units per protein in the compared datasets. One unit consists of at least 3 repetitive motifs, each being 2 3 862 at least 10 amino acids long. Here the number of proteins containing 1, 2, 3 or 4 repetitive motifs is 4 5 863 significantly higher in the CYT dataset than in the HYD dataset. In contrast to the dataset of randomly 6 864 selected proteins (RDM; Fig. S3), the HYD dataset contains proteins harbouring more than five 7 8 865 repetitive motifs. 9 10 866 11 12 867 Figure 4. Immunofluorescence localization in T. vaginalis of the five cytoskeleton candidate proteins. 13 14 868 All proteins were expressed as hemagglutinin-fusion proteins (anti-HA, green) and co-localized with 15 16 869 tubulin (red) that labels in particular the axostyle . Scale bar: 10 μm. 17 18 870 19 20 871 Figure 5. Heterologous expression of TVAG_474360 in T. gallinarum. (A) Immunofluorescence 21 22 872 microscopy images show the localization of the hemagglutinin (HA)- tagged TVAG_474360 (green). 23 24 873 Scale bar: 10 μm. (B) Transmission electron microscopy images of longitudinal sections through three 25 874 exemplary cells. The cells show additional striated filaments (highlighted by the dashed boxes and 26 27 875 indicated by arrow heads) that together aggregate to form a single thick rod. The striated pattern of the 28 29 876 additional filaments looks similar to that of the costa (C) and the parabasal filaments (Pf), but they 30 877 differ in width and were not observed to co-localize with either of the two naturally-occurring 31 32 878 filaments. Axostyl, Ax; Nucleus, N; Hydrogenosomes, H. Scale bar: 1 μm. 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Accepted Manuscript 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 27 64 Page 27 of 33 65 FigureA 1 B C TCL CYT HYD 55

Anterior 35 flagella D CYT

Pelta

Axostyle Nucleus 140 Costa Hydrogenosomes 115 Parabasal Golgi apparatus filaments 080

Recurrent 065 flagellum Undulating 050 membrane Accepted040 Manuscript

Page 28030 of 33 020

015 Figure100 2 200 300 400 500 600 700 800 900 1000 TVAG_339450

TVAG_474360

TVAG_117060

TVAG_030160

Coiled-coil

AcceptedInternal repeats Manuscript

TVAG_059360 Page 29WD40 of domain 33

Armadillo/beta- catenin like repeat Figure 3 A B C

26 200 300

300 24

275 22 250 20 150 18 225 200 200 16 14 175 100 12 150

10 125 100 8 100 Number of proteins 50 6 75

Length of coiled-coils (aa) 50

Number of predicted coiled-coils 4 2 Accepted25 Manuscript 0 0 0 0 CYT HYD CYT HYD CYT HYD Page 30 of 33 0 1 2 3 4 >5 Amount of repetitive motifs Figureα-HA 4 α-tubulin Merge + DAPI Phase contrast TVAG_339450 TVAG_339450 TVAG_474360 TVAG_474360

TVAG_117060 Accepted Manuscript TVAG_030160

Page 31 of 33 TVAG_059360 TVAG_059360 FigureA TVAG_474360::HA 5 DIC Merge

B

xA

Ax H Ax Pf C N N AcceptedH Manuscript PageH 32 of 33 Table 1

Table 1: Properties of the five cytoskeleton proteins localized in comparison to actin and tubulin as references

T. vaginalis Mass spectrometry data of T. gallinarum TrichDB annotated properties Accession numbers Annotation E value Expression absolute Coverage % UP PSM AA MW (kDa) IEP T. gallinarum T. vaginalis

TEGb005933 TVAG_339450 unknown 0.0 665 65 67 256 977 113.5 4.9 TEGb003426 TVAG_474360 unknown 0.0 2267 65 66 244 1042 119.1 4.8 TEGb019317 TVAG_117060 unknown 0.0 502 70 64 330 878 100.5 4.9 TEGb017573 TVAG_030160 F-box and WD domain protein 0.0 682 76 33 163 605 65.7 6.9 TEGb012599 TVAG_059360 Sperm associated antigen 6 0.0 941 63 30 111 505 55.0 7.1 TEGb007706 TVAG_467840 Tubulin beta chain 0.0 29130 75 27 1228 452 50.1 4.7 TEGb007619 TVAG_172680 Actin 0.0 18485 8 9 24 386 43.0 5.3

Accepted Manuscript

Page 33 of 33