
Accepted Manuscript

Title: Establishment of a de novo Reference Transcriptome of Histomonas meleagridis Reveals Basic Insights About Biological Functions and Potential Pathogenic Mechanisms of the Parasite

Authors: Rounik Mazumdar, Lukas Endler, Andreas Monoyios, Michael Hess, Ivana Bilic

PII: S1434-4610(17)30080-9 DOI: https://doi.org/10.1016/j.protis.2017.09.004 Reference: PROTIS 25595

To appear in:

Received date: 7-4-2017 Accepted date: 23-9-2017

Please cite this article as: Mazumdar, Rounik, Endler, Lukas, Monoyios, Andreas, Hess, Michael, Bilic, Ivana, Establishment of a de novo Reference Transcriptome of Histomonas meleagridis Reveals Basic Insights About Biological Functions and Potential Pathogenic Mechanisms of the Parasite. https://doi.org/10.1016/j.protis.2017.09.004

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ORIGINAL PAPER

Establishment of a de novo Reference Transcriptome of Histomonas meleagridis Reveals Basic Insights About Biological Functions and Potential Pathogenic Mechanisms of the Parasite

Running title: Histomonas meleagridis Reference Transcriptome

Rounik Mazumdar^a, Lukas Endler^b, Andreas Monoyios^a, Michael Hess^a,c, and Ivana Bilic^a,1

^a Clinic for Poultry and Fish Medicine, Department for Farm and Veterinary Public Health, University of Veterinary Medicine Vienna, Veterinärplatz 1, A-1210 Vienna, Austria

^b Platform Bioinformatics and Biostatistics, Department of Biomedical Sciences, University of Veterinary Medicine Vienna, Veterinärplatz 1, A-1210 Vienna, Austria

^c Christian Doppler Laboratory for Innovative Poultry Vaccines (IPOV), University of Veterinary Medicine Vienna, Veterinärplatz 1, A-1210 Vienna, Austria

Submitted April 7, 2017; Accepted September 23, 2017

Monitoring Editor: C. Graham Clark


^1 Corresponding author; fax +43 1 25077 5192; e-mail [email protected] (I. Bilic).

The protozoan Histomonas meleagridis is the causative agent of histomonosis in poultry. In turkeys, mortality can be high, whereas in chickens the disease is less severe, although it still causes production losses. Although the parasite was discovered over a century ago, molecular data on it are scarce and genetic studies are in their infancy. To expand genomic information, de novo transcriptome sequencing of H. meleagridis was performed on a virulent and an attenuated strain, each cultivated in vitro as a monoxenic mono-eukaryotic culture. Normalized cDNA libraries were prepared and sequenced on a Roche 454 GS FLX, yielding 1.17 million reads with an average read length of 458 bp. Reads were assembled into two sets of >4500 contigs, which were then integrated to establish a reference transcriptome for H. meleagridis consisting of 3356 contigs. Following gene ontology analysis, data mining provided novel biological insights into proteostasis, the cytoskeleton, metabolism, environmental adaptation and potential pathogenic mechanisms of H. meleagridis. Finally, the transcriptome data were used to perform an in silico drug screen to identify potential anti-histomonal drugs. Altogether, data obtained from virulent and attenuated parasites facilitate a better understanding of the parasite's molecular biology, aiding the development of novel diagnostics and future research.


Abbreviations

ABP: Actin-binding proteins; ADP: Adenosine diphosphate; AP: Adhesion protein; ATP: Adenosine triphosphate; BLAST: Basic local alignment search tool; CP: Cysteine peptidase; GFP: Green fluorescent protein; GO: Gene ontology; IFT: Intraflagellar transport; LGT: Lateral gene transfer; LRR: Leucine-rich repeat; NO: Nitric oxide; PFOR: Pyruvate:ferredoxin oxidoreductase; PFT: Pore-forming toxin; ROS: Reactive oxygen species; SALIP: Saposin-like protein; SOD: Superoxide dismutase; TCP: T-complex protein; TIM: Translocase of the inner membrane; TRiC: TCP-1 ring complex

Key words: Histomonas meleagridis; transcriptome; de novo sequencing; protozoa; virulence; drugs.

Introduction

Histomonas meleagridis, first described in 1893, is a flagellate protozoan and the causative agent of histomonosis in gallinaceous birds, primarily infecting turkeys and chickens (Hess et al. 2015). The disease manifests as an enterohepatitis and can cause high mortality in turkeys, with pathological lesions in the caeca and liver, whereas in chickens lesions are usually confined to the caeca (Hess et al. 2015; McDougald 2005). Histomonosis was well controlled in the past by the therapeutic or prophylactic use of nitroimidazoles and nitrofurans (Liebhart et al. 2017). After their introduction, research on H. meleagridis nearly ceased as outbreaks became rare (McDougald 2005). The ban of these effective drugs over the last two decades in the European Union, North America and elsewhere has led to the re-emergence of histomonosis (Hess et al. 2015). As a consequence, scientific interest in H. meleagridis has intensified owing to the threat it poses to poultry flocks, through both substantial animal suffering and the financial losses associated with outbreaks.

Various forms of H. meleagridis are known to occur depending on environmental conditions, e.g. in in vitro cultures and within the host (Honigberg and Bennett 1971; Tyzzer 1919). A cyst-like form has been described, indicating that the parasite might transform into a persistent stage that is resistant to unfavourable environmental conditions (Munsch et al. 2009).

Despite its apparently amoeboid morphology under the light microscope, H. meleagridis is a trichomonad, as confirmed by electron microscopy studies (Schuster 1968). Phylogenetic investigations demonstrated the close relationship of H. meleagridis to Dientamoeba fragilis, placing both taxa in the class Tritrichomonadea (Gerbod et al. 2001). Common features of the protozoon include the presence of , the parabasal apparatus with one and a typical mastigont arrangement.

Hitherto, most molecular investigations have concentrated on the phylogenetic position of H. meleagridis (Hess et al. 2015). Detailed molecular studies are scarce, with few exceptions (Bilic et al. 2009; Klodnicki et al. 2013; Leberl et al. 2010; Mazet et al. 2008), and genetic studies have only just begun (Bilic et al. 2014).

The scarcity of molecular data has hindered progress in understanding the molecular biology of the parasite, especially the mechanisms contributing to its virulence. Furthermore, the lack of a whole-genome sequence considerably limits comparative 'omics' analyses (transcriptome, proteome). Therefore, to broaden the repertoire of molecular data, de novo transcriptome sequencing was performed on a virulent and an attenuated H. meleagridis strain, both traced back to a single cell. The present study employed Roche 454 NGS technology to obtain normalized transcript sequences from the two strains, which were integrated into a hybrid assembly representing the reference transcriptome. The obtained sequences were analyzed and interpreted with respect to biological functions and potential pathogenic mechanisms.

Results and Discussion

De novo Sequence Assembly of H. meleagridis Transcriptome

Sequencing was performed on in vitro cultivated virulent and attenuated strains of H. meleagridis, maintained as monoxenic mono-eukaryotic cultures. Histomonas meleagridis can be propagated in vitro only in the presence of bacteria (xenic conditions) (Hess et al. 2015). The cultures used as starting material in the present study were established in several steps. The initial H. meleagridis culture was generated from a clinical case by inoculating the caecal contents of a diseased turkey into a suitable medium. To provide well-defined protozoan material, a “mono-eukaryotic” culture was created by transferring a single histomonad cell, via micromanipulation, to fresh medium supplemented with poultry caecal flora (Hess et al. 2006). This procedure enabled the culture to be traced back to a single cell. Attenuation of the protozoa was achieved by prolonged in vitro cultivation and was demonstrated in experiments in chickens and turkeys (Liebhart et al. 2017). A final refinement of the cultures used in the present study was mono-xenization, a process in which the mix of caecal bacteria was replaced by a single Escherichia coli strain (Ganas et al. 2012). Animal experiments using the mono-xenic virulent and attenuated strains demonstrated that the ability to cause disease is an exclusive property of the protozoa, since exchanging the bacterial background flora did not alter their initial virulent and avirulent phenotypes, respectively (Ganas et al. 2012).

Sequencing yielded 509.5 Mb of sequence data in a total of 1173830 reads with an average length of 458 bp (Table 1). Assembly of the reads from the virulent strain yielded 4543 contigs larger than 500 bp, with an average contig length of 1329 bp, whereas assembly of the transcripts from the attenuated strain resulted in 4508 contigs with an average contig size of 1410 bp. An earlier study reporting sequencing of an H. meleagridis cDNA library obtained a slightly higher number of contigs (n=3425) (Klodnicki et al. 2013), which could be explained by a substantial difference in the starting material. In that study, the H. meleagridis isolate, obtained from a backyard poultry flock, was cultivated in vitro as an ordinary xenic culture consisting of a mix of poultry caecal flora and the protozoan. This made the starting material more complex than the material used here, especially with respect to its strain content. In contrast to such a xenic culture, the mono-eukaryotic mono-xenic cultures employed here have a defined protozoan and bacterial strain content, which enabled us to sort individual sequences with high confidence.
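The contig statistics above (number of contigs longer than 500 bp and their average length) can be recomputed from an assembly in FASTA format. The following is a minimal Python sketch, assuming a plain FASTA file read line by line; the helper names `read_fasta` and `contig_stats` are hypothetical and not part of the authors' pipeline, and only the 500 bp cutoff is taken from the text.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {contig_id: sequence} dict."""
    parts: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID
            current = line[1:].split()[0]
            parts[current] = []
        elif current is not None:
            parts[current].append(line)
    return {cid: "".join(chunks) for cid, chunks in parts.items()}


def contig_stats(seqs: Dict[str, str], min_len: int = 500) -> Tuple[int, float]:
    """Return (number of contigs longer than min_len, their mean length)."""
    lengths = [len(s) for s in seqs.values() if len(s) > min_len]
    if not lengths:
        return 0, 0.0
    return len(lengths), sum(lengths) / len(lengths)


if __name__ == "__main__":
    demo = [">c1", "A" * 600, ">c2", "C" * 300, ">c3", "G" * 1000]
    print(contig_stats(read_fasta(demo)))  # (2, 800.0)
```

Contigs at or below the cutoff are excluded from both the count and the mean, mirroring the "contigs larger than 500 bp" criterion.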

To determine the differences in identified sequences between the present study and that of Klodnicki et al. (2013), the two datasets were compared using Blastn (E-value ≥ 1E-5). The analysis detected 559 novel sequences in the present dataset (Supplementary Material Table S1). Among these, protein-coding sequences of histone 2A-IV, 60S acidic ribosomal protein P1 and Hsp20 were identified, all of which were missing from the earlier study.
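A dataset comparison of this kind can be illustrated with BLAST tabular output (`-outfmt 6`, whose 12 default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore). In the sketch below, a query contig is treated as novel if it has no hit with an E-value at or below 1E-5; this reading of the threshold (a hit counts as significant when E ≤ 1E-5) is our assumption, and `novel_queries` is a hypothetical helper.

```python
from typing import Iterable, Set

QSEQID_COL = 0   # query ID in default BLAST -outfmt 6
EVALUE_COL = 10  # 0-based index of the 'evalue' column


def novel_queries(all_ids: Iterable[str], blast_rows: Iterable[str],
                  max_evalue: float = 1e-5) -> Set[str]:
    """Return query IDs with no BLAST hit at E-value <= max_evalue."""
    matched: Set[str] = set()
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip empty or malformed lines
        if float(fields[EVALUE_COL]) <= max_evalue:
            matched.add(fields[QSEQID_COL])
    # Queries absent from the BLAST output also count as novel
    return set(all_ids) - matched
```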


About 35% of all contigs were of E. coli origin, as anticipated given the monoxenic nature of the H. meleagridis in vitro culture. The E. coli-specific contigs were filtered out, and the remaining 65% were considered H. meleagridis-specific contigs and retained for further analysis (Fig. 1). Considering that lateral gene transfer (LGT) is well documented in trichomonads (Alsmark et al. 2009), the possibility cannot be excluded that a few H. meleagridis contigs arising from recent LGT from E. coli were removed in this process.
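The filtering step amounts to a set difference over contig IDs. This minimal Python sketch assumes that the IDs of contigs assigned to E. coli (for example, by a similarity search against an E. coli reference) are already known; the helper `split_by_origin` is hypothetical, and the sketch cannot recover contigs affected by the LGT caveat noted above.

```python
from typing import Dict, Set, Tuple


def split_by_origin(contigs: Dict[str, str],
                    ecoli_ids: Set[str]) -> Tuple[Dict[str, str], float]:
    """Drop contigs flagged as E. coli; return kept contigs and % removed."""
    kept = {cid: seq for cid, seq in contigs.items() if cid not in ecoli_ids}
    if not contigs:
        return kept, 0.0
    removed_pct = 100.0 * (len(contigs) - len(kept)) / len(contigs)
    return kept, removed_pct
```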

The H. meleagridis-specific contigs comprised 2890 contigs from the virulent and 2943 contigs from the attenuated strain. Both sets were merged using CAP3 followed by CD-HIT, giving rise to 3356 unique contigs, which constitute the H. meleagridis reference transcriptome database (Fig. 1). Of these 3356 contigs, 2150 mapped to both the virulent and the attenuated set, 564 were specific to the virulent strain and 642 to the attenuated strain (Fig. 1; Supplementary Material Table S2). To differentiate strain-specific contigs from contigs present in both strains, contigs specific to the virulent and the attenuated strain were assigned the suffixes ‘_LP’ and ‘_HP’, respectively (Supplementary Material Table S2). The majority of strain-specific contigs were annotated with the same protein family as contigs present in the transcriptomes of both strains. Only in very few cases was a strain-specific contig the sole coding sequence annotated for a particular protein, suggesting strain-specific expression. However, since normalized cDNA libraries were used for sequencing and the sequencing depth was not very high, the absence of a certain sequence from a particular transcriptome was not addressed further.
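The split of reference contigs into shared and strain-specific groups can be expressed with set membership. In this sketch, `virulent` and `attenuated` are hypothetical sets holding the IDs of reference contigs supported by each strain, and `label_contigs` is an illustrative helper; only the ‘_LP’ (virulent) and ‘_HP’ (attenuated) suffix convention comes from the text.

```python
from typing import Dict, Set


def label_contigs(reference: Set[str], virulent: Set[str],
                  attenuated: Set[str]) -> Dict[str, str]:
    """Map each reference contig to its label based on strain support."""
    labels: Dict[str, str] = {}
    for cid in sorted(reference):
        in_v, in_a = cid in virulent, cid in attenuated
        if in_v and in_a:
            labels[cid] = cid            # shared by both strains
        elif in_v:
            labels[cid] = cid + "_LP"    # virulent-specific
        elif in_a:
            labels[cid] = cid + "_HP"    # attenuated-specific
        # contigs supported by neither strain are left out
    return labels
```

With the counts reported above, the three categories sum to 2150 + 564 + 642 = 3356 contigs, i.e. the full reference transcriptome.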

Major Aspects of the H. meleagridis Transcriptome


The study identified fewer gene coding sequences than reported for related trichomonads such as T. vaginalis and D. fragilis (Barratt et al. 2016), consistent with an earlier sequencing study (Klodnicki et al. 2013). This discrepancy may be attributable to the sequencing depth and to the fact that the transcriptome, and not the genome, was sequenced. Even though both the present study and that of Klodnicki et al. (2013) used normalized cDNA to obtain sequences from genes expressed at low levels, the limited sequencing depth might have resulted in rarer transcripts being missed.

Gene ontology (GO) analysis of the 3356 contigs revealed numerous transcripts, often belonging to expanded gene families. This was not surprising, as gene duplication is a well-known feature of trichomonads (Barratt et al. 2016).

Sequences encoding small GTPases, ribosomal proteins, kinases, peptidases, a significant repertoire of hypothetical proteins, cytoskeletal components and metabolic enzymes, along with proteins involved in survival and pathogenic mechanisms, were found in multiple copies. The gene families with a high number of different transcripts are illustrated as a ‘transcriptome ball’ (Fig. 2).

Components of G Protein Signaling Pathway

The H. meleagridis transcriptome contains more than 120 transcripts of monomeric G-proteins, with representatives in all five major families: Ras, Rho, Rab, Arf/Sar1 and Ran. With 74 transcripts, the Rab family proteins were the most abundant group of monomeric G-proteins. Rab family GTPases are involved in membrane trafficking between organelles, playing a central role in ensuring the delivery of cargo to the correct destination (Stenmark 2009). Individual Rab proteins are restricted to specific cellular compartments (Wennerberg et al. 2005), which correlates with the observation that the diversity of the Rab GTPase family seems to be proportional to the complexity of an organism. Most unicellular organisms possess 10 to 20 different Rabs, whereas multicellular organisms can possess over 60 (Pereira-Leal and Seabra 2001). However, as in H. meleagridis, this does not seem to hold for some protozoan parasites such as Trichomonas vaginalis or Entamoeba histolytica (Bosch and Siderovski 2013; Carlton et al. 2007), nor for the free-living Dictyostelium discoideum and the ciliate Tetrahymena thermophila (Bright et al. 2010; Eichinger et al. 2005). Such a variety of Rab family GTPases in these unicellular organisms could be explained by their structural complexity, which for most of them is reflected in a range of different morphological forms. The same may apply to H. meleagridis, as this parasite shows striking changes in morphology depending on the environmental conditions in which it is found (Gruber et al. 2017; Hess et al. 2015).

In contrast to monomeric G-proteins, components of heterotrimeric G-proteins were less abundant. These proteins consist of three subunits, alpha (Gα), beta (Gβ) and gamma (Gγ), of which the latter two are often referred to as the beta/gamma-subunit complex (Gβγ) (Preininger and Hamm 2004). In the H. meleagridis transcriptome, only Gα subunits (5 transcripts) could be identified, as in T. vaginalis (Anantharaman et al. 2011), indicating that some trichomonads might employ G-protein signaling independent of Gβγ.

Ribosomal Proteins

More than 170 ribosomal protein family members were identified in the transcriptome of H. meleagridis, of which 95 transcripts encode components of the 60S subunit and 79 transcripts components of the 40S subunit. Most ribosomal protein family members are present in multiple copies, a feature already reported for other trichomonads such as D. fragilis and T. vaginalis (Barratt et al. 2015; Carlton et al. 2007).

Kinases

The H. meleagridis transcriptome contains a broad range of transcripts encoding kinases (>230). The majority of kinases showed sequence similarity to eukaryotic protein kinases (ePKs) and could be classified into the following protein kinase families: Protein Kinase A, G and C families (AGC), Calmodulin/Calcium-regulated protein kinases (CAMK), Casein Kinase 1 (CK1), homologues of the yeast STE kinases (STE) and Tyrosine Kinase-Like (TKL). Even though kinases represent one of the most abundant groups of transcripts, the predicted kinome of H. meleagridis does not seem as complex and voluminous as reported for related protozoa such as D. fragilis, E. histolytica and T. vaginalis (Anamika and Bhattacharya 2008; Barratt et al. 2015; Carlton et al. 2007). Considering that the sequencing depth was not very high, it is possible that some transcripts were missed, which should be addressed in future studies.

Peptidases and Protein Homeostasis

Histomonas meleagridis expresses a wide range of peptidases, as reflected in the results of a BLASTx search against the MEROPS pepunit.lib database (Rawlings et al. 2006). The search detected 115 peptidases (E-value ≤ 1E-5), excluding hits to non-peptidase homologues and inhibitors (Fig. 3). In general, transcripts of various cysteine peptidases were the most represented group, followed by serine and metallopeptidases. The hit “mername-AA287 peptidase” received the highest number of transcripts (6).
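Tallying transcripts per MEROPS entry from a BLASTx search can be sketched as filtering by E-value and keeping each transcript's best-scoring hit. The sketch assumes default BLAST tabular output (`-outfmt 6`) and a hypothetical `subject_to_name` dictionary mapping subject IDs to MEROPS names; the exclusion of non-peptidase homologues and inhibitors described above is not modelled.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_best_hits(blast_rows: Iterable[str], subject_to_name: Dict[str, str],
                    max_evalue: float = 1e-5) -> Counter:
    """Count transcripts per MEROPS name, using each query's best hit."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in blast_rows:
        f = row.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # skip empty or malformed lines
        query, subject = f[0], f[1]
        evalue, bitscore = float(f[10]), float(f[11])
        if evalue > max_evalue:
            continue
        # Keep the highest-scoring hit for each query transcript
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    # Fall back to the raw subject ID when no name mapping is available
    return Counter(subject_to_name.get(s, s) for s, _ in best.values())
```

Counting only the best hit per transcript prevents a single transcript from being tallied under several related peptidase entries.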


“Mername-AA287 peptidase” is the MEROPS name for clan CA, family C1 cysteine peptidases, described in T. vaginalis as CP65, TvCP39 or TvCP4. These peptidases were found to be involved in the parasite’s adherence to host cells and in their destruction (Alvarez-Sánchez et al. 2000; Cárdenas-Guerra et al. 2013; Hernández-Gutiérrez et al. 2004). Interestingly, the same hit was also the most abundant in the transcriptome of D. fragilis (Barratt et al. 2015), indicating a possibly conserved mechanism among these organisms with importance for pathogenicity.

In eukaryotic cells, the bulk of protein turnover is carried out by two proteolytic systems: lysosomes and proteasomes (Sorokin et al. 2009). Hallmarks of both systems are present in the transcriptome of H. meleagridis. Lysosomes are membrane-bound organelles that serve as the major degradative compartment of the eukaryotic vacuolar system, degrading endocytic, autophagic and secretory materials (Mullins and Bonifacino 2001). The BLASTx search mentioned above identified a number of peptidases with potential lysosomal action (Fig. 3), such as lysosomal peptidase 66.3 kDa protein, unassigned proteases of families C13 and S28, and members of family A1 peptidases. Considering that ultrastructural studies of in vitro cultured H. meleagridis demonstrate a rich vacuolar morphology (Schuster 1968), the existence of lysosomal protein degradation mechanisms can be suggested. In protozoan parasites such as E. histolytica, lysosomes were additionally shown to control the transport, maturation and secretion of cysteine peptidases and thereby contribute to the virulence of the organism (Nozaki and Nakada-Tsukui 2006).

In contrast to lysosomes, proteasomes constitute a selective system in which proteins tagged for degradation are eliminated. The identified peptidases also included major components of the 20S proteasome, the catalytic core particle of a larger complex called the 26S proteasome (Tanaka 2013). The identification of transcripts encoding 7 alpha (10 transcripts) and 4 beta (5 transcripts) subunits of the 20S proteasome (Fig. 3; Supplementary Material Table S2) is congruent with data from other protozoan parasites indicating a subunit composition similar to that of other eukaryotes rather than to that of archaebacterial proteasomes (Paugam et al. 2003).

In contrast to eukaryotic proteasomes, which comprise multiple different alpha and beta subunits, archaebacterial proteasomes consist of a single type each of alpha and beta subunit.
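The distinction between distinct subunits and transcripts (e.g. 7 alpha subunits represented by 10 transcripts) can be made explicit by grouping annotated transcripts by subunit type. In this sketch, the annotation format "proteasome subunit alpha type-N" and the regular expression are assumptions about how annotations are written, not the format of Supplementary Table S2.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumed annotation format, e.g. "proteasome subunit alpha type-3"
SUBUNIT_RE = re.compile(r"proteasome subunit (alpha|beta) type-(\d+)", re.IGNORECASE)


def summarize_subunits(
        annotations: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Return {'alpha'|'beta': (distinct subunit types, transcript count)}."""
    types: Dict[str, Set[str]] = defaultdict(set)
    counts: Dict[str, int] = defaultdict(int)
    for _transcript_id, description in annotations:
        m = SUBUNIT_RE.search(description)
        if m:
            cls = m.group(1).lower()
            types[cls].add(m.group(2))  # subunit type number
            counts[cls] += 1            # every matching transcript counts
    return {cls: (len(types[cls]), counts[cls]) for cls in types}
```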

In the 26S proteasome, the catalytic core particle (20S proteasome) is regulated by two terminal regulatory particles (19S regulatory particles), which recognize ubiquitin-tagged proteins and direct them to degradation by the catalytic core (Tanaka 2013). Since transcripts encoding hallmarks of a ubiquitin-proteasome system, such as components of the 19S regulatory particles, the COP9 signalosome, ubiquitin-specific peptidases, ubiquitin and ubiquitin-conjugating enzymes, are present in the H. meleagridis transcriptome (Fig. 3; Supplementary Material Table S2), it is likely that H. meleagridis relies on the ubiquitin-proteasome system to regulate protein levels within the cell. Research on proteasomes, especially with respect to the biology of protozoan parasites, has demonstrated their role in replication and the cell cycle, life stage-specific transformation, and metabolic adaptation to environmental changes or stress responses (Muñoz et al. 2015). Accordingly, the ubiquitin-proteasome system can be assumed to play an important role in the biology of H. meleagridis as well.

Hypothetical Proteins


Hypothetical proteins constitute a significant percentage of any genome (Galperin 2001). In the present study, 30% of the transcripts were identified as encoding hypothetical proteins. Considering that these sequences originate from mRNA, it is plausible that they represent real proteins that have yet to be characterized. This is supported by an in silico investigation of human hypothetical proteins available in protein databases, which showed that 21% of them had been experimentally characterized and that 6% of those had been shown to play a role in a mitochondrial context (Desler et al. 2009). A small proportion of H. meleagridis transcripts encoding hypothetical proteins could be classified as ‘conserved hypothetical proteins’ [contig1462, contig1525, contig00440_HP] (Supplementary Material Table S2). Such proteins show phyletic conservation (Galperin 2001). Finally, hypothetical proteins for which some functional predictions can be obtained are collectively termed ‘known unknowns’ (Galperin and Koonin 2004). Such proteins accounted for about 15% of the total hypothetical protein hits in the H. meleagridis transcriptome.

Cytoskeleton

Cytoskeletal dynamics seem to play an important role for H. meleagridis, as striking differences in cell morphology are seen between in vitro cultivated virulent and attenuated cells. Virulent cells exhibit a flagellate spheroid form with occasional pseudopodia, whereas attenuated cells are highly amoeboid, although a flagellum is still present (Gruber et al. 2017).

A shift from the amoeboid to the spherical form could be observed in the attenuated culture upon changes in environmental conditions, indicating the parasite's capacity to change its morphology quickly (Gruber et al. 2017). Apart from in vitro culture conditions, different morphological forms of H. meleagridis have been observed in the host: i) an amoeboid form found in early lesions, considered to be an ‘invading form’; ii) a larger “vegetative form” found accumulating in distended host tissue; and iii) small round cells found in older parts of lesions and inside the intermediate host, the intestinal nematode Heterakis gallinarum (Tyzzer 1919).

These observations of variable cell morphology are supported at the transcriptome level, as transcripts important for the architecture and dynamics of actin microfilaments, , and flagella were identified (Table 2). In addition to the already described alpha-actinins 1, 2 and 3 (Leberl et al. 2010), coding sequences for two further alpha-actinins were detected (Table 2), suggesting a high adaptability of this protein family. A recent proteome study on the detergent-resistant cytoskeleton of T. gallinarum identified several hypothetical proteins with a potential role in the formation of cytoskeletal filaments (Preisner et al. 2016). homologues of these hypothetical proteins were shown to localize with previously described cytoskeletal structures, such as , costa and pelta, suggesting structural properties in common with metazoan intermediate filament proteins despite the lack of sequence homology. In the H. meleagridis transcriptome, several contigs encoding hypothetical proteins with GO domains related to the cytoskeleton could be found; however, their association with cytoskeletal architecture remains to be investigated.

Metabolism

Electron microscopic studies of H. meleagridis cells suggested that bacteria and rice starch particles are ingested by the protozoan via phagocytosis (Mazet et al. 2008). More recently, by observing the cohabitation of H. meleagridis with green fluorescent protein (GFP)-tagged E. coli DH5α, it was shown that the parasite can take up E. coli, most likely as a food source (Ganas et al. 2012). The identification of transcripts encoding starch-digesting enzymes, such as amylases, as well as all ten enzymes of the glycolytic pathway (Fig. 4), supports the ability of H. meleagridis to metabolize carbohydrates irrespective of their source, whether rice starch, bacteria or host cells.
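Checking that all ten glycolytic steps are represented is a coverage test over the transcript annotations. The sketch below lists the ten canonical glycolytic enzymes; matching by case-insensitive substring on free-text annotations, and the small alias lists, are simplifying assumptions, so alternative enzyme names or isoenzyme variants would need additional aliases.

```python
from typing import Dict, Iterable, List, Tuple

# The ten canonical glycolytic steps, with illustrative name variants
GLYCOLYSIS: Dict[str, Tuple[str, ...]] = {
    "hexokinase": ("hexokinase", "glucokinase"),
    "glucose-6-phosphate isomerase": ("glucose-6-phosphate isomerase",
                                      "phosphoglucose isomerase"),
    "phosphofructokinase": ("phosphofructokinase",),
    "fructose-bisphosphate aldolase": ("fructose-bisphosphate aldolase",),
    "triosephosphate isomerase": ("triosephosphate isomerase",),
    "glyceraldehyde-3-phosphate dehydrogenase": (
        "glyceraldehyde-3-phosphate dehydrogenase",),
    "phosphoglycerate kinase": ("phosphoglycerate kinase",),
    "phosphoglycerate mutase": ("phosphoglycerate mutase",),
    "enolase": ("enolase",),
    "pyruvate kinase": ("pyruvate kinase",),
}


def missing_steps(annotations: Iterable[str]) -> List[str]:
    """Return glycolytic steps with no matching annotation, in pathway order."""
    text = [a.lower() for a in annotations]
    return [step for step, aliases in GLYCOLYSIS.items()
            if not any(alias in a for a in text for alias in aliases)]
```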

Transcripts encoding arginine deiminase, carbamate kinase and ornithine carbamoyltransferase suggest the presence of the arginine dihydrolase pathway.

This pathway was not addressed by Klodnicki et al. (2013), since the central enzyme, arginine deiminase (Contig332), was reported absent from their dataset. Interestingly, an almost identical transcript is present in the dataset of Klodnicki et al. (2013) (GAAM01005540), which suggests that it was annotated differently. The sequence contains an amidinotransferase domain (pfam02274; E-value=8.69E-38), characteristic of arginine deiminase, and 387 of 422 amino acids match the arginine deiminase hit (COG2235; E-value=3.51E-33), suggesting that it is an H. meleagridis arginine deiminase homologue. The detection of the arginine dihydrolase pathway indicates that H. meleagridis has the potential to use L-arginine as a direct energy source. Apart from providing energy for the parasite, this pathway could be a mechanism enabling its survival within the host. L-arginine is necessary for the production of nitric oxide (NO), which is important for the innate immune response, as it is produced by activated macrophages (Bronte and Zanovello 2005). Several pathogens, including bacteria and protozoa, interfere with their host's arginine metabolism by competing with the host's NO synthase for the common substrate L-arginine, thereby starving the host of arginine, with consequences for NO production (Das et al. 2010). Probably the best-known example of such an interaction is the human intestinal parasite Giardia lamblia. The consumption of L-arginine by G. lamblia reduces the proliferation of intestinal epithelial cells in vitro (Stadelmann et al. 2012), and the secreted giardial arginine deiminase actively diminishes T-cell proliferation in vitro (Stadelmann et al. 2013).

Thus, by utilizing arginine, G. lamblia not only generates energy for its own growth but at the same time interferes with a protective mechanism of host immune cells. Based on the presence of enzymes of the arginine dihydrolase pathway, the arginine starvation phenomenon exhibited by G. lamblia was recently also suggested for D. fragilis (Barratt et al. 2015). The presence of the same enzymes in T. vaginalis (Yarlett et al. 1996) and E. histolytica (Elnekave et al. 2003) indicates that the arginine starvation mechanism might be widely distributed among protozoan parasites. Furthermore, the presence of arginine dihydrolase pathway enzymes in the H. meleagridis transcriptome suggests that this parasite might likewise use arginine starvation to deplete the host's arginine stock and thereby interact with the host immune system.
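For reference, the three enzymes named above act in sequence. The standard textbook reactions and their net balance are given below; they show why the pathway can yield one ATP per molecule of arginine through substrate-level phosphorylation. This is the general scheme of the pathway, not a measured stoichiometry for H. meleagridis.

```latex
\begin{align*}
\text{L-arginine} + \mathrm{H_2O} &\xrightarrow{\text{arginine deiminase}} \text{L-citrulline} + \mathrm{NH_3}\\
\text{L-citrulline} + \mathrm{P_i} &\xrightarrow{\text{ornithine carbamoyltransferase}} \text{L-ornithine} + \text{carbamoyl phosphate}\\
\text{carbamoyl phosphate} + \mathrm{ADP} &\xrightarrow{\text{carbamate kinase}} \mathrm{ATP} + \mathrm{NH_3} + \mathrm{CO_2}\\[4pt]
\text{Net:}\quad \text{L-arginine} + \mathrm{H_2O} + \mathrm{P_i} + \mathrm{ADP} &\longrightarrow \text{L-ornithine} + 2\,\mathrm{NH_3} + \mathrm{CO_2} + \mathrm{ATP}
\end{align*}
```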

Hydrogenosome

Histomonas meleagridis is an anaerobic protozoan harboring hydrogenosomes for energy generation (Mazet et al. 2008). As in other trichomonads (Müller 1993), the hydrogenosomes of H. meleagridis are presumed to generate ATP through the combined action of hydrogenosomal metabolic enzymes. Transcripts were identified for malic enzyme, iron hydrogenase, ferredoxin, A-type flavoprotein, pyruvate:ferredoxin oxidoreductase (A, D and E subunits), adenylate kinase, NADH dehydrogenase (51 kDa and 24 kDa subunits), NADH:flavin oxidoreductase, succinyl coenzyme A synthetase (α- and β-subunits) and acetyl-CoA acetyltransferase (Supplementary Material Table S2). Most of these enzymes were also reported in the cDNA library sequencing study (Klodnicki et al. 2013), with the exception of glyoxylate reductase, whose transcript was not present in our dataset. Glyoxylate reductase seems to be present in T. vaginalis hydrogenosomes: its T. vaginalis sequence possesses an N-terminal targeting peptide (TP) for import into hydrogenosomes, and it was identified in the proteome of T. vaginalis hydrogenosomes (Henze et al. 2007). However, considering that the starting material in the Klodnicki et al. (2013) study was a xenic culture and that the reported glyoxylate reductase sequence is homologous to a sequence from the bacterium Anaerolinea thermophila, it is questionable whether the detected transcript stems from H. meleagridis or from co-cultivated bacteria.

In addition to hydrogenosomal metabolic enzymes, transcripts were identified encoding enzymes involved in the oxygen stress response, Fe-S biosynthesis (IscS-like protein, iron hydrogenase assembly protein), mitochondrial/hydrogenosomal chaperones (chaperonin cpn60, mitochondrial heat shock protein 70, Hsp20), hydrogenosomal membrane proteins (Hmp31- and Hmp35-like) and a Tim17 family protein (Supplementary Material Table S2). Homologues of Hmp31 were shown to exist in T. vaginalis and T. gallinae (Dyall et al. 2000; Tjaden et al. 2004). In H. meleagridis, three transcripts encoding Hmp31 homologues were detected (Supplementary Material Table S2), corroborating the results of the cDNA sequencing study (Klodnicki et al. 2013). Hmp31 is a member of the mitochondrial carrier protein family, of which five paralogs localized in the inner hydrogenosomal membrane function as ADP/ATP carriers (Dyall et al. 2000; Tjaden et al. 2004). In contrast to Hmp31, T. vaginalis Hmp35 is localized in the outer hydrogenosomal membrane and its function is unknown (Dyall et al. 2003); until now, a homologue had been shown to exist only in D. fragilis (Barratt et al. 2015). In H. meleagridis, a single Hmp35-like transcript was detected (Supplementary Material Table S2), supporting the data reported by the cDNA sequencing study (Klodnicki et al. 2013). The identification of a transcript encoding a Tim17-like protein suggests the existence of a translocase of the inner mitochondrial membrane (TIM) complex, namely a TIM22-like complex (Supplementary Material Table S2). The open reading frame encoding the Tim17-like protein could not be found in the dataset reported by Klodnicki et al. (2013), even though this contig was not sorted out as novel in the initial Blastn comparison of the two datasets. This is because homology within a non-coding region of the contig was sufficient to meet the applied threshold (E-value ≥ 1E-5). Even though the TIM22 complex (consisting of Tim17/22/23 proteins) was shown to facilitate the assembly of mitochondrial carrier proteins in yeast and humans, it is still unclear whether this is the case for T. vaginalis (Neupert and Herrmann 2007; Rada et al. 2011). A homologue of Tim23 was reported in the cDNA sequencing study (Klodnicki et al. 2013); however, it is questionable whether this is indeed a Tim23 homologue, as a conserved domain search using the specified sequence (GAAM01005229) indicated a protein containing an enolase domain (cd03313; E-value=1.06E-25).

Adaptation to Dynamic Environmental Conditions

Histomonas meleagridis has developed various mechanisms of adaptation to changing environmental conditions. This is most evident in its life cycle, which includes survival within the eggs of the intestinal nematode H. gallinarum and within paratenic hosts such as the earthworm (Graybill and Smith 1920; Norton et al. 1999), and in its ability to survive for short periods in the faeces of the host (Lotfi et al. 2012). As an organism that favors a microaerophilic environment, H. meleagridis must be able to cope with fluctuating oxygen levels and reactive oxygen species (ROS), both in vivo and in vitro. Indeed, analysis of its transcriptome identified a number of transcripts encoding enzymes shown to play an essential role in this process in other trichomonads, such as superoxide dismutase (SOD) (Ellis et al. 1994; Gould et al. 2013; Lindmark and Müller 1974), ferredoxin (Gould et al. 2013), rubrerythrin (Gould et al. 2013; Pütz et al. 2005), hydrogenosomal oxygen reductase (Smutná et al. 2009), thioredoxin reductase, thioredoxin peroxidase (Gould et al. 2013) and several thioredoxin family proteins (Coombs et al. 2004; Mentel et al. 2008) (Supplementary Material Table S2).

In response to dynamic changes in the surrounding environment, robust chaperone surveillance is required to stabilize the cell by countering the up- and downshifts of protein synthesis that might occur (Kim et al. 2013). Chaperones contribute to the precise folding of proteins into their three-dimensional conformation, which is crucial for their specific biological function (Kim et al. 2013). The analysis of the H. meleagridis transcriptome identified a range of transcripts encoding heat shock proteins (Hsp20, Hsp70, Hsp90) and different chaperonin subunits (Supplementary Material Table S2). Hsp70 seems to be the most represented group of heat shock proteins in H. meleagridis, with transcripts homologous to both mitochondrial and cytosolic types identified (Supplementary Material Table S2). Hsp70 proteins are highly conserved in all living organisms and are crucial for the correct folding of misfolded proteins that accumulate under stress, thereby preventing protein aggregation (Hartl et al. 2011; Kim et al. 2013). In T. vaginalis, their up-regulation in response to heat shock and oxidative stress was reported (Davis-Hayman et al. 2000), although RNA sequencing data show their downregulation shortly after exposure to high oxygen levels (Gould et al. 2013).

Chaperonins act by the same basic mechanism as Hsp70s but, in contrast to them, function downstream in the final folding of proteins that failed to reach their native state after interacting with Hsp70 (Hartl et al. 2011). The chaperonin containing T-complex protein 1 (TCP-1) (CCT), also known as TCP-1 ring complex (TRiC), is crucial for the survival of the cell because of its involvement in the folding of essential cytosolic proteins (Spiess et al. 2004), a function that cannot be substituted by other chaperones. It is a multimeric complex composed of eight different subunits (Kabir et al. 2011), of which six (alpha, gamma, delta, epsilon, zeta and eta) were identified in the H. meleagridis transcriptome (Supplementary Material Table S2). In addition to the CCT subunits, transcripts encoding reported essential cytosolic substrates of CCT (Spiess et al. 2004) were also identified. These include α- and β-tubulin, actin polymerization factor, protein phosphatase PP2A regulatory subunit B, coatomer β subunit, guanine nucleotide-binding protein β subunit and SET domain protein (Supplementary Material Table S2).

Mechanisms Facilitating Pathogenesis

Molecular mechanisms of H. meleagridis pathogenesis are unknown. However, inferences can be made from other protozoa for which substantially more data are available. Therefore, mining for molecular signatures of virulence, such as adhesion proteins, surface antigens, lectins, cysteine peptidases, GP63-like proteins, phospholipases, pore-forming proteins and other virulence factors, could provide the basis for hypotheses on possible pathogenic mechanisms.

Adherence of the parasite to host target cells is a pivotal step in establishing an infection, and molecules mediating this process are potential virulence factors. Adhesion proteins (AP) have been reported to play a critical role in the cytoadherence of T. vaginalis to host cells (Petrin et al. 1998). In the transcriptomes of both H. meleagridis strains, several adhesin homologues were identified, including AP-33 (5 transcripts), AP-51 (3 transcripts) and AP-65 (2 transcripts), which might play a similar role in cytoadhesion (Supplementary Material Table S2). However, one has to keep in mind that, as in T. vaginalis, all these adhesin homologues actually encode hydrogenosomal carbohydrate metabolic enzymes: the alpha subunit of succinyl coenzyme A synthetase (AP-33), the beta subunit of succinyl coenzyme A synthetase (AP-51) and malic enzyme (AP-65). Several T. vaginalis studies demonstrated that all these enzymes are truly localized in hydrogenosomes, but upon contact of the parasite with host cells, or upon exposure to high iron concentrations, they can be found on the parasite’s surface, thus showing “moonlighting” behavior (Hirt et al. 2007). Dual localization was reported for the H. meleagridis homologue of succinyl coenzyme A synthetase (AP-33), whereas such a feature could not be shown for the homologue of malic enzyme (AP-65) (Mazet et al. 2008). This indicates that some H. meleagridis adhesin homologues might be moonlighting proteins: besides functioning as intracellular enzymes, they could also localize on the parasite’s surface to mediate cytopathogenic effects.

Other potential cytoadhesion molecules are homologues of BspA-like surface antigens (6 transcripts) and legume-like lectin family proteins (4 transcripts) (Supplementary Material Table S2). The hallmark of BspA-like surface antigens is a specific type of leucine-rich repeat (LRR), TpLRR (Tp stands for Treponema pallidum), which was shown to be involved in protein-protein and host-microbe interactions (Hirt et al. 2011). Six contigs encoding BspA-like surface antigens were detected in the H. meleagridis transcriptome. BspA-like surface antigens were reported in protozoa such as T. vaginalis, D. fragilis and E. histolytica (Barratt et al. 2015; Carlton et al. 2007; Davis et al. 2006). However, the numbers of different BspA-like coding sequences were much higher in these microorganisms than in the present study. As with the kinase sequences, the reason can be seen in the combination of limited sequencing depth and the use of mRNA as starting material, which might have caused genes with very low or no expression in the applied material to be missed. While the function of BspA-like surface antigens is still not resolved for most protozoa, in T. vaginalis they were shown to be over-expressed in cells binding extracellular matrix or exposed to high iron concentrations (Noël et al. 2010). Furthermore, considering the cell surface expression of one of the T. vaginalis BspA-like surface antigens (TvBspA625) and its immunoreactivity with patient sera, possible roles in cytoadhesion and in the pathobiology of this parasite were suggested (Noël et al. 2010).

Legume-like lectins are a diverse family of proteins able to bind carbohydrates through their lectin L-type superfamily domain (Gupta 2012). Four transcripts encoding homologues of T. vaginalis legume-like lectin family proteins were identified in the H. meleagridis transcriptome (Supplementary Material Table S2). In T. vaginalis, legume-like lectins were suggested to be involved in cytoadhesion via binding to sugar moieties of host cell glycoconjugates (Hirt et al. 2007). It is conceivable that H. meleagridis legume-like lectins have similar functions, since all 4 transcripts were annotated as membrane-located in addition to carrying a carbohydrate-binding domain.

Cysteine peptidases (CP) play key roles in the biology and pathogenicity of different parasites (Sajid and McKerrow 2002). Analysis of the H. meleagridis transcriptome identified cysteine peptidases as the largest group of peptidases. The majority of cysteine peptidase transcripts (59%) encoded clan CA, family C1, cathepsin L-like and clan CD, family C13, asparaginyl endopeptidase-like cysteine peptidases (Supplementary Material Table S2). The comparison of deduced amino acid sequences separated the contigs encoding clan CA, family C1, cathepsin L-like cysteine peptidases into two large clusters, each containing a number of paralogs (Supplementary Material S1, Supplementary Material Fig. S1). The deduced amino acid alignment of clan CD, family C13, asparaginyl endopeptidase-like (legumain) cysteine peptidases showed 5 different proteins with homology in the region containing the conserved peptidase C13 family domain (Supplementary Material S2).

In T. vaginalis, CPs (TvCPs) have been implicated in mediating cytotoxicity and virulence, and they seem to be necessary for adhesins to cytoadhere (Petrin et al. 1998). The majority of TvCPs were identified as clan CA, family C1, cathepsin L-like cysteine peptidases, with only a few CPs falling into the clan CD, family C13, asparaginyl endopeptidase-like subfamily. Furthermore, for the avian trichomonad Trichomonas gallinae, clan CA, family C1, cathepsin L-like cysteine peptidases have been shown to exert cytotoxicity on host cells (Amin et al. 2012).

GP63-like proteins, or leishmanolysins, are metallopeptidases found on the surface of the human intracellular parasite Leishmania major and are considered to be involved in the virulence and pathogenicity of this parasite (Yao et al. 2003). Analysis of the H. meleagridis transcriptome identified one transcript encoding a GP63-like protein. In T. vaginalis, 16 GP63-like proteins were identified by proteomic analysis of surface proteins, and a member of this family has been reported to play a role in its virulence (Hirt et al. 2011). A possible role of this protein in the virulence of H. meleagridis is suggested by its gene ontology annotation, which classified this transcript as a membrane metallopeptidase involved in proteolysis and cell adhesion.

Phospholipases catalyze the hydrolysis of phospholipids, which, together with proteins, are the major building blocks of cell membranes. In this context, phospholipases could function in the membrane disruption that occurs during host cell invasion. Indeed, destabilization of cell membranes by hydrolysis of phospholipids has been reported in connection with host cell invasion by different pathogens (Ghannoum 2000). Analysis of the H. meleagridis transcriptome identified transcripts encoding different phospholipases, such as phospholipase B (3 transcripts), Zn2+-dependent phospholipase C (3 transcripts) and phosphatidylinositol-specific phospholipase C (1 transcript) (Supplementary Material Table S2).

Pore-forming toxins (PFTs) are molecules able to integrate into the lipid bilayer of the target cell, forming transmembrane channels that lead to cell death by osmotic lysis as a consequence of membrane permeabilization (Bischofberger et al. 2012). PFTs appear to be well represented in the H. meleagridis transcriptome, including a considerable number of different surfactant B-like or saposin-like proteins (SALIPs) (18 transcripts). SALIPs are found in phylogenetically distant eukaryotes, including humans, but have never been reported in prokaryotes (Bruhn 2005). The most studied SALIPs in protozoa are the amoebapores of E. histolytica, shown to be pore-forming polypeptides capable of creating ion channels in cell membranes (Leippe 1997). Owing to this feature, amoebapores were demonstrated to induce lysis of both bacteria and host cells (Leippe and Herbst 2004). Involvement in host cell lysis is hypothesised for T. vaginalis SALIPs (Hirt et al. 2011) and is supported by the identification of these molecules in the secretome of T. vaginalis (Riestra et al. 2015). Considering the available data on these small PFTs, the identification of transcripts encoding SALIPs in the H. meleagridis transcriptome suggests a possible dual role in this parasite: i) host cell disruption via the formation of ion channels in the host cell membrane and ii) destruction of ingested bacterial cells in the vacuole.

In addition to data mining for virulence factors already described in related protozoa, a BLAST search was performed against a custom-made virulence database consisting of the MvirDB (Zhou et al. 2007) and ProtVirDB (Ramana and Gupta 2009) databases. The search resulted in 153 hits to 32 different proteins (Fig. 5). Hits to the E. histolytica small GTPase Rab proteins Rab11B, Rab5 and Rab7A were the most abundant, which is not surprising considering the high number of Rab transcripts in the H. meleagridis transcriptome. Studies on E. histolytica demonstrated a role in virulence for all three proteins. Rab5 and Rab7A, an isotype of Rab7, were suggested to be involved in the formation of the “prephagosomal vesicle”, a site for processing, activation and temporary storage of cysteine peptidases (Saito-Nakano et al. 2004), whereas the Rab7B isotype is important for the transport of cysteine proteases to the lysosome (Saito-Nakano and Mitra 2007). Finally, Rab11B was demonstrated to play a pivotal role in regulating the transport and secretion of major cysteine peptidases, suggesting a concerted action of all three proteins in the transport and secretion of pathogenic factors such as cysteine peptidases (Mitra et al. 2007). The involvement of Rab proteins in the virulence of H. meleagridis is still unknown, but inferences about possible pathogenicity mechanisms can be made. On the one hand, H. meleagridis possesses a high number of transcripts encoding Rab proteins; on the other, data are available on the role of Rab proteins in vesicle trafficking in eukaryotes. Considering this, a possible role of extracellular vesicles in the delivery of molecules involved in pathogenicity, similar to mechanisms described in a number of other protozoa (Mantel and Marti 2014; Twu et al. 2013), should be considered.

Secretory Signal Peptides

The majority of secreted proteins originate from a precursor with an amino-terminal extension that functions as a signal peptide and targets the protein to the endoplasmic reticulum for later passage through the secretory pathway (Zimmermann et al. 2011). However, the presence of such a signal peptide does not necessarily imply a secretory destination, as many integral membrane proteins additionally possess a signal sequence called the stop transfer sequence (Spencer-Yost et al. 1983).

In order to identify potential secretory proteins, the SignalP 4.1 server (Petersen et al. 2011) was used. A number of transcripts encoding proteins potentially destined for the secretory pathway were identified (Supplementary Material Table S3), the most prominent groups being hypothetical proteins, SALIPs/surfactant B homologues and clan CA, family C1, cathepsin L-like cysteine peptidases.

The identification of secretory signals in the latter two groups, cysteine peptidases and surfactant B homologues, is not surprising, as it is consistent with data reported from other protozoa. It also supports the proposed roles of these two protein groups in the virulence of H. meleagridis, as discussed above.

Potential Chemotherapeutic Targets

Infection with H. meleagridis was successfully controlled for decades by the application of nitroimidazoles and nitrofurans for therapy and prophylaxis. However, due to potential consumer risks, these substances have been banned in the EU and the USA. Although the aminoglycoside antibiotic paromomycin can be used for prophylaxis, concerns have been raised with regard to increased antibiotic resistance (Liebhart et al. 2017).

In order to use the newly gained transcriptome data to identify possible alternative antiparasitic compounds, an in silico drug screening was performed. The screening used the transcriptome data as potential targets and a list of chemical compounds with known and unknown pharmacological action available in the DrugBank database (Wishart et al. 2008). Since the major natural hosts of H. meleagridis are turkeys (Meleagris gallopavo) and chickens (Gallus gallus), the screening procedure incorporated avian genomic data. In this way, compounds whose targets have detectable homologues in these hosts were excluded, reducing the likelihood that identified compounds would be harmful to the host. The screening identified 60 drugs with the potential to act as anti-histomonals. Potential candidates, including chemical compounds and their protein targets (UniProt target ID), are shown in Table 3. Among other compounds, the antiprotozoal drug nitazoxanide (DB00507) was identified, supporting the applied screening procedure.

Nitazoxanide was shown to inhibit the in vitro growth of three anaerobic protozoa at low concentrations: E. histolytica (IC50 = 0.017 μg/ml), G. intestinalis (IC50 = 0.004 μg/ml) and T. vaginalis (IC50 = 0.034 μg/ml) (Cedillo-Rivera et al. 2002). The proposed mechanism of action of nitazoxanide is the inhibition of pyruvate ferredoxin oxidoreductase (PFOR), an essential hydrogenosomal enzyme involved in anaerobic energy metabolism (Hoffman et al. 2007). Considering the effectiveness of the drug against the above-mentioned parasites, similar results might plausibly be expected against H. meleagridis. To our knowledge, this is the first analysis of this kind for H. meleagridis, and it should be taken as a starting point for identifying possible future treatment options, considering that i) various factors affect efficacy in vivo and ii) licensing of drugs for food-producing animals remains a difficult task.
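The IC50 values above are reported as mass concentrations. As a worked arithmetic aid, they can be converted to molar units. The sketch below assumes a molar mass of about 307.28 g/mol for nitazoxanide (C12H9N3O5S); this value is not stated in the study and is labelled as an assumption in the code.

```python
# Convert mass-based IC50 values (ug/ml) into molar units.
# ASSUMPTION: molar mass of nitazoxanide (C12H9N3O5S) ~307.28 g/mol; not given in the study.
NITAZOXANIDE_MW = 307.28  # g/mol

# IC50 values as reported by Cedillo-Rivera et al. (2002), in ug/ml
IC50_UG_PER_ML = {
    "E. histolytica": 0.017,
    "G. intestinalis": 0.004,
    "T. vaginalis": 0.034,
}


def ug_per_ml_to_um(conc_ug_per_ml: float, molar_mass: float) -> float:
    """1 ug/ml equals 1 mg/l; (mg/l) / (g/mol) gives mmol/l, so multiply by 1000 for umol/l."""
    return conc_ug_per_ml / molar_mass * 1000.0


if __name__ == "__main__":
    for organism, ic50 in IC50_UG_PER_ML.items():
        ic50_nm = ug_per_ml_to_um(ic50, NITAZOXANIDE_MW) * 1000.0  # umol/l -> nmol/l
        print(f"{organism}: {ic50} ug/ml = {ic50_nm:.0f} nM")
```

Under this assumption the three values correspond to roughly 55 nM, 13 nM and 111 nM, respectively.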

Conclusions

Genomic information on H. meleagridis has been scarce, posing a major bottleneck to understanding the parasite. Given the recent outbreaks of histomonosis in numerous countries, there is an urgent need to better understand the molecular biology of the parasite. To address this issue, a de novo transcriptome sequencing study of a virulent and an attenuated H. meleagridis strain was performed. The use of two phenotypically different strains, traced back to a single cell, considerably broadened the spectrum of sequences, revealing the presence of various gene families. Considering that the genome of this protozoan parasite has not yet been sequenced, the present study gained substantial genomic information and provides novel insights into H. meleagridis biological processes, such as metabolism, locomotion, cell signaling and its ability to adapt to dynamic environmental changes. In addition, the findings help to elucidate potential pathogenic mechanisms with respect to cytoadherence and host cell membrane disruption, together with the possible regulation of such processes. In view of the current treatment options, the transcriptomic data enabled us to explore and identify additional drugs with the potential to act as anti-histomonals. Based on these data, the present study makes an important contribution to our understanding of the molecular biology of H. meleagridis and, more generally, of trichomonads.

Methods

Histomonas meleagridis culture and RNA extraction: Monoxenic, mono-eukaryotic H. meleagridis strains H. meleagridis/Turkey/Austria/2922-C6/04-10x/DH5α and H. meleagridis/Turkey/Austria/2922-C6/04-290x/DH5α were used for the study. The original culture was established by micromanipulation (Hess et al. 2006) and passaged as described, which resulted in successful attenuation (Ganas et al. 2012; Hess et al. 2008). Both the virulent and the attenuated xenic cultures were monoxenized by substituting the xenic microbial flora with E. coli DH5α (Ganas et al. 2012). Both strains were propagated in vitro in Medium 199 (Gibco™, Thermo Fisher Scientific, Waltham, MA, USA) containing 15% heat-inactivated FBS (Gibco™, Thermo Fisher Scientific, Waltham, MA, USA) and 0.25% rice starch (Carl Roth GmbH + Co. KG, Karlsruhe, Germany), as previously described (Ganas et al. 2012). Prior to harvesting, the cultures were incubated for 48 hours with a reduced amount of rice starch to minimize polysaccharide interference with RNA extraction. Parasites were harvested following a previously reported purification protocol (Bilic et al. 2009) consisting of consecutive washing and centrifugation steps, but omitting the step with Histopaque® 1077 (Sigma-Aldrich, Merck KGaA, St. Louis, MO, USA). Prior to RNA extraction, parasite samples were pre-treated with Fruit-mate™ for RNA purification (Takara Bio Europe, Saint-Germain-en-Laye, France) to remove polysaccharides, and total RNA was prepared using Trizol® reagent (Invitrogen, Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer’s instructions.

Normalized cDNA library construction and high throughput sequencing: Sequencing was performed externally by LGC Genomics GmbH (Berlin, Germany) on the GS FLX using the Roche/454 FLX+ chemistry. The following procedure was performed separately for each strain. Briefly, mRNA was purified from 10 μg total RNA with the mRNA-Only Eukaryotic mRNA Isolation Kit according to the manufacturer’s instructions (Epicentre, Madison, WI, USA). Purified mRNA was used for cDNA synthesis and amplification according to the Mint-Universal cDNA Synthesis Kit (Evrogen, Moscow, Russia) user manual, applying an oligo-dT primer that binds the polyA tails of eukaryotic mRNA and thereby further reduces contamination with bacterial transcripts. To prevent bias due to differential gene expression, the amplified cDNA was normalized using the Trimmer Kit (Evrogen, Moscow, Russia), followed by re-amplification for 18 cycles. Normalized cDNA was digested with SfiI, and fragments larger than 800 bp were ligated to SfiI-cut pDNR-lib vector (Clontech, Takara Bio Europe, Saint-Germain-en-Laye, France) using the Fast Ligation Kit (NEB, Ipswich, MA, USA). To create the cDNA library, the 3-times desalted ligation was used to transform NEB10b competent cells (NEB, Ipswich, MA, USA). Normalization efficiency was verified by sequencing 96 randomly chosen clones. The obtained cDNA library consisted of approximately two million clones. To create cDNA concatemers, one half of the cells from the cDNA library was grown for 5 hours, plasmid DNA was purified using the Qiagen Plasmid Maxi Kit (Qiagen, Hilden, Germany) and digested with SfiI, and the cDNA inserts were gel-purified (MinElute Gel Extraction Kit; Qiagen, Hilden, Germany) and ligated to high-molecular-weight DNA using a proprietary Sfi-linker.

Library generation for 454 FLX sequencing was carried out according to the manufacturer’s standard protocols (Roche/454 Life Sciences, Branford, CT 06405, USA). Briefly, the concatenated inserts were randomly sheared by nebulization into fragments of 400 bp to 900 bp. These fragments were end-polished, and the 454 A and B adaptors containing identifier tags (MIDs), which are required for emulsion PCR and sequencing, were ligated to the fragment ends. The resulting fragment library, containing the libraries of both strains, was sequenced on one picotiter plate (PTP) on the GS FLX using the Roche/454 FLX+ chemistry. Prior to assembly, the sequence reads were screened for the Sfi-linker used for concatenation, the linker sequences were clipped from the reads, and the clipped reads were assembled into individual transcript contigs using the Roche/454 Newbler software with default settings (454 Life Sciences Corporation, Software Release: 2.8 (20120726_1306)).
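The linker-clipping step above can be illustrated with a minimal sketch: because each read comes from a concatemer, clipping the linker effectively splits a read into separate insert segments. The actual Sfi-linker is proprietary and its sequence is not given, so `LINKER` below is a made-up placeholder, and `min_len` is an illustrative cut-off, not a parameter from the study. Only exact matches in either orientation are handled; a real pipeline would also need to tolerate sequencing errors.

```python
import re
from typing import List

# HYPOTHETICAL placeholder: the real Sfi-linker sequence is proprietary and not given.
LINKER = "ACGTACGTTTGCAAGC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A<->T, C<->G)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def clip_linker(read: str, linker: str = LINKER, min_len: int = 50) -> List[str]:
    """Split a read at exact linker matches (forward or reverse complement).

    Segments shorter than min_len (an illustrative cut-off) are discarded.
    """
    pattern = re.escape(linker) + "|" + re.escape(reverse_complement(linker))
    return [segment for segment in re.split(pattern, read) if len(segment) >= min_len]
```

For example, a read made of a 60 bp insert, the linker, a 70 bp insert, the reverse-complemented linker and a 10 bp tail yields the two longer inserts only.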


Removal of bacterial contigs and generation of the reference transcriptome: Following the assembly, two sets of 4543 and 4508 contigs were generated from the virulent and the attenuated strain of H. meleagridis, respectively. Due to the monoxenic nature of the H. meleagridis cultures used in this study, a significant proportion of sequences originating from the bacterial background was expected. To remove contaminating bacterial sequences, BLASTn (E-value ≤ 1E-6, identity ≥ 80% and query coverage of ≥ 100 bp or ≥ 10%, whichever was smaller) was used to identify E. coli sequences, which were subsequently discarded. The removal of E. coli sequences resulted in 2890 and 2943 contigs from the virulent and the attenuated strain, respectively. Subsequently, a hybrid assembly was created from the two sets of contigs by applying CAP3 with default parameters (Huang and Madan 1999), followed by CD-HIT (Huang et al. 2010) with default parameters, to generate a reference transcriptome database for H. meleagridis consisting of 3356 contigs. Using BLASTn (E-value ≤ 1E-5), the 3356 contigs were further compared with the 3425 contigs reported by Klodnicki et al. (2013).
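The E. coli filter described above combines three criteria, one of which (coverage of at least 100 bp or 10% of the contig, whichever is smaller) is easy to misread. The following is a minimal sketch of that rule, assuming BLASTn tabular output produced with `-outfmt "6 std qlen"` so that the query length is the 13th column. It evaluates coverage per HSP rather than merging overlapping HSPs, which is a simplification not stated in the study.

```python
import csv
import io
from typing import Iterable, List, Set

# Assumed column order ("-outfmt '6 std qlen'"):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen


def ecoli_contigs(rows: Iterable[List[str]],
                  max_evalue: float = 1e-6,
                  min_identity: float = 80.0,
                  min_bp: int = 100,
                  min_fraction: float = 0.10) -> Set[str]:
    """Return IDs of contigs with at least one hit passing all three criteria."""
    flagged = set()
    for row in rows:
        qseqid = row[0]
        pident = float(row[2])
        qstart, qend = int(row[6]), int(row[7])
        evalue = float(row[10])
        qlen = int(row[12])
        covered = abs(qend - qstart) + 1             # query bases covered by this HSP
        required = min(min_bp, min_fraction * qlen)  # ">= 100 bp or >= 10 %, whichever is smaller"
        if evalue <= max_evalue and pident >= min_identity and covered >= required:
            flagged.add(qseqid)
    return flagged


def read_blast_tsv(text: str) -> List[List[str]]:
    """Parse tab-separated BLAST output into rows, skipping empty lines."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
```

Contigs not returned by `ecoli_contigs` would be kept. For orientation, the counts reported above imply that 1653 of 4543 contigs (about 36%) and 1565 of 4508 contigs (about 35%) were removed as E. coli sequences.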

Contig quality assurance: To assess the quality of the contigs, all reads (1173830) from the two H. meleagridis strains used in this study were mapped back to the final set of 3356 contigs using the BWA-MEM algorithm (Li 2013) (Table 1).
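A basic read-mapping summary of the kind used for this quality check can be derived from the SAM file written by BWA-MEM. The sketch below is one possible way to count it, not necessarily how the figures in Table 1 were obtained: it skips secondary (0x100) and supplementary (0x800) records so that each read is counted once, and treats FLAG bit 0x4 as unmapped.

```python
from typing import Iterable, Tuple


def mapping_summary(sam_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (mapped reads, total reads, mapped fraction) from SAM text lines.

    Only primary records are counted; header lines ('@') and blank lines are ignored.
    """
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & 0x100 or flag & 0x800:  # secondary or supplementary alignment
            continue
        total += 1
        if not flag & 0x4:               # 0x4 set means the read is unmapped
            mapped += 1
    return mapped, total, (mapped / total if total else 0.0)
```

In practice the lines would be read from the BWA-MEM output file, e.g. `mapping_summary(open(path))` with `path` pointing to a hypothetical SAM file.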

Annotation: All contigs were searched for homologous genes to assign gene ontology (GO) terms, using the mapping and annotation functionality of the Blast2GO software suite (Conesa and Götz 2008). Contigs were also subjected to InterPro scans (Hunter et al. 2009) through the Blast2GO interface, and any newly identified GO term was assigned to the contig. Annotations were carried out with an E-value cutoff of ≤ 1E-6. Finally, annotations were manually inspected for errors arising from the automated annotation procedure, and correct GO terms were manually assessed and assigned after numerous rounds of BLAST searches. All contigs and their pertinent annotations are listed in Supplementary Material Table S2. To differentiate between paralogs and isoforms, nucleic acid and deduced amino acid sequences of contigs coding for the same protein were aligned. Alignments were performed with the Accelrys Gene, version 2.5 (Accelrys, San Diego, CA) and Lasergene (DNASTAR Inc.) software packages. To confirm the annotation and predict protein structure, deduced amino acid sequences were analyzed with the NCBI Conserved Domain Search. Phylogenetic trees based on deduced amino acid sequences of contigs with identical annotation were constructed with the MegAlign module of the Lasergene v14 software package (DNASTAR Inc.) with default settings. The robustness of the trees was determined by bootstrap resampling of the multiple sequence alignments (1000 sets).
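Bootstrap resampling, as used above to assess tree robustness, draws alignment columns with replacement to build pseudo-alignments of the same length; a tree is inferred for each, and a clade's support is the fraction of replicate trees containing it. The sketch below shows only the resampling and the support count. The tree inference itself (MegAlign in the study) is not reproduced, so the replicate trees are assumed to be supplied as sets of clades, and the fixed seed is an illustrative choice.

```python
import random
from typing import FrozenSet, List, Set


def bootstrap_replicates(alignment: List[str], n_replicates: int = 1000,
                         seed: int = 1) -> List[List[str]]:
    """Resample alignment columns with replacement to build pseudo-alignments."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    replicates = []
    for _ in range(n_replicates):
        columns = [rng.randrange(n_cols) for _ in range(n_cols)]
        replicates.append(["".join(seq[c] for c in columns) for seq in alignment])
    return replicates


def clade_support(replicate_clades: List[Set[FrozenSet[str]]],
                  clade: FrozenSet[str]) -> float:
    """Fraction of replicate trees (each given as a set of clades) containing `clade`."""
    return sum(clade in clades for clades in replicate_clades) / len(replicate_clades)
```

With the 1000 sets used in the study, `n_replicates=1000` matches the default above.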

Identification of peptidases: All contigs were compared to protein sequences in the MEROPS ‘pepunit.lib’ database (Rawlings et al. 2006) to identify putative H. meleagridis proteases. A BLASTx was performed with E-value ≤1E-3.

Identification of virulence factors: To access a larger pool of known virulence factors, a custom BLAST database was generated consisting of the sequences of all virulence factors included in the virulence databases MvirDB (Zhou et al. 2007) and ProtVirDB (Ramana and Gupta 2009). A BLASTx search was performed with E-value ≤ 1E-20 and identity ≥ 40%.
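The two cut-offs above can be applied to standard BLAST tabular output (`-outfmt 6`) as in the following minimal sketch, which also tallies hits per database protein, the kind of summary given in the Results as hits to distinct proteins. Treating one query-subject pair as one hit, and collapsing multiple HSPs of the same pair, are assumptions of this sketch. The same filter with `max_evalue=1e-3` and `min_identity=0.0` would mirror the E-value-only cut-off of the MEROPS search.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def significant_hits(rows: Iterable[List[str]], max_evalue: float = 1e-20,
                     min_identity: float = 40.0) -> List[Tuple[str, str]]:
    """Keep unique (query, subject) pairs from outfmt 6 rows passing both cut-offs."""
    kept = {}  # dict preserves insertion order and removes duplicate HSPs of a pair
    for row in rows:
        qseqid, sseqid = row[0], row[1]
        pident, evalue = float(row[2]), float(row[10])
        if evalue <= max_evalue and pident >= min_identity:
            kept[(qseqid, sseqid)] = None
    return list(kept)


def hits_per_subject(pairs: List[Tuple[str, str]]) -> Counter:
    """Number of hits to each database protein."""
    return Counter(subject for _, subject in pairs)
```

`sum(counts.values())` then gives the number of hits and `len(counts)` the number of distinct proteins hit.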

Identification of secretion signal peptides: The H. meleagridis transcriptome was submitted to the SignalP 4.1 Server (Petersen et al. 2011) using default parameters and the organism group Eukaryotes. This bioinformatic tool predicts the secretory fate of a protein by identifying the signal peptidase I cleavage site, and it discriminates between such signal peptides and transmembrane regions.
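Collecting the positive predictions from a SignalP run can be sketched as below. The column layout of the SignalP 4.1 "short" output is an assumption here (whitespace-separated `name Cmax pos Ymax pos Smax pos Smean D ? Dmaxcut Networks-used`, with `?` being `Y` or `N`); it should be checked against the actual output file before use.

```python
from typing import Dict, Iterable


def signalp_positive(lines: Iterable[str]) -> Dict[str, float]:
    """Return {sequence name: D-score} for sequences predicted to carry a signal peptide.

    ASSUMED SignalP 4.1 short-format columns (whitespace-separated):
    name Cmax pos Ymax pos Smax pos Smean D ? Dmaxcut Networks-used
    """
    positives = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):  # comment and header lines
            continue
        fields = line.split()
        name, d_score, call = fields[0], float(fields[8]), fields[9]
        if call == "Y":  # 'Y' = signal peptide predicted
            positives[name] = d_score
    return positives
```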

Drug screening: Drug screening was performed to identify potential drug targets in the H. meleagridis transcriptome. Drug target sequences were downloaded from the DrugBank database (Wishart et al. 2008) (filename: all_target.fasta), comprising proteins targeted by drugs with known and unknown pharmacological action. To eliminate drug targets that are also present in turkeys and chickens, the following procedure was performed. The cDNA sequences of Meleagris gallopavo (turkey) (filename: Meleagris_gallopavo.UMD2.cdna.all) and Gallus gallus (chicken) (filename: Gallus_gallus.Galgal4.cdna.all) were downloaded from Ensembl (Hubbard et al. 2002). tBLASTn searches were performed with an E-value ≤ 1E-3 using the DrugBank target sequences against the turkey and the chicken cDNA. Target sequences with matches were excluded. The remaining sequences (E-value > 1E-3) were taken forward for a second tBLASTn search (E-value ≤ 1E-18) against the H. meleagridis transcriptome. Resulting matches were included as potential drug candidates.
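The subtractive screen above reduces to two set operations once the tBLASTn results are available: drop every target with a host match at E ≤ 1E-3, then keep the remaining targets that match the parasite transcriptome at E ≤ 1E-18. The minimal sketch below assumes the hits have already been parsed into (target ID, best E-value) pairs, with turkey and chicken hits pooled into one list. The target-to-drug mapping and all IDs in the example are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Hit = Tuple[str, float]  # (DrugBank target ID, tBLASTn E-value)


def screen_targets(all_targets: Iterable[str],
                   host_hits: Iterable[Hit],
                   parasite_hits: Iterable[Hit],
                   host_cutoff: float = 1e-3,
                   parasite_cutoff: float = 1e-18) -> Set[str]:
    """Drop targets with a host homologue, then keep those found in the parasite."""
    host_matched = {target for target, evalue in host_hits if evalue <= host_cutoff}
    remaining = set(all_targets) - host_matched  # no turkey/chicken match at the cut-off
    return {target for target, evalue in parasite_hits
            if target in remaining and evalue <= parasite_cutoff}


def candidate_drugs(targets: Set[str], drugs_by_target: Dict[str, Set[str]]) -> Set[str]:
    """Collect drugs acting on any retained target (the mapping is a hypothetical input)."""
    return set().union(*(drugs_by_target.get(t, set()) for t in targets))
```

For example, a target with a chicken hit at 1E-40 is removed even if it matches the parasite strongly, while a target with no host hit and a parasite hit at 1E-20 is retained.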

Acknowledgements

We would like to acknowledge the Platform Bioinformatics and Biostatistics, Department of Biomedical Sciences, University of Veterinary Medicine Vienna for their support in bioinformatics analysis. This work was funded by the Austrian Science Fund (FWF), project number P25519-B25.


References

Alsmark UC, Sicheritz-Ponten T, Foster PG, Hirt RP, Embley TM (2009)

Horizontal gene transfer in eukaryotic parasites: a case study of Entamoeba

histolytica and Trichomonas vaginalis. Methods Mol Biol 532:489–500

Alvarez-Sánchez ME, Avila-González L, Becerril-García C, Fattel-Facenda L V,

Ortega-López J, Arroyo R (2000) A novel cysteine proteinase (CP65) of

Trichomonas vaginalis involved in cytotoxicity. Microb Pathog 28:193–202

Amin A, Nöbauer K, Patzl M, Berger E, Hess M, Bilic I (2012) Cysteine

peptidases, secreted by Trichomonas gallinae, are involved in the

cytopathogenic effects on a permanent chicken liver cell culture. PLoS ONE

7(5):e37417

Anamika K, Bhattacharya A, Srinivasan N (2008) Analysis of the protein kinome of

Entamoeba histolytica. Proteins 71:995-1006

Anantharaman V, Abhiman S, De Souza R, Aravind L (2011) Comparative

genomics uncovers novel structural and functional features of the heterotrimeric

GTPase signalling system. Gene 475:63–78

Barratt J, Cao M, Stark D, Ellis J (2015) The transcriptome sequence of

Dientamoeba fragilis offers new biological insights on its metabolism, kinome,

degradome and potential mechanisms of pathogenicity. Protist 166:389-408

Barratt J, Gough R, Stark D, Ellis J (2016) Bulky trichomonad genomes: Encoding

a Swiss army knife. Trends Parasitol 32:783-797

35

Bastian M, Heymann S, Jacomy M (2009) Gephi: An Open Source Software for Exploring and Manipulating Networks. ICWSM

Baugh JM, Viktorova EG, Pilipenko EV (2009) Proteasomes can degrade a significant proportion of cellular proteins independent of ubiquitination. J Mol Biol 386:814–827

Bilic I, Leberl M, Hess M (2009) Identification and molecular characterization of numerous Histomonas meleagridis proteins using cDNA library. Parasitology 136:379–391

Bilic I, Jaskulska B, Souillard R, Liebhart D, Hess M (2014) Multi-locus typing of Histomonas meleagridis isolates demonstrates the existence of two different genotypes. PLoS ONE 9(3):e92438

Bischofberger M, Iacovache I, van der Goot FG (2012) Pathogenic pore-forming proteins: Function and host response. Cell Host Microbe 12:266–275

Bosch DE, Siderovski DP (2013) G protein signaling in the parasite Entamoeba histolytica. Exp Mol Med 45(3):e15

Bright LJ, Kambesis N, Nelson SB, Jeong B, Turkewitz AP (2010) Comprehensive analysis reveals dynamic and evolutionary plasticity of Rab GTPases and membrane traffic in Tetrahymena thermophila. PLoS Genet 6(10):e1001155

Bronte V, Zanovello P (2005) Regulation of immune responses by L-arginine metabolism. Nat Rev Immunol 5:641–654

Bruhn H (2005) A short guided tour through functional and structural features of saposin-like proteins. Biochem J 389(Pt 2):249–257

Cárdenas-Guerra RE, Arroyo R, Rosa de Andrade I, Benchimol M, Ortega-López J (2013) The iron-induced cysteine proteinase TvCP4 plays a key role in Trichomonas vaginalis haemolysis. Microbes Infect 15:958–968

Carlton J, Hirt R, Silva J, Delcher A (2007) Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis. Science 315:207–212

Cedillo-Rivera R, Chávez B, González-Robles A, Tapia A, Yépez-Mulia L (2002) In vitro effect of nitazoxanide against Entamoeba histolytica, Giardia intestinalis and Trichomonas vaginalis trophozoites. J Eukaryot Microbiol 49:201–208

Conesa A, Götz S (2008) Blast2GO: A comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics 2008:619832

Coombs G, Westrop G, Suchan P, Puzova G, Hirt RP, Embley TM, Mottram JC, Müller S (2004) The amitochondriate Trichomonas vaginalis contains a divergent thioredoxin-linked peroxiredoxin antioxidant system. J Biol Chem 279:5249–5256

Das P, Lahiri A, Lahiri A, Chakravortty D (2010) Modulation of the arginase pathway in the context of microbial pathogenesis: A metabolic enzyme moonlighting as an immune modulator. PLoS Pathog 6(6):e1000899

Davis P, Zhang Z, Chen M, Zhang X, Chakraborty S, Stanley S (2006) Identification of a family of BspA-like surface proteins of Entamoeba histolytica with novel leucine rich repeats. Mol Biochem Parasitol 145:111–116

Davis-Hayman SR, Shah PH, Finley RW, Lushbaugh WB, Meade JC (2000) Trichomonas vaginalis: analysis of a heat-inducible member of the cytosolic heat-shock protein 70 multigene family. Parasitol Res 86:608–612

Desler C, Suravajhala P, Sanderhoff M, Rasmussen M, Rasmussen LJ (2009) In silico screening for functional candidates amongst hypothetical proteins. BMC Bioinformatics 10:289

Dyall SD, Koehler CM, Delgadillo-Correa MG, Bradley PJ, Plümper E, Leuenberger D, Turck CW, Johnson PJ (2000) Presence of a member of the mitochondrial carrier family in hydrogenosomes: conservation of membrane-targeting pathways between hydrogenosomes and mitochondria. Mol Cell Biol 20:2488–2497

Dyall SD, Lester DC, Schneider RE, Delgadillo-Correa MG, Plümper E, Martinez A, Koehler CM, Johnson PJ (2003) Trichomonas vaginalis Hmp35, a putative pore-forming hydrogenosomal membrane protein, can form a complex in yeast mitochondria. J Biol Chem 278:30548–30561

Eichinger L, Pachebat J, Glöckner G, Rajandream MA, Sucgang R, Berriman M, Song J, Olsen R, Szafranski K, Xu Q, Tunggal B et al. (2005) The genome of the social amoeba Dictyostelium discoideum. Nature 435:43–57

Ellis J, Yarlett N, Cole D (1994) Antioxidant defences in the microaerophilic protozoan Trichomonas vaginalis: comparison of metronidazole-resistant and sensitive strains. Microbiology 140:2489–2494

Elnekave K, Siman-Tov R, Ankri S (2003) Consumption of L-arginine mediated by Entamoeba histolytica L-arginase (EhArg) inhibits amoebicidal activity and nitric oxide production by activated. Parasite Immunol 25:597–608

Galperin MY (2001) Conserved “hypothetical” proteins: New hints and new puzzles. Comp Funct Genomics 2:14–18

Galperin MY, Koonin EV (2004) “Conserved hypothetical” proteins: prioritization of targets for experimental study. Nucleic Acids Res 32:5452–5463

Ganas P, Liebhart D, Glösmann M, Hess C, Hess M (2012) Escherichia coli strongly supports the growth of Histomonas meleagridis, in a monoxenic culture, without influence on its pathogenicity. Int J Parasitol 42:893–901

Gerbod D, Edgcomb V, Noël C (2001) Phylogenetic position of the trichomonad parasite of turkeys, Histomonas meleagridis (Smith) Tyzzer, inferred from small subunit rRNA sequence. J Eukaryot Microbiol 48:498–504

Ghannoum M (2000) Potential role of phospholipases in virulence and fungal pathogenesis. Clin Microbiol Rev 13:122–143

Gould SB, Woehle C, Kusdian G, Landan G, Tachezy J, Zimorski V, Martin WF (2013) Deep sequencing of Trichomonas vaginalis during the early infection of vaginal epithelial cells and amoeboid transition. Int J Parasitol 43:707–719

Graybill H, Smith T (1920) Production of fatal blackhead in turkeys by feeding embryonated eggs of Heterakis papillosa. J Exp Med 31:647–655

Gruber J, Ganas P, Hess M (2017) Long-term in vitro cultivation of Histomonas meleagridis coincides with the dominance of a very distinct phenotype of the parasite exhibiting increased tenacity and improved cell yields. Parasitology 144:1253–1263

Gupta G (2012) Animal Lectins: Form, Function and Clinical Applications. Vol 1, Springer-Verlag Wien, ISBN 978-3-7091-1065-2, 1108 p

Hartl F, Bracher A, Hayer-Hartl M (2011) Molecular chaperones in protein folding and proteostasis. Nature 475:324–332

Henze K (2007) The Proteome of T. vaginalis Hydrogenosomes. In Tachezy J (ed) Hydrogenosomes and Mitosomes: Mitochondria of Anaerobic Eukaryotes. Microbiology Monographs vol 9, Springer, Berlin, Germany, pp 163–178

Hernández-Gutiérrez R, Avila-González L, Ortega-López J, Cruz-Talonia F, Gómez-Gutierrez G, Arroyo R (2004) Trichomonas vaginalis: Characterization of a 39-kDa cysteine proteinase found in patient vaginal secretions. Exp Parasitol 107:125–135

Hess M, Kolbe T, Grabensteiner E, Prosl H (2006) Clonal cultures of Histomonas meleagridis, Tetratrichomonas gallinarum and a Blastocystis sp. established through micromanipulation. Parasitology 133:547–554

Hess M, Liebhart D, Grabensteiner E, Singh A (2008) Cloned Histomonas meleagridis passaged in vitro resulted in reduced pathogenicity and is capable of protecting turkeys from histomonosis. Vaccine 26:4187–4193

Hess M, Liebhart D, Bilic I, Ganas P (2015) Histomonas meleagridis - New insights into an old pathogen. Vet Parasitol 208:67–76

Hirt R, de Miguel N, Nakjang S, Dessi D (2011) Trichomonas vaginalis pathobiology: New insights from the genome sequence. Adv Parasitol 77:87–140

Hirt R, Noel C, Sicheritz-Pontén T, Tachezy J (2007) Trichomonas vaginalis surface proteins: a view from the genome. Trends Parasitol 23:540–547

Hoffman PS, Sisson G, Croxen MA, Welch K, Harman WD, Cremades N, Morash MG (2007) Antiparasitic drug nitazoxanide inhibits the pyruvate oxidoreductases of Helicobacter pylori, selected anaerobic bacteria and parasites, and Campylobacter jejuni. Antimicrob Agents Chemother 51:868–876

Honigberg B, Bennett C (1971) Light microscopic observations on structure and division of Histomonas meleagridis (Smith). J Protozool 18:687–700

Huang X, Madan A (1999) CAP3: A DNA sequence assembly program. Genome Res 9:868–877

Huang Y, Niu B, Gao Y, Fu L, Li W (2010) CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics 26:680–682

Hubbard T, Barker D, Birney E, Cameron G (2002) The Ensembl genome database project. Nucleic Acids Res 30:38–41

Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D (2009) InterPro: The integrative protein signature database. Nucleic Acids Res 37:D211–D215

Kabir MA, Uddin W, Narayanan A, Reddy PK, Jairajpuri MA, Sherman F (2011) Functional subunits of eukaryotic chaperonin CCT/TRiC in protein folding. J Amino Acids 2011:843206

Kim YE, Hipp MS, Bracher A, Hayer-Hartl M, Hartl FU (2013) Molecular chaperone functions in protein folding and proteostasis. Annu Rev Biochem 82:323–355

Klodnicki ME, McDougald LR, Beckstead RB (2013) A genomic analysis of Histomonas meleagridis through sequencing of a cDNA library. J Parasitol 99:264–269

Leberl M, Hess M, Bilic I (2010) Histomonas meleagridis possesses three α-actinins immunogenic to its hosts. Mol Biochem Parasitol 169:101–107

Leippe M (1997) Amoebapores. Parasitol Today 13:178–183

Leippe M, Herbst R (2004) Ancient weapons for attack and defense: The pore-forming polypeptides of pathogenic enteric and free living amoeboid protozoa. J Eukaryot Microbiol 51:516–521

Li H (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v1, [q-bio.GN]

Liebhart D, Ganas P, Sulejmanovic T, Hess M (2017) Histomonosis in poultry: previous and current strategies for prevention and therapy. Avian Pathol 46:1–18

Lindmark D, Müller M (1974) Superoxide dismutase in the anaerobic flagellates, Tritrichomonas foetus and Monocercomonas sp. J Biol Chem 249:4634–4637

Lotfi A, Abdelwhab E, Hafez H (2012) Persistence of Histomonas meleagridis in or on materials used in poultry houses. Avian Dis 56:224–226

Mantel P, Marti M (2014) The role of extracellular vesicles in Plasmodium and other protozoan parasites. Cell Microbiol 16:344–354

Mazet M, Diogon M, Alderete JF, Vivarès CP, Delbac F (2008) First molecular characterisation of hydrogenosomes in the protozoan parasite Histomonas meleagridis. Int J Parasitol 38:177–190

McDougald L (2005) Blackhead disease (Histomoniasis) in Poultry: A critical review. Avian Dis 49:462–476

Mentel M, Zimorski V, Haferkamp P, Martin W, Henze K (2008) Protein import into hydrogenosomes of Trichomonas vaginalis involves both N-terminal and internal targeting signals: A case study of thioredoxin reductases. Eukaryot Cell 7:1750–1757

Mitra BN, Saito-Nakano Y, Nakada-Tsukui K, Sato D, Nozaki T (2007) Rab11B small GTPase regulates secretion of cysteine proteases in the enteric protozoan parasite Entamoeba histolytica. Cell Microbiol 9:2112–2125

Müller M (1993) Review article: The . J Gen Microbiol 139:2879–2889

Mullins C, Bonifacino JS (2001) The molecular machinery for lysosome biogenesis. Bioessays 23:333–343

Muñoz C, San Francisco J, Gutiérrez B, González J (2015) Role of the Ubiquitin-Proteasome Systems in the Biology and Virulence of Protozoan Parasites. Biomed Res Int 2015:141526

Munsch M, Lotfi A, Hafez H, Al-Quraishy S (2009) Light and transmission electron microscopic studies on trophozoites and cyst-like stages of Histomonas meleagridis from cultures. Parasitol Res 104:683–689

Neupert W, Herrmann J (2007) Translocation of proteins into mitochondria. Annu Rev Biochem 76:723–749

Noël CJ, Diaz N, Sicheritz-Ponten T, Safarikova L, Tachezy J, Tang P, Fiori PL, Hirt RP (2010) Trichomonas vaginalis vast BspA-like gene family: evidence for functional diversity from structural organisation and transcriptomics. BMC Genomics 11:99

Norton R, Clark F, Beasley J (1999) An outbreak of histomoniasis in turkeys infected with a moderate level of Ascaridia dissimilis but no Heterakis gallinarum. Avian Dis 43:342–348

Nozaki T, Nakada-Tsukui K (2006) Membrane trafficking as a virulence mechanism of the enteric protozoan parasite Entamoeba histolytica. Parasitol Res 98:179–183

Paugam A, Bulteau AL, Dupouy-Camet J, Creuzet C, Friguet B (2003) Characterization and role of protozoan parasite proteasomes. Trends Parasitol 19:55–59

Pereira-Leal JB, Seabra MC (2001) Evolution of the Rab family of small GTP-binding proteins. J Mol Biol 313:889–901

Petersen TN, Brunak S, von Heijne G, Nielsen H (2011) SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods 8:785–786

Petrin D, Delgaty K, Bhatt R, Garber G (1998) Clinical and microbiological aspects of Trichomonas vaginalis. Clin Microbiol Rev 11:300–317

Preisner H, Karin EL, Poschmann G, Stühler K, Pupko T, Gould SB (2016) The cytoskeleton of parabasalian parasites comprises proteins that share properties common to intermediate filament proteins. Protist 167:526–543

Preininger AM, Hamm HE (2004) G protein signaling: insights from new structures. Sci STKE 2004:re3

Pütz S, Gelius-Dietrich G, Piotrowski M (2005) Rubrerythrin and peroxiredoxin: two novel putative peroxidases in the hydrogenosomes of the microaerophilic protozoon Trichomonas vaginalis. Mol Biochem Parasitol 142:212–223

Rada P, Doležal P, Jedelský P, Bursac D, Perry A (2011) The core components of organelle biogenesis and membrane transport in the hydrogenosomes of Trichomonas vaginalis. PLoS ONE 6(9):e24428

Ramana J, Gupta D (2009) ProtVirDB: A database of protozoan virulent proteins. Bioinformatics 25:1568–1569

Rawlings N, Morton F, Barrett A (2006) MEROPS: the peptidase database. Nucleic Acids Res 34:D320–D325

Riestra A, Gandhi S, Sweredoski M, Moradian A, Hess S, Urban S, Johnson P (2015) A Trichomonas vaginalis rhomboid protease and its substrate modulate parasite attachment and cytolysis of host cells. PLoS Pathog 11(12):e1005294

Saito-Nakano Y, Yasuda T, Nakada-Tsukui K, Leippe M, Nozaki T (2004) Rab5-associated vacuoles play a unique role in phagocytosis of the enteric protozoan parasite Entamoeba histolytica. J Biol Chem 279:49497–49507

Saito-Nakano Y, Mitra B (2007) Two Rab7 isotypes, EhRab7A and EhRab7B, play distinct roles in biogenesis of lysosomes and phagosomes in the enteric protozoan parasite Entamoeba histolytica. Cell Microbiol 9:1796–1808

Sajid M, McKerrow JH (2002) Cysteine proteases of parasitic organisms. Mol Biochem Parasitol 120:1–21

Schuster FL (1968) Ultrastructure of Histomonas meleagridis (Smith) Tyzzer, a Parasitic Amebo-Flagellate. J Parasitol 54:725–737

Shannon P, Markiel A, Ozier O, Baliga N (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13:2498–2504

Smutná T, Gonçalves V, Saraiva L, Tachezy J (2009) Flavodiiron protein from Trichomonas vaginalis hydrogenosomes: the terminal oxygen reductase. Eukaryot Cell 8:47–55

Sorokin V, Kim ER, Ovchinnikov LP (2009) Proteasome system of protein degradation and processing. Biochemistry (Moscow) 74:1411–1442

Spencer-Yost C, Hedgpeth J, Lingappa VR (1983) A stop transfer sequence confers predictable transmembrane orientation to a previously secreted protein in cell-free systems. Cell 34:759–766

Spiess C, Meyer AS, Reissmann S, Frydman J (2004) Mechanism of the eukaryotic chaperonin: Protein folding in the chamber of secrets. Trends Cell Biol 14:598–604

Stadelmann B, Merino MC, Persson L, Svärd SG (2012) Arginine consumption by the intestinal parasite Giardia intestinalis reduces proliferation of intestinal epithelial cells. PLoS ONE 7(9):e45325

Stadelmann B, Hanevik K, Andersson MK, Bruserud O, Svärd SG (2013) The role of arginine and arginine-metabolizing enzymes during Giardia-host cell interactions in vitro. BMC Microbiology 13:256

Stenmark H (2009) Rab GTPases as coordinators of vesicle traffic. Nat Rev Mol Cell Biol 10:513–525

Takai Y, Sasaki T, Matozaki T (2001) Small GTP-binding proteins. Physiol Rev 81:153–208

Tanaka K (2013) The proteasome: From basic mechanisms to emerging roles. Keio J Med 62:1–12

Tjaden J, Haferkamp I, Boxma B (2004) A divergent ADP/ATP carrier in the hydrogenosomes of Trichomonas gallinae argues for an independent origin of these organelles. Mol Microbiol 51:1439–1446

Twu O, de Miguel N, Lustig G, Stevens GC, Vashisht AA, Wohlschlegel JA, et al. (2013) Trichomonas vaginalis exosomes deliver cargo to host cells and mediate host:parasite interactions. PLoS Pathog 9(7):e1003482

Tyzzer E (1919) Developmental phases of the protozoon of “blackhead” in turkeys. J Med Res 40:1–33

Vargas-Villarreal J, Mata-Cárdenas B, Palacios-Corona R, González-Salazar F, Cortes-Gutierrez E, Martínez-Rodríguez H, Said-Fernández S (2005) Trichomonas vaginalis: identification of soluble and membrane-associated phospholipase A1 and A2 activities with direct and indirect hemolytic effects. J Parasitol 91:5–11

Wennerberg K, Rossman KL, Der CJ (2005) The Ras superfamily at a glance. J Cell Sci 118:843–846

Wishart D, Knox C, Guo A, Cheng D (2008) DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res 36:D901–D906

Yao C, Donelson J, Wilson M (2003) The major surface protease (MSP or GP63) of Leishmania sp. Biosynthesis, regulation of expression, and function. Mol Biochem Parasitol 132:1–16

Yarlett N, Martinez MP, Moharrami MA, Tachezy J (1996) The contribution of the arginine dihydrolase pathway to energy metabolism by Trichomonas vaginalis. Mol Biochem Parasitol 78:117–125

Zhou C, Smith J, Lam M, Zemla A (2007) MvirDB–a microbial database of protein toxins, virulence factors and antibiotic resistance genes for bio-defence applications. Nucleic Acids Res 35:D391–D394

Zimmermann R, Eyrisch S, Ahmad M (2011) Protein translocation across the ER membrane. Biochim Biophys Acta 1808:912–924

Figure captions

Figure 1. Generation of the H. meleagridis reference transcriptome. Schematic representation of the steps used to generate the H. meleagridis reference transcriptome.

Figure 2. Visualization of the major gene families in the form of a ‘transcriptome ball’ representing the whole transcriptome of H. meleagridis. The visualization was created with Gephi 0.9.0 (Bastian et al. 2009). Each gene family is represented by the nodes protruding from the whole data set (transcriptome ball).

Figure 3. Clusters showing the distribution of MEROPS peptidase families. For data visualization, a Cytoscape network (Shannon et al. 2003) was created from the BLAST results, with the identified hit as the source node and its associated contig as the target node. Each cluster consists of a source node (identified hit) and the edges leading from it to the target nodes (associated contigs). Multiple contigs can share a common source node. The number of target nodes and their edges therefore gives a quantitative visualization of each cluster.
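
The network construction described in the Figure 3 caption, which the Figure 5 caption also describes, amounts to turning BLAST results into a list of source-to-target edges. Below is a minimal Python sketch of that idea. It makes several assumptions that are not stated in the captions. It assumes BLAST tabular output (-outfmt 6). It assumes each contig is linked only to its lowest-E-value hit. It uses a hypothetical E-value cutoff of 1e-5, not the threshold used for the published figures. The edges are written in Cytoscape's simple interaction format (SIF: source, interaction type, target), and the interaction label "hit" is an arbitrary choice. The BLAST lines and identifiers (MER_A, MER_B, MER_C) are invented for illustration.

```python
from collections import defaultdict

# Hypothetical BLAST tabular (-outfmt 6) lines, invented for illustration only.
# Columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend,
# sstart, send, evalue, bitscore. Here the query is a contig and the subject
# is a database hit.
EXAMPLE_BLAST_LINES = [
    "Contig8\tMER_A\t45.0\t300\t160\t2\t1\t300\t5\t305\t1e-40\t250",
    "Contig8\tMER_B\t30.0\t300\t200\t3\t1\t300\t5\t305\t1e-10\t90",
    "Contig286\tMER_A\t50.0\t300\t140\t1\t1\t300\t5\t305\t1e-50\t270",
    "Contig1903\tMER_C\t40.0\t300\t170\t2\t1\t300\t5\t305\t1e-3\t40",
]


def best_hits(blast_lines, max_evalue=1e-5):
    """Map each contig (query) to its lowest-E-value hit (subject).

    Hits with an E-value above max_evalue are discarded. The default cutoff
    is an assumption, not the threshold used for the published figures.
    """
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and BLAST comment lines
        fields = line.rstrip("\n").split("\t")
        contig, hit, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue
        if contig not in best or evalue < best[contig][1]:
            best[contig] = (hit, evalue)
    return best


def to_sif(best, interaction="hit"):
    """Edges in SIF order: source node (hit), interaction type, target node (contig)."""
    return [f"{hit}\t{interaction}\t{contig}" for contig, (hit, _) in sorted(best.items())]


def cluster_sizes(best):
    """Count target nodes (contigs) per source node (hit), i.e. the size of each cluster."""
    sizes = defaultdict(int)
    for hit, _ in best.values():
        sizes[hit] += 1
    return dict(sizes)


if __name__ == "__main__":
    best = best_hits(EXAMPLE_BLAST_LINES)
    print("\n".join(to_sif(best)))
    print(cluster_sizes(best))
```

On the invented lines, the cutoff drops Contig1903. Contig8 keeps MER_A over the weaker MER_B hit. Both remaining contigs therefore attach to the same source node, giving one cluster with two target nodes.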

Figure 4. Enzymes of the glycolytic pathway with the associated transcripts.

Figure 5. Clusters showing the distribution of various potential virulence factors identified through the custom-made virulence database. For data visualization, a Cytoscape network (Shannon et al. 2003) was created from the BLAST results, with the identified hit as the source node and its associated contig as the target node. Each cluster consists of a source node (identified hit) and the edges leading from it to the target nodes (associated contigs). Multiple contigs can share a common source node. The number of target nodes and their edges therefore gives a quantitative visualization of each cluster.

Table 1. General assembly statistics for the H. meleagridis transcriptome.

Total number of H. meleagridis contigs 3356

Transcriptome size (nucleotides) 3673135

Average contig length (nucleotides) 1089

Longest contig (nucleotides) 4675

Shortest contig (nucleotides) 500

Average GC content (%) 38

Total number of reads 1173830

Average read length (nucleotides) 458

Number of mapped reads+ 830508

Number of reads of E. coli origin++ 343322

+Total number of reads from the two strains mapped back to the final set of H. meleagridis contigs.

++Total number of reads that did not map to the final set of H. meleagridis contigs and originated from E. coli.
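
The two footnoted counts together account for every read in the data set. Reads mapped to the final H. meleagridis contigs plus reads of E. coli origin equal the total number of reads. Worked out from the Table 1 values, with percentages rounded to one decimal place:

$$830\,508 + 343\,322 = 1\,173\,830$$

$$\frac{830\,508}{1\,173\,830} \approx 70.8\,\%, \qquad \frac{343\,322}{1\,173\,830} \approx 29.2\,\%$$

Roughly seven in ten reads therefore derive from H. meleagridis and the remainder from E. coli.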

Table 2. Transcripts encoding cytoskeleton components identified in the H. meleagridis transcriptome.

Protein | Contig number

actin | Contig 2112

actin family protein | Contig 444, 1375, 1403, 02413_HP, 03289_HP, 02010_HP

actin like protein | Contig 320, 1255, 1280, 1299

putative actin | Contig 491

actin depolymerizing factor/cofilin | Contig 04525_LP

fimbrin | Contig 1576, 03681_LP

formin homology 2 domain containing protein | Contig 175, 378, 1005, 1484, 01776_LP, 02440_LP

F-actin capping alpha subunit | Contig 692, 759, 766, 826

F-actin capping beta subunit | Contig 688, 1661, 1764, 02879_HP, 03263_HP

alpha actinin | Contig 328, 2120

alpha actinin 1 | Contig 1428

alpha actinin 2 | Contig 4

alpha actinin 3 | Contig 00331_LP, 00343_HP

ARP 2/3 complex | Contig 724, 1463, 1519, 1645, 2026, 2058, 04293_HP, 04335_HP, 04428_HP, 04544_LP

coronin | Contig 1058, 1193, 2106, 03023_LP

alpha tubulin 1 | Contig 690, 1423

beta tubulin | Contig 231, 1213

intermediate dynein chain | Contig 13

inner dynein arm light axonemal | Contig 560

dynein intermediate chain ciliary related protein | Contig 1032

dynein heavy chain family protein | Contig 1209

dynein heavy chain axonemal like | Contig 04452_HP

kinesin motor domain containing protein | Contig 8, 286, 1903

myosin heavy chain kinase B like | Contig 169

myosin heavy chain kinase | Contig 1426

myosin tail family protein | Contig 661, 01461_LP

intraflagellar transport protein 46 homolog | Contig 1512

intraflagellar transport protein 52 homolog | Contig 396

intraflagellar transport protein 74 homolog | Contig 153

intraflagellar transport protein 80 homolog | Contig 1572

intraflagellar transport protein 81 homolog | Contig 527

intraflagellar transport protein 122 homolog | Contig 519

cilia and flagella associated protein | Contig 02012_HP

flagellar associated protein | Contig 02398_HP

flagellar protofilament ribbon protein | Contig 508

RIB43a like with coiled-coils protein 2 | Contig 430

radial spoke 3 protein | Contig 493

radial spoke head protein 1 homolog | Contig 03305_HP

radial spoke head protein 3 homolog | Contig 02236_HP, 02805_HP

radial spoke head protein 4 homolog | Contig 1936

radial spoke head protein 9 homolog | Contig 02867_HP

Table 3. List of drugs with potential anti-histomonal activity. For each drug, the known target is listed together with the H. meleagridis contig matched to it in the transcriptome. Host specificity names the host(s) whose genome does not contain the drug target.

UniProt target ID | UniProt target description | DrugBank IDs | H. meleagridis contig number | E-value | Host specificity

Q9NPH2 | inositol-3-phosphate synthase 1 | DB01840; DB04077; DB04516 | Contig1124 | 0 | Chicken, Turkey

P05655 | levansucrase | DB02772 | Contig1 | 0 | Chicken, Turkey

P22983 | pyruvate, phosphate dikinase | DB02522 | Contig1102 | 0 | Chicken, Turkey

P94692 | pyruvate-flavodoxin oxidoreductase | DB00507; DB01987; DB02410 | Contig1046 | 2.23e-168 | Chicken, Turkey

P62580 | chloramphenicol acetyltransferase | DB02703 | Contig1 | 3.95e-154 | Chicken, Turkey

P31013 | tyrosine phenol-lyase | DB03897 | Contig261 | 2.65e-151 | Chicken, Turkey

P52647 | probable pyruvate-flavodoxin oxidoreductase | DB00698 | Contig1046 | 5.73e-133 | Chicken, Turkey

P13702 | 3-hydroxy-3-methylglutaryl-coenzyme A reductase | DB01992; DB03169; DB03518; DB03785 | Contig1298 | 4.29e-87 | Chicken, Turkey

Q9F0J6 | rubredoxin-oxygen oxidoreductase | DB03247 | Contig1384 | 1.02e-60 | Chicken, Turkey

O43681 | ATPase ASNA1 | DB00171 | Contig02876_HP | 1.21e-59 | Turkey

P0AB74 | D-tagatose-1,6-bisphosphate aldolase subunit KbaY | DB03026 | Contig1698 | 6.46e-54 | Chicken, Turkey

P40429 | 60S ribosomal protein L13a | DB02494; DB07374; DB08437 | Contig931 | 1.62e-52 | Chicken, Turkey

P36924 | beta-amylase | DB02379; DB02645; DB03323; DB03389 | Contig1222 | 2.55e-49 | Chicken, Turkey

Q51990 | morphinone reductase | DB03247 | Contig04183_HP | 2.34e-31 | Chicken, Turkey

Q14894 | ketimine reductase mu-crystallin | DB05235 | Contig588 | 4.63e-29 | Turkey

P71278 | pentaerythritol tetranitrate reductase | DB01676; DB02060; DB02508; DB03247; DB03651; DB04528; DB07373 | Contig04183_HP | 2.36e-27 | Chicken, Turkey

P61927 | 60S ribosomal protein L37 | DB02494; DB04602; DB04805; DB07374; DB08437 | Contig03409_LP | 5.27e-25 | Turkey

P42593 | 2,4-dienoyl-CoA reductase [NADPH] | DB03147; DB03247; DB03461; DB03698 | Contig04183_HP | 9.05e-25 | Chicken, Turkey

Q8ZNF3 | UDP-4-amino-4-deoxy-L-arabinose--oxoglutarate aminotransferase | DB02142; DB03579 | Contig1382 | 1.93e-23 | Chicken, Turkey

P28072 | proteasome subunit beta type-6 | DB00188; DB08515 | Contig1850 | 5.22e-23 | Turkey

P16099 | trimethylamine dehydrogenase | DB03247 | Contig1194 | 1.14e-21 | Chicken, Turkey

P0A2K1 | tryptophan synthase beta chain | DB03171; DB04143; DB07732; DB07745; DB07748; DB07773; DB07890; DB07894; DB07925; DB07951; DB07952; DB07953 | Contig214 | 1.63e-18 | Chicken, Turkey
