
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.04.235945; this version posted August 4, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

18S rDNA Sequence-Structure Phylogeny of the Trypanosomatida (Kinetoplastea, Euglenozoa) with Special Reference to Trypanosoma

Alyssa R. Borges1,§, Markus Engstler1 & Matthias Wolf2,§*

1Department of Cell and Developmental Biology, Biocenter, University of Würzburg, Am Hubland, 97074 Würzburg, Germany

2Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, 97074 Würzburg, Germany

§ These authors contributed equally to this work

* Corresponding author: [email protected] (M. Wolf)

Abstract

Background: Parasites of the order Trypanosomatida are known for their medical relevance. Trypanosomes cause African sleeping sickness and Chagas disease in South America, and Leishmania ROSS, 1903 species mutilate and kill hundreds of thousands of people each year. However, pathogens are very few when compared to the great diversity of trypanosomatids. Despite the progress made in past decades in understanding the evolution of this group of organisms, many open questions remain that require robust phylogenetic markers to increase the resolution of trees.


Methods: Using two known 18S rDNA template structures (from Trypanosoma cruzi CHAGAS, 1909 and Trypanosoma brucei PLIMMER & BRADFORD, 1899), individual 18S rDNA secondary structures were predicted by homology modeling. Sequences and their secondary structures, automatically encoded by a 12-letter alphabet (each nucleotide with its three structural states, paired left, paired right, unpaired), were simultaneously aligned. Sequence-structure trees were generated by neighbor joining and/or maximum likelihood.

Results: With a few exceptions, all nodes within a sequence-structure maximum likelihood tree of 43 representative 18S rDNA sequence-structure pairs are robustly supported (bootstrap support >75). Even a quick and easy sequence-structure neighbor-joining analysis yields accurate results and enables reconstruction and discussion of the big picture for all 240 18S rDNA sequence-structure pairs of trypanosomatids that are currently available.

Conclusions: We reconstructed the phylogeny of a comprehensive sampling of trypanosomes evaluated in the context of trypanosomatid diversity, demonstrating that the simultaneous use of 18S rDNA sequence and secondary structure data can reconstruct robust phylogenetic trees.

Keywords: Evolution, SSU, Phylogeny, RNA, Secondary Structure, Sequence-Structure, Trypanosoma.

Background

Kinetoplastids (Kinetoplastea HONIGBERG, 1963) are a remarkable group of unicellular organisms. They include free-living and parasite of , , and [1–3]. Among them, we find the obligate parasites of the order Trypanosomatida KENT, 1880, emend VICKERMAN in Moreira et al. 2004 [3], including the human pathogens T. brucei, which causes African sleeping sickness, T. cruzi, the causative agent of Chagas disease in South America, and Leishmania species, which infect and harm hundreds of thousands of people each year [2,4,5]. African trypanosomes are probably the best-known trypanosomatids. Due to their dixenic cycle and the extracellular lifestyle in the , they have evolved interesting, and sometimes unusual, mechanisms to deal with such different environments and challenges imposed by the . Thus, it does not come as a surprise that major discoveries in cell biology have been made in trypanosomes, such as antigenic variation [6], glycolysis compartmentalized in unique organelles [7], GPI-anchoring of membrane proteins [8], and unprecedented nucleotide modifications [9].

Because the group harbors such diverse organisms, it is unsurprising that the evolution of parasitism inside Kinetoplastea has intrigued scientists for decades. Given that each parasitic group has closely affiliated free-living relatives and reversion to a free-living state did not occur, it is probable that at least four independent adoptions of obligate parasitism or commensalism have occurred [2,5]. Currently, the earliest diverging lineage inside Trypanosomatida is Paratrypanosoma VOTYPKA & LUKES, 2013, represented by one species found in mosquitoes, Paratrypanosoma confusum VOTYPKA & LUKES, 2013 [10,11]. The vast majority of trypanosomatids are monoxenic parasites of insects, with a few genera, such as Leishmania and Trypanosoma GRUBY, 1843, being dixenic owing to their capacity to infect vertebrates [2]. Thus, the most likely origin of Leishmania and Trypanosoma is from within monoxenic trypanosomatids, implying that their origins were no earlier than 370 million years ago, when the invasion of land by vertebrates occurred [5,12,13]. The transmission of an insect-living trypanosomatid into a warm-blooded host has most likely occurred many times, with rare successful cases [5,10,13]. So far, only Trypanosoma and Leishmania have left surviving descendants, and only trypanosomes have adopted an extracellular lifestyle in vertebrates.

Once it had passed the vertebrate colonization bottleneck, the radiation of Trypanosoma and its adaptation to diverse vertebrate species became an unprecedented evolutionary success story. Today, these parasites prosper in essentially all vertebrate classes, from fish to birds [13–15]. The 18S rDNA marker has been extensively used to analyze the phylogenetic relationships inside this group [5,12,16–19]. The incorporation of glycosomal glyceraldehyde phosphate dehydrogenase (gGAPDH) into the analysis allowed the generation of more resolved trees, settling, for example, the long-standing question of the monophyly of Trypanosoma [13,15,17–21].

The advent of modern sequencing technologies has greatly advanced our understanding of trypanosomatid phylogeny, with more new genera described in the last decade than within the past century [2,3,5]. These days, trypanosomatid phylogeny has sufficiently advanced to provide a solid framework for comparative studies, with genomic data available for more than just the medically relevant kinetoplastids. Interestingly, the basic layout of trypanosomatid genomes appears to be strikingly similar, with high overall synteny, within and between monoxenic and dixenic species [2]. The constantly growing data might become a powerful tool for evolutionary inference. In the future, trypanosomatids will be studied not only as infective agents of devastating neglected tropical diseases, or as powerful genetic and cellular model systems, but also to unravel basic principles of the evolution of unicellular . What we know today is just the tip of an iceberg. The origin of Trypanosoma, for example, remains enigmatic. Further, the presence of prokaryotic endosymbionts and viruses in trypanosomatids, and the full biodiversity and ecological role of insect trypanosomatids, remain superficially explored [2,22].

Here, we present the first large-scale study in which trypanosomatid RNA secondary structure is used as an additional source of phylogenetic information. We use 18S rRNA sequence-structure data simultaneously in inferring alignments and trees. This approach was recently reviewed and shown to increase the robustness and accuracy of reconstructed phylogenies [23,24]. So far, all conclusions have been made with multigene trees. The most important result of our study is that there are only a few places in our robustly supported trees where branching does not match multigene phylogenomic trees. It is on these discrepancies that we focus our discussion, as they are potentially critical branches that are ambiguous and require more attention.

Methods

Taxon Sampling, Secondary Structure Prediction, Sequence–Structure Alignment, and Phylogenetic Tree Reconstruction

We used all 240 SSU 18S ribosomal RNA gene sequences from Trypanosomatida available at NCBI (GenBank) with a sequence length >1500 nucleotides and with a full taxonomic lineage down to a complete species name. Secondary structures of the 18S rDNA sequences were obtained by homology modeling [25] using the ITS2 database [26] as well as T. cruzi (AF245382) and T. brucei (M12676) as templates (Fig. S1). The two template secondary structures (without pseudoknots) were obtained from the Comparative RNA Web (CRW) site [27]. For sequence-structure alignments, the four nucleotides multiplied by three states (unpaired, paired left and paired right) are encoded by a 12-letter alphabet [24]. Using a specific 12×12 sequence-structure scoring matrix [28], global multiple sequence-structure alignments were automatically generated in ClustalW2 1.83 [29] as implemented in 4SALE 1.7.1 [28,30]. Based on Keller et al. [23], using 12-letter encoded sequences, sequence-structure neighbor-joining (NJ) trees were determined using ProfDistS [31]. Further, using 12-letter encoded sequences, a sequence-structure maximum likelihood (ML) tree [32], for only a representative subset of 43 sequence-structure pairs, was calculated using phangorn [33] as implemented in the statistical framework R [34]. The R script is available from the 4SALE homepage at http://4sale.bioapps.biozentrum.uni-wuerzburg.de [24]. Bootstrap support for all sequence-structure trees was estimated based on 100 pseudo-replicates. Trees were rooted with non-Trypanosoma sequences from Trypanosomatida.
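
The 12-letter encoding (four nucleotides times three structural states) can be illustrated with a minimal Python sketch. It assumes that each secondary structure is given in dot-bracket notation, with "(" for paired left, ")" for paired right and "." for unpaired. The letter assignment A to L and the names ENCODING and encode are hypothetical choices made for this illustration; they are not the symbols or code used by 4SALE.

```python
# Minimal sketch of a 12-letter sequence-structure encoding.
# Assumption: structures are in dot-bracket notation without pseudoknots.
# Assumption: the letters A..L are an arbitrary, hypothetical assignment.

NUCLEOTIDES = "ACGU"
STATES = ".()"          # unpaired, paired left, paired right
LETTERS = "ABCDEFGHIJKL"

# Map every (nucleotide, state) combination to exactly one letter.
ENCODING = {
    (nuc, state): LETTERS[i * len(STATES) + j]
    for i, nuc in enumerate(NUCLEOTIDES)
    for j, state in enumerate(STATES)
}


def encode(sequence: str, structure: str) -> str:
    """Return the 12-letter encoding of a sequence-structure pair."""
    seq = sequence.upper().replace("T", "U")  # accept rDNA (T) as well as rRNA (U)
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    # Ambiguous nucleotides or unknown structure symbols raise a KeyError.
    return "".join(ENCODING[(n, s)] for n, s in zip(seq, structure))


print(encode("GCAUGC", "((..))"))  # HEAJIF
```

Because every encoded string has the same length as the input sequence, the encoded sequences can be passed to an alignment program as if they were ordinary sequences over a 12-symbol alphabet, provided a matching 12×12 scoring matrix is supplied.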


Results and discussion

From the analysis of 240 18S rDNA sequence-structure pairs (Fig. 1) and a selection of 43 different species (Fig. 2), we obtained trees supported by high bootstrap values (> 75) on sister groups displaying the following Trypanosoma clades: the Trypanosoma pestanai BETTENCOURT & FRANCA, 1905 clade, represented in our tree by this species found in the Eurasian badger [13]; the T. brucei clade, consisting of trypanosome species naturally transmitted by tsetse flies, such as Trypanosoma vivax ZIEMANN, 1905, Trypanosoma congolense BRODEN, 1904, Trypanosoma godfreyi MCNAMARA ET AL., 1994, Trypanosoma simiae BRUCE ET AL., 1913, Trypanosoma equiperdum DOFLEIN, 1901, Trypanosoma evansi STEEL, 1885, and T. brucei [13,17,19,21]; the T. cruzi clade, comprising mammalian trypanosomes with worldwide distribution, such as T. cruzi, Trypanosoma rangeli TEJERA, 1920, and Trypanosoma wauwau TEIXEIRA & CAMARGO, 2016, endemic of , Trypanosoma conorhini (DONOVAN, 1909) SHORTT & SWAMINATH, 1928 found in Europe, South America and Africa, and Trypanosoma dionisii BETTENCOURT & FRANCA, 1905 distributed in Latin America, Africa, Asia and Europe [19–21,35]; the Trypanosoma lewisi (KENT, 1880) LAVERAN & MESNIL, 1901 clade, including the parasites Trypanosoma microti LAVERAN & PETTIT, 1910, Trypanosoma grosi LAVERAN & PETTIT, 1909 and T. lewisi [36]; the Crocodilian clade, harboring Trypanosoma grayi NOVY, 1906 from Africa and Trypanosoma ralphi TEIXEIRA & CAMARGO, 2013 from South America [15,37]; the Avian clade, with Trypanosoma corvi STEPHENS & CHRISTOPHERS, 1908, Trypanosoma avium DANILEWSKY, 1885 and Trypanosoma thomasbancrofti SLAPETA, 2016 [38]; the Trypanosoma theileri LAVERAN, 1902 clade, with T. theileri, a parasite distributed worldwide, and the subclade representative Trypanosoma cyclops WEINMAN, 1972 [13,21]; and the Aquatic clade, harboring trypanosomes from fish, anurans and platypus [14,39,40]. Interestingly, the lizard/snake clade is also represented in our tree with Trypanosoma varani WENYON, 1908, a snake trypanosome, branching together with the parasite Trypanosoma freitasi REGO ET AL., 1957. The branching of marsupial and rodent trypanosomes inside this clade has been previously observed [36,41]. Thus, our analysis corroborates the existence of the lizard-snake/marsupial-rodent clade composed of trypanosomes transmitted by [36].

The phylogenetic analysis using sequence-structure data of 18S rDNA (Fig. 2) supports the monophyly of Trypanosoma, as previously observed in trees constructed with partial/complete sequences of 18S rDNA and/or gGAPDH sequences [15,17–19]. Intriguingly, in the tree obtained using a greater number of sequences (Fig. 1), the Strigomonas culicis (WALLACE & JOHNSON, 1961) TEIXEIRA & CAMARGO, 2011 sequences (U05679-1 and HQ659564-1) appear as a basal group of African trypanosomes. To date, studies on Trypanosomatida have shown a basal position of Trypanosoma in relation to Strigomonas LWOFF & LWOFF, 1931 [2,22,42]. Of note, S. culicis is a monoxenic parasite of insects of the order Diptera [22], the same order as the well-known vector of T. brucei clade trypanosomes, the tsetse fly. Although it may be tempting to speculate that the tsetse lifestyle of African trypanosomes derived from Strigomonas, we cannot exclude the possibility of poor sequence assembly, which could interfere with the topology observed. Finally, one sequence of the bat parasite T. dionisii clustered within the T. brucei clade (Fig. 1). This species is distributed worldwide, with its origin in Africa, and presents a high phyletic diversity [35]. However, its branching inside the T. cruzi clade is strongly supported [13,17–19,21].

The first branching of Trypanosoma (Fig. 2) forms two major groups: one lineage composed of the T. brucei and T. pestanai clades, and another with trypanosomes from the Terrestrial (T. cruzi, T. lewisi, T. theileri, snake-lizard/marsupial-rodent, avian and crocodilian clades) and Aquatic lineages. Thus, our tree corroborates the hypothesis of the independent evolutionary history of both human pathogens, T. brucei and T. cruzi [13]. The topology of our tree shows the Aquatic clade as a solid lineage, in accordance with previous observations [15,18,19,21]. However, the origin of this clade is still under debate. Many studies using different DNA markers, such as long (> 1.4 kb) 18S rDNA sequences, the v7v8 hypervariable region of 18S rDNA and/or partial sequences of gGAPDH, showed either an early division between Aquatic and Terrestrial lineages as a single event [15,18,19,21] or in subsequent events with trypanosomes and Trypanosoma therezieni BRYGOO, 1963 at the base of Trypanosoma [17,43,44]. Interestingly, our tree suggests a later evolution of the Aquatic clade from Terrestrial trypanosomes (Fig. 2), which agrees with the insect-first hypothesis [13,36]. This hypothesis assumes that trypanosomes originated from a monogenetic insect parasite that adapted to live inside terrestrial vertebrates and later spread to and other aquatic , most likely through [13].

Trypanosomes of the T. brucei clade are virtually restricted to Africa, with the exception of T. vivax [45,46]. The early divergence of T. vivax inside the T. brucei clade (Fig. 1, 2) is in accordance with previous results showing a higher evolutionary rate of this species among the Salivarian trypanosomes [19,21,47]. It is interesting to consider that a previous analysis of 18S rDNA sequences revealed that members of the T. brucei clade show a higher evolutionary rate than other trypanosomes [47]. However, this high divergence has proven not to alter the topology of sequence-based trees [17]. Inside this clade, a low bootstrap value (ML = 53) is observed in the differentiation between T. brucei and T. evansi (Fig. 2), suggesting an unresolved positioning. In fact, the relationship between these species is controversial, with results either supporting T. evansi as a subspecies of T. brucei or showing great diversity between the two, depending on the T. evansi strains [48,49].

Regarding the other major group of our analysis (Terrestrial/Aquatic lineages), the first branching inside this group suggests the differentiation of the snake-lizard/marsupial-rodent clade as a basal group of other trypanosomes (Fig. 2). However, other studies have suggested avian trypanosomes as a basal group among terrestrial lineages [13,17]. This discrepancy may be associated with the low bootstrap values in our tree at three events: the snake-lizard/marsupial-rodent clade differentiation (ML = 54), the divergence of crocodilian trypanosomes (ML = 55), and the internal branch of avian trypanosomes (ML = 61), which will be further explored in our discussion.
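
To make the meaning of such support values concrete, the following Python sketch shows one common way bootstrap support is computed: as the percentage of pseudo-replicate trees that contain a given clade. The tree representation (a set of clades, each a frozenset of taxon names), the function name bootstrap_support and the toy taxa A, B and C are assumptions made for this illustration. It is not the procedure implemented in ProfDistS or phangorn, and the numbers are not taken from our trees.

```python
# Minimal sketch: bootstrap support as the share of replicate trees
# that recover a clade. All names and data here are hypothetical.
from typing import FrozenSet, Iterable, List, Set

Clade = FrozenSet[str]


def bootstrap_support(clade: Iterable[str], replicate_trees: List[Set[Clade]]) -> float:
    """Return the percentage of replicate trees that contain the clade."""
    if not replicate_trees:
        raise ValueError("at least one replicate tree is required")
    target = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if target in tree)
    return 100.0 * hits / len(replicate_trees)


# Toy example with four replicate trees over the taxa A, B and C.
tree_ab = {frozenset({"A", "B"}), frozenset({"A", "B", "C"})}
tree_ac = {frozenset({"A", "C"}), frozenset({"A", "B", "C"})}
replicates = [tree_ab, tree_ab, tree_ab, tree_ac]

print(bootstrap_support({"A", "B"}, replicates))  # 75.0
print(bootstrap_support({"A", "C"}, replicates))  # 25.0
```

With 100 pseudo-replicates, a value such as ML = 54 would mean, under this reading, that the clade was recovered in 54 of the 100 replicate trees, which is why such nodes are treated as weakly resolved.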

The T. cruzi and T. lewisi clades appear as sister groups in our analysis (Fig. 2), as has been previously demonstrated [17–19,21]. The T. cruzi clade can be subdivided into three subclades: Schizotrypanum, T. wauwau (and other Neotropical bat trypanosomes, Trypanosoma noyesi BOTERO & COOPER, 2016, and Trypanosoma livingstonei TEIXEIRA & CAMARGO, 2013) and T. rangeli/T. conorhini [21,35]. The specific sequences of Trypanosoma minasense CHAGAS, 1908 and Trypanosoma leeuwenhoeki SHAW, 1969 grouped with T. rangeli in our tree were previously considered synonyms of this species by 18S rDNA sequence analysis, explaining their positioning [13,50]. Regarding the T. lewisi clade, our results suggest the existence of two subclades inside the group (ML = 100), one harboring T. microti, and the other with T. lewisi and T. grosi. This finding is in accordance with a recent analysis of long fragments of 18S rDNA, which demonstrated this subdivision despite the similarities in the v7v8 hypervariable region [51].

Concerning avian trypanosomes, our tree reflects previous findings in both the divergence between T. avium and T. corvi and the highly supported (ML = 100) proximity between T. avium and T. thomasbancrofti [38,50]. A low bootstrap value (ML = 61) is observed in the divergence of T. corvi and T. avium/T. thomasbancrofti. It is interesting to note that two topologies are currently known in the literature, with the possibility of demonstrated by analysis of long sequences of 18S rDNA [19,38]. This, however, was not observed in trees constructed with the v7v8 hypervariable region of 18S rDNA, gGAPDH sequences, or concatenated trees using both [15,37,52]. Thus, our tree indicates the need for better resolution of avian trypanosome positioning. Considering that we used only three species of avian and two species of crocodilian trypanosomes in our reconstruction, our approach represents an interesting method to be applied in further studies.


In our analysis, the crocodilian/alligator trypanosomes (T. grayi and T. ralphi) branch together with a high support value (ML = 100) (Fig. 2). Although T. grayi is found in Africa and T. ralphi in South America [15,37,52], in a tree with distant external groups like ours, this topology is expected due to their proximity inside the Crocodilian clade [15,37]. Our tree reflects the proximity of the crocodilian trypanosomes to the T. cruzi clade, as previously observed through full genome analysis [53]. Interestingly, crocodilian trypanosomes, such as T. grayi and Trypanosoma kaiowa TEIXEIRA & CAMARGO, 2019, are tsetse-transmitted species that are not restricted to the sub-Saharan belt [15,37,53], suggesting higher adaptive plasticity of crocodilian trypanosomes.

The trypanosomes of the Aquatic lineage branched together (Fig. 2). The subgroups observed are anuran trypanosomes (Trypanosoma rotatorium (MAYER, 1843) LAVERAN, 1901, Trypanosoma mega DUTTON & TODD, 1903, Trypanosoma fallisi MARTIN & DESSER, 1990, Trypanosoma ranarum (LANKESTER, 1871) DANILEWSKY, 1885, and Trypanosoma neveulemairei BRUMPT, 1928) and fish trypanosomes (Trypanosoma siniperca CHANG, 1964, Trypanosoma ophiocephali CHEN, 1964, Trypanosoma cobitis MITROPHANOW, 1884, Trypanosoma granulosum LAVERAN & MESNIL, 1902, Trypanosoma pleuronectidium ROBERTSON, 1906) along with the platypus parasite Trypanosoma binneyi MACKERRAS, 1959, which is in accordance with the literature [14,15,39,40]. Interestingly, the anuran parasite Trypanosoma chattoni MATHIS & LEGER, 1911 appears in our analysis more closely related to fish and platypus trypanosomes than to the anuran clade. This positioning of T. chattoni was shown in a previous study using complete 18S rDNA sequences and non-trypanosomes as the outgroup [54]. However, recent trees using complete 18S rDNA sequences and concatenated analysis of the v7v8 hypervariable region and gGAPDH rooted by other trypanosomes sustained a monophyletic anuran clade [39,40]. Thus, the positioning of T. chattoni in our tree may be related to the use of non-trypanosomes as the outgroup.


Conclusions

18S rDNA is one of the most used markers in inferring phylogenies at higher taxonomic levels. However, this molecule has often been claimed to be inadequate for reconstructing phylogenetic relationships at lower taxonomic levels, in particular because of its conservative rate of evolution. Thus, the use of the v7v8 hypervariable region has become popular inside the trypanosome community as a barcode suitable to describe new species along with gGAPDH [13]. However, some questions regarding the positioning of the groups still need to be answered. To this end, evaluating longer sequences and using different genetic markers can bring more information to the analysis, improving the resolution of the trees [13]. In this work, we demonstrate that the simultaneous use of 18S rDNA sequence and secondary structure data (i.e., the consideration of the individual secondary structures of the rRNA genes) could indeed provide important information for reconstructing more robust phylogenetic trees. Our topology highlights the need for further exploration of some groups, such as the avian and snake-lizard/marsupial-rodent clades, which are less explored in trypanosome phylogenies. Therefore, this study provides an additional foundation for upcoming phylogenetic reconstructions and/or barcoding approaches, which can be used not only in trypanosomatids.

Declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable


Availability of data and materials

The sequence data analysed in this paper are publicly available in NCBI (GenBank) and their respective accession numbers are included in the trees. Data supporting the conclusions of this article are included within the article. Data and materials are available upon reasonable request to the corresponding author.

Competing interests

The authors declare that they have no competing interests.

Funding

A Ph.D. scholarship was granted to Alyssa Rossi Borges by the Brazilian agency CAPES (program: CAPES/DAAD - Call No. 22/2018; process 88881.199683/2018-01). ME is supported by DFG grants EN-305.

Authors' contributions

MW conceived the study and performed the bioinformatic analysis. ARB analyzed the data and was a major contributor to the manuscript writing. ME and MW assisted with data analysis and manuscript writing. All authors read and approved the final manuscript.

Acknowledgments

We thank Julius Lukeš (Institute of Parasitology, Biology Centre CAS, Czech Republic) for fruitful discussions and a pre-review of our manuscript, Jaime Lisack for proof-reading, and Elisabeth Meyer-Natus for obtaining the electron microscopy image used in the graphical abstract.

References


306 1. Cavalier-Smith T. Higher classification and phylogeny of Euglenozoa. Eur J Protistol. 307 2016;56:250–76.

308 2. Lukeš J, Butenko A, Hashimi H, Maslov DA, Votýpka J, Yurchenko V. Trypanosomatids 309 are much more than just trypanosomes: clues from the expanded family tree. Trends Parasitol. 310 2018;34:466–80.

311 3. Adl SM, Bass D, Lane CE, Lukeš J, Schoch CL, Smirnov A, et al. Revisions to the 312 classification, nomenclature, and diversity of eukaryotes. J Euk Microbiol. 2019;66:4–119.

313 4. WHO. Research priorities for Chagas disease, human African and 314 . WHO; 2012. 315 http://www.who.int/trypanosomiasis_african/resources/who_trs_975/en/

316 5. Lukeš J, Skalický T, Týč J, Votýpka J, Yurchenko V. Evolution of parasitism in 317 kinetoplastid . Mol Biochem Parasitol. 2014;195:115–22.

318 6. Vickerman K. Antigenic variation in trypanosomes. Nature. 1978;273:613–7.

319 7. Hart DT, Baudhuin P, Opperdoes FR, de Duve C. Biogenesis of the in 320 Trypanosoma brucei: the synthesis, translocation and turnover of glycosomal polypeptides. 321 EMBO J. 1987;6:1403–11.

322 8. Menon AK, Mayor S, Ferguson MA, Duszenko M, Cross GA. Candidate 323 glycophospholipid precursor for the glycosylphosphatidylinositol membrane anchor of 324 Trypanosoma brucei variant surface glycoproteins. J Biol Chem. 1988;263:1970–7.

325 9. Gommers-Ampt JH, Van Leeuwen F, de Beer AL, Vliegenthart JF, Dizdaroglu M, 326 Kowalak JA, et al. beta-D-glucosyl-hydroxymethyluracil: a novel modified base present in the 327 DNA of the parasitic protozoan T. brucei. Cell. 1993;75:1129–36.

328 10. Flegontov P, Votýpka J, Skalický T, Logacheva MD, Penin AA, Tanifuji G, et al. 329 Paratrypanosoma is a novel early-branching trypanosomatid. Curr Biol. 2013;23:1787–93.

330 11. Skalický T, Dobáková E, Wheeler RJ, Tesařová M, Flegontov P, Jirsová D, et al. 331 Extensive flagellar remodeling during the complex life cycle of Paratrypanosoma, an early- 332 branching trypanosomatid. Proc Natl Acad Sci USA. 2017;114:11757–62.

333 12. Stevens JR, Gibson W, Stevens JR, Gibson W. The molecular evolution of trypanosomes. 334 Parasitol Today. 1999;15:432–7.

335 13. Hamilton PB, Stevens JR. Classification and phylogeny of Trypanosoma cruzi. In: 336 Telleria J, Tibayrenc M, editors. American Trypanosomiasis Chagas Disease. 2nd ed. 337 London: Elsevier; 2017. p. 321–44.

338 14. Lemos M, Fermino BR, Simas-Rodrigues C, Hoffmann L, Silva R, Camargo EP, et al. 339 Phylogenetic and morphological characterization of trypanosomes from Brazilian armoured 340 catfishes and leeches reveal high species diversity, mixed and a new fish 341 trypanosome species. Parasit Vectors. 2015;8:573.

342 15. Fermino BR, Paiva F, Viola LB, Rodrigues CMF, Garcia HA, Campaner M, et al. Shared 343 species of crocodilian trypanosomes carried by tabanid flies in Africa and South America,

13

bioRxiv preprint doi: https://doi.org/10.1101/2020.08.04.235945; this version posted August 4, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

344 including the description of a new species from caimans, Trypanosoma kaiowa n. sp. Parasit 345 Vectors. 2019;12:225.

346 16. Maslov DA, Podlipaev SA, Lukes J. Phylogeny of the : taxonomic problems 347 and insights into the evolution of parasitism. Mem Inst Oswaldo Cruz. 2001;96:397–402.

348 17. Hamilton PB, Stevens JR, Gaunt MW, Gidley J, Gibson WC. Trypanosomes are 349 monophyletic: evidence from genes for glyceraldehyde phosphate dehydrogenase and small 350 subunit ribosomal RNA. Int J Parasitol. 2004;34:1393–404.

351 18. Hamilton PB, Stevens JR, Gidley J, Holz P, Gibson WC. A new lineage of trypanosomes 352 from Australian vertebrates and terrestrial bloodsucking leeches (Haemadipsidae). Int J 353 Parasitol. 2005;35:431–43.

19. Hamilton PB, Gibson WC, Stevens JR. Patterns of co-evolution between trypanosomes and their hosts deduced from ribosomal RNA and protein-coding gene phylogenies. Mol Phylogenet Evol. 2007;44:15–25.

20. Hamilton PB, Adams ER, Njiokou F, Gibson WC, Cuny G, Herder S. Phylogenetic analysis reveals the presence of the Trypanosoma cruzi clade in African terrestrial mammals. Infect Genet Evol. 2009;9:81–6.

21. Lima L, Espinosa-Álvarez O, Pinto CM, Cavazzana M, Pavan AC, Carranza JC, et al. New insights into the evolution of the Trypanosoma cruzi clade provided by a new trypanosome species tightly linked to Neotropical Pteronotus bats and related to an Australian lineage of trypanosomes. Parasit Vectors. 2015;8:657.

22. Teixeira MMG, Borghesan TC, Ferreira RC, Santos MA, Takata CSA, Campaner M, et al. Phylogenetic validation of the genera Angomonas and Strigomonas of trypanosomatids harboring bacterial endosymbionts with the description of new species of trypanosomatids and of proteobacterial symbionts. Protist. 2011;162:503–24.

23. Keller A, Förster F, Müller T, Dandekar T, Schultz J, Wolf M. Including RNA secondary structures improves accuracy and robustness in reconstruction of phylogenetic trees. Biol Direct. 2010;5:4.

24. Wolf M, Koetschan C, Müller T. ITS2, 18S, 16S or any other RNA - simply aligning sequences and their individual secondary structures simultaneously by an automatic approach. Gene. 2014;546:145–9.

25. Hall BS, Smith E, Langer W, Jacobs LA, Goulding D, Field MC. Developmental variation in Rab11-dependent trafficking in Trypanosoma brucei. Eukaryot Cell. 2005;4:971–80.

26. Ankenbrand MJ, Keller A, Wolf M, Schultz J, Förster F. ITS2 Database V: Twice as Much. Mol Biol Evol. 2015;32:3030–2.

27. Cannone JJ, Subramanian S, Schnare MN, Collett JR, D’Souza LM, Du Y, et al. The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics. 2002;3:2.

28. Seibel PN, Müller T, Dandekar T, Schultz J, Wolf M. 4SALE – A tool for synchronous RNA sequence and secondary structure alignment and editing. BMC Bioinformatics. 2006;7:498.


29. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, et al. Clustal W and Clustal X version 2.0. Bioinformatics. 2007;23:2947–8.

30. Seibel PN, Müller T, Dandekar T, Wolf M. Synchronous visual analysis and editing of RNA sequence and secondary structure alignments using 4SALE. BMC Res Notes. 2008;1:91.

31. Wolf M, Ruderisch B, Dandekar T, Schultz J, Müller T. ProfDistS: (profile-) distance based phylogeny on sequence-structure alignments. Bioinformatics. 2008;24:2401–2.

32. Felsenstein J. Evolutionary trees from gene frequencies and quantitative characters: finding maximum likelihood estimates. Evolution. 1981;35:1229–42.

33. Schliep KP. phangorn: phylogenetic analysis in R. Bioinformatics. 2011;27:592–3.

34. R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna; 2014.

35. Clément L, Dietrich M, Markotter W, Fasel NJ, Monadjem A, López-Baucells A, et al. Out of Africa: The origins of the protozoan blood parasites of the Trypanosoma cruzi clade found in bats from Africa. Mol Phylogenet Evol. 2020;145:106705.

36. Ortiz PA, Garcia HA, Lima L, da Silva FM, Campaner M, Pereira CL, et al. Diagnosis and genetic analysis of the worldwide distributed Rattus-borne Trypanosoma (Herpetosoma) lewisi and its allied species in blood and fleas of rodents. Infect Genet Evol. 2018;63:380–90.

37. Fermino BR, Viola LB, Paiva F, Garcia HA, de Paula CD, Botero-Arias R, et al. The phylogeography of trypanosomes from South American alligatorids and African crocodilids is consistent with the geological history of South American river basins and the transoceanic dispersal of Crocodylus at the Miocene. Parasit Vectors. 2013;6:313.

38. Šlapeta J, Morin-Adeline V, Thompson P, McDonell D, Shiels M, Gilchrist K, et al. Intercontinental distribution of a new trypanosome species from Australian endemic Regent Honeyeater (Anthochaera phrygia). Parasitology. 2016;143:1012–25.

39. Attias M, Sato LH, Ferreira RC, Takata CSA, Campaner M, Camargo EP, et al. Developmental and ultrastructural characterization and phylogenetic analysis of Trypanosoma herthameyeri n. sp. of Brazilian Leptodactilydae frogs. J Euk Microbiol. 2016;63:610–22.

40. Spodareva VV, Grybchuk-Ieremenko A, Losev A, Votýpka J, Lukeš J, Yurchenko V, et al. Diversity and evolution of anuran trypanosomes: insights from the study of European species. Parasit Vectors. 2018;11:447.

41. Dobigny G, Poirier P, Hima K, Cabaret O, Gauthier P, Tatard C, et al. Molecular survey of rodent-borne Trypanosoma in Niger with special emphasis on T. lewisi imported by invasive black rats. Acta Trop. 2011;117:183–8.

42. Du Y, Maslov DA, Chang K-P. Monophyletic origin of β-division proteobacterial endosymbionts and their coevolution with insect trypanosomatid protozoa Blastocrithidia culicis and Crithidia spp. Proc Natl Acad Sci USA. 1994;91:8437–41.

43. Botero A, Thompson CK, Peacock CS, Clode PL, Nicholls PK, Wayne AF, et al. Trypanosomes genetic diversity, polyparasitism and the population decline of the critically endangered Australian marsupial, the brush tailed bettong or woylie (Bettongia penicillata). Int J Parasitol Parasites Wildl. 2013;2:77–89.

44. Botero A, Cooper C, Thompson CK, Clode PL, Rose K, Thompson RCA. Morphological and phylogenetic description of Trypanosoma noyesi sp. nov.: an Australian wildlife trypanosome within the T. cruzi clade. Protist. 2016;167:425–39.

45. Rodrigues AC, Neves L, Garcia HA, Viola LB, Marcili A, Da Silva FM, et al. Phylogenetic analysis of Trypanosoma vivax supports the separation of South American/West African from East African isolates and a new T. vivax-like genotype infecting a nyala antelope from Mozambique. Parasitology. 2008;135:1317–28.

46. Silva Pereira S, de Almeida Castilho Neto KJG, Duffy CW, Richards P, Noyes H, Ogugo M, et al. Variant antigen diversity in Trypanosoma vivax is not driven by recombination. Nat Commun. 2020;11:844.

47. Stevens J, Rambaut A. Evolutionary rate differences in trypanosomes. Infect Genet Evol. 2001;1:143–50.

48. Carnes J, Anupama A, Balmer O, Jackson A, Lewis M, Brown R, et al. Genome and phylogenetic analyses of Trypanosoma evansi reveal extensive similarity to T. brucei and multiple independent origins for dyskinetoplasty. PLoS Negl Trop Dis. 2015;9.

49. Kamidi CM, Saarman NP, Dion K, Mireji PO, Ouma C, Murilla G, et al. Multiple evolutionary origins of Trypanosoma evansi in Kenya. PLoS Negl Trop Dis. 2017;11.

50. Sato H, Leo N, Katakai Y, Takano J, Akari H, Nakamura S, et al. Prevalence and molecular phylogenetic characterization of Trypanosoma (Megatrypanum) minasense in the peripheral blood of small neotropical primates after a quarantine period. J Parasitol. 2008;94:1128–38.

51. Egan SL, Taylor CL, Austen JM, Banks PB, Ahlstrom LA, Ryan UM, et al. Molecular identification of the Trypanosoma (Herpetosoma) lewisi clade in black rats (Rattus rattus) from Australia. Parasitol Res. 2020;119:1691–6.

52. Fermino BR, Paiva F, Soares P, Tavares LER, Viola LB, Ferreira RC, et al. Field and experimental evidence of a new caiman trypanosome species closely phylogenetically related to fish trypanosomes and transmitted by leeches. Int J Parasitol Parasites Wildl. 2015;4:368–78.

53. Kelly S, Ivens A, Manna PT, Gibson W, Field MC. A draft genome for the African crocodilian trypanosome Trypanosoma grayi. Sci Data. 2014;1:140024.

54. Martin DS, Wright A-DG, Barta JR, Desser SS. Phylogenetic position of the giant anuran trypanosomes Trypanosoma chattoni, Trypanosoma fallisi, Trypanosoma mega, Trypanosoma neveulemairei, and Trypanosoma ranarum inferred from 18S rRNA gene sequences. J Parasitol. 2002;88:566–71.


Figure legends:

Fig. 1: 18S rDNA sequence-structure neighbor-joining (NJ) tree obtained by ProfDistS [31]. All 240 18S rDNA sequences from Trypanosomatida (Kinetoplastea, Euglenozoa) available at NCBI (GenBank) with a sequence length >1500 nucleotides and with a full taxonomic lineage down to a complete species name were used for the analysis. For tree reconstruction, the global multiple sequence-structure alignment (.xfasta format) derived by 4SALE [28,30] was automatically encoded by a 12-letter alphabet [24]. GenBank accession numbers accompany each taxon name. Key taxa are shaded alternately in gray and additionally named alongside the tree. Non-monophyletic taxa are indicated by quotation marks. Singletons are highlighted in red and polyphyletic taxa in blue. The scale bar indicates evolutionary distances. The tree is rooted with the non-Trypanosoma sequences.


Fig. 2: 18S rDNA sequence-structure maximum likelihood (ML) tree of a representative subset of 43 sequences from Fig. 1, obtained by phangorn as implemented in R [33]. Bootstrap values from 100 pseudo-replicates, mapped at the appropriate internodes, are from maximum likelihood (ML) and neighbor-joining (NJ; obtained by ProfDistS [31]) analyses. For NJ tree reconstruction, the global multiple sequence-structure alignment (.xfasta format) derived by 4SALE [28,30] was automatically encoded by a 12-letter alphabet [24]. For ML tree reconstruction, the “one letter encoded” fasta format (12-letter alphabet) derived by 4SALE [28,30] was used. GenBank identifiers accompany each taxon name. The scale bar indicates evolutionary distances. Highly supported branches are indicated by thicker lines. The tree is rooted at Crithidia mellificae (KM980182) and Leishmania amazonensis (JX030087). Clades discussed in the text are highlighted.
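The 12-letter alphabet mentioned in both legends arises from combining each of the four nucleotides with one of three structural states in dot-bracket notation (unpaired, opening pair, closing pair). The following Python sketch illustrates the idea only. The letter assignment A to L, the function name, and the handling of gaps are assumptions made for illustration and are not the encoding table used by 4SALE or ProfDistS.

```python
from itertools import product

NUCLEOTIDES = "ACGU"   # RNA alphabet; T is mapped to U below
STATES = ".()"         # unpaired, opening pair, closing pair
LETTERS = "ABCDEFGHIJKL"  # hypothetical letter assignment, not the 4SALE table

# 4 nucleotides x 3 structural states = 12 sequence-structure symbols
ENCODE = dict(zip(product(NUCLEOTIDES, STATES), LETTERS))


def encode_sequence_structure(sequence: str, structure: str) -> str:
    """Encode an aligned sequence and its dot-bracket structure as one string.

    Alignment gaps ('-') are passed through unchanged (an assumption).
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    encoded = []
    for nucleotide, state in zip(sequence.upper().replace("T", "U"), structure):
        if nucleotide == "-":
            encoded.append("-")
        else:
            encoded.append(ENCODE[(nucleotide, state)])
    return "".join(encoded)


if __name__ == "__main__":
    # A small hairpin: two base pairs enclosing a two-nucleotide loop
    print(encode_sequence_structure("GCAUGC", "((..))"))
```

Once encoded in this way, each alignment column carries both sequence and structure information in a single character, so that a 12 x 12 substitution model can be applied in the same manner as a 4 x 4 nucleotide model.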


Supplementary Material


Fig. S1: Trypanosoma brucei (M12676). Secondary structure: small subunit ribosomal RNA taken from the Comparative RNA Web (CRW) site [27].


[Fig. 1 (image): NJ tree. Tip labels give GenBank accession numbers and taxon names. Labels alongside the tree: cruzi, dionisii, conorhini, „rangeli“, wauwau, corvi, thomasbancrofti, avium, lewisi, grayi, theileri, pleuronectidium, „brucei/evansi/equiperdum“, congolense, simiae (Trypanosoma); outgroup: Trypanosomatida (non-Trypanosoma), with Strigomonas, Crithidia, Leishmania (major, amazonensis, braziliensis), Blastocrithidia, Herpetomonas, Leptomonas. Scale bar: 0.02.]

[Fig. 2 (image): ML tree. Support values at internodes are given as ML/NJ. Labeled groups: outgroup, T. pestanai clade, T. brucei clade, Snake-lizard/marsupial-rodent clade, Fish/platypus trypanosomes, Anuran clade, Aquatic clade, T. theileri clade, Crocodilian clade, Avian clade, T. lewisi clade, T. rangeli/T. conorhini clade, Schizotrypanum, T. cruzi clade, T. wauwau. Scale bar: 0.1.]

[Fig. S1 (image): Secondary structure diagram of the small subunit ribosomal RNA of Trypanosoma brucei (M12676), dated July 2003. Lineage: cellular organisms; Eukaryota; Euglenozoa; Kinetoplastida; Trypanosomatidae; Trypanosoma; Trypanozoon. Citation and related information available at http://www.rna.icmb.utexas.edu]