bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 The establishment of variant surface glycoprotein monoallelic

2 expression revealed by single-cell RNA-seq of Trypanosoma

3 brucei in the tsetse fly salivary glands.

4

5 Authors: Sebastian Hutchinson1*, Sophie Foulon2, Aline Crouzols1, Roberta Menafra2, Brice

6 Rotureau1, Andrew D. Griffiths2, Philippe Bastin1

7

8 1. Trypanosome Cell Biology Unit and INSERM U1201, 25-28 Rue du Dr Roux, Institut

9 Pasteur, Paris, France

10 2. Laboratoire de Biochimie, CBI, ESPCI Paris, Université PSL, CNRS UMR 8231,

11 75005 Paris, France

12

13

14 *Correspondence: Sebastian Hutchinson, [email protected]

15 bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

16 Abstract

17

18 The long and complex Trypanosoma brucei development in the tsetse fly vector culminates

19 when parasites gain mammalian infectivity in the salivary glands. A key step in this process is

20 the establishment of monoallelic variant surface glycoprotein (VSG) expression and the

21 formation of the VSG coat. The establishment of VSG monoallelic expression is complex and

22 poorly understood, due to the multiple parasite stages present in the salivary glands.

23 Therefore, we sought to further our understanding of this phenomenon by performing single-

24 cell RNA-sequencing (scRNA-seq) on these trypanosome populations. We were able to

25 capture the developmental program of trypanosomes in the salivary glands, identifying

26 populations of epimastigote, gamete, pre-metacyclic and metacyclic cells. Our results show

27 that parasite metabolism is dramatically remodeled during development in the salivary glands,

28 with a shift in transcript abundance from tricarboxylic acid metabolism to glycolytic metabolism.

29 Analysis of VSG gene expression in pre-metacyclic and metacyclic cells revealed a dynamic

30 VSG gene activation program. Strikingly, we found that pre-metacyclic cells contain transcripts

31 from multiple VSG genes, which resolves to singular VSG gene expression in mature

32 metacyclic cells. Single molecule RNA fluorescence in situ hybridisation (smRNA-FISH) of

33 VSG gene expression following in vitro metacyclogenesis confirmed this finding. Our data

34 demonstrate that multiple VSG genes are transcribed before a single gene is chosen. We

35 propose a transcriptional race model governs the initiation of monoallelic expression.

36 bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

37 Introduction

38 African trypanosomes are single-cell flagellated parasites that cause Human African

39 Trypanosomiases, and nagana in cattle in sub-Saharan Africa. The parasites have a digenetic

40 life cycle, cycling through a mammalian reservoir host and a tsetse fly vector host. During the

41 developmental phase in the fly, the parasite progresses through multiple stages before

42 acquiring mammalian infectivity in a process called metacyclogenesis in the salivary glands

43 (Rotureau and Van Den Abbeele 2013). Trypanosome cells can be classified into multiple

44 stages grouped in two morphotypes: trypomastigote and epimastigote forms (Figure 1-figure

45 supplement 1), defined by the relative positions of the nuclear and mitochondrial (kinetoplast)

46 DNA within the cell (Hoare and Wallace 1966). Following a tsetse blood meal, the

47 trypomastigote stumpy-form parasites ingested from the blood rapidly differentiate into

48 trypomastigote procyclic form (PCF) parasites which colonise the posterior midgut of the fly.

49 Procyclic trypanosomes elongate and migrate through the alimentary tract to reach the cardia

50 (or proventriculus), while differentiating into epimastigote forms.

51 After a first asymmetric division, epimastigote parasites then migrate to the salivary

52 glands where they attach to the epithelium via their flagellum and start proliferating to colonise

53 it (Van Den Abbeele, Claes et al. 1999, Sharma, Peacock et al. 2008). This population of

54 attached epimastigote cells enters another asymmetric division, producing trypomastigote pre-

55 metacyclic cells which then mature further to produce free mammalian-infective metacyclic

56 cells that fill the lumen of the salivary glands and will be injected to the next host with the tsetse

57 saliva (Rotureau, Subota et al. 2012) (Figure 1-figure supplement 1). The two mitotic cell

58 divisions of attached epimastigote cells, producing either an attached epimastigote or a pre-

59 metacyclic trypomastigote daughter cell occur continuously and in parallel during the entire life

60 of the tsetse, optimizing parasite transmission. An additional heterogeneous parasite

61 population is found in the salivary glands, composed of meiotic cells and gametes (Peacock,

62 Ferris et al. 2011, Peacock, Bailey et al. 2014). These processes lead to a heterogeneous

63 population of cells within the salivary glands. RNA binding appear to be the key

64 regulators underpinning the trypanosome developmental program (Kolev, Ullu et al. 2014). For bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

65 instance, ALBA3 restricts development in the cardia (Subota, Rotureau et al. 2011) whereas

66 overexpression of RNA binding 6 (RBP6) is sufficient to drive differentiation to infective

67 metacyclic cells (Kolev, Ramey-Butler et al. 2012).

68 In addition to the dramatic remodeling described above, trypanosomes modulate their

69 energy metabolism in response to their specific environments (Smith, Bringaud et al. 2017). In

70 the mammalian host, parasites utilize glucose as a primary carbon source to produce ATP

71 through the glycolytic pathway that is restricted in specialized organelles named glycosomes.

72 In contrast, in the tsetse fly midgut, parasites mostly use proline to feed the mitochondrial

73 tricarboxylic acid (TCA) cycle and oxidative phosphorylation pathways to produce ATP (review

74 in Smith, Bringaud et al. (2017)).

75 A key step in the metacyclogenesis process is the establishment of monoallelic VSG

76 gene expression (Tetley, Turner et al. 1987). African trypanosomes have an exclusively

77 extracellular lifecycle, meaning that in the mammalian host they are continually exposed to the

78 immune system. To evade humoral immune attack, the parasite covers its surface in a

79 monolayer of 107 molecules of a single species of VSG protein which protects from

80 complement mediated lysis (Cross 1975, Mosser and Roberts 1982). It is put under continuous

81 pressure to switch from the , and the parasite evades this by antigenic

82 variation, a process of VSG switching (Horn 2014).

83 The epigenetic control of monoallelic expression of VSG genes underpins this antigenic

84 variation (Duraisingh and Horn 2016). Transcription of VSGs is highly structured; VSG genes

85 are always transcribed from a telomere proximal site known as a VSG-expression site (VSG-

86 ES), by RNA polymerase I (RNA Pol I) (Zomerdijk, Kieft et al. 1991, Gunzl, Bruderer et al.

87 2003). The genome has 19 bloodstream form VSG-ES which are polycistronic transcription

88 units, and 8 monocistronic metacyclic VSG-ES (mVSG-ES) (Hertz-Fowler, Figueiredo et al.

89 2008, Muller, Cosentino et al. 2018). In addition, the trypanosome genome encodes several

90 thousand silent VSG genes and gene fragments in a sub-telomeric archive which hugely

91 increases the antigen diversity available for antigenic variation (Berriman, Ghedin et al. 2005).

92 The single active VSG-ES is transcribed from a sub-nuclear organelle called the expression bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

93 site body (ESB), an extra-nucleolar RNA Pol I focus (Navarro and Gull 2001). Transcription is

94 initiated at all VSG-ES, however only elongates over a single VSG-ES (Kassem, Pays et al.

95 2014). Maintenance of monoallelic VSG expression depends on the VSG Exclusion complex

96 (VEX) complex, which associates with the ESB (Glover, Hutchinson et al. 2016, Faria, Glover

97 et al. 2019), and an mRNA trans-splicing locus, forming a transcription and splicing factory

98 (Faria, Luzak et al. 2021). In addition, several chromatin factors, including the sheltrin

99 component RAP1 and the histone chaperone CAF1 have been implicated in maintenance of

100 monoallelic expression (Yang, Figueiredo et al. 2009, Alsford and Horn 2012, Faria, Glover et

101 al. 2019).

102 The establishment of monoallelic expression in metacyclic trypanosomes is relatively

103 poorly understood. Immunogold staining of salivary gland trypanosomes and double

104 immunofluorescence staining of in vitro derived metacyclic parasites demonstrates that

105 monoallelic expression is established in the salivary glands of the tsetse fly (Tetley, Turner et

106 al. 1987, Ramey-Butler, Ullu et al. 2015). In bloodstream parasites, monoallelic expression can

107 undergo transcriptional switching events, where the active VSG-ES is switched off and a

108 previously silent one is activated epigenetically (Horn 2014), requiring the re-establishment of

109 monoallelic expression. Analysis of BSF cells following a forced VSG-ES switch showed that

110 transcription transiently increased at silent VSG-ES prior to switching the active VSG-ES. This

111 was proposed as “probing” of silent VSG-ES before commitment to switching (Aresta-Branco,

112 Pimenta et al. 2016). Only one factor has so-far been linked to the establishment of monoallelic

113 expression; the histone methyltransferase DOT1b trimethylates Histone H3 lysine 76 (Janzen,

114 Hake et al. 2006) and is required to switch off a previously active VSG-ES following the

115 establishment of a new active site (Figueiredo, Janzen et al. 2008).

116 The limiting nature and diverse cellular composition of salivary gland trypanosomes

117 has made these parasites less experimentally tractable than in vitro systems. These parasites

118 are not amenable to axenic culture (metacyclic cells are G0 arrested), and genetic

119 manipulation tools are not as well established as in cultured parasites. We therefore applied a

120 single-cell RNA-sequencing (scRNA-seq) approach to salivary gland derived trypanosomes, bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

121 which overcomes many experimental boundaries through increased sensitivity and granularity

122 to address the lack of resolution in these cell types. Here, we dissect the architecture of

123 parasite development as they acquire mammalian infectivity in the salivary glands using

124 inDrop. InDrop is a scRNA-seq technology wherein single cells are co-encapsulated inside

125 nanoliter droplets with single barcoded hydrogel microspheres (BHMs) coupled to barcoded

126 cDNA primers that hybridize to the 3’ poly(A) tail of mRNA (Figure 1A), allowing sequencing of

127 several thousand cells per experiment. Reverse transcription of mRNAs from each cell

128 generates cDNAs tagged with a unique barcode, specific to each cell, permitting the informatic

129 decomposition of single-cell transcriptomes (Klein, Mazutis et al. 2015, Zilionis, Nainys et al.

130 2017).

131 In this study, we first demonstrated that trypanosomes are compatible with 3´ mRNA

132 barcoding using inDrop BHMs. After having generated single cell transcriptomes for cultured

133 bloodstream (BSF) and PCF trypanosomes, we performed inDrop with parasites from the

134 salivary glands. This revealed the presence of 4 cell clusters in the salivary glands,

135 corresponding to 1) attached epimastigote cells, 2) gametes and meiotic cells, 3) pre-

136 metacyclic trypomastigote cells and 4) metacyclic parasites. We delineated transcriptomic

137 changes to energy metabolism pathways, revealing a coherent developmental program in this

138 organ towards infectivity. Finally, we have performed the first analysis of the establishment of

139 monoallelic VSG expression in vivo. We show that the establishment of monoallelic expression

140 is a two-step process, where pre-metacyclic cells first initiate expression of multiple mVSG

141 transcripts, followed by monoallelic expression in metacyclic parasites. We further validated

142 this finding by an orthogonal method, employing smRNA-FISH with parasites undergoing

143 metacyclogenesis in vitro.

144 bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

145 Results

146 Assessment of single-cell RNA-seq using inDrop applied to Trypanosoma brucei

147 To test whether trypanosomes would not be visibly stressed by the inDrop pipeline, we

148 monitored the survival and RNA metabolism using ALBA3 and DHH1 as markers for cellular

149 stress in procyclic form (PCF) cells, in conditions which mimic the inDrop protocol. Following

150 starvation in PBS for 2 h at 27°C, ALBA3 and DHH1 form RNA granules in the cytoplasm

151 (Subota, Rotureau et al. 2011), however when cells are placed on ice for 2h in phosphate

152 buffered saline, no RNA granule formation was observed (Figure 1-figure supplement 2A). In

153 addition, cells remained alive based on their motility after returning them to 27°C SDM-79

154 medium (97 % untreated and 94 % 2h PBS) (Figure 1-figure supplement 2B), indicating that

155 they are not visibly stressed in the conditions required for inDrop.

156 We first characterized the performance of the inDrop pipeline with cultured Antat1.1E

157 PCF trypanosomes expressing EP (Glu-Pro) / GPEET (Gly-Pro-Glu-Glu-Thr) repeat containing

158 procyclins. T. brucei mRNAs are poly-adenylated and therefore compatible with priming of

159 cDNA synthesis using poly(T)VN, so we began by performing ‘bulk’ experiments using a primer

160 containing a T7 promoter for cDNA amplification by in vitro transcription (IVT), an Illumina

161 sequencing adaptor, a single inDrop cell barcode, unique molecular identifiers UMI (Islam,

162 Zeisel et al. 2014) and a poly(T)VN sequence for mRNA capture (see Materials and Methods).

163 Library sequencing revealed that the majority of barcoded cDNAs corresponded to the

164 annotated 3´-end of transcripts, as expected (Figure 1B). We therefore proceeded to single

165 cell barcoding of trypanosomes. We mixed cultured BSF and PCF trypanosomes (47.5 %

166 each) with Ramos cells (5%), a human B-cell line derived from Burkitt’s lymphoma (Klein,

167 Giovanella et al. 1975) as a positive control. The inclusion of Ramos cells also allows us to

168 estimate the amount of ambient RNA within the cell suspension which is detected by inDrop.

169 We constructed a sequencing library, and generated 77.4 million reads. To facilitate the use

170 of trypanosome genomic sequences for which no reference strain is available, we developed

171 a bespoke analysis pipeline to create a count matrix (see Materials and Methods). After

172 processing the reads with this pipeline, we used the Seurat and single cell transform (SCT) R bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

173 packages for data analysis (Butler, Hoffman et al. 2018, Hafemeister and Satija 2019, Stuart,

174 Butler et al. 2019) and determined that the transcriptomes of 1,979 cells were captured, with

175 an average of 2,195 UMIs and 1,347 genes per cell. The emulsion contained approximately

176 4,900 cells, suggesting that we recovered approximately 40 % of the cells through inDrop.

177 Dimensionality reduction using the Uniform Manifold Approximation and Projection (UMAP)

178 algorithm (L McInnes, J Healy et al. 2018) revealed three clusters which corresponded to the

179 three expected cell types in approximately the expected proportions (Figure 1C), with 164 (8%)

180 Ramos cells, 809 (41%) procyclic cells and 1,006 (51%) BSF cells. Within clusters, we

181 detected an average of 1,932 UMIs and 1,211 genes for BSF, 2,268 UMIs and 1,258 genes

182 for PCF cells and an average of 3,447 UMIs and 2,627 genes in Ramos cells (Figure 1D).

183 Each trypanosome cell has between 20,000-40,000 mRNA molecules per cell (Haanstra,

184 Stewart et al. 2008) therefore these data are consistent with an mRNA capture efficiency

185 between 4.8 % and 11.3 %, similar to the 7.1 % reported by Klein, Mazutis et al. (2015). We

186 next plotted percentage of human and trypanosome UMIs for each barcode (Figure 1-figure

187 supplement 2C). This revealed that human barcodes contained higher proportions of

188 trypanosome UMIs than the converse. We hypothesized that this was potentially due to

189 trypanosome cell fragility which leads to release of trypanosome mRNAs from lysed cells. We

190 therefore modified our protocol to maintain cell suspensions on ice until the moment they are

191 encapsulated (see Materials and Methods).

192 Marker gene analysis for each cluster revealed significantly differentially expressed

193 genes, corresponding to the known biology of each respective cell type. In the BSF cluster,

194 marker genes included surface proteins such as the active VSG-2 (Tb427.BES40.22) (Figure

195 1E), GRESAG 4 (Tb927.7.6080), ISG65 (Tb927.2.3270), and GPI-PLC (Tb927.2.6000), as

196 well as transcripts involved in the glycolytic pathway, such as AOX (Tb927.10.7090), ALD

197 (Tb927.10.5620), and PYK1 (Tb927.10.14140), for example. The PCF cluster markers

198 included the procyclic surface markers EP1 (Tb927.10.10260) (Figure 1E), EP2

199 (Tb927.10.10250), GPEET (Tb927.6.510) and PARP A/EP3-2 (Tb927.6.450), the electron

200 transport chain component COXIV (Tb927.1.4100), an amino acid transporter (AAT10-2) bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

201 (Tb927.8.8300), the RNA binding protein UBP2 (Tb927.11.510) and HSP90

202 (Tb927.10.10980), all of which are seen to be up-regulated in PCF using bulk RNA-seq (Siegel,

203 Hekstra et al. 2010). In the Ramos cluster, markers included genes such as immunoglobulin

204 lambda constant 3 (IGLC3) (Figure 1E) and prothymyosin alpha (PTMA), as expected. These

205 data correspond with known biology from bulk transcriptomics experiments and therefore

206 demonstrate that inDrop is capable of single-cell analysis of trypanosome transcripts.

207

208 Single-cell analysis of trypanosome salivary gland transcriptomes using inDrop

209 We next proceeded to single-cell barcoding of trypanosomes from tsetse fly salivary

210 glands. We infected cages of 50 teneral (1-2 days post-eclosion) male tsetse flies with in vitro

211 generated stumpy-form Antat1.1E (EATRO1125) T. b. brucei and recovered salivary gland

212 parasites by dissection 28 days post infection. We obtained 4.6 x104 parasites from a total of

213 72 dissected flies (Figure 2-figure supplement 1). We then generated and sequenced single-

214 cell barcoded libraries (Figure 2-figure supplement 2A), and developed an optimized

215 bioinformatic pipeline for T. brucei, recovering data for 2,279 cells from two technical replicates

216 (Figure 2 and Figure 2-figure supplement 2B), detecting an average of 959 UMIs per replicate

217 (Figure 2-supplement 2B). As no genome sequence is available for the Antat1.1 strain used

218 here, we identified VSG genes expressed in the salivary glands by the Antat1.1 strain by

219 aligning reads to the VSGnome for Antat1.1E (EATRO1125) (Cross 2017). We then selected

220 the top 8 sequences, in line with the reported number of mVSG-ES in the Lister-427 strain

221 (Muller, Cosentino et al. 2018).

222 We used the Seurat R package to analyze our technical replicates (Butler, Hoffman et

223 al. 2018, Hafemeister and Satija 2019). Count matrices were compiled from all aligned reads,

224 except for VSG genes, where we used a stringent mapping quality (MapQ40), to avoid mis-

225 aligned reads being spuriously counted. We next examined several technical aspects of

226 inDrop, Firstly, we examined the background from ambient RNA by looking at the detection

227 rate in a cluster of Ramos cells, which were incorporated into our experiment. These data

228 showed that we did not detect more than 2 mVSG UMIs per barcode in Ramos cells (Figure bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

229 2-figure supplement 3A). We therefore applied a >2 UMI cutoff for our mVSG analyses below.

230 We established the percentage of stringently aligned reads (mapping quality >40) in our data.

231 This showed improvement over our in vitro data, with two distinct clusters of cells for human

232 and trypanosome UMIs (Figure 2-figure supplement 3B). We next considered whether

233 sequencing errors could contribute to barcode switching. Our bioinformatics pipeline permits

234 up-to two errors per cell barcode, so we compared the edit distance between all cell barcodes

235 in our study. The mean editing distance between barcodes was 11.4 nt, and only 0.017 % of

236 barcode pairs were within an editing distance of 2 (Figure 2-figure supplement 3C). These

237 parameters allowed us to effectively estimate the noise for our downstream analyses.

238 Following data preprocessing and integration of the technical replicates, we visualized

239 the datasets using UMAP projections (L McInnes, J Healy et al. 2018) (Figure 2A). Clustering

240 identified 5 clusters (Figure 2B), whose developmental state could be identified using known

241 marker genes (Figure 1E). EP1 Procyclin (Tb927.10.10260) is a surface protein found in

242 midgut forms (Roditi, Carrington et al. 1987), and its transcript can also be detected in salivary

243 gland parasites (Urwyler, Studer et al. 2007). Brucei alanine-rich protein (BARP,

244 Tb927.9.15640) is a surface protein present in attached epimastigote parasites (Urwyler,

245 Studer et al. 2007). Metacyclic VSG expression is a marker for metacyclic cells (Rotureau,

246 Subota et al. 2012), and VSG-393 is the most abundant VSG transcript expressed in our

247 dataset. RBP6 (Tb927.3.2930) is an RNA binding protein which drives developmental

248 progression to infectivity in salivary glands, and is found in epimastigote forms in the salivary

249 glands (Kolev, Ramey-Butler et al. 2012, Vigneron, O'Neill et al. 2020). Calflagin is a flagellar

250 calcium binding protein whose expression is up-regulated in pre-metacyclic and metacyclic

251 forms (Rotureau, Subota et al. 2012). Finally, HAP2 (Tb927.10.10770) is a membrane-

252 membrane fusion protein found conserved in nature and expressed in the gametes of many

253 species (Figure 2C) (Fedry, Liu et al. 2017). Further evidence from bulk transcriptomics of

254 midgut, cardia (proventriculus) or salivary gland tissues (Kolev, Ramey-Butler et al. 2012) were

255 used to identify midgut forms. Of the 226 marker genes identified for the midgut forms cluster

256 in our inDrop data, 82.3 % of genes were maximally expressed in the midgut samples of bulk bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

257 transcriptomics (Kolev, Ramey-Butler et al. 2012), further supporting this annotation. In total,

258 we identified 142 midgut forms, 296 attached epimastigote cells, 280 gametes, 317 pre-

259 metacyclic and 1,244 metacyclic cells. These data indicate that we are able to capture single

260 cell transcriptomes of these salivary gland parasites, and identify cell states from these

261 transcriptomes, using inDrop.

262

263 Metabolic transcript analysis reveals dramatic remodeling during metacyclogenesis

264 To further our understanding of developmental changes in the salivary glands, we

265 assessed the expression of glycolysis and TCA cycle genes expressed in the transcriptomes

266 of parasites from the salivary glands. Our analysis reveals that parasites remodel the

267 expression of metabolic during differentiation to metacyclic cells (Figure 3, and

268 Figure 3-figure supplement 1). We were able to assess the expression changes for 12

269 glycolysis transcripts and 7 TCA cycle transcripts (Figure 3A). We observed a

270 common expression profile for most genes at the cluster (Figure 3B and Figure 3-figure

271 supplement 1) and single cell level (Figure 3C). Glycolytic enzymes increased in their

272 expression between pre-metacyclic and metacyclic clusters, with hexokinase (HK1,

273 Tb927.10.2010) and ATP-dependent 6-phosphofructokinase (PFK, Tb927.3.3270) being clear

274 examples of this. We observed a decline in TCA cycle enzyme expression during

275 metacyclogenesis, with pre-metacyclic cells positioned as a clear intermediate stage between

276 epimastigote forms (Attached and Gametes) and metacyclic cells. For instance, aconitase

277 (ACO, Tb927.10.14000) and glycosomal isocitrate dehydrogenase (IDH, Tb927.11.900) both

278 decrease from their peak expression in attached epimastigote cells to their lowest in metacyclic

279 parasites with pre-metacyclic cells as intermediates. This indicates that the data clustering is

280 indeed reflecting the true developmental progression of these cells, consistent with previous

281 electron microscopy observations on glycosomal and mitochondrial remodeling during this

282 development (Tetley and Vickerman 1985, Kolev, Ramey-Butler et al. 2012).

283

284 Identification of a putative gamete cluster bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

285 A subpopulation of trypanosomes undergoes meiosis I and II in the salivary glands,

286 producing gamete cells that can recombine sexually (Peacock, Ferris et al. 2011). HAP2

287 expression has not been previously profiled during parasite development in the salivary glands.

288 In our data, expression of HAP2 was significantly upregulated in gamete cells compared to the

289 other clusters (average log2 FC 0.74, detected in 82.5 % gamete cells and 36.6 % cells in

290 other clusters, adjusted p.val = 0. 9.75 x 10-50, negative binomial test, Supplementary table 1).

291 Additionally, and unexpectedly, BARP expression was significantly reduced in these cells

292 compared to gamete cells (average log2 FC -0.83, detected in 94.6 % attached epimastigote

293 cells and 61.1 % gamete cells, adjusted p.val = 9.68 x 10-36, negative binomial test) and EP1

294 procyclin expression up-regulated (average log2 FC 1.45, detected in 58.1% and 87.5% of

295 attached epimastigote cells and gamete cells, respectively, adjusted p.val = 3.26 x 10-56),

296 potentially explaining the presence of these transcripts in previous studies (Urwyler, Studer et

297 al. 2007). We also detected a significant up-regulation of the gametocytogenesis marker HOP1

298 (Tb927.10.5490) (Peacock, Ferris et al. 2011) in this cluster compared to the other clusters

299 (average log2 FC 0.80, detected in 7.5 % gamete cells and 1.8 % cells in other clusters,

300 adjusted p.val = 1.54 x 10-4). These data therefore support the assignment of gamete cells to

301 this cluster.

302 Attached epimastigote cells undergo asymmetrical division to produce one attached

303 epimastigote daughter and one pre-metacyclic trypomastigote daughter (Rotureau, Subota et

304 al. 2012). We examined these cells in more detail (Figure 4A). We searched for cells

305 expressing the histones (H1 Tb927.11.1800, H2A Tb927.7.2820, H2B Tb927.10.10460, H3

306 Tb927.1.2430 and H4 Tb927.5.4170), as a proxy for S-phase cells (Ersfeld, Docherty et al.

307 1996), and discovered two sub-clusters of cells expressing these transcripts (Figure 4B). To

308 investigate this more quantitatively, we sub-divided the S-Phase cluster and assessed the

309 marker genes expressed in these clusters (Figure 4C). This revealed that BARP and the set

310 of 5 canonical histones were significantly associated with the “BARP-S” cluster. HAP2 was

311 significantly associated with both clusters, albeit more strongly with the HAP2-S cluster bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

312 (Supplementary table 2). The presence of these two clusters raises the questions as to

313 whether there are two potential modes of development in the salivary glands.

314

315 Establishment of mVSG monoallelic expression

316 VSG protein expression is initiated during pre-metacyclic to metacyclic differentiation

317 (Tetley and Vickerman 1985, Rotureau, Subota et al. 2012). Consistent with this, our data

318 revealed, as expected, the presence of very few mVSG UMIs in attached epimastigote cells

319 with 85% of cells containing no detectable mVSG transcripts, compared to pre-metacyclic and

320 metacyclic clusters (Figure 5-figure supplement 1). Using our Ramos cell data to estimate the

321 amount of ambient RNA in the cell suspension detected by inDrop, we found that attached

322 epimastigote cells contained more mVSG UMIs on average than Ramos cells, where these

323 genes are not present (3.36 vs 0.19 UMIs per cell, respectively), suggesting that there is some

324 transcription of these loci, consistent with ubiquitous transcription initiation from RNA-Pol I

325 promoters (Kassem, Pays et al. 2014). We observed an increase in the average number of

326 mVSG UMIs per cell as parasites proceeded through their developmental program (Figure 5-

327 figure supplement 1), increasing from a mean of 3.36 UMIs in attached epimastigote cells to

328 10.0 UMIs and 21.7 UMIs in pre-metacyclic and metacyclic cells, respectively.

329 We next assessed the diversity of mVSG expression during the differentiation process.

330 Plotting the expression profile of individual pre-metacyclic cells showed that the majority of

331 cells in this cluster express multiple different mVSG genes (Figure 5A). We quantified the

332 number of different mVSG genes expressed per cell (i.e. the diversity of UMIs per cell) in cells

333 with more than 10 mVSG UMIs. This revealed, strikingly, that pre-metacyclic cells expressed

334 up-to six different mVSG genes. Multi-mVSG gene expression (UMI cutoff >2) was observed

335 in 86.7 % of cells, whereas only 13.3 % expressed a single mVSG gene. (Figure 5B). Following

336 differentiation to metacyclic form cells, however, mVSG expression had resolved into a

337 monoallelic state (Figure 5C), where a single mVSG gene was detected in 86.8 % of cells,

338 12.2 % contained two mVSG genes and only 1 % expressed 3 genes types (Figure 5D). bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

339 We queried whether the detection of multiple mVSG genes in pre-metacyclic cells was

340 a function of sequencing depth for those barcodes, compared to metacyclic cells. We plotted

341 the number of VSG detected per barcode against the number of aligned reads, stratified by

342 cluster (Figure 5-figure supplement 2). This revealed that rather than having higher aligned

343 read counts, the median aligned reads per barcode associated to pre-metacyclic cells was

344 lower than metacyclic cells, even when comparing metacyclic cells with a single expressed

345 mVSG to pre-metacyclic cells expressing 6 mVSG genes (2.2 x 104 vs 1.6 x 104 median aligned

346 reads, respectively). These data show that it is highly unlikely that the detection of multiple

347 mVSG transcripts per cell is artefactual due to sequencing depth.

348 We sought to further validate our findings by turning to previously published scRNA-

349 seq data of salivary gland trypanosomes using both a different trypanosome strain; RUMP503,

350 and a different technique, the 10xGenomics single-cell RNA-seq system (Vigneron, O'Neill et

351 al. 2020). As the complete set of mVSG genes was not available for this strain, we used Trinity

352 (Grabherr, Haas et al. 2011) to assemble mVSG transcripts de novo from bulk RNA-seq data

353 (Savage, Kolev et al. 2016), and retained the 8 most abundant mVSG genes for downstream

354 analysis. This allowed us to perform a detailed analysis of mVSG expression in these data,

355 which was not possible previously. We reprocessed the scRNA-seq data set using our pipeline,

356 observing similar clusters to Vigneron, O'Neill et al. (2020) (Figure 5-figure supplement 3A,B).

357 We identified a cluster of pre-metacyclic cells, in which we detected multi-mVSG gene

358 expression. In cells with more than 10 mVSG UMIs, 71 % contained transcripts from multiple

359 VSG genes (12 cells), compared to 29 % with one mVSG detected (UMI cutoff >2) (Figure 5-

360 figure supplement 3C). In the metacyclic cell clusters, a single mVSG gene was detected in 78

361 % of cells, and 22 % of cells expressed two or more mVSG genes, with the majority (86 %) of

362 this subset expressing 2 mVSG transcripts. This new finding is similar to our inDrop data,

363 where 87 % of pre-metacyclic cells expressed more than a single mVSG. These new analyses

364 uncover that expression of multiple mVSGs is a common precursor to monoallelic expression

365 in different trypanosome strains in vivo. bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

366 To formally demonstrate the existence of cells expressing more than a single VSG

367 gene, we performed smRNA-FISH following in vitro metacyclogenesis by overexpressing

368 RBP6. This in vitro process models the establishment of monoallelic mVSG expression in a

369 third strain, the Lister 427 genetic background. Induction of RBP6 in PCF forms triggers

370 differentiation first to epimastigote forms, which then further differentiate to trypomastigote

371 forms, producing infective metacyclic forms which express a single mVSG (Kolev, Ramey-

372 Butler et al. 2012, Ramey-Butler, Ullu et al. 2015). This process therefore allows us to model

373 the initiation of mVSG expression.

374 We overexpressed RBP6 in PCF cells using a tetracycline inducible system

375 (Trenaman, Glover et al. 2019), and sampled cultures after 3, 4 or 5 days of induction. After

376 probing for VSG-397, VSG-531 and alpha tubulin as a positive control, we detected VSG

377 transcripts in ~5 % of cells (303 cells positive for VSG out of a total of 6,452). Within this

378 population, we established 3 categories of cells: “VSG+”, where transcript numbers were

379 quantifiable (193 cells), “VSG++” where transcript numbers were too high to be quantified

380 accurately (but where individual transcripts were discernable) (18 cells), and “VSG+++”, where

381 individual transcripts were no longer discernable (92 cells) (for representative images see

382 Figure 6A, Supplementary table 3).

383 To visualize the proportions of VSG-397 and VSG-531 transcripts per cell, we

384 estimated the total number of mVSG transripts per cell in VSG++ and VSG+++ categories.

385 BSF cells contain on average 2,000 VSG transcripts (Haanstra, Stewart et al. 2008). Hence,

386 we set the number of transcripts in VSG+++ cells at 2,000, and VSG++ cells at 200 (an order

387 of magnitude between VSG+ and VSG++) per channel (i.e. if both VSG-397 and VSG-531

388 were “VSG++”, we estimate that cell contains 400 VSG transcripts). Cells were therefore

389 classified as VSG+ with less than 200 transcripts, VSG++ with between 200 and 400

390 transcripts, and VSG+++ with 2000 transcripts or more. We then assessed singular VSG gene

391 expression in each cell category: 30 % (58 cells) VSG+ cells contained transcripts from both

392 VSG-397 and VSG-531, directly demonstrating the presence of cells expressing two different

393 VSG genes. This proportion dropped to only 17 % (3 cells) among VSG++ cells and 0 % in bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

394 VSG+++ cells, indicating that monoallelic expression is established as cells express a larger

395 amount of mVSG transcripts (Figure 6B).

396 bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

397 Discussion

398 Our data demonstrate that the inDrop scRNA-seq approach is compatible with

399 investigation of gene expression in trypanosomes. Despite significant biological constraints

400 (limited access to multiple populations of parasites in low numbers), we were able to delineate

401 the developmental progression of these parasites in the salivary glands, including the

402 identification of a gamete cluster. In our hands, we detected similar numbers of trypanosome

403 UMIs (~2,000 for cultured cells) compared to Ramos cells (~3,500), despite having a much

404 lower mRNA content per cell (105 vs 104 for human and trypanosome cells, respectively)

405 (Haanstra, Stewart et al. 2008).

406 The gamete cell cluster was characterized by a significant upregulation of HAP2, a

407 conserved membrane fusion protein required for gamete fusion in diverse organisms (Fedry,

408 Liu et al. 2017), and HOP1, a component of synaptonemal complexes formed in the first

409 meiotic division (Peacock, Ferris et al. 2011). We did not, however, observe statistically

410 significant changes in other known marker genes of gametocytogenesis such as SPO11,

411 DMC1 or MND1 (Peacock, Ferris et al. 2011), likely due the sensitivity of the inDrop. Our

412 analysis of the metabolic transcriptome remodeling indicated that energy metabolism in

413 attached epimastigote cells is dependent on the production of ATP in the mitochondrion,

414 consistent with recent evidence from in vitro derived epimastigote parasites using the RBP6

415 overexpression system (Dolezelova, Kunzova et al. 2020). The most dramatic transcriptome

416 remodeling was following asymmetrical division of attached epimastigote, with pre-metacyclic

417 cells as transition intermediates in terms of metabolic remodeling (Figure 3). These data are

418 consistent with electron microscopy evidence showing an increase in the number and size of

419 glycosomes together with the reduction of mitochondrial branching in metacyclic cells

420 compared to attached epimastigote parasites (Tetley and Vickerman 1985, Kolev, Ramey-

421 Butler et al. 2012).

422 Our analysis of the establishment of monoallelic mVSG expression has uncovered that

423 this is a phased process, first involving the activation of multiple mVSG-ES, and second the

424 emergence of a single active mVSG-ES. Here, we present three independent lines of evidence bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

425 to support this finding. First, in AnTat1.1E (EATRO1125) parasites from tsetse salivary glands,

426 our inDrop data identified both metacyclic and pre-metacyclic cells, which allowed us to

427 evaluate the expression of mVSG genes before monoallelic expression was established.

428 Second, 10xGenomics data from Vigneron, O'Neill et al. (2020) using the RUMP503 strain

429 were re-analyzed. Vigneron, O'Neill et al. (2020) concluded that each metacyclic cell

430 expressed a single mVSG, as our re-analysis concluded. However, expression of multiple

431 mVSG genes occurs before cells become metacyclic forms. The intermediate cell cluster

432 identified in their study therefore likely represents the pre-metacyclic cells in our re-analysis.

433 This analysis was only possible due to the addition of de novo assembled mVSG transcripts.

434 Third, we used smRNA-FISH in an in vitro metacyclogenesis model using the Lister 427 strain.

435 Taken together, our data from three strains analyzed by different techniques, strongly support

436 our conclusions. This triple set of experimental evidence supports a dynamic process where

437 many genes are initially transcribed at a low level, and as expression levels increase, a single

438 gene emerges as the active VSG.

439 Metacyclic cells express a protein coat comprised of a single VSG type (Ramey-Butler,

440 Ullu et al. 2015), suggesting that the multi-mVSG gene expression we observed is restricted

441 to the transcript level in pre-metacyclic cells. Indeed, immunogold electron microscopy (Tetley,

442 Turner et al. 1987) and immunofluorescence data (Rotureau, Subota et al. 2012) indicates that

443 pre-metacyclic cells do not have VSG on their surface. According to the nomenclature from

444 Tetley and Vickerman (1985), VSG protein appears on the surface of nascent metacyclic cells,

445 a developmental stage between pre-metacyclic and metacyclic form cells. As no cell division

446 occurs between pre-metacyclic/nascent/metacyclic differentiation, these data suggest that

447 there is a temporal component to ensuring a single VSG coat which depends upon both the

448 time required to establish a single active locus and the differentiation from pre-metacyclic to

449 metacyclic cell. No data is available on the duration of the transition from pre-metacyclic to

450 metacyclic cell, so it is not possible to estimate the rate at which a single mVSG-ES is chosen

451 in vivo. Evidence from VSG switching experiments in bloodstream forms indicates that new

452 VSG proteins are detectable on the surface within 12 h of a switch (Pinger, Chowdhury et al. bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

453 2017), suggesting that the establishment of monoallelic expression could occur within this time

454 frame. During the transition from BSF to PCF following a tsetse bloodmeal, the VSG coat is

455 removed and replaced with procyclin. The appearance of procyclin mRNA precedes the

456 appearance of protein by approximately 6h (Roditi, Schwarz et al. 1989). If a similar delay

457 exists with BARP to mVSG coat remodeling, this could potentially explain how parasites are

458 able to express a single VSG type in their mVSG coats. Nevertheless, the exact parameters

459 underlying these transitions remain to be determined.

460 In bloodstream form cells, a single VSG is transcribed from the active VSG-ES in the

461 ESB (Navarro and Gull 2001). In metacyclic cells, the picture is less clear: RNA Pol I staining

462 indicated that no ESB is detected at this stage. Metacyclic VSG-ES are much shorter than BSF

463 VSG-ES (~5 Kbp for mVSG-ES compared to ~60 Kbp for BSF VSG-ES), therefore the absence

464 of an ESB in metacyclic cells may be due to fewer RNA Pol I molecules at these loci (Ramey-

465 Butler, Ullu et al. 2015). Inferring from the characteristics of the bloodstream ESB, we can

466 speculate on nature of VSG gene expression regulation in metacyclic cells. The ESB is a bi-

467 partite nuclear body consisting of the mRNA trans-splicing machinery and the actively

468 transcribed VSG-ES (Faria, Luzak et al. 2021). VEX1 and VEX2 are markers of the ESB, and

469 their association with the ESB are key to the maintenance of VSG monoallelic expression.

470 Perturbation of either or both protein’s expression leads to dramatic failures in monoallelic VSG

471 expression control, with silent VSG-ES being transcribed and multiple VSGs expressed on the

472 surface (Glover, Hutchinson et al. 2016, Faria, Glover et al. 2019). Pharmacological inhibition

473 of transcription disperses the VEX1 and VEX2 and RNA Pol I foci (Glover, Hutchinson et al.

474 2016, Kerry, Pegg et al. 2017, Faria, Glover et al. 2019, Faria, Luzak et al. 2021), suggesting

475 that the ESB requires active transcription. These data lead to a picture of the ESB as a

476 structure which emerges from, and is stabilized by transcription of a VSG gene.

477 We propose a model for the establishment of monoallelic expression in metacyclic cells

478 (Figure 7) where the multi-mVSG expression profiles observed in this study correlate to an

479 initial step in a “race” between each mVSG-ES (Figure 7A) to obtain a sufficient level of

480 transcription (Figure 7B) which leads to the down regulation of other sites. We propose that bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

481 following asymmetric division of attached epimastigote parasites, pre-metacyclic cells begin

482 transcribing multiple mVSG-ES. This would likely in turn recruit VEX1 and VEX2 proteins to

483 these loci, triggering a positive feedback loop, where VEX proteins recruit the mRNA splicing

484 machinery, stabilize the ESB and drive further transcription. This could lead to suppression of

485 transcription at other mVSG-ES by depletion of these limiting factors, or by the previously

486 proposed “RNA silencing”, in which transcripts from the active VSG silence transcription from

487 other VSG-ES (Glover, Hutchinson et al. 2016).

488 We note a striking similarity between the dynamics controlling the establishment of

489 monoallelic expression in both mVSG and olfactory neuron OR gene choice (Hanchate,

490 Kondoh et al. 2015). In Metazoa, monoallelic expression underpins the expression of olfactory

491 receptors (ORs). It is initiated during neuronal development in the olfactory bulb, and

492 establishment of monoallelic expression is preceded by expression of multiple olfactory

493 receptors in early mature olfactory neurons (Hanchate, Kondoh et al. 2015). A combination of

494 DNA enhancer networks and an epigenetic trap, where the expression of an OR triggers a

495 feedback loop which inhibits further activation after the first gene, establishing a single active

496 OR. Here, during development of olfactory neurons, all OR genes are modified with

497 heterochromatic histone trimethyl marks H3K4me3 and H4K20me3 (Magklara, Yen et al.

498 2011). The lysine demethylase LSD1 is then transiently expressed during development,

499 activating a single OR (Lyons, Allen et al. 2013). A feedback signal triggered by OR expression

500 utilizing the misfolded protein response inhibits further OR gene activation (Dalton, Lyons et

501 al. 2013).

502 Both systems have intermediate developmental stages where transcripts from multiple

503 mVSG/OR genes are detected, which then resolve to monoallelic states. Further, the

504 establishment of monoallelic expression of OR genes is dependent on the structured and

505 temporal control of histone methylation (Magklara, Yen et al. 2011). Similarly, in trypanosomes,

506 the histone methyltransferase DOT1B is required for strict control of monoallelic expression

507 following a transcriptional switch (Figueiredo, Janzen et al. 2008). In BSF cells, ectopic

508 expression of a second VSG leads to silencing of the active VSG that is dependent on DOT1B bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

509 (Batram, Jones et al. 2014). In the race model, the mVSG-ES who lost the “race” for

510 transcription would be silenced in a DOT1B dependent manner following the initial race,

511 consistent with the two-stage process observed in BSF cells. These similarities suggest

512 common solutions to the problem of monoallelic expression which deserve further

513 investigation. Through both experimental data, and re-analyses of published data, we are able

514 to uncover the dynamics underlying the establishment of monoallelic VSG expression. Our

515 data provide a landscape to frame these future investigations into the establishment of single

516 antigen expression in trypanosomes. bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

517 Materials and methods

518

519 Tsetse fly infections and dissections

520 Tsetse flies (Glossina morsitans morsitans) were maintained at 27°C and 70%

521 hygrometry in Roubaud cages, in groups of 50 male flies per cage. Teneral flies were infected

522 with Trypanosoma brucei Antat1.1E (EATRO1125) during their first meal. Stumpy form

523 trypanosomes were resuspended at 106 cells per ml in SDM-79 (Brun and Schonenberger

524 1979) supplemented with 10% FBS (dominique Dutscher) and 60 mM N-acetylglucosamine

525 (Peacock et al., 2006). Flies were allowed to feed on infected media through a silicone

526 membrane. Following infection, flies were then maintained until dissection by feeding four

527 times per week on sheep’s blood in heparin. Flies were dissected as described previously

528 (Rotureau, Subota et al. 2012). All salivary glands were collected (uninfected and infected) in

529 trypanosome dilution buffer on ice and midguts scored for the presence of trypanosomes.

530 Salivary glands were manually broken using tweezers before being passed through a 70 µm

531 cell strainer (ClearLine) to remove the salivary glands. Trypanosomes were counted on a

532 hemocytometer and diluted to 9.4 x 104 cells per ml. The final concentration was reached by

533 adding 18 µL of OptiPrep™ Density Gradient Medium to 100 µL of cell suspension (final

534 concentration 15 % vol/vol) immediately prior to encapsulation. Cells were maintained on ice

535 throughout this process.

536

537 Microscopy

538 Live trypanosomes in SDM-79 (Brun and Schonenberger 1979) were spread on glass

539 slides and imaged on a DMR microscope (Leica) equipped with a EMCCD camera (C-9100,

540 Hamamatsu). The camera was controlled using µManager and a plugin for Hamamatsu

541 cameras (Edelstein, Tsuchida et al. 2014) and captured images were edited using Fiji is just

542 ImageJ (FIJI) (Schindelin, Arganda-Carreras et al. 2012). For the viability assay, cells were

543 scored for motility in a hemocytometer using an inverted phase contrast microscope.

544 bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

545 Cell Encapsulation

546 InDrop emulsions were prepared as described in Zilionis, Nainys et al. (2017). A

547 detailed protocol compatible with the microfluidic designs described by Zilionis, Nainys et al.

548 (2017) is available on (10.5281/zenodo.3974628). Monodisperse droplets were produced by

549 injecting the 3 aqueous phases consisting of the cells mix, RT and lysis mix and barcoded

550 beads mix, and a fluorinated oil phase containing fluorosurfactant into a microfluidic chip with

551 controlled flow rates, creating an emulsion. Polydimethylsiloxane (PDMS) chips were

552 fabricated as previously described (Mazutis, Gilbert et al. 2013) at the Institut Pierre-Gilles de

553 Gennes (IPGG) in Paris. After plasma bonding, chips were made hydrophobic by manual

554 injection of 2 % 1H,1H,2H,2H-perfluorodecyltrichlorosilane (Sigma-Aldrich) diluted in HFE

555 7500 (3M Novec).

556 Three aqueous phases, respectively for the cells, reverse transcriptase/lysis and BHMs

557 (1CellBio) were prepared as follows; the cell mix contained 100 µL of cells diluted at a

558 concentration of 9.4 x 104 cells per ml in trypanosome dilution buffer (TDB) (5 mM KCl, 80 mM

559 NaCl, 1 mM MgSO4, 20 mM Na2HPO4, 2 mM NaH2PO4, 20 mM glucose, pH 7.4) supplemented

560 with 18 µL Optiprep (1CellBio). The RT mix contains 1.3 RT premix (1CellBio), 11 mM MgCl2

561 (1CellBio), 6.9 mM DTT (Invitrogen), 1.13 U/µL SUPERase IN RNAse Inhibitor (Life

562 Technologies), 20.7 U/µL Superscript III (SSIII) (Invitrogen) and 1 µM DY-647 (Dyomics), the

563 beads mix contains BHMs (1CellBio) packed in 1x Gel Concentration Buffer (1CellBio). The

564 commercial BHMs carry photocleavable barcoded primers, containing, in order, a T7 promoter

565 sequence for IVT, an Illumina adaptor, a cellular barcode, a 6 nt UMI and a 18TVN primer site,

566 as described (Zilionis, Nainys et al. 2017). The closely packing of the BHBs allows 80-90%

567 loading of a single bead per droplet (Abate, Chen et al. 2009).

568 The 1 ml syringes (Omnifix) for the cells, reverse transcriptase/lysis and BHMs were

569 loaded with mineral oil (Sigma-Aldrich) and connected to a 200 µL pipette tip prefilled with

570 mineral oil using ~30 cm of polytetrafluoroethylene (PTFE) 0.56 mm tubing and a PDMS plug

571 connecting the tubing to the tip. Mineral oil was then ejected from the syringe until it entirely bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

572 filled the pipette tip prior to aspiration of each aqueous phase. Each reagent was aspirated on

573 ice into the 200 µL pipette tips filled with mineral oil, just before encapsulation. Cells were

574 maintained on ice throughout the procedure by placing the pipette tip holding the cells in ice.

575 A short length (15 cm) of PFTE 0.56 mm tubing then connected this tube to the microfluidic

576 chip. The RT/lysis and BHB pipette tips were directly inserted into the PDMS chip, whereas

577 the tubing carrying cells from the pipette on ice was inserted into the PDMS chip. The syringe

578 for the oil phase was filled with carrier oil (HFE-7500 fluorinated oil (3M Novec) containing 2%

579 w/w 008-FluoroSurfactant (RAN Biotechnologies)) and connected to the PDMS chip by PFTE

580 0.56 mm tubing.

581 Droplet production and beads loading were followed using an inverted microscope

582 (Nikon Eclipse Ti) and High-Speed Camera (Phantom). Flow rates for droplet production were

583 controlled with syringe pumps (neMESYS 290N Low pressure Syringe Pump, Cetoni). Droplets

584 size and frequency of production were monitored using laser optics excitation and detection

585 and a soluble fluorophore DY-647 present in the droplets (Dyomics) as described (Mazutis,

586 Gilbert et al. 2013). Flow rates of 250 µL/h, 250 µL/h, 60-80 µL/h and 400-500 µL/h for the

587 cells, RT/lysis and BHMs and oil, respectively were maintained. A 5 cm PTFE tubing (0.3 mm

588 inner diameter) connects the chip outlet to a 1.5 mL collection tube, prefilled with 300 µL of

589 mineral oil. Emulsions were collected during 15 minutes on ice, corresponding to

590 approximately 4,000 drops containing both a cell and and a BHM.

591 Generation and sequencing of barcoded cDNA libraries

592 We made two minor alterations to the molecular biology workflow described by Zilionis,

593 Nainys et al. (2017): Our protocol omits the RNA fragmentation step following IVT of double

594 stranded cDNA products, and we modified the primer used to reverse transcribe amplified RNA

595 (aRNA) to make our libraries compatible with standard Illumina sequencing primers. These

596 alterations are unlikely to affect the overall performance of inDrop, however.

597 1. Recovery of first strand cDNA

598 Barcoded cDNA libraries were prepared according to the protocol by Zilionis, Nainys et al.

599 (2017) with small exceptions. Following encapsulation and ultra violet primer release, bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

600 emulsions were incubated at 50°C for 2h and then 70°C for 15 minutes. Following reverse

601 transcription (RT), emulsions were partitioned into approximately 2,000 cells per partition

602 before breaking the emulsion by adding one drop (3µL) of 1H,1H,2H,2H-Perfluorooctanol

603 (Sigma-Aldrich) (Zilionis, Nainys et al. 2017). cDNA was then stored at -80°C until further

604 processed.

605 Post-RT material was thawed on ice and 20µL of dH2O (nuclease free) was added and

606 the solution centrifuged at 16,000 g for 5 minutes at 4°C. The aqueous phase was then applied

607 to a 0.45 µm filter spin column (Corning CoStar Spin-X, Sigma-Aldrich) to remove the BHMs.

608 The emulsion tube was then washed with 20µL dH2O to collect as much cDNA as possible

609 and applied to the spin column. The column was centrifuged at 16,000 g for 5 minutes at 4°C.

610 The emulsion tube was washed once more and supernatant collected by centrifugation once

611 more.

612 2. Removal of unused primers

613 The volume of cDNA was estimated and digested by adding 100 µL of digestion mix

614 (79 µL H2O, 9 µL 10x Fast Digest Buffer (Thermo Scientific), 5 µL ExoI (20 U/µL, NEB), 7 µL

615 HinFI (10U/µL, Thermo Scientific)) per 70 µL of cDNA and digested at 37°C for 30 minutes.

616 Samples were then purified by adding 1.2x volume of room temperature AMPure XP beads

617 (Beckman Coulter), incubating 5 minutes at room temperature, collecting the beads on a

618 magnet and then 3 washes with freshly prepared 80% ethanol (room temperature), followed

619 by air drying, according to the manufacturer’s instructions. cDNA was then eluted in 17 µL of

620 dH2O.

621 3. Second strand synthesis

622 Second strand synthesis (SSS) was performed using the NEBNext Ultra II Non-

623 Directional RNA Second Strand Synthesis Module (E6111S). 2 µL of SSS buffer and 1 µL of

624 SSS enzyme mix was added to 17 µL of cDNA and the reaction was incubated at 16°C for

625 2h30m in a PCR machine with the lid set to 16°C. cDNA was purified again using 1.2x AMPure

626 XP beads (as above) and eluted in 7µL of DNA elution buffer (Zilionis, Nainys et al. 2017).

627 4. In vitro transcription and reverse transcription of amplified RNA (aRNA) bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

628 Double stranded cDNA was then amplified using the HiScribe T7 Quick High Yield RNA

629 Synthesis Kit (E2050S). To the 8µL of double stranded cDNA, 10 µL of 10x NTPs buffer mix,

630 2 µL of T7 enzyme and 1 µL of SUPERase IN RNAse Inibitor (20 U/µL). were added and the

631 reaction incubated at 37°C with the lid set to 50°C for 14-16 hours. The size and concentration

632 of amplified RNA was verified using a BioAnalyzer RNA pico chip (Agilent) using the mRNA

633 pico program. Approximately 5 ng of aRNA was then reverse transcribed by adding 1 µL of 10

634 mM dNTPs, 2 µL of Rd2N6 (see primer list) and dH2O to 13 µL. The RT reaction was heated

635 to 70°C for 3 minutes and then put on ice for 1 minute. Finally, 4 µL of 5x first strand buffer, 1

636 µL of 0.1M DTT, 1 µL of SSIII (200 U/µL) and 1 µL of SUPERase IN RNase inhibitor (20U/µL).

637 RT reactions were incubated for 5 minutes at 25°C, followed by 1h at 50°C and heat inactivated

638 at 70°C for 15 minutes. cDNA was purified by adding 20 µL of dH2O then purified with 1.2 x

639 AMPure XP beads and eluted with 20 µL dH2O. 10 µL of this was then stored at -80°C as a

640 pre-PCR backup.

641 5. Library PCR.

642 The remaining 10 µL of cDNA was used for the library PCR. Libraries were PCR amplified

643 using KAPA HiFi HotStart ReadyMix (Kapa Biosystem, KK2601). 25 µL of PCR mix containing

644 12,5 µL of 2x KAPA HiFi Master mix, 10 µL of cDNA, together with 2.5 µL of 5 µM P5-Rd1 and

645 P7-Rd2 primers (final concentration of 500 nM). Library indices were in the P7-Rd2 primer.

646 Primer sequences are available in Supplementary table 4. The libraries were amplified using

647 the following protocol; initial denaturation 5 min at 95°C, then 2 cycles of denaturation 98°C /

648 20 s, annealing 55°C / 30s and extension 72°C / 40 s, followed by 10 cycles of denaturation

649 98°C / 20 s, annealing 65°C / 30 s and extension 72°C / 40 s and final extension 5 min at 72°C.

650 They were then size selected using 0.7x volume of AMPure XP beads at room temperature

651 prior to quantification and sequencing.

652 6. Library quantification and sequencing

653 Libraries were quantified using the KAPA library quantification kit for Illumina (Kapa

654 Biosystem, KK4824) according to the manufacturers’ protocol. Bulk libraries (Figure 1B) were

655 sequenced on the Illumina MiSeq platform with a MiSeq V3 150 cycle cartridge at the Institut bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

656 Curie. Single cell barcoding libraries were sequenced on the Illumina NextSeq platform at the

657 Institut Pasteur using a Mid-output 150 cycle cartridge. Sequencing of single cell libraries at

658 BGI was performed on the MGIseq-2000 following conversion to DNBseq libraries (Senabouth,

659 Andersen et al. 2020). Details of sequencing outputs including read lengths and number of

660 reads generated can be found in Supplementary table 5.

661

662 Trypanosoma brucei transcriptome

663 We assembled a non-redundant transcriptome from the T. brucei TREU927 reference

664 strain (Berriman, Ghedin et al. 2005). As only 60% of the trypanosome transcriptome has

665 annotated 3´ UTRs, we decided to extend each gene’s annotated 3´ extremity to the 5´

666 extremity of the neighboring gene. This was done by a combination of scripted extension using

667 the following rules. 1. If the proceeding gene is on the same strand, and within 2.5 kbp, the

668 gene is extended to the base before the proceeding gene. 2. If the proceeding gene is over

669 2.5 kbp then the gene is extended to 2.5 kbp downstream. 3. If the gene already extended into

670 the proceeding gene, then no change was made. 4. Similarly if the proceeding gene is on the

671 opposite strand, no extension was made. Following extension of the entire gene list, extended

672 transcripts were filtered using a non-redundant gene list (Alsford, Turner et al. 2011).

673 To obtain sequences of metacyclic VSG genes expressed in the salivary glands, we

674 aligned our data to the VSGnome (including flanks) of the Antat1.1E (EATRO1125) strain

675 made available by Cross (2017) here (http://129.85.245.250/index.html), using bowtie2

676 (Langmead and Salzberg 2012) and selected the top 8 gene most abundant transcripts. These

677 are Tbb1125VSG-1401 (Genbank: KX699657), Tbb1125VSG-1476 (Genbank: KX699716),

678 Tbb1125VSG-1654 (Genbank: KX699860), Tbb1125VSG-385 (Genbank: KX699226),

679 Tbb1125VSG-393 (Genbank: KX698711), Tbb1125VSG-4564 (Genbank: KX700928),

680 Tbb1125VSG-4862 (Genbank: KX701110), Tbb1125VSG-4959 (Genbank: KX701187).

681 To this non-redundant transcriptome, we appended both a human transcriptome and

682 the G. morsitans morsitans genome sequence (International Glossina Genome 2014). Our

683 inDrop library preparations included 5% human B-cells as a positive control for the inDrop bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

684 pipeline, and our bowtie2 index therefore contained the ENSEMBL release 85 human

685 transcriptome (Yates, Achuthan et al. 2020). The alignment index was constructed with

686 bowtie2-build (Langmead and Salzberg 2012). These files are available at

687 10.5281/zenodo.3974628.

688

689 Generation and analysis of bulk inDrop data

690 Cultured procyclic form Antat1.1E (EATRO1125) trypanosomes were encapsulated

691 using a standard inDrop microfluidic chip, except we replaced BHMs with a single primer,

692 T7Rd2_linkerPolyTvN (Supplementary table 4). Libraries were generated as described by

693 Zilionis, Nainys et al. (2017), and as described for single cell BHMs in this manuscript, except

694 we excluded stages relating to filtration of BHMs. Libraries were sequenced on a MiSeq at the

695 Institut Curie (Paris) next generation sequencing platform.

696 Bulk data were aligned to the trypanosome genome sequence using bowtie2

697 (Langmead and Salzberg 2012), and resulting bam files manipulated using samtools (Li,

698 Handsaker et al. 2009). Metagene plots and heat maps were generated using deeptools2

699 (Ramirez, Ryan et al. 2016). Gene coordinates used in the metaplot were the 5´and 3´splicing

700 and polyadenylation signal sites (or start and stop codons if unannotated for a particular gene).

701 Bin size was set to 10 bp and the 500 bp region downstream from each gene are unscaled.

702 The gene list was filtered to a non-redundant set (Alsford, Turner et al. 2011).

703

704 inDrop pre-processing pipeline

705 To permit more usability with bespoke genome sequences (no reference sequence is

706 available for the strain used here for tsetse infections), we generated a new inDrop data

707 analysis pipeline for the analysis of trypanosomes, using bowtie2 and UMI-tools (Langmead

708 and Salzberg 2012, Smith, Heger et al. 2017), rather than bowtie and the inDrop analysis

709 pipeline (Klein, Mazutis et al. 2015). Pre-processing of inDrop data was performed using the

710 UMI-tools package (Smith, Heger et al. 2017). A bash script is available at

711 (10.5281/zenodo.3974628) which performs all steps of the pre-processing. The script bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

712 executes 5 steps: 1. whitelist, 2. fastqsplitter, 3. extract, 4. align and 5. count. Whitelist

713 generates a list of acceptable barcodes, using a regular expression to pattern match the cell

714 barcode flanked by the common “W1” sequence (Klein, Mazutis et al. 2015, Zilionis, Nainys et

715 al. 2017) permitting upto 6 errors in the W1, and requiring a string of 3 “T” nucleotides after the

716 UMI. We set the error correction threshold for barcodes at 2, and set the expected number of

717 cells at 2,000. We subset 106 reads to perform the whitelist step (these same filtering

718 parameters are maintained through our workflow). Fastqsplitter breaks the read files into a

719 user-defined number of equal parts to allow downstream parallel processing. Extract then

720 places the whitelisted cell barcodes from read 1 into the header of mRNA sequences in read

721 2, using the regex pattern matching parameters from whitelist. The script can initiate multiple

722 instances of extract to reduce the time required for this stage. Align uses bowtie2 (Langmead

723 and Salzberg 2012) to align the reads to the hybrid trypanosome, human transcriptomes and

724 tsetse fly genome. Reads aligning to the tsetse genome were filtered out using samtools view

725 (Li, Handsaker et al. 2009). Finally count generates count matrices at mapping quality 0 and

726 40. Higher mapping stringency was used for VSG counts to ensure accurate mapping.

727

728 Analysis of single cell RNA-seq data

729 Single cell count matrices were analysed using the Seurat R package (Butler, Hoffman

730 et al. 2018, Stuart, Butler et al. 2019). Data were normalized using the single cell transform

731 algorithm (Hafemeister and Satija 2019). Complete R scripts are available in

732 (10.5281/zenodo.3974628) to generate all the figures in the manuscript. Count tables from

733 mapq 0 and mapq 40 matrices were merged into a single matrix by removing VSG counts with

734 mapq 0 and merging the mapq 40 counts to this reduced matrix. Valid barcodes were filtered

735 by a minimum gene cut-off of 300. VSG UMI count data were extracted using fetchData,

736 accessing the “counts” slot of the RNA assay.

737 To assess the sequencing depth, we employed data subsetting, as done previously by

738 Zhang, Li et al. (2019). Data sets were subset using samtools and UMI totals per barcode

739 extracted in R, retaining only valid barcodes (R script on 10.5281/zenodo.3974628). To bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

740 calculate the number of mapped reads per cell, a custom script was developed to count the

741 number of aligned reads per barcode (available on 10.5281/zenodo.3974628). Subset reads

742 were then calculated as a fraction of the total. To calculate the Levenshtein distance between

743 valid barcodes, the barcodes for replicate 1 were extracted and all possible pairs were

744 evaluated using a python script (10.5281/zenodo.3974628). Histogram and plotting were

745 performed in R. To estimate the expected rate of doublets a Poisson series was generated

746 corresponding to the predicted occupancy of cells in droplets. This was then compared to the

747 frequency of multi-VSG expression observed in the pre-metacyclic cell cluster. The metabolism

748 analysis was performed using metabolic funtion annotations and schematic for glycolysis and

749 TCA cycle from trypanocyc (Shameer, Logan-Klumpler et al. 2015). Scaled (Z-scores) are from

750 the SCT assay of Seurat in the “scaled.data” slot.

751

752 Single molecule RNA-FISH

753 Overexpression of RBP6 was performed using PT1RBP6-OE cells (a kind gift provided by

754 Lucy Glover, Institut Pasteur). PT1 cells are equivalent to the 2T1 BSF (Alsford, Kawahara et

755 al. 2005). Wild type Lister 427 PCF cells were transfected with pHD1313 (Alibu, Storm et al.

756 2005). Second, a ribosomal spacer was targeted with pRPH, integrating a hygromycin

757 resistance cassette and eGFP reporter gene (GFP tagged Sir2.3) with a RNA Pol-I promoter

758 under the control of a tet operator (Alsford, Kawahara et al. 2005). The RPH locus was then

759 disrupted with pH3E (Alsford, Kawahara et al. 2005), replacing hygromycin with a puromycin

760 resistance cassette, resulting in PT1 cells. Cells were induced by addition of 10 µg/ml of

761 tetracycline to the culture medium and diluting every 24h to 2 x 106 cells per ml. We made

762 staggered inductions to obtain 3, 4 and 5 day induced cultures on the same day. We used the

763 ViewRNA™ Cell Plus Assay Kit (Catalog number 88-19000-99, Thermo Fisher) for smRNA-

764 FISH. The company designed probes based on the VSG-397 and VSG-531 or alpha tubulin

765 coding sequences. We followed the manufacturer’s instructions with a few minor exceptions.

766 Before settling cells onto slides, we chilled cultures on ice and then fixed in ice cold 1 %

767 paraformaldehyde. Fixed cells were immediately washed in ice cold PBS by spinning 3x at bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

768 1,000 g for 10 minutes. Cells were settled onto 8-well glass chamber slides (Nunc™ Lab-Tek™

769 II Chamber Slide™ System) and stained according to the manufacturer’s protocol. A minimum

770 of 100 µL was used to avoid drying samples during incubations. Cells were imaged on a

771 widefield Zeiss Axio Imager.Z2 upright microscope equipped with an LED light source, and an

772 Axiocam 506 camera. Images were analysed using FIJI (Schindelin, Arganda-Carreras et al.

773 2012, Rueden, Schindelin et al. 2017).

774

775 Acknowledgements

776 We thank George Cross for making the Antat1.1E (EATRO1125) VSGnome available

777 to the research community. Artur Scherf and the Biomics platform at the Institut Pasteur

778 provided access to the Illumina NextSeq platform. Jessica Bryant and Sebastian Baumgarten

779 provided valuable assistance with Illumina sequencing, as well as many useful discussions,

780 and critical reading of the manuscript. Sophie Créno from the Institut Pasteur high performance

781 computing facility helped with computational workflows. Christelle Travaillé aided with tsetse

782 fly infections. We thank David Horn (University of Dundee) for valuable advice and discussions.

783 We thank Lucy Glover, Serge Bonnefoy, and Aline Araujo Alves for critical reading of the

784 manuscript. We are indebted to Eliane Thion and Lucy Glover (Institut Pasteur, Paris) for the

785 RBP6 strain, to Alena Zíková (Institute of Parasitology, České Budějovice) for advice on

786 inductions, and to Susanne Kramer (Universität Würzburg, Würzburg) for advice on smRNA-

787 FISH.

788

789 Funding

790 This project has received funding from the European Union’s Horizon 2020 research and

791 innovation programme under the Marie Skłodowska-Curie grant agreement No 794979. SH

792 was funded by a European Union Marie Skłodowska-Curie (No 794979) and an Institut Pasteur

793 Roux-Cantarini fellowship. Funding was provided by the Institut Pasteur, and the French

794 Government Investissement d’Avenir Programme—Laboratoire d’Excellence “Integrative

795 Biology of Emerging Infectious Diseases” (ANR-10-LABX-62-IBEID). ICGex NGS platform of bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

796 the Institut Curie supported by the grants ANR-10-EQPX-03 (Equipex) and ANR-10-INBS-09-

797 08 (France Génomique Consortium) from the Agence Nationale de la Recherche

798 ("Investissements d’Avenir" program), by the Canceropole Ile-de-France and by the SiRIC-

799 Curie program - SiRIC Grant INCa-DGOS- 4654. This work has received the support of the

800 “Institut Pierre-Gilles de Gennes” (IPGG) through the “Investissements d’avenir” programs

801 Equipex ANR-10- EQPX-34 and Labex ANR-10-LABX-31. R.M. was supported by the ANR

802 project "Cellectchip" ANR-14-CE10-0013.

803

804 Data Availability

805 All sequencing data generated in this study is available from the European nucleotide

806 archive, accession number PRJEB43345. All scripts and code used to generate the figures

807 are available from Zenodo 10.5281/zenodo.3974628.

808 bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

809 Figure legends

810 Figure 1. inDrop analysis of cultured bloodstream form and procyclic trypanosomes. A.

811 Schematic of the inDrop microfluidic chip, adapted from Klein, Mazutis et al. (2015). B. inDrop

812 performed in a microfluidic format, using a single primer to replace the BHMs (no single-cell

813 resolution). The plots show the reads per kilobase per million mapped (RPKM) for annotated

814 T. brucei transcripts aligned from their 5´splice site to the 3´polyadenylation sequence. Profile

815 plots (above) show the average RPKM across the ~7,500 non-redundant gene set for

816 trypanosomes. Heatmaps (below) show the RPKM for individual transcripts. C. UMAP

817 projection of single-cell barcoding data for a population of cells containing calculated

818 proportions of 47.5% for each BSF and PCF, and 5% Ramos cells. Measured cell proportions

819 are 51% BSF, 41% PCF and 8% Ramos. D. Violin plots depict the total number of UMIs and

820 genes captured per cell within BSF, PCF and Ramos cell clusters. E. UMAP projections from

821 C overlaid with colour scale of gene expression values for EP1 procyclin (Tb927.10.10260),

822 VSG-2 (Tb427.BES40.22) and Immunoglobulin lambda constant 3 (IGLC3).

823

824 Figure 2: InDrop barcoding of salivary gland T. brucei. A. UMAP projection of two technical

825 replicates (Lib1 and Lib2, red and blue, respectively). B. Merged UMAP projection of technical

826 replicates with clustering information as indicated. C. Gene expression levels for marker genes

827 in the salivary glands, EP1 procyclin (midgut stages) (Roditi 1987 nature): Tb927.10.10260,

828 BARP (epimastigote stage): Tb927.9.15640 (Urwyler, Studer et al. 2007), VSG-393:

829 (Genbank) KC612418.1 (Cross 2017), RBP6: Tb927.3.2930 (Kolev, Ramey-Butler et al. 2012),

830 Calflagin: Tb927.8.5440 (Rotureau, Subota et al. 2012), HAP2: Tb927.10.10770 (Fedry, Liu et

831 al. 2017).

832

833 Figure 3: Cluster-based analysis of main energy metabolism transcripts. A. The schematic

834 shows the glycolytic and TCA pathways (Shameer, Logan-Klumpler et al. 2015). Compounds

835 (black text) and enzymes (red text for glycolysis, blue text for TCA cycle) are shown and

836 reactions are represented by arrows. B. Plot shows the expression Z-scores for each glycolysis bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

837 or TCA cycle enzyme transcript retained after SCT normalization (Hafemeister and Satija

838 2019). Thicker lines show the average across each cohort. AE: attached epimastigote cells;

839 G: Gamete cells; Pre-M: pre-metacyclic cells: M: metacyclic cells. C. Single cell level analysis

840 of metabolic transcripts. Cells are arranged by cluster (left to right) and relative transcript

841 expression shown per cell. Colours represent Z-transformed expression values. Cluster

842 abbreviations as for B. Abbreviations of metabolites: G-6-P: glucose-6-phosphate; F-6-P:

843 fructose-6-phosphate; F-1,6-BP: fructose-1,6-bisphosphate; DHA-P: dihydroxyacetone

844 phosphate; GA-3-P: glyceraldehyde-3-phosphate; 1,3-BP-GA: 1,3-bisphosphoglycerate; 3-P-

845 GA: 3-phosphoglycerate; 2-P-GA: 2-phosphoglycerate; P-EP: phospho-enol pyruvate.

846 Abbreviations of enzymes: HK1: hexokinase 1; PGI phosphoglucose isomerase; PFK:

847 phosphofructokinase; ALD: aldolase; TIM: triose-phosphate isomerase; GAPDH:

848 glyceraldehyde-3-phosphate dehydrogenase; PGK(C/A): phosphoglycerate kinase, PGAM:

849 phosphoglycerate mutase; ENO: enolase; PYK1: pyruvate kinase; CS: citrate synthase; ACO:

850 aconitase; IDH: isocitrate dehydrogenase; SCSa: succinyl coenzyme A synthetase; FHc,

851 fumarate hydratase, cytosolic; gMDH: glycosomal malate dehydrogenase; mMDH:

852 mitochondrial malate dehydrogenase.

853

854 Figure 4: Analysis of gamete cluster. A. Normalised expression values for BARP

855 (Tb927.9.15640) and HAP2 (Tb927.10.10770) overlaid on UMAP projections of attached

856 epimastigote and gamete cell clusters. B. Normalised expression values for canonical

857 histones: H1 Tb927.11.1800, H2A Tb927.7.2820, H2B Tb927.10.10460, H3 Tb927.1.2430 and

858 H4 Tb927.5.4170. C. Sub-clustering of cells in S-Phase. Clusters were manually re-annotated

859 using the Seurat function CellSelector. Two new clusters of cells expressing both core histones

860 and either BARP or HAP2 were created and are indicated in brown and red, respectively.

861

862 Figure 5: Developmental progression of VSG expression. A. VSG expression data for a

863 random subset of 50 pre-metacyclic cells, with a total VSG UMI count > 10. Data are raw

864 (unscaled) UMI counts. Each column represents a cell and each colour a different VSG. B. bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

865 Histogram showing the number of VSG expressed for all pre-metacyclic cells (per VSG UMI

866 count > 2, all cells in cluster). C. As for A except that the data are a subset of 50 metacyclic

867 cells with total VSG UMI counts >10. D. Histogram as for B except showing metacyclic VSG

868 transcript diversity (per VSG UMI count >2, all cells in cluster).

869

870 Figure 6. Single molecule RNA-FISH of in vitro metacyclogenesis. A. Panel of representative

871 images of VSG+ and VSG++ cells containing both VSG-397 and VSG-531 transcripts (i-iii) and

872 VSG+++ cells where a single VSG type is present (iv and v). B. Above, proportions of VSG-

873 397 and VSG-531 transcripts for VSG+, VSG++ and VSG+++ cells. Each cell is plotted as a

874 single column on the x-axis. An estimate of transcript abundance for VSG++ and VSG+++ cells

875 was used based on published estimates of VSG transcript abundance in BSF (Haanstra,

876 Stewart et al. 2008). Below, transcript abundance for VSG+ cells, and estimated transcript

877 abundance for VSG+ and VSG++ cells.

878

879 Figure 7. Model of the initiation of VSG monoallelic expression during metacyclogenesis. A.

880 Pre-metacyclic cells initiate a transcriptional race for monoallelic expression where a single

881 VSG gene must reach a threshold of transcriptional activity. B. In metacyclic cells, this

882 threshold has been reached for a single VSG that is active, while the transcription of the other

883 mVSGs is silenced.

884

885 Supplementary figure legends.

886

887 Figure 1-figure supplement 1. Schematic of the life cycle of T. brucei, adapted from Rotureau

888 and Van Den Abbeele (2013). The morphology and surface protein expression of

889 trypanosomes during differentiation is depicted in chronological order from left to right. Cell

890 types shown here are SL: slender bloodstream form; ST: stumpy; PCF: procyclic form; Epi:

891 epimastigote; AE: attached epimastigote; ET: epimastigote-trypomastigote dividing cell; pMT:

892 pre-metacyclic cell; MT: metacyclic. The nucleus is depicted as a white circle / oval and the bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

893 kinetoplast (mitochondrial DNA) as a yellow circle. In trypomastigote cells, the kinetoplast is

894 posterior to the nucleus, e.g. in PCF. In epimastigote cells, the kinetoplast is anterior to the

895 nucleus, e.g. in Epi. Cells are coloured according to the family of surface protein expressed.

896

897 Figure 1-figure supplement 2. Survival and RNA metabolism of trypanosomes expressing the

898 indicated fusion proteins in conditions mimicking the inDrop procedure. A. Trypanosomes

899 collected by centrifugation and resuspended in either SDM-79 medium, or PBS and incubated

900 at either 0°C or 27°C. Scale bar 10 µm. B. Results of live/dead assay based on trypanosome

901 motility, by scoring live cells subjected to phase contrast microscopy. We assessed the ability

902 of the parasites to survive starvation at 0°C for 2h in PBS. C. Histogram shows the percentage

903 of stringently aligned reads (mapQ > 40) to the trypanosome or human transcriptomes.

904

905 Figure 2-figure supplement 1. Details of tsetse fly infections. Flies were infected for 28 days.

906 The table shows the characteristics of the dissection. Below is the workflow for dissections and

907 sample collection.

908

909 Figure 2-figure supplement 2. Library construction, basic statistics and subsetting analysis. A.

910 Tapestation or Bioanalyzer traces for IVT and library PCR reactions to construct libraries

911 sequenced in this study. B. Table shows the number of cells recovered, and key parameters

912 including UMIs and genes captured. C. Subsetting analysis of the data. Reads were

913 subsampled using samtools (Li, Handsaker et al. 2009) in 10% intervals from 10% to 100%

914 before being re-counted using UMI-tools (Smith, Heger et al. 2017). Aligned reads per cell

915 were counted using a custom script (10.5281/zenodo.3974628).

916

917 Figure 2-figure supplement 3. Analysis of possible technical artefacts. A. Ambient RNA

918 analysis. Plot shows the frequency of VSG UMI detection in a cluster of Ramos cells. B.

919 Histogram shows the percentage of stringently aligned reads (mapQ > 40) to the trypanosome

920 or human transcriptomes. C. Barcode switching analysis. The plot shows the frequency of bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

921 Levenshtein distances (number of changes required to permute one barcode to another)

922 between all pairs of barcodes in replicate 1 library 1 (1,129 barcodes). Red line shows cut-off

923 for error correction.

924

925 Figure 3-figure supplement 1. Analysis of metabolic enzymes in salivary gland clusters. As in

926 Figure 3B except each gene is plotted individually.

927

928 Figure 5-figure supplement 1. VSG expression analysis for attached epimastigote, pre

929 metacyclic and metacyclic clusters. Data presented have not been filtered (no minimum UMI

930 cutoff per cell) and consist of 50 random cells. Units are raw (non-normalised and unscaled)

931 UMI counts per cell. All VSG counts use a stringent mapping quality (MapQ40).

932

933 Figure 5-figure supplement 2. Plot shows the number of VSG genes detected per barcode

934 versus the aligned reads per barcode stratified by cluster. Boxplots show bounds are 25th 50th

935 and 75th percentiles. Individual barcodes are plotted as dots.

936

937 Figure 5-figure supplement 3. mVSG expression analysis of data from Vigneron, O'Neill et al.

938 (2020). UMAP projection of scRNA-seq data indicating the classifications of cells. B.

939 Normalised expression values overlaid on UMAP projections for VSG (IlTat1.22 AJ012198.3),

940 BARP (Tb927.9.15640), HAP2 (Tb927.10.10770) and EP1-Procyclin (Tb927.10.10260). C.

941 VSG expression profiles for pre-metacyclic (above) and metacyclic cells (below). Left, VSG

942 expression data for all pre-metacyclic cells with a total VSG UMI count > 10 and a random

943 subset of 50 metacyclic cells with a total VSG UMI count > 10. Data are raw (unscaled) UMI

944 counts. Each column represents a cell and each colour a different VSG. Right, histogram

945 shows the number of VSG expressed for all pre-metacyclic or metacyclic cells (per VSG UMI

946 count > 2, all cells in cluster).

947 bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

948 References

949 950 Abate, A. R., C. H. Chen, J. J. Agresti and D. A. Weitz (2009). "Beating Poisson 951 encapsulation statistics using close-packed ordering." Lab Chip 9(18): 2628-2631.

952 Alibu, V. P., L. Storm, S. Haile, C. Clayton and D. Horn (2005). "A doubly inducible 953 system for RNA interference and rapid RNAi plasmid construction in Trypanosoma 954 brucei." Mol Biochem Parasitol 139(1): 75-82.

955 Alsford, S. and D. Horn (2012). "Cell-cycle-regulated control of VSG expression site 956 silencing by histones and histone chaperones ASF1A and CAF-1b in Trypanosoma 957 brucei." Nucleic Acids Res 40(20): 10150-10160.

958 Alsford, S., T. Kawahara, L. Glover and D. Horn (2005). "Tagging a T. brucei RRNA 959 locus improves stable transfection efficiency and circumvents inducible expression 960 position effects." Mol Biochem Parasitol 144(2): 142-148.

961 Alsford, S., D. J. Turner, S. O. Obado, A. Sanchez-Flores, L. Glover, M. Berriman, C. 962 Hertz-Fowler and D. Horn (2011). "High-throughput phenotyping using parallel 963 sequencing of RNA interference targets in the African trypanosome." Genome Res 964 21(6): 915-924.

965 Aresta-Branco, F., S. Pimenta and L. M. Figueiredo (2016). "A transcription- 966 independent epigenetic mechanism is associated with antigenic switching in 967 Trypanosoma brucei." Nucleic Acids Res 44(7): 3131-3146.

968 Batram, C., N. G. Jones, C. J. Janzen, S. M. Markert and M. Engstler (2014). 969 "Expression site attenuation mechanistically links antigenic variation and development 970 in Trypanosoma brucei." Elife 3: e02324.

971 Berriman, M., E. Ghedin, C. Hertz-Fowler, G. Blandin, H. Renauld, D. C. Bartholomeu, 972 N. J. Lennard, E. Caler, N. E. Hamlin, B. Haas, U. Bohme, L. Hannick, M. A. Aslett, J. 973 Shallom, L. Marcello, L. Hou, B. Wickstead, U. C. Alsmark, C. Arrowsmith, R. J. Atkin, 974 A. J. Barron, F. Bringaud, K. Brooks, M. Carrington, I. Cherevach, T. J. Chillingworth, 975 C. Churcher, L. N. Clark, C. H. Corton, A. Cronin, R. M. Davies, J. Doggett, A. Djikeng, 976 T. Feldblyum, M. C. Field, A. Fraser, I. Goodhead, Z. Hance, D. Harper, B. R. Harris, 977 H. Hauser, J. Hostetler, A. Ivens, K. Jagels, D. Johnson, J. Johnson, K. Jones, A. X. 978 Kerhornou, H. Koo, N. Larke, S. Landfear, C. Larkin, V. Leech, A. Line, A. Lord, A. 979 Macleod, P. J. Mooney, S. Moule, D. M. Martin, G. W. Morgan, K. Mungall, H. 980 Norbertczak, D. Ormond, G. Pai, C. S. Peacock, J. Peterson, M. A. Quail, E. 981 Rabbinowitsch, M. A. Rajandream, C. Reitter, S. L. Salzberg, M. Sanders, S. Schobel, 982 S. Sharp, M. Simmonds, A. J. Simpson, L. Tallon, C. M. Turner, A. Tait, A. R. Tivey, 983 S. Van Aken, D. Walker, D. Wanless, S. Wang, B. White, O. White, S. Whitehead, J. 984 Woodward, J. Wortman, M. D. Adams, T. M. Embley, K. Gull, E. Ullu, J. D. Barry, A. bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

985 H. Fairlamb, F. Opperdoes, B. G. Barrell, J. E. Donelson, N. Hall, C. M. Fraser, S. E. 986 Melville and N. M. El-Sayed (2005). "The genome of the African trypanosome 987 Trypanosoma brucei." Science 309(5733): 416-422.

988 Brun, R. and Schonenberger (1979). "Cultivation and in vitro cloning or procyclic 989 culture forms of Trypanosoma brucei in a semi-defined medium. Short 990 communication." Acta Trop 36(3): 289-292.

991 Butler, A., P. Hoffman, P. Smibert, E. Papalexi and R. Satija (2018). "Integrating single- 992 cell transcriptomic data across different conditions, technologies, and species." Nat 993 Biotechnol 36(5): 411-420.

994 Cross, G. A. (1975). "Identification, purification and properties of clone-specific 995 glycoprotein antigens constituting the surface coat of Trypanosoma brucei." 996 Parasitology 71(3): 393-417.

997 Cross, G. A. (2017). "http://129.85.245.250/index.html."

998 Dalton, R. P., D. B. Lyons and S. Lomvardas (2013). "Co-opting the unfolded protein 999 response to elicit olfactory receptor feedback." Cell 155(2): 321-332.

1000 Dolezelova, E., M. Kunzova, M. Dejung, M. Levin, B. Panicucci, C. Regnault, C. J. 1001 Janzen, M. P. Barrett, F. Butter and A. Zikova (2020). "Cell-based and multi-omics 1002 profiling reveals dynamic metabolic repurposing of mitochondria to drive 1003 developmental progression of Trypanosoma brucei." PLoS Biol 18(6): e3000741.

1004 Duraisingh, M. T. and D. Horn (2016). "Epigenetic Regulation of Virulence Gene 1005 Expression in Parasitic Protozoa." Cell Host Microbe 19(5): 629-640.

1006 Edelstein, A. D., M. A. Tsuchida, N. Amodaj, H. Pinkard, R. D. Vale and N. Stuurman 1007 (2014). "Advanced methods of microscope control using muManager software." J Biol 1008 Methods 1(2).

1009 Ersfeld, K., R. Docherty, S. Alsford and K. Gull (1996). "A fluorescence in situ 1010 hybridisation study of the regulation of histone mRNA levels during the cell cycle of 1011 Trypanosoma brucei." Mol Biochem Parasitol 81(2): 201-209.

1012 Faria, J., L. Glover, S. Hutchinson, C. Boehm, M. C. Field and D. Horn (2019). 1013 "Monoallelic expression and epigenetic inheritance sustained by a Trypanosoma 1014 brucei variant surface glycoprotein exclusion complex." Nat Commun 10(1): 3023. bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1015 Faria, J., V. Luzak, L. S. M. Muller, B. G. Brink, S. Hutchinson, L. Glover, D. Horn and 1016 T. N. Siegel (2021). "Spatial integration of transcription and splicing in a dedicated 1017 compartment sustains monogenic antigen expression in African trypanosomes." Nat 1018 Microbiol.

1019 Fedry, J., Y. Liu, G. Pehau-Arnaudet, J. Pei, W. Li, M. A. Tortorici, F. Traincard, A. 1020 Meola, G. Bricogne, N. V. Grishin, W. J. Snell, F. A. Rey and T. Krey (2017). "The 1021 Ancient Gamete Fusogen HAP2 Is a Eukaryotic Class II Fusion Protein." Cell 168(5): 1022 904-915 e910.

1023 Figueiredo, L. M., C. J. Janzen and G. A. Cross (2008). "A histone methyltransferase 1024 modulates antigenic variation in African trypanosomes." PLoS Biol 6(7): e161.

1025 Glover, L., S. Hutchinson, S. Alsford and D. Horn (2016). "VEX1 controls the allelic 1026 exclusion required for antigenic variation in trypanosomes." Proc Natl Acad Sci U S A 1027 113(26): 7225-7230.

1028 Grabherr, M. G., B. J. Haas, M. Yassour, J. Z. Levin, D. A. Thompson, I. Amit, X. 1029 Adiconis, L. Fan, R. Raychowdhury, Q. Zeng, Z. Chen, E. Mauceli, N. Hacohen, A. 1030 Gnirke, N. Rhind, F. di Palma, B. W. Birren, C. Nusbaum, K. Lindblad-Toh, N. Friedman 1031 and A. Regev (2011). "Full-length transcriptome assembly from RNA-Seq data without 1032 a reference genome." Nat Biotechnol 29(7): 644-652.

1033 Gunzl, A., T. Bruderer, G. Laufer, B. Schimanski, L. C. Tu, H. M. Chung, P. T. Lee and 1034 M. G. Lee (2003). "RNA polymerase I transcribes procyclin genes and variant surface 1035 glycoprotein gene expression sites in Trypanosoma brucei." Eukaryot Cell 2(3): 542- 1036 551.

1037 Haanstra, J. R., M. Stewart, V. D. Luu, A. van Tuijl, H. V. Westerhoff, C. Clayton and 1038 B. M. Bakker (2008). "Control and regulation of gene expression: quantitative analysis 1039 of the expression of phosphoglycerate kinase in bloodstream form Trypanosoma 1040 brucei." J Biol Chem 283(5): 2495-2507.

1041 Hafemeister, C. and R. Satija (2019). "Normalization and variance stabilization of 1042 single-cell RNA-seq data using regularized negative binomial regression." Genome 1043 Biol 20(1): 296.

1044 Hanchate, N. K., K. Kondoh, Z. Lu, D. Kuang, X. Ye, X. Qiu, L. Pachter, C. Trapnell 1045 and L. B. Buck (2015). "Single-cell transcriptomics reveals receptor transformations 1046 during olfactory neurogenesis." Science 350(6265): 1251-1255.

1047 Hertz-Fowler, C., L. M. Figueiredo, M. A. Quail, M. Becker, A. Jackson, N. Bason, K. 1048 Brooks, C. Churcher, S. Fahkro, I. Goodhead, P. Heath, M. Kartvelishvili, K. Mungall, 1049 D. Harris, H. Hauser, M. Sanders, D. Saunders, K. Seeger, S. Sharp, J. E. Taylor, D. bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1050 Walker, B. White, R. Young, G. A. Cross, G. Rudenko, J. D. Barry, E. J. Louis and M. 1051 Berriman (2008). "Telomeric expression sites are highly conserved in Trypanosoma 1052 brucei." PLoS One 3(10): e3527.

1053 Hoare, C. A. and F. G. Wallace (1966). "Developmental Stages of Trypanosomatid 1054 Flagellates: a New Terminology." Nature 212(5068): 1385-1386.

1055 Horn, D. (2014). "Antigenic variation in African trypanosomes." Mol Biochem Parasitol 1056 195(2): 123-129.

1057 International Glossina Genome, I. (2014). "Genome sequence of the tsetse fly 1058 (Glossina morsitans): vector of African trypanosomiasis." Science 344(6182): 380-386.

1059 Islam, S., A. Zeisel, S. Joost, G. La Manno, P. Zajac, M. Kasper, P. Lonnerberg and 1060 S. Linnarsson (2014). "Quantitative single-cell RNA-seq with unique molecular 1061 identifiers." Nat Methods 11(2): 163-166.

1062 Janzen, C. J., S. B. Hake, J. E. Lowell and G. A. Cross (2006). "Selective di- or 1063 trimethylation of histone H3 lysine 76 by two DOT1 homologs is important for cell cycle 1064 regulation in Trypanosoma brucei." Mol Cell 23(4): 497-507.

1065 Kassem, A., E. Pays and L. Vanhamme (2014). "Transcription is initiated on silent 1066 variant surface glycoprotein expression sites despite monoallelic expression in 1067 Trypanosoma brucei." Proc Natl Acad Sci U S A 111(24): 8943-8948.

1068 Kerry, L. E., E. E. Pegg, D. P. Cameron, J. Budzak, G. Poortinga, K. M. Hannan, R. D. 1069 Hannan and G. Rudenko (2017). "Selective inhibition of RNA polymerase I 1070 transcription as a potential approach to treat African trypanosomiasis." PLoS Negl Trop 1071 Dis 11(3): e0005432.

1072 Klein, A. M., L. Mazutis, I. Akartuna, N. Tallapragada, A. Veres, V. Li, L. Peshkin, D. 1073 A. Weitz and M. W. Kirschner (2015). "Droplet barcoding for single-cell transcriptomics 1074 applied to embryonic stem cells." Cell 161(5): 1187-1201.

1075 Klein, G., B. Giovanella, A. Westman, J. S. Stehlin and D. Mumford (1975). "An EBV- 1076 genome-negative cell line established from an American Burkitt lymphoma; receptor 1077 characteristics. EBV infectibility and permanent conversion into EBV-positive sublines 1078 by in vitro infection." Intervirology 5(6): 319-334.

1079 Kolev, N. G., K. Ramey-Butler, G. A. Cross, E. Ullu and C. Tschudi (2012). 1080 "Developmental progression to infectivity in Trypanosoma brucei triggered by an RNA- 1081 binding protein." Science 338(6112): 1352-1353. bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1082 Kolev, N. G., E. Ullu and C. Tschudi (2014). "The emerging role of RNA-binding 1083 proteins in the life cycle of Trypanosoma brucei." Cell Microbiol 16(4): 482-489.

1084 L McInnes, J Healy and J. Melville (2018). "Umap: Uniform manifold approximation 1085 and projection for dimension reduction." arxiv arXiv:1802.03426.

1086 Langmead, B. and S. L. Salzberg (2012). "Fast gapped-read alignment with Bowtie 2." 1087 Nature Methods 9(4): 357-359.

1088 Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. 1089 Abecasis, R. Durbin and G. P. D. P. Subgroup (2009). "The Sequence Alignment/Map 1090 format and SAMtools." Bioinformatics 25(16): 2078-2079.

1091 Lyons, D. B., W. E. Allen, T. Goh, L. Tsai, G. Barnea and S. Lomvardas (2013). "An 1092 epigenetic trap stabilizes singular olfactory receptor expression." Cell 154(2): 325-336.

1093 Magklara, A., A. Yen, B. M. Colquitt, E. J. Clowney, W. Allen, E. Markenscoff- 1094 Papadimitriou, Z. A. Evans, P. Kheradpour, G. Mountoufaris, C. Carey, G. Barnea, M. 1095 Kellis and S. Lomvardas (2011). "An epigenetic signature for monoallelic olfactory 1096 receptor expression." Cell 145(4): 555-570.

1097 Mazutis, L., J. Gilbert, W. L. Ung, D. A. Weitz, A. D. Griffiths and J. A. Heyman (2013). 1098 "Single-cell analysis and sorting using droplet-based microfluidics." Nat Protoc 8(5): 1099 870-891.

1100 Mosser, D. M. and J. F. Roberts (1982). "Trypanosoma brucei: recognition in vitro of 1101 two developmental forms by murine macrophages." Exp Parasitol 54(3): 310-316.

1102 Muller, L. S. M., R. O. Cosentino, K. U. Forstner, J. Guizetti, C. Wedel, N. Kaplan, C. 1103 J. Janzen, P. Arampatzi, J. Vogel, S. Steinbiss, T. D. Otto, A. E. Saliba, R. P. Sebra 1104 and T. N. Siegel (2018). "Genome organization and DNA accessibility control antigenic 1105 variation in trypanosomes." Nature 563(7729): 121-125.

1106 Navarro, M. and K. Gull (2001). "A pol I transcriptional body associated with VSG 1107 mono-allelic expression in Trypanosoma brucei." Nature 414(6865): 759-763.

1108 Peacock, L., M. Bailey, M. Carrington and W. Gibson (2014). "Meiosis and haploid 1109 gametes in the pathogen Trypanosoma brucei." Curr Biol 24(2): 181-186.

1110 Peacock, L., V. Ferris, R. Sharma, J. Sunter, M. Bailey, M. Carrington and W. Gibson 1111 (2011). "Identification of the meiotic life cycle stage of Trypanosoma brucei in the tsetse 1112 fly." Proc Natl Acad Sci U S A 108(9): 3671-3676. bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1113 Pinger, J., S. Chowdhury and F. N. Papavasiliou (2017). "Variant surface glycoprotein 1114 density defines an immune evasion threshold for African trypanosomes undergoing 1115 antigenic variation." Nature Communications 8(1): 828.

1116 Ramey-Butler, K., E. Ullu, N. G. Kolev and C. Tschudi (2015). "Synchronous 1117 expression of individual metacyclic variant surface glycoprotein genes in Trypanosoma 1118 brucei." Mol Biochem Parasitol 200(1-2): 1-4.

1119 Ramirez, F., D. P. Ryan, B. Gruning, V. Bhardwaj, F. Kilpert, A. S. Richter, S. Heyne, 1120 F. Dundar and T. Manke (2016). "deepTools2: a next generation web server for deep- 1121 sequencing data analysis." Nucleic Acids Res 44(W1): W160-165.

1122 Roditi, I., M. Carrington and M. Turner (1987). "Expression of a polypeptide containing 1123 a dipeptide repeat is confined to the insect stage of Trypanosoma brucei." Nature 1124 325(6101): 272-274.

1125 Roditi, I., H. Schwarz, T. W. Pearson, R. P. Beecroft, M. K. Liu, J. P. Richardson, H. J. 1126 Buhring, J. Pleiss, R. Bulow, R. O. Williams and et al. (1989). "Procyclin gene 1127 expression and loss of the variant surface glycoprotein during differentiation of 1128 Trypanosoma brucei." J Cell Biol 108(2): 737-746.

1129 Rotureau, B., I. Subota, J. Buisson and P. Bastin (2012). "A new asymmetric division 1130 contributes to the continuous production of infective trypanosomes in the tsetse fly." 1131 Development 139(10): 1842-1850.

1132 Rotureau, B. and J. Van Den Abbeele (2013). "Through the dark continent: African 1133 trypanosome development in the tsetse fly." Front Cell Infect Microbiol 3: 53.

1134 Rueden, C. T., J. Schindelin, M. C. Hiner, B. E. DeZonia, A. E. Walter, E. T. Arena and 1135 K. W. Eliceiri (2017). "ImageJ2: ImageJ for the next generation of scientific image 1136 data." BMC Bioinformatics 18(1): 529.

1137 Savage, A. F., N. G. Kolev, J. B. Franklin, A. Vigneron, S. Aksoy and C. Tschudi (2016). 1138 "Transcriptome Profiling of Trypanosoma brucei Development in the Tsetse Fly Vector 1139 Glossina morsitans." PLoS One 11(12): e0168877.

1140 Schindelin, J., I. Arganda-Carreras, E. Frise, V. Kaynig, M. Longair, T. Pietzsch, S. 1141 Preibisch, C. Rueden, S. Saalfeld, B. Schmid, J. Y. Tinevez, D. J. White, V. 1142 Hartenstein, K. Eliceiri, P. Tomancak and A. Cardona (2012). "Fiji: an open-source 1143 platform for biological-image analysis." Nat Methods 9(7): 676-682.

1144 Senabouth, A., S. Andersen, Q. Shi, L. Shi, F. Jiang, W. Zhang, K. Wing, M. 1145 Daniszewski, S. W. Lukowski, S. S. C. Hung, Q. Nguyen, L. Fink, A. Beckhouse, A. bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1146 Pébay, A. W. Hewitt and J. E. Powell (2020). "Comparative performance of the BGI 1147 and Illumina sequencing technology for single-cell RNA-sequencing." NAR Genomics 1148 and Bioinformatics 2(2).

1149 Shameer, S., F. J. Logan-Klumpler, F. Vinson, L. Cottret, B. Merlet, F. Achcar, M. 1150 Boshart, M. Berriman, R. Breitling, F. Bringaud, P. Bütikofer, A. M. Cattanach, B. 1151 Bannerman-Chukualim, D. J. Creek, K. Crouch, H. P. de Koning, H. Denise, C. 1152 Ebikeme, A. H. Fairlamb, M. A. Ferguson, M. L. Ginger, C. Hertz-Fowler, E. J. 1153 Kerkhoven, P. Mäser, P. A. Michels, A. Nayak, D. W. Nes, D. P. Nolan, C. Olsen, F. 1154 Silva-Franco, T. K. Smith, M. C. Taylor, A. G. Tielens, M. D. Urbaniak, J. J. van 1155 Hellemond, I. M. Vincent, S. R. Wilkinson, S. Wyllie, F. R. Opperdoes, M. P. Barrett 1156 and F. Jourdan (2015). "TrypanoCyc: a community-led biochemical pathways 1157 database for Trypanosoma brucei." Nucleic Acids Res 43(Database issue): D637-644.

1158 Sharma, R., L. Peacock, E. Gluenz, K. Gull, W. Gibson and M. Carrington (2008). 1159 "Asymmetric cell division as a route to reduction in cell length and change in cell 1160 morphology in trypanosomes." Protist 159(1): 137-151.

1161 Siegel, T. N., D. R. Hekstra, X. Wang, S. Dewell and G. A. Cross (2010). "Genome- 1162 wide analysis of mRNA abundance in two life-cycle stages of Trypanosoma brucei and 1163 identification of splicing and polyadenylation sites." Nucleic Acids Res 38(15): 4946- 1164 4957.

1165 Smith, T., A. Heger and I. Sudbery (2017). "UMI-tools: modeling sequencing errors in 1166 Unique Molecular Identifiers to improve quantification accuracy." Genome Res 27(3): 1167 491-499.

1168 Smith, T. K., F. Bringaud, D. P. Nolan and L. M. Figueiredo (2017). "Metabolic 1169 reprogramming during the Trypanosoma brucei life cycle." F1000Res 6.

1170 Stuart, T., A. Butler, P. Hoffman, C. Hafemeister, E. Papalexi, W. M. Mauck, 3rd, Y. 1171 Hao, M. Stoeckius, P. Smibert and R. Satija (2019). "Comprehensive Integration of 1172 Single-Cell Data." Cell 177(7): 1888-1902 e1821.

1173 Subota, I., B. Rotureau, T. Blisnick, S. Ngwabyt, M. Durand-Dubief, M. Engstler and P. 1174 Bastin (2011). "ALBA proteins are stage regulated during trypanosome development 1175 in the tsetse fly and participate in differentiation." Mol Biol Cell 22(22): 4205-4219.

1176 Tetley, L., C. M. Turner, J. D. Barry, J. S. Crowe and K. Vickerman (1987). "Onset of 1177 expression of the variant surface glycoproteins of Trypanosoma brucei in the tsetse fly 1178 studied using immunoelectron microscopy." Journal of Cell Science 87(2): 363-372. bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1179 Tetley, L. and K. Vickerman (1985). "Differentiation in Trypanosoma brucei: host- 1180 parasite cell junctions and their persistence during acquisition of the variable antigen 1181 coat." J Cell Sci 74: 1-19.

1182 Trenaman, A., L. Glover, S. Hutchinson and D. Horn (2019). "A post-transcriptional 1183 respiratome regulon in trypanosomes." Nucleic Acids Res 47(13): 7063-7077.

1184 Urwyler, S., E. Studer, C. K. Renggli and I. Roditi (2007). "A family of stage-specific 1185 alanine-rich proteins on the surface of epimastigote forms of Trypanosoma brucei." 1186 Mol Microbiol 63(1): 218-228.

1187 Van Den Abbeele, J., Y. Claes, D. van Bockstaele, D. Le Ray and M. Coosemans 1188 (1999). "Trypanosoma brucei spp. development in the tsetse fly: characterization of 1189 the post-mesocyclic stages in the foregut and proboscis." Parasitology 118 ( Pt 5): 1190 469-478.

1191 Vigneron, A., M. B. O'Neill, B. L. Weiss, A. F. Savage, O. C. Campbell, S. Kamhawi, 1192 J. G. Valenzuela and S. Aksoy (2020). "Single-cell RNA sequencing of Trypanosoma 1193 brucei from tsetse salivary glands unveils metacyclogenesis and identifies potential 1194 transmission blocking antigens." Proc Natl Acad Sci U S A 117(5): 2613-2621.

1195 Yang, X., L. M. Figueiredo, A. Espinal, E. Okubo and B. Li (2009). "RAP1 is essential 1196 for silencing telomeric variant surface glycoprotein genes in Trypanosoma brucei." Cell 1197 137(1): 99-109.

1198 Yates, A. D., P. Achuthan, W. Akanni, J. Allen, J. Allen, J. Alvarez-Jarreta, M. R. 1199 Amode, I. M. Armean, A. G. Azov, R. Bennett, J. Bhai, K. Billis, S. Boddu, J. C. 1200 Marugan, C. Cummins, C. Davidson, K. Dodiya, R. Fatima, A. Gall, C. G. Giron, L. Gil, 1201 T. Grego, L. Haggerty, E. Haskell, T. Hourlier, O. G. Izuogu, S. H. Janacek, T. 1202 Juettemann, M. Kay, I. Lavidas, T. Le, D. Lemos, J. G. Martinez, T. Maurel, M. 1203 McDowall, A. McMahon, S. Mohanan, B. Moore, M. Nuhn, D. N. Oheh, A. Parker, A. 1204 Parton, M. Patricio, M. P. Sakthivel, A. I. Abdul Salam, B. M. Schmitt, H. Schuilenburg, 1205 D. Sheppard, M. Sycheva, M. Szuba, K. Taylor, A. Thormann, G. Threadgold, A. Vullo, 1206 B. Walts, A. Winterbottom, A. Zadissa, M. Chakiachvili, B. Flint, A. Frankish, S. E. Hunt, 1207 I. I. G, M. Kostadima, N. Langridge, J. E. Loveland, F. J. Martin, J. Morales, J. M. 1208 Mudge, M. Muffato, E. Perry, M. Ruffier, S. J. Trevanion, F. Cunningham, K. L. Howe, 1209 D. R. Zerbino and P. Flicek (2020). "Ensembl 2020." Nucleic Acids Res 48(D1): D682- 1210 D688.

1211 Zhang, X., T. Li, F. Liu, Y. Chen, J. Yao, Z. Li, Y. Huang and J. Wang (2019). 1212 "Comparative Analysis of Droplet-Based Ultra-High-Throughput Single-Cell RNA-Seq 1213 Systems." Mol Cell 73(1): 130-142 e135. bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1214 Zilionis, R., J. Nainys, A. Veres, V. Savova, D. Zemmour, A. M. Klein and L. Mazutis 1215 (2017). "Single-cell barcoding and sequencing using droplet microfluidics." Nat Protoc 1216 12(1): 44-73.

1217 Zomerdijk, J. C., R. Kieft and P. Borst (1991). "Efficient production of functional mRNA 1218 mediated by RNA polymerase I in Trypanosoma brucei." Nature 353(6346): 772-775. 1219 Figure 1 A RT Lysis Mix Oil

Cells

Barcoding Oil bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (whichHydrogels was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

B Lib1 Lib2 Lib3 Lib4 1000

0 100 80 RPKM genes 60 40 20 0 5´ 3´ 5´ 3´ 5´ 3´ 5´ 3´ mRNA

C D UMI Genes 20,000 6,000 5 15,000

0 4,000 10,000 UMAP_2

-5 5,000 2,000 BSF PCF Ramos -10 0 -5 0 5 10 15 UMAP_1 BSF PCF BSF PCF Ramos Ramos Identity Identity

E EP1-Procyclin VSG-2 IGLC3

5 5 5 4 0 3 0 3 0 2.0 1.5 UMAP_2 UMAP_2 UMAP_2 2 2 -5 1 -5 1 -5 1.0

-10 -10 -10 -5 0 5 10 15 -5 0 5 10 15 -5 0 5 10 15 UMAP_1 UMAP_1 UMAP_1 bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

0

Figure 1-figure supplement 1

Host: Mammalian host Tsetse vector Compartment: Midgut Cardia Salivary Glands

Posterior

Morphology:

Anterior SL ST PCF Epi AE ET pMT MT

Surface coat: VSG Procyclins BARP VSG bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 1- figure supplement 2 A ALBA3::YFP B mCherry::DHH1 DNA 100 80 Untreated 60 40 % Motile 20 0

Untreated SDM-79 + 0ºC 2 h PBS + 0ºC 2 h

C

1000

PBS + 27ºC 2 h Trypanosome 100 100

75

50 # Cells 25

0 10 Human

PBS + 0ºC 2 h 1 0 25 50 75 100 % UMI Trypanosome bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 2

10 Lib1 Lib2 A B 10 Midgut Pre-metacyclic 5 5 Metacyclic 0 0 Gametes UMAP_2 UMAP_2 -5 -5 Attached Epimastigote -10 -5 0 5 10 -10 -5 0 5 10 -10 -5 0 5 10 UMAP_1 UMAP_1

C EP1 Procyclin BARP VSG-393 10 10 10 5 5 5 4 2.5 4 0 3 0 2.0 0 3 UMAP_2 2 UMAP_2 1.5 UMAP_2 2 -5 1 -5 1.0 -5 1

-10 -5 0 5 10 -10 -5 0 5 10 -10 -5 0 5 10 UMAP_1 UMAP_1 UMAP_1

RBP6 Calflagin HAP2 10 10 10 5 2.5 5 5 1.75 2.5 2.0 0 0 1.50 0 2.0

UMAP_2 1.5 UMAP_2 1.25 UMAP_2 1.5 -5 1.0 -5 1.00 -5 1.0

-10 -5 0 5 10 -10 -5 0 5 10 -10 -5 0 5 10 UMAP_1 UMAP_1 UMAP_1 bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 2 - figure supplement 1 28 days Infect Dissect

Total Died # Dissected # infected % infected Total cells recovered (pooled salivary (midgut) (midgut) glands) 100 28 72 16 22 46,000

Manually break salivary glands 70 µm cell strainer Wash and count inDrop Pipeline bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 2 - figure supplement 2 A

300 250 200 150 IVT 100 50 0 25 500 4,000 Library 1 1200 1000 800 600 Fluorescence Units Sequenced 400 200 library 0 25 50 100 200 300 400 500 700 1000 1500

30

20 IVT 10

0 Library 2 25 500 4,000

300 Sequenced 200 library Fluorescence Units 100

0 35 50 100 150 200 300 400 500 600 700 1000 2000 3000 7000 10380

B UMI/mRNA Genes Library #Cells Mean Median Mean Median Library1 1129 793 677 567 514 Library2 1150 1122 952 729 678 Combined 2279 959 795 649 585

C 1200

1000

800 Library1 600 Library2 Median UMI 400

200

0 0 5,000 10,000 15,000 20,000 25,000 30,000 Median aligned reads per cell bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 2 - figure supplement 3 A B

1.00

1000

0.75 Trypanosome VSG 100 VSG-393 100 VSG-4862 75 VSG-4564 50 0.50 VSG-1654 # Cells Frequency VSG-4959 25 VSG-385 0 VSG-1401 10 VSG-1476 Human

0.25

1

0.00 0 25 50 75 100 % UMI Trypanosome 0 1 2 Number of UMI per cell C 0.25

0.20

0.15

0.10 Frequency

0.05

0.00

0 5 10 15 Levenshtein distance Figure 3

Glucose A HK1 B 1.0 C AE GMG Pre-M M G-6-P F-6-P PGI HK1 PFK 0.5 PGI DHA-P F-1,6-BP Z-Score TIM ALD PFK GA-3-P 0.0 ALD GAPDH TIM 1,3-BP-GA -0.5 PGKC / GAPDH PGKA 3-P-GA PGKC PGAM AE G Pre-M M PGKA Expression 2-P-GA 1.5 TCA ENO Glycolysis PGAM 1.0 Average Average 0.5 P-EP ENO HK1 CS 0.0 PYK1 -0.5 PGI ACO PYK1 -1.0 bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049pyruvate ; this version posted March 1, 2021. The copyright holderPFK for this preprint IDH -1.5 (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. CS ALD SCSα gMDH acetyl-CoA FHc ACO mMDH oxalo- TIM acetate CS GAPDH gMDH IDH (S)-malate citrate PGKC mMDH FHc ACO SCSα fumarate cis-aconitate PGKA ACO PGAM FHc succinate D-threo-isocitrate ENO gMDH SCSα IDH 2-oxoglutarate PYK1 succinyl-CoA mMDH bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 3 supplement 1

Tb927.10.2010 Tb927.1.3830 Tb927.3.3270 Tb927.10.5620 Tb927.11.5520 Tb927.6.4280 GeneID Enzyme 1.0 Tb927.10.2010 HK1 0.5 Tb927.1.3830 PGI 0.0 Tb927.3.3270 PFK

-0.5 Tb927.10.5620 ALD Tb927.11.5520 TIM Tb927.1.700 Tb927.1.720 Tb927.10.7930 Tb927.10.2890 Tb927.10.14140 Tb927.10.13430 Tb927.6.4280 GAPDH 1.0 Tb927.1.700 PGKC 0.5 Tb927.1.720 PGKA

value 0.0 Tb927.10.7930 PGAM

-0.5 Tb927.10.2890 ENO Tb927.10.14140 PYK1 Tb927.10.14000 Tb927.11.900 Tb927.3.2230 Tb927.3.4500 Tb927.10.15410 Tb927.10.2560 Tb927.10.13430 CS 1.0 Tb927.10.14000 ACO 0.5 Tb927.11.900 IDH 0.0 Tb927.3.2230 SCSα

-0.5 Tb927.3.4500 FHc Tb927.10.15410 gMDH Tb927.10.2560 mMDH Gametes Gametes Gametes Gametes Gametes Gametes Metacyclics Metacyclics Metacyclics Metacyclics Metacyclics Metacyclics Pre-metacyclic Pre-metacyclic Pre-metacyclic Pre-metacyclic Pre-metacyclic Pre-metacyclic Attached Epimastigote Attached Epimastigote Attached Epimastigote Attached Epimastigote Attached Epimastigote Attached Epimastigote Cluster Figure 4 bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A BARP HAP2 B Histone H1 -2 -2 -2

-4 -4 -4 2.5 2.5 2.5 2.0 2.0 2.0

UMAP_2 -6 UMAP_2 UMAP_2 -6 1.5 -6 1.5 1.5 1.0 1.0 1.0 -8 -8 -8

8 1210 8 1210 8 1210 UMAP_1 UMAP_1 UMAP_1

C Histone H2A Histone H2B 10 -2 -2

-4 -4 3.5 3.0 3.0 2.5 2.5 2.0 2.0

UMAP_2 -6 UMAP_2 -6 5 1.5 1.5 1.0 1.0 -8 -8 HAP2-S BARP-S 8 1210 8 1210 Midgut forms UMAP_1 UMAP_1 0 Attached Epimastigote Histone H3 Histone H4

UMAP_2 Gametes -2 -2 Pre-metacyclic Metacyclics -4 -4 3.0 3.0 2.5 2.5 -5 2.0 2.0 UMAP_2 UMAP_2 -6 1.5 -6 1.5 1.0 1.0 -8 -8

8 1210 8 1210 UMAP_1 UMAP_1 -10 -5 0 5 10 UMAP_1 bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 5 A B 25 100 VSG-393 VSG-4862 VSG-4564 20 75 VSG-1654 VSG-4959 VSG-385 15 VSG-1401 50 VSG-1476 UMI Cells 10

25 5

0 0 246 Pre-metacyclic #VSGs C D 100 500

75 400

300 50 UMI Cells 200 25 100

0 0 Metacyclic 246 #VSGs bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 5 - figure supplement 1 Tbb1125VSG-393 Tbb1125VSG-4862 Tbb1125VSG-4564 100 Tbb1125VSG-1654 Tbb1125VSG-4959 75 Tbb1125VSG-385 Tbb1125VSG-1401 Tbb1125VSG-1476 50 UMI

25

0 Attached Epimastigote

100

75

50 UMI

25

0 Pre-metacyclic

100

75

50 UMI

25

0 Metacyclic bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 5 - figure supplement 2

1e+05

Cluster

Midgut forms Attached Epimastigote Gametes 1e+04 Pre-metacyclic Metacyclics # Reads per barcode

1e+03

0123 456 #VSG genes per barcode bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 5 supplement 3

A B VSG - IlTat1.22 BARP Epimastigote 6 6 Pre-metacyclic Metacyclic 4 4 4 2 4 2 3 3 2 UMAP_2 0 2 UMAP_2 0 1 1 2 -2 -2

-4 -4 -10 -5 0 5 10 -10 -5 0 5 10 UMAP_1 UMAP_1 UMAP_2 0 HAP2 EP1 - Procyclin 6 6

4 4

-2 2 2 2.0 4 3 1.5 UMAP_2 0 UMAP_2 0 2 1.0 1 -2 -2 -4 -4 -4 -10 -5 0 5 10 -10 -5 0 5 10 UMAP_1 UMAP_1 UMAP_1

C 5 30 4 20 3 UMI 2 Cells ILTat1.61 10 ILTat1.63 1

IlTat1.22 0 Pre-meyacyclic 0 IlTat1.64 Pre-metacyclic 246 8 TRINITY-DN1307-c0-g1-i9 VSGs TRINITY-DN1574-c0-g1-i1 120 TRINITY-DN570-c0-g2-i1 200 TRINITY-DN8007-c0-g1-i1 90 150 60 100 UM 30 50

0 Metacyclic Cells 0 Metacyclic 246 8 VSGs bioRxiv preprint doi: https://doi.org/10.1101/2021.03.01.433049; this version posted March 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 6

A VSG VSG DNA 397 531 αTub Phase i

VSG+ ii

iii VSG++

iv

VSG+++ v

B VSG+ VSG++ VSG+++ 1.00 1.00

0.75 0.75

0.50 0.50

0.25 0.25

VSG proportion of total 0.00 0.00

1000 1000 100 100 10 10

Total # VSG 1 1 Est. # VSG 0.1 0.1 Cell Cell Cell A Figure 7 bioRxiv preprint Transcription

level (which wasnotcertifiedbypeerreview)istheauthor/funder.Allrightsreserved.Noreuseallowedwithoutpermission. Pre-metacyclic VSG doi: https://doi.org/10.1101/2021.03.01.433049 Threshold B ; this versionpostedMarch1,2021.

Transcription level Metacyclic VSG The copyrightholderforthispreprint Threshold