Molecular & Biochemical Parasitology 195 (2014) 59–73
Contents lists available at ScienceDirect
Molecular & Biochemical Parasitology
Capturing the variant surface glycoprotein repertoire (the VSGnome)
ଝ
of Trypanosoma brucei Lister 427
a,∗ a,1 b
George A.M. Cross , Hee-Sook Kim , Bill Wickstead
a
Laboratory of Molecular Parasitology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA
b
Medical School, Queen’s Medical Centre, University of Nottingham, Nottingham NG7 2UH, UK
a
r t a b
i s
c l e i n f o t r a c t
Article history: Trypanosoma brucei evades the adaptive immune response through the expression of antigenically dis-
Received 7 March 2014
tinct Variant Surface Glycoprotein (VSG) coats. To understand the progression and mechanisms of VSG
Received in revised form 19 June 2014
switching, and to identify the VSGs expressed in populations of trypanosomes, it is desirable to prede-
Accepted 23 June 2014
termine the available repertoire of VSG genes (the ‘VSGnome’). To date, the catalog of VSG genes present
Available online 30 June 2014
in any strain is far from complete and the majority of current information regarding VSGs is derived
from the TREU927 strain that is not commonly used as an experimental model. We have assembled,
Keywords:
annotated and analyzed 2563 distinct and previously unsequenced genes encoding complete and partial
Antigenic variation
VSGs of the widely used Lister 427 strain of T. brucei. Around 80% of the VSGnome consists of incomplete
Variant surface glycoprotein
genes or pseudogenes. Read-depth analysis demonstrated that most VSGs exist as single copies, but 360
Codon-usage bias
Minichromosome exist as two or more indistinguishable copies. The assembled regions include five functional metacyclic
∼
Expression site VSG expression sites. One third of minichromosome sub-telomeres contain a VSG (64–67 VSGs on 96
Metacyclic expression site minichromosomes), of which 85% appear to be functionally competent. The minichromosomal reper-
toire is very dynamic, differing among clones of the same strain. Few VSGs are unique along their entire
length: frequent recombination events are likely to have shaped (and to continue to shape) the reper-
toire. In spite of their low sequence conservation and short window of expression, VSGs show evidence
of purifying selection, with ∼40% of non-synonymous mutations being removed from the population.
VSGs show a strong codon-usage bias that is distinct from that of any other group of trypanosome genes.
VSG sequences are generally very divergent between Lister 427 and TREU927 strains of T. brucei, but
those that are highly similar are not found in ‘protected’ genomic environments, but may reflect genetic
exchange among populations.
© 2014 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND
license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
1. Introduction
The existence of antigenic variation in Trypanosoma brucei has
been known for over 100 years [1], but the extent of the Variant Sur-
Abbreviations: BAC, Bacterial Artificial Chromosome; BES, bloodstream-form face Glycoprotein (VSG) repertoire, the mechanisms by which the
VSG expression site; BF, bloodstream form of the T. brucei life cycle; CDS, coding
repertoire evolves, the mechanisms that determine the VSG tem-
sequence; GPI, glycosyl phosphatidyl inositol; HTS, high-throughput sequencing; IC,
porally expressed by an individual cell, and the range of sequence
intermediate-sized chromosome; MBC, megabase chromosome; MES, metacyclic-
similarities among VSGs within and between strains and isolates,
form VSG expression site; MC, minichromosome; ORF, open reading frame; PF,
procyclic (tsetse midgut) form of the T. brucei life cycle; PFGE, pulsed-field gel are not fully understood [2–4]. This is despite the importance of
electrophoresis; VSG, variant surface glycoprotein.
these parameters to understanding antigenic variation and the
ଝ
All VSGs identified in this study are available in GenBank under accession num-
many studies that have pursued them as aims. Perhaps the sim-
bers KC611295–KC613858 and KC669596–KC669620. A BLAST server is available
plest but essential question to ask to achieve these aims concerns
at http://tryps.rockefeller.edu and FASTA-formatted files can be downloaded from
the repertoire of VSG genes, their diversity and evolution. this server.
∗
Corresponding author. Tel.: +1 212 327 7571. From the earliest work, there were indications of an extensive
E-mail addresses: [email protected] (G.A.M. Cross),
VSG repertoire. Analysis of cosmid clones of nuclear DNA from Lister
[email protected] (H.-S. Kim), [email protected]
427 showed that ∼9% hybridized with a conserved region flank-
(B. Wickstead).
1 ing known VSGs [5]. Several of these clones contained multiple
Present address: Laboratory of Lymphocyte Biology, The Rockefeller University,
1230 York Avenue, New York, NY 10065, USA. putative VSGs, suggesting that VSGs might be highly clustered. T.
http://dx.doi.org/10.1016/j.molbiopara.2014.06.004
0166-6851/© 2014 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
60 G.A.M. Cross et al. / Molecular & Biochemical Parasitology 195 (2014) 59–73
brucei shows enormous variations in the size of diploid chromo- through Glossina by Wendy Gibson in 1992 [16], from whom we
some homologs within and between strains, and up to 30% variation obtained it.
in total genomic DNA among different strains [6,7]. Much of this
variation is attributable to subtelomeric VSG arrays. For exam- 2.2. DNA sequencing, assembly, annotation and analysis
ple, in the strain used for the T. brucei genome project, TREU927,
VSG arrays comprise >50% of the larger homolog of chromosome For the genome assembly, DNA was extracted from BF try-
I [8]. By comparison, in the widely used experimental strain Lis- panosomes using QIAamp DNA blood mini kit (from QIAGEN),
ter 427, the larger homolog of chromosome I is ∼4× larger than and submitted to local sequencing centers for library prepara-
the smaller homolog [7]. Haploid sub-telomeric regions are appar- tion (fragments sheared to mean length ∼500 bp) and sequencing
6
ently repositories for various repetitive gene families, including VSG with Illumina (>35 × 10 76-bp reads) or 454 Life Sciences systems
6
genes and VSG pseudogenes, but how these sequences are captured (1.2 × 10 350 bp median length reads). The sequence reads were
and diversified, and the contribution of these sub-telomeric loci assembled de novo on a MacPro computer with 32 GB RAM using
to the generation of novel functionally competent VSGs (including DNASTAR SeqMan NGen software v 2.2 with default parameters
‘mosaic’ VSGs [9]), are not fully clear, partly because their extent (after a great deal of experimenting with various parameter modi-
and complexity are not well defined. fications) in four iterative cycles (the software could accommodate
7
The T. brucei genome sequencing project [10] assembled and only 10 sequences per assembly). The first assembly iteration used
6
annotated the majority of the chromosome-internal portion of 10 Illumina and all the 454 reads; the second and third iterations
6
the genome (the eleven ‘megabase’ chromosomes; MBC) in the each added 12 × 10 Illumina reads. The 454 reads were added
TREU927 strain. This project identified 806 VSG genes, most of again in a fourth iteration, which substantially improved the results
which were pseudogenes, with only 57 encoding all features con- in terms of average contig length and the presence of the VSG-
sistent with functionality [10]. These VSGs are mostly located in associated 14-mer motif.
arrays flanking the cores of the chromosomes, and their assem- To isolate minichromosomal DNA, PF and BF cells were embed-
bly into larger contiguous sequences was limited by repetitive ded in low-melting point agarose and digested with proteinase K in
components of the VSG clusters. The majority of sub-telomeric N-lauroyl sarcosine solution as in [6]. MC DNA was separated from
sequences, the intermediate-sized chromosomes (ICs) and all of the rest of the genomic DNA by either pulsed-field gel electrophore-
the numerous minichromosomes (MCs) [11] were absent from the sis (1% SeaKem Gold agarose in TBE (90 mM Tris-borate, 0.2 mm
◦
assembled genome. A later, detailed analysis of an expanded 940 EDTA, pH 8.2) at 12 C, 2 V/cm with pulse time linearly ramped
VSG set from the TREU927 assemblies found only 43 functional 18–30 min over 80 h) followed by electroelution and ethanol pre-
VSGs, with a further 89 ‘atypical’ and 197 incomplete genes [12]. cipitation, or by rotating agarose gel electrophoresis [17] in TBE at
◦
More recent assemblies (http://tritrypdb.org/common/downloads/ 12 C at 150 V with linear pulse time 10–40 s for 8 h, at 120 V with
release-5.0/) contain 1274 annotated VSG-like sequences (1000 of linear pulse time 100–300 s for 2 h, and at 50 V with linear pulse
which are pseudogenic), encompassing fewer than 200 putative time 1200–2500 s for 85 h, followed by purification of MC DNA from
complete or almost-complete VSGs, and revisions are continuing a gel slice using the QIAEX II gel extraction kit (from QIAGEN).
(http://www.genedb.org/Homepage/Tbruceibrucei927). Publically available software and servers for gene and signal-
Owing to the diversity of VSGs among various strains and sequence prediction were used as follows: Glimmer [18] at (http://
isolates of T. brucei, it is necessary to characterize the repertoire www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer 3.cgi); Gen-
in the strain being studied. The most widely-studied strain by eMark [19] (http://exon.biology.gatech.edu/gmhmm2 prok.cgi);
far is Lister 427, which is amenable to laboratory propagation OrfPredictor [20] (http://proteomics.ysu.edu/tools/OrfPredictor.
and genetic manipulation and also possesses many of the criteria html); SignalP v3.0 [21,22] (http://www.cbs.dtu.dk/services/
considered to be representative of natural infections, including SignalP). The ORF and translation and coordinates were coordi-
the recent demonstration of its ability to complete the life cycle nated with the contig sequences in Microsoft Excel. Contig and
in vitro [13]. The propagation history and properties of Lister 427 VSG sequences, contig coordinates, annotations and analytical data
lines are discussed in more detail elsewhere (GAMC, manuscript were maintained in a FileMaker Pro database, which allowed the
in preparation; http://tryps.rockefeller.edu/DocumentsGlobal/ appropriate sense-orientated DNA coding sequences and flanks
lineage Lister427.pdf). The current study was undertaken to deter- to be computed, displayed, exported and imported conveniently.
mine as much of the VSGnome of the Lister 427 strain as possible, Many parameters of interest were identified computationally,
to facilitate the identification, by high-throughput sequencing of either using ad hoc Perl scripts or by calculations within the File-
RNA isolated from trypanosome populations, of VSGs expressed Maker Pro database.
in various experimental situations, and to characterize the MC For codon-aware pairwise alignments, protein-based DNA
component of the VSG repertoire, which was hitherto unknown. alignments were made with transAlign [23]. dN/dS ratios were cal-
culated from codon-aware alignments of the coding sequences of
Lister 427 VSGs with their best matches from TREU927. Nonsyn-
2. Materials and methods onymous and synonymous substitution rates were estimated using
KaKs Calculator [24] with the MYN method of model averaging.
2.1. Strains The values for VSG copy number were determined by mapping
100-bp Illumina reads from whole genomic or minichromosomal
T. brucei Lister 427 (information about the identity and geneal- DNA onto the assembled VSGnome, using Bowtie version 0.12.7.
ogy of this strain is available at http://tryps.rockefeller.edu/ [25], allowing two or three mismatches and counting the align-
DocumentsGlobal/lineage Lister427.pdf) bloodstream-form (BF) ments assigned to each VSG. To account for the overlap of VSG
and procyclic-form (PF) cells were grown under standard culture sequences, two approaches were tested to estimate counts. In the
conditions in HMI-9 or SDM-79 media [14,15]. DNA from PF of T. first approach, only reads that mapped uniquely to a VSG were
brucei TREU927/4 (genome strain) and T. b. gambiense Dal 972 was counted. By fragmenting the VSGnome and mapping unique frag-
included in some experiments. BF Lister 427–501 is a line that was ments onto itself, the deficit between the theoretical number of
adapted some years ago to culture from a mouse infected with an matches and the mapped unique matches was used to derive a
ancestral sample of strain 427 isolated as ‘variant 3 at the Lister correction factor that could be applied to the experimental data.
Institute in 1962 and frozen in 1964. Variant 3 was transmitted The alternative was to map all reads and randomly assign those
G.A.M. Cross et al. / Molecular & Biochemical Parasitology 195 (2014) 59–73 61
Table 1
that map to one or more. The two approaches, whose veracity was
Newly sequenced and annotated unique complete and partial VSGs. Few (if any)
verified by simulations, gave similar results (data not shown). The
VSGs are unique throughout their sequences: unique was operationally defined as
number of copies of each VSG was calculated as the number of reads
being distinguishable when aligning 50-bp HTS reads allowing 2 mismatches. Num-
per kilobase mapping to each VSG divided by the value for a pool bers exclude 24 previously sequenced VSGs. Four previously known VSGs did not
of known single-copy VSGs. Making a similar calculation but taking fully assemble here but 20 did. An additional 45 would be classified as function-
ally complete except for having a non-canonical GPI site (not Aspartate, Serine or
the ratio of VSG hits to hits for a pools of single-copy chromosomal
Asparagine).
housekeeping genes gave similar results, but this method could not
be used for VSGs from purified MC. Similarity among VSG proteins Category Number Comments
was assessed using BLAST [26] version 2.2.18 (October 2008). ≥
Total 150 AAs 2563
Total ≥ 250 AAs 2447
Appear functionally complete 325 436–582 AAs
Extended or almost complete 211 Lack a few AAs at N- or
3. Results
C-terminus or slightly
extended at C-terminus
≥
3.1. Assembling the genome and identifying VSGs N-terminal partial 250 AAs 918 250–561 AAs
≥
C-terminal partial 250 AAs 228 250–575 AAs
Internal partial ≥ 250 AAs 798 250–578 AAs
To identify the VSGnome in a parasite of great experimen-
tal relevance, a baseline de-novo DNA assembly was performed
using DNA from a recently cloned sample of bloodstream-form
(BF) Lister 427. Total genomic DNA from this clone was sequenced
that contain frameshifts, but inspection of a large number of these
using Illumina and 454 Life Sciences high-throughput sequenc-
contigs suggests that is true for only a few of the partial sequences.
ing systems to a total coverage of ∼60-fold (based on a haploid
The association of VSGs with RHS/INGI/RIME [29,30] sequences
genome of ∼38 Mbp [6,27]). De-novo assembly yielded 14,821 con-
was not investigated in a systematic manner, but was an inconve-
tigs ranging from 150 to 83,202 bp (N 13 kbp). Contigs were
50 nience when sometimes these sequences were fused in frame to the
aligned against a local database of all available VSG sequences,
originally computed CDS. Such fusions were edited out manually
using the BLASTn algorithm [28]. 4265 contigs (>278 bp) contained
(they will now show up in the upstream or downstream regions).
−10
VSG-like sequence elements with e-values better than 10 . Glim-
In some cases, annotated VSGs from the TREU927 strain, which
mer and GeneMark were used to identify ∼16,000 open reading
were used for VSG motif identification and translation verification
frames (ORFs) > 149 bp in these 4265 contigs. Based on BLASTp
using BLAST, are also fused to RHS sequences. A few VSG sequences
−12
alignments (e-values better than 10 ) with all previously known
were also initially fused in-frame to the 177-bp repeat sequence
VSGs, GeneMark identified almost four times as many VSG-like
characteristic of MCs. As expected for a highly arrayed gene family,
ORFs as Glimmer, which only identified 11 subsequently annotated
many contigs contained multiple (up to 16) complete or partial
VSGs in addition to those identified by GeneMark. The BLASTp
VSGs. Fifty-three contigs (11–83 kbp) contained at least 10 VSG
alignments showed an enormous range of similarity with previ-
ORFs. Out of 492 ‘complete’ sequences that are not contained in BF
ously sequenced VSGs from other strains (predominantly TREU927)
VSG ‘Expression Sites’ (BES) or on MCs, 321 were contained in con-
and subspecies. A few were identical, surprisingly, but some were
tigs > 6 kbp. Some of these contigs contain several complete VSGs,
not obviously VSG-related by visual inspection. A BLASTp search
but they are interspersed with incomplete VSGs and very short
against all proteins in the TREU927 genome database confirmed
ORFs containing VSG motifs. The arrangement of VSGs within the
that all of the newly annotated VSG sequences aligned only to pro-
longer contigs showed no obvious pattern, and was not investigated
teins previously annotated as VSGs, VSG pseudogenes, or atypical further.
VSGs.
After coordinating the assembly and analysis information, all
3.2. VSGnome coverage
VSG ORFs >149 amino acids were manually reviewed, and putative
N-terminal signal peptides and C-terminal glycosyl phosphatidyl
Twenty of the 26 (77%) previously sequenced Lister 427 VSGs
inositol- (GPI) anchor signal sequences were identified by SignalP
were present as complete sequences in our de novo assembly. It is
[21,22] or by manual inspection (easy for VSGs because of the
likely that the remaining VSGs (VSGs 4, 6, 7–9 and 13; previously
extreme conservation of the GPI signal), respectively. The results
known as 117, 121 and MITat 1.7–1.9 and 1.13 respectively) failed
of the annotation were 2563 previously unsequenced VSG coding
to assemble to completion due to regions of identical sequence
sequences ≥450 bp, including 2447 genes encoding at least half the
shared with other VSGs, although there could be differences in
length of an average-length VSG. The median length of the con-
the completeness of some VSGs among different lines of the same
tigs containing these annotated VSGs was 5100 bp. The sequences
strain. In agreement with this interpretation, some of these VSGs
were classified as shown in Table 1, and unique sequential num-
(4 and 6, for example) are known from Southern blots [31] and
bers were assigned to each new complete or partial VSG (to avoid
sequencing [32–34], to be members of families containing regions
confusion, numbers that had been assigned previously, formally or
of closely related sequences. Similarity data for six VSG 4 (117)
informally, for VSGs of Lister 427, were skipped). Of these genes,
family members is shown in Table 2. The underlying nucleotide
325 (13%) encode what appear likely to be fully functional VSGs,
sequences of this family are highly similar [33,34]. Overall, there-
meaning that they have N-terminal and GPI signal sequence predic-
fore, we estimate that Lister 427 contains at least ∼2700 unique
tions and cysteine residues consistent with established groupings.
complete and partial VSGs (excluding exact copies: see below), but
Another 211 are complete except for an atypical GPI-anchor site or
the number of complete and presumptively functionally competent
a minor C-terminal extension, or lack a few amino acids at the N-
VSGs we identified is probably underestimated by as much as ∼20%.
or C-terminus. The remaining ORFs were classified as ‘partial’. N-
The precise total number of VSGs is not so important, as it differs
terminal and C-terminal partial sequences were designated as such
among clones and strains (see below). However, both the estimated
on the basis of a clear signal peptide for endoplasmic-reticulum
total number of VSGs (∼2700) and also the proportion that encode
targeting or GPI signal sequence, respectively. Those sequences
likely functional proteins (13–20%) are considerably higher when
designated ‘internal partial’ had no obvious secretory or GPI signal
assessing the Lister 427 VSGnome as a whole than estimates based
sequences. Some of the sequences annotated as partial may have
on TREU927 chromosome-internal copies [10].
been rendered as such because they are part of longer sequences
62 G.A.M. Cross et al. / Molecular & Biochemical Parasitology 195 (2014) 59–73
Table 2
Assembly of members of one family of highly similar VSGs. Eight highly similar
members of the VSG 4 (117) family were sequenced previously [33,34]. This table
shows the % amino acid identity of five independently assembled sequences from
the current study with various VSG 4 family members in the GenBank database.
VSG Assembled Annotation Prior GenBank % Identity
length entry
4 526 Incomplete in L34415 100
this study
342 528 Complete L31602/L31608 99/98
1506 381 N-terminal L31603 99
1927 245 N-terminal L31606/L31607 96/95
3102 416 Internal L31604 97
3739 273 Internal L31605 98
3.3. Many VSGs are present in multiple copies
A factor in calculating the VSG repertoire size that has not pre-
viously been evaluated is the presence of exact duplicate copies,
either due to allelic amplification, the presence of VSGs within
diploid regions of the chromosomes, and the uncertain ploidy of
MCs. This information often cannot easily be deduced from the
assembly data, but is highly amenable to estimation from read
depth at high coverage. To do this, short-sequence reads were
mapped onto all assembled VSGs and the normalized number of
reads per kilobase was calculated, after correcting for reads that do
not map uniquely to one VSG. About 2000 assembled VSGs exist as
single copies, 300 as duplicates and 60 have more than 2 copies that
are not distinguishable at the level of short-read sequence mapping
(Fig. 1A). A similar trend was observed for VSGs annotated Complete
(Fig. 1B).
Lister 427 is the major experimental strain for trypanosome
research and has been distributed to many laboratories over sev-
eral decades and propagated separately as BF and PF lines. Since the
VSGnome is likely to be one of the most rapidly evolving parts of
the genome, this raises the possibility that the VSGnomes of Lister
Fig. 1. VSG copies. All partial and complete VSGs (A) and only those that are judged
427 cell lines propagated in different laboratories, and as different
to be functionally complete (B) are grouped according to their average number of
life-cycle forms, may contain a different VSG repertoire. To investi- copies in four samples (a PF from BW lab, two samples of SM BF from Rockefeller
labs (SM HK and SM GHM) and a BF telomerase long-term deletion line (TERT KO).
gate these possibilities, we first compared our reference VSGnome
(from a BF ‘single marker’ (SM) line [35]) to an insect procyclic form
(PF) line derived from a common ancestral BF line at least 35 years
highly conserved motifs (AGTGTTGTGAGTGTG and TATAATAA-
ago (Fig. 2). Comparison of the VSGnome copy number between
GAGCAGTAAT, one or both of which are present in 83% of the 70-bp
these BF and PF lines shows evidence of both gene loss and dupli-
set from which these consensus motifs were extracted) coupled
cation events in PF relative to the BF assembly. Similar gene loss and
to varying numbers of tandem TAA triplets [38,39]. Although a
duplication was also seen comparing our reference to clone 501, a
probe representing the 70-bp repeat hybridized with about 9% of
BF culture derived from ‘variant 3 , a 50-year-old cryopreserved
the clones in a cosmid library [5], 70-bp repeats were not found
sample of Lister 427 [16] that is progenitor of both lines shown in
upstream of all VSGs [37]. In the current assembly, 790 instances
Fig. 2 (Supplementary Fig. 1). From these data, we can deduce that
of the 3 14mer were found in 724 contigs, of which 156 were
the VSGnome is evolving in axenic cultures of both BF and PF cells
within 100 bp of the end of a VSG CDS, as would be expected if
and that Lister 427 cell lines from different laboratories are likely
forming part of the UTR. Among all the annotated VSGs in the
to differ at many VSG loci. Some of the lost VSGs were on MCs (see
current study, one or both of the 70-bp repeat component motifs
below), but many were not. VSG 2 (widely known as 221), which is
were present in 697 contigs (54%) that contained 1288 annotated
expressed in most cultured lines of Lister 427 BF (and was the VSG
VSGs. In contrast, the 70-bp motifs were only found in 165/10,558
expressed when Lister 427 BF were first cultured in vitro [36]), was −10
contigs (1.6%) that lacked identified (e ) VSG motifs, confirming
lost when adapting ‘variant 3 to culture, numbered as clone 501.
that the search motifs were largely specific to VSGs. The simi-
Comparison between lines of BF Lister 427 from two closely
larity of these sequences probably hindered their assembly into
linked laboratories illustrates that VSGs can also be readily lost
specific contigs, and prevented the assembly of many longer VSG-
and duplicated during routine culture of BF cells (Supplementary
containing contigs, and variation in their sequence and the number
Fig. 2), which is hardly surprising considering the opportunities for
of copies associated with different VSGs probably underestimates
recombination that VSGs and their flanking sequences provide.
their overall number. The consensus motifs were derived from a few
sequenced expression-site VSG copies and it is possible that there
3.4. VSG-associated motifs is wider variation among these sequences in non-BES VSG copies,
or that they are not associated with all VSG sequences, as previ-
Early studies revealed that many VSGs are flanked by an ously noted [37]. The identified motifs in our contigs were often
invariant 14-bp (GATATATTTTAACA) motif in the 3 UTR, and an far from the annotated VSGs. Another study [12], using data from
upstream heterogeneous ∼70-bp repeat [5,37], containing two conventional (long-read) Sanger sequencing, found that 92% (687)
G.A.M. Cross et al. / Molecular & Biochemical Parasitology 195 (2014) 59–73 63
Fig. 2. Fluctuation in the copy number of individual VSGs in long-separated PF and BF lines. Data are shown for 328 complete newly assembled VSG genes that are anticipated
to encode functional VSG proteins.
of VSGs were flanked upstream by at least one (74%) 70-bp repeat, part of the CDS. Even when considering local alignments, however,
and noted that the repeats were more degenerate than previously only 5% (146/2613) have regions that are >98% identical over 50 bp
recognized. (Supplementary Fig. 3). However, a small number are very similar;
Telomere repeats would be expected downstream of some VSG 20 of 2613 Lister 427 VSGs are >98% identical to ORFs in TREU927
genes, but would be linked to them during the assembly only if the across their entire length, 3 of which are 100% identical. Near iden-
intervening sequence is of sufficient complexity to allow assem- tical sequences are also seen between Lister 427 and other T. brucei
bly. However, due to the low sequence complexity in subtelomeric strains, including in T. brucei gambiense Dal927 (12 sequences >98%
DNA, telomere repeats were absent from many VSG-containing identical to the 68 VSGs in the v4.2 assemblies), and between 427
contigs that were expected to be subtelomeric; for example, those and T. evansi sequences (4 > 98% identical).
that were known to be in BES, MES or on MCs. Altogether, only 68 By mapping the positions of these near-identical sequences
links were made between a VSG and telomeric repeats, but 42 of onto the TREU927 genome assemblies, it can be seen that they
these links were to a complete VSG. are not clustered in ‘protected’ locations in the genome, but are
found throughout the VSG arrays (Supplementary Fig. 4). There is
no correlation between conservation and distance from assembled
3.5. Similarities among 427 and non-427 VSGs
chromosome ends (Supplementary Fig. 5; Spearman’s rank correla-
tion rho = 0.054, p = 0.2). If there were considerable recombination
To analyze similarities, BLAST was used to compare 559 full-
among silent VSGs, then these very highly similar sequences could
length VSGs (and the proteins they encode) to themselves and
represent sequences that by chance have not recombined. How-
to 959 non-Lister-427 sequences publically available in VSGdb
ever, non-coding sequences alignable between Lister 427, TREU927
(http://leishman.cent.gla.ac.uk/pward001/vsgdb/kook index.html)
and Dal927 show ∼1 SNP per 100 bp (excluding indels). Assuming
[40] in June 2012 and TREU927 chromosome assemblies
that this proportion of SNPs is a reasonable estimate for divergence
(http://tritrypdb.org/tritrypdb version 5.0). When comparing
of VSG sequences, the probability of an individual orthologous pair
−
559 essentially full-length Lister 427 VSG CDS with all 2613 6
remaining identical between stains is <10 . This suggests that
annotated unique sequences obtained from our assembly, 42
these sequences are not conserved orthologs, but are the result of
showed ≥97% identity over a contiguous stretch of at least half
recent genetic exchange among strains–even between East African
their length with another Lister 427 VSG. When compared to 959
and West African trypanosome subspecies.
non-427 VSGs, most from the TREU927 strain, 38 showed better
than 97% identity over a contiguous stretch of at least half the
length of another VSG. 3.6. The VSGnome is under purifying selection
To compare sequences across their entire length, all predicted
Lister 427 VSG nucleotide and protein sequences were aligned to VSG is in the major surface antigen of T. brucei and is in direct
their best match in TREU927 (Fig. 3). The majority of sequences are contact with the host immune system. VSG sequence diversity is
very dissimilar to their closest match (<50% identity across CDS; huge (see Fig. 3; Supplementary Fig. 3), presumably because the
<35% protein sequence identity). This is expected if there is recom- mechanism of antigenic variation on which the parasite relies
bination among VSGs, as the nearest match will only correspond to requires there to be little cross-reactivity between expressed
64 G.A.M. Cross et al. / Molecular & Biochemical Parasitology 195 (2014) 59–73
200 2613 set 559 set 150
100 Number
50
0 020406080 100 Aligned identity, 427 vs. 927 [%]
150
100
Number 50
0 020406080 100
Aligned identity, 427 vs. 927 [%]
Fig. 3. Similarity between Lister 427 and TREU927 VSG nucleotide and protein sequences. Results are shown for the best matches of each Lister 427 VSG with the TREU927
strain for all annotated 427 VSG genes or partial genes (2613 set) and a subset (559 set) that appear to be full length. There is no significant difference between the two sets.
As expected, the sequence conservation is greater for the DNA than for the protein.
variants. Given these conditions, it might be expected that VSGs against the huge sequence diversity observed in the family. It also
would be under diversifying pressure; that VSG genes would be implies that VSG sampling is sufficiently high relative to mutation
fixing nucleotide substitutions that alter the protein sequence rate for effective selection, despite only one VSG being expressed
(non-synonymous mutations) faster than those that do not (syn- at any time.
onymous mutations). However, comparing members of the Lister
427 VSGnome to their closest relatives in TREU927 shows that
VSGs are actually under purifying selection, with a median dN/dS
3.7. VSGs have an unusual codon-usage bias
ratio of 0.45 (Fig. 4). Hence, ∼50% of non-synonymous changes in
VSGs are removed by selection, implying that there are significant
VSG is a hugely abundant protein, comprising ∼10% of the total
restrictions on changes tolerated at a specific site on a VSG, even
protein content of BF T. brucei [41]. Given this level of expres-
sion, it is possible that VSG codon usage might be adapted for
efficient translation of VSG mRNA. Synonymous codon usage in
highly expressed ‘housekeeping’ genes (such as those encoding
␣ 
/ -tubulins or ribosomal components) has been shown to be
biased toward tRNAs with higher gene copy number in T. brucei,
T. cruzi and Leishmania [42]. Analysis of 368 full-length VSGs shows
a substantial codon-usage bias (Fig. 5; Supplementary Fig. 6). How-
ever, the bias in the VSG set is distinct from that found in highly
expressed housekeeping genes, which is of the same type as also
seen in pol II-transcribed genes annotated ‘hypothetical’ (having
no identifiable function), but to a lesser degree. VSG codon-usage
bias is more pronounced than for highly expressed genes, show-
ing a strong bias toward A or C at the third position (in contrast
to the G/C biases seen for other genes). The significance of this to
the biological role of VSG is not currently known, but it is appar-
ently not a product of being pol I-transcribed nor due to an intrinsic
nucleotide bias in the BES, since BES-associated genes (ESAGs) show
a codon-usage bias similar to that of the rest of the genome. It is
Fig. 4. Evidence for purifying codon selection in the VSGnome. Ratio of the rate
also not readily ascribed to the influence of recombination among
of non-synonymous to synonymous changes (dN/dS) inferred from codon-based
silent VSGs, since the nucleotide bias is not equally present in both
alignments of Lister 427 VSGs to their best match in TREU927. Only VSG pairs with
≥60% aligned identity were considered. sense and antisense strand.
G.A.M. Cross et al. / Molecular & Biochemical Parasitology 195 (2014) 59–73 65
2.0 VSG ESAG annotated housekeeping hypothetical 1.5 high expressed ribosomal protein
1.0
0.5 Usage bias
0
-0.5
-1.0 C T A G C T A G C T A T C A T G A G C T A G A G C T A G C T
AA GC CCA GC GC GC TGC TGT GAC GAT GAA GAG TTC TTT GGA GGG GGC GGT CA CA ATA ATC ATT AAG CT CT CT CT TTA TTG AAC AA CA AGA AGG CG CG CG CG AG AG TC TC TC TC AC AC AC AC GTA GTG GTC GTT TAC TAT TA TA TGA CCG CCC CCT CA
A C D E F G HPI K L N Q R S T V Y *
Fig. 5. Codon-usage bias among different gene types in T. brucei Lister 427. Shown are data for 559 functional or near-complete VSGs (VSG), 115 expression-site associated
genes (ESAG), 3070 annotated polII-transcribed genes (annotated housekeeping), 5576 hypothetical genes, 54 genes estimated to have >150 mRNA molecules per cell [82]
(high expressed) and 156 genes encoding ribosomal proteins (ribosomal). Codon-usage bias is the fold under- or over-representation of a given codon relative to its set of
synonymous codons (0 = no bias).
This strong codon bias is similar regardless of whether we con- A3 and (B1 + B2), respectively (data not shown). As previously, no
sidered the whole VSGnome, only those in BESs or on MCs, or VSGs good evidence for a separate ‘nC’ VSG N-terminal class was seen
known to be expressed in TREU927 (Supplementary Fig. 7A). How- [12], which is instead a subclass of the A-type domain.
ever, it is also seen (to a lesser degree) in the genes thought to The shorter C-terminal domain was also analyzed (data not
encode the coat of the epimastigote form (BARP). Interestingly, the shown), with similar results to those presented previously [12].
same or very similar biases are also present in the gene families Supplementary Fig. 9 shows the domain architecture (N- and C-
believed to encode the major surface proteins of both T. congolense terminal domains) for all of the analyzed full-length VSGs.
and T. vivax (Supplementary Fig. 7B and 8). Again, in these species,
these biases are different from those apparent in predicted high-
expressed genes. 3.9. Metacyclic (MES) and bloodstream expression site (BES) VSGs
Of fourteen VSGs previously associated with BES sequences in
3.8. Characterization of VSG N- and C-terminal domains a cell line derived from Lister 427 [46], eleven were assembled as
complete VSGs in the current project. Six contigs (Supplementary
The 368 apparently complete VSGs from Lister 427 were Fig. 10), ranging from 7197 to 44404 bp, contained a complete VSG
compared to all non-pseudogenic VSGs encoded in TREU927 downstream of a metacyclic promoter, with no intervening ESAGs.
and T. brucei gambiense Dal972 genome assemblies (v4), as Five of these MES contain VSGs (numbered 397, 531, 639, 653 and
well as 38 VSGs from other strains/species available from VSGdb 1954) identical to those expressed when a PF line of Lister 427 was
(http://leishman.cent.gla.ac.uk/pward001/vsgdb/kook index.html) differentiated to metacyclics in vitro [13]. The structure of the sixth
[40]. The level of sequence divergence for the majority of VSGs and longest contig was atypical of a MES: it had additional ORFs
makes the production of high-quality multiple-sequence align- encoding partial VSGs downstream of the intact VSG (582) that was
ments problematic, so a technique based on clustering sequences immediately downstream of the promoter, suggesting recombina-
based on all-against-all BLASTp local alignment scores was used tion between a VSG array and this MES locus had occurred. This
[43]. Independent comparisons were made for N- and C-terminal probably explains the lack of expression of this VSG during induced
VSG domains of all sequences. metacyclogenesis [13]. The MES structures upstream of the pro-
Comparison of N-terminal domains recapitulated their gross moter (Supplementary Fig. 10) have similar characteristics to three
division into two major classes: A and B, with A being the more previously described MES [47–50]. 70-bp repeats are absent (or
diverse in sequence (Fig. 6) [12,44]. Both A and B show clear divi- highly degenerate) and, in some cases, there are a few random par-
sion into sequence sub-classes. By extension of the nomenclature tial ESAG1 and VSG pseudogenes upstream of the promoter. In one
of Marcello and Barry [12], we name these types A1, A2, A3, B1, case there is a large region identical to chromosome 9 upstream
and B2. By using weighted scores for all possible pairwise combi- of the functional MES promoter, suggesting this MES might be at
nations, our method is more robust than clustering based only on one end of chromosome 9. In one case, there is an unusually long
BLAST hits, which is very sensitive to the applied threshold for hit distance between the MES promoter and the downstream VSG.
detection. This can be seen in that the TREU927 sequences classi- Fig. 7 shows an alignment of all currently known MES pro-
fied as types N1, N2, N3 and N4 [45], map onto sub-classes A1, A2, moter sequences, together with the three other RNA Polymerase
66 G.A.M. Cross et al. / Molecular & Biochemical Parasitology 195 (2014) 59–73
Fig. 6. N-terminal domain types. Clustering of N-terminal domain types of full-length VSG proteins using weighted BLASTp scores for all possible pairwise combinations.
The analysis shows 559 apparently complete or near-complete Lister 427 VSGs together with 138 non-pseudogenic TREU927 VSGs annotated in VSGdb and 28 annotated
VSG from the T.b. gambiense Dal927 genome (TriTrypDB v4.2).
I promoters known in T. brucei, used for transcription of rRNA, Telomeres account for ∼0.9% of the reference BF Lister 427
Procyclin and BES loci. The BES and MES consensus sequences are genome (Fig. 8 and Supplementary Table 1). Similar abundance was
distinct but more similar to each other than either is to the rRNA found in BF 501 cells (∼1.2%) and in PF TREU927 (0.5%) and T. brucei
or Procyclin promoters. The promoters driving mVSGs 582 and 639 gambiense Dal972 (∼0.7%), which also have fewer MCs (see below).
are identical to the Lister 427 MES promoter sequences previously However, telomere repeats are much more abundant in the PF BW
identified as 4–49 and ES4 respectively [51,52]. Lister 427 line, presumably as a consequence of continuous telom-
The small subsets of BES- and MES-encoded VSGs show no sig- ere growth during long-term propagation. Given a total nuclear
nificant bias toward particular N- or C-terminal types (Fig. 7). This DNA content of ∼78 Mb (see below), consisting of 11 diploid chro-
is consistent with metacyclic VSGs being normal members of the mosomes and ∼100 haploid MC and IC, the ‘telomere-rich’ lines
VSGnome, which differ only in being in MESs. have an average telomere length of ∼11 kbp. These long telom-
eres are also apparent in read-depth analysis of MC-specific DNA
isolated from pulsed-field gels, for which the same ratio of telom-
ere abundance among different cell lines was observed (Fig. 8 (SM
3.10. Variation in the quantity of telomere repeats among
HK MC and PF BW MC) and Supplementary Table 1), as expected
different lines of Lister 427
if telomere lengths from both MCs and MBCs are similar. Inde-
pendent estimates of telomere length in this MC-specific DNA
Because of the proposed [53] and demonstrated [54] linkage
agree well with that for whole-genome libraries (∼12 kbp for PF
between telomere length and VSG switching frequency, we exam-
Lister 427, based on MCs of average 75 kbp length and compris-
ined the representation of telomere repeats in various lines and
ing 55% 177-bp repeat). Surprisingly, however, telomere lengths
samples in our DNAseq data. Repeat composition was estimated
in PF lines were drastically reduced (to ∼30% of original length)
from the percentage of the total reads that were entirely composed
after sub-cloning (compare the 3 PF subclones, mc1-mc3, of the
of each repeat sequence. The read and target database lengths were
original PF BW MC sample in Fig. 8 and Supplementary Table
coordinated so each read could only align once with its target.
1).
Only the first 50 bp were mapped, because of a more pronounced
In agreement with previous observations [55], read-depth anal-
drop in sequence quality that was observed toward the 3 end
ysis shows a drastic reduction of telomeric repeat length during
of telomere-repeat reads compared to 177-bp-repeat reads. There
null
propagation of TERT lines (BF T. brucei lacking the catalytic sub-
was no significant difference in the number of telomere repeats
unit of telomerase) (Fig. 8, five t7 clones). Long-term propagation of
mapped when either two or three mismatches were allowed, which
null
the TERT line results in telomere stubs of apparently minimum
supports the assertion that quality was not an issue in mapping
length for cell viability [55], and also leads to a great decrease of
the first 50 bp. This suggests that our estimates for the amounts of
177-bp repeats (Fig. 8, T7-2 line).
telomeric DNA are quite accurate.
G.A.M. Cross et al. / Molecular & Biochemical Parasitology 195 (2014) 59–73 67
Fig. 7. Alignment of Metacyclic Expression Site (MES) promoters. The consensus residues shown in red are either identical in at least 17 of the 18 MES promoter sequences.
Residues that match the consensus are hidden. Yellow shading highlights residues that are identical to the 427 BES. All sequences end in either CA or CAA, so CA could
be a consensus transcription-initiation point. Sequences not identified in the present study are from the following sources (with GenBank accession numbers if available):
EATRO 1125 (X05710) [83]; TREU927 GUTat 10.1 [84] (AC087702); TREU 795 ILTat 1.22 (AJ012198 or S79897) and 1.61 (AJ012199) [85]; ILTat 1.63 and 1.64 (AJ486954 and
AJ486955) [86]; WRATat series MVAT4, MVAT5 and MVAT7 (AF068693, S80865 and U83435) [87–89]; AnTat 11.17 [90]; Lister 4274-49 [51] is identical to Lister 427 VSG
582; ES4 (identical to Lister 427 VSG 639) and ES7 (partial, not shown, but its 46 bp are identical only to Lister 427 VSG 397, ILTat 1.63 and EATRO 1125 X05710) from Lister
427 (incorrectly called EATRO 427 in [52]) were a personal communication from Frederic Bringaud in October 2003. The 427 BES, rRNA and Procyclin promoter sequences
are from the trypanosome genome project, but with boundaries informed by [91–93]. Not all promoters have been shown to be functional in actual metacyclic forms, and
some sequences could contain errors.
Fig. 8. Abundance of telomere and 177-bp repeat sequences in the genome and in minichromosomes. The left panel shows the amount of repeats as a percentage of total
DNA. The right panel, on a different scale, shows the results for various MC DNA samples. Line 501 was a culture derived from a 50-year-old cryopreserved sample of Lister
427 [16] that is progenitor of all Lister 427 lines used here and elsewhere. 501 cl2 is a further subclone of 501. Samples labeled SM are the BF ‘single marker’ line [35]; the
null
suffixes indicate different branches of this line. All PF samples were prepared in the laboratory of BW. The t46 sample is from the TERT line, at an intermediate stage of
null
telomere loss, as are the five t7 subclones. Sample T7-2 comes from TERT cells that have been maintained to the point where they have stable telomeres of minimum size [55].
68 G.A.M. Cross et al. / Molecular & Biochemical Parasitology 195 (2014) 59–73
3.11. Quantification of minichromosomes 177-bp repeats that are present on MC. Minichromosomal VSGs
were identified by sequencing MC from PFGE separations of
MC DNA was isolated from five T. brucei Lister 427 lines: a BF chromosomal DNA from two distantly separated PF and BF lines,
SM cell line; two distantly related PF cell lines; and three subclones and mapping the reads to the VSGnome. The purity of the MC
made from one of these PF lines. The DNA sequences from these fraction, defined by the absence of DNA fragments from larger
cell lines were mapped against the Lister 427 VSGnome, and also chromosomes, is exemplified by the low abundance of DNA hits
against the TREU927 genome, to gauge the (low) level of contam- from this fraction against VSGs that are known to reside on large
ination by fragmentation of MBC, as exemplified by the values for chromosomes (for example, the well defined ES VSGs [46]), and
tubulin and rRNA genes shown in Supplementary Table 1. the low representation of repeated genes such as ␣-tubulin and
177-bp repeats comprised ∼5.4% in four samples of the Lister the ribosomal RNA promoter (Supplementary Table 1).
427 DNA (Fig. 8 and Supplementary Table 1), equivalent to 4.2 Mb MC DNA from PF and BF cells was also assembled de novo, after
of 78 Mb of nuclear DNA. Because there is a little heterogeneity removing repetitive sequences, which constituted a large fraction
in the sequence of known 177-bp repeats [56], the quantification of the DNA of purified MC, and experimenting with different assem-
was determined using 0–3 (3 being the maximum allowed in the bler parameters. The rationale for doing this was that we would be
Bowtie software) mismatches. The number of matches approached able to complete some VSGs that were previously only partially
a plateau when three mismatches were allowed, suggesting that assembled, because of the lower complexity of the MC fraction.
most of the 177-bp repeats have been included in the final value, Indeed, the ORFs for nine MC VSGs that were previously truncated
although there could be a minor population of repeats (for which were now assembled into complete VSGs, raising the number of
we have no evidence: these repeats appear to be well conserved complete MC VSGs to 50, from a total of 67 distinct MC VSGs. As for
among strains and subspecies) that differ by >3%. From fine- BES and MES VSGs, representatives of all VSG types are present on
resolution mapping studies, it has been estimated [56] that 177-bp MCs (Fig. 6) with no evidence of systematic bias. However, the pro-
repeats comprise about 55% of the average MC, so the aggregate MC portion of complete VSGs is much higher in the MC set compared
length would be about 7.6 Mb. Assuming a mean MC size of 75 kbp, with the genome as a whole (74% and 14%, respectively). Hence
the repeat quantification suggests that there are about 96 MC per MCs possess a repertoire of VSGs that is distinct from those in the
nucleus (after correcting for the ∼5 intermediate chromosomes chromosomal internal arrays.
(IC), which also contain 177-bp repeats). Because some VSGs that By DNAseq mapping, 66 copies of 52 distinct VSGs were identi-
are only present on MC have only one copy per nucleus and we fied in the BF MC sample, and 64 copies of 52 VSGs in the PF sample.
found total of ∼65 MC-associated VSGs (see below), not all MCs con- This shows that there must be at least 33 MC, if all of them had a
tain a VSG and MC-VSGs must be also essentially haploid, similar to VSG at each end. Other calculations (see below) suggest there are
BES-VSGs. approximately 96 MC, in agreement with estimates of ∼100 made
This estimate of the number of MC, while subject to the assump- from PFGEs [11] which would mean that approximately one third
tion of their average length, which is based on high-resolution of MC ends contain VSGs.
pulsed-field gel electrophoresis (PFGE) analysis [56], is consistent 18 copies of 16 VSGs were found in the MC of the BF sample that
with the often cited “rough estimate” [11] of ∼100 MC, ranging from were not in the PF sample (Fig. 9). Conversely, 15 copies of 15 VSGs
50 to 150 kbp. If the amount of 177-bp repeats mirrors the num- were found in the PF sample but not in BF samples. VSG 17, which
ber of MC, strains TREU927 and T. brucei gambiense Dal972 would exists as a single copy on MC in a PF sample from the Wickstead lab,
have about 60 MC, which is consistent with PFGE analyses show- is on a 180-Kb IC BES in the BF SM sample from the Cross & Rudenko
ing that several T. brucei gambiense isolates, including Dal972, have labs [46]. VSG 19, which is present only in a >3.1 Mb BES from the
fewer MC than T. brucei isolates [57–59], and is consistent with our Cross & Rudenko labs [46] is present as two MC copies in PF and BF
estimates from PFGE. SM lines sequenced in the present study. Moreover, sub-cloning a
The increase in the percentage of 177-bp repeats in total PF line substantially alters the MC VSG profile (Supplementary Fig.
null
genomic DNA after subcloning the TERT line when its telomeres 11). These data suggest that the MC VSG repertoire is particularly
had not quite reached their minimum length (Fig. 8, samples 7 cl1 dynamic, whereas the number of MC, as judged by the comparable
to cl5) is unexplained, but possibly attributable to elevated levels amount of 177-bp repeat (Fig. 8 and Supplementary Table 1), is
of recombination among MCs as telomere shortening approaches a not so variable, suggesting that subtelomeric regions of MCs are
critical point, similar to the observation that recombinational VSG frequently recombining.
null
switching increases in TERT cells with short telomeres [54,60]. Independent confirmation that the calculated copy numbers
It could represent circularization or fusion of 177-bp sequences truly reflect the distribution of sequence reads across the MC VSGs
as MC destabilize, and could be related to the unexplained new is shown for a small region of the MC VSGnome by displaying the
band of likely circular DNA that was noted previously during gel alignments with the Integrative Genome Viewer [62,63] (Supple-
electrophoresis of DNA from cells with short telomeres [55]. mentary Fig. 12).
null
The two copies of the rRNA promoter previously reported in two In BF cells grown in the absence of telomerase (TERT ) until
MC [61], were also found by DNAseq of the PF MC fraction, but were telomeres had stabilized at the apparent minimum sustainable
absent from MC prepared from the BF SM cell line (Supplementary length [55], quantification of the 177-bp repeats suggested that
Table 1). two thirds of the MC had been lost (Supplementary Table 1), and we
could determine that, for 26 VSGs that were found exclusively on
3.12. Identification and stability of minichromosomal VSGs MC, 32 copies of these 26 VSGs had been lost, amounting to about
50% of the MC VSG repertoire of the parental line (Supplementary
Only seventeen full-length MC have been characterized in any Fig. 13). The number of MC copies lost will be slightly larger, as the
detail in T. brucei [56]. They have a uniform organization, consisting loss of some multi-copy VSGs found on both MC and MBC or IC was
of a palindromic region of 177-bp repeats flanked by non-repetitive not be assumed to represent loss of the MC copies. Also included
regions, suggesting that most or all might have a VSG gene at one or in Supplementary Fig. 13 are the 50 most abundant non-MC VSGs
null
both ends (only two VSGs were specifically identified). Attempts to in the TERT clone; no significant changes are seen in these VSGs
null
identify MC VSGs bioinformatically, based on their association with in the TERT line, probably because most are telomere-distal. In
177-bp repeat sequences, were unsuccessful, probably because a few cases, VSG abundance changes. This could reflect destabiliza-
of difficulties in assembling the large runs of almost identical tion of repeated sequences (note the variation in copies of tubulin
G.A.M. Cross et al. / Molecular & Biochemical Parasitology 195 (2014) 59–73 69
10
9
8
7
MC BF 6 MC PF
5 COPIES
4
3
2
1
0
23 30 25 31 19 17 16 22 392 645 514 503 600 476 618 622 637 444 646 371 430 424 671 649 672 663 416 365 629 647 620 541 377 717 542 643 634 575 567 315 507 369 374 666 1117 2114 2117 1621 2075 1712 3327 1640 3562 2096 1748 1440 1638 1706 2348 2057 3338 1849 1490 1503
VSG 1441 3340 3777 1963 3406
Fig. 9. Abundance of minichromosomal VSGs in diverged BF and PF cell lines. The data are shown as a stacked histogram (blue, PF; red BF) arranged in order of VSG abundance
in PF. Not all copies of multi-copy VSGs are necessarily on MCs.
and rRNA promoters among some lines in Supplementary Table overall number of assembled VSG sequences was much smaller
1) in the absence of TERT and/or as a consequence of subcloning or than might be expected to exist. Although a BAC-based sequencing
otherwise stressed culture conditions. Of the 14 identified BES VSGs approach might be ideally suited for assembling VSGs, which can
null
[46], only VSG 2 was missing from the TERT line, which expresses exist as linked or scattered genes with high similarity, this is
VSG3 from the BES previously occupied by VSG2. The active ES is very unlikely to happen because of the very high cost of such a
especially unstable in the absence of telomerase activity [60]. Both project. Meanwhile, the evolution of HTS methods will continue
copies of VSG 19, which are on a MC in the sequenced line but pre- to improve the ability to assemble large regions of contiguous
viously found only in a >3.1 Mb BES [46], were missing from the sequence, although homolog switching will still prevent the
null
TERT line. assembly of individual chromosomes from a diploid organism.
One MC contig contained a glycosyltransferase (probably galac- The main advantages of current-generation HTS approaches,
tosyl, or N-acetyl-glucosamine glycosyl transferase) gene <500 bp which have further increased since this project was initiated, are
upstream of the VSG (VSG-392). The association of this family of cost and speed, which allow one to cheaply obtain and compare
glycosyl transferases with VSG arrays has been noted previously sequence data from many strains and clones. However, the main
[10]. disadvantage of the HTS approaches available to this project was
the difficulty of correctly assembling reasonably long contigs from
4. Discussion short reads. This is particularly true for the several thousand VSG
genes or partial genes in the T. brucei genome, where many are
There are advantages and disadvantages to attempting almost identical from end to end and almost no genes are unique
to assemble the VSG repertoire using either Sanger or next- throughout their length. Three independent parameters suggest
generation high-throughput sequencing (HTS) methods. In the that the 559 essentially full-length VSGs identified in the current
T. brucei genome project, chromosomes 1, 9, 10 and 11 were study underestimates the total that are full length by as much as
sequenced from plasmid libraries prepared from chromosomal 20%. First, 6 of the 26 previously sequenced Lister 427 full-length
DNA bands separated by PFGE. Chromosomes 2–8 were assembled VSGs available in GenBank were only present as partial sequences in
by sequencing tiled Bacterial Artificial Chromosomes (BACs). The the de-novo assembly (these were not re-annotated or included in
first approach provides the sequence of only one chromosome the GenBank submission), most probably because they contained
homolog and the second approach results in the joining of BACs regions of identical sequence shared with at least one other VSG.
derived from either of the two homologs, resulting in a pseudo- Secondly, de-novo assembly of 100-bp paired-end reads performed
chromosome sequence. None of the libraries included MC, and the on a PF MC fraction resulted in the upgrading to full-length of 9
70 G.A.M. Cross et al. / Molecular & Biochemical Parasitology 195 (2014) 59–73
VSGs that were previously incomplete, in a total of 52 MC VSGs, attributable to the loss of a full MC. Based on the abundance of
presumably because the MC fraction contained few highly similar the 177-bp repeat, our data are consistent with previous studies
VSG sequences that would have prevented the assembly of com- suggesting that T. b. gambiense contain fewer MC [57–59].
plete sequences. Thirdly, 117 of the curated partial VSG sequences Our observations suggest that stressed culture conditions or
started or ended at the beginning or end of their respective con- cloning, which itself might be stressful (perhaps especially with PF
tigs. This number includes 58 of 945 partial N-terminal sequences [65]), can lead to loss of telomeric repeats that is not attributable to
that ran to the 3 end of their contigs (contig length ranged from loss of MCs. Indeed, we saw evidence for changes in other repetitive
849 to 12,727 bp) and might therefore have become complete if the sequences, including the number of arrayed tubulin, histone and
contigs had extended further. slRNA genes, for example (GAMC unpublished data), in different
As also noted in a previous study [12], we found about four times lines and subclones. It has been reported, recently, that double-
as many partial VSG sequences encoding VSG N-terminal domains strand breaks occur naturally at silent and transcribed telomeres
than sequences encoding C-terminal domains. Of special interest [66]. Such breaks probably contribute to the instability of the sub-
were some contigs containing several VSGs. Contigs encoding many telomeric VSG repertoire.
N-terminal domains may reflect the evolution of T. brucei VSGs from Strikingly, and surprisingly for the most highly-expressed gene
shorter ones in other species, as suggested in a major analysis of VSG in the genome, our analysis of codon usage shows that the VSG-
sequences from several trypanosome species [44]. nome of T. brucei has a strong codon-usage bias that is distinct
The number of VSG copies was determined by aligning 100-bp from other highly expressed genes. This bias is the same in both
DNA sequences against the compiled database of VSG genes and known expressed VSGs and in those that do not encode a func-
counting the number of sequences that aligned to each VSG, after tional VSG, but is not seen in other BES-associated genes. It is not
compensating for non-uniquely aligning sequences. Although most currently known what causes this bias, but it may be a signature
VSGs are present as single (haploid) genes, About 360 exist as two for coat genes across African trypanosomes, as it is also seen in the
or more identical (within the 2% mismatch allowance used to map epimastigote coat BARP genes and also VSG-like gene families in T.
the DNA sequences) copies, something that has not been evident in congolense and T. vivax. It is possible that non-optimal codon usage
previous studies. From the present study, we cannot say whether may slow translation of surface proteins in African trypanosomes
multiple identical VSG copies are linked in the genome, because and so improve folding and/or post-translational modification, as
these would probably not have assembled independently using the has recently been described for the FRQ protein in Neurospora [67].
current technology. Another recent study has shown that the presence of rare codons
Different strains of T. brucei vary in their total nuclear DNA con- at the N-terminal end of coding sequences results in higher protein
tent, as estimated by PFGE of chromosomal DNA, ranging from expression, and that the effect correlated best with a calculated
62 Mb in the ‘official genome strain’ TREU927 to 79.4 in strain 247 reduction in mRNA secondary structure, but that this was probably
[6]. The lengths of the 11 fully sequenced megabase chromosomes not the complete explanation [68].
of TREU927 add up to 26.3 Mb, remarkably close to the PFGE- Another functionally important subset of VSGs are those
estimated diploid MBC content of 53.4 Mb [6]. The corresponding expressed in metacyclic forms. The current assembly identified six
value for Lister 427 chromosomes was 70 Mb by PFGE analysis, contigs containing metacyclic promoters with full-length VSGs. One
leading to an estimated total nuclear DNA content of approximately of these was atypical but the others are likely to be true subtelom-
78 Mb, when intermediate-sized and 50–100 MC of 50–100 Kb are eric metacyclic ESs. This was confirmed in a study that showed, in
included [7]. This is in agreement with the cytophotometrically contrast to general prejudice, that PF of Lister 427 can be induced
determined nuclear content of 95 fg DNA [27], equivalent to 76 Mb to develop into epimastigote and mouse-infective metacyclic life-
[6]. cycle stages in vitro, where they express five VSGs whose cDNA
There were originally estimated to be about 100 MC in the Lis- sequences were identical to the assembled genomic copies [13].
ter 427 strain [11]. Despite the authors’ concern about “Obvious Our analysis of N-terminal domains suggests that the metacyclic
uncertainties in this calculation. . .” this value is in close agreement VSGs (and also those found in minichromosomes) are no different
with our independent estimate made from HTS read-depth. More from the general VSG population.
significantly, MCs have been widely suggested to be repositories BES-VSGs and many VSGs in arrays are flanked upstream by 70-
of VSGs, and therefore an important contributor the expressed VSG bp repeats, which are separated from the VSG itself by a region of
repertoire in the early stages of an infection, by having high prob- apparently unconserved sequence, called the co-transposed region
ability of transposition to the BES [4]. The current study suggests, (CTR) [69]. The CTR contains no obvious short motifs or other con-
however, that not all MC harbor VSGs, and certainly not at each served characteristics or defined function, although inadvertent or
end, as has been widely assumed. Only about one third of MC ends calculated manipulation of the CTR of the active VSG can affect
contain VSGs, but 15–18 of these VSGs were unique to one of two VSG switching [70], and transcripts from this region are processed
distinct lines of Lister 427, suggesting a dynamic flux of VSGs in and associated with polyribosomes [71]. Alignments of the CTRs
MC. It is striking that 85% of the MC VSGs were complete, com- associated with many complete VSG genes in the current study
pared to 20% in the overall genome. This draws attention to the did not illuminate their structure or function; predicted CTR sec-
unresolved question of the origin of MC; whether they are formed ondary structure was not investigated. Recombination at the 3 end
during displacement of a currently expressed and functionally val- of the VSG can be almost anywhere within the highly conserved
idated VSG during recombinatorial switching. VSGs from MC and sequences within the C-terminal coding sequence or immediately
BES are expressed early in an infection, presumably because they downstream, and point mutations occur during this event [9,72].
are more likely to recombine into the active ES because of the extent The role of the invariant 14-mer in the 3 UTR is unknown.
of their common flanking sequences [57–59]. We did not attempt to predict splicing and polyadenylation sites
By following the loss of three MC that had been tagged with a for the VSG repertoire. From previous sequencing of VSG mRNAs
selectable marker, MC have been reported to be quite stable (at in several laboratories, it is clear that both UTRs are usually very
−3
least in PF cells), being lost at a rate of <10 per cell per genera- short (∼50 bp), but there are exceptions. As noted elsewhere [73],
tion [64]. Unfortunately, we do not know how many cell divisions the VSG 3 splice site frequently does not conform to the consensus
separate the PF and BF samples examined here, which involved cell motifs (often lacking a clear poly-pyrimidine region) that are com-
lines whose common origin was at least 35 years ago, so it is not mon to most genes and which appear to strongly influence splicing
possible to estimate the rates of VSG loss or the number which are efficiency and, hence, protein expression.
G.A.M. Cross et al. / Molecular & Biochemical Parasitology 195 (2014) 59–73 71
The current study was initiated to facilitate the identification of Appendix A. Supplementary data
VSGs expressed in populations arising during propagation of Lister
427 under a variety of conditions, especially in vivo and in experi- Supplementary data associated with this article can be found,
ments in which various manipulations affecting VSG recombination in the online version, at http://dx.doi.org/10.1016/j.molbiopara.
frequency, for example, lead to VSG switching. Lister 427 has the 2014.06.004.
reputation of having a very low VSG switching frequency (less than
−5
10 detected switches per cell generation [54,74–76]) and this
probably results from its long-ago selection for virulent growth in References
laboratory rodents, and/or because the variants that do emerge as
[1] Franke E. Ueber trypanosomentherapie. Muenchener Medizinische Wochen-
switched populations are so virulent that more frequent switches
schrift 1905;42:2059–60.
that result in a growth disadvantage are not detected. Because of its [2] Morrison LJ, Marcello L, Mcculloch R. Antigenic variation in the African try-
virulence, mouse infections with Lister 427 rarely exceed 14 days, panosome: molecular mechanisms and phenotypic complexity. Cell Microbiol
2009;11:1724–34.
and, although gene manipulation could be used to generate attenu-
[3] Horn D, Mcculloch R. Molecular mechanisms underlying the control of anti-
ated lines, our studies have focused on switching among VSG genes
genic variation in African trypanosomes. Curr Opin Microbiol 2010;13:700–5.
expressed in the early acute infection. The switches that we have [4] Barry JD, Hall JPJ, Plenderleith L. Genome hyperevolution and the success of a
parasite. Ann N Y Acad Sci 2012;1267:11–7.
characterized in recent experiments [54,76] have all resulted in the
[5] Van Der Ploeg LHT, Valerio D, De Lange T, Bernards A, Borst P, Grosveld FG. An
expression of complete pre-existing VSGs. A recent major study [9]
analysis of cosmid clones of nuclear DNA from Trypanosoma brucei shows that
shows that, in a more natural chronic infection with the TREU927 the genes for variant surface glycoproteins are clustered in the genome. Nucl
Acids Res 1982;10:5905–23.
strain, the majority of expressed VSGs are mosaics that are con-
[6] Melville SE, Leech V, Gerrard CS, Tait A, Blackwell JM. The molecular karyotype
structed by the segmental gene conversion between two or more
of the megabase chromosomes of Trypanosoma brucei and the assignment of
pseudogenes concomitantly with VSG switching. When a few of chromosome markers. Mol Biochem Parasitol 1998;94:155–73.
[7] Melville SE, Leech V, Navarro M, Cross GAM. The molecular karyotype of the
the mosaics were tested as transgenes in Lister 427, one resulted in
megabase chromosomes of Trypanosoma brucei stock 427. Mol Biochem Para-
reduced virulence. A priori, it seems likely that many of the mosaics
sitol 2000;111:261–73.
generated concomitantly with activation will be imperfect and will [8] Callejas S, Leech V, Reitter C, Melville S. Hemizygous subtelomeres of an African
trypanosome chromosome may account for over 75% of chromosome length.
therefore compromise cell viability or virulence when moved into a
Genome Res 2006;16:1109–18.
BES, replacing the currently expressed VSG. In a large trypanosome
[9] Hall JPJ, Wang H, Barry JD. Mosaic VSGs and the scale of Trypanosoma brucei
population this will not result in clearance of the infection, so long antigenic variation. PLoS Pathogens 2013;9:e1003502.
as the population stays ahead of the immune response. [10] Berriman M, Ghedin E, Hertz-Fowler C, Blandin G, Renauld H, Bartholomeu
DC, et al. The genome of the African trypanosome Trypanosoma brucei. Science
From examination of VSG sequences gathered in the current
2005;309:416–22.
study and previously, it is clear that few of them are unique
[11] Van Der Ploeg LHT, Schwartz DC, Cantor CR, Borst P. Antigenic variation in
throughout their sequence. VSG sequences are generally very diver- Trypanosoma brucei analysed by electrophoretic separation of chromosome-
sized DNA molecules. Cell 1984;37:77–84.
gent between Lister 427 and TREU927 strains of T. brucei, but those
[12] Marcello L, Barry JD. Analysis of the VSG gene silent archive in Trypanosoma
that are highly similar are not found in ‘protected’ genomic envi-
brucei reveals that mosaic gene expression is prominent in antigenic variation
ronments, but may reflect genetic exchange among populations. and is favored by archive substructure. Genome Res 2007;17:1344–52.
[13] Kolev NG, Ramey-Butler K, Cross GAM, Ullu E, Tschudi C. Developmental pro-
Most VSGs, therefore, are probably mosaics that have evolved, and
gression to infectivity in Trypanosoma brucei triggered by an RNA-binding
continue to evolve, by mutation and recombination, probably at
protein. Science 2012;338:1352–3.
far greater rates than the general rate of gene mutation in try- [14] Hirumi H, Hirumi K. Continuous cultivation of Trypanosoma brucei bloodstream
forms in a medium containing a low concentration of serum protein without
panosomes. The mechanisms responsible for this rapid evolution
feeder cell layers. J Parasitol 1989;75:985–9.
of the VSG repertoire are not known, although it has been shown
[15] Brun R, Schonenberger M. Cultivation and in vitro cloning of procyclic cul-
that only very short regions of sequence identity are required for ture forms of Trypanosoma brucei in a semi-defined medium. Acta Trop
1979;36:289–92.
efficient homologous recombination in T. brucei [3,77–79]. Some of
[16] Peacock L, Ferris V, Bailey M, Gibson W. Fly transmission and mating of Try-
the isolated mosaic sequences show evidence of additional point
panosoma brucei brucei strain 427. Mol Biochem Parasitol 2008;160:100–6.
mutations, which resurrects the idea that VSGs can undergo somatic [17] Serwer P. Gel electrophoresis with discontinuous rotation of the gel: an alter-
hypermutation, by an unknown mechanism, on their way to being native to gel electrophoresis with changing direction of the electrical field.
Electrophoresis 1987;8:301–4.
expressed [80,81]. The information gained from this study will
[18] Delcher AL, Harmon D, Kasif S, White O, Salzberg SL. Improved microbial gene
greatly facilitate ongoing studies of VSG expression and switching.
identification with GLIMMER. Nucl Acids Res 1999;27:4636–41.
[19] Lukashin A, Borodovsky M. GeneMark.hmm: new solutions for gene finding.
Nucl Acids Res 1998;26:1107–15.
Funding [20] Min XJ, Butler G, Storms R, Tsang A, OrfPredictor:. predicting protein-coding
regions in EST-derived sequences. Nucl Acids Res 2005;33:W677–80.
[21] Nielsen H, Engelbrecht J, Brunak S, Von Heijne G. Identification of prokaryotic
The work was supported by a BBSRC New Investigator award to
and eukaryotic signal peptides and prediction of their cleavage sites. Protein
B.W. and by grant R01 AI021729 to G.A.M.C. from the U.S. National Eng 1997;10:1–6.
[22] Nielsen H, Krogh A. Prediction of signal peptides and signal anchors by a hidden
Institutes of Health (NIH). The content is solely the responsibility
Markov model. Proc Int Conf Intell Syst Mol Biol 1998;6:122–30.
of the authors and does not necessarily represent the official views
[23] Bininda-Emonds OR. transAlign: using amino acids to facilitate the multiple
of the National Institute of Allergy and Infectious Diseases (NIAID) alignment of protein-coding DNA sequences. BMC Bioinformatics 2005;6:156.
or the NIH. [24] Zhang Z, Li J, Zhao XQ, Wang J, Wong GK, Yu J. KaKs Calculator: calculating Ka
and Ks through model selection and model averaging. Genomics Proteomics
Bioinformatics 2006;4:259–63.
Acknowledgements [25] Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient
alignment of short DNA sequences to the human genome. Genome Biol
2009;10:R25.
We thank Christian Tschudi & Elisabetta Ullu for sharing RNAseq [26] Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer J, et al.
BLAST+: architecture and applications. BMC Bioinformatics 2009;10:421.
data that led to the recognition and removal of RHS sequence
[27] Borst P, Van Der Ploeg LHT, Van Hoek JFM, Tas J, James J. On the DNA content
elements that were fused to VSG translations, to the staff of the
and ploidy of trypanosomes. Mol Biochem Parasitol 1982;6:13–23.
Genomics Resource Center of The Rockefeller University and the [28] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped
Genomics Core Lab at Memorial Sloan Kettering Cancer Center, and BLAST and PSI-BLAST: a new generation of protein database search programs.
Nucl Acids Res 1997;25:3389–402.
to members of Nina Papavasiliou’s laboratory at The Rockefeller
[29] Bringaud F, Biteau N, Melville SE, Hez S, El-Sayed NM, Leech V, et al. A new,
University for their comments and suggestions during this study.
expressed multigene family containing a hot spot for insertion of retroelements
72 G.A.M. Cross et al. / Molecular & Biochemical Parasitology 195 (2014) 59–73
is associated with polymorphic subtelomeric regions of Trypanosoma brucei. [60] Dreesen O, Cross GAM. Consequences of telomere shortening of an active
Euk Cell 2002;1:137–51. VSG expression site in telomerase-deficient Trypanosoma brucei. Eukaryot Cell
[30] Bringaud F, Biteau N, Zuiderwijk E, Berriman M, El-Sayed NM, Ghedin E, et al. 2006;5:2114–9.
The ingi and RIME non-LTR retrotransposons are not randomly distributed in [61] Zomerdijk JCBM, Kieft R, Borst P. A ribosomal RNA gene promoter at the
the genome of Trypanosoma brucei. Mol Biol Evol 2004;21:520–8. telomere of a mini-chromosome in Trypanosoma brucei. Nucleic Acids Res
[31] Hoeijmakers JHJ, Frasch ACC, Bernards A, Borst P, Cross GAM. Novel expression- 1992;20:2725–34.
linked copies of the genes for variant surface antigens in trypanosomes. Nature [62] Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G,
1980;284:78–80. et al. Integrative genomics viewer. Nat Biotechnol 2011;29:24–6.
[32] Beals TP, Boothroyd JC. Genomic organization and context of a trypanosome [63] Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV):
variant surface glycoprotein gene family. J Mol Biol 1992;225:961–71. high-performance genomics data visualization and exploration. Brief Bioin-
[33] Beals TP, Boothroyd JC. Sequence divergence among members of a trypanosome form 2012, http://dx.doi.org/10.1093/bib/bbs017.
variant surface glycoprotein gene family. J Mol Biol 1992;225:973–83.
[64] Wickstead B, Ersfeld K, Gull K. The mitotic stability of the minichromosomes of
[34] Field MC, Boothroyd JC. Sequence divergence in a family of variant surface
Trypanosoma brucei. Mol Biochem Parasitol 2003;132:97–100.
glycoprotein genes from trypanosomes: coding region hypervariability and
[65] Klingbeil MM, Motyka SA, Englund PT. Multiple mitochondrial DNA poly-
downstream recombinogenic repeats. J Mol Evol 1996;42:500–11.
merases in Trypanosoma brucei. Mol Cell 2002;10:175–86.
[35] Wirtz E, Leal S, Ochatt C, Cross GAM. A tightly regulated inducible expression
[66] Glover L, Alsford S, Horn D. DNA break site at fragile subtelomeres determines
system for dominant negative approaches in Trypanosoma brucei. Mol Biochem
probability and mechanism of antigenic variation in African trypanosomes.
Parasitol 1999;99:89–101.
PLoS Pathogens 2013;9:e1003260.
[36] Doyle JJ, Hirumi H, Hirumi K, Lupton EN, Cross GAM. Antigenic variation in
[67] Zhou M, Guo J, Cha J, Chae M, Chen S, Barral JM, et al. Non-optimal codon
clones of animal-infective Trypanosoma brucei derived and maintained in vitro.
usage affects expression, structure and function of clock protein FRQ. Nature
Parasitology 1980;80:359–69. 2013;495:111–5.
[37] Aline Jr RF, Macdonald G, Brown E, Allison J, Myler P, Rothwell V, et al. (TAA)n
[68] Goodman DB, Church GM, Kosuri S. Causes and effects of N-terminal codon bias
within sequences flanking several intrachromosomal variant surface glycopro-
in bacterial genes. Science 2013;342:475–9.
tein genes in Trypanosoma brucei. Nucl Acids Res 1985;13:3161–77.
[69] Liu AYC, Van Der Ploeg LHT, Rijsewijk FAM, Borst P. The transposition unit
[38] Majumder HK, Boothroyd JC, Weber H. Homologous 3 -terminal regions of
of VSG gene 118 of Trypanosoma brucei: presence of repeated elements at its
mRNAs for surface antigens of different antigenic variants of Trypanosoma
border and absence of promoter associated sequences. J Mol Biol 1983;167:
brucei. Nucl Acids Res 1981;9:4745–53. 57–75.
[39] Carrington M, Miller N, Blum M, Roditi I, Wiley D, Turner MJ. Variant
[70] Davies KP, Carruthers VB, Cross GAM. Manipulation of the VSG co-transposed
specific glycoprotein of Trypanosoma brucei consists of two domains each
region increases expression-site switching in Trypanosoma brucei. Mol Biochem
having an independently conserved pattern of cysteine residues. J Mol Biol Parasitol 1997;86:163–77.
1991;221:823–35.
[71] Aline RF, Scholler JK, Stuart K. Transcripts from the co-transposed segment of
[40] Marcello L, Menon S, Ward P, Wilkes JM, Jones NG, Carrington M, et al. VSGdb:
variant surface glycoprotein genes are in Trypanosoma brucei polyribosomes.
a database for trypanosome variant surface glycoproteins, a large and diverse
Mol Biochem Parasitol 1989;32:169–78.
family of coiled coil proteins. BMC Bioinformatics 2007;8:143.
[72] Bernards A, Van Der Ploeg LHT, Frasch ACC, Borst P, Boothroyd JC, Cole-
[41] Cross GAM, Identification. purification and properties of clone-specific gly-
man S, et al. Activation of trypanosome surface glycoprotein genes involves
coprotein antigens constituting the surface coat of Trypanosoma brucei.
a gene duplication-transposition leading to an altered 3 end. Cell 1981;27:
Parasitology 1975;71:393–417. 497–505.
[42] Horn D. Codon usage suggests that translational selection has a major impact
[73] Siegel TN, Tan KSW, Cross GAM. A systematic study of sequence motifs for RNA
on protein expression in trypanosomatids. BMC Genomics 2008;9:2.
trans-splicing in Trypanosoma brucei. Mol Cell Biol 2005;25:9586–94.
[43] Wickstead B, Gull K. Dyneins across eukaryotes: a comparative genomic anal-
[74] Lamont GS, Tucker RS, Cross GAM. Analysis of antigen switching rates in Try-
ysis. Traffic 2007;8:1708–21.
panosoma brucei. Parasitology 1986;92:355–67.
[44] Jackson AP, Berry A, Aslett M, Allison HC, Burton P, Vavrova-Anderson J, et al.
[75] Horn D, Cross GAM. Analysis of Trypanosoma brucei VSG expression site switch-
Antigenic diversity is generated by distinct evolutionary mechanisms in African
ing in vitro. Mol Biochem Parasitol 1997;84:189–201.
trypanosome species. Proc Natl Acad Sci USA 2012;109:3416–21.
[76] Boothroyd CE, Dreesen O, Leonova T, Ly KI, Figueiredo LM, Cross GAM, et al.
[45] Weirather JL, Wilson ME, Donelson JE. Mapping of VSG similarities in Try-
A yeast-endonuclease-generated DNA break induces antigenic switching in
panosoma brucei. Mol Biochem Parasitol 2012;181:141–52.
Trypanosoma brucei. Nature 2009;459:278–81.
[46] Hertz-Fowler C, Figueiredo LM, Quail MA, Becker M, Jackson A, Bason N, et al.
[77] Barnes RL, Mcculloch R. Trypanosoma brucei homologous recombination is
Telomeric expression sites are highly conserved in Trypanosoma brucei. PLoS
dependent on substrate length and homology, though displays a differential
ONE 2008;3:e3527.
dependence on mismatch repair as substrate length decreases. Nucl Acids Res
[47] Lenardo MJ, Rice-Ficht AC, Kelly G, Esser KM, Donelson JE. Characterization 2007;35:3478–93.
of the genes specifying two metacyclic variable antigen types in Trypanosoma
[78] Glover L, Mcculloch R, Horn D. Sequence homology and microhomology dom-
brucei rhodesiense. Proc Natl Acad Sci USA 1984;81:6642–6.
inate chromosomal double-strand break repair in African trypanosomes. Nucl
[48] Matthews KR, Shiels PG, Graham SV, Cowan C, Barry JD. Duplicative activation
Acids Res 2008;36:2608–18.
mechanisms of two trypanosome telomeric VSG genes with structurally simple
[79] Glover L, Jun J, Horn D. Microhomology-mediated deletion and gene conversion
5 flanks. Nucl Acids Res 1990;18:7219–27.
in African trypanosomes. Nucl Acids Res 2011;39:1372–80.
[49] Barry JD, Graham SV, Fotheringham M, Graham VS, Kobryn K, Wymer B. VSG
[80] Lu Y, Hall T, Gay LS, Donelson JE. Point mutations are associated with a gene
gene control and infectivity strategy of metacyclic stage Trypanosoma brucei.
duplication leading to the bloodstream reexpression of a trypanosome meta-
Mol Biochem Parasitol 1998;91:93–105.
cyclic VSG. Cell 1993;72:397–406.
[50] Pedram M, Donelson JE. The anatomy and transcription of a monocistronic
[81] Lu Y, Alarcon CM, Hall T, Reddy LV, Donelson JE. A strand bias occurs in point
expression site for a metacyclic variant surface glycoprotein gene in Try-
mutations associated with variant surface glycoprotein gene conversion in Try-
panosoma brucei. J Biol Chem 1999;274:16876–83.
panosoma rhodesiense. Mol Cell Biol 1994;14:3971–80.
[51] Mcandrew M, Graham S, Hartmann C, Clayton C. Testing promoter activity
[82] Kolev NG, Franklin JB, Carmi S, Shi HF, Michaeli S, Tschudi C. The transcriptome
in the trypanosome genome: isolation of a metacyclic-type VSG promoter
of the human pathogen Trypanosoma brucei at single-nucleotide resolution.
and unexpected insights into RNA polymerase II transcription. Exp Parasitol
PLoS Pathogens 2010;6:e1001090.
1998;90:65–76.
[83] Murphy NB, Pays A, Tebabi P, Coquelet H, Gyaux M, Steinert M, et al. Try-
[52] Bringaud F, Biteau N, Donelson JE, Baltz T. Conservation of metacyclic variant
panosoma brucei repeated element with unusual structural and transcriptional
surface glycoprotein expression sites among different trypanosome isolates.
properties. J Mol Biol 1987;195:855–71.
Mol Biochem Parasitol 2001;113:67–78.
[84] Lacount DJ, El-Sayed NM, Kaul S, Wanless D, Turner CM, Donelson JE. Analysis
[53] Dreesen O, Li B, Cross GAM. Telomere structure and function in trypanosomes:
of a donor gene region for a variant surface glycoprotein and its expression site
a proposal. Nat Rev Microbiol 2007;5:70–5.
in African trypanosomes. Nucl Acids Res 2001;29:2012–9.
[54] Hovel-Miner GA, Boothroyd CE, Mugnier M, Dreesen O, Cross GAM, Papavasil-
[85] Graham SV, Terry S, Barry JD. A structural and transcription pattern for
iou FN. Telomere length affects the frequency and mechanism of antigenic
variant surface glycoprotein gene expression sites used in metacyclic stage
variation in Trypanosoma brucei. PLoS Pathogens 2012;8:e1002900.
Trypanosoma brucei. Mol Biochem Parasitol 1999;103:141–54.
[55] Dreesen O, Cross GAM. Telomerase-independent stabilization of short telom-
[86] Ginger ML, Blundell PA, Lewis AM, Browitt A, Gunzl A, Barry JD. Ex vivo and
eres in Trypanosoma brucei. Mol Cell Biol 2006;26:4911–9.
in vitro identification of a consensus promoter for VSG genes expressed by
[56] Wickstead B, Ersfeld K, Gull K. The small chromosomes of Trypanosoma brucei
metacyclic-stage trypanosomes in the tsetse fly. Euk Cell 2002;1:1000–9.
involved in antigenic variation are constructed around repetitive palindromes.
[87] Alarcon CM, Son HJ, Hall T, Donelson JE. A monocistronic transcript for a try-
Genome Res 2004;14:1014–24.
panosome variant surface glycoprotein. Mol Cell Biol 1994;14:5579–91.
[57] Gibson WC, Borst P. Size-fractionation of the small chromosomes of Try-
[88] Nagoshi YL, Alarcon CM, Donelson JE. The putative promoter for a meta-
panozoon and Nannomonas trypanosomes by pulsed field gradient gel
cyclic VSG gene in African trypanosomes. Mol Biochem Parasitol 1995;72:
electrophoresis. Mol Biochem Parasitol 1986;18:127–41. 33–45.
[58] Dero B, Zampetti-Bosseler F, Pays E, Steinert M. The genome and the antigen
[89] Kim KS, Donelson JE. Co-duplication of a variant surface glycoprotein gene
gene repertoire of Trypanosoma brucei gambiense are smaller than those of T. b.
and its promoter to an expression site in African trypanosomes. J Biol Chem
brucei. Mol Biochem Parasitol 1987;26:247–56. 1997;272:24637–45.
[59] Kanmogne GD, Bailey M, Gibson WC. Wide variation in DNA content among
[90] Vanhamme L, Pays A, Tebabi P, Alexandre S, Pays E. Specific binding of pro-
isolates of Trypanosoma brucei ssp. Acta Trop 1997;63:75–87.
teins to the noncoding strand of a crucial element of the variant surface
G.A.M. Cross et al. / Molecular & Biochemical Parasitology 195 (2014) 59–73 73
glycoprotein, procyclin, and ribosomal promoters of Trypanosoma brucei. Mol [92] Sherman DR, Janz L, Hug M, Clayton C. Anatomy of the parp gene promoter of
Cell Biol 1995;15:5598–606. Trypanosoma brucei. EMBO J 1991;10:3379–86.
[91] Brandenburg J, Schimanski B, Nogoceke E, Nguyen TN, Padovan JC, [93] Janz L, Clayton C, The PARP. and rRNA promoters of Trypanosoma brucei are com-
Chait BT, et al. Multifunctional class I transcription in Trypanosoma posed of dissimilar sequence elements that are functionally interchangeable.
brucei depends on a novel protein complex. EMBO J 2007;26: Mol Cell Biol 1994;14:5804–11. 4856–66.