
Molecular & Biochemical Parasitology 195 (2014) 59–73


Capturing the variant surface glycoprotein repertoire (the VSGnome) of Trypanosoma brucei Lister 427

George A.M. Cross a,∗, Hee-Sook Kim a,1, Bill Wickstead b

a Laboratory of Molecular Parasitology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA
b Medical School, Queen’s Medical Centre, University of Nottingham, Nottingham NG7 2UH, UK

Article info

Article history:
Received 7 March 2014
Received in revised form 19 June 2014
Accepted 23 June 2014
Available online 30 June 2014

Keywords:
Antigenic variation
Variant surface glycoprotein
Codon-usage bias
Minichromosome
Expression site
Metacyclic expression site

Abstract

Trypanosoma brucei evades the adaptive immune response through the expression of antigenically distinct Variant Surface Glycoprotein (VSG) coats. To understand the progression and mechanisms of VSG switching, and to identify the VSGs expressed in populations of trypanosomes, it is desirable to predetermine the available repertoire of VSG (the ‘VSGnome’). To date, the catalog of VSG genes present in any strain is far from complete and the majority of current information regarding VSGs is derived from the TREU927 strain that is not commonly used as an experimental model. We have assembled, annotated and analyzed 2563 distinct and previously unsequenced genes encoding complete and partial VSGs of the widely used Lister 427 strain of T. brucei. Around 80% of the VSGnome consists of incomplete genes or pseudogenes. Read-depth analysis demonstrated that most VSGs exist as single copies, but 360 exist as two or more indistinguishable copies. The assembled regions include five functional metacyclic VSG expression sites. One third of minichromosome sub-telomeres contain a VSG (64–67 VSGs on 96 minichromosomes), of which 85% appear to be functionally competent. The minichromosomal repertoire is very dynamic, differing among clones of the same strain. Few VSGs are unique along their entire length: frequent recombination events are likely to have shaped (and to continue to shape) the repertoire. In spite of their low sequence conservation and short window of expression, VSGs show evidence of purifying selection, with ∼40% of non-synonymous mutations being removed from the population. VSGs show a strong codon-usage bias that is distinct from that of any other group of trypanosome genes. VSG sequences are generally very divergent between Lister 427 and TREU927 strains of T. brucei, but those that are highly similar are not found in ‘protected’ genomic environments, but may reflect genetic exchange among populations.

© 2014 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

Abbreviations: BAC, Bacterial Artificial Chromosome; BES, bloodstream-form VSG expression site; BF, bloodstream form of the T. brucei life cycle; CDS, coding sequence; GPI, glycosyl phosphatidyl inositol; HTS, high-throughput sequencing; IC, intermediate-sized chromosome; MBC, megabase chromosome; MES, metacyclic-form VSG expression site; MC, minichromosome; ORF, open reading frame; PF, procyclic (tsetse midgut) form of the T. brucei life cycle; PFGE, pulsed-field gel electrophoresis; VSG, variant surface glycoprotein.

All VSGs identified in this study are available in GenBank under accession numbers KC611295–KC613858 and KC669596–KC669620. A BLAST server is available at http://tryps.rockefeller.edu and FASTA-formatted files can be downloaded from this server.

∗ Corresponding author. Tel.: +1 212 327 7571.
E-mail addresses: [email protected] (G.A.M. Cross), [email protected] (H.-S. Kim), [email protected] (B. Wickstead).
1 Present address: Laboratory of Lymphocyte Biology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA.

http://dx.doi.org/10.1016/j.molbiopara.2014.06.004
0166-6851/© 2014 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

1. Introduction

The existence of antigenic variation in Trypanosoma brucei has been known for over 100 years [1], but the extent of the Variant Surface Glycoprotein (VSG) repertoire, the mechanisms by which the repertoire evolves, the mechanisms that determine the VSG temporally expressed by an individual cell, and the range of sequence similarities among VSGs within and between strains and isolates, are not fully understood [2–4]. This is despite the importance of these parameters to understanding antigenic variation and the many studies that have pursued them as aims. Perhaps the simplest but essential question to ask to achieve these aims concerns the repertoire of VSG genes, their diversity and evolution.

From the earliest work, there were indications of an extensive VSG repertoire. Analysis of cosmid clones of nuclear DNA from Lister 427 showed that ∼9% hybridized with a conserved region flanking known VSGs [5]. Several of these clones contained multiple putative VSGs, suggesting that VSGs might be highly clustered. T.


brucei shows enormous variations in the size of diploid chromosome homologs within and between strains, and up to 30% variation in total genomic DNA among different strains [6,7]. Much of this variation is attributable to subtelomeric VSG arrays. For example, in the strain used for the T. brucei genome project, TREU927, VSG arrays comprise >50% of the larger homolog of chromosome I [8]. By comparison, in the widely used experimental strain Lister 427, the larger homolog of chromosome I is ∼4× larger than the smaller homolog [7]. Haploid sub-telomeric regions are apparently repositories for various repetitive families, including VSG genes and VSG pseudogenes, but how these sequences are captured and diversified, and the contribution of these sub-telomeric loci to the generation of novel functionally competent VSGs (including ‘mosaic’ VSGs [9]), are not fully clear, partly because their extent and complexity are not well defined.

The T. brucei genome sequencing project [10] assembled and annotated the majority of the chromosome-internal portion of the genome (the eleven ‘megabase’ chromosomes; MBC) in the TREU927 strain. This project identified 806 VSG genes, most of which were pseudogenes, with only 57 encoding all features consistent with functionality [10]. These VSGs are mostly located in arrays flanking the cores of the chromosomes, and their assembly into larger contiguous sequences was limited by repetitive components of the VSG clusters. The majority of sub-telomeric sequences, the intermediate-sized chromosomes (ICs) and all of the numerous minichromosomes (MCs) [11] were absent from the assembled genome. A later, detailed analysis of an expanded 940 VSG set from the TREU927 assemblies found only 43 functional VSGs, with a further 89 ‘atypical’ and 197 incomplete genes [12]. More recent assemblies (http://tritrypdb.org/common/downloads/release-5.0/) contain 1274 annotated VSG-like sequences (1000 of which are pseudogenic), encompassing fewer than 200 putative complete or almost-complete VSGs, and revisions are continuing (http://www.genedb.org/Homepage/Tbruceibrucei927).

Owing to the diversity of VSGs among various strains and isolates of T. brucei, it is necessary to characterize the repertoire in the strain being studied. The most widely-studied strain by far is Lister 427, which is amenable to laboratory propagation and genetic manipulation and also possesses many of the criteria considered to be representative of natural infections, including the recent demonstration of its ability to complete the life cycle in vitro [13]. The propagation history and properties of Lister 427 lines are discussed in more detail elsewhere (GAMC, manuscript in preparation; http://tryps.rockefeller.edu/DocumentsGlobal/lineage Lister427.pdf). The current study was undertaken to determine as much of the VSGnome of the Lister 427 strain as possible, to facilitate the identification, by high-throughput sequencing of RNA isolated from trypanosome populations, of VSGs expressed in various experimental situations, and to characterize the MC component of the VSG repertoire, which was hitherto unknown.

2. Materials and methods

2.1. Strains

T. brucei Lister 427 (information about the identity and genealogy of this strain is available at http://tryps.rockefeller.edu/DocumentsGlobal/lineage Lister427.pdf) bloodstream-form (BF) and procyclic-form (PF) cells were grown under standard culture conditions in HMI-9 or SDM-79 media [14,15]. DNA from PF of T. brucei TREU927/4 (genome strain) and T. b. gambiense Dal 972 was included in some experiments. BF Lister 427–501 is a line that was adapted some years ago to culture from a mouse infected with an ancestral sample of strain 427 isolated as ‘variant 3’ at the Lister Institute in 1962 and frozen in 1964. Variant 3 was transmitted through Glossina by Wendy Gibson in 1992 [16], from whom we obtained it.

2.2. DNA sequencing, assembly, annotation and analysis

For the genome assembly, DNA was extracted from BF trypanosomes using QIAamp DNA blood mini kit (from QIAGEN), and submitted to local sequencing centers for library preparation (fragments sheared to mean length ∼500 bp) and sequencing with Illumina (>35 × 10⁶ 76-bp reads) or 454 Life Sciences systems (1.2 × 10⁶ 350 bp median length reads). The sequence reads were assembled de novo on a MacPro computer with 32 GB RAM using DNASTAR SeqMan NGen software v 2.2 with default parameters (after a great deal of experimenting with various parameter modifications) in four iterative cycles (the software could accommodate only 10⁷ sequences per assembly). The first assembly iteration used 10⁶ Illumina and all the 454 reads; the second and third iterations each added 12 × 10⁶ Illumina reads. The 454 reads were added again in a fourth iteration, which substantially improved the results in terms of average contig length and the presence of the VSG-associated 14-mer motif.

To isolate minichromosomal DNA, PF and BF cells were embedded in low-melting point agarose and digested with proteinase K in N-lauroyl sarcosine solution as in [6]. MC DNA was separated from the rest of the genomic DNA by either pulsed-field gel electrophoresis (1% SeaKem Gold agarose in TBE (90 mM Tris-borate, 0.2 mM EDTA, pH 8.2) at 12 °C, 2 V/cm with pulse time linearly ramped 18–30 min over 80 h) followed by electroelution and ethanol precipitation, or by rotating agarose gel electrophoresis [17] in TBE at 12 °C at 150 V with linear pulse time 10–40 s for 8 h, at 120 V with linear pulse time 100–300 s for 2 h, and at 50 V with linear pulse time 1200–2500 s for 85 h, followed by purification of MC DNA from a gel slice using the QIAEX II gel extraction kit (from QIAGEN).

Publicly available software and servers for gene and signal-sequence prediction were used as follows: Glimmer [18] at (http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer 3.cgi); GeneMark [19] (http://exon.biology.gatech.edu/gmhmm2 prok.cgi); OrfPredictor [20] (http://proteomics.ysu.edu/tools/OrfPredictor.html); SignalP v3.0 [21,22] (http://www.cbs.dtu.dk/services/SignalP). The ORFs, their translations and their coordinates were coordinated with the contig sequences in Microsoft Excel. Contig and VSG sequences, contig coordinates, annotations and analytical data were maintained in a FileMaker Pro database, which allowed the appropriate sense-orientated DNA coding sequences and flanks to be computed, displayed, exported and imported conveniently. Many parameters of interest were identified computationally, either using ad hoc Perl scripts or by calculations within the FileMaker Pro database.

For codon-aware pairwise alignments, protein-based DNA alignments were made with transAlign [23]. dN/dS ratios were calculated from codon-aware alignments of the coding sequences of Lister 427 VSGs with their best matches from TREU927. Nonsynonymous and synonymous substitution rates were estimated using KaKs Calculator [24] with the MYN method of model averaging.

The values for VSG copy number were determined by mapping 100-bp Illumina reads from whole genomic or minichromosomal DNA onto the assembled VSGnome, using Bowtie version 0.12.7 [25], allowing two or three mismatches and counting the alignments assigned to each VSG. To account for the overlap of VSG sequences, two approaches were tested to estimate counts. In the first approach, only reads that mapped uniquely to a VSG were counted. By fragmenting the VSGnome and mapping unique fragments onto itself, the deficit between the theoretical number of matches and the mapped unique matches was used to derive a correction factor that could be applied to the experimental data. The alternative was to map all reads and randomly assign those


that map to one or more. The two approaches, whose veracity was verified by simulations, gave similar results (data not shown). The number of copies of each VSG was calculated as the number of reads per kilobase mapping to each VSG divided by the value for a pool of known single-copy VSGs. Making a similar calculation but taking the ratio of VSG hits to hits for a pool of single-copy chromosomal housekeeping genes gave similar results, but this method could not be used for VSGs from purified MC. Similarity among VSG proteins was assessed using BLAST [26] version 2.2.18 (October 2008).

Table 1
Newly sequenced and annotated unique complete and partial VSGs. Few (if any) VSGs are unique throughout their sequences: unique was operationally defined as being distinguishable when aligning 50-bp HTS reads allowing 2 mismatches. Numbers exclude 24 previously sequenced VSGs. Four previously known VSGs did not fully assemble here but 20 did. An additional 45 would be classified as functionally complete except for having a non-canonical GPI site (not Aspartate, Serine or Asparagine).

Category                        Number   Comments
Total ≥ 150 AAs                 2563
Total ≥ 250 AAs                 2447
Appear functionally complete    325      436–582 AAs
Extended or almost complete     211      Lack a few AAs at N- or C-terminus or slightly extended at C-terminus
N-terminal partial ≥ 250 AAs    918      250–561 AAs
C-terminal partial ≥ 250 AAs    228      250–575 AAs
Internal partial ≥ 250 AAs      798      250–578 AAs

3. Results

3.1. Assembling the genome and identifying VSGs

To identify the VSGnome in a parasite of great experimental relevance, a baseline de-novo DNA assembly was performed using DNA from a recently cloned sample of bloodstream-form (BF) Lister 427. Total genomic DNA from this clone was sequenced using Illumina and 454 Life Sciences high-throughput sequencing systems to a total coverage of ∼60-fold (based on a haploid genome of ∼38 Mbp [6,27]). De-novo assembly yielded 14,821 contigs ranging from 150 to 83,202 bp (N50 13 kbp). Contigs were aligned against a local database of all available VSG sequences, using the BLASTn algorithm [28]. 4265 contigs (>278 bp) contained VSG-like sequence elements with e-values better than 10⁻¹⁰. Glimmer and GeneMark were used to identify ∼16,000 open reading frames (ORFs) > 149 bp in these 4265 contigs. Based on BLASTp alignments (e-values better than 10⁻¹²) with all previously known VSGs, GeneMark identified almost four times as many VSG-like ORFs as Glimmer, which only identified 11 subsequently annotated VSGs in addition to those identified by GeneMark. The BLASTp alignments showed an enormous range of similarity with previously sequenced VSGs from other strains (predominantly TREU927) and subspecies. A few were identical, surprisingly, but some were not obviously VSG-related by visual inspection. A BLASTp search against all proteins in the TREU927 genome database confirmed that all of the newly annotated VSG sequences aligned only to proteins previously annotated as VSGs, VSG pseudogenes, or atypical VSGs.

After coordinating the assembly and analysis information, all VSG ORFs >149 amino acids were manually reviewed, and putative N-terminal signal peptides and C-terminal glycosyl phosphatidyl inositol- (GPI) anchor signal sequences were identified by SignalP [21,22] or by manual inspection (easy for VSGs because of the extreme conservation of the GPI signal), respectively. The results of the annotation were 2563 previously unsequenced VSG coding sequences ≥450 bp, including 2447 genes encoding at least half the length of an average-length VSG. The median length of the contigs containing these annotated VSGs was 5100 bp. The sequences were classified as shown in Table 1, and unique sequential numbers were assigned to each new complete or partial VSG (to avoid confusion, numbers that had been assigned previously, formally or informally, for VSGs of Lister 427, were skipped). Of these genes, 325 (13%) encode what appear likely to be fully functional VSGs, meaning that they have N-terminal and GPI signal sequence predictions and cysteine residues consistent with established groupings. Another 211 are complete except for an atypical GPI-anchor site or a minor C-terminal extension, or lack a few amino acids at the N- or C-terminus. The remaining ORFs were classified as ‘partial’. N-terminal and C-terminal partial sequences were designated as such on the basis of a clear signal peptide for endoplasmic-reticulum targeting or GPI signal sequence, respectively. Those sequences designated ‘internal partial’ had no obvious secretory or GPI signal sequences. Some of the sequences annotated as partial may have been rendered as such because they are part of longer sequences that contain frameshifts, but inspection of a large number of these contigs suggests that is true for only a few of the partial sequences.

The association of VSGs with RHS/INGI/RIME [29,30] sequences was not investigated in a systematic manner, but was an inconvenience when sometimes these sequences were fused in frame to the originally computed CDS. Such fusions were edited out manually (they will now show up in the upstream or downstream regions). In some cases, annotated VSGs from the TREU927 strain, which were used for VSG motif identification and translation verification using BLAST, are also fused to RHS sequences. A few VSG sequences were also initially fused in-frame to the 177-bp repeat sequence characteristic of MCs. As expected for a highly arrayed gene family, many contigs contained multiple (up to 16) complete or partial VSGs. Fifty-three contigs (11–83 kbp) contained at least 10 VSG ORFs. Out of 492 ‘complete’ sequences that are not contained in BF VSG ‘Expression Sites’ (BES) or on MCs, 321 were contained in contigs > 6 kbp. Some of these contigs contain several complete VSGs, but they are interspersed with incomplete VSGs and very short ORFs containing VSG motifs. The arrangement of VSGs within the longer contigs showed no obvious pattern, and was not investigated further.

3.2. VSGnome coverage

Twenty of the 26 (77%) previously sequenced Lister 427 VSGs were present as complete sequences in our de novo assembly. It is likely that the remaining VSGs (VSGs 4, 6, 7–9 and 13; previously known as 117, 121 and MITat 1.7–1.9 and 1.13 respectively) failed to assemble to completion due to regions of identical sequence shared with other VSGs, although there could be differences in the completeness of some VSGs among different lines of the same strain. In agreement with this interpretation, some of these VSGs (4 and 6, for example) are known from Southern blots [31] and sequencing [32–34], to be members of families containing regions of closely related sequences. Similarity data for six VSG 4 (117) family members is shown in Table 2. The underlying nucleotide sequences of this family are highly similar [33,34]. Overall, therefore, we estimate that Lister 427 contains at least ∼2700 unique complete and partial VSGs (excluding exact copies: see below), but the number of complete and presumptively functionally competent VSGs we identified is probably underestimated by as much as ∼20%. The precise total number of VSGs is not so important, as it differs among clones and strains (see below). However, both the estimated total number of VSGs (∼2700) and also the proportion that encode likely functional proteins (13–20%) are considerably higher when assessing the Lister 427 VSGnome as a whole than estimates based on TREU927 chromosome-internal copies [10].
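As a rough check on the coverage figures quoted in Section 3.2 and Table 1, the arithmetic can be laid out explicitly. This is a sketch for illustration only: the variable names are ours, and reading the ∼23% of known VSGs that failed to assemble fully as being in line with the quoted underestimate of "as much as ∼20%" is an assumption about how the numbers relate, not a description of the authors' calculation.

```python
# Figures quoted in the text (Section 3.2 and Table 1).
known_vsgs = 26          # previously sequenced Lister 427 VSGs
known_assembled = 20     # of those, recovered complete in the de novo assembly
annotated_vsgs = 2563    # newly annotated complete and partial VSGs
functional_vsgs = 325    # annotated as apparently functionally complete

# Fraction of already-known VSGs that the assembly recovered in full.
recovery = known_assembled / known_vsgs
# Fraction that failed to assemble to completion.
missed = 1 - recovery
# Share of the newly annotated set that appears functionally complete.
functional_share = functional_vsgs / annotated_vsgs

print(f"recovered: {recovery:.0%}, missed: {missed:.0%}, "
      f"functional share: {functional_share:.0%}")
```

The recovered fraction reproduces the 77% given in the text, and the functional share reproduces the 13% quoted for the 325 apparently functional genes.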


Table 2
Assembly of members of one family of highly similar VSGs. Eight highly similar members of the VSG 4 (117) family were sequenced previously [33,34]. This table shows the % amino acid identity of five independently assembled sequences from the current study with various VSG 4 family members in the GenBank database.

VSG    Assembled length   Annotation                 Prior GenBank entry   % Identity
4      526                Incomplete in this study   L34415                100
342    528                Complete                   L31602/L31608         99/98
1506   381                N-terminal                 L31603                99
1927   245                N-terminal                 L31606/L31607         96/95
3102   416                Internal                   L31604                97
3739   273                Internal                   L31605                98
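Table 2 reports percentage amino acid identity between assembled and previously deposited sequences. For readers unfamiliar with the measure, the sketch below shows one simple way to compute it from a pair of already-aligned sequences. It is an illustration under stated assumptions, not the procedure used in the study (which relied on BLAST): the function name and the toy sequences are hypothetical, and skipping gapped columns is only one of several conventions, so values need not match what BLAST reports.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only alignment columns in which both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy aligned fragments (hypothetical, not VSG sequences from the study).
assembled = "MKT-AVLLA"
reference = "MKTLAILLA"
identity = percent_identity(assembled, reference)
```

Here eight columns are compared (the gapped column is skipped) and seven match, so `identity` is 87.5.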

3.3. Many VSGs are present in multiple copies

A factor in calculating the VSG repertoire size that has not previously been evaluated is the presence of exact duplicate copies, whether due to allelic amplification, the presence of VSGs within diploid regions of the chromosomes, or the uncertain ploidy of MCs. This information often cannot easily be deduced from the assembly data, but is highly amenable to estimation from read depth at high coverage. To do this, short-sequence reads were mapped onto all assembled VSGs and the normalized number of reads per kilobase was calculated, after correcting for reads that do not map uniquely to one VSG. About 2000 assembled VSGs exist as single copies, 300 as duplicates and 60 have more than 2 copies that are not distinguishable at the level of short-read sequence mapping (Fig. 1A). A similar trend was observed for VSGs annotated Complete (Fig. 1B).
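The read-depth calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code (the paper mentions Bowtie mapping and ad hoc Perl scripts): the function names, the VSG identifiers, the toy read counts and the rounding used to bin copies are all assumptions, and the sketch presumes that reads have already been mapped and corrected for non-unique matches.

```python
from statistics import mean


def estimate_copy_numbers(read_counts, lengths_bp, single_copy_ids):
    """Estimate copies per VSG from read depth.

    read_counts: mapped reads per VSG (assumed already corrected for
        reads that do not map uniquely).
    lengths_bp: length of each VSG sequence in base pairs.
    single_copy_ids: VSGs assumed to be present once (reference pool).
    """
    # Reads per kilobase removes the effect of sequence length.
    per_kb = {vsg: read_counts[vsg] / (lengths_bp[vsg] / 1000.0)
              for vsg in read_counts}
    # Depth expected for one copy, taken from the single-copy pool.
    baseline = mean(per_kb[vsg] for vsg in single_copy_ids)
    return {vsg: depth / baseline for vsg, depth in per_kb.items()}


def copy_class(copies):
    """Bin an estimate as single, duplicate or multiple (hypothetical cut-offs)."""
    nearest = max(1, round(copies))
    if nearest == 1:
        return "single"
    return "duplicate" if nearest == 2 else "multiple"


# Toy data: hypothetical identifiers and counts, not values from the study.
reads = {"VSG_a": 900, "VSG_b": 1500, "VSG_c": 2400, "VSG_d": 4400}
lengths = {"VSG_a": 1500, "VSG_b": 1500, "VSG_c": 1500, "VSG_d": 1500}
copies = estimate_copy_numbers(reads, lengths, single_copy_ids=["VSG_a", "VSG_b"])
classes = {vsg: copy_class(n) for vsg, n in copies.items()}
```

Normalizing to a pool rather than to a single gene averages out depth fluctuations among the reference sequences; in this toy example the two pool members themselves come out at 0.75 and 1.25 copies and are both binned as single.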

Fig. 1. VSG copies. All partial and complete VSGs (A) and only those that are judged to be functionally complete (B) are grouped according to their average number of copies in four samples (a PF from BW lab, two samples of SM BF from Rockefeller labs (SM HK and SM GHM) and a BF telomerase long-term deletion line (TERT KO)).

Lister 427 is the major experimental strain for trypanosome research and has been distributed to many laboratories over several decades and propagated separately as BF and PF lines. Since the VSGnome is likely to be one of the most rapidly evolving parts of the genome, this raises the possibility that the VSGnomes of Lister 427 cell lines propagated in different laboratories, and as different life-cycle forms, may contain a different VSG repertoire. To investigate these possibilities, we first compared our reference VSGnome (from a BF ‘single marker’ (SM) line [35]) to an insect procyclic form (PF) line derived from a common ancestral BF line at least 35 years ago (Fig. 2). Comparison of the VSGnome copy number between these BF and PF lines shows evidence of both gene loss and duplication events in PF relative to the BF assembly. Similar gene loss and duplication was also seen comparing our reference to clone 501, a BF culture derived from ‘variant 3’, a 50-year-old cryopreserved sample of Lister 427 [16] that is progenitor of both lines shown in Fig. 2 (Supplementary Fig. 1). From these data, we can deduce that the VSGnome is evolving in axenic cultures of both BF and PF cells and that Lister 427 cell lines from different laboratories are likely to differ at many VSG loci. Some of the lost VSGs were on MCs (see below), but many were not. VSG 2 (widely known as 221), which is expressed in most cultured lines of Lister 427 BF (and was the VSG expressed when Lister 427 BF were first cultured in vitro [36]), was lost when adapting ‘variant 3’ to culture, numbered as clone 501.

Comparison between lines of BF Lister 427 from two closely linked laboratories illustrates that VSGs can also be readily lost and duplicated during routine culture of BF cells (Supplementary Fig. 2), which is hardly surprising considering the opportunities for recombination that VSGs and their flanking sequences provide.

3.4. VSG-associated motifs

Early studies revealed that many VSGs are flanked by an invariant 14-bp (GATATATTTTAACA) motif in the 3′ UTR, and an upstream heterogeneous ∼70-bp repeat [5,37], containing two highly conserved motifs (AGTGTTGTGAGTGTG and TATAATAAGAGCAGTAAT, one or both of which are present in 83% of the 70-bp set from which these consensus motifs were extracted) coupled to varying numbers of tandem TAA triplets [38,39]. Although a probe representing the 70-bp repeat hybridized with about 9% of the clones in a cosmid library [5], 70-bp repeats were not found upstream of all VSGs [37]. In the current assembly, 790 instances of the 3′ 14mer were found in 724 contigs, of which 156 were within 100 bp of the end of a VSG CDS, as would be expected if forming part of the UTR. Among all the annotated VSGs in the current study, one or both of the 70-bp repeat component motifs were present in 697 contigs (54%) that contained 1288 annotated VSGs. In contrast, the 70-bp motifs were only found in 165/10,558 contigs (1.6%) that lacked identified (e-value better than 10⁻¹⁰) VSG motifs, confirming that the search motifs were largely specific to VSGs. The similarity of these sequences probably hindered their assembly into specific contigs, and prevented the assembly of many longer VSG-containing contigs, and variation in their sequence and the number of copies associated with different VSGs probably underestimates their overall number. The consensus motifs were derived from a few sequenced expression-site VSG copies and it is possible that there is wider variation among these sequences in non-BES VSG copies, or that they are not associated with all VSG sequences, as previously noted [37]. The identified motifs in our contigs were often far from the annotated VSGs. Another study [12], using data from conventional (long-read) Sanger sequencing, found that 92% (687)


of VSGs were flanked upstream by at least one (74%) 70-bp repeat, and noted that the repeats were more degenerate than previously recognized.

Fig. 2. Fluctuation in the copy number of individual VSGs in long-separated PF and BF lines. Data are shown for 328 complete newly assembled VSG genes that are anticipated to encode functional VSG proteins.

Telomere repeats would be expected downstream of some VSG genes, but would be linked to them during the assembly only if the intervening sequence is of sufficient complexity to allow assembly. However, due to the low sequence complexity in subtelomeric DNA, telomere repeats were absent from many VSG-containing contigs that were expected to be subtelomeric; for example, those that were known to be in BES, MES or on MCs. Altogether, only 68 links were made between a VSG and telomeric repeats, but 42 of these links were to a complete VSG.

3.5. Similarities among 427 and non-427 VSGs

To analyze similarities, BLAST was used to compare 559 full-length VSGs (and the proteins they encode) to themselves and to 959 non-Lister-427 sequences publicly available in VSGdb (http://leishman.cent.gla.ac.uk/pward001/vsgdb/kook index.html) [40] in June 2012 and TREU927 chromosome assemblies (http://tritrypdb.org/tritrypdb version 5.0). When comparing 559 essentially full-length Lister 427 VSG CDS with all 2613 annotated unique sequences obtained from our assembly, 42 showed ≥97% identity over a contiguous stretch of at least half their length with another Lister 427 VSG. When compared to 959 non-427 VSGs, most from the TREU927 strain, 38 showed better than 97% identity over a contiguous stretch of at least half the length of another VSG.

To compare sequences across their entire length, all predicted Lister 427 VSG nucleotide and protein sequences were aligned to their best match in TREU927 (Fig. 3). The majority of sequences are very dissimilar to their closest match (<50% identity across CDS; <35% protein sequence identity). This is expected if there is recombination among VSGs, as the nearest match will only correspond to part of the CDS. Even when considering local alignments, however, only 5% (146/2613) have regions that are >98% identical over 50 bp (Supplementary Fig. 3). However, a small number are very similar; 20 of 2613 Lister 427 VSGs are >98% identical to ORFs in TREU927 across their entire length, 3 of which are 100% identical. Near identical sequences are also seen between Lister 427 and other T. brucei strains, including in T. brucei gambiense Dal972 (12 sequences >98% identical to the 68 VSGs in the v4.2 assemblies), and between 427 and T. evansi sequences (4 > 98% identical).

By mapping the positions of these near-identical sequences onto the TREU927 genome assemblies, it can be seen that they are not clustered in ‘protected’ locations in the genome, but are found throughout the VSG arrays (Supplementary Fig. 4). There is no correlation between conservation and distance from assembled chromosome ends (Supplementary Fig. 5; Spearman’s rank correlation rho = 0.054, p = 0.2). If there were considerable recombination among silent VSGs, then these very highly similar sequences could represent sequences that by chance have not recombined. However, non-coding sequences alignable between Lister 427, TREU927 and Dal972 show ∼1 SNP per 100 bp (excluding indels). Assuming that this proportion of SNPs is a reasonable estimate for divergence of VSG sequences, the probability of an individual orthologous pair remaining identical between strains is <10⁻⁶. This suggests that these sequences are not conserved orthologs, but are the result of recent genetic exchange among strains, even between East African and West African trypanosome subspecies.

3.6. The VSGnome is under purifying selection

VSG is the major surface antigen of T. brucei and is in direct contact with the host immune system. VSG sequence diversity is huge (see Fig. 3; Supplementary Fig. 3), presumably because the mechanism of antigenic variation on which the parasite relies requires there to be little cross-reactivity between expressed


variants. Given these conditions, it might be expected that VSGs would be under diversifying pressure; that VSG genes would be fixing nucleotide substitutions that alter the protein sequence (non-synonymous mutations) faster than those that do not (synonymous mutations). However, comparing members of the Lister 427 VSGnome to their closest relatives in TREU927 shows that VSGs are actually under purifying selection, with a median dN/dS ratio of 0.45 (Fig. 4). Hence, ∼50% of non-synonymous changes in VSGs are removed by selection, implying that there are significant restrictions on changes tolerated at a specific site on a VSG, even against the huge sequence diversity observed in the family. It also implies that VSG sampling is sufficiently high relative to mutation rate for effective selection, despite only one VSG being expressed at any time.

[Fig. 3 plots: two histograms of Number against Aligned identity, 427 vs. 927 [%] (0–100), each showing the 2613 set and the 559 set.]

Fig. 3. Similarity between Lister 427 and TREU927 VSG nucleotide and protein sequences. Results are shown for the best matches of each Lister 427 VSG with the TREU927 strain for all annotated 427 VSG genes or partial genes (2613 set) and a subset (559 set) that appear to be full length. There is no significant difference between the two sets. As expected, the sequence conservation is greater for the DNA than for the protein.

3.7. VSGs have an unusual codon-usage bias

VSG is a hugely abundant protein, comprising ∼10% of the total

protein content of BF T. brucei [41]. Given this level of expres-

sion, it is possible that VSG codon usage might be adapted for

efficient translation of VSG mRNA. Synonymous codon usage in

highly expressed ‘housekeeping’ genes (such as those encoding

␣ ␤

/ -tubulins or ribosomal components) has been shown to be

biased toward tRNAs with higher gene copy number in T. brucei,

T. cruzi and Leishmania [42]. Analysis of 368 full-length VSGs shows

a substantial codon-usage bias (Fig. 5; Supplementary Fig. 6). How-

ever, the bias in the VSG set is distinct from that found in highly

expressed housekeeping genes, which is of the same type as also

seen in pol II-transcribed genes annotated ‘hypothetical’ (having

no identifiable function), but to a lesser degree. VSG codon-usage

bias is more pronounced than for highly expressed genes, show-

ing a strong bias toward A or C at the third position (in contrast

to the G/C biases seen for other genes). The significance of this to

the biological role of VSG is not currently known, but it is appar-

ently not a product of being pol I-transcribed nor due to an intrinsic

nucleotide bias in the BES, since BES-associated genes (ESAGs) show

a codon-usage bias similar to that of the rest of the genome. It is

Fig. 4. Evidence for purifying codon selection in the VSGnome. Ratio of the rate

also not readily ascribed to the influence of recombination among

of non-synonymous to synonymous changes (dN/dS) inferred from codon-based

silent VSGs, since the nucleotide bias is not equally present in both

alignments of Lister 427 VSGs to their best match in TREU927. Only VSG pairs with

≥60% aligned identity were considered. sense and antisense strand.
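The reading of the dN/dS ratio used in Section 3.6 can be illustrated with a short sketch. It assumes, as a simplification not stated in the text, that synonymous changes are effectively neutral, so that dN/dS = 1 marks neutrality and 1 − dN/dS approximates the fraction of non-synonymous changes removed by selection. The function names, the tolerance value and the example ratios are hypothetical; only the median of 0.45 comes from the text.

```python
from statistics import median


def classify_selection(dn_ds, tolerance=0.05):
    """Label a dN/dS ratio; the tolerance band around 1 is an arbitrary choice."""
    if dn_ds < 1.0 - tolerance:
        return "purifying"
    if dn_ds > 1.0 + tolerance:
        return "diversifying"
    return "neutral"


def fraction_removed(dn_ds):
    """Approximate share of non-synonymous changes eliminated by selection,
    relative to the neutral expectation dN/dS = 1 (assumes neutral dS)."""
    return max(0.0, 1.0 - dn_ds)


# Hypothetical per-pair ratios chosen so that the median equals the reported 0.45.
ratios = [0.30, 0.45, 0.60]
m = median(ratios)
print(classify_selection(m), round(fraction_removed(m), 2))  # purifying 0.55
```

Under these assumptions a median of 0.45 corresponds to roughly half of non-synonymous changes being removed, in line with the ∼50% quoted in the text.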


[Fig. 5 image: bar chart; y-axis "Usage bias" (−1.0 to 2.0); x-axis: codons grouped by amino acid (A, C, D, E, F, G, H, P, I, K, L, N, Q, R, S, T, V, Y, *); series: VSG, ESAG, annotated housekeeping, hypothetical, high expressed, ribosomal protein.]

Fig. 5. Codon-usage bias among different gene types in T. brucei Lister 427. Shown are data for 559 functional or near-complete VSGs (VSG), 115 expression-site associated genes (ESAG), 3070 annotated pol II-transcribed genes (annotated housekeeping), 5576 hypothetical genes, 54 genes estimated to have >150 mRNA molecules per cell [82] (high expressed) and 156 genes encoding ribosomal proteins (ribosomal). Codon-usage bias is the fold under- or over-representation of a given codon relative to its set of synonymous codons (0 = no bias).
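The definition of codon-usage bias given in the Fig. 5 legend can be sketched in code. The exact formula used by the authors is not stated; the version below is one plausible reading, in which the bias of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally, minus one. This gives 0 for no bias and −1 for a codon that is never used, which matches the lower limit of the plotted axis. The function name and the toy sequence are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage_bias(sequences):
    """Return {codon: bias} where bias = observed / expected - 1 and the
    expectation is equal usage of all synonymous codons (assumed formula)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1

    bias = {}
    for aa in set(AMINO):
        synonymous = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in synonymous)
        if total == 0 or len(synonymous) == 1:
            continue  # amino acid absent, or no synonymous choice (M, W)
        expected = total / len(synonymous)
        for c in synonymous:
            bias[c] = counts[c] / expected - 1.0
    return bias


# Toy in-frame sequence: alanine encoded three times by GCA and once by GCC.
toy = codon_usage_bias(["GCAGCAGCAGCC"])
print(toy["GCA"], toy["GCC"], toy["GCG"])  # 2.0 0.0 -1.0
```

Applied to a set of coding sequences (for example the full-length VSGs), the per-codon values could be plotted by amino acid in the manner of Fig. 5.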

This strong codon bias is similar regardless of whether we considered the whole VSGnome, only those in BESs or on MCs, or VSGs known to be expressed in TREU927 (Supplementary Fig. 7A). However, it is also seen (to a lesser degree) in the genes thought to encode the coat of the epimastigote form (BARP). Interestingly, the same or very similar biases are also present in the gene families believed to encode the major surface proteins of both T. congolense and T. vivax (Supplementary Fig. 7B and 8). Again, in these species, these biases are different from those apparent in predicted high-expressed genes.

3.8. Characterization of VSG N- and C-terminal domains

The 368 apparently complete VSGs from Lister 427 were compared to all non-pseudogenic VSGs encoded in TREU927 and T. brucei gambiense Dal972 genome assemblies (v4), as well as 38 VSGs from other strains/species available from VSGdb (http://leishman.cent.gla.ac.uk/pward001/vsgdb/kook index.html) [40]. The level of sequence divergence for the majority of VSGs makes the production of high-quality multiple-sequence alignments problematic, so a technique based on clustering sequences based on all-against-all BLASTp local alignment scores was used [43]. Independent comparisons were made for N- and C-terminal VSG domains of all sequences.

Comparison of N-terminal domains recapitulated their gross division into two major classes: A and B, with A being the more diverse in sequence (Fig. 6) [12,44]. Both A and B show clear division into sequence sub-classes. By extension of the nomenclature of Marcello and Barry [12], we name these types A1, A2, A3, B1, and B2. By using weighted scores for all possible pairwise combinations, our method is more robust than clustering based only on BLAST hits, which is very sensitive to the applied threshold for hit detection. This can be seen in that the TREU927 sequences classified as types N1, N2, N3 and N4 [45], map onto sub-classes A1, A2, A3 and (B1 + B2), respectively (data not shown). As previously, no good evidence for a separate ‘nC’ VSG N-terminal class was seen [12], which is instead a subclass of the A-type domain.

The shorter C-terminal domain was also analyzed (data not shown), with similar results to those presented previously [12]. Supplementary Fig. 9 shows the domain architecture (N- and C-terminal domains) for all of the analyzed full-length VSGs.

3.9. Metacyclic (MES) and bloodstream expression site (BES) VSGs

Of fourteen VSGs previously associated with BES sequences in a cell line derived from Lister 427 [46], eleven were assembled as complete VSGs in the current project. Six contigs (Supplementary Fig. 10), ranging from 7197 to 44404 bp, contained a complete VSG downstream of a metacyclic promoter, with no intervening ESAGs. Five of these MES contain VSGs (numbered 397, 531, 639, 653 and 1954) identical to those expressed when a PF line of Lister 427 was differentiated to metacyclics in vitro [13]. The structure of the sixth and longest contig was atypical of a MES: it had additional ORFs encoding partial VSGs downstream of the intact VSG (582) that was immediately downstream of the promoter, suggesting recombination between a VSG array and this MES locus had occurred. This probably explains the lack of expression of this VSG during induced metacyclogenesis [13]. The MES structures upstream of the promoter (Supplementary Fig. 10) have similar characteristics to three previously described MES [47–50]. 70-bp repeats are absent (or highly degenerate) and, in some cases, there are a few random partial ESAG1 and VSG pseudogenes upstream of the promoter. In one case there is a large region identical to chromosome 9 upstream of the functional MES promoter, suggesting this MES might be at one end of chromosome 9. In one case, there is an unusually long distance between the MES promoter and the downstream VSG.

Fig. 7 shows an alignment of all currently known MES promoter sequences, together with the three other RNA Polymerase


Fig. 6. N-terminal domain types. Clustering of N-terminal domain types of full-length VSG proteins using weighted BLASTp scores for all possible pairwise combinations. The analysis shows 559 apparently complete or near-complete Lister 427 VSGs together with 138 non-pseudogenic TREU927 VSGs annotated in VSGdb and 28 annotated VSGs from the T. b. gambiense Dal972 genome (TriTrypDB v4.2).

I promoters known in T. brucei, used for transcription of rRNA, Procyclin and BES loci. The BES and MES consensus sequences are distinct but more similar to each other than either is to the rRNA or Procyclin promoters. The promoters driving mVSGs 582 and 639 are identical to the Lister 427 MES promoter sequences previously identified as 4–49 and ES4 respectively [51,52].

The small subsets of BES- and MES-encoded VSGs show no significant bias toward particular N- or C-terminal types (Fig. 7). This is consistent with metacyclic VSGs being normal members of the VSGnome, which differ only in being in MESs.

3.10. Variation in the quantity of telomere repeats among different lines of Lister 427

Because of the proposed [53] and demonstrated [54] linkage between telomere length and VSG switching frequency, we examined the representation of telomere repeats in various lines and samples in our DNAseq data. Repeat composition was estimated from the percentage of the total reads that were entirely composed of each repeat sequence. The read and target database lengths were coordinated so each read could only align once with its target. Only the first 50 bp were mapped, because of a more pronounced drop in sequence quality that was observed toward the 3′ end of telomere-repeat reads compared to 177-bp-repeat reads. There was no significant difference in the number of telomere repeats mapped when either two or three mismatches were allowed, which supports the assertion that quality was not an issue in mapping the first 50 bp. This suggests that our estimates for the amounts of telomeric DNA are quite accurate.

Telomeres account for ∼0.9% of the reference BF Lister 427 genome (Fig. 8 and Supplementary Table 1). Similar abundance was found in BF 501 cells (∼1.2%) and in PF TREU927 (0.5%) and T. brucei gambiense Dal972 (∼0.7%), which also have fewer MCs (see below). However, telomere repeats are much more abundant in the PF BW Lister 427 line, presumably as a consequence of continuous telomere growth during long-term propagation. Given a total nuclear DNA content of ∼78 Mb (see below), consisting of 11 diploid chromosomes and ∼100 haploid MC and IC, the ‘telomere-rich’ lines have an average telomere length of ∼11 kbp. These long telomeres are also apparent in read-depth analysis of MC-specific DNA isolated from pulsed-field gels, for which the same ratio of telomere abundance among different cell lines was observed (Fig. 8 (SM HK MC and PF BW MC) and Supplementary Table 1), as expected if telomere lengths from both MCs and MBCs are similar. Independent estimates of telomere length in this MC-specific DNA agree well with that for whole-genome libraries (∼12 kbp for PF Lister 427, based on MCs of average 75 kbp length and comprising 55% 177-bp repeat). Surprisingly, however, telomere lengths in PF lines were drastically reduced (to ∼30% of original length) after sub-cloning (compare the 3 PF subclones, mc1-mc3, of the original PF BW MC sample in Fig. 8 and Supplementary Table 1).

In agreement with previous observations [55], read-depth analysis shows a drastic reduction of telomeric repeat length during propagation of TERT-null lines (BF T. brucei lacking the catalytic subunit of telomerase) (Fig. 8, five t7 clones). Long-term propagation of the TERT-null line results in telomere stubs of apparently minimum length for cell viability [55], and also leads to a great decrease of 177-bp repeats (Fig. 8, T7-2 line).


Fig. 7. Alignment of Metacyclic Expression Site (MES) promoters. The consensus residues shown in red are identical in at least 17 of the 18 MES promoter sequences. Residues that match the consensus are hidden. Yellow shading highlights residues that are identical to the 427 BES. All sequences end in either CA or CAA, so CA could be a consensus transcription-initiation point. Sequences not identified in the present study are from the following sources (with GenBank accession numbers if available): EATRO 1125 (X05710) [83]; TREU927 GUTat 10.1 [84] (AC087702); TREU 795 ILTat 1.22 (AJ012198 or S79897) and 1.61 (AJ012199) [85]; ILTat 1.63 and 1.64 (AJ486954 and AJ486955) [86]; WRATat series MVAT4, MVAT5 and MVAT7 (AF068693, S80865 and U83435) [87–89]; AnTat 11.17 [90]; Lister 427 4–49 [51], which is identical to Lister 427 VSG 582; ES4 (identical to Lister 427 VSG 639) and ES7 (partial, not shown, but its 46 bp are identical only to Lister 427 VSG 397, ILTat 1.63 and EATRO 1125 X05710) from Lister 427 (incorrectly called EATRO 427 in [52]) were a personal communication from Frederic Bringaud in October 2003. The 427 BES, rRNA and Procyclin promoter sequences are from the trypanosome genome project, but with boundaries informed by [91–93]. Not all promoters have been shown to be functional in actual metacyclic forms, and some sequences could contain errors.

Fig. 8. Abundance of telomere and 177-bp repeat sequences in the genome and in minichromosomes. The left panel shows the amount of repeats as a percentage of total DNA. The right panel, on a different scale, shows the results for various MC DNA samples. Line 501 was a culture derived from a 50-year-old cryopreserved sample of Lister 427 [16] that is progenitor of all Lister 427 lines used here and elsewhere. 501 cl2 is a further subclone of 501. Samples labeled SM are the BF ‘single marker’ line [35]; the suffixes indicate different branches of this line. All PF samples were prepared in the laboratory of BW. The t46 sample is from the TERT-null line, at an intermediate stage of telomere loss, as are the five t7 subclones. Sample T7-2 comes from TERT-null cells that have been maintained to the point where they have stable telomeres of minimum size [55].
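The minichromosome numbers quoted in Sections 3.11 and 3.12 follow from simple arithmetic on the 177-bp repeat abundance summarized in Fig. 8. The sketch below retraces that arithmetic with the values given in the text (78 Mb genome, ∼5.4% 177-bp repeat, 55% repeat content per MC, 75 kbp mean MC size, ∼5 IC). The function names are hypothetical, and subtracting the IC as if each carried the same repeat load as an MC is a simplifying assumption about how the correction was made.

```python
import math


def estimate_mc_number(genome_mb=78.0, repeat_fraction=0.054,
                       repeat_share_of_mc=0.55, mean_mc_kbp=75.0,
                       intermediate_chromosomes=5):
    """Estimate the number of MC per nucleus from 177-bp repeat abundance."""
    repeat_mb = genome_mb * repeat_fraction          # about 4.2 Mb of repeat
    aggregate_mb = repeat_mb / repeat_share_of_mc    # about 7.6 Mb of MC-like DNA
    carriers = aggregate_mb * 1000.0 / mean_mc_kbp   # chromosomes carrying repeats
    return carriers - intermediate_chromosomes       # remove the IC (assumed equal load)


def min_mc_from_vsg_copies(vsg_copies):
    """Lower bound on MC number if every MC carried a VSG at both ends."""
    return math.ceil(vsg_copies / 2)


def fraction_of_ends_with_vsg(vsg_copies, mc_number):
    """Share of MC ends occupied by a VSG, given two ends per MC."""
    return vsg_copies / (2 * mc_number)


print(round(estimate_mc_number()))                   # 97
print(min_mc_from_vsg_copies(66))                    # 33
print(round(fraction_of_ends_with_vsg(65, 96), 2))   # 0.34
```

Without intermediate rounding the estimate comes out at about 97, close to the ∼96 reported, and ∼65 VSG copies spread over 96 MC correspond to roughly one third of MC ends.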


3.11. Quantification of minichromosomes

MC DNA was isolated from five T. brucei Lister 427 lines: a BF SM cell line; two distantly related PF cell lines; and three subclones made from one of these PF lines. The DNA sequences from these cell lines were mapped against the Lister 427 VSGnome, and also against the TREU927 genome, to gauge the (low) level of contamination by fragmentation of MBC, as exemplified by the values for tubulin and rRNA genes shown in Supplementary Table 1.

177-bp repeats comprised ∼5.4% in four samples of the Lister 427 DNA (Fig. 8 and Supplementary Table 1), equivalent to 4.2 Mb of 78 Mb of nuclear DNA. Because there is a little heterogeneity in the sequence of known 177-bp repeats [56], the quantification was determined using 0–3 (3 being the maximum allowed in the Bowtie software) mismatches. The number of matches approached a plateau when three mismatches were allowed, suggesting that most of the 177-bp repeats have been included in the final value, although there could be a minor population of repeats (for which we have no evidence: these repeats appear to be well conserved among strains and subspecies) that differ by >3%. From fine-resolution mapping studies, it has been estimated [56] that 177-bp repeats comprise about 55% of the average MC, so the aggregate MC length would be about 7.6 Mb. Assuming a mean MC size of 75 kbp, the repeat quantification suggests that there are about 96 MC per nucleus (after correcting for the ∼5 intermediate chromosomes (IC), which also contain 177-bp repeats). Because some VSGs that are only present on MC have only one copy per nucleus and we found a total of ∼65 MC-associated VSGs (see below), not all MCs contain a VSG and MC-VSGs must also be essentially haploid, similar to BES-VSGs.

This estimate of the number of MC, while subject to the assumption of their average length, which is based on high-resolution pulsed-field gel electrophoresis (PFGE) analysis [56], is consistent with the often cited “rough estimate” [11] of ∼100 MC, ranging from 50 to 150 kbp. If the amount of 177-bp repeats mirrors the number of MC, strains TREU927 and T. brucei gambiense Dal972 would have about 60 MC, which is consistent with PFGE analyses showing that several T. brucei gambiense isolates, including Dal972, have fewer MC than T. brucei isolates [57–59], and is consistent with our estimates from PFGE.

The increase in the percentage of 177-bp repeats in total genomic DNA after subcloning the TERT-null line when its telomeres had not quite reached their minimum length (Fig. 8, samples 7 cl1 to cl5) is unexplained, but possibly attributable to elevated levels of recombination among MCs as telomere shortening approaches a critical point, similar to the observation that recombinational VSG switching increases in TERT-null cells with short telomeres [54,60]. It could represent circularization or fusion of 177-bp sequences as MC destabilize, and could be related to the unexplained new band of likely circular DNA that was noted previously during gel electrophoresis of DNA from cells with short telomeres [55].

The two copies of the rRNA promoter previously reported in two MC [61] were also found by DNAseq of the PF MC fraction, but were absent from MC prepared from the BF SM cell line (Supplementary Table 1).

3.12. Identification and stability of minichromosomal VSGs

Only seventeen full-length MC have been characterized in any detail in T. brucei [56]. They have a uniform organization, consisting of a palindromic region of 177-bp repeats flanked by non-repetitive regions, suggesting that most or all might have a VSG gene at one or both ends (only two VSGs were specifically identified). Attempts to identify MC VSGs bioinformatically, based on their association with 177-bp repeat sequences, were unsuccessful, probably because of difficulties in assembling the large runs of almost identical 177-bp repeats that are present on MC. Minichromosomal VSGs were identified by sequencing MC from PFGE separations of chromosomal DNA from two distantly separated PF and BF lines, and mapping the reads to the VSGnome. The purity of the MC fraction, defined by the absence of DNA fragments from larger chromosomes, is exemplified by the low abundance of DNA hits from this fraction against VSGs that are known to reside on large chromosomes (for example, the well defined ES VSGs [46]), and the low representation of repeated genes such as α-tubulin and the ribosomal RNA promoter (Supplementary Table 1).

MC DNA from PF and BF cells was also assembled de novo, after removing repetitive sequences, which constituted a large fraction of the DNA of purified MC, and experimenting with different assembler parameters. The rationale for doing this was that we would be able to complete some VSGs that were previously only partially assembled, because of the lower complexity of the MC fraction. Indeed, the ORFs for nine MC VSGs that were previously truncated were now assembled into complete VSGs, raising the number of complete MC VSGs to 50, from a total of 67 distinct MC VSGs. As for BES and MES VSGs, representatives of all VSG types are present on MCs (Fig. 6) with no evidence of systematic bias. However, the proportion of complete VSGs is much higher in the MC set compared with the genome as a whole (74% and 14%, respectively). Hence MCs possess a repertoire of VSGs that is distinct from those in the chromosomal internal arrays.

By DNAseq mapping, 66 copies of 52 distinct VSGs were identified in the BF MC sample, and 64 copies of 52 VSGs in the PF sample. This shows that there must be at least 33 MC, if all of them had a VSG at each end. Other calculations (see above) suggest there are approximately 96 MC, in agreement with estimates of ∼100 made from PFGEs [11], which would mean that approximately one third of MC ends contain VSGs.

18 copies of 16 VSGs were found in the MC of the BF sample that were not in the PF sample (Fig. 9). Conversely, 15 copies of 15 VSGs were found in the PF sample but not in BF samples. VSG 17, which exists as a single copy on MC in a PF sample from the Wickstead lab, is on a 180-Kb IC BES in the BF SM sample from the Cross & Rudenko labs [46]. VSG 19, which is present only in a >3.1 Mb BES from the Cross & Rudenko labs [46], is present as two MC copies in PF and BF SM lines sequenced in the present study. Moreover, sub-cloning a PF line substantially alters the MC VSG profile (Supplementary Fig. 11). These data suggest that the MC VSG repertoire is particularly dynamic, whereas the number of MC, as judged by the comparable amount of 177-bp repeat (Fig. 8 and Supplementary Table 1), is not so variable, suggesting that subtelomeric regions of MCs are frequently recombining.

Independent confirmation that the calculated copy numbers truly reflect the distribution of sequence reads across the MC VSGs is shown for a small region of the MC VSGnome by displaying the alignments with the Integrative Genomics Viewer [62,63] (Supplementary Fig. 12).

In BF cells grown in the absence of telomerase (TERT-null) until telomeres had stabilized at the apparent minimum sustainable length [55], quantification of the 177-bp repeats suggested that two thirds of the MC had been lost (Supplementary Table 1), and we could determine that, for 26 VSGs that were found exclusively on MC, 32 copies of these 26 VSGs had been lost, amounting to about 50% of the MC VSG repertoire of the parental line (Supplementary Fig. 13). The number of MC copies lost will be slightly larger, as the loss of some multi-copy VSGs found on both MC and MBC or IC was not assumed to represent loss of the MC copies. Also included in Supplementary Fig. 13 are the 50 most abundant non-MC VSGs in the TERT-null clone; no significant changes are seen in these VSGs in the TERT-null line, probably because most are telomere-distal. In a few cases, VSG abundance changes. This could reflect destabilization of repeated sequences (note the variation in copies of tubulin


[Fig. 9 image: stacked histogram; y-axis "COPIES" (0 to 10); x-axis: VSG number; series: MC BF, MC PF.]

Fig. 9. Abundance of minichromosomal VSGs in diverged BF and PF cell lines. The data are shown as a stacked histogram (blue, PF; red, BF) arranged in order of VSG abundance in PF. Not all copies of multi-copy VSGs are necessarily on MCs.
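The copy numbers plotted in Fig. 9 were inferred from read depth. The text states that reads were counted per VSG after compensating for non-uniquely aligning sequences, but it does not give the exact procedure. The sketch below shows one common approach, offered purely as an assumption: each read contributes an equal fractional share to every VSG it aligns to, depth is normalized by gene length, and the median depth is taken as the single-copy (haploid) level. All names and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import median


def estimate_copy_numbers(alignments, vsg_lengths, read_length=100):
    """alignments: {read_id: [VSG names the read aligned to]}.
    vsg_lengths: {VSG name: length in bp}.
    Returns {VSG name: depth relative to the median depth}."""
    weighted = defaultdict(float)
    for targets in alignments.values():
        if not targets:
            continue
        share = 1.0 / len(targets)  # split multi-mapping reads evenly (assumption)
        for name in targets:
            weighted[name] += share

    # Mean per-base depth for each VSG.
    depth = {name: weighted[name] * read_length / length
             for name, length in vsg_lengths.items()}

    covered = [d for d in depth.values() if d > 0]
    if not covered:
        return {name: 0.0 for name in depth}
    single_copy_depth = median(covered)  # most VSGs are assumed single copy
    return {name: d / single_copy_depth for name, d in depth.items()}


# Toy data: VSG "B" receives twice as many reads as "A" and "C".
reads = {}
for i in range(10):
    reads[f"a{i}"] = ["A"]
    reads[f"c{i}"] = ["C"]
for i in range(20):
    reads[f"b{i}"] = ["B"]
copies = estimate_copy_numbers(reads, {"A": 1000, "B": 1000, "C": 1000})
print({name: round(value) for name, value in sorted(copies.items())})
```

Using the median as the single-copy reference relies on the observation in the text that most VSGs exist as single copies; a different reference, such as a set of known single-copy genes, would serve the same purpose.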

and rRNA promoters among some lines in Supplementary Table 1) in the absence of TERT and/or as a consequence of subcloning or otherwise stressed culture conditions. Of the 14 identified BES VSGs [46], only VSG 2 was missing from the TERT-null line, which expresses VSG 3 from the BES previously occupied by VSG 2. The active ES is especially unstable in the absence of telomerase activity [60]. Both copies of VSG 19, which are on a MC in the sequenced line but previously found only in a >3.1 Mb BES [46], were missing from the TERT-null line.

One MC contig contained a glycosyltransferase (probably galactosyl, or N-acetyl-glucosamine glycosyl transferase) gene <500 bp upstream of the VSG (VSG-392). The association of this family of glycosyl transferases with VSG arrays has been noted previously [10].

4. Discussion

There are advantages and disadvantages to attempting to assemble the VSG repertoire using either Sanger or next-generation high-throughput sequencing (HTS) methods. In the T. brucei genome project, chromosomes 1, 9, 10 and 11 were sequenced from libraries prepared from chromosomal DNA bands separated by PFGE. Chromosomes 2–8 were assembled by sequencing tiled Bacterial Artificial Chromosomes (BACs). The first approach provides the sequence of only one chromosome homolog and the second approach results in the joining of BACs derived from either of the two homologs, resulting in a pseudo-chromosome sequence. None of the libraries included MC, and the overall number of assembled VSG sequences was much smaller than might be expected to exist. Although a BAC-based sequencing approach might be ideally suited for assembling VSGs, which can exist as linked or scattered genes with high similarity, this is very unlikely to happen because of the very high cost of such a project. Meanwhile, the evolution of HTS methods will continue to improve the ability to assemble large regions of contiguous sequence, although homolog switching will still prevent the assembly of individual chromosomes from a diploid organism.

The main advantages of current-generation HTS approaches, which have further increased since this project was initiated, are cost and speed, which allow one to cheaply obtain and compare sequence data from many strains and clones. However, the main disadvantage of the HTS approaches available to this project was the difficulty of correctly assembling reasonably long contigs from short reads. This is particularly true for the several thousand VSG genes or partial genes in the T. brucei genome, where many are almost identical from end to end and almost no genes are unique throughout their length. Three independent parameters suggest that the 559 essentially full-length VSGs identified in the current study underestimate the total that are full length by as much as 20%. First, 6 of the 26 previously sequenced Lister 427 full-length VSGs available in GenBank were only present as partial sequences in the de-novo assembly (these were not re-annotated or included in the GenBank submission), most probably because they contained regions of identical sequence shared with at least one other VSG. Secondly, de-novo assembly of 100-bp paired-end reads performed on a PF MC fraction resulted in the upgrading to full-length of 9


VSGs that were previously incomplete, in a total of 52 MC VSGs, presumably because the MC fraction contained few highly similar VSG sequences that would have prevented the assembly of complete sequences. Thirdly, 117 of the curated partial VSG sequences started or ended at the beginning or end of their respective contigs. This number includes 58 of 945 partial N-terminal sequences that ran to the 3′ end of their contigs (contig length ranged from 849 to 12,727 bp) and might therefore have become complete if the contigs had extended further.

As also noted in a previous study [12], we found about four times as many partial VSG sequences encoding VSG N-terminal domains than sequences encoding C-terminal domains. Of special interest were some contigs containing several VSGs. Contigs encoding many N-terminal domains may reflect the evolution of T. brucei VSGs from shorter ones in other species, as suggested in a major analysis of VSG sequences from several trypanosome species [44].

The number of VSG copies was determined by aligning 100-bp DNA sequences against the compiled database of VSG genes and counting the number of sequences that aligned to each VSG, after compensating for non-uniquely aligning sequences. Although most VSGs are present as single (haploid) genes, about 360 exist as two or more identical (within the 2% mismatch allowance used to map the DNA sequences) copies, something that has not been evident in previous studies. From the present study, we cannot say whether multiple identical VSG copies are linked in the genome, because these would probably not have assembled independently using the current technology.

Different strains of T. brucei vary in their total nuclear DNA content, as estimated by PFGE of chromosomal DNA, ranging from 62 Mb in the ‘official genome strain’ TREU927 to 79.4 Mb in strain 247 [6]. The lengths of the 11 fully sequenced megabase chromosomes of TREU927 add up to 26.3 Mb (52.6 Mb for the diploid set), remarkably close to the PFGE-estimated diploid MBC content of 53.4 Mb [6]. The corresponding value for Lister 427 chromosomes was 70 Mb by PFGE analysis, leading to an estimated total nuclear DNA content of approximately 78 Mb, when intermediate-sized and 50–100 MC of 50–100 Kb are included [7]. This is in agreement with the cytophotometrically determined nuclear content of 95 fg DNA [27], equivalent to 76 Mb [6].

There were originally estimated to be about 100 MC in the Lister 427 strain [11]. Despite the authors’ concern about “Obvious uncertainties in this calculation. . .” this value is in close agreement with our independent estimate made from HTS read-depth. More significantly, MCs have been widely suggested to be repositories of VSGs, and therefore an important contributor to the expressed VSG repertoire in the early stages of an infection, by having high probability of transposition to the BES [4]. The current study suggests, however, that not all MC harbor VSGs, and certainly not at each end, as has been widely assumed. Only about one third of MC ends contain VSGs, but 15–18 of these VSGs were unique to one of two distinct lines of Lister 427, suggesting a dynamic flux of VSGs in MC. It is striking that 85% of the MC VSGs were complete, compared to 20% in the overall genome. This draws attention to the unresolved question of the origin of MC; whether they are formed during displacement of a currently expressed and functionally validated VSG during recombinatorial switching. VSGs from MC and BES are expressed early in an infection, presumably because they are more likely to recombine into the active ES because of the extent of their common flanking sequences [57–59].

By following the loss of three MC that had been tagged with a selectable marker, MC have been reported to be quite stable (at least in PF cells), being lost at a rate of <10⁻³ per cell per generation [64]. Unfortunately, we do not know how many cell divisions separate the PF and BF samples examined here, which involved cell lines whose common origin was at least 35 years ago, so it is not attributable to the loss of a full MC. Based on the abundance of the 177-bp repeat, our data are consistent with previous studies suggesting that T. b. gambiense contain fewer MC [57–59].

Our observations suggest that stressed culture conditions or cloning, which itself might be stressful (perhaps especially with PF [65]), can lead to loss of telomeric repeats that is not attributable to loss of MCs. Indeed, we saw evidence for changes in other repetitive sequences, including the number of arrayed tubulin, histone and slRNA genes, for example (GAMC unpublished data), in different lines and subclones. It has been reported, recently, that double-strand breaks occur naturally at silent and transcribed telomeres [66]. Such breaks probably contribute to the instability of the subtelomeric VSG repertoire.

Strikingly, and surprisingly for the most highly-expressed gene in the genome, our analysis of codon usage shows that the VSGnome of T. brucei has a strong codon-usage bias that is distinct from other highly expressed genes. This bias is the same in both known expressed VSGs and in those that do not encode a functional VSG, but is not seen in other BES-associated genes. It is not currently known what causes this bias, but it may be a signature for coat genes across African trypanosomes, as it is also seen in the epimastigote coat BARP genes and also VSG-like gene families in T. congolense and T. vivax. It is possible that non-optimal codon usage may slow translation of surface proteins in African trypanosomes and so improve folding and/or post-translational modification, as has recently been described for the FRQ protein in Neurospora [67]. Another recent study has shown that the presence of rare codons at the N-terminal end of coding sequences results in higher protein expression, and that the effect correlated best with a calculated reduction in mRNA secondary structure, but that this was probably not the complete explanation [68].

Another functionally important subset of VSGs are those expressed in metacyclic forms. The current assembly identified six contigs containing metacyclic promoters with full-length VSGs. One of these was atypical but the others are likely to be true subtelomeric metacyclic ESs. This was confirmed in a study that showed, in contrast to general prejudice, that PF of Lister 427 can be induced to develop into epimastigote and mouse-infective metacyclic life-cycle stages in vitro, where they express five VSGs whose cDNA sequences were identical to the assembled genomic copies [13]. Our analysis of N-terminal domains suggests that the metacyclic VSGs (and also those found in minichromosomes) are no different from the general VSG population.

BES-VSGs and many VSGs in arrays are flanked upstream by 70-bp repeats, which are separated from the VSG itself by a region of apparently unconserved sequence, called the co-transposed region (CTR) [69]. The CTR contains no obvious short motifs or other conserved characteristics or defined function, although inadvertent or calculated manipulation of the CTR of the active VSG can affect VSG switching [70], and transcripts from this region are processed and associated with polyribosomes [71]. Alignments of the CTRs associated with many complete VSG genes in the current study did not illuminate their structure or function; predicted CTR secondary structure was not investigated. Recombination at the 3′ end of the VSG can be almost anywhere within the highly conserved sequences within the C-terminal coding sequence or immediately downstream, and point mutations occur during this event [9,72]. The role of the invariant 14-mer in the 3′ UTR is unknown.

We did not attempt to predict splicing and polyadenylation sites for the VSG repertoire. From previous sequencing of VSG mRNAs in several laboratories, it is clear that both UTRs are usually very short (∼50 bp), but there are exceptions. As noted elsewhere [73], the VSG 3′ splice site frequently does not conform to the consensus motifs (often lacking a clear poly-pyrimidine region) that are common to most genes and which appear to strongly influence splicing

possible to estimate the rates of VSG loss or the number which are efficiency and, hence, protein expression.
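Two of the minichromosome figures discussed above lend themselves to a quick arithmetic check: the statement that about one third of MC ends contain VSGs (the abstract gives 64–67 VSGs on 96 minichromosomes), and the reported loss rate of <10⁻³ per cell per generation [64]. The following is a minimal sketch, not part of the original analysis. The function names are hypothetical, each MC is assumed to have two ends, loss is assumed to be constant and independent in each generation, and the generation counts are purely illustrative because the number of cell divisions separating the samples is unknown.

```python
# Illustrative arithmetic only. The model and the generation counts below
# are assumptions made for this sketch, not values from the study.

def fraction_of_ends_with_vsg(n_vsgs: int, n_minichromosomes: int) -> float:
    """Fraction of MC ends carrying a VSG, assuming two ends per MC."""
    return n_vsgs / (2 * n_minichromosomes)


def max_fraction_lost(loss_rate: float, generations: int) -> float:
    """Upper bound on the probability that a lineage has lost a given MC,
    assuming a constant, independent loss rate per cell per generation."""
    return 1.0 - (1.0 - loss_rate) ** generations


if __name__ == "__main__":
    # 64-67 VSGs on 96 minichromosomes, as stated in the abstract.
    for n_vsgs in (64, 67):
        print(n_vsgs, round(fraction_of_ends_with_vsg(n_vsgs, 96), 3))

    # Hypothetical generation counts; the true number is not known.
    for generations in (100, 1_000, 10_000):
        print(generations, round(max_fraction_lost(1e-3, generations), 3))
```

Under these assumptions the bound on loss depends almost entirely on the assumed number of generations, which is consistent with the point made above that rates of VSG loss cannot be estimated without knowing how many cell divisions separate the samples.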


The current study was initiated to facilitate the identification of VSGs expressed in populations arising during propagation of Lister 427 under a variety of conditions, especially in vivo and in experiments in which various manipulations affecting VSG recombination frequency, for example, lead to VSG switching. Lister 427 has the reputation of having a very low VSG switching frequency (less than 10⁻⁵ detected switches per cell generation [54,74–76]) and this probably results from its long-ago selection for virulent growth in laboratory rodents, and/or because the variants that do emerge as switched populations are so virulent that more frequent switches that result in a growth disadvantage are not detected. Because of its virulence, mouse infections with Lister 427 rarely exceed 14 days, and, although gene manipulation could be used to generate attenuated lines, our studies have focused on switching among VSG genes expressed in the early acute infection. The switches that we have characterized in recent experiments [54,76] have all resulted in the expression of complete pre-existing VSGs. A recent major study [9] shows that, in a more natural chronic infection with the TREU927 strain, the majority of expressed VSGs are mosaics that are constructed by the segmental gene conversion between two or more pseudogenes concomitantly with VSG switching. When a few of the mosaics were tested as transgenes in Lister 427, one resulted in reduced virulence. A priori, it seems likely that many of the mosaics generated concomitantly with activation will be imperfect and will therefore compromise cell viability or virulence when moved into a BES, replacing the currently expressed VSG. In a large trypanosome population this will not result in clearance of the infection, so long as the population stays ahead of the immune response.

From examination of VSG sequences gathered in the current study and previously, it is clear that few of them are unique throughout their sequence. VSG sequences are generally very divergent between Lister 427 and TREU927 strains of T. brucei, but those that are highly similar are not found in ‘protected’ genomic environments, but may reflect genetic exchange among populations. Most VSGs, therefore, are probably mosaics that have evolved, and continue to evolve, by mutation and recombination, probably at far greater rates than the general rate of gene mutation in trypanosomes. The mechanisms responsible for this rapid evolution of the VSG repertoire are not known, although it has been shown that only very short regions of sequence identity are required for efficient homologous recombination in T. brucei [3,77–79]. Some of the isolated mosaic sequences show evidence of additional point mutations, which resurrects the idea that VSGs can undergo somatic hypermutation, by an unknown mechanism, on their way to being expressed [80,81]. The information gained from this study will greatly facilitate ongoing studies of VSG expression and switching.

Funding

The work was supported by a BBSRC New Investigator award to B.W. and by grant R01 AI021729 to G.A.M.C. from the U.S. National Institutes of Health (NIH). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Allergy and Infectious Diseases (NIAID) or the NIH.

Acknowledgements

We thank Christian Tschudi & Elisabetta Ullu for sharing RNAseq data that led to the recognition and removal of RHS sequence elements that were fused to VSG translations, to the staff of the Genomics Resource Center of The Rockefeller University and the Genomics Core Lab at Memorial Sloan Kettering Cancer Center, and to members of Nina Papavasiliou’s laboratory at The Rockefeller University for their comments and suggestions during this study.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.molbiopara.2014.06.004.

References

[1] Franke E. Ueber trypanosomentherapie. Muenchener Medizinische Wochenschrift 1905;42:2059–60.

[2] Morrison LJ, Marcello L, Mcculloch R. Antigenic variation in the African trypanosome: molecular mechanisms and phenotypic complexity. Cell Microbiol 2009;11:1724–34.

[3] Horn D, Mcculloch R. Molecular mechanisms underlying the control of antigenic variation in African trypanosomes. Curr Opin Microbiol 2010;13:700–5.

[4] Barry JD, Hall JPJ, Plenderleith L. Genome hyperevolution and the success of a parasite. Ann N Y Acad Sci 2012;1267:11–7.

[5] Van Der Ploeg LHT, Valerio D, De Lange T, Bernards A, Borst P, Grosveld FG. An analysis of cosmid clones of nuclear DNA from Trypanosoma brucei shows that the genes for variant surface glycoproteins are clustered in the genome. Nucl Acids Res 1982;10:5905–23.

[6] Melville SE, Leech V, Gerrard CS, Tait A, Blackwell JM. The molecular karyotype of the megabase chromosomes of Trypanosoma brucei and the assignment of chromosome markers. Mol Biochem Parasitol 1998;94:155–73.

[7] Melville SE, Leech V, Navarro M, Cross GAM. The molecular karyotype of the megabase chromosomes of Trypanosoma brucei stock 427. Mol Biochem Parasitol 2000;111:261–73.

[8] Callejas S, Leech V, Reitter C, Melville S. Hemizygous subtelomeres of an African trypanosome chromosome may account for over 75% of chromosome length. Genome Res 2006;16:1109–18.

[9] Hall JPJ, Wang H, Barry JD. Mosaic VSGs and the scale of Trypanosoma brucei antigenic variation. PLoS Pathogens 2013;9:e1003502.

[10] Berriman M, Ghedin E, Hertz-Fowler C, Blandin G, Renauld H, Bartholomeu DC, et al. The genome of the African trypanosome Trypanosoma brucei. Science 2005;309:416–22.

[11] Van Der Ploeg LHT, Schwartz DC, Cantor CR, Borst P. Antigenic variation in Trypanosoma brucei analysed by electrophoretic separation of chromosome-sized DNA molecules. Cell 1984;37:77–84.

[12] Marcello L, Barry JD. Analysis of the VSG gene silent archive in Trypanosoma brucei reveals that mosaic gene expression is prominent in antigenic variation and is favored by archive substructure. Genome Res 2007;17:1344–52.

[13] Kolev NG, Ramey-Butler K, Cross GAM, Ullu E, Tschudi C. Developmental progression to infectivity in Trypanosoma brucei triggered by an RNA-binding protein. Science 2012;338:1352–3.

[14] Hirumi H, Hirumi K. Continuous cultivation of Trypanosoma brucei bloodstream forms in a medium containing a low concentration of serum protein without feeder cell layers. J Parasitol 1989;75:985–9.

[15] Brun R, Schonenberger M. Cultivation and in vitro cloning of procyclic culture forms of Trypanosoma brucei in a semi-defined medium. Acta Trop 1979;36:289–92.

[16] Peacock L, Ferris V, Bailey M, Gibson W. Fly transmission and mating of Trypanosoma brucei brucei strain 427. Mol Biochem Parasitol 2008;160:100–6.

[17] Serwer P. Gel electrophoresis with discontinuous rotation of the gel: an alternative to gel electrophoresis with changing direction of the electrical field. Electrophoresis 1987;8:301–4.

[18] Delcher AL, Harmon D, Kasif S, White O, Salzberg SL. Improved microbial gene identification with GLIMMER. Nucl Acids Res 1999;27:4636–41.

[19] Lukashin A, Borodovsky M. GeneMark.hmm: new solutions for gene finding. Nucl Acids Res 1998;26:1107–15.

[20] Min XJ, Butler G, Storms R, Tsang A. OrfPredictor: predicting protein-coding regions in EST-derived sequences. Nucl Acids Res 2005;33:W677–80.

[21] Nielsen H, Engelbrecht J, Brunak S, Von Heijne G. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng 1997;10:1–6.

[22] Nielsen H, Krogh A. Prediction of signal peptides and signal anchors by a hidden Markov model. Proc Int Conf Intell Syst Mol Biol 1998;6:122–30.

[23] Bininda-Emonds OR. transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences. BMC Bioinformatics 2005;6:156.

[24] Zhang Z, Li J, Zhao XQ, Wang J, Wong GK, Yu J. KaKs Calculator: calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinformatics 2006;4:259–63.

[25] Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 2009;10:R25.

[26] Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer J, et al. BLAST+: architecture and applications. BMC Bioinformatics 2009;10:421.

[27] Borst P, Van Der Ploeg LHT, Van Hoek JFM, Tas J, James J. On the DNA content and ploidy of trypanosomes. Mol Biochem Parasitol 1982;6:13–23.

[28] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl Acids Res 1997;25:3389–402.

[29] Bringaud F, Biteau N, Melville SE, Hez S, El-Sayed NM, Leech V, et al. A new, expressed multigene family containing a hot spot for insertion of retroelements


is associated with polymorphic subtelomeric regions of Trypanosoma brucei. Euk Cell 2002;1:137–51.

[30] Bringaud F, Biteau N, Zuiderwijk E, Berriman M, El-Sayed NM, Ghedin E, et al. The ingi and RIME non-LTR retrotransposons are not randomly distributed in the genome of Trypanosoma brucei. Mol Biol Evol 2004;21:520–8.

[31] Hoeijmakers JHJ, Frasch ACC, Bernards A, Borst P, Cross GAM. Novel expression-linked copies of the genes for variant surface antigens in trypanosomes. Nature 1980;284:78–80.

[32] Beals TP, Boothroyd JC. Genomic organization and context of a trypanosome variant surface glycoprotein gene family. J Mol Biol 1992;225:961–71.

[33] Beals TP, Boothroyd JC. Sequence divergence among members of a trypanosome variant surface glycoprotein gene family. J Mol Biol 1992;225:973–83.

[34] Field MC, Boothroyd JC. Sequence divergence in a family of variant surface glycoprotein genes from trypanosomes: coding region hypervariability and downstream recombinogenic repeats. J Mol Evol 1996;42:500–11.

[35] Wirtz E, Leal S, Ochatt C, Cross GAM. A tightly regulated inducible expression system for dominant negative approaches in Trypanosoma brucei. Mol Biochem Parasitol 1999;99:89–101.

[36] Doyle JJ, Hirumi H, Hirumi K, Lupton EN, Cross GAM. Antigenic variation in clones of animal-infective Trypanosoma brucei derived and maintained in vitro. Parasitology 1980;80:359–69.

[37] Aline Jr RF, Macdonald G, Brown E, Allison J, Myler P, Rothwell V, et al. (TAA)n within sequences flanking several intrachromosomal variant surface glycoprotein genes in Trypanosoma brucei. Nucl Acids Res 1985;13:3161–77.

[38] Majumder HK, Boothroyd JC, Weber H. Homologous 3′-terminal regions of mRNAs for surface antigens of different antigenic variants of Trypanosoma brucei. Nucl Acids Res 1981;9:4745–53.

[39] Carrington M, Miller N, Blum M, Roditi I, Wiley D, Turner MJ. Variant specific glycoprotein of Trypanosoma brucei consists of two domains each having an independently conserved pattern of cysteine residues. J Mol Biol 1991;221:823–35.

[40] Marcello L, Menon S, Ward P, Wilkes JM, Jones NG, Carrington M, et al. VSGdb: a database for trypanosome variant surface glycoproteins, a large and diverse family of coiled coil proteins. BMC Bioinformatics 2007;8:143.

[41] Cross GAM. Identification, purification and properties of clone-specific glycoprotein antigens constituting the surface coat of Trypanosoma brucei. Parasitology 1975;71:393–417.

[42] Horn D. Codon usage suggests that translational selection has a major impact on protein expression in trypanosomatids. BMC Genomics 2008;9:2.

[43] Wickstead B, Gull K. Dyneins across eukaryotes: a comparative genomic analysis. Traffic 2007;8:1708–21.

[44] Jackson AP, Berry A, Aslett M, Allison HC, Burton P, Vavrova-Anderson J, et al. Antigenic diversity is generated by distinct evolutionary mechanisms in African trypanosome species. Proc Natl Acad Sci USA 2012;109:3416–21.

[45] Weirather JL, Wilson ME, Donelson JE. Mapping of VSG similarities in Trypanosoma brucei. Mol Biochem Parasitol 2012;181:141–52.

[46] Hertz-Fowler C, Figueiredo LM, Quail MA, Becker M, Jackson A, Bason N, et al. Telomeric expression sites are highly conserved in Trypanosoma brucei. PLoS ONE 2008;3:e3527.

[47] Lenardo MJ, Rice-Ficht AC, Kelly G, Esser KM, Donelson JE. Characterization of the genes specifying two metacyclic variable antigen types in Trypanosoma brucei rhodesiense. Proc Natl Acad Sci USA 1984;81:6642–6.

[48] Matthews KR, Shiels PG, Graham SV, Cowan C, Barry JD. Duplicative activation mechanisms of two trypanosome telomeric VSG genes with structurally simple 5′ flanks. Nucl Acids Res 1990;18:7219–27.

[49] Barry JD, Graham SV, Fotheringham M, Graham VS, Kobryn K, Wymer B. VSG gene control and infectivity strategy of metacyclic stage Trypanosoma brucei. Mol Biochem Parasitol 1998;91:93–105.

[50] Pedram M, Donelson JE. The anatomy and transcription of a monocistronic expression site for a metacyclic variant surface glycoprotein gene in Trypanosoma brucei. J Biol Chem 1999;274:16876–83.

[51] Mcandrew M, Graham S, Hartmann C, Clayton C. Testing promoter activity in the trypanosome genome: isolation of a metacyclic-type VSG promoter and unexpected insights into RNA polymerase II transcription. Exp Parasitol 1998;90:65–76.

[52] Bringaud F, Biteau N, Donelson JE, Baltz T. Conservation of metacyclic variant surface glycoprotein expression sites among different trypanosome isolates. Mol Biochem Parasitol 2001;113:67–78.

[53] Dreesen O, Li B, Cross GAM. Telomere structure and function in trypanosomes: a proposal. Nat Rev Microbiol 2007;5:70–5.

[54] Hovel-Miner GA, Boothroyd CE, Mugnier M, Dreesen O, Cross GAM, Papavasiliou FN. Telomere length affects the frequency and mechanism of antigenic variation in Trypanosoma brucei. PLoS Pathogens 2012;8:e1002900.

[55] Dreesen O, Cross GAM. Telomerase-independent stabilization of short telomeres in Trypanosoma brucei. Mol Cell Biol 2006;26:4911–9.

[56] Wickstead B, Ersfeld K, Gull K. The small chromosomes of Trypanosoma brucei involved in antigenic variation are constructed around repetitive palindromes. Genome Res 2004;14:1014–24.

[57] Gibson WC, Borst P. Size-fractionation of the small chromosomes of Trypanozoon and Nannomonas trypanosomes by pulsed field gradient gel electrophoresis. Mol Biochem Parasitol 1986;18:127–41.

[58] Dero B, Zampetti-Bosseler F, Pays E, Steinert M. The genome and the antigen gene repertoire of Trypanosoma brucei gambiense are smaller than those of T. b. brucei. Mol Biochem Parasitol 1987;26:247–56.

[59] Kanmogne GD, Bailey M, Gibson WC. Wide variation in DNA content among isolates of Trypanosoma brucei ssp. Acta Trop 1997;63:75–87.

[60] Dreesen O, Cross GAM. Consequences of telomere shortening of an active VSG expression site in telomerase-deficient Trypanosoma brucei. Eukaryot Cell 2006;5:2114–9.

[61] Zomerdijk JCBM, Kieft R, Borst P. A ribosomal RNA gene promoter at the telomere of a mini-chromosome in Trypanosoma brucei. Nucleic Acids Res 1992;20:2725–34.

[62] Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol 2011;29:24–6.

[63] Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 2012, http://dx.doi.org/10.1093/bib/bbs017.

[64] Wickstead B, Ersfeld K, Gull K. The mitotic stability of the minichromosomes of Trypanosoma brucei. Mol Biochem Parasitol 2003;132:97–100.

[65] Klingbeil MM, Motyka SA, Englund PT. Multiple mitochondrial DNA polymerases in Trypanosoma brucei. Mol Cell 2002;10:175–86.

[66] Glover L, Alsford S, Horn D. DNA break site at fragile subtelomeres determines probability and mechanism of antigenic variation in African trypanosomes. PLoS Pathogens 2013;9:e1003260.

[67] Zhou M, Guo J, Cha J, Chae M, Chen S, Barral JM, et al. Non-optimal codon usage affects expression, structure and function of clock protein FRQ. Nature 2013;495:111–5.

[68] Goodman DB, Church GM, Kosuri S. Causes and effects of N-terminal codon bias in bacterial genes. Science 2013;342:475–9.

[69] Liu AYC, Van Der Ploeg LHT, Rijsewijk FAM, Borst P. The transposition unit of VSG gene 118 of Trypanosoma brucei: presence of repeated elements at its border and absence of promoter associated sequences. J Mol Biol 1983;167:57–75.

[70] Davies KP, Carruthers VB, Cross GAM. Manipulation of the VSG co-transposed region increases expression-site switching in Trypanosoma brucei. Mol Biochem Parasitol 1997;86:163–77.

[71] Aline RF, Scholler JK, Stuart K. Transcripts from the co-transposed segment of variant surface glycoprotein genes are in Trypanosoma brucei polyribosomes. Mol Biochem Parasitol 1989;32:169–78.

[72] Bernards A, Van Der Ploeg LHT, Frasch ACC, Borst P, Boothroyd JC, Coleman S, et al. Activation of trypanosome surface glycoprotein genes involves a gene duplication-transposition leading to an altered 3′ end. Cell 1981;27:497–505.

[73] Siegel TN, Tan KSW, Cross GAM. A systematic study of sequence motifs for RNA trans-splicing in Trypanosoma brucei. Mol Cell Biol 2005;25:9586–94.

[74] Lamont GS, Tucker RS, Cross GAM. Analysis of antigen switching rates in Trypanosoma brucei. Parasitology 1986;92:355–67.

[75] Horn D, Cross GAM. Analysis of Trypanosoma brucei VSG expression site switching in vitro. Mol Biochem Parasitol 1997;84:189–201.

[76] Boothroyd CE, Dreesen O, Leonova T, Ly KI, Figueiredo LM, Cross GAM, et al. A yeast-endonuclease-generated DNA break induces antigenic switching in Trypanosoma brucei. Nature 2009;459:278–81.

[77] Barnes RL, Mcculloch R. Trypanosoma brucei homologous recombination is dependent on substrate length and homology, though displays a differential dependence on mismatch repair as substrate length decreases. Nucl Acids Res 2007;35:3478–93.

[78] Glover L, Mcculloch R, Horn D. Sequence homology and microhomology dominate chromosomal double-strand break repair in African trypanosomes. Nucl Acids Res 2008;36:2608–18.

[79] Glover L, Jun J, Horn D. Microhomology-mediated deletion and gene conversion in African trypanosomes. Nucl Acids Res 2011;39:1372–80.

[80] Lu Y, Hall T, Gay LS, Donelson JE. Point mutations are associated with a gene duplication leading to the bloodstream reexpression of a trypanosome metacyclic VSG. Cell 1993;72:397–406.

[81] Lu Y, Alarcon CM, Hall T, Reddy LV, Donelson JE. A strand bias occurs in point mutations associated with variant surface glycoprotein gene conversion in Trypanosoma rhodesiense. Mol Cell Biol 1994;14:3971–80.

[82] Kolev NG, Franklin JB, Carmi S, Shi HF, Michaeli S, Tschudi C. The transcriptome of the human pathogen Trypanosoma brucei at single-nucleotide resolution. PLoS Pathogens 2010;6:e1001090.

[83] Murphy NB, Pays A, Tebabi P, Coquelet H, Gyaux M, Steinert M, et al. Trypanosoma brucei repeated element with unusual structural and transcriptional properties. J Mol Biol 1987;195:855–71.

[84] Lacount DJ, El-Sayed NM, Kaul S, Wanless D, Turner CM, Donelson JE. Analysis of a donor gene region for a variant surface glycoprotein and its expression site in African trypanosomes. Nucl Acids Res 2001;29:2012–9.

[85] Graham SV, Terry S, Barry JD. A structural and transcription pattern for variant surface glycoprotein gene expression sites used in metacyclic stage Trypanosoma brucei. Mol Biochem Parasitol 1999;103:141–54.

[86] Ginger ML, Blundell PA, Lewis AM, Browitt A, Gunzl A, Barry JD. Ex vivo and in vitro identification of a consensus promoter for VSG genes expressed by metacyclic-stage trypanosomes in the tsetse fly. Euk Cell 2002;1:1000–9.

[87] Alarcon CM, Son HJ, Hall T, Donelson JE. A monocistronic transcript for a trypanosome variant surface glycoprotein. Mol Cell Biol 1994;14:5579–91.

[88] Nagoshi YL, Alarcon CM, Donelson JE. The putative promoter for a metacyclic VSG gene in African trypanosomes. Mol Biochem Parasitol 1995;72:33–45.

[89] Kim KS, Donelson JE. Co-duplication of a variant surface glycoprotein gene and its promoter to an expression site in African trypanosomes. J Biol Chem 1997;272:24637–45.

[90] Vanhamme L, Pays A, Tebabi P, Alexandre S, Pays E. Specific binding of proteins to the noncoding strand of a crucial element of the variant surface


glycoprotein, procyclin, and ribosomal promoters of Trypanosoma brucei. Mol Cell Biol 1995;15:5598–606.

[91] Brandenburg J, Schimanski B, Nogoceke E, Nguyen TN, Padovan JC, Chait BT, et al. Multifunctional class I transcription in Trypanosoma brucei depends on a novel protein complex. EMBO J 2007;26:4856–66.

[92] Sherman DR, Janz L, Hug M, Clayton C. Anatomy of the parp gene promoter of Trypanosoma brucei. EMBO J 1991;10:3379–86.

[93] Janz L, Clayton C. The PARP and rRNA promoters of Trypanosoma brucei are composed of dissimilar sequence elements that are functionally interchangeable. Mol Cell Biol 1994;14:5804–11.