The EMBO Journal vol.5 no.6 pp. 1291-1298, 1986

Characterization of the gene for the microbody (glycosomal) triosephosphate isomerase of Trypanosoma brucei

Bart W.Swinkels1, Wendy C.Gibson1, Klaas A.Osinga1,3, Roel Kramer1, Gerrit H.Veeneman2, Jacques H.van Boom2 and Piet Borst1

1Division of Molecular Biology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, and 2Organic Chemistry Laboratory, State University Leiden, Gorlaeus Laboratory, PO Box 9502, 2300 RA Leiden, The Netherlands

3Present address: Research and Development, Gist-brocades NV, Postbus 1, 2600 MA Delft, The Netherlands

Communicated by P.Borst

To determine how microbody enzymes enter microbodies, we are studying the genes for glycosomal (microbody) enzymes in Trypanosoma brucei. Here we present our results for triosephosphate isomerase (TIM), which is found exclusively in the glycosome. We found a single TIM gene without introns, having one major polyadenylated transcript of 1500 nucleotides with a long untranslated tail of ~600 nucleotides. By a novel method, suitable for low abundance transcripts, we demonstrate that TIM mRNA contains the 35-nucleotide leader sequence (mini-exon) also found on several other trypanosome mRNAs. The TIM gene and a DNA segment of at least 6 kbp upstream of the gene are transcribed at an equal rate in isolated nuclei, suggesting that the gene is part of a much larger transcription unit. The predicted protein is of the same size as TIMs from other organisms and shares ~50% amino acid homology with other eukaryote TIMs, somewhat less with prokaryote TIMs. Trypanosome TIM is the most basic of all TIMs sequenced thus far. This is, in part, due to the presence of two clusters of positively charged residues in the molecule which may act as a signal for entry into glycosomes.

Key words: glycosome/topogenesis/biogenesis/microbody evolution/mini-exon

Introduction

The parasitic protozoa grouped in the family Trypanosomatidae are characterized by an unusual microbody, called the glycosome (Opperdoes and Borst, 1977; Opperdoes et al., 1984), which contains enzymes of the glycolytic pathway in addition to (parts of) several other metabolic pathways [see Opperdoes (1985) and references quoted therein]. The transfer of glycolysis from its usual location in the cytosol to a microbody represents a remarkable step in eukaryotic evolution. We are interested in the mechanism of this translocation and in the changes required in cytosolic glycolytic enzymes to allow their delivery into microbodies. We hope that such changes may eventually be exploited in chemotherapy (see Wierenga et al., 1984), and expect that they will provide information about the mechanism of entry of proteins into microbodies in general and, possibly, about the evolutionary origin of these organelles.

The glycolytic enzyme triosephosphate isomerase (Meyerhof and Lohmann, 1934) (TIM; D-glyceraldehyde-3-phosphate ketol isomerase, EC 5.3.1.1) is particularly suitable for such comparative studies. The enzyme is well characterized (see Straus et al., 1985); the amino acid sequence of TIMs from both eukaryotic (Kolb et al., 1974; Corran and Waley, 1975; Alber and Kawasaki, 1982; Maquat et al., 1985; Straus and Gilbert, 1985a) and prokaryotic (Artavanis-Tsakonas and Harris, 1980; Pichersky et al., 1984) sources has been determined and high resolution structures for the chicken (Banner et al., 1975) and yeast (Alber et al., 1981) proteins are available. This makes TIM suitable for deducing long-range evolutionary relationships.

TIM has previously been purified from Trypanosoma brucei (Misset and Opperdoes, 1984) and crystals for X-ray diffraction have been obtained (Wierenga et al., 1984), allowing the elucidation of the 3-D structure of the enzyme. Here we present the sequence of the gene and an analysis of its transcription.

Results

Isolation and characterization of the TIM gene of T. brucei

We used a genomic DNA probe from the yeast TIM gene to identify the corresponding gene(s) in trypanosome nuclear DNA and in cDNA and genomic DNA recombinant clone banks. A single trypanosome gene was found with homology to the yeast probe (see also Gibson et al., 1985). The physical map of this gene is presented in Figure 1 together with the genomic clones and subclones used to characterize the gene.

Figure 2 shows a Southern blot of trypanosome nuclear DNA hybridized with a trypanosome TIM cDNA probe. A single gene is present in strain 427 and in 20 other strains analysed; additional sub-stoichiometric bands seen previously and attributed to the presence of related genes (Gibson et al., 1985) were in fact due to a probe contaminated with an upstream fragment.

Sequence of the trypanosome TIM gene

The nucleotide sequence of the TIM gene was determined by the chain termination method, using a serial sequencing approach as detailed in Figure 1. The sequence is presented in Figure 3; a comparison of the amino acid sequence homology of the trypanosome TIM with TIMs of other organisms is shown in Figure 4 and Table I. Clearly, the trypanosome gene, isolated via its homology with the yeast TIM gene, is a typical TIM gene. It specifies a protein of the same size as other TIMs, and shares ~50% sequence homology, at both the nucleotide and amino acid levels, with other eukaryotic TIMs. No major insertions are required to align the amino acid sequences of the trypanosome TIM and those of other organisms, indicating that the trypanosome gene lacks introns.

Although we have no direct amino acid sequence data to link the sequenced gene to the glycosomal TIM, the indirect evidence for this link is strong. Only one TIM has been detected in T. brucei (Misset and Opperdoes, 1984) and we find one gene. The subunit mass for T. brucei TIM, as calculated from the nucleotide sequence, is 26 718 (not counting the initiator methionine), which is in good agreement with the mass of 27 000 ± 500 deduced from SDS gel electrophoretic analysis of the

© IRL Press Limited, Oxford, England

[Figure 1: restriction map of the TIM locus showing the TIM gene, the 1500-nt mRNA, cDNA clone pTcTIM9 and genomic clones pTgTIM12 and pTgTIM8 (scale bar 1 kb), with an enlarged map (scale bar 200 bp) of the M13 subclones and sequence strategy.]

Fig. 1. Physical map of the TIM gene of T. brucei 427. The genomic environment of the TIM gene (black box) is shown and below, maps of the inserts of two genomic clones, pTgTIM8 (in pAT153) and pTgTIM12 (in pBR322) are shown. The positions of the mRNA and cDNA clone pTcTIM9 are indicated below the gene. The broken line represents a polymorphic restriction site differing between the two TIM alleles in T. brucei 427. HincII sites are only shown in pTgTIM8, where they were mapped. Part of the map of pTgTIM8 is enlarged to show the sequence strategy: bars indicate two PstI fragments subcloned into M13, the larger of which contains plasmid sequences (indicated by open box); vector arrows indicate length and orientation of sequence obtained with each primer (dot) or the Maxam and Gilbert method (cross). The 5' and 3' ends of the S1 nuclease-mapped transcript are arrowed below the enlarged map. Restriction sites: A = AvaII, C = ClaI, Ev = EcoRV, H = HindIII, Hc = HincII, K = KpnI, P = PstI, Pv = PvuII and R = EcoRI.

purified glycosomal protein (O.Misset and F.R.Opperdoes, personal communication). Also the high net positive charge, calculated from the predicted amino acid sequence (see below), is in agreement with an isoelectric point >pH 9.0, found for the glycosomal TIM (O.Misset and F.R.Opperdoes, personal communication). Finally, X-ray diffraction studies of T. brucei TIM crystals show the enzyme to closely resemble other TIMs and show divergence in electron density at positions predicted by divergence of the T. brucei TIM amino acid sequence from other sequences (Wierenga et al., 1984; R.K.Wierenga and W.G.J.Hol, personal communication). Hence, we conclude that our TIM sequence corresponds to the glycosomal TIM and that we have not sequenced a gene for a second minor TIM or a non-functional gene.

Our comparison of the amino acid sequences of the cytosolic and glycosomal phosphoglycerate kinases (PGK) of T. brucei (Osinga et al., 1985) revealed two types of difference: the glycosomal enzyme has a C-terminal extension of 20 residues and a much higher positive charge than the cytosolic isoenzymes. For TIM there is no cytosolic equivalent of the glycosomal enzyme, but the comparison of its amino acid sequence with those of TIMs from other organisms shows that the trypanosomal TIM lacks a C-terminal extension (Figure 4). This has been confirmed by X-ray diffraction of TIM crystals (R.K.Wierenga and W.G.J.Hol, personal communication). The charge of different TIMs, as calculated from their amino acid sequences, assuming neutral pH, is presented in Figure 5. We do not know how this relates to the pH in situ. However, a large difference in pH between the cytosol and the glycosomal matrix seems unlikely in view of the pH optima of the cytosolic and glycosomal PGK, which are both ~pH 8.0 (O.Misset, personal communication).

[Figure 2: Southern blot of T. brucei nuclear DNA digested with EcoRI, PstI, AvaII and ClaI; size markers at 8.0, 5.3, 2.6, 1.3 and 0.5 kb; probe: TIM cDNA.]

Fig. 2. The TIM gene is present in a single copy in T. brucei. Nuclear DNA from T. brucei was digested with the restriction enzymes shown, size fractionated through a 0.7% agarose gel and hybridized with the TIM cDNA clone pTcTIM9. Post-hybridizational washes were to 0.1 x SSC at 65°C.

CTGCAGCAACTTACTGGGGACGCTGCTATCCTTTCTTCTTCATATTTCTCGTTTACCTACGTTTAGAGTCTCTGAGATCATTACTAGCAAG -76

met ser lys pro gln CAAACAAGAAGCCATTTGAGTTTCAAGCAAAGTCTACCAAAAAACAAACTCTTATTATACCGTGCCAAATT ATG TCC AAG CCA CAA 15

pro ile ala ala ala asn trp lys cys asn gly ser gln gln ser leu ser glu leu ile asp leu phe CCC ATC GCA GCA GCC AAC TGG AAG TGC AAC GGC TCC CAA CAG TCT TTG TCG GAG CTT ATT GAT CTG TTT 84

asn ser thr ser ile asn his asp val gln cys val val ala ser thr phe val his leu ala met thr AAC TCC ACA AGC ATC AAC CAC GAC GTG CAA TGC GTA GTG GCC TCC ACC TTT GTT CAC CTT GCC ATG ACG 153

lys glu arg leu ser his pro lys phe val ile ala ala gln asn ala ile ala lys ser gly ala phe AAG GAG CGT CTT TCA CAC CCC AAA TTT GTG ATT GCG GCG CAG AAC GCC ATT GCA AAG AGC GGT GCC TTC 222

thr gly glu val ser leu pro ile leu lys asp phe gly val asn trp ile val leu gly his ser glu ACC GGC GAA GTC TCC CTG CCC ATC CTC AAA GAT TTC GGT GTC AAC TGG ATT GTT CTG GGT CAC TCC GAG 291

arg arg ala tyr tyr gly glu thr asn glu ile val ala asp lys val ala ala ala val ala ser gly CGC CGC GCA TAC TAT GGT GAG ACA AAC GAG ATT GTT GCG GAC AAG GTT GCC GCC GCC GTT GCT TCT GGT 360

phe met val ile ala cys ile gly glu thr leu gln glu arg glu ser gly arg thr ala val val val TTC ATG GTT ATT GCT TGC ATC GGC GAA ACG CTG CAG GAG CGT GAA TCA GGT CGC ACC GCT GTT GTT GTG 429

leu thr gln ile ala ala ile ala lys lys leu lys lys ala asp trp ala lys val val ile ala tyr CTC ACA CAG ATC GCT GCT ATT GCT AAG AAA CTG AAG AAG GCT GAC TGG GCC AAA GTT GTC ATC GCC TAC 498

glu pro val trp ala ile gly thr gly lys val ala thr pro gln gln ala gln glu ala his ala leu GAA CCC GTT TGG GCC ATT GGT ACC GGC AAG GTG GCG ACA CCA CAG CAA GCG CAG GAA GCC CAC GCA CTC 567

ile arg ser trp val ser ser lys ile gly ala asp val arg gly glu leu arg ile leu tyr gly gly ATC CGC AGC TGG GTG AGC AGC AAG ATT GGA GCA GAT GTG CGC GGA GAG CTC CGC ATT CTT TAC GGC GGT 636

ser val asn gly lys asn ala arg thr leu tyr gln gln arg asp val asn gly phe leu val gly gly TCT GTT AAT GGA AAG AAT GCG CGC ACT CTT TAC CAA CAG CGA GAC GTC AAC GGC TTC CTT GTT GGT GGT 705

ala ser leu lys pro glu phe val asp ile ile lys ala thr gln *** GCC TCA CTT AAG CCA GAA TTT GTG GAC ATC ATC AAA GCC ACT CAG TGAI rTTTCCTTCATGTGTCAATGAGGTTTGG 781

TGCTTTTGCCGTTGAGTGGGTGAAGATAGCGGTATATATATATATATATATATATATATATGCGCAAGTGAATATAAAAAAGATGTAAAGA 872

CAGGTAGCAGGGAGAAAACCTCGCATAACATTATAAAAGGGAGTGTAACTGGAGTGGGAAAACAAAGGAAAGGGGGATTCGTGTATTGAGC 963

ATATGAGAAAAAAAAAAGAAATTATGTTGTATGTTTTTACCTATAATTTATGCGAAGTGAATGACAAAACAAAAACCAAAAGGATATCATC 1054

ATATGCTTTTGTTTCATCCAAATGGTTGTTTCTTCCGTACCTCAGGGTCACTACTTCGTTGAGTGTGGTTTTAGCGAGGAGAGGGAACAAT 1145

AGGGGGTGTTGTATACATTTACACGTACGTATCTTCCTTTACTCTCTCTTGCCTTCATTATATTCCCCCTTTTTCTGGGAGAGGAAAAGAG 1236

AGTGTAGAATGAGGGGAGTACGTGTACGGAATTTTAACGATTACCCCCTTTTTTTTCTTTGAACTATTATTTTTAGAATTC 1322

Fig. 3. Complete nucleotide and amino acid sequence of the TIM gene from T. brucei. Amino acid coding region begins at position 1 and ends at position 753. The 5' end of the TIM mRNA was mapped by S1 analysis to position -103 ± 5 bp (underlined); the sequence shown here ends ~20 bp upstream of the 3' end of the mRNA.
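The coordinates quoted for this sequence can be cross-checked with simple arithmetic. The sketch below is an added illustration, not part of the original analysis. It uses only values stated in the paper (5' end at -103, a 35-nucleotide mini-exon, coding region 1-753, sequenced region ending at 1322 about 20 bp short of the mRNA 3' end, a 16-mer primer complementary to positions 1-16) and the first 16 coding nucleotides as printed in Figure 3. All variable and function names are hypothetical.

```python
# Values taken from the text and figure captions; names are illustrative only.
FIVE_PRIME_UTR = 103      # S1-mapped 5' end lies 103 +/- 5 nt upstream of the ATG
MINI_EXON = 35            # leader sequence found on trypanosome mRNAs
CODING_END = 753          # coding region spans positions 1-753 (stop codon included)
SEQUENCED_END = 1322      # last position shown in Figure 3
TO_THREE_PRIME_END = 20   # mRNA 3' end lies ~20 nt beyond the sequenced region
PRIMER_LENGTH = 16        # 16-mer complementary to positions 1-16


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(dna))


# First 16 coding nucleotides read from Figure 3 (ATG TCC AAG CCA CAA C).
coding_start = "ATGTCCAAGCCACAAC"
primer = reverse_complement(coding_start)

# Lengths of the transcript parts, ignoring the poly(A) tail.
three_prime_utr = SEQUENCED_END + TO_THREE_PRIME_END - CODING_END
transcript_without_tail = MINI_EXON + FIVE_PRIME_UTR + CODING_END + three_prime_utr

# Expected primer-extension run-off lengths without and with the mini-exon.
runoff_from_s1 = FIVE_PRIME_UTR + PRIMER_LENGTH
runoff_with_mini_exon = runoff_from_s1 + MINI_EXON

print("primer 5'-" + primer)
print("3' untranslated region:", three_prime_utr, "nt")
print("transcript without poly(A):", transcript_without_tail, "nt")
print("run-off predicted by S1 mapping:", runoff_from_s1, "nt")
print("run-off with mini-exon:", runoff_with_mini_exon, "nt")
```

Under these assumptions the computed primer is the 16-mer quoted in the paper (5'-GTTGTGGCTTGGACAT), the 3' untranslated region comes to 589 nt (quoted as ~600), and the S1-based run-off of 119 nt grows to 154 nt with the mini-exon, against the 151-nt product reported for the primer extension.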

It should be noted that some of these sequences (rabbit, coelacanth and Bacillus stearothermophilus) were determined directly rather than via the corresponding DNA sequence and may contain errors (Maquat et al., 1985; Straus and Gilbert, 1985a). We also note that the protein sequences derived from DNA sequences may miss post-translational modifications that could affect charge. With these provisos, the trypanosomal TIM is the most basic of all TIMs sequenced thus far.

It is clear from Table I and Figure 4 that the trypanosomal TIM is somewhat more homologous to the eukaryotic than to the prokaryotic TIMs. This is especially apparent in the C-terminal part, although the differences between the eukaryotic and prokaryotic versions of TIM are not dramatic. We tentatively conclude that this microbody enzyme is of eukaryotic origin.

Transcription of the TIM gene

Figure 6A shows an RNA blot of poly(A)+ RNA isolated from bloodstream form and cultured insect form trypanosomes hybridized with a TIM cDNA probe. In both cases there is one major transcript of 1500 nucleotides. We precisely located the ends of this transcript by S1 nuclease protection experiments (Figure 6B). The 5' end of the transcript is located 103 ± 5 nucleotides upstream from the start of the coding region. Close to this position an AG is present in the sequence and slightly upstream a pyrimidine stretch. This may be the intron-exon border used in splicing the 35-nucleotide leader sequence (see below) onto the body of the mRNA. The 3' end of the RNA is not contained in the area sequenced but is ~20 nucleotides downstream of the EcoRI site (data not shown; see Figure 1). Thus the TIM message has a very long 3'-untranslated sequence of ~600 nucleotides, as also found for the glycosomal, but not the cytosolic, PGK message (Osinga et al., 1985) and for the glycosomal aldolase gene (Clayton, 1985).

All trypanosome mRNAs analysed thus far contain the same 35-nucleotide leader sequence (mini-exon) at their 5' end, which has never been found to be part of the transcription unit of the main body of the mRNA (see Boothroyd, 1985; Borst, 1986b). Primer extension analysis of the TIM mRNA yields a run-off

10 20 30 40 T.brucei (M) S K P Q P I A A A N W K C N G S Q Q S L S E L I D L F N S T S I N H D V Q C V V Human A P S R K F F V G G . . . M . . R K . . . G . . . G T L . AA K VP A . T E V . C Rabbit A P S R K F F V G G . . . M . . R K K N . G . . . TTL . AA K VP A . T E V . C Chicken (M) A P R K F F V G G . . . M . . D K K . . G . . . H T L .G A K LS A . T E V . C Coelacanth A P R K F F V G G . . . M . . D K K . . G . . . Q T L . AA K V P F T G E I .C Yeast (M) A R T F F V G G . F . L . . . K . . I K . I VE R L . T A . .P E N . E V . I E.coli (M) R H . L V M G . . . L . . . R H M V H . . V S N L R K E L A GV A G C A VA I B.stea. (M) R K . . I . G . . . M H K T L A E A V Q F V E D V K G H V PP A . E V D S.

50 1 60 70 80 T.brucei A S T F V H L A M T K E R L S H P K F V I A A Q N A I A K S G A F T G E V S L; P I Human .P P T A Y I D F A R Q K .D ..I A V . . . . C Y K V T N ...... I .P G M Rabbit .P P T A Y I D F A R Q K .D ..I A V . . . . C Y K V T N ...... I .P G M Chicken G A P S I Y .D F A R Q K .D A. I G V . . . . C Y K V P K ...... I .P A M Coelacanth . P P E A Y .D F A R L K V D . . . G V . . . . C Y K V . K ...... I .P A M Yeast C P P A T I .D Y S V S L V K K . Q V T V G . . . . Y L . A S ...... N .V D Q E.coli . P P E M Y I D . A . R E A E G S H I M L G . . . V N L N L S ...... T .A A M B.stea. V A P . L F R L V Q A A D G T D L Q K . G . . T M H F A B Z ..Y .. ... P V M

90 100 110 120 T.brucei L K D F G V N WI V L G H S E R R A Y Y G E T N E I V A D K V A A A V A S G F M V I Human I ..C .A T .V...... H V F . . S D . L I G Q . ..H .L . E .L G .. Rabbit I..C .A T .V...... H V F . . S D . L I G Q . ..H .L S E .L G .. Chicken I..I .A A .V I...... H V F . . S D . L I G Q . ..H .L .E .L G .. Coelacanth I. .C . T .V I...... H V F . . S D . L I G Q . ..H .L .E .L G .V Yeast I. .V A K .V I...... S. F H . D D K F I . . . T K F .L G Q .V G..

E.coli I. ..I .A Q Y . I I . * * * ** T. H R . S D . L I . K . F . V L K E Q .L T P V B.stea. . .C.L . T Y V I ...... I M F A Z . B Z T . B K . . L . . F T R .L I P .

130 140 15,0 I 160 T.brucei A C I G E 'T L Q E R E S G R T A V V V L T Q I A A IA K K L K KA D W A K V V I A Y Human . . . . . K . D . .. A . I . E K . . F E . T K V ..D N V. . . . L . .

Rabbit . . . . . K .D . .. A . I . E K . . F E . T K V ..D N V. . . . L . . * s Chicken . . . . . K .D . .. A . I . E K . . F E . T K . ..D N V. . . . L . .

Coelacanth . . . . . K .D . .. A . I . E G . . F E V T E V ..D D V. . . . L . . Yeast L ...... E . KK A . K . L D . . E R . L N . V L E E V. N . . V . . E.coli L . . . . . E A . N. A . K . E E . C A R . . D . V L . T Q G A . A FE G A . . . . B.stea. I . C . . S *E.EA .Q A . E . D A . . S Q V E K . L . G L T P Q E. . I L . .

170 180 190 200 T.brucei E P V W A I G T G K V A T P Q Q A Q E A U A L Y R S W V S S K I G A D V R G E L R I

Human * . .* * . . . . . T ...... V . E K L . G .L K .N V S D A .A Q S T

Rabbit * . .* * . . . . . T ...... V . E K L . G .L K .N V S D A .A Q S T

Chicken * . . . * . . . . .* ...... V . E K L . G .L K H V S G A .A Q S T .T...... S Coelacanth * . . . . . T . . . S . .L . G K L . K .L K E N V S E T .A D S V . . Yeast ...... L A . . . E D . . D I . . S . . K F L A ..L . D K AA S . .

* . . . . I . E.coli . . .. A . . . A V . K P . . D H I A .V D .IN I A E Q V . B.stea. * . . . . . s * Z B . B SV C G H . . . V . . R L F . P E A A E A I

210 220 I 230 240 250 T.brucei L Y G G S V N G K N A R T L Y Q Q R D V N G F L V G G A S L K P E F V D I I K A T Q

Human I . . . . .T .A T C K E . A S . P . .D . . * ...... N .K .

Rabbit I . . . . .T .A T C K E . A S . P . .D ...... N .K .

Chicken I . . . . .T . G . C K E . A S . H . .D . . * ...... L . . N .K H AV.

Coelacanth I . . . . .T .A T C K E . A S E P . .D . . * ...... E Y K D VR .

Yeast * . . . . A . .S . . V .F K D K A . .D ...... * * . . . . . N SR N

E.coli Q ...... A S . . A E . F A . P . I D . A ...... * .A D A . A V .V A A E A . K .A

B.stea. Q . . . . .K P D . I . D F L A Z Z ZI.B . A * ...... E . A S . L Q L V Q A GR HE

Fig. 4. Comparison of the amino acid sequences of the trypanosome TIM with those of TIMs from other organisms. The numbering refers to the trypanosome TIM and only this sequence is shown in full. Dots in the other sequences indicate residues identical to trypanosomal ones; some spaces were necessary in order to align the sequences. Arrows above the sequences indicate the segment borders (see Discussion and Table II). Sequences are from: man (Maquat et al., 1985), rabbit (Corran and Waley, 1975), chicken (Straus and Gilbert, 1985a), coelacanth (Kolb et al., 1974), yeast (Alber and Kawasaki, 1982), E. coli (Pichersky et al., 1984) and B. stearothermophilus (Artavanis-Tsakonas and Harris, 1980).

Table I. Percentage amino acid homology of TIMs calculated from Figure 4

                       T. brucei  Human  Rabbit  Chicken  Coelacanth  Yeast  E. coli  B. stearothermophilus
T. brucei              XXXX
Human                  52         XXXX
Rabbit                 51         98     XXXX
Chicken                52         89     88      XXXX
Coelacanth             48         82     82      80       XXXX
Yeast                  48         53     52      53       49          XXXX
E. coli                44         46     46      46       44          45     XXXX
B. stearothermophilus  38         37     37      39       36          34     38       XXXX
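Pairwise percentages of this kind can be derived from an alignment written in the dot notation of Figure 4. The sketch below is a minimal illustration under stated assumptions: the paper does not say how gapped columns were treated, so skipping any column with a gap is an assumption, and the second row is a made-up toy row, not data from Figure 4. Function names are hypothetical.

```python
def expand_dots(reference: str, row: str) -> str:
    """Replace '.' in an aligned row by the reference residue in that column."""
    if len(reference) != len(row):
        raise ValueError("aligned rows must have equal length")
    return "".join(ref if res == "." else res for ref, res in zip(reference, row))


def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither row has a gap.

    Ignoring gapped columns is an assumption; other conventions exist.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared


# Reference: first 12 residues of T. brucei TIM after the initiator Met (Figure 3).
reference = "SKPQPIAAANWK"
# Hypothetical aligned row in dot notation, with one gap; not taken from Figure 4.
toy_row = "A.SR-F.VGG.."

expanded = expand_dots(reference, toy_row)
print(expanded)
print(round(percent_identity(reference, expanded), 1))
```

In this toy case four of the eleven comparable columns are identical. Applying the same functions to full, correctly aligned rows would give values comparable to those in Table I only if the original authors used the same gap convention.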

product which is ~35 nucleotides longer than that predicted by the S1 analysis (Figure 6C). To verify that the 35-nucleotide leader is indeed the mini-exon sequence, we annealed an oligonucleotide, complementary to the 3' 22 nucleotides of the mini-exon (De Lange et al., 1983), to the RNA and digested with RNase H, which specifically cuts in RNA-DNA hybrids (Berkower et al., 1973; Donis-Keller, 1979). Subsequent primer extension analysis revealed several shortened run-off products (Figure 6C). When the same experiment was carried out using a non-homologous oligonucleotide, no effect on the length of the run-off products was seen. We therefore conclude that the TIM mRNA carries a mini-exon sequence at its 5' end.

In an attempt to delineate the 5' edge of the TIM transcription unit, we analysed nascent transcripts elongated in isolated nuclei incubated in the presence of 32P-labelled UTP. Figure 7 shows an experiment where this nascent RNA was hybridized to restriction digests of the genomic clone pTgTIM8. All restriction fragments from this clone (except vector-derived fragments) hybridize with the nascent RNA and the degree of hybridization is approximately proportional to the fragment length. The same result was obtained with nuclei from both bloodstream and culture form trypanosomes (not shown). The TIM gene is therefore either part of a large transcription unit which yields rapidly processed precursors, or is surrounded by adjacent transcription units which are transcribed at an equal rate. We have some evidence for larger precursors in steady-state RNA, since large RNA species (3000-5000 nucleotides) can be detected with a TIM cDNA probe in RNA blots of total RNA (not shown). This analysis is complicated by the presence of another gene upstream of the TIM gene which also produces a transcript of 3000 nucleotides. This transcript is also seen in poly(A)+ RNA, however, whereas the putative precursors are not.

[Figure 5: vertical net-charge scale from +6 to -6 with T. brucei, rabbit, human, chicken, yeast, coelacanth and E. coli TIMs marked at their net charges.]

Fig. 5. Net charges of TIMs as calculated from the amino acid sequences presented in Figure 4.

Discussion

It is now generally accepted that microbodies arise from pre-existing microbodies and not de novo or from the endoplasmic reticulum (see Lazarow and Fujiki, 1985; Borst, 1986a). What, then, was the origin of the first microbody? Several authors (Lazarow et al., 1980; Borst, 1981, 1986a; De Duve, 1982) have raised the possibility that microbodies might have originated from prokaryotic endosymbionts, like the ancestors of mitochondria and chloroplasts. If this were true, we would expect the genes for microbody proteins to be of prokaryotic origin. We have identified and sequenced the gene for the glycosomal TIM in T. brucei and a comparison of its predicted amino acid sequence with those of TIMs of both pro- and eukaryotic origin shows this microbody enzyme to be somewhat more homologous to the eukaryotic than to the prokaryotic TIMs. The three other glycosomal enzymes that have been sequenced recently are less informative in this respect since either sequences of prokaryotic counterparts are not available [PGK (Osinga et al., 1985); aldolase (Clayton, 1985)] or the glycosomal enzyme is as homologous to the eukaryotic as to the prokaryotic enzymes [glyceraldehyde-3-phosphate dehydrogenase (GAPDH), P.A.M.Michels and F.R.Opperdoes, personal communication]. We tentatively conclude that the glycosomal TIM did not arise from a prokaryotic endosymbiont but evolved via modification of its endogenous cytosolic ancestor. This does not necessarily mean that the glycosome itself is of eukaryotic origin since the enzyme complement of the microbody could have changed substantially during evolution by the simple rerouting of endogenous eukaryotic enzymes (see Borst, 1986a). The high homology between the cytosolic and glycosomal PGKs indeed shows that only few alterations are required to redirect a cytosolic enzyme to the glycosome (Osinga et al., 1985).

The deduced amino acid sequence for the TIM protein shows ~50% homology to TIMs from other organisms and the protein is about the same size. The active site is largely conserved with respect to the chicken TIM (Banner et al., 1975) except for a lysine to threonine substitution at position 130 (Figure 4). Residues known to be involved in subunit contact in the native chicken enzyme (Banner et al., 1975), which is a homo-dimer (Johnson and Waley, 1967), are less conserved, but in this the trypanosomal TIM is no exception. So despite its unusual subcellular location, the trypanosomal enzyme is a typical TIM.

How then is this largely unaltered cytosolic enzyme routed to the glycosome? Our analysis of trypanosome PGKs has shown that the only major differences between the glycosomal and cytosolic isoenzymes are a C-terminal extension and a much higher positive charge in the glycosomal PGKs (Osinga et al.,

[Figure 6: panel A, RNA blot of poly(A)+ RNA (lanes BF and CF) with a 1500-nt band, probe: TIM cDNA; panel B, S1 protection gel (lanes m and 1-7); panel C, primer extension gel (lanes 1-3) with a 151-nt run-off product.]

Fig. 6. (A) Northern blot of poly(A)+ RNA from T. brucei 427 hybridized with the TIM cDNA probe (see Figure 1) and washed to 0.1 x SSC at 65°C. BF = RNA from bloodstream form trypanosomes, CF = RNA from cultured insect form trypanosomes. (B) S1 protection experiment to map the 5' end of the TIM mRNA. The 810-bp HincII fragment from pTgTIM8 (see Figure 1), covering the 5' end of the TIM gene, was 5' 32P end-labelled, hybridized with total RNA from bloodstream form (BF, lanes 2 and 3) or culture form (CF, lanes 4 and 5) trypanosomes and digested with S1 nuclease. The S1 nuclease-protected DNA was analysed on a 6% denaturing polyacrylamide gel. Lane 1, input fragment; lane 2, hybridization with BF RNA at 42°C followed by S1 nuclease digestion; lane 3, as 2 at 46°C; lane 4, hybridization with CF RNA at 42°C followed by S1 nuclease digestion; lane 5, as 4 at 46°C; lane 6, hybridization with sheared salmon sperm DNA at 42°C followed by S1 nuclease digestion; lane 7, hybridization with BF RNA at 42°C only; m = marker, end-labelled HinfI-digested pAT153. (C) Primer extension analysis of the TIM mRNA. A 5' 32P end-labelled 16-mer deoxynucleotide, complementary to the 5' end of the TIM gene (Figure 3, position 1-16), was used to prime cDNA synthesis on poly(A)+ RNA isolated from bloodstream form trypanosomes. Lane 1 shows a run-off product of 151 nucleotides, whereas S1 nuclease analysis predicts a product of 119 ± 5 nucleotides (see Figure 3, panel B and text). To see if this discrepancy is caused by the presence of the 35-nucleotide mini-exon sequence characteristic of all trypanosome mRNAs analysed thus far (see Boothroyd, 1985; Borst, 1986b), a 22-mer deoxynucleotide, complementary to the mini-exon, was hybridized to the RNA which was then treated with RNase H, prior to primer extension analysis (lane 2). The shortened run-off products result from RNase H cutting in the hybrid. These products lie between 116 and 138 nucleotides, which corresponds to the position of the 22-mer on the RNA if the mini-exon is present. Lane 3 shows a similar experiment using a non-homologous oligonucleotide; no effect on the length of the run-off products is observed. The autoradiograph shown in lane 2 was exposed five times longer than those in lanes 1 and 3.

1985). From Figure 4 it is clear that the trypanosome TIM has no terminal extensions which could act as (cleavable) signal sequences. It does have a high net positive charge compared with the other TIMs, however (Figure 5 and Table II). To assess the distribution of these charge differences, we divided the TIM sequences into eight segments as indicated in Figure 4. These segments correspond to the eight segments which compose the pseudosymmetrical β-barrel of chicken TIM, each consisting of a β-sheet and an α-helical part (Banner et al., 1975; see also Straus and Gilbert, 1985b). Table II compares the net charges of these segments among the different TIMs. It is clear that the trypanosomal TIM differs from the other TIMs in segments I, V and VII whereas the rest of the enzyme does not deviate much. Segment I in the trypanosomal TIM is 3-5 charges less positive than in other TIMs. This seems paradoxical in view of the high net positive charge of the molecule and may suggest that this segment became less positively charged to compensate for the positive charges required to address TIM to the glycosome. Segments V and VII are more positively charged in the trypanosomal TIM and a more detailed analysis reveals clustering of positive charges in both segments: segment V contains a stretch of five amino acids, of which four are lysines (Figure 4, position 152-156), whereas all other TIMs are neutral or negative in this area, except Escherichia coli TIM, which has only one positive charge in this area; in segment VII, although less striking, a stretch of five amino acids (Figure 4, position 217-221) is found with a net charge of +2, whereas the same stretch is neutral or negative in all other TIMs. Thus the extra positive charges are clustered in two areas of the molecule, a situation analogous to that found for PGK (Osinga et al., 1985) and GAPDH (P.A.M.Michels and F.R.Opperdoes, personal communication) and also deduced from the published (Clayton, 1985) sequence of aldolase (R.K.Wierenga and W.G.J.Hol, personal communication).

A more detailed analysis of the sequence and 3-D structure of glycosomal PGK, TIM, GAPDH and aldolase has shown that two clusters of positive charges located at a precise distance on the surface of these enzymes is the only sequence motif common to all four enzymes (R.K.Wierenga and W.G.J.Hol, personal communication). We think therefore that this represents the signal for entry into glycosomes. Whether similar signals are used by other organisms to direct proteins to microbodies is as yet unclear (Borst, 1986a).
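The charge clusters described above can be located with a sliding window over the protein sequence. The sketch below is an added illustration, not the authors' procedure. It assumes the simplest counting rule at neutral pH (Lys and Arg count +1, Asp and Glu count -1, histidine and the termini are ignored), uses the T. brucei TIM sequence as translated in Figure 3 with the initiator methionine as position 1, and all names are hypothetical.

```python
# T. brucei TIM as translated in Figure 3, one Figure 3 row per string.
TIM_TBRUCEI = (
    "MSKPQ"
    "PIAAANWKCNGSQQSLSELIDLF"
    "NSTSINHDVQCVVASTFVHLAMT"
    "KERLSHPKFVIAAQNAIAKSGAF"
    "TGEVSLPILKDFGVNWIVLGHSE"
    "RRAYYGETNEIVADKVAAAVASG"
    "FMVIACIGETLQERESGRTAVVV"
    "LTQIAAIAKKLKKADWAKVVIAY"
    "EPVWAIGTGKVATPQQAQEAHAL"
    "IRSWVSSKIGADVRGELRILYGG"
    "SVNGKNARTLYQQRDVNGFLVGG"
    "ASLKPEFVDIIKATQ"
)

# Assumed charges at neutral pH; histidine is treated as uncharged.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(seq: str) -> int:
    """Sum of side-chain charges under the simple rule above."""
    return sum(CHARGE.get(residue, 0) for residue in seq)


def stretch(seq: str, first: int, last: int) -> str:
    """Return residues first..last using 1-based inclusive positions."""
    return seq[first - 1:last]


def window_charges(seq: str, width: int = 5):
    """Yield (first, last, charge) for every window of the given width."""
    for start in range(len(seq) - width + 1):
        yield start + 1, start + width, net_charge(seq[start:start + width])


best_window = max(window_charges(TIM_TBRUCEI), key=lambda window: window[2])

print("total net charge:", net_charge(TIM_TBRUCEI))
print("152-156:", stretch(TIM_TBRUCEI, 152, 156))
print("217-221:", stretch(TIM_TBRUCEI, 217, 221))
print("most positive 5-residue window:", best_window)
```

With this counting rule the whole protein comes out at +6, the stretch at 152-156 is KKLKK (+4) and the stretch at 217-221 is KNART (+2), in line with the values given in the text and in Table II. The segment borders of Figure 4 are not reproduced here, so per-segment charges are not computed.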

Fig. 7. Nascent transcripts of the TIM gene and flanking regions. PstI and HincII digests of pTgTIM8 were blotted and hybridized to 32P-labelled nascent RNA made in isolated trypanosome nuclei (see Materials and methods); the ethidium bromide stained gel is shown to the left and the autoradiograph to the right. In the map below, the position of each fragment is indicated (broken lines represent vector sequences). Transcription proceeds at the same rate throughout the region since the amount of hybridization with each fragment is roughly proportional to its length; fragments a and e hybridize poorly because they are largely vector. P = PstI, H = HincII.

Table II. Net charges of TIM segments as indicated in Figure 4ᵃ

Segment       I     II    III   IV    V     VI    VII   VIII  Total
T. brucei     0     0     1+    1-    3+    0     4+    1-    6+
Human         4+    1-    1+    2-    2-    0     1+    2-    1-
Rabbit        5+    1-    1+    2-    2-    0     1+    2-    0
Chicken       3+    1-    2+    2-    2-    0     1+    2-    1-
Coelacanth    3+    1-    2+    2-    6-    0     0     2-    6-
Yeast         3+    1-    1-    1+    2-    3-    2+    2-    3-
E. coli       3+    2-    1-    1+    4-    1-    1-    1-    6-

ᵃ The B. stearothermophilus TIM is not included in this table because in its published sequence (see Figure 4) no discrimination is made between aspartic acid and asparagine, and between glutamic acid and glutamine.

Materials and methods

Isolation of trypanosome TIM gene

A T. brucei strain 427 cDNA library (De Lange et al., 1984) was screened with a yeast TIM probe, consisting of a BglII and a BglII/XbaI fragment, isolated from plasmid pTPIC10 [this plasmid was kindly provided by Dr D.G.Fraenkel, Harvard Medical School, Boston, MA (Alber and Kawasaki, 1982; Kawasaki and Fraenkel, 1982)]. Colony hybridization of filters was performed using standard procedures (Maniatis et al., 1982), at relaxed stringency [5 × SSC (SSC = 0.15 M NaCl; 0.015 M sodium citrate; pH 7.0) at 58°C]. Post-hybridizational washes were carried out for 2 h at 58°C with several changes of 5 × SSC, 0.1% SDS, followed by one wash of 3 × SSC, 0.1% SDS at 58°C for 30 min. Under these conditions yeast TIM hybridizes specifically with trypanosome DNA (not shown). Genomic T. brucei 427 TIM clones were isolated from both EcoRI (T.De Lange, unpublished) and HindIII (J.M.Kooter and A.J.Winter, unpublished) clone banks, using the isolated insert of pTcTIM9 (see Figure 1) as probe. From the EcoRI clone bank two types of clone were isolated, representing two allelic TIM genes which differed in one restriction site (Gibson et al., 1985).

DNA sequence analysis

DNA sequence analysis was performed using mainly the dideoxy method (Sanger et al., 1977) with the modifications described by Biggin et al. (1983) except that 32P was used as label. Two PstI fragments were isolated from the genomic clone pTgTIM8 and subcloned into M13mp9. Cloning and preparation of template DNA were performed using standard protocols (Maniatis et al., 1982; Messing, 1983). Sequencing across the gene-internal PstI site (see Figure 1) was carried out by the Maxam and Gilbert (1980) method from the labelled KpnI site. Oligonucleotides, used for sequencing and primer extension analysis of the TIM mRNA (see below), were prepared by phosphotriester chemistry (Marrug et al., 1984), using a fully automatic synthesizer (Biosearch, SAM).

Isolation and blotting of DNA and RNA

Total RNA was prepared from T. brucei 427 cultured insect or bloodstream form trypanosomes by LiCl precipitation (Auffray and Rougeon, 1980), and the poly(A)+ fractions were purified by oligo(dT)-cellulose chromatography (Hoeijmakers et al., 1980). Nuclear DNA from T. brucei 427 was isolated, digested with various restriction enzymes, size-fractionated and blotted as described previously (Gibson et al., 1985). The blots were hybridized with the nick-translated, 32P-labelled (Rigby et al., 1977) insert from cDNA clone pTcTIM9 essentially as described by Jeffreys and Flavell (1977), with the addition of 10% (w/v) dextran sulphate (Wahl et al., 1979). Post-hybridizational washes were carried out to a final stringency of 0.1 × SSC at 65°C.

S1 nuclease protection experiments

Total RNA (50 µg) from bloodstream or cultured insect form trypanosomes was dissolved in 15 µl of hybridization buffer (Berk and Sharp, 1977) with 32P end-labelled probes (see below), heated to 85°C for 15 min and incubated overnight at 42°C or 46°C. Following digestion with S1 nuclease (Sigma) (final volume 300 µl, 500 U/ml S1 nuclease) at 30°C for 1 h, products were recovered by isopropanol precipitation and analysed on 6% denaturing polyacrylamide gels. To map the 5' end of the mRNA, we used a 5' end-labelled 810-bp HincII fragment from pTgTIM8 covering the 5' end of the gene. To map the 3' end of the message, we used a 3' end-labelled 1200-bp PvuII/PstI fragment from pTgTIM12 covering the 3' end of the gene.

Primer extension analysis

A 5' end-labelled synthetic 16-mer deoxynucleotide, complementary to the 5' part of the TIM gene (5'-GTTGTGGCTTGGACAT; Figure 3, position 1-16), was used to prime cDNA synthesis on poly(A)+-enriched RNA. The reactions were performed essentially as described by Michels et al. (1983) using 1 ng of the 5' phosphorylated 16-mer primer and 1.5 µg of poly(A)+-enriched RNA, isolated from bloodstream form trypanosomes. The primer-extended products were analysed on a 6% denaturing polyacrylamide gel.

In the case of RNase H treatment of the RNA (Donis-Keller, 1979), prior to primer extension analysis, 1.5 µg of poly(A)+-enriched RNA was mixed with 2 µg of a non-labelled oligodeoxynucleotide (see below) and was heated to 90°C for 2 min. The reaction mixture was then adjusted to 40 mM Tris-HCl (pH 7.7), 4 mM MgCl2, 2 mM dithiothreitol and 100 mM NaCl (NaCl was added to increase hybrid stability) (final volume 10 µl) and incubated at 32°C for 15 min. Following digestion with 1 U RNase H (Boehringer Mannheim; E. coli type) at 32°C for 30 min, the reaction was stopped by adding 40 µl of 50 mM Tris-HCl (pH 8.0), 10 mM EDTA, 2% (w/v) SDS and extracting twice with phenol/chloroform/isoamyl alcohol (50:48:2). After ethanol precipitation, the RNA was redissolved and used for primer extension analysis. The oligodeoxynucleotides used for the RNase H digestion of the RNA were: a 22-mer complementary to the 3' end of the mini-exon (De Lange et al., 1983) and an 18-mer (5'-CTTCATTATGTACTTCTC) not complementary to the TIM mRNA.

Analysis of nascent transcripts

The procedures for the analysis of nascent RNA are described by Kooter et al. (1984). Briefly, nuclei were isolated from infected rat blood as quickly as possible using a Stansted Disrupter and used either immediately or after storage at -70°C. In vitro elongation of nascent RNA was allowed to proceed for 4 min in the presence of 32P-labelled UTP and terminated by incubation at 70°C followed by DNase I digestion. Nascent transcripts were hybridized to blots of cloned DNA in the presence of 10% (w/v) dextran sulphate and 100 µg of E. coli tRNA per ml for 18 h. Blots were washed at 65°C in 0.3 × SSC, 0.1% (w/v) SDS.
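The 16-mer used for primer extension is described as complementary to positions 1-16 of the TIM gene, so its reverse complement should give the sense-strand sequence at those positions. The following is a minimal sketch in Python of that check, assuming perfect complementarity and the standard genetic code; the function names are illustrative and not taken from the paper.

```python
# Reverse-complement the primer-extension 16-mer and translate the
# resulting sense-strand sequence with the standard genetic code.
# Function names are illustrative; the primer sequence is from the text.

PRIMER_16MER = "GTTGTGGCTTGGACAT"  # written 5' to 3'

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(_BASES)
    for j, b2 in enumerate(_BASES)
    for k, b3 in enumerate(_BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def translate(seq):
    """Translate complete codons from the first base; ignore a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    sense = reverse_complement(PRIMER_16MER)
    print("sense strand, positions 1-16:", sense)
    print("first complete codons translate to:", translate(sense))
```

Under these assumptions the sense strand begins with an ATG codon, as expected for a primer placed at the start of the coding region.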

Acknowledgements

We thank Dr D.G.Fraenkel (Harvard Medical School, Boston) for providing the yeast TIM clone. We are grateful to our colleagues for their helpful advice and to Drs W.G.J.Hol, R.K.Wierenga (University of Groningen) and Drs O.Misset, P.A.M.Michels and F.R.Opperdoes (International Institute of Cellular and Molecular Pathology, Brussels) for sharing unpublished results. This work was supported by The Netherlands Foundation for Chemical Research (SON) with financial aid from The Netherlands Organization for the Advancement of Pure Research (ZWO) and a fellowship of the Wellcome Trust to W.C.G.

References

Alber,T. and Kawasaki,G. (1982) J. Mol. Appl. Genet., 1, 419-434.
Alber,T., Banner,D.W., Bloomer,A.C., Petsko,G.A., Phillips,D.C., Rivers,P.S. and Wilson,I.A. (1981) Phil. Trans. R. Soc. Lond., 293, 159-171.
Artavanis-Tsakonas,S. and Harris,J.I. (1980) Eur. J. Biochem., 108, 599-611.
Auffray,C. and Rougeon,F. (1980) Eur. J. Biochem., 107, 303-314.
Banner,D.W., Bloomer,A.C., Petsko,G.A., Phillips,D.C., Pogson,C.I., Wilson,I.A., Corran,P.H., Furth,A.J., Milman,J.D., Offord,R.D., Priddle,J.D. and Waley,S.G. (1975) Nature, 255, 609-614.
Berk,A.J. and Sharp,P.A. (1977) Cell, 12, 721-732.
Berkower,I., Leis,J. and Hurwitz,J. (1973) J. Biol. Chem., 248, 5914-5921.
Biggin,M.D., Gibson,T.J. and Hong,G.F. (1983) Proc. Natl. Acad. Sci. USA, 80, 3963-3965.
Boothroyd,J.C. (1985) Annu. Rev. Microbiol., 39, 474-502.
Borst,P. (1981) In Lloyd,C.W. and Rees,D.A. (eds), Cellular Controls in Differentiation. Academic Press, London, pp. 231-251.
Borst,P. (1986a) Biochim. Biophys. Acta, 866, 179-203.
Borst,P. (1986b) Annu. Rev. Biochem., 55, 701-732.
Clayton,C.E. (1985) EMBO J., 4, 2977-3003.
Corran,P.H. and Waley,S.G. (1975) Biochem. J., 145, 335-344.
De Duve,C. (1982) Ann. N.Y. Acad. Sci., 386, 1-4.
De Lange,T., Liu,A.Y.C., Van Der Ploeg,L.H.T., Borst,P., Tromp,M.C. and Van Boom,J.H. (1983) Cell, 34, 891-900.
De Lange,T., Michels,P.A.M., Veerman,H.J.G., Cornelissen,A.W.C.A. and Borst,P. (1984) Nucleic Acids Res., 12, 3777-3790.
Donis-Keller,H. (1979) Nucleic Acids Res., 7, 179-192.
Gibson,W.C., Osinga,K.A., Michels,P.A.M. and Borst,P. (1985) Mol. Biochem. Parasitol., 16, 231-242.
Hoeijmakers,J.H.J., Borst,P., Van den Burg,J., Weissmann,C. and Cross,G.A.M. (1980) Gene, 8, 391-417.
Jeffreys,A.J. and Flavell,R.A. (1977) Cell, 12, 429-439.
Johnson,L.N. and Waley,S.G. (1967) J. Mol. Biol., 29, 321-322.
Kawasaki,G. and Fraenkel,D.G. (1982) Biochem. Biophys. Res. Commun., 108, 1107-1112.
Kolb,E., Harris,J.I. and Bridgen,J. (1974) Biochem. J., 137, 185-197.
Kooter,J.M., De Lange,T. and Borst,P. (1984) EMBO J., 3, 2387-2392.
Lazarow,P.B. and Fujiki,Y. (1985) Annu. Rev. Cell. Biol., 1, 489-530.
Lazarow,P.B., Shio,H. and Robbi,M. (1980) In Weiss,H., Bucher,T. and Sebald,W. (eds), The Proceedings of the 31st Mosbach Colloquium, Biological Chemistry of Organelle Formation. Springer-Verlag, Berlin, pp. 187-206.
Maniatis,T., Fritsch,E.F. and Sambrook,J. (1982) Molecular Cloning. A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Maquat,L.E., Chilcote,R. and Ryan,P.M. (1985) J. Biol. Chem., 260, 3748-3753.
Marrug,J.E., Piel,N., McLaughlin,L.W., Tromp,M.C., Veeneman,G.H., Van der Marel,G.A. and Van Boom,J.H. (1984) Nucleic Acids Res., 12, 8639-8651.
Maxam,A. and Gilbert,W. (1980) Methods Enzymol., 65, 499-560.
Messing,J. (1983) Methods Enzymol., 101, 20-79.
Meyerhof,O. and Lohmann,K. (1934) Biochem. Z., 273, 413-418.
Michels,P.A.M., Liu,A.Y.C., Bernards,A., Sloof,P., Van der Bijl,M.M.W., Schinkel,A.H., Menke,H.H., Borst,P., Veeneman,G.H., Tromp,M.C. and Van Boom,J.H. (1983) J. Mol. Biol., 166, 537-556.
Misset,O. and Opperdoes,F.R. (1984) Eur. J. Biochem., 144, 475-483.
Opperdoes,F.R. (1985) Br. Med. Bull., 41, 130-136.
Opperdoes,F.R. and Borst,P. (1977) FEBS Lett., 80, 360-364.
Opperdoes,F.R., Baudhuin,P., Coppens,I., De Roe,C., Edwards,S.W., Weijers,P.J. and Misset,O. (1984) J. Cell Biol., 98, 1178-1184.
Osinga,K.A., Swinkels,B.W., Gibson,W.C., Borst,P., Veeneman,G.H., Van Boom,J.H., Michels,P.A.M. and Opperdoes,F.R. (1985) EMBO J., 4, 3811-3817.
Pichersky,E., Gottlieb,L.D. and Hess,J.F. (1984) Mol. Gen. Genet., 195, 314-320.
Rigby,P.W.J., Dieckmann,M., Rhodes,C. and Berg,P. (1977) J. Mol. Biol., 113, 237-251.
Sanger,F., Nicklen,S. and Coulson,A.R. (1977) Proc. Natl. Acad. Sci. USA, 74, 5463-5467.
Straus,D. and Gilbert,W. (1985a) Proc. Natl. Acad. Sci. USA, 82, 2014-2018.
Straus,D. and Gilbert,W. (1985b) Mol. Cell. Biol., 5, 3497-3506.
Straus,D., Raines,R., Kawashima,E., Knowles,J.R. and Gilbert,W. (1985) Proc. Natl. Acad. Sci. USA, 82, 2272-2276.
Wahl,G.M., Stern,M. and Stark,G.R. (1979) Proc. Natl. Acad. Sci. USA, 76, 3683-3687.
Wierenga,R.K., Hol,W.G.J., Misset,O. and Opperdoes,F.R. (1984) J. Mol. Biol., 178, 487-490.

Received on 7 February 1986; revised on 1 April 1986