Proc. Natl. Acad. Sci. USA Vol. 83, pp. 2584-2588, April 1986 Genetics

Primary structure of the gene encoding the bifunctional dihydrofolate reductase-thymidylate synthase of Leishmania major
([...]/protein homologies/intervening sequences/gene amplification/protozoan parasites)

STEPHEN M. BEVERLEY, THOMAS E. ELLENBERGER, AND JOHN S. CORDINGLEY
Department of Pharmacology, Harvard Medical School, Boston, MA 02125
Communicated by George H. Hitchings, December 24, 1985

ABSTRACT We have determined the sequence of the dihydrofolate reductase-thymidylate synthetase (DHFR-TS) gene of the protozoan parasite Leishmania major (dihydrofolate reductase, EC 1.5.1.3 and thymidylate synthase, EC 2.1.1.45). The DHFR-TS protein is encoded by a single 1560-base-pair open reading frame within genomic DNA, in contrast to vertebrate DHFRs or mouse and phage T4 TSs, which contain intervening sequences. Comparisons of the DHFR-TS sequence with DHFR and TS sequences of other organisms indicate that (i) the order of enzymatic activities within the bifunctional polypeptide chain is DHFR followed by TS, (ii) the Leishmania bifunctional DHFR-TS evolved independently and not through a phage T4-related intermediate, and (iii) the rate of evolution of both the DHFR and TS domains has not detectably changed despite the acquisition of new functional properties by the bifunctional enzyme. The Leishmania gene is 86% G+C in the third codon position, in contrast to genes of the parasite Plasmodium falciparum, which exhibit an opposite bias toward A+T. The DHFR-TS is encoded within a region of DNA amplified in methotrexate-resistant lines, as previously proposed.

The enzymes dihydrofolate reductase (DHFR; EC 1.5.1.3) and thymidylate synthase (TS; EC 2.1.1.45) have critical roles in intermediary metabolism and consequently are important targets for chemotherapy. Drugs used in the treatment of human tumors that inhibit DHFR and TS are methotrexate and metabolites of 5-fluorouracil, respectively (1, 2). Other [...] such as [...] and pyrimethamine have been employed in the treatment of pathogens (3, 4). In most organisms DHFR and TS exist as separate molecular entities, DHFR as a monomer of about 20 kDa and TS as a dimer made up of ~35-kDa subunits (5). In contrast, in Leishmania as well as all protists examined to date, these enzymes are part of a bifunctional DHFR-TS complex, consisting of a homodimer of a 54- to 100-kDa polypeptide chain (6-8). This bifunctional enzyme exhibits distinct biochemical properties such as metabolic channeling (9) and disparate binding of methotrexate and 2'-deoxy-5-fluorouridine monophosphate (5-FdUMP) (8, 9). Detailed molecular and biochemical studies may reveal additional features that will prove useful in developing agents that may selectively inhibit the Leishmania enzyme, analogous to the selective inhibition of Plasmodium DHFR by [...] (4).

Mutants of Leishmania have been obtained that overproduce the bifunctional enzyme DHFR-TS, thus facilitating its isolation and characterization (6, 9). Mutants exhibiting enzyme overproduction also exhibit amplification of extrachromosomal circular DNAs (10), one of which is correlated with enzyme overproduction (the R region). We have now determined the nucleotide sequence of the gene within the R region encoding the bifunctional DHFR-TS of Leishmania major.

MATERIALS AND METHODS

Cells, Genomic, and Recombinant DNA. The LT 252 cell line (11) and the methotrexate-resistant R1000 derivative cell line (6) were utilized as sources of genomic DNA and RNA as described (10). We have shown by DNA hybridization and serological tests that this line actually is of the species Leishmania major (L. tropica major; S.M.B. and D. McMahon-Pratt, unpublished data). A 4.5-kilobase (kb) Sal I fragment from phage λ LTS-5 (10) was isolated and inserted into a pUC vector (12); this recombinant is named pLTS-5-S45.

cDNA Library. A cDNA library of polyadenylylated mRNA from R1000 promastigotes was constructed in the bacteriophage vectors λgt10 (13) and λgt11 (14) using the protocol of T. St. John, J. Rosen, and H. Gershenfeld (personal communication), and screened with pLTS-5-S45 (15). Approximately 400,000 independent recombinants were obtained for each library. The largest recombinant insert, one of 0.6 kb in λgt10, was inserted into M13 vectors for DNA sequencing.

DNA Sequencing. DNA sequencing was performed using the dideoxynucleotide technique (16) with 35S-labeled deoxynucleotide triphosphates (17), and Leishmania-derived DNAs were inserted into M13 vectors (18). The entire sequence was determined on both strands.

RESULTS

Sequence of the DHFR-TS Region. Analysis of mRNAs encoded by the amplified R region of the methotrexate-resistant R1000 line of Leishmania (6) has indicated that a 3.2-kb polyadenylylated mRNA encodes the bifunctional DHFR-TS (refs. 19 and 20; G. Kapler and S.M.B., unpublished results). This region is entirely contained within a 4.5-kb Sal I fragment (Fig. 1), which is also amplified within the R region of the methotrexate-resistant R1000 line (6, 10). By restriction enzyme mapping, this region of DNA appears to be identical in the wild-type and R1000 lines (10), as is the DHFR-TS protein synthesized as judged by biochemical criteria (6, 9).

DNA sequencing of pLTS-5-S45 reveals a single large open reading frame, beginning at the ATG initiation codon numbered as base 1 and proceeding for 1560 bp (Figs. 1 and 2). This ATG is the first between the long open reading frame and the 5' side of the mRNA, as revealed by S1 nuclease analysis (unpublished data). The derived amino acid sequence of this protein includes the sequence Met-Asp-Leu-Gly-Pro-Val-Tyr-Gly-Phe-Gln-Trp-Arg-His-Phe-Xaa-Ala-Asp-Tyr-Lys-Xaa-Phe-Glu-Ala-Asn-Tyr-Xaa-Gly-Glu that differs by one amino acid from an internal peptide of the DHFR-TS of a 5,8-dideaza-10-propargyl folate-resistant line of Leishmania

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Abbreviations: DHFR, dihydrofolate reductase; TS, thymidylate synthase; 5-FdUMP, 2'-deoxy-5-fluorouridine monophosphate; bp, base pair(s); kb, kilobase(s).

(21). (The histidine at position 13 within the peptide was identified as glycine by these authors. DNA sequencing of both strands in this region unambiguously predicts histidine.) The predicted DHFR-TS has a molecular weight of 58,684, somewhat larger than the 57-kDa subunit molecular size reported for Leishmania and other trypanosomatids (6-8). This may reflect the accuracy of gel electrophoresis methods for estimating protein molecular weights. Alternatively, the DHFR-TS may be processed in vivo to a smaller final product. Alignment of the Leishmania DHFR-TS with DHFRs of diverse origin indicates that the Leishmania gene bears a hydrophilic amino-terminal extension of 22 amino acids (Fig. 3). The amino terminus of the mature DHFR-TS has been reported to be blocked (9, 21), and the precise amino terminus of the DHFR-TS is currently unknown.

FIG. 1. Analysis of the DHFR-TS gene of Leishmania major. (A) Restriction map of the 4.5-kb Sal I fragment located within the R region of Leishmania major. The orientation of this map is opposite to that shown in figure 1B of ref. 10. The symbols correspond to restriction enzyme cleavage sites as follows: P, Pvu II; R, EcoRI; B, Bgl II; T, Sst I; S, Sal I. The region of DNA sequenced was between the EcoRI site and a Sph I site (not shown) approximately 300 bp on the 3' side of the Sst I site. The location of the 0.6-kb cDNA clone is also indicated. (B) Summary of features of the DHFR-TS gene. The location of the single large open reading frame of 1560 bp within the mRNA is indicated by the thick region. The 22-amino acid hydrophilic leader sequence is shown as a solid region, and the location of the DHFR and TS domains are taken from the alignments in Fig. 3. The location of the poly(A) tract of the mRNA as revealed by sequence analysis of the cDNA is also shown.

Another 1040 bp of sequence follow the open reading frame, at which point the sequence of the cDNA and genomic sequences diverge, corresponding to the poly(A) tract of the mRNA. The region between the long open reading frame and the poly(A) tract contains no long open reading frames. As reported previously for genes of the related parasite Trypanosoma brucei, we can find no eukaryotic consensus poly(A) addition signal (AATAAA) prior to the poly(A) tract (32, 33). Furthermore, we cannot find a consensus homology in this region with the corresponding regions in other trypanosome genes (33, 34).

Comparison of the Predicted Leishmania DHFR-TS Sequence with Other DHFR and TS Sequences. The amino acid sequence of the DHFR-TS enzyme was derived from the nucleotide sequence and compared with published DHFR and TS sequences (Fig. 3, Table 1). It is evident that the amino-terminal region of the Leishmania DHFR-TS exhibits significant sequence homology (up to 37%) with DHFRs of diverse origin, whereas the carboxyl-terminal region of the leishmanial protein exhibits homology (up to 63%) with TS sequences. The DHFR-TS also exhibits a hydrophilic 22-amino acid extension prior to the DHFR domain. The homologies with both DHFR and TS sequences include amino acids known to be important in ligand binding and/or located at the catalytic site (discussed below). We, therefore, infer that the leishmanial protein is encoded as a fused polypeptide chain of two separate domains, an amino-terminal DHFR domain followed by a carboxyl-terminal TS domain. Our genomic DNA sequence data indicate the Leishmania DHFR gene does not contain intervening sequences, an observation confirmed by S1 nuclease analysis of the DHFR-coding region of the Leishmania gene (data not shown).

There is a nonrandom utilization of the four nucleotide bases at degenerate codon positions of the DHFR-TS. Within the coding region the first codon position is 62% G+C, the second position 41% G+C, and the third position 86% G+C. Neither the 3'- nor the 5'-untranslated regions show a positional bias. In contrast, genes of the intracellular parasite Plasmodium falciparum exhibit an extreme A+T bias at third positions as well as within the genome as a whole (35).

Conservation of Amino Acid Sequence. The amino acid sequence alignment presented in Fig. 3 indicates that certain residues are highly conserved among DHFR and TS sequences. In the DHFR domain these residues include many of those implicated in the interactions of DHFR with substrates, cofactors, and/or inhibitors (31). Of the 18 residues shared in all eukaryotic DHFRs, the Leishmania DHFR domain shares 15 (positions 32, 39, 40, 46, 47, 56, 60, 80, 83, 94, 97, 102, 157, 158, and 180). For TS two separate regions have been identified in ligand binding studies: a 5-FdUMP binding site, located at the conserved tripeptide Pro-Cys-His of the TS in many species (5) (residues 399-401 in Fig. 3), and a folylpolyglutamate binding site in Lactobacillus casei (36) (residues 279-292). Both of these sites are conserved in the Leishmania TS domain. There are additionally several other regions within the TS domain that are highly conserved among all species and that presumably represent functionally important residues. The Leishmania TS domain shares an insertion of 12 amino acids with the human sequence (residues 322-333), the site of which has been shown to be a target for protease inactivation (21).

Relationships of the Bifunctional DHFR-TS Enzyme to Other Species. Table 1 shows that Leishmania TS and DHFR are both more closely related to vertebrate enzymes than to the prokaryotic or phage T4 enzymes. The DHFR domain of the Leishmania DHFR-TS exhibits 37% amino acid homology with vertebrate DHFRs, 29% homology with prokaryotes, and 18% homology with phage T4. Similarly, the TS domain of the DHFR-TS exhibits 63% homology with vertebrates, 48% homology with prokaryotes, and 45% homology with phage T4. These findings are in accordance with the current view of the evolutionary relationships among metazoa, protists, and prokaryotes (37).

Examination of the evolutionary insertions of amino acid sequence within both the TS and DHFR genes confirms that the Leishmania DHFR-TS is more closely related to the vertebrate genes than to either prokaryotic or phage T4 genes. Within the TS genes there are six introduced gaps among the five sequences. Two of these are specific for phage T4 (positions 306 and 498) and two are specific for Leishmania (positions 275 and 411). The remaining two (positions 322 and 349) are complex, in that the insertion boundaries are variable among species. Examination of these insertions reveals that the Leishmania and human TS sequences are precisely the same length and homologous at the amino acid sequence level, in contrast to the comparisons involving the other prokaryotic and phage sequences. Similar findings are evident in the insertions found within the DHFR sequence alignments; however, as the degree of sequence homology among these genes is considerably less than for TS, there is less certainty about the exact placement of the sites of insertion.

Relative Rates of Evolution of Bifunctional vs. Monofunctional DHFRs and TS. It is possible that new properties of the bifunctional DHFR-TS relative to monofunctional DHFR and TS may have given rise to new evolutionary pressures, resulting in a change in the rate of evolution of the DHFR-TS genes. We compared the Leishmania and human genes separately with the more distantly related prokaryote and phage genes (Table 2). Alterations in evolutionary rate would be revealed by differences in the distance between the outside reference group and the Leishmania or human

FIG. 2. Sequence of the DHFR-TS gene of Leishmania major. The nucleotide sequence is numbered with position 1 as the first base of the initiation codon. The underlined peptide corresponds to that determined from partial protein sequencing reported for the DHFR-TS from a 5,8-dideaza-10-propargyl folate-resistant line by Garvey and Santi (21). The terminal portion of the cDNA sequence is shown at position 2598; the sequence to the 5' end of this clone is identical to the genomic sequence shown.

lineages. It is evident that there are no significant or consistent differences between the human and Leishmania DHFR or TS domains in these tests, indicating that the rate of evolution of DHFR and TS has not been detectably altered upon formation of the bifunctional enzyme. Furthermore, each protein domain has maintained its characteristic rate of evolution, with DHFR evolving more rapidly than TS.

DISCUSSION

In earlier work we described the selection and properties of the methotrexate-resistant R1000 line of Leishmania. This line exhibits unstable amplification of two regions of DNA, and overproduction of the bifunctional DHFR-TS enzyme. As enzyme overproduction and R region amplification were invariably associated (6, 10), we proposed that the R region encoded the structural gene for DHFR-TS. We have now confirmed by nucleotide sequencing and evolutionary comparisons with other DHFR and TS genes that the R region does, in fact, encode the structural gene for the DHFR-TS of Leishmania.

DHFR and TS form two successive steps of the metabolic cycle responsible for de novo synthesis of thymidylate in Leishmania (6). Significant interactions of DHFR and TS occur within the bifunctional enzyme complex of Leishmania, as Meek et al. (9) have shown that the bifunctional DHFR-TS exhibits channeling of dihydrofolate, a product of the TS-catalyzed reaction and the substrate of DHFR. Furthermore, the bifunctional DHFR-TS of Leishmania and Crithidia exhibit disparate binding of the usually stoichiometric ligand methotrexate, with 0.5 mol bound per mol of DHFR-TS subunit (8, 9). The disparate binding may be eliminated by limited proteolysis (21). In contrast, it has also been shown that the DHFR and TS domains may be separately inhibited by agents selective for one activity or the other, suggesting that the two domains retain a measure of independence (8, 9). Correspondingly, we can find no evidence indicating that DHFR and TS sequences have become intermixed within the bifunctional DHFR-TS (Fig. 3). Furthermore, evolutionary comparisons indicate that bifunctionality has not grossly altered the rate of evolution of the Leishmania

[FIG. 3 sequence alignment. (A) DHFR domain; (B) TS domain. Aligned rows: LM, H, EC, LC, and T4.]
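The similarity percentages in Table 1 are computed from aligned sequences, and the table footnote states that positions where a gap was inserted into one or more sequences were left out of the calculation. The sketch below illustrates that rule under stated assumptions: the helper names `ungapped_columns` and `percent_shared` and the three short aligned strings are hypothetical, and "-" is assumed to be the gap character.

```python
def ungapped_columns(alignment: dict[str, str], gap: str = "-") -> list[int]:
    """Return indices of alignment columns in which no sequence has a gap."""
    sequences = list(alignment.values())
    if not sequences:
        raise ValueError("alignment is empty")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    return [i for i in range(length) if all(seq[i] != gap for seq in sequences)]


def percent_shared(alignment: dict[str, str], name_a: str, name_b: str) -> float:
    """Percentage of identical residues between two rows of the alignment,
    counted only over columns that are gap-free in every row."""
    columns = ungapped_columns(alignment)
    if not columns:
        raise ValueError("no gap-free columns to compare")
    seq_a, seq_b = alignment[name_a], alignment[name_b]
    shared = sum(seq_a[i] == seq_b[i] for i in columns)
    return 100.0 * shared / len(columns)


# Hypothetical toy alignment (not taken from Fig. 3).
toy_alignment = {
    "query": "MKQ-LELID",
    "ref1":  "MKQYLDLA-",
    "ref2":  "MRQYL-LID",
}
print(round(percent_shared(toy_alignment, "query", "ref1"), 1))  # 83.3
print(round(percent_shared(toy_alignment, "ref1", "ref2"), 1))   # 66.7
```

Excluding a column whenever any row is gapped keeps every pairwise comparison on the same set of positions, so the percentages for different species pairs remain directly comparable.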

FIG. 3. Alignment of the DHFR-TS of Leishmania with DHFRs and TSs of other species. Amino acid residues are abbreviated using the single-letter code. Stars above amino acid positions indicate residues shared by four or more of the protein sequences. Abbreviations and references for the species source of enzyme are: LM, Leishmania major; H, human (22-24); EC, Escherichia coli (25, 26); LC, L. casei (27, 28); and T4, phage T4 (29, 30). The initiator methionine of the Leishmania DHFR-TS is assigned position 1; the carboxyl-terminal extension of T4 DHFR and the amino-terminal extensions of human and L. casei TS lacking homology with Leishmania sequence are not numbered in this figure. (A) DHFR domain. The amino-terminal region of the DHFR-TS of Leishmania shown in Fig. 2 was aligned with other DHFRs, following the alignments based upon crystallographic data (31). The carboxyl-terminal extension of the phage T4 DHFR protein is shown separately beneath the five aligned sequences. (B) TS domain. The carboxyl-terminal region of the DHFR-TS of Leishmania shown in Fig. 2 was aligned with the indicated TSs. The alignment follows that presented in ref. 24, to which the Leishmania sequence was added. The underlined regions at positions 399-401 and 279-292 correspond to the 5-FdUMP and folyl polyglutamate binding sites, respectively. The amino-terminal extensions of the human and L. casei enzymes are shown preceding the alignment of the five species TS.

DHFR-TS relative to the monofunctional DHFR and TS of vertebrates (Table 2).

Origin of the Bifunctional Enzyme. Our data allow a test of the hypothesis advanced by several laboratories (21, 29, 30) that the bacteriophage T4 DHFR and TS genes (or some relative thereof) were an evolutionary precursor of the bifunctional DHFR-TS found in Leishmania and all protists characterized to date (6-8). Within this phage genome the DHFR and TS coding sequences are adjacent and arranged so that the termination codon of the DHFR gene (located on the 5' side of the TS gene) overlaps the initiation codon of the TS gene. Were the T4-related hypothesis correct, we would expect the bifunctional DHFR-TS to be most closely related to the T4 genes. In fact, the Leishmania DHFR-TS gene is more similar to the unlinked monofunctional DHFR and TS genes of eukaryotes than to the unlinked genes of prokaryotes, and least similar to the adjacent phage T4 genes (Table 1). Furthermore, the DHFR-TS enzyme apparently lacks sequences related to the carboxyl-terminal extension of the T4 DHFR (Fig. 3). Thus, the model invoking an intermediate related to phage T4 receives no support from comparison of amino acid sequences. It is possible that the evolutionary precursor of DHFR-TS independently resembled the T4 gene arrangement, though a simple fusion of two independent genes is equally plausible. An alternative proposal would be that the T4 genes have for some reason evolved so rapidly that we cannot discern the true evolutionary history, a possibility that currently can neither be excluded nor tested.

Table 1. Amino acid similarity of Leishmania DHFR-TS with DHFRs and TSs of other species

                  Amino acids shared with
                  Leishmania DHFR-TS domain,* %
Species           DHFR      TS
Vertebrates       37†       63‡
Prokaryotes
  E. coli         32        47
  L. casei        28        50
  S. faecium      26
Phage T4          18        45

*For those DHFRs not shown in Fig. 3, sequences were aligned as in ref. 31. If a gap was inserted into one or more sequences, residues at these positions were not included in these calculations. The literature references for the protein sequences utilized are found in the legend to Fig. 3.
†Includes human, cow, mouse, pig, and chicken, all of which are 37% similar.
‡Human only.

Table 2. Sequence comparisons among DHFRs and TSs

                                 Amino acid sequence
          Outside                similarity, %
Domain    reference group*       Leishmania    Human
DHFR      Prokaryotes (3)        29            29
          T4                     18            15
TS        Prokaryotes (2)        48            54
          T4                     45            49

*The number in parentheses refers to the number of sequence comparisons averaged in columns 3 and 4.

It is intriguing that two enzymes that exhibit metabolic channeling and that form part of a metabolic cycle necessary for the de novo synthesis of thymidylate are joined into a single polypeptide chain. Several laboratories have proposed that a multienzyme complex is responsible for the in vivo synthesis of DNA (38-40), which in mammalian cells may include both TS and DHFR (38). It is possible that the bifunctional DHFR-TS of Leishmania and all protists represents a covalent stabilization of an interaction that is normally noncovalent in most organisms. The bifunctional arrangement may also offer advantages for the joint regulation of the two enzymatic activities.

Intron Evolution. Our data indicate that the bifunctional DHFR-TS gene of Leishmania contains no intervening sequences. Several genes of protists contain intervening sequences (41-43), and Leishmania and other trypanosomatids exhibit an unusual form of RNA processing (44-47). For DHFR, only vertebrate genes appear to contain intervening sequences (23, 48). One intervening sequence has been described within the phage T4 TS gene (29, 49), and mouse TS probably has several (50). The distribution of intervening sequences is evolutionarily complex and is relevant to models concerning the evolutionary origin of intervening sequences. As the intervening sequence structure of additional DHFR and TS genes in phylogenetically diverse organisms such as archaebacteria, plants, fungi, and invertebrates becomes available, we may anticipate being able to examine the evolutionary origin of introns within the DHFR and TS genes in more detail.

We thank Dale Sinclair for technical assistance, and members of our laboratory for discussions. This work was supported by grants from the National Institutes of Health (AI 21903-02), the Pharmaceutical Manufacturers Association, and the Milton Fund. J.S.C. was supported by The Wellington Fund of Harvard Medical School and a Charles King Trust Postdoctoral Fellowship. T.E.E. was supported by a National Institutes of Health training grant in Pharmacological Sciences (GM 07306-11A1). S.M.B. is a Burroughs Wellcome Scholar in Molecular Parasitology.

1. Calabresi, P. & Parks, R. E., Jr. (1985) in The Pharmacological Basis of Therapeutics, eds. Gilman, A. G., Goodman, L. S., Rall, T. W. & Murad, F. (Macmillan, New York), pp. 1263-1267.
2. Calabresi, P. & Parks, R. E., Jr. (1985) in The Pharmacological Basis of Therapeutics, eds. Gilman, A. G., Goodman, L. S., Rall, T. W. & Murad, F. (Macmillan, New York), pp. 1267-1271.
3. Roth, B. (1983) in Inhibition of Folate Metabolism in Chemotherapy, Handbook of Experimental Pharmacology, ed. Hitchings, G. H. (Springer, New York), Vol. 64, pp. 102-128.
4. Rollo, I. M. (1983) in Inhibition of Folate Metabolism in Chemotherapy, Handbook of Experimental Pharmacology, ed. Hitchings, G. H. (Springer, New York), Vol. 64, pp. 252-287.
5. Santi, D. V. & Danenberg, P. V. (1984) in Folates and Pterins: Chemistry and Biochemistry of Folates, eds. Blakley, R. L. & Benkovic, S. J. (Wiley, New York), Vol. 1, pp. 298-345.
6. Coderre, J. A., Beverley, S. M., Schimke, R. T. & Santi, D. V. (1983) Proc. Natl. Acad. Sci. USA 80, 2132-2136.
7. Garrett, C. E., Coderre, J. A., Meek, T. D., Garvey, E. P., Claman, D. M., Beverley, S. M. & Santi, D. V. (1983) Mol. Biochem. Parasitol. 11, 257-265.
8. Ferone, R. & Roland, S. (1980) Proc. Natl. Acad. Sci. USA 77, 5802-5806.
9. Meek, T. D., Garvey, E. P. & Santi, D. V. (1985) Biochemistry 24, 678-686.
10. Beverley, S. M., Coderre, J. A., Santi, D. V. & Schimke, R. T. (1984) Cell 38, 431-439.
11. Ebrahimzadeh, A. & Jones, T. C. (1983) Am. J. Trop. Med. Hyg. 32, 694-702.
12. Vieira, J. & Messing, J. (1982) Gene 19, 259-268.
13. Huynh, T. V., Young, R. A. & Davis, R. W. (1985) in DNA Cloning Techniques: A Practical Approach, ed. Glover, D. (IRL, Oxford), pp. 49-78.
14. Young, R. A. & Davis, R. W. (1983) Proc. Natl. Acad. Sci. USA 80, 1194-1198.
15. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory, Cold Spring Harbor, NY).
16. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
17. Biggin, M. D., Gibson, T. J. & Hong, G. F. (1983) Proc. Natl. Acad. Sci. USA 80, 3963-3965.
18. Messing, J. (1983) Methods Enzymol. 101, 40-98.
19. Kapler, G. & Beverley, S. M. (1985) Genetics 110, s28.
20. Washtien, W. L., Grumont, R. & Santi, D. V. (1985) J. Biol. Chem. 260, 7809-7812.
21. Garvey, E. P. & Santi, D. V. (1985) Proc. Natl. Acad. Sci. USA 82, 7188-7192.
22. Masters, J. N. & Attardi, G. (1983) Gene 21, 59-63.
23. Chen, M.-J., Shimada, T., Moulton, A. D., Cline, A., Humphries, R. K., Maizel, J. & Nienhuis, A. W. (1984) J. Biol. Chem. 259, 3933-3943.
24. Takeishi, K., Kaneda, S., Ayusawa, D., Shimizu, K., Gotoh, O. & Seno, T. (1985) Nucleic Acids Res. 13, 2035-2043.
25. Belfort, M., Maley, G., Pedersen-Lane, J. & Maley, F. (1983) Proc. Natl. Acad. Sci. USA 80, 4914-4918.
26. Smith, D. R. & Calvo, J. M. (1980) Nucleic Acids Res. 8, 2255-2274.
27. Maley, G. F., Bellisario, R. L., Guarino, D. U. & Maley, F. (1979) J. Biol. Chem. 254, 1301-1304.
28. Freisheim, J. H., Bitar, K. G., Reddy, A. V. & Blankenship, D. T. (1978) J. Biol. Chem. 253, 6437.
29. Chu, F. K., Maley, G. F., Maley, F. & Belfort, M. (1984) Proc. Natl. Acad. Sci. USA 81, 3049-3052.
30. Purohit, S. & Mathews, C. K. (1984) J. Biol. Chem. 259, 6261-6266.
31. Blakley, R. L. (1984) in Folates and Pterins: Chemistry and Biochemistry of Folates, eds. Blakley, R. L. & Benkovic, S. J. (Wiley, New York), Vol. 1, pp. 191-254.
32. Borst, P. & Cross, G. A. M. (1982) Cell 29, 291-303.
33. Sather, S. & Agabian, N. (1985) Proc. Natl. Acad. Sci. USA 82, 5695-5699.
34. Hasan, G., Turner, M. J. & Cordingley, J. S. (1984) Cell 37, 333-341.
35. McCutchan, F. T., Dame, J. B., Miller, L. H. & Barnwell, J. (1984) Science 225, 808-811.
36. Maley, G. F., Maley, F. & Baugh, C. M. (1982) Arch. Biochem. Biophys. 216, 551-558.
37. Whittaker, R. H. (1977) Parasitic Protozoa 1, 1-33.
38. Reddy, G. P. V. & Pardee, A. B. (1980) Proc. Natl. Acad. Sci. USA 77, 3312-3316.
39. Reddy, G. P. V., Singh, A., Stafford, M. E. & Mathews, C. K. (1977) Proc. Natl. Acad. Sci. USA 74, 3152-3156.
40. Liu, C. C., Burke, R. L., Hibner, U., Barry, J. & Alberts, B. (1978) Cold Spring Harbor Symp. Quant. Biol. 43, 469-487.
41. Unnasch, T. R. & Wirth, D. F. (1983) Nucleic Acids Res. 11, 8461-8472.
42. Ravetch, J. V., Feder, R., Pavlovec, A. & Blobel, G. (1984) Nature (London) 312, 616-620.
43. Nellen, W. & Gallwitz, D. (1982) J. Mol. Biol. 159, 1-18.
44. Landfear, S. M. & Wirth, D. F. (1985) Mol. Biochem. Parasitol. 15, 61-82.
45. Boothroyd, J. C. & Cross, G. A. M. (1982) Gene 20, 281-289.
46. Van der Ploeg, L. H. T., Liu, A. Y. C., Michels, P. A. M., De Lange, T., Borst, P., Majumder, H. K., Veeneman, G. H. & Van Boom, J. (1982) Nucleic Acids Res. 10, 3591-3604.
47. Parsons, M., Nelson, R. G., Watkins, K. D. & Agabian, N. (1984) Cell 34, 901-909.
48. Crouse, G. F., Simonsen, C. C., McEwan, R. N. & Schimke, R. T. (1982) J. Biol. Chem. 257, 7887-7897.
49. Belfort, M., Pedersen-Lane, J., West, D., Ehrenman, K., Maley, G., Chu, F. & Maley, F. (1985) Cell 41, 375-382.
50. Jenh, C.-H., Geyer, P. K., Baskin, F. & Johnson, L. F. (1985) Mol. Pharmacol. 28, 80-85.