Proc. Natl. Acad. Sci. USA Vol. 94, pp. 11466–11471, October 1997 Genetics

Genetic definition of a protein-splicing domain: Functional mini-inteins support structure predictions and a model for intein evolution

VICTORIA DERBYSHIRE*, DAVID W. WOOD*†, WEI WU*, JOHN T. DANSEREAU*, JACOB Z. DALGAARD‡, AND MARLENE BELFORT*§

*Molecular Genetics Program, Wadsworth Center, New York State Department of Health, and School of Public Health, State University of New York at Albany, PO Box 22002, Albany, NY 12201-2002; †Howard P. Isermann Department of Chemical Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180-3590; and ‡National Cancer Institute–Frederick Cancer Research and Development Center, Advanced BioScience Laboratories–Basic Research Program, PO Box B, Building 549, Room 154, Frederick, MD 21701-1201

Communicated by John Abelson, California Institute of Technology, Pasadena, CA, August 19, 1997 (received for review July 1, 1997)

ABSTRACT Inteins are protein-splicing elements, most of which contain conserved sequence blocks that define a family of homing endonucleases. Like group I introns that encode such endonucleases, inteins are mobile genetic elements. Recent crystallography and computer modeling studies suggest that inteins consist of two structural domains that correspond to the endonuclease and the protein-splicing elements. To determine whether the bipartite structure of inteins is mirrored by the functional independence of the protein-splicing domain, the entire endonuclease component was deleted from the Mycobacterium tuberculosis recA intein. Guided by computer modeling studies, and taking advantage of genetic systems designed to monitor intein function, the 440-aa Mtu recA intein was reduced to a functional mini-intein of 137 aa. The accuracy of splicing of several mini-inteins was verified. This work not only substantiates structure predictions for intein function but also supports the hypothesis that, like group I introns, mobile inteins arose by an endonuclease invading a sequence encoding a small, functional splicing element.

Inteins are protein-splicing elements that exist as in-frame fusions with flanking protein sequences called exteins. Inteins are self-splicing at the protein level, with their excision being coupled to extein ligation (1–3). Most of the inteins that have been described are in the 400- to 500-aa range with little absolute sequence conservation among the elements (4, 5). However, Cys or Ser residues are required at the amino termini of both the intein and the second extein, and a His and Asn are present at the carboxy terminus of the intein (Fig. 1A Top). Most inteins contain eight conserved sequence blocks (A–H), two of these being the LAGLIDADG motifs (blocks C and E) that define a family of intron-homing endonucleases (refs. 4 and 5; Fig. 1A). Consistent with the occurrence of these motifs, several inteins have been shown to have site-specific endonuclease activity (6), and PI-SceI, the VMA1 intein of Saccharomyces cerevisiae, is capable of homing into a cognate intein-less allele (7). The sporadic distribution of inteins in all three biological kingdoms is consistent with their being mobile elements.

Endonuclease genes have been assumed to be invasive genetic elements that colonized group I introns, converting them into mobile genetic elements (8–11). Similarly, mobile inteins appear to be derived from invasive endonuclease genes. Recent structural studies indeed suggest that the protein-splicing and endonuclease domains are separate and that their two activities may have evolved independently. First, the crystal structure of PI-SceI has recently been solved (12). This 454-aa protein is folded into two distinct structural domains. Second, hidden Markov models have been used to define two conserved functional domains of inteins, corresponding to independent endonuclease and splicing modules, separated by nonconserved spacer regions of variable lengths (ref. 13; J.Z.D., A. Klar, M. J. Moser, W. R. Holley, A. Chatterjee, and I. S. Mian, unpublished results; Fig. 1B). Third, three putative inteins have recently been reported that are in the 150-aa size range and lack endonuclease motifs, although it is not clear whether these smaller elements retain splicing function (4, 5, 13). Finally, a newly identified Synechocystis intein does not contain a LAGLIDADG endonuclease but instead contains a member of the H-N-H family of group I intron endonucleases (13, 28).

Site-directed mutagenesis experiments have shown that endonuclease activity is not required for protein-splicing function (14, 15), and deletion of a region encompassing the LAGLIDADG motifs of PI-SceI has confirmed this conclusion (16). However, despite the apparent structural autonomy of the protein-splicing and endonuclease domains of PI-SceI, they do appear to collaborate in interacting with the homing site DNA (12). Therefore, it is important to determine whether the bipartite structure of inteins is mirrored by the functional independence of their two components. We tested the prediction that the entire endonuclease domain and spacer sequences between the domains can be deleted from a protein-splicing element to generate a mini-intein that is splicing proficient. To this end, we used the 440-aa intein from the Mycobacterium tuberculosis recA gene expressed in Escherichia coli. The Mtu recA intein contains a conventional LAGLIDADG endonuclease domain (4, 5), although endonuclease activity has not yet been demonstrated. Guided by junctions inferred from structure models (ref. 13; Fig. 1B), a series of mini-intein derivatives was tested in two genetic systems developed to screen for splicing activity in vivo and in vitro. A number of mini-inteins deleted for the entire endonuclease domain were shown to be capable of splicing in both contexts, consistent with structure predictions. These results support the model that homing inteins evolved through an endonuclease gene invading a DNA sequence encoding a functional mini-intein.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. §1734 solely to indicate this fact. © 1997 by The National Academy of Sciences 0027-8424/97/9411466-6$2.00/0. PNAS is available online at http://www.pnas.org. Abbreviation: TS, thymidylate synthase. §e-mail: [email protected].

MATERIALS AND METHODS

Construction of td:Mini-Intein Fusions. Plasmid pKKtdC238-I containing the Mtu recA intein is derived from pKK223 (Pharmacia). The plasmid contains an in-frame fusion of the


intein with the intronless T4 td gene, which encodes thymidylate synthase (TS), such that Cys-238 of TS is the N-terminal residue of the second extein (circled in Fig. 2A; D.W.W., V.D., W.W., Georges Belfort, and M.B., unpublished results). pKKtdC238-I was used as a template for inverse PCRs to generate a series of plasmids for expression of TS:mini-intein in-frame fusion proteins. The coding sequences for the fusion proteins were sequenced in their entirety.

The primers used in the PCRs carried a terminal BssHII restriction site, such that digestion of the products with this enzyme and religation generated in-frame td:mini-intein fusions with central deletions marked by a BssHII site and, hence, an Ala–Arg dipeptide. Where possible, these residues replaced the same or similar amino acids in the final fusion proteins. In some cases this required the addition of an Ala residue between the junctions (Fig. 2B), and in constructs 101Δ395(A5′), 123Δ372′, 129Δ372′, and 129Δ400′ (Fig. 2B, constructs 4, 9, 12, and 18), deleted residues are replaced with Ala–Arg.

Construction of Tripartite Fusion Derivatives, Detection of Splicing, and Protein Purification. Mini-intein derivatives were transferred to a tripartite fusion system (MIC) (D.W.W. et al., unpublished results) for in vitro characterization. This encodes maltose binding protein (M), intein (I), and the C-terminal domain of I-TevI (C) (ref. 17; Fig. 3). Fragments encoding the pKKtdC238-mini-intein derivatives were PCR-amplified using primers encoding EcoRI (5′) and BsrGI (3′) restriction sites for cloning into the parent MIC expression plasmid (pMIC), which contains silent EcoRI and BsrGI sites close to the intein–extein junctions. pMIC is derived from pMalC2 (New England Biolabs). Again, the mini-intein portion of each pMIC derivative was sequenced. pMIC25 is a slow-splicing derivative of full-length pMIC (D.W.W. et al.,

FIG. 1. Structural features of inteins. (A) Intein and endonuclease maps. The top map shows the generic relationship of an intein to its flanking exteins. Conserved amino acids are circled (see text). In the intein and endonuclease maps below, conserved sequence blocks are labeled A–H, with the LAGLIDADG endonuclease motifs, C and E, boxed (4, 5). (B) Intein structures. The top schematic depicts the two domains and spacer regions of the 440-aa Mtu recA intein according to modeling predictions (ref. 13; J.Z.D., et al., unpublished data) and the structure of PI-SceI (12), with corresponding linear maps below. Solid areas, protein-splicing domain; shaded, endonuclease domain; open, spacer regions. The smallest functional mini-inteins from this study (137 aa; Fig. 2B) are shown in both representations, with the linear form below a map of the Mtu recA intein with conserved sequence blocks.

FIG. 2. Summary of phenotypes of mini-intein constructs. (A) The td-based genetic system. Schematic of the TS:intein in-frame fusion system shows precursors and splice products (Left). TS phenotype of td:intein in-frame fusion derivatives (Right). Patches of TS− cells (D1210ΔthyA::KanR) containing plasmids expressing td:intein fusions were tested for growth on minimal medium plates at different temperatures (27). td, no intein; td Mtu, full-length Mtu recA intein; td MtuAA, full-length Mtu recA intein with C-terminal His and Asn residues mutated to Ala; remainder, td:mini-intein constructs. Numbering in brackets is as in B. (B) Summary of protein-splicing activity. Full-length and mini-intein constructs are numbered 1–21. The constructs are defined by the deletion junctions, with Ala residues inserted in the cloning process listed in parentheses [e.g., 94Δ383 has all residues between 94 and 383 deleted, whereas 101Δ405(A7) has residues between 101 and 405 deleted and replaced with 7 Ala residues]. Constructs 4, 9, 12, and 18 have Ala–Arg inserted between the junction residues as indicated by a prime. TS phenotype: +5, growth at 23°C, 30°C, and 37°C; +4, growth at 23°C, 30°C, and weak growth at 37°C; +3, growth at 23°C and at 30°C; +2, growth at 23°C; +1, weak growth at 23°C; −, no growth on minimal media. MIC splicing is quantitated as the percentage of precursor that was converted to ligated exteins after 3 h of induction: +5, 80–100%; +4, 60–80%; +3, 40–60%; +2, 20–40%; +1, ≤20%. Those constructs indicated by an asterisk splice further upon incubation at 4°C. Intein size represents Mtu recA residues only, excluding Ala and Arg residues incorporated as part of the cloning process. Arrowheads mark largest deletion on each side, consistent with splicing function.
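The two scoring scales given in the Fig. 2 legend (TS phenotype from growth at three temperatures, and MIC splicing from the percentage of precursor converted to ligated exteins) can be written out as a small lookup. The following is only an illustrative sketch: the function names, the "full"/"weak"/"none" growth labels, the handling of values that fall exactly on a bin boundary, and the treatment of 0% as "no splicing" are assumptions made here, not part of the published scheme.

```python
# Illustrative sketch of the Fig. 2 scoring scales.
# All names and the "full"/"weak"/"none" labels are hypothetical.

GROWTH_LEVELS = ("full", "weak", "none")


def ts_phenotype_score(g23: str, g30: str, g37: str) -> int:
    """Map growth on minimal medium at 23, 30 and 37 degrees C to a 0..5 score.

    0 stands for the legend's "no growth" symbol.
    """
    for growth in (g23, g30, g37):
        if growth not in GROWTH_LEVELS:
            raise ValueError(f"unknown growth level: {growth!r}")
    if g23 == "none":
        return 0  # no growth on minimal media
    if g23 == "weak":
        return 1  # +1: weak growth at 23 C
    if g30 != "full":
        return 2  # +2: growth at 23 C only (assumption: weak growth at 30 C is not counted)
    if g37 == "full":
        return 5  # +5: growth at 23, 30 and 37 C
    if g37 == "weak":
        return 4  # +4: growth at 23 and 30 C, weak growth at 37 C
    return 3      # +3: growth at 23 and 30 C


def mic_splicing_score(percent_spliced: float) -> int:
    """Map percent of precursor converted to ligated exteins to a 0..5 score."""
    if not 0 <= percent_spliced <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_spliced == 0:
        return 0  # assumption: no detectable ligated exteins
    if percent_spliced <= 20:
        return 1  # +1: <=20%
    if percent_spliced <= 40:
        return 2  # +2: 20-40% (upper bound assumed inclusive)
    if percent_spliced <= 60:
        return 3  # +3: 40-60%
    if percent_spliced <= 80:
        return 4  # +4: 60-80%
    return 5      # +5: 80-100%


if __name__ == "__main__":
    print(ts_phenotype_score("full", "full", "weak"))
    print(mic_splicing_score(72.5))
```

Because the legend lists overlapping bin edges (for example 60–80% and 80–100%), any real analysis would need to state its own convention for boundary values; the upper-inclusive choice above is arbitrary.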

FIG. 3. Splicing in vivo. (A) Schematic of the MIC in-frame fusion. Precursor, cleavage, and splice products are shown. (B) Twelve percent Coomassie-stained SDS polyacrylamide gel of cell lysates of MIC constructs induced at 37°C for 3 h. Lane M, molecular mass marker bands (Benchmark, GIBCO/BRL) from the bottom up correspond to 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 120, 160, and 220 kDa (bold numbers correspond to the two high-intensity bands). Lanes: 1, full-length MIC; 2, MIC25, a slow-splicing mutant derivative of full-length MIC (D.W.W. et al., unpublished results); 3, MIT with TS as C-terminal extein; 4, 101Δ405(A7) (Fig. 2, construct 3); 5, 110Δ383 (Fig. 2, construct 6); 6, 114Δ372 (Fig. 2, construct 8). , precursor (MIC or MIT); E, ligated exteins (MC); F, intein. Arrowheads in lane 1 mark position of precursor. (C) Western blot of 12% SDS polyacrylamide gel with maltose binding protein antiserum. (D) Western blot of 12% SDS polyacrylamide gel with Mtu recA intein antiserum. (E) Western blot of 10% SDS polyacrylamide gel with I-TevI antiserum. The assignments were verified on a 12% gel (data not shown). Background bands from cross-reactivity to the polyclonal antisera are also evident in C–E. Lanes and symbols are as in B.
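The construct names used in the figure legends encode the deletion junctions within the 440-aa Mtu recA intein. Since the junction residues themselves appear to be retained (101Δ405 gives 101 + 36 = 137 aa, matching the 137-aa mini-intein and the 303-aa deletion quoted in the text), the intein size can be recomputed from the name alone. The sketch below illustrates that arithmetic; the class, function, and field names are hypothetical, and the parser only covers the name forms that appear in this paper.

```python
import re
from dataclasses import dataclass

INTEIN_LENGTH = 440  # full-length Mtu recA intein, as stated in the text

# Accepts names such as "94Δ383", "101Δ405(A7)", "101Δ395(A5′)" and "123Δ372′".
_NAME = re.compile(r"(\d+)Δ(\d+)(?:\(A(\d+)([′']?)\))?([′']?)")


@dataclass(frozen=True)
class MiniIntein:
    n_junction: int    # last retained residue of the N-terminal segment
    c_junction: int    # first retained residue of the C-terminal segment
    ala_inserted: int  # Ala residues listed in parentheses (0 if none)
    primed: bool       # prime mark: Ala-Arg inserted between the junctions

    @property
    def retained(self) -> int:
        """Mtu recA residues kept (cloning-derived Ala/Arg are not counted)."""
        return self.n_junction + (INTEIN_LENGTH - self.c_junction + 1)

    @property
    def deleted(self) -> int:
        """Mtu recA residues removed by the central deletion."""
        return INTEIN_LENGTH - self.retained


def parse_construct(name: str) -> MiniIntein:
    """Parse a construct name into its junctions and insert annotations."""
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised construct name: {name!r}")
    n_junction, c_junction = int(match.group(1)), int(match.group(2))
    if not 1 <= n_junction < c_junction <= INTEIN_LENGTH:
        raise ValueError(f"junctions out of range in {name!r}")
    ala = int(match.group(3)) if match.group(3) else 0
    primed = bool(match.group(4) or match.group(5))
    return MiniIntein(n_junction, c_junction, ala, primed)


if __name__ == "__main__":
    for name in ("101Δ405(A7)", "110Δ383", "114Δ372"):
        construct = parse_construct(name)
        print(name, construct.retained, construct.deleted)
```

For example, `parse_construct("101Δ405(A7)").retained` evaluates to 137 and `.deleted` to 303, in agreement with the values given in the text for this construct.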

unpublished results). The tripartite MIT fusion, which has the I-TevI domain replaced with TS (D.W.W. et al., unpublished results), was used as a control in Western blots.

pMIC derivatives were maintained in E. coli D1210ΔthyA::KanR [F− Δ(gpt-proA)62 leuB6 supE44 ara-14 galK2 lacY1 Δ(mcrC-mrr) rpsL20 (Strr) xyl-5 mtl-1 recA13 lacIq]. Cells were grown to mid-log phase at 37°C in L Broth with 100 µg/ml ampicillin, and MIC expression was induced for 3 h by the addition of 1 mM isopropyl β-D-thiogalactoside (IPTG). Splicing products were visualized by running cell lysates on SDS polyacrylamide gels (18). Western blot analyses were carried out using a Bio-Rad semi-dry blot apparatus and the Amersham ECL detection kit according to the manufacturers' instructions. Polyclonal antisera to I-TevI and the Mtu recA intein were raised in rabbits and that to maltose binding protein was purchased from New England Biolabs.

MIC and the mini-intein in-frame fusion proteins were purified on amylose columns (New England Biolabs) according to the manufacturer's instructions. Quantitation of MIC splicing was achieved by densitometry of samples fractionated on 12% Coomassie-stained SDS/PAGE gels to determine the percentage of precursor converted to ligated exteins.

RESULTS

Construction of Mini-Inteins and Examining Boundaries of Splicing Function. Guided by sequence alignments of all known inteins (13), a series of deletion derivatives of the Mtu recA intein was constructed. We aimed to remove all of the endonuclease domain, leaving behind a minimal splicing domain (Fig. 1). Initially intein function was assayed in a genetic system, in which the intein is expressed from a plasmid as an in-frame fusion with the TS gene (td) of bacteriophage T4 (D.W.W. et al., unpublished results; Fig. 2A). Because three in-frame deletion derivatives of this intein had been shown previously to be inactive by Western blot analysis (19), the increased sensitivity of the in vivo system would likely prove useful for detecting low-level splicing activity. Furthermore, we reasoned that even if the endonuclease exists as an independent domain, it might be difficult to generate a functional mini-intein, because crudely snipping out a large region of protein could well leave the remainder of the protein unable to fold properly. Therefore, we designed a series of derivatives with more or less of the central portion of the intein deleted. The derivatives were generated by inverse PCR using the td expression plasmid containing the full-length Mtu recA intein as template.

Fig. 2 shows the TS phenotype of a number of mini-intein derivatives. Whereas the full-length td:intein fusion was TS+ at all temperatures up to 37°C (Fig. 2B, construct 1), and the splicing-defective MtuAA mutant was TS− at all temperatures (Fig. 2B, construct 2), most of the mini-intein derivatives were TS+ only at lower temperatures (Fig. 2B, constructs 3–12, 14, 15, 17, 18, 20, and 21). These phenotypes suggest that many of the mini-inteins are capable of splicing, although less efficiently than the wild-type Mtu recA intein. Although there is apparently no direct correlation between deletion size and phenotype, it is interesting to note that the shorter mini-inteins gave the strongest TS+ phenotypes (Fig. 2B, constructs 7, 14, and 15), whereas the longest mini-intein was TS− (Fig. 2B, construct 13). Perhaps the most surprising result is the frequency with which TS+ mini-inteins were generated. Derivatives with deletions of up to 303 aa [101Δ405(A7) and 96Δ400′ (Fig. 2B, constructs 3 and 21)] still retain splicing function. Construct 101Δ405(A7) corresponds precisely to the predicted minimal splicing domain as defined by computer modeling (13).

Once it was apparent that a number of mini-inteins were active and that TS phenotype correlated with protein-splicing activity (see below), it became of interest to approximate the boundaries of the splicing domains at the N and C termini of the intein (Fig. 2B, constructs 14–21). Mini-inteins 94Δ383 and 114Δ406 (Fig. 2B, constructs 15 and 17, see arrowhead) that are TS+ indicate that splicing function is contained within the first 94 and last 35 aa of the intein.

Characterization of Mini-Intein Splicing Products. To further assess splicing activity, the mini-intein derivatives were transferred to a tripartite fusion system (MIC) for in vitro characterization. The tripartite system comprises maltose binding protein (M) as the first extein and as an affinity tag for purification, fused in-frame to the intein (I) and then to the second extein, the C-terminal domain of the homing endonuclease I-TevI (C) (ref. 17; Fig. 3A). This C-terminal domain was chosen because of the solubility of this protein module and the

availability of antibody to I-TevI. Indeed, antibody to each component of MIC allowed accurate identification of all precursors, intermediates, and products of the splicing reaction by Western blot analysis (Fig. 3 C–E).

Overexpression of the tripartite fusion proteins was induced by the addition of IPTG, and the precursors and products of the splicing reactions were separated and identified (representative data shown in Fig. 3 B–E). For most of the derivatives the products of the splicing reaction, ligated exteins and free intein, were readily detected on Coomassie gels and their identity was verified by Western blot analysis. In addition, the precursor and intermediates or side-products of the reaction corresponding to N-terminal or C-terminal intein cleavage without ligation were also seen.

A Western blot using maltose binding protein antiserum (Fig. 3C) shows the presence of precursors in MIC25, a full-length mutant derivative (lane 2), in MIC mini-intein constructs (lanes 4–6), and in MIT (lane 3), a control tripartite fusion of M, I, and intact TS (D.W.W. et al., unpublished work), although full-length wild-type MIC splices so efficiently that no precursor was visible (lane 1). In addition to full-length protein, several products containing the maltose binding domain were detected, including ligated exteins (MC) and free maltose binding protein (M). Cleavage products containing the intein (MI) and other unidentified species carrying the maltose binding protein were also observed between the precursor and MC bands. Mtu recA intein antiserum (Fig. 3D) identified inteins from splicing-proficient derivatives. Full-length intein was seen as a product of splicing of full-length MIC (wild type and the MIC25 mutant, lanes 1 and 2) and MIT (lane 3), whereas mini-inteins of the appropriate size were visible for constructs 110Δ383 and 114Δ372 (lanes 5 and 6, corresponding to constructs 6 and 8 in Fig. 2B). For construct 101Δ405(A7) the intein band was detectable only on longer exposures (lane 4, corresponding to construct 3 in Fig. 2B), because this construct splices slowly (see below; Fig. 4B). Finally, I-TevI antiserum (Fig. 3E) clearly identified precursors, as well as ligated exteins (MC), in splicing-proficient MIC derivatives (lanes 1, 2, 5, and 6) but, as expected, not in the MIT construct (lane 3). Importantly, there is agreement between predicted and observed sizes of the different inteins (Fig. 3D), and the size of ligated exteins was constant for all MIC derivatives, regardless of intein size (Fig. 3 C and E; see also below; Fig. 4A).

Purification and Properties of Mini-Intein Derivatives. Overexpressed proteins were affinity-purified on amylose resin for most of the MIC derivatives (Fig. 4). In each case, as expected, all fusion proteins containing maltose binding protein were purified (Fig. 4A): tripartite precursor (MIC); ligated exteins (MC); side-products (M, which was highly enriched in these preparations) and (MI). Free intein (I) as well as C and IC were also visible in some preparations, either because of association of these species with proteins bound to the column and/or because splicing occurs during or after purification.

TS phenotype as a function of temperature provides a semiquantitative measure of splicing efficiency. In the MIC context, splicing was judged by quantitating the appearance of ligated exteins upon overexpression of the tripartite fusion. In most cases there was agreement between the activity in the two contexts, although 129Δ271(A1) (Fig. 2B, construct 13) was absolutely TS− but sometimes exhibited very low level MIC splicing (Fig. 4A). In addition, 101Δ405(A7) and 129Δ405(A1) (Fig. 2B, constructs 3 and 10), which were splicing proficient as judged by TS phenotype, exhibited very low level MIC splicing on initial induction (Figs. 3 and 4A). However, upon storage in elution buffer at 4°C, these fusion proteins continued to splice in vitro (Fig. 4B; data not shown). Despite these subtle context effects, there is a general correlation between TS phenotype and splicing proficiency (Fig. 2B).

Constructs that tentatively delineate the N- and C-terminal boundaries of the protein-splicing domains in the TS context (94Δ383 and 114Δ406; Fig. 2B, constructs 15 and 17, arrowheads) also splice as MIC derivatives (Fig. 2B; data not shown). The splicing properties of these derivatives confirm that the functional domains of the Mtu recA intein are contained within the first 94 and the last 35 aa.

Assessment of Splicing Fidelity. Ligated exteins (MC) resulting from MIC splicing were of the same apparent molecular mass (56.1 kDa) for all constructs including the full-length intein (Figs. 3 and 4). To verify that the splicing reaction was proceeding accurately, the molecular masses of two mini-intein products from MIC splicing reactions were determined by mass spectrometry. Mini-inteins from MIC constructs 110Δ383 and 114Δ372 (Fig. 2B, constructs 6 and 8) with

FIG. 4. Purification and characterization of mini-intein constructs. (A) Purified MIC mini-intein derivatives. 12% SDS polyacrylamide gel is shown. M, markers (kDa), and lanes: 1, 101Δ405(A7); 2, 101Δ395(A5′); 3, 101Δ383(A7); 4, 110Δ383; 5, 114Δ384; 6, 114Δ372; 7, 123Δ372′; 8, 129Δ405(A1); 9, 129Δ383(A1); 10, 129Δ372′; 11, 129Δ271(A1), corresponding to constructs 3–13, respectively, in Fig. 2B. (B) In vitro splicing. 12% SDS polyacrylamide gel of MIC101Δ405(A7) (Fig. 2B, construct 3) immediately after purification on amylose resin (lane 1) and after 18 days at 4°C (lane 2). M, markers (kDa). In both A and B, , precursor (MIC); E, ligated exteins (MC); F, intein (I); □, N-terminal cleavage product (IC); ■, maltose binding protein (M). (C) Precise molecular masses of selected mini-inteins. HPLC-purified mini-inteins were analyzed by infusion on a Finnigan-MAT TSQ-700 triple quadrupole mass spectrometer equipped with a Finnigan ESI source (San Jose, CA).
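The fidelity check behind Fig. 4C is a comparison of predicted and measured intein masses. A minimal sketch of that comparison is given below, using the four mass values quoted in the text for the 110Δ383 and 114Δ372 mini-inteins; the function name, the dictionary layout, and the choice to express the deviation in parts per million are illustrative assumptions rather than the authors' analysis.

```python
def mass_deviation(predicted_da: float, observed_da: float) -> tuple:
    """Return (absolute deviation in Da, relative deviation in ppm)."""
    if predicted_da <= 0:
        raise ValueError("predicted mass must be positive")
    delta = observed_da - predicted_da
    return delta, delta / predicted_da * 1e6


# (predicted, observed) masses in Da, as quoted in the text for two mini-inteins.
REPORTED_MASSES = {
    "110Δ383": (18594, 18594),
    "114Δ372": (20326, 20327),
}

if __name__ == "__main__":
    for construct, (predicted, observed) in REPORTED_MASSES.items():
        delta, ppm = mass_deviation(predicted, observed)
        print(f"{construct}: deviation {delta:+d} Da ({ppm:+.1f} ppm)")
```

A deviation of at most 1 Da on an intein of roughly 20 kDa is far smaller than the mass of any single amino acid residue, which is why the measurement can be read as evidence that excision occurred at the expected junctions.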

FIG. 5. Intein evolution. (A) Models for the coexistence of endonuclease and protein-splicing functions in mobile inteins. (B) Mobile intron and intein evolution. An endonuclease gene is shown invading DNA encoding a self-splicing intron or intein to generate a mobile intron or intein, respectively. The sharing of endonuclease motifs with other proteins to form composite proteins is shown. The formation of a hedgehog protein from an intein is depicted by loss of the C-terminal domain responsible for protein ligation. Symbols are depictions of DNA, with the endonuclease ORF represented by a gray rectangle.

predicted molecular masses of 18,594 and 20,326 Da, respectively, were determined to be 18,594 and 20,327 Da by electrospray ionization mass spectroscopy (Fig. 4C). Together, these data indicate not only that the mini-inteins are capable of protein splicing but also that the accuracy of the reaction remains uncompromised.

DISCUSSION

Mini-Inteins Deleted for the Entire Endonuclease Domain of the Mtu recA Intein Are Splicing Proficient. Derivatives of the Mtu recA intein have been constructed that retain splicing activity despite removal of the entire endonuclease domain, which constitutes more than two-thirds of the wild-type protein. These results are in accord with deletion of ca. 80% of the PI-SceI endonuclease domain while maintaining splicing function (16). The Mtu recA mini-inteins were analyzed using two in-frame fusion systems, for which a general correlation between TS phenotype and MIC splicing was found. These results are consistent with the expectation that all information necessary for splicing is carried within the intein itself and that splicing activity is not an artifact of a particular fusion context. It was thereby shown that all information required for splicing function is carried within the first 94 and final 35 aa of the 440-residue intein. This size and configuration of the mini-intein corresponds well with both the naturally occurring inteins without endonuclease domains (5, 13) and with the computer alignments that designated the first 101 and last 35 aa as constituting the protein-splicing element (13).

It is clear from the analysis of MIC mini-intein fusions that, in addition to bona fide splicing, as judged by the presence of ligated exteins, there is a significant amount of substrate cleavage in the absence of ligation (Fig. 4A), as has been seen in other intein expression constructs (20). First, the artificial nature of tripartite fusion systems, wherein the individual components are, by design, stable structural domains rather than the intein interrupting a protein of defined structure, likely increases accumulation of side-products. Second, the precise nature of the deletion appears to affect the build-up of cleavage products as reflected by variability in amount of side-products among the different constructs (Fig. 4A). Third, removal of the endonuclease domain may compromise splicing; it is not unlikely that the activity of the protein-splicing domain is affected by the endonuclease domain, just as endonuclease interaction with its homing site seems to be influenced by the intein domain (12). Nevertheless, the appearance of ligated exteins validates the splicing proficiency of the mini-inteins.

Structural and Functional Domains of Mobile Inteins. The localization of protein-splicing activity in the Mtu recA intein as defined by the mini-inteins described in this work is entirely consistent with the two-domain structure of PI-SceI (12) and the statistical modeling that predicts that all endonuclease-containing inteins are folded into two domains (13). The central endonuclease domain is predicted to be separated from the minimal protein-splicing domains by variable spacer sequences (ref. 13; J.Z.D. et al., unpublished results; Fig. 1B), which may serve to enable the protein to accommodate the dual functions of the endonuclease-containing inteins. The tolerance of Mtu recA intein function to different central deletions is likely a consequence of the flexibility of these spacers.

It is clear, however, that inteins are not tolerant of all internal deletions. For example, the 166Δ201 deletion of the Mtu recA intein (19) does not retain activity, despite the presence of all residues required for splicing function. Similarly, the 129Δ271(A1) derivative (Fig. 2B, construct 13) exhibits very low activity and only in the MIC context. Additionally, a derivative of PI-SceI with a 184-aa deletion spanning the endonuclease motifs had no protein-splicing activity unless modified by the addition of peptide linkers at the site of the deletion (16). The lack of splicing activity in these constructs containing an intact protein-splicing domain presumably reflects protein-folding problems. These constraints on intein function have undoubtedly played an important role in guiding their evolution.

Evolution of Mobile Inteins. The invasiveness of homing endonuclease genes is likely to form the basis of the maintenance and spread of both mobile introns and inteins (refs. 8–12; Fig. 5). According to one model, an endonuclease gene invaded a protein-coding sequence and evolved protein-splicing activity to preserve the functional integrity of the host protein (Fig. 5A, model 1). According to a second model, the endonuclease gene invaded the DNA sequence of a primitive protein-splicing element (Fig. 5A, model 2). Model 1 was supported by the observation that HO endonuclease, a free-standing LAGLIDADG endonuclease, has six intein motifs (ref. 4; Fig. 1A) but is unable to splice as an in-frame protein fusion (J. Platko and F. Perler, cited in ref. 5; V.D. and M.B., unpublished results), suggesting that HO is part-way along the path to evolving intein function. However, the combination of crystallographic studies (12), molecular modeling (13), and our genetic analysis support a structurally and functionally independent protein-splicing domain, in favor of model 2, as argued further below. In this case, HO endonuclease could be a defunct intein.

Predicted endonuclease target sites flanking the endonuclease gene that serve as markers of the initial invasion event have been found in mobile group I introns (11). These recognition sites have not, however, been observed in inteins (ref. 12; W.W. and J.Z.D., unpublished results). This work, demonstrating the functional independence of the protein-splicing and endonuclease domains, is therefore of particular importance in supporting their separate origin. Furthermore,

the ability of the intein to function in the absence of the entire endonuclease domain favors the scenario in which the endonuclease gene invaded a preexisting intein (Fig. 5A, model 2). The clear, functional independence of the protein-splicing domain is counter to model 1, in which some splicing function would be predicted to reside in the endonuclease moiety itself.

Endonuclease-containing inteins are far more common in modern genomes than the endonuclease-free inteins. The endonuclease seems to provide the means for inteins to be maintained and, indeed, to spread among different genes, species, and perhaps even kingdoms. Through their ability to splice, autocatalytic inteins, like self-splicing introns, in turn provide a genetically silent haven for the invasive endonuclease ORFs and the vehicles for their propagation. Accordingly, endonucleases can be viewed as a substantial driving force in molecular evolution. Through their capacity to make nicks and breaks in DNA, endonuclease genes can invade sequences to form molecular associations that not only mobilize introns and inteins but can also provide catalytic function to other proteins (Fig. 5B). An example of the latter is the H-N-H endonuclease cassette, which is present in mobile group I intron- and intein-encoded proteins, as well as in the colicin family of bacterial toxins and in the tripartite reverse transcriptase-maturase-endonuclease proteins of mobile group II introns (13, 21, 22, 28). The propensity of endonuclease genes to colonize genomes therefore can influence genome stability and configuration by promoting lesions in DNA and subsequent intron- or intein-based rearrangements. Furthermore, intron endonucleases can provide selective advantage in both phage and archaeal systems (29, 39), whereas colicins promote host defense, thereby influencing the stability of entire microbial populations.

Although the antiquity of the original self-splicing introns has been argued (23), nothing is known of the evolutionary history of “ancestral” endonuclease-free inteins. Although there are examples of endonuclease-free inteins in all three biological kingdoms (5, 13), it would be premature to assume that these elements existed in the last common ancestor. It is, however, interesting to note that mechanistic parallels have been drawn between inteins and the self-activating amidohydrolases (24). Furthermore, the endonuclease-free inteins have been hypothesized to be evolutionarily related to the self-cleaving hedgehog proteins, which are involved in eukaryotic developmental pathways (13, 25, 26). It has been suggested that the hedgehog family, which exists in arthropods and all vertebrates from amphibians to mammals, arose from an intein that lost ligation activity (ref. 13; Fig. 5). Regardless, finding such nonmobile, autocatalytic, intein-like molecules with related functions in deeply branching organisms will ultimately help address issues relating to intein ancestry and evolutionary age.

reading of the manuscript; Georges Belfort and Fred Gimble for useful discussions; Maryellen Carl for expert secretarial assistance; Robert F. Stack and Charles R. Hauer III for the mass spectrometry; and Maureen Belisle for making the figures. The Molecular Genetics Core Facility at the Wadsworth Center provided DNA oligonucleotides and DNA sequencing service. This work was supported by National Institutes of Health Grants GM39422 and GM44844 to M.B. J.Z.D. is supported by a grant from the Danish Natural Science Research Council and by the National Cancer Institute, Department of Health and Human Services.

1. Colston, M. J. & Davis, E. O. (1994) Mol. Microbiol. 12, 359–363.
2. Perler, F. B., Davis, E. O., Dean, G. E., Gimble, F. S., Jack, W. E., Neff, N., Noren, C. J., Thorner, J. & Belfort, M. (1994) Nucleic Acids Res. 22, 1125–1127.
3. Cooper, A. A. & Stevens, T. H. (1995) TIBS 20, 351–356.
4. Pietrokovski, S. (1994) Protein Sci. 3, 2340–2350.
5. Perler, F. B., Olsen, G. J. & Adam, E. (1997) Nucleic Acids Res. 25, 1087–1093.
6. Belfort, M. & Roberts, R. J. (1997) Nucleic Acids Res. 25, 3379–3388.
7. Gimble, F. S. & Thorner, J. (1992) Nature (London) 357, 301–306.
8. Perlman, P. S. & Butow, R. A. (1989) Science 246, 1106–1109.
9. Belfort, M. (1989) Trends Genet. 5, 209–213.
10. Lambowitz, A. M. (1989) Cell 56, 323–326.
11. Loizos, N., Tillier, E. R. M. & Belfort, M. (1994) Proc. Natl. Acad. Sci. USA 91, 11983–11987.
12. Duan, X., Gimble, F. S. & Quiocho, F. A. (1997) Cell 89, 555–564.
13. Dalgaard, J. Z., Moser, M. J., Hughey, R. & Mian, I. S. (1997) J. Comput. Biol. 4, 193–214.
14. Hodges, R. A., Perler, F. B., Noren, C. J. & Jack, W. E. (1992) Nucleic Acids Res. 20, 6153–6157.
15. Gimble, F. S. & Stephens, B. W. (1995) J. Biol. Chem. 270, 5849–5856.
16. Chong, S. & Xu, M.-Q. (1997) J. Biol. Chem. 272, 15587–15590.
17. Derbyshire, V., Kowalski, J. C., Dansereau, J. T., Hauer, C. R. & Belfort, M. (1997) J. Mol. Biol. 265, 494–506.
18. Laemmli, U. K. (1970) Nature (London) 227, 680–685.
19. Davis, E. O., Jenner, P. J., Brooks, P. C., Colston, M. J. & Sedgwick, S. G. (1992) Cell 71, 201–210.
20. Xu, M.-Q., Southworth, M. W., Mersha, F. B., Hornstra, L. J. & Perler, F. B. (1993) Cell 75, 1371–1377.
21. Gorbalenya, A. E. (1994) Protein Sci. 3, 1117–1120.
22. Shub, D. A., Goodrich-Blair, H. & Eddy, S. R. (1994) Trends Biochem. Sci. 19, 402–404.
23. Cech, T. R. (1985) Int. Rev. Cytol. 93, 3–22.
24. Brannigan, J. A., Dodson, G., Duggleby, H. J., Moody, P. C. E., Smith, J. L., Tomchick, D. R. & Murzin, A. G. (1995) Nature (London) 378, 416–419.
25. Koonin, E. V. (1995) Trends Biochem. Sci. 20, 141–142.
26. Lee, J. J., Ekker, S. C., Von Kessler, D. P., Porter, J. A., Sun, B. I. & Beachy, P. A. (1994) Science 266, 1528–1537.
27. Belfort, M., Ehrenman, K. & Chandry, P. S. (1990) Methods Enzymol. 181, 521–539.
28. Pietrokovski, S. (1997) Protein Sci., in press.
29. Goodrich-Blair, H. & Shub, D. A. (1996) Cell 84, 211–221.
We are grateful to Benoit Cousineau, Mathias Holpert, Joseph 30. Aagaard, C., Dalgaard, J. Z. & Garrett R. A. (1995) Proc. Natl. Kowalski, Richard Lease, Monica Parker, and George Silva for critical Acad. Sci. USA 92, 12285–12289. Downloaded by guest on October 3, 2021