Genetic Definition of a Protein-Splicing Domain: Functional Mini-Inteins Support Structure Predictions and a Model for Intein Evolution
Proc. Natl. Acad. Sci. USA Vol. 94, pp. 11466–11471, October 1997
Genetics

VICTORIA DERBYSHIRE*, DAVID W. WOOD*†, WEI WU*, JOHN T. DANSEREAU*, JACOB Z. DALGAARD‡, AND MARLENE BELFORT*§

*Molecular Genetics Program, Wadsworth Center, New York State Department of Health, and School of Public Health, State University of New York at Albany, PO Box 22002, Albany, NY 12201-2002; †Howard P. Isermann Department of Chemical Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180-3590; and ‡National Cancer Institute–Frederick Cancer Research and Development Center, Advanced BioScience Laboratories–Basic Research Program, PO Box B, Building 549, Room 154, Frederick, MD 21701-1201

Communicated by John Abelson, California Institute of Technology, Pasadena, CA, August 19, 1997 (received for review July 1, 1997)

© 1997 by The National Academy of Sciences 0027-8424/97/9411466-6$2.00/0
Abbreviation: TS, thymidylate synthase.
§e-mail: [email protected].

ABSTRACT Inteins are protein-splicing elements, most of which contain conserved sequence blocks that define a family of homing endonucleases. Like group I introns that encode such endonucleases, inteins are mobile genetic elements. Recent crystallography and computer modeling studies suggest that inteins consist of two structural domains that correspond to the endonuclease and the protein-splicing elements. To determine whether the bipartite structure of inteins is mirrored by the functional independence of the protein-splicing domain, the entire endonuclease component was deleted from the Mycobacterium tuberculosis recA intein. Guided by computer modeling studies, and taking advantage of genetic systems designed to monitor intein function, the 440-aa Mtu recA intein was reduced to a functional mini-intein of 137 aa. The accuracy of splicing of several mini-inteins was verified. This work not only substantiates structure predictions for intein function but also supports the hypothesis that, like group I introns, mobile inteins arose by an endonuclease gene invading a sequence encoding a small, functional splicing element.

Inteins are protein-splicing elements that exist as in-frame fusions with flanking protein sequences called exteins. Inteins are self-splicing at the protein level, with their excision being coupled to extein ligation (1–3). Most of the inteins that have been described are in the 400- to 500-aa range with little absolute sequence conservation among the elements (4, 5). However, Cys or Ser residues are required at the amino termini of both the intein and the second extein, and a His and Asn are present at the carboxy terminus of the intein (Fig. 1A Top). Most inteins contain eight conserved sequence blocks (A–H), two of these being the LAGLIDADG motifs (blocks C and E) that define a family of intron-homing endonucleases (refs. 4 and 5; Fig. 1A). Consistent with the occurrence of these motifs, several inteins have been shown to have site-specific endonuclease activity (6), and PI-SceI, the VMA1 intein of Saccharomyces cerevisiae, is capable of homing into a cognate intein-less allele (7). The sporadic distribution of inteins in all three biological kingdoms is consistent with their being mobile elements.

Endonuclease genes have been assumed to be invasive genetic elements that colonized group I introns, converting them into mobile genetic elements (8–11). Similarly, mobile inteins appear to be derived from invasive endonuclease genes. Recent structural studies indeed suggest that the protein-splicing and endonuclease domains are separate and that their two activities may have evolved independently. First, the crystal structure of PI-SceI has recently been solved (12). This 454-aa protein is folded into two distinct structural domains. Second, hidden Markov models have been used to define two conserved functional domains of inteins, corresponding to independent endonuclease and splicing modules, separated by nonconserved spacer regions of variable lengths (ref. 13; J.Z.D., A. Klar, M. J. Moser, W. R. Holley, A. Chatterjee, and I. S. Mian, unpublished results; Fig. 1B). Third, three putative inteins have recently been reported that are in the 150-aa size range and lack endonuclease motifs, although it is not clear whether these smaller elements retain splicing function (4, 5, 13). Finally, a newly identified Synechocystis intein does not contain a LAGLIDADG endonuclease but instead contains a member of the H-N-H family of group I intron endonucleases (13, 28).

Site-directed mutagenesis experiments have shown that endonuclease activity is not required for protein-splicing function (14, 15), and deletion of a region encompassing the LAGLIDADG motifs of PI-SceI has confirmed this conclusion (16). However, despite the apparent structural autonomy of the protein-splicing and endonuclease domains of PI-SceI, they do appear to collaborate in interacting with the homing site DNA (12). Therefore, it is important to determine whether the bipartite structure of inteins is mirrored by the functional independence of their two components. We tested the prediction that the entire endonuclease domain and spacer sequences between the domains can be deleted from a protein-splicing element to generate a mini-intein that is splicing proficient. To this end, we used the 440-aa intein from the Mycobacterium tuberculosis recA gene expressed in Escherichia coli. The Mtu recA intein contains a conventional LAGLIDADG endonuclease domain (4, 5), although endonuclease activity has not yet been demonstrated. Guided by junctions inferred from structure models (ref. 13; Fig. 1B), a series of mini-intein derivatives was tested in two genetic systems developed to screen for splicing activity in vivo and in vitro. A number of mini-inteins deleted for the entire endonuclease domain were shown to be capable of protein splicing in both contexts, consistent with structure predictions. These results support the model that homing inteins evolved through an endonuclease gene invading a DNA sequence encoding a functional mini-intein.

FIG. 1. Structural features of inteins. (A) Intein and endonuclease maps. The top map shows the generic relationship of an intein to its flanking exteins. Conserved amino acids are circled (see text). In the intein and endonuclease maps below, conserved sequence blocks are labeled A–H, with the LAGLIDADG endonuclease motifs, C and E, boxed (4, 5). (B) Intein structures. The top schematic depicts the two domains and spacer regions of the 440-aa Mtu recA intein according to modeling predictions (ref. 13; J.Z.D., et al., unpublished data) and the structure of PI-SceI (12), with corresponding linear maps below. Solid areas, protein-splicing domain; shaded, endonuclease domain; open, spacer regions. The smallest functional mini-inteins from this study (137 aa; Fig. 2B) are shown in both representations, with the linear form below a map of the Mtu recA intein with conserved sequence blocks.

FIG. 2. Summary of phenotypes of mini-intein constructs. (A) The td-based genetic system. Schematic of the TS:intein in-frame fusion system shows precursors and splice products (Left). TS phenotype of td:intein in-frame fusion derivatives (Right). Patches of TS− cells (D1210ΔthyA::KanR) containing plasmids expressing td:intein fusions were tested for growth on minimal medium plates at different temperatures (27). td, no intein; td Mtu, full-length Mtu recA intein; td MtuAA, full-length Mtu recA intein with C-terminal His and Asn residues mutated to Ala; remainder, td:mini-intein constructs. Numbering in brackets is as in B. (B) Summary of protein-splicing activity. Full-length and mini-intein constructs are numbered 1–21. The constructs are defined by the deletion junctions, with Ala residues inserted in the cloning listed in parentheses [e.g., 94Δ383 has all residues between 94 and 383 deleted, whereas 101Δ405(A7) has residues between 101 and 405 deleted and replaced with 7 Ala residues]. Constructs 4, 9, 12, and 18 have Ala–Arg inserted between the junction residues as indicated by a prime. TS phenotype: +5, growth at 23°C, 30°C, and 37°C; +4, growth at 23°C, 30°C, and weak growth at 37°C; +3, growth at 23°C and at 30°C; +2, growth at 23°C; +1, weak growth at 23°C; −, no growth on minimal media. MIC splicing is quantitated as the percentage of precursor that was converted to ligated exteins after 3 h of induction: +5, 80–100%; +4, 60–80%; +3, 40–60%; +2,

MATERIALS AND METHODS

Construction of td:Mini-Intein Fusions. Plasmid pKKtdC238-I containing the Mtu recA intein is derived from pKK223 (Pharmacia). The plasmid contains an in-frame fusion of the intein with the intronless T4 td gene, which encodes thymidylate synthase (TS), such that Cys-238 of TS is the N-terminal residue of the second extein (circled in Fig. 2A; D.W.W., V.D., W.W., Georges Belfort, and M.B., unpublished results). pKKtdC238-I was used as a template for inverse PCRs to generate a series of plasmids for expression of TS:mini-intein in-frame fusion proteins. The coding sequences for the fusion proteins were sequenced in their entirety.

The primers used in the PCRs carried a terminal BssHII restriction site, such that digestion of the products with this enzyme and religation generated in-frame td:mini-intein fusions with central deletions marked by a BssHII site and, hence, an Ala–Arg dipeptide. Where possible, these residues replaced the same or similar amino acids in the final fusion proteins.
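The construct naming in the Fig. 2 legend and the BssHII junction described above can be illustrated with a minimal Python sketch. It is not the authors' procedure. It assumes that "between 94 and 383" is exclusive, so residues 94 and 383 are both retained, and that the BssHII recognition sequence GCGCGC is read in frame as GCG-CGC. The function names (`translate`, `parse_construct`, `mini_intein_length`) and the `ala_arg` flag are hypothetical helpers introduced only for this illustration.

```python
import re

FULL_LENGTH_AA = 440      # length of the Mtu recA intein given in the text
BSSHII_SITE = "GCGCGC"    # BssHII recognition sequence

# Only the two codons needed here (standard genetic code).
CODONS = {"GCG": "A", "CGC": "R"}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string using the tiny codon table above."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def parse_construct(name: str) -> tuple[int, int, int]:
    """Parse names such as '94Δ383' or '101Δ405(A7)'.

    Returns (left junction, right junction, number of inserted Ala).
    'D' is accepted in place of 'Δ' because extracted text often garbles it.
    """
    m = re.fullmatch(r"(\d+)[ΔD](\d+)(?:\(A(\d+)\))?", name)
    if m is None:
        raise ValueError(f"unrecognised construct name: {name!r}")
    n_ala = int(m.group(3)) if m.group(3) else 0
    return int(m.group(1)), int(m.group(2)), n_ala


def mini_intein_length(name: str, ala_arg: bool = False) -> int:
    """Length in aa, ASSUMING both junction residues are retained."""
    left, right, n_ala = parse_construct(name)
    deleted = right - left - 1                 # residues strictly between
    inserted = n_ala + (2 if ala_arg else 0)   # optional Ala-Arg dipeptide
    return FULL_LENGTH_AA - deleted + inserted


if __name__ == "__main__":
    print(translate(BSSHII_SITE))              # AR
    print(mini_intein_length("94Δ383"))        # 152
    print(mini_intein_length("101Δ405(A7)"))   # 144
```

Under this exclusive reading, a 101Δ405 deletion with no insert would leave 137 aa, the same size as the mini-intein quoted in the abstract, although the text above does not say which construct that is. The lengths printed here depend entirely on that reading and should be checked against the published figure.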