2564–2576 Nucleic Acids Research, 2014, Vol. 42, No. 4 Published online 21 November 2013 doi:10.1093/nar/gkt1212
Reprogramming homing endonuclease specificity through computational design and directed evolution Summer B. Thyme1,2,*, Sandrine J. S. Boissel3, S. Arshiya Quadri1, Tony Nolan4, Dean A. Baker5, Rachel U. Park1, Lara Kusak1, Justin Ashworth6 and David Baker1,7,*
1Department of Biochemistry, University of Washington, UW Box 357350, 1705 NE Pacific St., Seattle, WA 98195, USA, 2Graduate Program in Biomolecular Structure and Design, University of Washington, UW Box 3
357350, 1705 NE Pacific St., Seattle, WA 98195, USA, Graduate Program in Molecular and Cellular Biology, Downloaded from University of Washington, UW Box 357275, 1959 NE Pacific St., Seattle, WA 98195, USA, 4Department of Life Sciences, Sir Alexander Fleming Building, Imperial College London, Imperial College Road, London SW7 2AZ, UK, 5Department of Genetics, University of Cambridge, Downing Street, Cambridge CB1 3QA, UK, 6Institute for Systems Biology, 401 Terry Avenue N, Seattle, WA 98109, USA and 7Howard Hughes Medical Institute, University of Washington, UW Box 357350, 1705 NE Pacific St., Seattle, WA 98195, USA http://nar.oxfordjournals.org/
Received May 24, 2013; Revised November 3, 2013; Accepted November 5, 2013
ABSTRACT the site of the break using a template with DNA arms that match the sequence surrounding the cleavage location. Homing endonucleases (HEs) can be used to This process can be used to generate directed modifications induce targeted genome modification to reduce the in animal models and shows potential for human gene fitness of pathogen vectors such as the malaria- therapy (2). A number of nucleases—zinc-finger nucleases at University of Washington Libraries on December 7, 2015 transmitting Anopheles gambiae and to correct dele- (ZFNs) (3), transcription activator-like effector nucleases terious mutations in genetic diseases. We describe (TALENs) (4), CRISPR nucleases (5,6) and homing endo- the creation of an extensive set of HE variants with nucleases (HEs, also known as meganucleases) (7–9)—can novel DNA cleavage specificities using an integrated make targeted breaks to activate gene conversion. Each experimental and computational approach. Using of these reagents differs in its enzymatic properties and computational modeling and an improved selection the ease of redesign of specificity. strategy, which optimizes specificity in addition to LAGLIDADG homing endonucleases (LHEs) (Figure 1a) have several desirable characteristics as gene activity, we engineered an endonuclease to cleave therapy reagents (8). These enzymes are likely to cleave at in a gene associated with Anopheles sterility few loci within a human genome because of their high and another to cleave near a mutation that causes specificity recognition (18–22 bp target sites, Figure 1b) pyruvate kinase deficiency. In the course of this work and, because LHEs are typically single-chain proteins we observed unanticipated context-dependence with coupled cleavage and binding activity (10,11), they between bases which will need to be mechanistically have rapid cleavage kinetics (12) and a low potential for understood for reprogramming of specificity to off-target cleavage compared to nucleases with less succeed more generally. concerted activities. Other advantages of these nucleases include the 3’ overhangs of the resultant DSB, potentially increasing rates of homologous recombination (13,14), INTRODUCTION and their small encoding ORFs without repetitive Targeted gene correction requires site-specific DNA elements, desirable for nuclease delivery and maintenance cleavage to initiate cellular programs for DNA break in a host organism (15). The major disadvantage of these repair (1). A double-stranded break (DSB) can stimulate nucleases is that they are significantly more difficult to homologous recombination pathways to correct deleteri- engineer to cleave novel target sites than some of their ous genetic mutations or insert a desired DNA sequence at counterparts, such as the modular TALE nucleases (16).
*To whom correspondence should be addressed. Tel: +1 951 204 4067; Email: [email protected] Correspondence may also be addressed to David Baker. Tel: +1 206 543 1295; Fax: +1 206 685 1792; Email: [email protected] Present addresses: Summer B. Thyme, 16 Divinity Avenue, BIOL 1020, Harvard University, Cambridge, MA 02138, USA. David Baker, University of Washington, Molecular Engineering Building, 4th Floor, 4000 15th Ave NE, Seattle, WA 98195, USA.
ß The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Nucleic Acids Research, 2014, Vol. 42, No. 4 2565
Overcoming this challenge is essential to using HEs in tandem target site arrays were bought as minigenes from gene targeting applications. IDT and the arrays were inserted into the pCcdB plasmid Both experimental-directed evolution (17–19) and com- with the NheI and SacII restriction sites. The negative putational methods (20–24) have been developed for selection vector was built by ligating two copies of wild- altering protein–DNA interaction specificity. The former type (WT) I-AniI LIB4 target site into the NheI and SacII approaches are limited by the number of protein positions sites on the pENDO-HE vector. Amino acid libraries were that can be randomized and the need to maintain specificity built using assembly PCR (31) with oligonucleotides con- while increasing activity on a desired target site; maintain- taining codons with degenerate nucleotides. These ing the naturally high specificity of these nucleases is critical libraries were ligated into the modified pENDO-HE if they are to be used as gene therapy reagents because vector between the NcoI and NotI restriction sites. All off-target cleavage events can be detrimental (25). C-terminal I-AniI libraries (starting at the interface Computational design has the potential to resolve these position 150) were built in the context of the activating issues by identifying a combination of amino acids that M5 mutations (F13Y, I55V, F91I, S92T, S111Y), and all bind the desired target sequence with high affinity as well N-terminal libraries (interface position 18 to position 72) Downloaded from as specificity. However, while computational approaches were built in the context of M4, which is M5 without the have been used to design an enzyme cleaving multiple I55V mutation (32) (Figure 1a). N-terminal variants were (three) adjacent base-pair substitutions (26), the success overexpressed with I-AniI-M4 mutations and C-terminal rate is still low (12,21,26,27), likely because features of variants were overexpressed with I-AniI-Y2 mutations endonuclease binding and catalysis are not yet well under- (F13Y, S111Y) because I-AniI-M5 expresses poorly (32). stood and therefore cannot be accurately modeled. http://nar.oxfordjournals.org/ In this article, we present a combined experimental and Directed evolution computational method for reprogramming HE cleavage specificity. We approached the challenge of altering target A bacterial screen for variants of I-AniI that cleave single site recognition by breaking it into two parts: first, the iden- and multiple base-pair substitutions was carried out as tification of many variants with specificity shifted by a previously described (19,33) with the improvements single base-pair, and second, the utilization of this infor- described in the results section. An electrocompetent mation to engineer nucleases cleaving more divergent strain of DH12S Escherichia Coli (Invitrogen) was trans- formed with a pCcdB plasmid. The pCcdB plasmids used target sites. New variants cleaving single base-pair substi- at University of Washington Libraries on December 7, 2015 tutions were generated with an improved bacterial selection contained either two adjacent copies of the activated LIB4 method (19), described in the first section of our report, version (29) of the I-AniI target site with a single base-pair which enables screening for cleavage specificity in substitution or it contained a tandem array of 10–20 single addition to activity. We then engineered endonucleases copy target sites from genes of interest that contained cleaving target sites with multiple base-pair substitutions various multiple base-pair substitutions. The number of by integrating information from the single-site cleaving cleavable target sites on the pCcdB plasmid is related to variants using computation and library selection. We survival levels, and some endonucleases require as many generated endonucleases cutting full target sites in a gene as four WT sites to achieve near 100% survival (19). The for a human disease model and in a gene predicted to play a optimized Y2 and M5 versions of I-AniI were previously role in Anopheles gambiae reproduction (15,28). identified by selection against four copies of the WT site (32), but show reasonable survival against two or even one (Supplementary Figure S1). A standard procedure for MATERIALS AND METHODS electrocompetent cell preparation was used to produce each pCcdB-containing strain. Construction of genes, libraries and target plasmids For Round 1 selections, endonuclease libraries were The bacterial selection plasmids pENDO-HE and pCcdB ligated into the new negative selection pENDO-HE for the original selection strategy were described previ- vector (containing two copies of the LIB4 WT I-AniI ously (19) and were gifts from Barry Stoddard, originally site), purified and transformed into 20 ml of the appropri- from David Liu (Harvard University). The pET15-HE ate pCcdB-containing cell line. Ligation products were vector used for protein expression, a variant of pET15 purified by running over a PCR cleanup column, leaving that produces an N-terminal his-tag fusion protein using the wash buffer on the column for an extended period of the NcoI and NotI restriction sites, was a gift from Barry time (5 min) to remove salts, and eluting the samples in Stoddard. The pCcdB plasmids containing the I-AniI 10 ml of water, all of which was transformed. Transformed LIB4 target sites (29) with single base-pair substitutions bacteria were recovered in Terrific Broth (TB) media at were built by ligating pCcdB vector cut with NheI and 37 C for approximately one-half hour. The recovery was SacII to oligonucleotides (IDT, Integrated DNA followed by a selection in 2 ml of liquid culture, containing Technologies) that were phosphorylated and annealed to 0.02% L-arabinose and 100 mg/ml carbenicillin, for 4 h at form a duplex with compatible sticky ends. Two copies of 30 C. After liquid selection, a volume of the culture was each site were used for the selections against single base- plated on a minimal media selection plate (100 mg/ml car- pair substitutions. Some of the substrates (single copies of benicillin, 1 mM IPTG, 0.02% L-arabinose, 1.5% agar, the site) used for enzyme assays were built the same way M9 salt, 1% glycerol, 0.8% tryptone, 0.2% thiamine, using the pCcdB vector, while others were constructed on 1 mM MgSO4, 1 mM CaCl2) and on a LB-agar, 100 mg/ pBluescript by site-directed mutagenesis (12,30). The ml carbenicillin control plate. If the selection was a Round 2566 Nucleic Acids Research, 2014, Vol. 42, No. 4
(a) 1, the volumes plated were 1 ml on the control plate and the entire remainder of the selection (spun down and re- suspended) on the selection plate. Plates are grown for 36 h at 30 C. For Round 2 selections, colonies were picked from the Round 1 selection plate, grown overnight, combined and miniprepped, and 1 ml of the mixed plasmid was re-transformed into 20 ml of the same competent cell line as the Round 1 ligation. The volumes plated on the Round 2 selection plates depended on the Round 1 survival and size of the starting library, but 0.2–1 ml was plated on the control plate and typically 0.2–1 ml and 200 ml on selection plates. Colonies on Round 2 selection plates were either grown up for a third round of selection
or subjected to colony PCR to amplify the endonuclease Downloaded from ORF off of the pENDO-HE vector and the PCR products were sequenced.
Protein production
Proteins were expressed and purified as previously http://nar.oxfordjournals.org/ described (34). The genes for each expressed I-AniI variant were either amplified from clones saved from the selection process or assembled de novo (31). These genes were cloned into pET15-HE with NcoI and NotI site, sequence verified and expressed in BL21 Star cells (Invitrogen) using autoinduction (35). Either a half- or full-liter of media was inoculated and grown at 37 C for 8–12 h. Expression at 18 C continued for 20–24 h, and
(b) at University of Washington Libraries on December 7, 2015 then the cells were harvested, often frozen overnight, and resuspended in 20 mM Tris, pH 7.5, 30 mM Imidazole and 1.0 M NaCl. The cells were resuspended with lysozyme and sonicated, and proteins were isolated from the soluble fraction with nickel affinity chromatog- raphy and elution with 20 mM Tris, pH 7.5, 500 mM Imidazole and 500 mM NaCl. Proteins were concentrated and buffer exchanged into 20 mM Tris, pH 7.5, 500 mM NaCl using spin concentrators and stored in 50% (v/v) glycerol. The purity of the proteins assessed with SDS- Figure 1. Structure and specificity of the I-AniI LAGLIDADG HE. PAGE and the mass was verified by mass spectrometry. (a) The I-AniI endonuclease (shown here, pdb code 2qoj) was used in The absorbance at 280 nm, collected with a NanoDrop, this study, with the addition of activating mutations—Y2, M4 and M5, was used to determine the protein concentrations. detailed in the methods—identified in previous work (32). Monomeric LAGLIDADG endonucleases are pseudo-symmetric, with two enzyme Concentrations of proteins that were <90% pure were halves binding to the left (– half) and right (+half) sides of the DNA estimated from stained SDS-PAGE gels. target that flank the central four bases where cleavage occurs (arrow). The N-terminal domain binds to the (–) half-site and the C-terminal In vitro endonuclease cleavage assays domain binds to the (+) half-site. The linker between the two regions is shown in red and the termini are marked with pink (N) and orange Plasmid substrates were linearized, either with the restric- (C) spheres. The goal of our work is to alter the target site substrate tion enzyme ScaI for targets on pBluescript (all sites with preferences of these enzymes in order to direct their cleavage to genomic sites of interest. Many new variants cleaving single base-pair single base-pair substitution) or with XbaI for the pCcdB substitutions in the I-AniI target are presented in this work, and the targets (multiple base-pair substitutions, full- and half- labeling scheme for these variants is presented here. The ‘Selected’ sites). Additional substrates, used in the assays shown in enzymes were identified from fully randomized libraries, Figures 6a and 4, were generated by amplification of a ‘Computationally guided’ enzymes were either improved versions of 1 kb fragment from a pCcdB vector containing the sites. previous computational designs or selected from libraries containing computationally identified motif contacts, and the ‘Previously pub- Experiments measuring kinetics always used the full- lished’ enzymes are presented as well to show the full range of currently length plasmid substrates to maintain consistency with targeted positions in the I-AniI interface. (b) Experimentally determined previously collected kinetic data (12). The enzyme specificity for the I-AniI endonuclease (Y2), derived from previously reaction buffer was previously optimized for activity and published kinetic data on each single base-pair substitution (12). Experimental specificity is defined in the Methods section on computa- stability of I-AniI, and contained final concentrations of tional specificity prediction—a value close to 1.0 indicates that the 170 mM KCl, 10 mM MgCl2, 20 mM Tris, pH 9.0 and enzyme has high specificity and a value of 0.25 indicates that all nu- 1 mM DTT. To collect EC1/2max data, corresponding to cleotides are cleaved equally or that one other nucleotide is significantly the half-maximal cleavage of the target site, endonuclease preferred over the target nucleotide. variants was assayed at several enzyme dilutions with Nucleic Acids Research, 2014, Vol. 42, No. 4 2567