Genome Engineering with Zinc Finger Nucleases and TALENs

With the advances in large-scale sequencing, it is now possible to sequence whole genomes quite readily. This capability opens up revolutionary possibilities in agriculture, animal husbandry and molecular medicine. If we could edit genomes at precise locations, we could impart desirable qualities to agricultural plants, breed improved animals for farm labor or for human consumption and, most importantly, cure genetic diseases.

As we already discussed in reasonable detail, the most basic function of a genome is to replicate faithfully, and propagate the replicas. The functions required for nearly error-free copying of the genetic information, correction of occasional mistakes, and repair of various types of DNA damage caused by physical and chemical agents are thus encoded in the genome. The pathways for replication, recombination and repair are intimately connected. The premise of directed genome engineering is that once DNA damage has been induced at a pre-determined genetic locus, we may take advantage of the pre-existing cellular mechanisms to repair that damage in specific ways: to correct a certain mutation, insert a new sequence, delete an existing sequence, and so on.

The approach is thus to develop methodologies for inducing the DNA damage, usually a double strand break, at the desired genomic locale. Here we will discuss two technologies, one embodied by ‘Zinc finger nucleases’ (ZFNs) and the other by ‘TALENs’ (Transcription Activator-Like Effector Nucleases).

Zinc finger nucleases

Restriction enzymes are utilized by bacteria to restrict invasion by foreign nucleic acids (phage DNA, plasmid DNA etc.). The widely used Type II restriction enzymes recognize specific sequences (4 to 6 base pairs; some recognize longer sequences), and make double strand DNA breaks. These enzymes are therefore quite useful for making site-specific DNA breaks to string together DNA fragments. They have revolutionized the technology for cloning, shuttling and other manipulations of DNA. The problem with using restriction enzymes for genome engineering is the presence of a large number of susceptible sites even within relatively small genomes. For example, a six-base pair recognition sequence will be present, on average, once every 4096 bp (4^6 bp). Thus making a cut at only one of these sites will be nearly impossible.
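The arithmetic behind this estimate can be sketched in a few lines of Python. This is only an illustration: it assumes the four bases are equally frequent and independent, and the genome sizes are rough round figures chosen for the example, not values from the text.

```python
def expected_spacing(site_length: int) -> int:
    """Average distance (bp) between occurrences of one specific site,
    assuming equal and independent base frequencies."""
    return 4 ** site_length


def expected_sites(genome_size_bp: int, site_length: int) -> float:
    """Expected number of occurrences of one specific site in a genome."""
    return genome_size_bp / expected_spacing(site_length)


# Assumed, approximate genome sizes (illustrative round numbers).
GENOMES = {
    "E. coli (~4.6 Mb)": 4_600_000,
    "human (~3 Gb)": 3_000_000_000,
}

for name, size in GENOMES.items():
    count = expected_sites(size, 6)
    print(f"{name}: about {count:,.0f} expected sites for a 6-bp recognition sequence")
```

Even for a bacterial genome the expected number of six-base pair sites runs into the thousands under these assumptions, which is why a restriction enzyme alone cannot be aimed at a single locus.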

FokI

However, one might be able to make use of the cutting activity of a restriction enzyme and combine it with a desired DNA binding specificity borrowed from a different source. This indeed was what was done in the development of zinc finger nucleases. The restriction enzyme chosen for this purpose was FokI. As pointed out, the restriction enzymes that are most commonly employed are Type II enzymes. They bind to their target sequence, and cut the DNA strands at this sequence (within it or at its edges). FokI is slightly different. It is a Type IIS enzyme, which cuts at some distance from its binding site. In addition, its DNA cleaving domain (nuclease domain) can be separated from its DNA binding domain by proteolysis. Now that we have a nuclease domain, the next step is to fuse it to a peptide domain with defined DNA binding specificity. This is where the Zinc finger comes into the picture.

Zinc finger domains

‘Zinc Finger’ is the general name given to a peptide of about 30 amino acids that coordinates a zinc ion. There are different types of Zinc fingers. A common one is the Cys2-His2 finger, which utilizes two cysteine and two histidine residues to bind zinc. Zinc fingers are widely distributed among transcription factors, which bind to promoter sequences to activate gene expression. These proteins usually contain more than one zinc finger unit.

The crystal structure of a three zinc finger unit bound to DNA was solved, which revealed the nature of the amino acid-DNA interactions responsible for sequence specific binding (see Figure 1 below).

Figure 1. The structure at the left (a) shows three zinc fingers making contacts with the DNA through the major groove. Each Zinc finger is made up of an alpha-helix (which makes contacts with specific bases) and two beta strands. A single Zinc finger shown at the right (b) shows the Zinc atom coordinated by Cys2-His2 as well as the residues responsible for making base contacts. -1 refers to a residue from the loop just before the start of the alpha-helix.

The sequence segment recognized by an individual zinc finger is 3 nucleotides. The sequence range covered by three such fingers, the minimum number used in the construction of ZFNs, is therefore a string of 9 nucleotides. The chance occurrence of a chosen nonamer (9-mer) is one in 4^9 = 262,144 (~250,000) base pairs in a genome. Since the active form of a ZFN consists of two sets of zinc fingers separated by a spacer DNA of appropriate length (see the explanation further on), the specific base recognition is at least 18 nucleotides (rather than 9) for a minimal ZFN. A given 18-mer is expected by chance only once in 4^18 (about 6.9 x 10^10) base pairs, which is more than the roughly 3 x 10^9 base pairs of the human genome, so the probability of a chosen target site being present elsewhere in the genome is very low. Thus, ZFNs have high specificity, and off-target recognition (and DNA cleavage) is quite unlikely.
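To make the comparison between 9-bp and 18-bp recognition concrete, here is a minimal sketch that estimates how often a chosen site would turn up by chance. It assumes a random genome with equal base frequencies, a round figure of 3 x 10^9 bp for the human genome, and a Poisson approximation for the chance of at least one extra site; real genomes are not random, so the numbers are only indicative.

```python
import math

HUMAN_GENOME_BP = 3_000_000_000  # assumption: rough haploid human genome size


def expected_chance_sites(genome_bp: int, site_length: int) -> float:
    """Expected number of chance occurrences of one specific site
    (single strand, equal and independent base frequencies)."""
    return genome_bp / 4 ** site_length


def prob_at_least_one(expected: float) -> float:
    """Poisson approximation: probability of one or more chance occurrences."""
    return 1.0 - math.exp(-expected)


for length in (9, 18):
    expected = expected_chance_sites(HUMAN_GENOME_BP, length)
    print(
        f"{length}-bp site: one per {4 ** length:,} bp, "
        f"expected chance sites = {expected:.3g}, "
        f"P(at least one) = {prob_at_least_one(expected):.3g}"
    )
```

Under these assumptions a 9-bp site recurs thousands of times in a human-sized genome, whereas an 18-bp site is expected well under once.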

However, the significant problem is in finding the right combination of three fingers that harbors the desired DNA sequence specificity (with sufficient discrimination by each zinc finger for its cognate triplet against other triplets). Considerable effort was spent in engineering zinc fingers that span a wide range of sequence specificity. The large number of naturally occurring transcription factors and other DNA binding proteins provided a valuable source of zinc fingers with multiple triplet recognition capability. A whole library of zinc-fingers was generated by randomizing the relevant amino acids and selecting for the desired specificity in a variety of selection procedures. In one study, the experimental strategy was to select zinc fingers that target 5’GNN3’, 5’ANN3’ and 5’CNN3’, where N is any base. For example, 5’GNN3’ will cover 16 triplet sequences.
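The count of 16 triplets per family follows from the two free positions (4 x 4). A short sketch that enumerates the three families named above; the function and variable names are illustrative only:

```python
from itertools import product

BASES = "ACGT"


def triplet_family(first_base: str) -> list[str]:
    """All triplets of the form first_base + N + N, where N is any base."""
    return [first_base + "".join(rest) for rest in product(BASES, repeat=2)]


# The three families mentioned in the text: 5'GNN3', 5'ANN3' and 5'CNN3'.
families = {base: triplet_family(base) for base in "GAC"}

for base, triplets in families.items():
    print(f"5'{base}NN3': {len(triplets)} triplets, e.g. {triplets[:4]}")

covered = sum(len(triplets) for triplets in families.values())
print(f"{covered} of {4 ** 3} possible triplets fall in these three families")
```

Together the GNN, ANN and CNN families account for 48 of the 64 possible triplets; the remaining 16 are the TNN triplets.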

Selection schemes: There are a number of ways for selecting desired binding specificities between a protein and a DNA sequence or between two protein sequences. A classical method is called selection by phage display. For example, a DNA library coding for Zinc finger peptides with randomized sequences at the specificity residue positions is cloned into the gene coding for the coat protein of a filamentous bacteriophage. This phage library is introduced into bacteria, and with a helper phage the library is amplified. That is, we can create 10^8 to 10^9 phage particles/ml displaying different types of zinc finger specificities on their surface (as part of their coat proteins). One can then design a synthetic double stranded oligonucleotide containing the desired trinucleotide motif, say 5’GTT3’/3’CAA5’, to bait a phage carrying the cognate zinc finger peptide in its coat protein. Such an oligonucleotide tagged by biotin at its 5’ end can be immobilized on a surface coated with streptavidin. When the library phage is incubated with the oligonucleotide, the phage bound to this sequence can be trapped, and washed free of the unbound phage. After elution, the bound phage can be amplified by propagation in bacteria, and the new phage batch can be subjected to a second round of enrichment. After multiple rounds of selection, the phage DNA corresponding to the coat protein is sequenced to identify the Zinc finger with the sought specificity. At each selection step, control oligonucleotides lacking the target trinucleotide are included as competitors to minimize random binding. The competitor oligonucleotides are not biotinylated. They remain in solution, and are washed away along with any phage bound to them.
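Why several rounds of selection are needed can be seen in a toy model of enrichment. The sketch below is not a description of any actual experiment: the starting fraction of cognate phage and the two retention probabilities are hypothetical numbers picked purely for illustration, and amplification is assumed to preserve the composition of the eluted pool.

```python
def enrich(binder_fraction: float, keep_binder: float, keep_background: float) -> float:
    """Fraction of cognate phage after one round of capture, washing and elution.

    keep_binder:     hypothetical probability that a cognate phage is retained
    keep_background: hypothetical probability that a non-cognate phage is retained
    """
    bound = binder_fraction * keep_binder
    background = (1.0 - binder_fraction) * keep_background
    return bound / (bound + background)


def run_selection(start_fraction: float, rounds: int,
                  keep_binder: float = 0.5, keep_background: float = 0.001) -> list[float]:
    """Track the cognate fraction over successive rounds of selection."""
    fractions = [start_fraction]
    for _ in range(rounds):
        fractions.append(enrich(fractions[-1], keep_binder, keep_background))
    return fractions


# Hypothetical library: one cognate phage per million at the start.
fractions = run_selection(1e-6, rounds=4)
for round_number, fraction in enumerate(fractions):
    print(f"after round {round_number}: cognate fraction = {fraction:.6f}")
```

With these made-up parameters a single round leaves the cognate phage a small minority, while three or four rounds make it dominate the pool, which is the rationale for repeating the capture and amplification cycle.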

Fusing the Zinc finger module to the FokI nuclease domain

The zinc finger module (normally the module is tripartite, that is, consisting of three zinc fingers) is fused via its carboxyl-terminus to the FokI nuclease domain. The FokI restriction enzyme acts as a dimer, with each of the two nuclease domains cutting one of the two DNA strands. However, the FokI nuclease domain is a monomer in solution and does not form a stable dimer. Hence, to target a specific DNA locus, one has to design two ZFN (zinc finger nuclease) modules such that their binding to DNA will bring the nuclease domains into proper proximity for dimerization. When the peptide linker between the zinc finger module and the nuclease domain is short, the optimal spacer length between the two DNA binding sites is 5 to 6 bp.
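The geometric requirement (two 9-bp half-sites on opposite strands, separated by a 5 to 6 bp spacer) can be expressed as a simple sequence scan. The sketch below rests on an assumed convention that is not spelled out in the text: each half-site is written 5' to 3' on the strand its fingers recognize, the left module reads the bottom strand and the right module reads the top strand, so the top strand shows the reverse complement of the left half-site, then the spacer, then the right half-site. The half-site sequences and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_zfn_sites(seq: str, left_site: str, right_site: str,
                   spacers: tuple[int, ...] = (5, 6)) -> list[tuple[int, int]]:
    """Return (start, spacer_length) for every paired site on the top strand.

    Assumed layout on the top strand:
        reverse_complement(left_site) + spacer + right_site
    """
    left_on_top = reverse_complement(left_site)
    hits = []
    for start in range(len(seq) - len(left_on_top) + 1):
        if seq[start:start + len(left_on_top)] != left_on_top:
            continue
        for gap in spacers:
            right_start = start + len(left_on_top) + gap
            if seq[right_start:right_start + len(right_site)] == right_site:
                hits.append((start, gap))
    return hits


# Hypothetical 9-bp half-sites and a made-up test sequence with a 5-bp spacer.
LEFT_SITE = "GGAGCTGAC"
RIGHT_SITE = "GATGTCGCA"
test_seq = "TTAA" + reverse_complement(LEFT_SITE) + "ACGTA" + RIGHT_SITE + "CCGG"

print(find_zfn_sites(test_seq, LEFT_SITE, RIGHT_SITE))
```

A real design tool would also have to check that zinc fingers exist for each triplet and search the rest of the genome for near matches; this scan only captures the spacing and strand arrangement.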