US009359599B2

(12) United States Patent
Liu et al.

(10) Patent No.: US 9,359,599 B2
(45) Date of Patent: Jun. 7, 2016

(54) ENGINEERED TRANSCRIPTION ACTIVATOR-LIKE EFFECTOR (TALE) DOMAINS AND USES THEREOF

(71) Applicant: President and Fellows of Harvard College, Cambridge, MA (US)

(72) Inventors: David R. Liu, Lexington, MA (US); John Paul Guilinger, Ridgway, CO (US); Vikram Pattanayak, Cambridge, MA (US)

(73) Assignee: President and Fellows of Harvard College, Cambridge, MA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 14/320,519

(22) Filed: Jun. 30, 2014

(65) Prior Publication Data
US 2015/0056177 A1 Feb. 26, 2015

Related U.S. Application Data

(60) Provisional application No. 61/868,846, filed on Aug. 22, 2013.

(51) Int. Cl.
C12N 15/01 (2006.01)
C12N 9/22 (2006.01)

(52) U.S. Cl.
CPC C12N 15/01 (2013.01); C12N 9/22 (2013.01); C07K 2319/00 (2013.01); C07K 2319/80 (2013.01)

(58) Field of Classification Search
None
See application file for complete search history.

Primary Examiner: Suzanne M Noakes
Assistant Examiner: Jae W Lee
(74) Attorney, Agent, or Firm: Wolf, Greenfield & Sacks, P.C.

(57) ABSTRACT

Engineered transcriptional activator-like effectors (TALEs) are versatile tools for genome manipulation with applications in research and clinical contexts. One current drawback of TALEs is their tendency to bind and cleave off-target sequence, which hampers their clinical application and renders applications requiring high-fidelity binding unfeasible. This disclosure provides engineered TALE domains and TALEs comprising such engineered domains, e.g., TALE nucleases (TALENs), TALE transcriptional activators, TALE transcriptional repressors, and TALE epigenetic modification enzymes, with improved specificity and methods for generating and using such TALEs.

28 Claims, 34 Drawing Sheets


U.S. Patent Jun. 7, 2016 Sheet 2 of 34 US 9,359,599 B2

[FIG. [?]B (figure number partly illegible): flowchart. Legible steps, in order: DNA library of target site sequence variants; PCR with two primers; amplicons containing cleaved library members; high-throughput sequencing; TALEN DNA cleavage specificity profile. Remaining step labels are not recoverable.]

[Sheet 7 of 34: FIG. 4A and FIG. 4B. Plot data and axis labels are not recoverable.]

[Sheet 8 of 34: FIG. 4C, FIG. 5A, and FIG. 5B. Plot data and axis labels are not recoverable.]

[Sheet 10 of 34: FIG. 6. Legible labels: CCR5A off-target site; C-term. domain; specificity. Plot data are not recoverable.]

[Sheet 17 of 34: FIG. [?]A (figure number partly illegible). Panel content is not recoverable.]

[Sheet 19 of 34: FIG. [?]A (figure number partly illegible). Legible panel label: canonical. Remaining content is not recoverable.]

[Sheet 20 of 34: FIG. [?]B (figure number partly illegible). Panel content is not recoverable.]

[Sheet 24 of 34. Figure label legible as "FIG. 3B" (a leading digit may be missing). No axis labels or panel titles are legible.]

[Sheet 25 of 34. Figure labels legible as "FIG. 4A" and "FIG. 4B" (a leading digit may be missing). Partially legible axis text: "normalized to on-target"; "distance between the two mutations in the same half-site".]

[Sheet 26 of 34. Figure labels legible as "FIG. 5A" and "FIG. 5B" (a leading digit may be missing). Partially legible panel titles: CCR5A Q3 - canonical; CCR5A Q7 - canonical; CCR5A 28-aa - canonical; ATM Q3 - canonical; ATM Q7 - canonical. Axis: left half-site, right half-site.]

[Sheet 27 of 34. Figure labels legible as "FIG. 5C" and "FIG. 5D" (a leading digit may be missing). Partially legible panel titles: N - canonical; N2 - canonical; N3 - canonical. Axis: left half-site, right half-site.]

[Sheet 28 of 34. Figure labels legible as "FIG. 5E" and "FIG. 5F" (a leading digit may be missing). Partially legible panel title: ATM 12 nM canonical - 24 nM canonical. Axis: left half-site, right half-site.]

[Sheet 30 of 34. Figure label legible as "FIG. 6B" (a leading digit may be missing). Partially legible text: "canonical"; FokI variant labels; "DNA spacer length (bp)".]

[Sheet 31 of 34. Figure label legible as "FIG. 7A" (a leading digit may be missing). Partially legible text: "CCR5A", "canonical"; "DNA spacer length"; "number of spacer DNA base pairs preceding the right half-site".]

[Sheet 32 of 34. Figure label not legible. Partially legible text: "canonical"; "DNA spacer length"; "number of spacer DNA base pairs preceding the right half-site".]

U.S. Patent Jun. 7, 2016 Sheet 34 of 34 US 9,359,599 B2

[Figure label legible only as "F.G. 9"; drawing content not legible.]

ENGINEERED TRANSCRIPTION ACTIVATOR-LIKE EFFECTOR (TALE) DOMAINS AND USES THEREOF

RELATED APPLICATION

This application claims priority under 35 U.S.C. § 119(e) to U.S. provisional patent application, U.S. Ser. No. 61/868,846, filed Aug. 22, 2013, the entire contents of which are incorporated herein by reference.

GOVERNMENT SUPPORT

This invention was made with U.S. Government support under grants HR0011-11-2-0003 and N66001-12-C-4207, awarded by the Defense Advanced Research Projects Agency; grant T32GM007753, awarded by the National Institute of General Medical Sciences; and grant DP1 GM105378, awarded by the National Institutes of Health. The U.S. Government has certain rights in this invention.

BACKGROUND OF THE INVENTION

Transcription activator-like effector nucleases (TALENs) are fusions of the FokI restriction endonuclease cleavage domain with a DNA-binding transcription activator-like effector (TALE) repeat array. TALENs can be engineered to specifically bind and cleave a desired target DNA sequence, which is useful for the manipulation of nucleic acid molecules, genes, and genomes in vitro and in vivo. Engineered TALENs are useful in the context of many applications, including, but not limited to, basic research and therapeutic applications. For example, engineered TALENs can be employed to manipulate genomes in the context of the generation of gene knockouts or knock-ins via induction of DNA breaks at a target genomic site for targeted gene knockout through non-homologous end joining (NHEJ) or targeted genomic sequence replacement through homology-directed repair (HDR) using an exogenous DNA template, respectively. TALENs are thus useful in the generation of genetically engineered cells, tissues, and organisms.

TALENs can be designed to cleave any desired target DNA sequence, including naturally occurring and synthetic sequences. However, the ability of TALENs to distinguish target sequences from closely related off-target sequences has not been studied in depth. Understanding this ability and the parameters affecting it is of importance for the design of TALENs having the desired level of specificity and also for choosing unique target sequences to be cleaved, e.g., in order to minimize the chance of undesired off-target cleavage.

SUMMARY OF THE INVENTION

TALENs are versatile tools for the manipulation of genes and genomes in vitro and in vivo, as they can be designed to bind and cleave virtually any target sequence within a nucleic acid molecule. For example, TALENs can be used for the targeted deletion of a DNA sequence within a cellular genome via induction of DNA breaks that are then repaired by the cellular DNA repair machinery through non-homologous end joining (NHEJ). TALENs can also be used for targeted sequence replacement in the presence of a nucleic acid comprising a sequence to be inserted into a genomic sequence via homology-directed repair (HDR). As TALENs can be employed to manipulate the genomes of living cells, the resulting genetically modified cells can be used to generate transgenic cell or tissue cultures and organisms.

In scenarios where a TALEN is employed for the targeted cleavage of a DNA sequence in the context of a complex sample, e.g., in the context of a genome, it is often desirable for the TALEN to bind and cleave the specific target sequence only, with no or only minimal off-target cleavage activity (see, e.g., PCT Application Publication WO2013/066438A2, the entire contents of which are incorporated herein by reference). In some embodiments, an ideal TALEN would specifically bind only its intended target sequence and have no off-target activity, thus allowing the targeted cleavage of a single sequence, e.g., a single allele of a gene of interest, in the context of a whole genome.

Some aspects of this disclosure are based on the recognition that the tendency of TALENs to cleave off-target sequences and the parameters affecting the propensity of off-target TALEN activity are poorly understood. The work presented here provides a better understanding of the structural parameters that result in TALEN off-target activity. Methods and systems for the generation of engineered TALENs having no or minimal off-target activity are provided herein, as are engineered TALENs having increased on-target cleavage efficiency and minimal off-target activity. It will be understood by those of skill in the art that the strategies, methods, and reagents provided herein for decreasing non-specific or off-target DNA binding by TALENs are applicable to other DNA-binding proteins as well. In particular, the strategies for modifying the amino acid sequence of DNA-binding proteins for reducing unspecific binding to DNA by substituting cationic amino acid residues with amino acid residues that are not cationic, are uncharged, or are anionic at physiological pH, can be used to increase the specificity of, for example, other TALE effector proteins, engineered zinc finger proteins (including zinc finger nucleases), and Cas9 proteins.

Some aspects of this disclosure provide engineered isolated Transcription Activator-Like Effector (TALE) domains. In some embodiments, the isolated TALE domain is an N-terminal TALE domain and the net charge of the isolated N-terminal domain is less than the net charge of the canonical N-terminal domain (SEQ ID NO: 1) at physiological pH. In some embodiments, the isolated TALE domain is a C-terminal TALE domain and the net charge of the C-terminal domain is less than the net charge of the canonical C-terminal domain (SEQ ID NO: 22) at physiological pH. In some embodiments, the isolated TALE domain is an N-terminal TALE domain and the binding energy of the N-terminal domain to a target nucleic acid molecule is smaller than the binding energy of the canonical N-terminal domain (SEQ ID NO: 1). In some embodiments, the isolated TALE domain is a C-terminal TALE domain and the binding energy of the C-terminal domain to a target nucleic acid molecule is smaller than the binding energy of the canonical C-terminal domain (SEQ ID NO: 22). In some embodiments, the net charge of the C-terminal domain is less than or equal to +6, less than or equal to +5, less than or equal to +4, less than or equal to +3, less than or equal to +2, less than or equal to +1, less than or equal to 0, less than or equal to -1, less than or equal to -2, less than or equal to -3, less than or equal to -4, or less than or equal to -5. In some embodiments, the C-terminal domain comprises an amino acid sequence that differs from the canonical C-terminal domain sequence in that at least one cationic amino acid residue of the canonical C-terminal domain sequence is replaced with an amino acid residue that exhibits no charge or a negative charge at physiological pH. In some embodiments, the N-terminal domain comprises an amino acid sequence that differs from the canonical N-terminal domain sequence in that at least one cationic amino acid residue of the canonical N-terminal domain sequence is replaced with an amino acid residue that exhibits no charge or a negative charge at physiological pH. In some embodiments, at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, or at least 15 cationic amino acid(s) in the isolated TALE domain is/are replaced with an amino acid residue that exhibits no charge or a negative charge at physiological pH. In some embodiments, the at least one cationic amino acid residue is arginine (R) or lysine (K). In some embodiments, the amino acid residue that exhibits no charge or a negative charge at physiological pH is glutamine (Q) or glycine (G). In some embodiments, at least one lysine or arginine residue is replaced with a glutamine residue. In some embodiments, the C-terminal domain comprises one or more of the following amino acid replacements: K777Q, K778Q, K788Q, R789Q, R792Q, R793Q, R801Q. In some embodiments, the C-terminal domain comprises a Q3 variant sequence (K788Q, R792Q, R801Q). In some embodiments, the C-terminal domain comprises a Q7 variant sequence (K777Q, K778Q, K788Q, R789Q, R792Q, R793Q, R801Q). In some embodiments, the N-terminal domain is a truncated version of the canonical N-terminal domain. In some embodiments, the C-terminal domain is a truncated version of the canonical C-terminal domain. In some embodiments, the truncated domain comprises less than 90%, less than 80%, less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, or less than 25% of the residues of the canonical domain. In some embodiments, the truncated C-terminal domain comprises less than 60, less than 50, less than 40, less than 30, less than 29, less than 28, less than 27, less than 26, less than 25, less than 24, less than 23, less than 22, less than 21, or less than 20 amino acid residues. In some embodiments, the truncated C-terminal domain comprises 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, or 10 residues. In some embodiments, the isolated TALE domain is comprised in a TALE molecule comprising the structure N-terminal domain-TALE repeat array-C-terminal domain-effector domain; or effector domain-N-terminal domain-TALE repeat array-C-terminal domain. In some embodiments, the effector domain comprises a nuclease domain, a transcriptional activator or repressor domain, a recombinase domain, or an epigenetic modification enzyme domain. In some embodiments, the TALE molecule binds a target sequence within a gene known to be associated with a disease or disorder.

Some aspects of this disclosure provide Transcription Activator-Like Effector Nucleases (TALENs) having a modified net charge and/or a modified binding energy for binding their target nucleic acid sequence as compared to canonical TALENs. Typically, the inventive TALENs include (a) a nuclease cleavage domain; (b) a C-terminal domain conjugated to the nuclease cleavage domain; (c) a TALE repeat array conjugated to the C-terminal domain; and (d) an N-terminal domain conjugated to the TALE repeat array. In some embodiments, (i) the net charge on the N-terminal domain at physiological pH is less than the net charge on the canonical N-terminal domain (SEQ ID NO: 1) at physiological pH; and/or (ii) the net charge of the C-terminal domain at physiological pH is less than the net charge of the canonical C-terminal domain (SEQ ID NO: 22) at physiological pH. In some embodiments, (i) the binding energy of the N-terminal domain to a target nucleic acid molecule is less than the binding energy of the canonical N-terminal domain (SEQ ID NO: 1); and/or (ii) the binding energy of the C-terminal domain to a target nucleic acid molecule is less than the binding energy of the canonical C-terminal domain (SEQ ID NO: 22). In some embodiments, the net charge on the C-terminal domain at physiological pH is less than or equal to +6, less than or equal to +5, less than or equal to +4, less than or equal to +3, less than or equal to +2, less than or equal to +1, less than or equal to 0, less than or equal to -1, less than or equal to -2, less than or equal to -3, less than or equal to -4, or less than or equal to -5. In some embodiments, the N-terminal domain comprises an amino acid sequence that differs from the canonical N-terminal domain sequence in that at least one cationic amino acid residue of the canonical N-terminal domain sequence is replaced with an amino acid residue that does not have a cationic charge, has no charge, or has an anionic charge. In some embodiments, the C-terminal domain comprises an amino acid sequence that differs from the canonical C-terminal domain sequence in that at least one cationic amino acid residue of the canonical C-terminal domain sequence is replaced with an amino acid residue that does not have a cationic charge, has no charge, or has an anionic charge. In some embodiments, at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, or at least 15 cationic amino acid(s) is/are replaced with an amino acid residue that does not have a cationic charge, has no charge, or has an anionic charge in the N-terminal domain and/or in the C-terminal domain. In some embodiments, the at least one cationic amino acid residue is arginine (R) or lysine (K). In some embodiments, the amino acid residue that replaces the cationic amino acid is glutamine (Q) or glycine (G). Positively charged residues in the C-terminal domain that can be replaced according to aspects of this disclosure include, but are not limited to, arginine (R) residues and lysine (K) residues, e.g., R747, R770, K777, K778, K788, R789, R792, R793, R797, and R801 in the C-terminal domain (see, e.g., SEQ ID NO: 22; the numbering refers to the position of the respective residue in the full-length TALEN protein; the equivalent positions for the C-terminal domain as provided in SEQ ID NO: 22 are R8, R30, K37, K38, K48, R49, R52, R53, R57, R61). Positively charged residues in the N-terminal domain that can be replaced according to aspects of this disclosure include, but are not limited to, arginine (R) residues and lysine (K) residues, e.g., K57, K78, R84, R97, K110, K113, and R114 (see, e.g., SEQ ID NO: 1). In some embodiments, at least one lysine or arginine residue is replaced with a glutamine residue. In some embodiments, the C-terminal domain comprises one or more of the following amino acid replacements: K777Q, K778Q, K788Q, R789Q, R792Q, R793Q, R801Q. In some embodiments, the C-terminal domain comprises a Q3 variant sequence (K788Q, R792Q, R801Q). In some embodiments, the C-terminal domain comprises a Q7 variant sequence (K777Q, K778Q, K788Q, R789Q, R792Q, R793Q, R801Q). In some embodiments, the N-terminal domain is a truncated version of the canonical N-terminal domain. In some embodiments, the C-terminal domain is a truncated version of the canonical C-terminal domain. In some embodiments, the truncated domain comprises less than 90%, less than 80%, less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, or less than 25% of the residues of the canonical domain. In some embodiments, the truncated C-terminal domain comprises less than 60, less than 50, less than 40, less than 30, less than 29, less than 28, less than 27, less than 26, less than 25, less than 24, less than 23, less than 22, less than 21, or less than 20 amino acid residues. In some embodiments, the truncated C-terminal domain comprises 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, or 10 residues. In some embodiments, the nuclease cleavage domain is a FokI nuclease domain. In some embodiments, the FokI nuclease domain comprises a sequence as provided in SEQ ID NOs: 26-30. In some embodiments, the TALEN is a monomer. In some embodiments, the TALEN monomer dimerizes with another TALEN monomer to form a TALEN dimer. In some embodiments, the dimer is a heterodimer. In some embodiments, the TALEN binds a target sequence within a gene known to be associated with a disease or disorder. In some embodiments, the TALEN cleaves the target sequence upon dimerization. In some embodiments, the disease being treated or prevented is HIV infection or AIDS, or a proliferative disease. In some embodiments, the TALEN binds a CCR5 (C-C chemokine receptor type 5) target sequence in the treatment or prevention of HIV infection or AIDS. In some embodiments, the TALEN binds an ATM (ataxia telangiectasia mutated) target sequence. In some embodiments, the TALEN binds a VEGFA (vascular endothelial growth factor A) target sequence.

Some aspects of this disclosure provide compositions comprising a TALEN described herein, e.g., a TALEN monomer. In some embodiments, the composition comprises the inventive TALEN monomer and a different inventive TALEN monomer that form a heterodimer, wherein the dimer exhibits nuclease activity. In some embodiments, the composition is a pharmaceutical composition.

Some aspects of this disclosure provide a composition comprising a TALEN provided herein. In some embodiments, the composition is formulated to be suitable for contacting with a cell or tissue in vitro. In some embodiments, the pharmaceutical composition comprises an effective amount of the TALEN for cleaving a target sequence, e.g., in a cell or in a tissue in vitro or ex vivo. In some embodiments, the TALEN binds a target sequence within a gene of interest, e.g., a target sequence within a gene known to be associated with a disease or disorder, and the composition comprises an effective amount of the TALEN for alleviating a sign and/or symptom associated with the disease or disorder. Some aspects of this disclosure provide a pharmaceutical composition comprising a TALEN provided herein and a pharmaceutically acceptable excipient. In some embodiments, the pharmaceutical composition is formulated for administration to a subject. In some embodiments, the pharmaceutical composition comprises an effective amount of the TALEN for cleaving a target sequence in a cell in the subject. In some embodiments, the TALEN binds a target sequence within a gene known to be associated with a disease or disorder, and the composition comprises an effective amount of the TALEN for alleviating a sign and/or symptom associated with the disease or disorder.

Some aspects of this disclosure provide methods of cleaving a target sequence in a nucleic acid molecule using a TALEN provided herein. In some embodiments, the method comprises contacting a nucleic acid molecule comprising the target sequence with an inventive TALEN binding the target sequence under conditions suitable for the TALEN to bind and cleave the target sequence. In some embodiments, the TALEN is provided as a monomer. In some embodiments, the inventive TALEN monomer is provided in a composition comprising a different TALEN monomer that can dimerize with the inventive TALEN monomer to form a heterodimer having nuclease activity. In some embodiments, the inventive TALEN is provided in a pharmaceutical composition. In some embodiments, the target sequence is in the genome of a cell. In some embodiments, the target sequence is in a subject. In some embodiments, the method comprises administering a composition, e.g., a pharmaceutical composition, comprising the TALEN to the subject in an amount sufficient for the TALEN to bind and cleave the target site.

Some aspects of this disclosure provide methods of preparing engineered TALENs. In some embodiments, the method comprises replacing at least one amino acid in the canonical N-terminal TALEN domain and/or the canonical C-terminal TALEN domain with an amino acid having no charge or a negative charge as compared to the amino acid being replaced at physiological pH; and/or truncating the N-terminal TALEN domain and/or the C-terminal TALEN domain to remove a positively charged fragment; thus generating an engineered TALEN having an N-terminal domain and/or a C-terminal domain of decreased net charge at physiological pH. In some embodiments, the at least one amino acid being replaced comprises a cationic amino acid or an amino acid having a positive charge at physiological pH. Positively charged residues in the C-terminal domain that can be replaced according to aspects of this disclosure include, but are not limited to, arginine (R) residues and lysine (K) residues, e.g., R747, R770, K777, K778, K788, R789, R792, R793, R797, and R801 in the C-terminal domain. Positively charged residues in the N-terminal domain that can be replaced according to aspects of this disclosure include, but are not limited to, arginine (R) residues and lysine (K) residues, e.g., K57, K78, R84, R97, K110, K113, and R114. In some embodiments, the amino acid replacing the at least one amino acid is an anionic amino acid or a neutral amino acid. In some embodiments, the truncated N-terminal TALEN domain and/or the truncated C-terminal TALEN domain comprises less than 90%, less than 80%, less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, or less than 25% of the residues of the respective canonical domain. In some embodiments, the truncated C-terminal domain comprises less than 60, less than 50, less than 40, less than 30, less than 29, less than 28, less than 27, less than 26, less than 25, less than 24, less than 23, less than 22, less than 21, or less than 20 amino acid residues. In some embodiments, the truncated C-terminal domain comprises 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, or 10 amino acid residues. In some embodiments, the method comprises replacing at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, or at least 15 amino acids in the canonical N-terminal TALEN domain and/or in the canonical C-terminal TALEN domain with an amino acid having no charge or a negative charge at physiological pH. In some embodiments, the amino acid being replaced is arginine (R) or lysine (K). In some embodiments, the amino acid residue having no charge or a negative charge at physiological pH is glutamine (Q) or glycine (G). In some embodiments, the method comprises replacing at least one lysine or arginine residue with a glutamine residue.

Some aspects of this disclosure provide kits comprising an engineered TALEN as provided herein, or a composition (e.g., a pharmaceutical composition) comprising such a TALEN. In some embodiments, the kit comprises an excipient and instructions for contacting the TALEN with the excipient to generate a composition suitable for contacting a nucleic acid with the TALEN. In some embodiments, the excipient is a pharmaceutically acceptable excipient.

The summary above is meant to illustrate, in a non-limiting manner, some of the embodiments, advantages, features, and uses of the technology disclosed herein. Other embodiments, advantages, features, and uses of the technology disclosed herein will be apparent from the Detailed Description, the Drawings, the Examples, and the Claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-B. TALEN architecture and selection scheme. (A) Architecture of a TALEN. A TALEN monomer contains an N-terminal domain followed by an array of TALE repeats (brown), a C-terminal domain (green), and a FokI nuclease cleavage domain (purple). The 12th and 13th amino acids (the RVD (SEQ ID NO: 43), red) of each TALE repeat recognize a specific DNA base pair. Two different TALENs bind their corresponding half-sites, allowing FokI dimerization and DNA cleavage; ttcattacacctgcagct is SEQ ID NO: 44; agctgcaggtgtaatgaa is SEQ ID NO: 45; agtatcaattctggaaga is SEQ ID NO: 46; and tcttccagaattgatact is SEQ ID NO: 47. The C-terminal domain variants used in this study are shown in green (SEQ ID NOs: 48-50, and 25, from top to bottom, respectively). (B) A single-stranded library of DNA oligonucleotides containing partially randomized left half-site (L), spacer (S), right half-site (R) and constant region (thick black line) was circularized, then concatemerized by rolling circle amplification. The resulting DNA libraries were incubated with an in vitro-translated TALEN of interest. Cleaved library members were blunted and ligated to adapter #1. The ligation products were amplified by PCR using one primer consisting of adapter #1 and the other primer consisting of adapter #2-constant sequence, which anneals to the constant regions. Amplicons 1½ target-sequence cassettes in length were isolated by gel purification and subjected to high-throughput DNA sequencing and computational analysis.

FIGS. 2A-G. In vitro selection results. The fraction of sequences surviving selection (green) and before selection (black) are shown for CCR5A TALENs (A) and ATM TALENs (B) as a function of the number of mutations in both half-sites. (C) Specificity scores for the L18+R18 CCR5A TALEN at all positions in the target half-sites plus a single flanking position. The colors range from a maximum specificity score of 1.0 to white (no specificity, score of 0) to a maximum negative score of -1.0. Boxed bases represent the intended target base. (D) Same as (C) for the L18+R18 ATM TALEN. (E) Enrichment values from the selection of L13+R13 CCR5B TALEN for 16 mutant DNA sequences (mutations in red) relative to on-target DNA (OnB). (F) Correspondence between discrete in vitro TALEN cleavage efficiency (cleaved DNA as a fraction of total DNA) for the sequences listed in (E) normalized to on-target cleavage (=1) versus their enrichment values in the selection normalized to the on-target enrichment value (=1). (G) Discrete assays of on-target and off-target sequences used in (F) as analyzed by PAGE. Sequences in FIG. 2C correspond, from left to right and top to bottom, to SEQ ID NOs: 51-54. Left half-site sequences in FIG. 2E correspond to SEQ ID NOs: 55-71 and right half-site sequences correspond to SEQ ID NOs: 72-88.

FIGS. 3A-C. Cellular modification induced by TALENs at on-target and predicted off-target genomic sites. (A) For cells treated with either no TALEN or CCR5A TALENs containing heterodimeric EL/KK, heterodimeric ELD/KKR, or the homodimeric (Homo) FokI variants, cellular modification rates are shown as the percentage of observed insertions or deletions (indels) consistent with TALEN cleavage relative to the total number of sequences for on-target (On) and predicted off-target sites (Off). (B) Same as (A) for ATM TALENs. (C) Examples of modified sequences at the on-target site and off-target sites for cells treated with CCR5A TALENs containing the ELD/KKR FokI domains (SEQ ID NOs: 89-109 from top to bottom). For each example shown, the unmodified genomic site is the first sequence, followed by the top three sequences containing deletions. The numbers in parentheses indicate sequencing counts and the half-sites are underlined and bolded.

FIGS. 4A-C. Predicted off-target genomic cleavage as a function of TALEN length considering both TALEN specificity and off-target site abundance in the human genome. (A) The enrichment values of on-target (zero mutation) and off-target sequences containing one to six mutations are shown for CCR5B TALENs of varying TALE repeat array lengths. The TALENs targeted DNA sites of 32 bp (L16+R16), 29 bp (L16+R13 or L13+R16), 26 bp (L16+R10 or L13+R13 or L10+R16), 23 bp (L13+R10 or L10+R13) or 20 bp (L10+R10) in length. (B) Number of sites in the human genome related to each of the nine CCR5B on-target sequences (L10, L13, or L16 combined with R10, R13, or R16), allowing for a spacer length from 12 to 25 bp between the two half-sites. (C) For all nine CCR5B TALENs, overall genomic off-target cleavage frequency was predicted by multiplying the number of sites in the human genome containing a certain number of mutations by the enrichment value of off-target sequences containing that same number of mutations shown in (A). Because enrichment values level off at high mutation numbers, likely due to the limit of sensitivity of the selection, it was necessary to extrapolate high-mutation enrichment values by fitting enrichment value as a function of mutation number (Table 9). The overall predicted genomic cleavage was calculated only for mutation numbers with sites observed to occur more than once in the human genome.

FIGS. 5A-F. In vitro specificity and discrete cleavage efficiencies of TALENs containing canonical or engineered C-terminal domains. (A and B) On-target enrichment values for selections of (A) CCR5A TALENs or (B) ATM TALENs containing canonical, Q3, Q7, or 28-aa C-terminal domains. (C) CCR5A on-target sequence (OnC) and double-mutant sequences with mutations in red. (D) ATM on-target sequence (OnA) and single-mutant sequences with mutations in red. (E) Discrete in vitro cleavage efficiency of DNA sequences listed in (C) with CCR5A TALENs containing either canonical or engineered Q7 C-terminal domains. (F) Same as (E) for ATM TALENs. Left half-site sequences in FIG. 5C correspond to SEQ ID NOs: 110-118 and right half-site sequences correspond to SEQ ID NOs: 119-127. Left half-site sequences in FIG. 5D correspond to SEQ ID NOs: 128-136 and right half-site sequences correspond to SEQ ID NOs: 137-145.

FIG. 6. Specificity of engineered TALENs in human cells. The cellular modification efficiency of canonical and engineered TALENs expressed as a percentage of indels consistent with TALEN-induced modification out of total sequences is shown for the on-target CCR5A sequence and for CCR5A off-target site #5, the most highly cleaved off-target substrate tested. Cellular specificity, defined as the ratio of on-target to off-target modification, is shown below each pair of bars.

FIGS. 7A-B. Target DNA sequences in human CCR5 and ATM genes. The target DNA sequences for the TALENs used in this study are shown in black (A-B). The N-terminal TALEN end recognizing the 5' T for each half-site target is noted (5') and TALENs are named according to number of base pairs targeted. TALENs targeting the CCR5 L18 and R18 shown are referred to as CCR5A TALENs while TALENs targeting the L10, L13, L16, R10, R13 or R16 half-sites shown are referred to as CCR5B TALENs (A).

FIGS. 8A-B. Specificity profiles from all CCR5A TALEN selections as heat maps. Specificity scores for every targeted base pair in selections of CCR5A TALENs are shown. Speci US 9,359,599 B2 9 10 ficity scores for the L18+R18 CCR5ATALEN at all positions FIGS. 11 A-C. Specificity profiles from all ATM TALEN in the target half-sites plus a single flanking position. The selections as bar graphs. Specificity scores for every targeted colors range from a maximum specificity score of 1.0 to white base pair in selections of ATMTALENs are shown. Positive (score of 0, no specificity) to a maximum negative score of specificity Scores, up to complete specificity at a specificity -1.0. Boxed bases represent the intended target base. The 5 score of 1.0, signify enrichment of that base pair over the titles to the right indicate if the TALEN used in the selection other possibilities at that position. Negative specificity scores, differs from the canonical TALEN architecture, which con down to complete antispecificity of -1.0, represents enrich tains a canonical C-terminal domain, wildtype N-terminal ment against that base pair. Specified positions were plotted domain, and EL/KK FokI variant. Selections correspond to as stacked bars above the X-axis (multiple specified base conditions listed in Table 2. (A) Specificity profiles of canoni 10 pairs at the same position were plotted over each other with cal, Q3, Q7, 28-aa, 32 nM canonical, 8 nM canonical, 4 nM the shortest bar in front, and not end-to-end) while anti canonical, 32 nM Q7 and 8 nM Q7 CCR5A TALEN selec specified base pairs were plotted as narrow, grouped bars. The tions. (B) Specificity profiles of 4 nM Q7, N1, N2, N3, titles to the right indicate if the TALEN used in the selection canonical ELD/KKR, Q3 ELD/KKR, Q7 ELD/KKR and N2 differs from the canonical TALEN architecture, which con ELD/KKRCCR5A TALEN selections. When not specified, 15 tains a canonical C-terminal domain, wild-type N-terminal TALEN concentration was 16 nM.
Nttcattacacctgcagctin cor domain, and EL/KK FokI variant. Selections correspond to responds to SEQID NO: 51 and nagtatcaattctggaagan corre conditions listed in Table 2. (A) Specificity profiles of canoni sponds to SEQID NO: 52. cal, Q3, Q7, 32 nM canonical, and 8 nM canonical ATM FIGS. 9A-C. Specificity profiles from all CCR5ATALEN TALEN selections. (B) Specificity profiles of 3 nM canonical, selections as bar graphs. Specificity scores for every targeted 24 nM Q7, 6 nM Q7, N1, N2, and N3 ATM TALEN selec base pair in selections of CCR5ATALENs are shown. Posi tions. (C) Specificity profiles of canonical ELD/KKR, Q3 tive specificity scores, up to complete specificity at a speci ELD/KKR, Q7 ELD/KKR, and N2 ELD/KKRATMTALEN ficity score of 1.0, signify enrichment of that base pair over selections. When not specified, TALEN concentration was 12 the other possibilities at that position. Negative specificity nM. ntgaattgggatgctgtttncorresponds to SEQID NO: 53; and scores, down to complete antispecificity of -1.0, represents 25 ntttattt tactgtctttan corresponds to SEQID NO: 54. enrichment against that base pair. Specified positions were FIG. 12. Specificity profiles from all CCR5B TALEN plotted as stacked bars above the X-axis (multiple specified selections as heat maps. Specificity scores for every targeted base pairs at the same position were plotted over each other base pair in selections of CCR5B TALENs are shown. Speci with the shortest bar in front, and not end-to-end) while ficity scores for CCR5B TALENs targeting all possible com anti-specified base pairs were plotted as narrow, grouped 30 binations of the left (L10, L13, L16) and right (R10, R13, bars. The titles to the right indicate if the TALEN used in the R16) half-sites at all positions in the target half-sites plus a selection differs from the canonical TALEN architecture, single flanking position. 
The colors range from a maximum which contains a canonical C-terminal domain, wild-type specificity score of 1.0) to white (score of 0, no specificity) to N-terminal domain, and EL/KK FokI variant. Selections cor a maximum negative score of -1.0. Boxed bases represent the respond to conditions listed in Table 2. (A) Specificity profiles 35 intended target base. The titles to the right notes the targeted of canonical, Q3, Q7, 28-aa, 32 nM canonical, and 8 nM left (L) and right (R) target half-sites for the CCR5B TALEN canonical CCR5ATALEN selections. (B) Specificity profiles used in the selection. Selections correspond to conditions of4 nM canonical, 32nMQ7.8nMQ7, 4 nM Q7, N1, and N2 listed in Table 2. Sequences in the left column correspond, CCR5A TALEN selections. (C) Specificity profiles of N3, from top to bottom, to SEQID NOs: 160, 160, 160, 161,161, canonical ELD/KKR, Q3 ELD/KKR, Q7 ELD/KKR, and N2 40 161, 162, 162, and 162. Sequences in the right column cor ELD/KKRCCR5A TALEN selections. When not specified, respond, from top to bottom, to SEQID NOs: 163, 164, 165, TALEN concentration was 16 nM. nttcattacacctgcagctin cor 163, 164, 165, 163, 164, and 165. responds to SEQ ID NO. 51, nagtatcaattctggaagan corre FIGS. 13 A-B. Specificity profiles from all CCR5BTALEN sponds to SEQID NO: 52, ntgaattgggatgctgtttin corresponds selections as bar graphs (A-B). Specificity Scores for every to SEQ ID NO: 53; and ntttattt tactgtctttan corresponds to 45 targeted base pair in selections of CCR5B TALENs are SEQID NO:54. shown. Positive specificity scores, up to complete specificity FIGS. 10A-B. Specificity profiles from all ATM TALEN at a specificity score of 1.0, signify enrichment of that base selections as heat maps. Specificity scores for every targeted pair over the other possibilities at that position. Negative base pair in selections of ATMTALENs are shown. 
Specific specificity Scores, down to complete antispecificity of -1.0, ity scores for the L18+R18 ATMTALEN at all positions in the 50 represents enrichment against that base pair. Specified posi target half-sites plus a single flanking position. The colors tions were plotted as stacked bars above the X-axis (multiple range from a maximum specificity score of 1.0 to white (score specified base pairs at the same position were plotted over of 0, no specificity) to a maximum negative score of -1.0. each other with the shortest bar in front, and not end-to-end) Boxed bases represent the intended target base. The titles to while anti-specified base pairs were plotted as narrow, the right indicate if the TALEN used in the selection differs 55 grouped bars. The titles to the right notes the targeted left (L) from the canonical TALEN architecture, which contains a and right (R) target half-sites for the CCR5B TALEN used in canonical C-terminal domain, wild type N-terminal domain, the selection. Selections correspond to conditions listed in and EL/KK FokI variant. Selections correspond to conditions Table 2. Sequences correspond to SEQ ID NO: 160 (left listed in Table 2. (A) Specificity profiles of (12 nM) canoni column) and SEQID NO: 163 (right column). cal, Q3, (12 nM) Q7, 24 nM canonical, 6 nM canonical, 3 nM 60 FIGS. 14A-B. Observed versus predicted double-mutant canonical, 24 nM Q7, and 6 nM Q7 ATMTALEN selections. sequence enrichment values. (A) For the L13+R13 CCR5A (B) Specificity profiles of N1, N2, N3, canonical ELD/KKR, TALEN selection, the observed double-mutant enrichment Q3 ELD/KKR, Q7 ELD/KKR, and N2 ELD/KKR ATM values of individual sequences (post-selection sequence TALEN selections. When not specified, TALEN concentra abundance--pre-selection sequence abundance) were normal tion was 12 nM. 
ntgaattgggatgctgtttin corresponds to SEQID 65 ized to the on-target enrichment value (=1.0 by definition) and NO: 53; and ntttattt tactgtctttan corresponds to SEQID NO: plotted against the corresponding predicted double-mutant 54. enrichment values calculated by multiplying the enrichment US 9,359,599 B2 11 12 value of the component single-mutants normalized to the the majority of known domains of that type. In some embodi on-target enrichment. The predicted double mutant enrich ments, a canonical sequence is a consensus sequence. ment values therefore assume independent contributions The terms “consensus sequence” and "consensus site, as from each single mutation to the double-mutants enrichment used herein in the context of nucleic acid sequences, refers to value. (B) The observed double-mutant sequence enrichment a calculated sequence representing the most frequent nucle divided by the predicted double-mutant sequence enrichment otide residue found at each position in a plurality of similar plotted as a function of the distance (in base pairs) between sequences. Typically, a consensus sequence is determined by the two mutations. Only sequences with two mutations in the sequence alignment in which similar sequences are compared same half-site were considered. to each other and similar sequence motifs are calculated. In FIG. 15. Effects of engineered TALEN domains and 10 the context of nuclease target site sequences, a consensus TALEN concentration on specificity. (A) The specificity sequence of a nuclease target site may, in Some embodiments, score of the targeted base pair at each position of the CCR5A be the sequence most frequently bound, bound with the high site was calculated for CCR5A TALENs containing the est affinity, and/or cleaved with the highest efficiency by a canonical, Q3, Q7, or 28-aa C-terminal domains. The speci given nuclease. 
ficity scores of the Q3, Q7, or 28-aa C-terminal domain TAL 15 The terms "conjugating.” “conjugated, and "conjugation' ENs subtracted by the specificity scores of the TALEN with refer to an association of two entities, for example, of two the canonical C-terminal domain are shown. (B) Same as (A) molecules Such as two proteins, two domains (e.g., a binding but for CCR5A TALENs containing engineered N-terminal domain and a cleavage domain), or a protein and an agent domains N1, N2, or N3. (C) Same as (A) but comparing (e.g., a protein binding domain and a small molecule). The specificity scores differences of the canonical CCR5A association can be, for example, via a direct or indirect (e.g., TALEN assayed at 16 nM, 8 nM, or 4 nM subtracted by the via a linker) covalent linkage or via non-covalent interactions. specificity scores of canonical CCR5A TALENs assayed at In some embodiments, the association is covalent. In some 32 nM. (D-F) Same as (A-C) but for ATM TALENs. Selec embodiments, two molecules are conjugated via a linker con tions correspond to conditions listed in Table 2. ttcattacacct necting both molecules. For example, in some embodiments gcagct corresponds to SEQ ID NO: 44, agtatcaattctggaaga 25 where two proteins are conjugated to each other, e.g., a bind corresponds to SEQ ID NO: 46, tgaattgggatgctgttt corre ing domain and a cleavage domain of an engineered nuclease, sponds to SEQID NO: 128 and tttattt tactgtctitta corresponds to form a protein fusion, the two proteins may be conjugated to SEQ ID NO:137. via a polypeptide linker, e.g., an amino acid sequence con FIGS. 16A-B. Spacer-length preferences of TALENs. (A) necting the C-terminus of one protein to the N-terminus of the For each selection with CCR5ATALENs containing various 30 other protein. 
combinations of the canonical, Q3, Q7, or 28-aa C-terminal The term “effective amount, as used herein, refers to an domains; N1, N2, or N3 N-terminal mutations; and the amount of a biologically active agent that is sufficient to elicit EL/KK or ELD/KKR FokI variants and at 4, 8, 16, or 32 nM, a desired biological response. For example, in some embodi the DNA spacer-length enrichment values were calculated by ments, an effective amount of a TALE nuclease may refer to dividing the abundance of DNA spacer lengths in post-selec 35 the amount of the nuclease that is Sufficient to induce cleavage tion sequences by the abundance of DNA spacer lengths in the ofa target site specifically bound and cleaved by the nuclease, preselection library sequences. (B) Same as (A) but for ATM e.g., in a cell-free assay, or in a target cell, tissue, or organism. TALENS. As will be appreciated by the skilled artisan, the effective FIGS. 17A-B. DNA cleavage-site preferences of TALENs. amount of an agent, e.g., a nuclease, a hybrid protein, or a (A) For each selection with CCR5A TALENs with various 40 polynucleotide, may vary depending on various factors as, for combinations of canonical, Q3, Q7, or 28-aa C-terminal example, on the desired biological response, the specific domains; N1, N2, or N3 N-terminal mutations; and the allele, genome, target site, cell, or tissue being targeted, and EL/KK or ELD/KKR FokI variants and at 4, 8, 16, or 32 nM, the agent being used. histograms of the number of spacer DNA base pairs preced The term “engineered, as used herein refers to a molecule, ing the right half-site for each possible DNA spacer length, 45 complex, Substance, or entity that has been designed, pro normalized to the total sequence counts of the entire selec duced, prepared, synthesized, and/or manufactured by a tion, are shown. (B) Same as (A) for ATMTALENs. human. Accordingly, an engineered product is a product that FIG. 18. 
DNA cleavage-site preferences of TALENs com does not occur in nature. In some embodiments, an engi prising N-terminal domains with different amino acid substi neered molecule or complex, e.g., an engineered TALEN tutions. Sequences in FIG. 18, from top to bottom, correspond 50 monomer, dimer, or multimer, is a TALEN that has been to SEQID NOs: 31-41. designed to meet particular requirements or to have particular FIG. 19. Exemplary TALEN plasmid construct. desired features e.g., to specifically bind a target sequence of interest with minimal off-target binding, to have a specific DEFINITIONS minimal or maximal cleavage activity, and/or to have a spe 55 cific stability. As used herein and in the claims, the singular forms 'a' As used herein, the term "isolated’ refers to a molecule, “an and “the include the singular and the plural reference complex, Substance, or entity that has been (1) separated from unless the context clearly indicates otherwise. Thus, for at least Some of the components with which it was associated example, a reference to “an agent' includes a single agent and when initially produced (whether in nature or in an experi a plurality of agents. 60 mental setting), and/or (2) produced, prepared, synthesized, The term “canonical sequence, as used herein, refers to a and/or manufactured by a human. Isolated Substances and/or sequence of DNA, RNA, or amino acids that reflects the most entities may be separated from at least about 10%, about 20%, common choice of base or amino acid at each position about 30%, about 40%, about 50%, about 60%, about 70%, amongst known molecules of that type. For example, the about 80%, about 90%, or more of the other components with canonical amino acid sequence of a protein domain may 65 which they were initially associated. 
In some embodiments, reflect the most common choice of amino acid resides at each isolated agents are more than about 80%, about 85%, about position amongst all known domains of that type, or amongst 90%, about 91%, about 92%, about 93%, about 94%, about US 9,359,599 B2 13 14 95%, about 96%, about 97%, about 98%, about 99%, or more DNA molecule ends ending with unpaired nucleotide(s) are than about 99% pure. As used herein, a substance is “pure' if also referred to as sticky ends, as they can “stick to other it is substantially free of other components. double-stranded DNA molecule ends comprising comple The term “library,” as used herein in the context of nucleic mentary unpaired nucleotide(s). A nuclease protein typically acids or proteins, refers to a population of two or more dif comprises a “binding domain that mediates the interaction ferent nucleic acids or proteins, respectively. For example, a of the protein with the nucleic acid substrate, and a “cleavage library of nuclease target sites comprises at least two nucleic domain that catalyzes the cleavage of the phosphodiester acid molecules comprising different nuclease target sites. In bond within the nucleic acid backbone. In some embodi Some embodiments, a library comprises at least 10", at least ments, a nuclease protein can bind and cleave a nucleic acid 10, at least 10, at least 10, at least 10, at least 10, at least 10 molecule in a monomeric form, while, in other embodiments, 107, at least 10, at least 10, at least 10', at least 10', at least a nuclease protein has to dimerize or multimerize in order to 10', at least 10", at least 10", or at least 10" different cleave a target nucleic acid molecule. Binding domains and nucleic acids or proteins. In some embodiments, the members cleavage domains of naturally occurring nucleases, as well as of the library may comprise randomized sequences, for modular binding domains and cleavage domains that can be example, fully or partially randomized sequences. 
In some 15 combined to create nucleases that bind specific target sites, embodiments, the library comprises nucleic acid molecules are well known to those of skill in the art. For example, that are unrelated to each other, e.g., nucleic acids comprising transcriptional activator like elements can be used as binding fully randomized sequences. In other embodiments, at least domains to specifically binda desired target site, and fused or some members of the library may be related, for example, conjugated to a cleavage domain, for example, the cleavage they may be variants or derivatives of a particular sequence, domain of FokI, to create an engineered nuclease cleaving the Such as a consensus target site sequence. desired target site. The term “linker, as used herein, refers to a chemical The terms “nucleic acid' and “nucleic acid molecule, as group or a molecule linking two molecules or moieties, e.g., used herein, refer to a compound comprising a nucleobase a binding domain and a cleavage domain of a nuclease. Typi and an acidic moiety, e.g., a nucleoside, a nucleotide, or a cally, the linker is positioned between, or flanked by, two 25 polymer of nucleotides. Typically, polymeric nucleic acids, groups, molecules, or other moieties and connected to each e.g., nucleic acid molecules comprising three or more nucle one via a covalent bond, thus connecting the two. In some otides are linear molecules, in which adjacent nucleotides are embodiments, the linker is an amino acid or a plurality of linked to each other via a phosphodiester linkage. In some amino acids (e.g., a peptide or protein). In some embodi embodiments, “nucleic acid refers to individual nucleic acid ments, the linker is an organic molecule, group, polymer, or 30 residues (e.g. nucleotides and/or nucleosides). In some chemical moiety. 
embodiments, “nucleic acid refers to an oligonucleotide The term “nuclease” as used herein, refers to an agent, for chain comprising three or more individual nucleotide resi example a protein or a small molecule, capable of cleaving a dues. As used herein, the terms "oligonucleotide' and “poly phosphodiester bond connecting nucleotide residues in a nucleotide' can be used interchangeably to refer to a polymer nucleic acid molecule. In some embodiments, a nuclease is a 35 of nucleotides (e.g., a string of at least three nucleotides). In protein, e.g., an enzyme that can bind a nucleic acid molecule Some embodiments, “nucleic acid encompasses RNA as and cleave a phosphodiester bond connecting nucleotide resi well as single and/or double-stranded DNA. Nucleic acids dues within the nucleic acid molecule. A nuclease may be an may be naturally occurring, for example, in the context of a endonuclease, cleaving a phosphodiester bonds within a genome, a transcript, an mRNA, tRNA, rRNA, siRNA, polynucleotide chain, or an exonuclease, cleaving a phos 40 SnRNA, a plasmid, cosmid, , chromatid, or other phodiester bond at the end of the polynucleotide chain. In naturally occurring nucleic acid molecule. On the other hand, Some embodiments, a nuclease is a site-specific nuclease, a nucleic acid molecule may be a non-naturally occurring binding and/or cleaving a specific phosphodiester bond molecule, e.g., a recombinant DNA or RNA, an artificial within a specific nucleotide sequence, which is also referred chromosome, an engineered genome, or fragment thereof, or to herein as the “recognition sequence, the “nuclease target 45 a synthetic DNA, RNA, DNA/RNA hybrid, or including non site.” or the “target site. In some embodiments, a nuclease naturally occurring nucleotides or nucleosides. 
Furthermore, recognizes a single Stranded target site, while in other the terms “nucleic acid, “DNA “RNA and/or similar embodiments, a nuclease recognizes a double-stranded target terms include nucleic acid analogs, i.e. analogs having other site, for example a double-stranded DNA target site. The than a phosphodiester backbone. Nucleic acids can be puri target sites of many naturally occurring nucleases, for 50 fied from natural sources, produced using recombinant example, many naturally occurring DNA restriction expression systems and optionally purified, chemically Syn nucleases, are well known to those of skill in the art. In many thesized, etc. Where appropriate, e.g., in the case of chemi cases, a DNA nuclease, such as EcoRI, HindIII, or BamPHI, cally synthesized molecules, nucleic acids can comprise recognize a palindromic, double-stranded DNA target site of nucleoside analogs such as analogs having chemically modi 4 to 10 base pairs in length, and cut each of the two DNA 55 fied bases or Sugars, and backbone modifications. A nucleic Strands at a specific position within the target site. Some acid sequence is presented in the 5' to 3’ direction unless endonucleases cut a double-stranded nucleic acid target site otherwise indicated. In some embodiments, a nucleic acid is symmetrically, i.e., cutting both strands at the same position or comprises natural nucleosides (e.g. adenosine, thymidine, so that the ends comprise base-paired nucleotides, also guanosine, cytidine, uridine, deoxyadenosine, deoxythymi referred to herein as blunt ends. Other endonucleases cut a 60 dine, deoxyguanosine, and deoxycytidine); nucleoside ana double-stranded nucleic acid target sites asymmetrically, i.e., logs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyr cutting each Strand at a different position so that the ends rolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, comprise unpaired nucleotides. 
Unpaired nucleotides at the 2-aminoadenosine, C5-bromouridine, C5-fluorouridine, end of a double-stranded DNA molecule are also referred to C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cyti as “overhangs, e.g., as “5'-overhang' or as “3'-overhang.” 65 dine, C5-methylcytidine, 2-aminoadeno sine, 7-deaZaad depending on whether the unpaired nucleotide(s) form(s) the enosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxogua 5' or the 5' end of the respective DNA strand. Double-stranded nosine, O(6)-methylguanine, and 2-thiocytidine); chemically US 9,359,599 B2 15 16 modified bases; biologically modified bases (e.g., methylated thesized by adding non-equal amounts of the nucleotides to bases); intercalated bases; modified Sugars (e.g. 2'-fluorori be incorporated (e.g., 79% T.7% A, 7% G, and 7% C) during bose, ribose, 2'-deoxyribose, arabinose, and hexose); and/or the synthesis step of the respective sequence residue. Partial modified phosphate groups (e.g., phosphorothioates and randomization allows for the generation of sequences that are 5'-N-phosphoramidite linkages). templated on a given sequence, but have incorporated muta The term “pharmaceutical composition, as used herein, tions at a desired frequency. For example, ifa known nuclease refers to a composition that can be administrated to a subject target site is used as a synthesis template, partial randomiza in the context of treatment of a disease or disorder. In some tion in which at each step the nucleotide represented at the embodiments, a pharmaceutical composition comprises an respective residue is added to the synthesis at 79%, and the active ingredient, e.g. a nuclease or a nucleic acid encoding a 10 other three nucleotides are added at 7% each, will result in a nuclease, and a pharmaceutically acceptable excipient. 
mixture of partially randomized target sites being synthe The terms “prevention” or “prevent” refer to the prophy sized, which still represent the consensus sequence of the lactic treatment of a subject who is at risk of developing a original target site, but which differ from the original target disease, disorder, or condition (e.g., at an elevated risk as site at each residue with a statistical frequency of 21% for compared to a control Subject, or a control group of subject, or 15 each residue so synthesized (distributed binomially). In some at an elevated risk as compared to the average risk of an embodiments, a partially randomized sequence differs from age-matched and/or gender-matched Subject), resulting in a the consensus sequence by more than 5%, more than 10%, decrease in the probability that the subject will develop the more than 15%, more than 20%, more than 25%, or more than disease, disorder, or condition (as compared to the probability 30% on average, distributed binomially. In some embodi without prevention), and/or to the inhibition of further ments, a partially randomized sequence differs from the con advancement of an already established disorder. sensus site by no more than 10%, no more than 15%, no more The term “proliferative disease, as used herein, refers to than 20%, no more than 25%, nor more than 30%, no more any disease in which cell or tissue homeostasis is disturbed in than 40%, or no more than 50% on average, distributed bino that a cell or cell population exhibits an abnormally elevated mially. proliferation rate. Proliferative diseases include hyperprolif 25 The term “subject as used herein, refers to an individual erative diseases, such as pre-neoplastic hyperplastic condi organism, for example, an individual mammal. In some tions and neoplastic diseases. Neoplastic diseases are charac embodiments, the Subject is a human of either sex at any stage terized by an abnormal proliferation of cells and include both of development. 
In some embodiments, the Subject is a non benign and malignant neoplasias. Malignant neoplasms are human mammal. In some embodiments, the Subject is a non also referred to as cancers. 30 human primate. In some embodiments, the Subject is a rodent. The terms “protein,” “peptide.” and “polypeptide' are used In some embodiments, the Subject is a sheep, a goat, a cattle, interchangeably herein and refer to a polymer of amino acid a cat, or a dog. In some embodiments, the subject is a verte residues linked together by peptide (amide) bonds. The terms brate, an amphibian, a reptile, a fish, an insect, a fly, or a refer to a protein, peptide, or polypeptide of any size, struc nematode. ture, or function. Typically, a protein, peptide, or polypeptide 35 The terms “target nucleic acid, and “target genome as will be at least three amino acids long. A protein, peptide, or used herein in the context of nucleases, refer to a nucleic acid polypeptide may refer to an individual protein or a collection molecule or a genome, respectively, that comprises at least of proteins. One or more of the amino acids in a protein, one target site of a given nuclease. peptide, or polypeptide may be modified, for example, by the The term “target site.” used herein interchangeably with addition of a chemical entity Such as a carbohydrate group, a 40 the term “nuclease target site refers to a sequence within a hydroxyl group, a phosphate group, a farnesyl group, an nucleic acid molecule that is bound and cleaved by a nuclease. isofarnesyl group, a fatty acid group, a linker for conjugation, A target site may be single-stranded or double-stranded. In functionalization, or other modification, etc. A protein, pep the context of nucleases that dimerize, for example, nucleases tide, or polypeptide may also be a single molecule or may be comprising a FokI DNA cleavage domain, a target site typi a multi-molecular complex. 
A protein, peptide, or polypeptide may be just a fragment of a naturally occurring protein or peptide. A protein, peptide, or polypeptide may be naturally occurring, recombinant, or synthetic, or any combination thereof. A protein may comprise different domains, for example, a nucleic acid binding domain and a nucleic acid cleavage domain. In some embodiments, a protein comprises a proteinaceous part, e.g., an amino acid sequence constituting a nucleic acid binding domain, and an organic compound, e.g., a compound that can act as a nucleic acid cleavage agent.

The term "randomized," as used herein in the context of nucleic acid sequences, refers to a sequence or residue within a sequence that has been synthesized to incorporate a mixture of free nucleotides, for example, a mixture of all four nucleotides A, T, G, and C. Randomized residues are typically represented by the letter N within a nucleotide sequence. In some embodiments, a randomized sequence or residue is fully randomized, in which case the randomized residues are synthesized by adding equal amounts of the nucleotides to be incorporated (e.g., 25% T, 25% A, 25% G, and 25% C) during the synthesis step of the respective sequence residue. In some embodiments, a randomized sequence or residue is partially randomized, in which case the randomized residues are syn

cally comprises a left-half site (bound by one monomer of the nuclease), a right-half site (bound by the second monomer of the nuclease), and a spacer sequence between the half sites in which the cut is made. This structure (left-half site-spacer sequence-right-half site) is referred to herein as an LSR structure. In some embodiments, the left-half site and/or the right-half site is between 10-18 nucleotides long. In some embodiments, either or both half-sites are shorter or longer. In some embodiments, the left and right half sites comprise different nucleic acid sequences.

The term "Transcriptional Activator-Like Effector," (TALE) as used herein, refers to proteins comprising a DNA binding domain, which contains a highly conserved 33-34 amino acid sequence comprising a highly variable two-amino acid motif (Repeat Variable Diresidue, RVD). The RVD motif determines binding specificity to a nucleic acid sequence, and can be engineered according to methods well known to those of skill in the art to specifically bind a desired DNA sequence (see, e.g., Miller, Jeffrey; et al. (February 2011). "A TALE nuclease architecture for efficient genome editing". Nature Biotechnology 29 (2): 143-8; Zhang, Feng; et al. (February 2011). "Efficient construction of sequence-specific TAL effectors for modulating mammalian transcription". Nature Biotechnology 29 (2): 149-53; Geißler, R.; Scholze, H.; Hahn, S.; Streubel, J.; Bonas, U.; Behrens, S. E.; Boch, J. (2011), Shiu, Shin-Han. ed. "Transcriptional Activators of Human Genes with Programmable DNA-Specificity". PLoS ONE 6 (5): e19509; Boch, Jens (February 2011). "TALEs of genome targeting". Nature Biotechnology 29 (2): 135-6; Boch, Jens; et al. (December 2009). "Breaking the Code of DNA Binding Specificity of TAL-Type III Effectors". Science 326 (5959): 1509-12; and Moscou, Matthew J.; Adam J. Bogdanove (December 2009). "A Simple Cipher Governs DNA Recognition by TAL Effectors". Science 326 (5959): 1501; the entire contents of each of which are incorporated herein by reference). The simple relationship between amino acid sequence and DNA recognition has allowed for the engineering of specific DNA binding domains by selecting a combination of repeat segments containing the appropriate RVDs.

The term "Transcriptional Activator-Like Element Nuclease," (TALEN) as used herein, refers to an artificial nuclease comprising a transcriptional activator like effector DNA binding domain fused to a DNA cleavage domain, for example, a FokI domain. A number of modular assembly schemes for generating engineered TALE constructs have been reported (Zhang, Feng; et al. (February 2011). "Efficient construction of sequence-specific TAL effectors for modulating mammalian transcription". Nature Biotechnology 29 (2): 149-53; Geißler, R.; Scholze, H.; Hahn, S.; Streubel, J.; Bonas, U.; Behrens, S. E.; Boch, J. (2011), Shiu, Shin-Han. ed. "Transcriptional Activators of Human Genes with Programmable DNA-Specificity". PLoS ONE 6 (5): e19509; Cermak, T.; Doyle, E. L.; Christian, M.; Wang, L.; Zhang, Y.; Schmidt, C.; Baller, J. A.; Somia, N. V. et al. (2011). "Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting". Nucleic Acids Research; Morbitzer, R.; Elsaesser, J.; Hausner, J.; Lahaye, T. (2011). "Assembly of custom TALE-type DNA binding domains by modular cloning". Nucleic Acids Research; Li, T.; Huang, S.; Zhao, X.; Wright, D. A.; Carpenter, S.; Spalding, M. H.; Weeks, D. P.; Yang, B. (2011). "Modularly assembled designer TAL effector nucleases for targeted gene knockout and gene replacement in eukaryotes". Nucleic Acids Research; Weber, E.; Gruetzner, R.; Werner, S.; Engler, C.; Marillonnet, S. (2011). Bendahmane, Mohammed. ed. "Assembly of Designer TAL Effectors by Golden Gate Cloning". PLoS ONE 6 (5): e19722; the entire contents of each of which are incorporated herein by reference).

The terms "treatment," "treat," and "treating" refer to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease or disorder, or one or more symptoms thereof, as described herein. In some embodiments, treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or progression of a disease. For example, treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example to prevent or delay their recurrence.

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS OF THE INVENTION

Transcription activator-like effector nucleases (TALENs) are fusions of the FokI restriction endonuclease cleavage domain with a DNA-binding transcription activator-like effector (TALE) repeat array. TALENs can be engineered to reduce off-target cleavage activity and thus to specifically bind a target DNA sequence and can thus be used to cleave a target DNA sequence, e.g., in a genome, in vitro or in vivo. Such engineered TALENs can be used to manipulate genomes in vivo or in vitro, e.g., for gene knockouts or knock-ins via induction of DNA breaks at a target genomic site for targeted gene knockout through non-homologous end joining (NHEJ) or targeted genomic sequence replacement through homology-directed repair (HDR) using an exogenous DNA template.

TALENs can be designed to cleave any desired target DNA sequence, including naturally occurring and synthetic sequences. However, the ability of TALENs to distinguish target sequences from closely related off-target sequences has not been studied in depth. Understanding this ability and the parameters affecting it is of importance for the design of TALENs having the desired level of specificity for their therapeutic use and also for choosing unique target sequences to be cleaved in order to minimize the chance of off-target cleavage.

Some aspects of this disclosure are based on cleavage specificity data obtained from profiling 41 TALENs on 10^12 potential off-target sites through in vitro selection and high-throughput sequencing. Computational analysis of the selection results predicted off-target substrates in the human genome, thirteen of which were modified by TALENs in human cells. Some aspects of this disclosure are based on the surprising findings that (i) TALEN repeats bind DNA relatively independently; (ii) longer TALENs are more tolerant of mismatches, yet are more specific in a genomic context; and (iii) excessive DNA-binding energy can lead to reduced TALEN specificity. Based on these findings, optimized TALENs were engineered with mutations designed to reduce non-specific DNA binding. Some of these engineered TALENs exhibit improved specificity, e.g., 34- to >116-fold greater specificity, in human cells compared to commonly used TALENs.

The ability to engineer site-specific changes in genomes represents a powerful research capability with significant therapeutic implications. TALENs are fusions of the FokI restriction endonuclease cleavage domain with a DNA-binding TALE repeat array (FIG. 1A). These arrays consist of multiple 34-amino acid TALE repeat sequences, each of which uses a repeat-variable di-residue (RVD), the amino acids at positions 12 and 13, to recognize a single DNA nucleotide. Examples of RVDs that enable recognition of each of the four DNA base pairs are known, enabling arrays of TALE repeats to be constructed that can bind virtually any DNA sequence. TALENs can be engineered to be active only as heterodimers through the use of obligate heterodimeric FokI variants. In this configuration, two distinct TALEN monomers are each designed to bind one target half-site and to cleave within the DNA spacer sequence between the two half-sites.

In cells, e.g., in mammalian cells, TALEN-induced double strand breaks can result in targeted gene knockout through non-homologous end joining (NHEJ) or targeted genomic sequence replacement through homology-directed repair (HDR) using an exogenous DNA template. TALENs have been successfully used to manipulate genomes in a variety of organisms and cell lines.

TALEN-mediated DNA cleavage at off-target sites can result in unintended mutations at genomic loci. While SELEX experiments have characterized the DNA-binding specificities of monomeric TALE proteins, the DNA cleavage specificities of active, dimeric nucleases can differ from the specificities of their component monomeric DNA-binding domains. Full-genome sequencing of four TALEN-treated yeast strains and two human cell lines derived from a TALEN-treated cell revealed no evidence of TALE-induced genomic off-target mutations, consistent with other reports that observed no off-target genomic modification in Xenopus and human cell lines. In contrast, TALENs were observed to cleave off-target sites containing two to eleven mutations relative to the on-target sequence in vivo in zebrafish, rats, human primary fibroblasts, and embryonic stem cells. A systematic and comprehensive profile of TALEN specificity generated from measurements of TALEN cleavage on a large set of related mutant target sites has not been described before. Such a broad specificity profile is fundamental to understand and improve the potential of TALENs as research tools and therapeutic agents.

Some of the work described herein relates to experiments performed to profile the ability of 41 TALEN pairs to cleave 10^12 off-target variants of each of their respective target sequences using a modified version of a previously described in vitro selection for DNA cleavage specificity. The results from these experiments provide comprehensive profiles of TALEN cleavage specificities. The in vitro selection results were used to computationally predict off-target substrates in the human genome, 13 of which were confirmed to be cleaved by TALENs in human cells.

It was surprisingly found that, despite being less specific per base pair, TALENs designed to cleave longer target sites in general exhibit higher overall specificity than those that target shorter sites when considering the number of potential off-target sites in the human genome. The selection results also suggest a model in which excess non-specific TALEN binding energy gives rise to greater off-target cleavage relative to on-target cleavage. Based on this model, we engineered TALENs with substantially improved DNA cleavage specificity in vitro, and 30- to >150-fold greater specificity in human cells, than currently used TALEN constructs.

Some aspects of this disclosure are based on data obtained from profiling the specificity of 41 heterodimeric TALENs designed to target one of three distinct sequences, as described in more detail elsewhere herein. The profiling was performed using an improved version of an in vitro selection method (also described in PCT Application Publication WO2013/066438 A2, the entire contents of which are incorporated herein by reference) with modifications that increase the throughput and sensitivity of the selection (FIG. 1B).

Briefly, TALENs were profiled against libraries of >10^12 DNA sequences and cleavage products were captured and analyzed to determine the specificity and off-target activity of each TALEN. The selection data accurately predicted the efficiency of off-target TALEN cleavage in vitro, and also indicated that TALENs are overall highly specific across the entire target sequence, but that some level of off-target cleavage occurs in conventional TALENs which can be undesirable in some scenarios of TALEN use. As a result of the experiments described herein, it was surprisingly found that TALE repeats bind their respective DNA base pairs independently beyond a slightly increased tolerance for adjacent mismatches, which informed the recognition that TALEN specificity per base pair is independent of target-site length. It was experimentally validated that shorter TALENs have greater specificity per targeted base pair than longer TALENs, but that longer TALENs are more specific against the set of potential cleavage sites in the context of a whole genome than shorter TALENs for the tested TALEN lengths targeting 20- to 32-bp sites, as described in more detail elsewhere herein. Some aspects of this disclosure are based on the surprising discovery that excess binding energy in longer TALENs reduces specificity by enabling the cleavage of off-target sequences without a corresponding increase in on-target cleavage efficiency. Some aspects of this disclosure are based on the surprising discovery that TALENs can be engineered to more specifically cleave their target sequences by reducing off-target binding energy without compromising on-target cleavage efficiency. The recognition that TALEN specificity can be improved by reducing non-specific DNA binding energy beyond what is required to enable efficient on-target cleavage served as the basis for the generation of engineered TALENs with improved target site specificity.

Typically, a TALEN monomer, e.g., a TALEN monomer as provided herein, comprises or is of the following structure:

[N-terminal domain]-[TALE repeat array]-[C-terminal domain]-[nuclease domain]

wherein each "-" individually indicates conjugation, either covalently or non-covalently, and wherein the conjugation can be direct, e.g., via direct bond, or indirect, e.g., via a linker domain. See also FIG. 1.

Some aspects of this disclosure provide TALENs with enhanced specificity as compared to TALENs that were previously used. In general, the sequence specificity of a TALEN is conferred by the TALE repeat array, which binds to a specific nucleotide sequence. TALE repeat arrays consist of multiple 34-amino acid TALE repeat sequences, each of which uses a repeat-variable di-residue (RVD), the amino acids at positions 12 and 13, to recognize a single DNA nucleotide. Some aspects of this disclosure provide that the specific binding of the TALE repeat array is sufficient for dimerization and nucleic acid cleavage, and that non-specific nucleic acid binding activity is due to the N-terminal and/or C-terminal domains of the TALEN.

Based on this recognition, improved TALENs have been engineered as provided herein. As it was discovered that non-specific binding via the N-terminal domain can occur through excess binding energy conferred by amino acid residues that are positively charged (cationic) at physiological pH, some of the improved TALENs provided herein have a decreased net charge and/or a decreased binding energy for binding their target nucleic acid sequence as compared to canonical TALENs. This decrease in charge leads to a decrease in off-target binding via the modified N-terminal and C-terminal domains. The portion of target recognition and binding, thus, is more narrowly confined to the specific recognition and binding activity of the TALE repeat array. The resulting TALENs, thus, exhibit an increase in the specificity of binding and, in turn, in the specificity of cleaving the target site by the improved TALEN as compared to a TALEN using non-modified domains.

In some embodiments, a TALEN is provided in which the net charge of the N-terminal domain is less than the net charge of the canonical N-terminal domain (SEQ ID NO: 1); and/or the net charge of the C-terminal domain is less than the net charge of the canonical C-terminal domain (SEQ ID NO: 22). In some embodiments, a TALEN is provided in which the binding energy of the N-terminal domain to a target nucleic acid molecule is less than the binding energy of the canonical N-terminal domain (SEQ ID NO: 1); and/or the binding energy of the C-terminal domain to a target nucleic acid molecule is less than the binding energy of the canonical C-terminal domain (SEQ ID NO: 22). In some embodiments, a modified TALEN N-terminal domain is provided the binding energy of which to the TALEN target nucleic acid molecule is less than the binding energy of the canonical N-terminal domain (SEQ ID NO: 1). In some embodiments, a modified TALEN C-terminal domain is provided the binding energy of which to the TALEN target nucleic acid molecule is less than the binding energy of the canonical C-terminal domain (SEQ ID NO: 22). In some embodiments, the binding energy of the N-terminal and/or of the C-terminal domain in the TALEN provided is decreased by at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%.

In some embodiments, the canonical N-terminal domain and/or the canonical C-terminal domain is modified to replace an amino acid residue that is positively charged at physiological pH with an amino acid residue that is not charged or is negatively charged. In some embodiments, the modification includes the replacement of a positively charged residue with a negatively charged residue. In some embodiments, the modification includes the replacement of a positively charged residue with a neutral (uncharged) residue. In some embodiments, the modification includes the replacement of a positively charged residue with a residue having no charge or a negative charge. In some embodiments, the net charge of the modified N-terminal domain and/or of the modified C-terminal domain is less than or equal to +10, less than or equal to +9, less than or equal to +8, less than or equal to +7, less than or equal to +6, less than or equal to +5, less than or equal to +4, less than or equal to +3, less than or equal to +2, less than or equal to +1, less than or equal to 0, less than or equal to -1, less than or equal to -2, less than or equal to -3, less than or equal to -4, less than or equal to -5, or less than or equal to -10. In some embodiments, the net charge of the modified N-terminal domain and/or of the modified C-terminal domain is between +5 and -5, between +2 and -7, between 0 and -5, between 0 and -10, between -1 and -10, or between -2 and -15. In some embodiments, the net charge of the modified N-terminal domain and/or of the modified C-terminal domain is negative. In some embodiments, the net charge of the modified N-terminal domain and of the modified C-terminal domain, together, is negative. In some embodiments, the net charge of the modified N-terminal domain and/or of the modified C-terminal domain is neutral or slightly positive (e.g., less than +2 or less than +1). In some embodiments, the net charge of the modified N-terminal domain and of the modified C-terminal domain, together, is neutral or slightly positive (e.g., less than +2 or less than +1).

In some embodiments, the modified N-terminal domain and/or the modified C-terminal domain comprise(s) an amino acid sequence that differs from the respective canonical domain sequence in that at least one cationic amino acid residue of the canonical domain sequence is replaced with an amino acid residue that exhibits no charge or a negative charge at physiological pH. In some embodiments, at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, or at least 15 cationic amino acid(s) is/are replaced with an amino acid residue that exhibits no charge or a negative charge at physiological pH in the modified N-terminal domain and/or in the modified C-terminal domain. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 cationic amino acid(s) is/are replaced with an amino acid residue that exhibits no charge or a negative charge at physiological pH in the modified N-terminal domain and/or in the modified C-terminal domain.

In some embodiments, the cationic amino acid residue is arginine (R), lysine (K), or histidine (H). In some embodiments, the cationic amino acid residue is R or H. In some embodiments, the amino acid residue that exhibits no charge or a negative charge at physiological pH is glutamine (Q), glycine (G), asparagine (N), threonine (T), serine (S), aspartic acid (D), or glutamic acid (E). In some embodiments, the amino acid residue that exhibits no charge or a negative charge at physiological pH is Q. In some embodiments, at least one lysine or arginine residue is replaced with a glutamine residue in the modified N-terminal domain and/or in the modified C-terminal domain.

In some embodiments, the C-terminal domain comprises one or more of the following amino acid replacements: K777Q, K778Q, K788Q, R789Q, R792Q, R793Q, R801Q. In some embodiments, the C-terminal domain comprises two or more of the following amino acid replacements: K777Q, K778Q, K788Q, R789Q, R792Q, R793Q, R801Q. In some embodiments, the C-terminal domain comprises three or more of the following amino acid replacements: K777Q, K778Q, K788Q, R789Q, R792Q, R793Q, R801Q. In some embodiments, the C-terminal domain comprises four or more of the following amino acid replacements: K777Q, K778Q, K788Q, R789Q, R792Q, R793Q, R801Q. In some embodiments, the C-terminal domain comprises five or more of the following amino acid replacements: K777Q, K778Q, K788Q, R789Q, R792Q, R793Q, R801Q. In some embodiments, the C-terminal domain comprises six or more of the following amino acid replacements: K777Q, K778Q, K788Q, R789Q, R792Q, R793Q, R801Q. In some embodiments, the C-terminal domain comprises all seven of the following amino acid replacements: K777Q, K778Q, K788Q, R789Q, R792Q, R793Q, R801Q. In some embodiments, the C-terminal domain comprises a Q3 variant sequence (K788Q, R792Q, R801Q, see SEQ ID NO: 23). In some embodiments, the C-terminal domain comprises a Q7 variant sequence (K777Q, K778Q, K788Q, R789Q, R792Q, R793Q, R801Q, see SEQ ID NO: 24).

In some embodiments, the N-terminal domain is a truncated version of the canonical N-terminal domain. In some embodiments, the C-terminal domain is a truncated version of the canonical C-terminal domain. In some embodiments, the truncated N-terminal domain and/or the truncated C-terminal domain comprises less than 90%, less than 80%, less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, or less than 25% of the residues of the canonical domain. In some embodiments, the truncated C-terminal domain comprises less than 60, less than 50, less than 40, less than 30, less than 29, less than 28, less than 27, less than 26, less than 25, less than 24, less than 23, less than 22, less than 21, or less than 20 amino acid residues. In some embodiments, the truncated C-terminal domain comprises 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, or 10 residues. In some embodiments, the modified N-terminal domain and/or the modified C-terminal domain is/are truncated and comprise one or more amino acid replacement(s). It will be apparent to those of skill in the art that it is desirable in some embodiments to adjust the DNA spacer length in TALENs using truncated domains, e.g., truncated C-terminal domains, in order to accommodate the truncation.

In some embodiments, the nuclease domain, also sometimes referred to as a nucleic acid cleavage domain, is a non-specific cleavage domain, e.g., a FokI nuclease domain. In some embodiments, the nuclease domain is monomeric and must dimerize or multimerize in order to cleave a nucleic acid. Homo- or heterodimerization or multimerization of TALEN monomers typically occurs via binding of the monomers to binding sequences that are in sufficiently close proximity to allow dimerization, e.g., to sequences that are proximal to each other on the same nucleic acid molecule (e.g., the same double-stranded nucleic acid molecule).

The most commonly used domains, e.g., the most widely used N-terminal and C-terminal domains, are referred to herein as canonical domains. Exemplary sequences of a canonical N-terminal domain (SEQ ID NO: 1) and a canonical C-terminal domain (SEQ ID NO: 22) are provided herein. Exemplary sequences of FokI nuclease domains are also provided herein. In addition, exemplary sequences of TALE repeats forming a CCR5-binding TALE repeat array are provided. It will be understood that the sequences provided below are exemplary and provided for the purpose of illustrating some embodiments embraced by the present disclosure. They are not meant to be limiting and additional sequences useful according to aspects of this disclosure will be apparent to the skilled artisan based on this disclosure.
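As an illustration of the charge-reduction principle described in this section, the Q3 and Q7 C-terminal variants can be expressed as lists of substitution names and their effect on net charge tallied. The following is only a minimal sketch under a simplified charge model that is an assumption of the example, not part of the disclosure: lysine and arginine count as +1, aspartate and glutamate as -1, and every other residue (including histidine) as 0. The function names are hypothetical.

```python
import re

# Substitution lists as stated in the text for the Q3 and Q7 C-terminal variants.
Q3 = ["K788Q", "R792Q", "R801Q"]
Q7 = ["K777Q", "K778Q", "K788Q", "R789Q", "R792Q", "R793Q", "R801Q"]

# Simplified side-chain charges at physiological pH (assumption of this sketch):
# K and R count as +1, D and E as -1, all other residues as 0.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def parse_substitution(name):
    """Split a name such as 'K788Q' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", name)
    if match is None:
        raise ValueError(f"not a substitution name: {name!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


def charge_change(substitutions):
    """Return the net-charge change caused by a list of substitutions."""
    total = 0
    for name in substitutions:
        original, _, replacement = parse_substitution(name)
        total += CHARGE.get(replacement, 0) - CHARGE.get(original, 0)
    return total


if __name__ == "__main__":
    print("Q3 charge change:", charge_change(Q3))
    print("Q7 charge change:", charge_change(Q7))
    print("Q3 is a subset of Q7:", set(Q3) <= set(Q7))
```

Under this model each lysine-to-glutamine or arginine-to-glutamine replacement lowers the net charge by one unit, so Q3 and Q7 lower the C-terminal domain charge by three and seven units respectively. Absolute net charges would require the full domain sequences.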

Canonical N-terminal domain: (SEQ ID NO: 1)
VDLRTLGYSQQQQEKIKPKVRSTVAQHHEALVGHGFTHAHIVALSQHPAALGTVAVKYQDMIAA
LPEATHEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDTGQLLKIAKRGGVTAVEAVHAWRN
ALTGAPLN

Modified N-terminal domain: N1 (SEQ ID NO:
VDLRTLGYSQQQQEKIKPKVRSTVAQHHEALVGHGFTHAHIVALSQHPAALGTVAVKYQDMIAA
LPEATHEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDTGQLLQIAKRGGVTAVEAVHAWRN
ALTGAPLN

Modified N-terminal domain: N2 (SEQ ID NO:
VDLRTLGYSQQQQEKIKPKVRSTVAQHHEALVGHGFTHAHIVALSQHPAALGTVAVKYQDMIAA
LPEATHEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDTGQLLQIAQRGGVTAVEAVHAWRN
ALTGAPLN

Modified N-terminal domain: N3 (SEQ ID NO:
VDLRTLGYSQQQQEKIKPKVRSTVAQHHEALVGHGFTHAHIVALSQHPAALGTVAVKYQDMIAA
LPEATHEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDTGQLLQIAQQGGVTAVEAVHAWRN
ALTGAPLN
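The modified N-terminal domains N1, N2, and N3 differ from the canonical domain at only a few positions. A minimal sketch of how such differences can be listed is shown below. It compares a short window of each sequence rather than the full domain, so the reported positions are relative to that window and are not the residue numbering of the disclosure. Reading the lowercase "g" in the extracted N2 and N3 listings as Q is an assumption of this example, as is the simplified charge model (K and R as +1, D and E as -1, all other residues as 0).

```python
# A 23-residue window taken from the sequence listings (window choice is illustrative).
CANONICAL = "LQLDTGQLLKIAKRGGVTAVEAV"
VARIANTS = {
    "N1": "LQLDTGQLLQIAKRGGVTAVEAV",
    "N2": "LQLDTGQLLQIAQRGGVTAVEAV",  # lowercase "g" in the listing read as Q (assumption)
    "N3": "LQLDTGQLLQIAQQGGVTAVEAV",  # lowercase "g" in the listing read as Q (assumption)
}

# Simplified side-chain charges at physiological pH (assumption of this sketch).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def substitutions(reference, variant):
    """List (window position, reference residue, variant residue) for each mismatch."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        (index + 1, ref, var)
        for index, (ref, var) in enumerate(zip(reference, variant))
        if ref != var
    ]


def net_charge(sequence):
    """Sum the simplified per-residue charges over a sequence."""
    return sum(CHARGE.get(residue, 0) for residue in sequence)


if __name__ == "__main__":
    print("canonical window net charge:", net_charge(CANONICAL))
    for name, sequence in VARIANTS.items():
        print(name, substitutions(CANONICAL, sequence), "net charge:", net_charge(sequence))
```

Within this window, each successive variant replaces one more lysine or arginine with glutamine, so the window net charge drops by one unit per replacement.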

TALE repeat array: L18 CCR5A
(SEQ ID NO: 5) MTPDQVVAIASNGGGKQALETVQRLLPVLCQDH
(SEQ ID NO: 6) GLTPEQVVAIASHDGGKQALETVQRLLPVLCQAH
(SEQ ID NO: 7) GLTPDQVVAIASNIGGKQALETVQRLLPVLCQAH
(SEQ ID NO: 8) GLTPAQVVAIASNGGGKQALETVQRLLPVLCQDH
(SEQ ID NO: 9) GLTPDQVVAIASNGGGKQALETVQRLLPVLCQDH
(SEQ ID NO: 10) GLTPEQVVAIASNIGGKQALETVQRLLPVLCQAH
(SEQ ID NO: 11) GLTPDQVVAIASHDGGKQALETVQRLLPVLCQAH
(SEQ ID NO: 12) GLTPAQVVAIASNIGGKQALETVQRLLPVLCQDH
(SEQ ID NO: 13) GLTPDQVVAIASHDGGKQALETVQRLLPVLCQDH
(SEQ ID NO: 14) GLTPEQVVAIASHDGGKQALETVQRLLPVLCQAH
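Because each TALE repeat recognizes one nucleotide through its RVD, the repeats listed above can be decoded into the DNA sequence they are expected to bind. The sketch below is illustrative only. It uses the commonly reported RVD code from the literature cited in this document (NI for A, HD for C, NG for T, NN for G), and it locates the RVD as the two residues following the conserved "VVAIAS" motif rather than by fixed position, because the listing splits repeats with a one-residue offset. The repeat strings assume that an "I" dropped during text extraction is restored; the function names are hypothetical.

```python
# Commonly reported RVD-to-base code (Boch et al. 2009; Moscou and Bogdanove 2009).
# NN is mapped to G here, although it is also reported to tolerate A.
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}

# Repeats of SEQ ID NOs 5-14 as read from the listing (dropped "I" restored: assumption).
REPEATS = [
    "MTPDQVVAIASNGGGKQALETVQRLLPVLCQDH",
    "GLTPEQVVAIASHDGGKQALETVQRLLPVLCQAH",
    "GLTPDQVVAIASNIGGKQALETVQRLLPVLCQAH",
    "GLTPAQVVAIASNGGGKQALETVQRLLPVLCQDH",
    "GLTPDQVVAIASNGGGKQALETVQRLLPVLCQDH",
    "GLTPEQVVAIASNIGGKQALETVQRLLPVLCQAH",
    "GLTPDQVVAIASHDGGKQALETVQRLLPVLCQAH",
    "GLTPAQVVAIASNIGGKQALETVQRLLPVLCQDH",
    "GLTPDQVVAIASHDGGKQALETVQRLLPVLCQDH",
    "GLTPEQVVAIASHDGGKQALETVQRLLPVLCQAH",
]

# Conserved motif immediately preceding the RVD in each repeat.
ANCHOR = "VVAIAS"


def extract_rvd(repeat):
    """Return the two residues that follow the conserved anchor motif."""
    start = repeat.index(ANCHOR) + len(ANCHOR)
    return repeat[start:start + 2]


def decode(repeats):
    """Translate a list of repeats into the DNA bases their RVDs specify."""
    return "".join(RVD_TO_BASE[extract_rvd(repeat)] for repeat in repeats)


if __name__ == "__main__":
    print([extract_rvd(repeat) for repeat in REPEATS])
    print(decode(REPEATS))  # TCATTACACC
```

Only the ten repeats shown in this part of the listing are decoded; the full L18 array contains further repeats that are not reproduced here.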

FokI: KKR (SEQ ID NO: 30)
GSQLVKSELEEKKSELRHKLKYVPHEYIELIEIARNSTQDRILEMKVMEFFMKVYGYRGKHLGG
SRKPDGAIYTVGSPIDYGVIVDTKAYSGGYNLPIGQADEMQRYVKENQTRNKHINPNEWWKVYP
SSVTEFKFLFVSGHFKGNYKAQLTRLNRKTNCNGAVLSVEELLIGGEMIKAGTLTLEEVRRKFN
NGEINF

In some embodiments, a TALEN is provided herein that comprises a canonical N-terminal domain, a TALE repeat array, a modified C-terminal domain, and a nuclease domain. In some embodiments, a TALEN is provided herein that comprises a modified N-terminal domain, a TALE repeat array, a canonical C-terminal domain, and a nuclease domain. In some embodiments, a TALEN is provided herein that comprises a modified N-terminal domain, a TALE repeat array, a modified C-terminal domain, and a nuclease domain. In some embodiments, the nuclease domain is a FokI nuclease domain. In some embodiments, the FokI nuclease domain is a homodimeric FokI domain, or a FokI-EL, FokI-KK, FokI-ELD, or FokI-KKR domain.

All possible combinations of the specific sequences of canonical and modified domains provided herein are embraced by this disclosure, including the following:

TABLE 1. Exemplary TALENs embraced by the present disclosure.

TALEN  N-terminal domain  TALE repeat array   C-terminal domain  Nuclease domain
1      Canonical          Sequence-specific   Q3                 Homodimeric
2      Canonical          Sequence-specific   Q3                 EL
3      Canonical          Sequence-specific   Q3                 KK
4      Canonical          Sequence-specific   Q3                 ELD
5      Canonical          Sequence-specific   Q3                 KKR
6      Canonical          Sequence-specific   Q7                 Homodimeric
7      Canonical          Sequence-specific   Q7                 EL
8      Canonical          Sequence-specific   Q7                 KK
9      Canonical          Sequence-specific   Q7                 ELD
10     Canonical          Sequence-specific   Q7                 KKR
11     Canonical          Sequence-specific   Truncated (28aa)   Homodimeric
12     Canonical          Sequence-specific   Truncated (28aa)   EL
13     Canonical          Sequence-specific   Truncated (28aa)   KK
14     Canonical          Sequence-specific   Truncated (28aa)   ELD
15     Canonical          Sequence-specific   Truncated (28aa)   KKR
16     N1                 Sequence-specific   Canonical          Homodimeric
17     N1                 Sequence-specific   Canonical          EL
18     N1                 Sequence-specific   Canonical          KK
19     N1                 Sequence-specific   Canonical          ELD
20     N1                 Sequence-specific   Canonical          KKR
21     N1                 Sequence-specific   Q3                 Homodimeric
22     N1                 Sequence-specific   Q3                 EL
23     N1                 Sequence-specific   Q3                 KK
24     N1                 Sequence-specific   Q3                 ELD
25     N1                 Sequence-specific   Q3                 KKR
26     N1                 Sequence-specific   Q7                 Homodimeric
27     N1                 Sequence-specific   Q7                 EL
28     N1                 Sequence-specific   Q7                 KK
29     N1                 Sequence-specific   Q7                 ELD
30     N1                 Sequence-specific   Q7                 KKR
31     N1                 Sequence-specific   Truncated (28aa)   Homodimeric
32     N1                 Sequence-specific   Truncated (28aa)   EL
33     N1                 Sequence-specific   Truncated (28aa)   KK
34     N1                 Sequence-specific   Truncated (28aa)   ELD
35     N1                 Sequence-specific   Truncated (28aa)   KKR
36     N2                 Sequence-specific   Canonical          Homodimeric
37     N2                 Sequence-specific   Canonical          EL
38     N2                 Sequence-specific   Canonical          KK
39     N2                 Sequence-specific   Canonical          ELD
40     N2                 Sequence-specific   Canonical          KKR
41     N2                 Sequence-specific   Q3                 Homodimeric
42     N2                 Sequence-specific   Q3                 EL
43     N2                 Sequence-specific   Q3                 KK
44     N2                 Sequence-specific   Q3                 ELD
45     N2                 Sequence-specific   Q3                 KKR
46     N2                 Sequence-specific   Q7                 Homodimeric
47     N2                 Sequence-specific   Q7                 EL
48     N2                 Sequence-specific   Q7                 KK
49     N2                 Sequence-specific   Q7                 ELD
50     N2                 Sequence-specific   Q7                 KKR
51     N2                 Sequence-specific   Truncated (28aa)   Homodimeric
52     N2                 Sequence-specific   Truncated (28aa)   EL
53     N2                 Sequence-specific   Truncated (28aa)   KK
54     N2                 Sequence-specific   Truncated (28aa)   ELD
55     N2                 Sequence-specific   Truncated (28aa)   KKR
56     N3                 Sequence-specific   Canonical          Homodimeric
57     N3                 Sequence-specific   Canonical          EL
58     N3                 Sequence-specific   Canonical          KK
59     N3                 Sequence-specific   Canonical          ELD
60     N3                 Sequence-specific   Canonical          KKR
61     N3                 Sequence-specific   Q3                 Homodimeric
62     N3                 Sequence-specific   Q3                 EL
63     N3                 Sequence-specific   Q3                 KK
64     N3                 Sequence-specific   Q3                 ELD
65     N3                 Sequence-specific   Q3                 KKR
66     N3                 Sequence-specific   Q7                 Homodimeric
67     N3                 Sequence-specific   Q7                 EL
68     N3                 Sequence-specific   Q7                 KK
69     N3                 Sequence-specific   Q7                 ELD
70     N3                 Sequence-specific   Q7                 KKR
71     N3                 Sequence-specific   Truncated (28aa)   Homodimeric
72     N3                 Sequence-specific   Truncated (28aa)   EL
73     N3                 Sequence-specific   Truncated (28aa)   KK
74     N3                 Sequence-specific   Truncated (28aa)   ELD
75     N3                 Sequence-specific   Truncated (28aa)   KKR
76     Canonical          Sequence-specific   Canonical          EL
77     Canonical          Sequence-specific   Canonical          KK
78     Canonical          Sequence-specific   Canonical          ELD
79     Canonical          Sequence-specific   Canonical          KKR
80     Canonical          Sequence-specific   Truncated (28aa)   Homodimeric
81     Canonical          Sequence-specific   Truncated (28aa)   EL
82     Canonical          Sequence-specific   Truncated (28aa)   KK
83     Canonical          Sequence-specific   Truncated (28aa)   ELD
84     Canonical          Sequence-specific   Truncated (28aa)   KKR

The respective TALE repeat array employed will depend on the specific target sequence. Those of skill in the art will be able to design such sequence-specific TALE repeat arrays based on the instant disclosure and the knowledge in the art. Sequences for the different N-terminal, C-terminal, and Nuclease domains are provided above (See, SEQ ID NOs 1-4 and 22-30).

It will be understood by those of skill in the art that the exemplary sequences provided herein are for illustration purposes only and are not intended to limit the scope of the present disclosure. The disclosure also embraces the use of each of the inventive TALEN domains, e.g., the modified N-terminal domains, C-terminal domains, and nuclease domains described herein, in the context of other TALEN sequences, e.g., other modified or unmodified TALEN structures. Additional sequences satisfying the described principles and parameters that are useful in accordance to aspects of this disclosure will be apparent to the skilled artisan.

In some embodiments, the TALEN provided is a monomer. In some embodiments, the TALEN monomer can dimerize with another TALEN monomer to form a TALEN dimer. In some embodiments the formed dimer is a homodimer. In some embodiments, the dimer is a heterodimer.

In some embodiments, TALENs provided herein cleave their target sites with high specificity. For example, in some embodiments an improved TALEN is provided that has been engineered to cleave a desired target site within a genome while binding and/or cleaving less than 1, less than 2, less than 3, less than 4, less than 5, less than 6, less than 7, less than 8, less than 9 or less than 10 off-target sites at a concentration effective for the nuclease to cut its intended target site. In some embodiments, a TALEN is provided that has been engineered to cleave a desired unique target site that has been selected to differ from any other site within a genome by at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 nucleotide residues.

Some aspects of this disclosure provide nucleic acids encoding the TALENs provided herein. For example, nucleic acids are provided herein that encode the TALENs described in Table 1. In some embodiments, the nucleic acids encoding the TALEN are under the control of a heterologous promoter. In some embodiments, the encoding nucleic acids are included in an expression construct, e.g., a plasmid, a viral vector, or a linear expression construct. In some embodiments, the nucleic acid or expression construct is in a cell, tissue, or organism.

The map of an exemplary nucleic acid encoding a TALEN provided herein is illustrated in FIG. 19. An exemplary sequence of such a nucleic acid is provided below. It will be understood by those of skill in the art that the maps and sequences provided herein are exemplary and do not limit the scope of this disclosure.

As described elsewhere herein, TALENs, including the improved TALENs provided by this disclosure, can be engineered to bind (and cleave) virtually any nucleic acid sequence based on the sequence-specific TALE repeat array employed. In some embodiments, an improved TALEN provided herein binds a target sequence within a gene known to be associated with a disease or disorder. In some embodiments, TALENs provided herein may be used for therapeutic purposes. For example, in some embodiments, TALENs provided herein may be used for treatment of any of a variety of diseases, disorders, and/or conditions, including but not limited to one or more of the following: autoimmune disorders (e.g. diabetes, lupus, multiple sclerosis, psoriasis, rheumatoid arthritis); inflammatory disorders (e.g. arthritis, pelvic inflammatory disease); infectious diseases (e.g. viral infections (e.g., HIV, HCV, RSV), bacterial infections, fungal infections, sepsis); neurological disorders (e.g. Alzheimer's disease, Huntington's disease; autism; Duchenne muscular dystrophy); cardiovascular disorders (e.g. atherosclerosis, hypercholesterolemia, thrombosis, clotting disorders, angiogenic disorders such as macular degeneration); proliferative disorders (e.g. cancer, benign neoplasms); respiratory disorders (e.g. chronic obstructive pulmonary disease); digestive disorders (e.g. inflammatory bowel disease, ulcers); musculoskeletal disorders (e.g. fibromyalgia, arthritis); endocrine, metabolic, and nutritional disorders (e.g. diabetes, osteoporosis); urological disorders (e.g. renal disease); psychological disorders (e.g. depression, schizophrenia); skin disorders (e.g. wounds, eczema); blood and lymphatic disorders (e.g. anemia, hemophilia); etc. In some embodiments, the TALEN cleaves the target sequence upon dimerization. In some embodiments, a TALEN provided herein cleaves a target site within an allele that is associated with a disease or disorder. In some embodiments, the TALEN cleaves a target site the cleavage of which results in the treatment or prevention of a disease or disorder. In some embodiments, the disease is HIV/AIDS. In some embodiments, the disease is a proliferative disease. In some embodiments, the TALEN binds a CCR5 target sequence (e.g., a CCR5 sequence associated with HIV). In some embodiments, the TALEN binds an ATM target sequence (e.g., an ATM target sequence associated with ataxia telangiectasia). In some embodiments, the TALEN binds a VEGFA target sequence (e.g., a VEGFA sequence associated with a proliferative disease). In some embodiments, the TALEN binds a CFTR target sequence (e.g., a CFTR sequence associated with cystic fibrosis). In some embodiments, the TALEN binds a dystrophin target sequence (e.g., a dystrophin gene sequence associated with Duchenne muscular dystrophy). In some embodiments, the TALEN binds a target sequence associated with haemochromatosis, haemophilia, Charcot-Marie-Tooth disease, neurofibromatosis, phenylketonuria, polycystic kidney disease, sickle-cell disease, or Tay-Sachs disease. Suitable target genes, e.g., genes causing the listed diseases, are known to those of skill in the art. Additional genes and gene sequences associated with a disease or disorder will be apparent to those of skill in the art.

Some aspects of this disclosure provide isolated TALE effector domains, e.g., N- and C-terminal TALE effector domains, with decreased non-specific nucleic acid binding activity as compared to previously used TALE effector domains. The isolated TALE effector domains provided herein can be used in the context of suitable TALE effector molecules, e.g., TALE nucleases, TALE transcriptional activators, TALE transcriptional repressors, TALE recombinases, and TALE epigenome modification enzymes. Additional suitable TALE effectors in the context of which the isolated TALE domains can be used will be apparent to those of skill in the art based on this disclosure. In general, the isolated N- and C-terminal domains provided herein are engineered to optimize, e.g., minimize, excess binding energy conferred by amino acid residues that are positively charged (cationic) at physiological pH. Some of the improved N-terminal or C-terminal TALE domains provided herein have a decreased net charge and/or a decreased binding energy for binding a target nucleic acid sequence as compared to the respective canonical TALE domains. When used as part of a TALE effector molecule, e.g., a TALE nuclease, TALE transcriptional activator, TALE transcriptional repressor, TALE recombinase, or TALE epigenome modification enzyme, this decrease in charge leads to a decrease in off-target binding via the modified N-terminal and C-terminal domain(s). The portion of target recognition and binding, thus, is more narrowly confined to the specific recognition and binding activity of the TALE repeat array, as explained in more detail elsewhere herein.

domain and/or of the isolated C-terminal TALE domain is negative. In some embodiments, an isolated N-terminal TALE domain and an isolated C-terminal TALE domain are provided and the net charge of the isolated N-terminal TALE domain and of the isolated C-terminal TALE domain, together, is negative. In some embodiments, the net charge of the isolated N-terminal TALE domain and/or of the isolated C-terminal TALE domain is neutral or slightly positive (e.g.,
The resulting TALE effector molecule, thus, exhibits less than +2 or less than +1 at physiological pH). In some an increase in the specificity of binding and, in turn, in the 10 embodiments, an isolated N-terminal TALE domain and an specificity of the respective effect of the TALE effector (e.g., isolated C-terminal TALE domain are provided, and the net cleaving the target site by a TALE nuclease, activation of a charge of the isolated N-terminal TALE domain and of the target gene by a TALE transcriptional activator, repression of isolated C-terminal TALE domain, together, is neutral or expression of a target gene by a TALE transcriptional repres slightly positive (e.g., less than +2 or less than +1 at physi Sor, recombination of a target sequence by a TALE recombi 15 ological pH). nase, or epigenetic modification of a target sequence by a In some embodiments, the isolated N-terminal domain TALE epigenome modification enzyme) as compared to and/or the isolated C-terminal domain provided herein com TALE effector molecules using unmodified domains. prise(s) an amino acid sequence that differs from the respec In some embodiments, an isolated N-terminal TALE tive canonical domain sequence in that at least one cationic domain is provided in which the net charge is less than the net amino acid residue of the canonical domain sequence is charge of the canonical N-terminal domain (SEQID NO: 1). replaced with an amino acid residue that exhibits no charge or In some embodiments, an isolated C-terminal TALE domain a negative charge at physiological pH. In some embodiments, is provided in which the net charge is less than the net charge at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, of the canonical C-terminal domain (SEQ ID NO: 22). 
In at least 7, at least 8, at least 9, at least 10, at least 11, at least some embodiments, an isolated N-terminal TALE domain is 25 12, at least 13, at least 14, or at least 15 cationic amino acid(s) provided in which the binding energy to a target nucleic acid is/are replaced with an amino acid residue that exhibits no molecule is less than the binding energy of the canonical charge or a negative charge at physiological pH in the isolated N-terminal domain (SEQID NO: 1). In some embodiments, N-terminal domain and/or in the isolated C-terminal domain an isolated C-terminal TALE domain is provided in which the provided. In some embodiments, 1,2,3,4,5,6,7,8,9, 10, 11, binding energy to a target nucleic acid molecule is less than 30 12, 13, 14, 15 cationic amino acid(s) is/are replaced with an the binding energy of the canonical C-terminal domain (SEQ amino acid residue that exhibits no charge or a negative ID NO:22). In some embodiments, the binding energy of the charge at physiological pH in the isolated N-terminal domain isolated N-terminal and/or of the isolated C-terminal TALE and/or in the isolated C-terminal domain. domain provided herein is decreased by at least 5%, at least In some embodiments, the cationic amino acid residue is 10%, at least 15%, at least 20%, at least 25%, at least 30%, at 35 arginine (R), lysine (K), or histidine (H). In some embodi least 35%, at least 40%, at least 45%, at least 50%, at least ments, the cationic amino acid residue is R or H. In some 55%, at least 60%, at least 65%, at least 70%, at least 75%, at embodiments, the amino acid residue that exhibits no charge least 80%, at least 85%, at least 90%, at least 95%, at least or a negative charge at physiological pH is glutamine (Q), 98%, or at least 99%. glycine (G), asparagine (N), threonine (T), serine (S), aspartic In some embodiments, the canonical N-terminal domain 40 acid (D), or glutamic acid (E). 
In some embodiments, the and/or the canonical C-terminal domain is modified to amino acid residue that exhibits no charge or a negative replace an amino acid residue that is positively charged at charge at physiological pH is Q. In some embodiments, at physiological pH with an amino acid residue that is not least one lysine or arginine residue is replaced with a charged or is negatively charged to arrive at the isolated glutamine residue in the isolated N-terminal domain and/or in N-terminal and/or C-terminal domain provided herein. In 45 the isolated C-terminal domain. Some embodiments, the modification includes the replace In some embodiments, an isolated C-terminal TALE ment of a positively charged residue with a negatively domain is provided herein that comprises one or more of the charged residue. In some embodiments, the modification following amino acid replacements: K777Q, K778Q. includes the replacement of a positively charged residue with K788Q, R789Q, R792Q, R793Q, R801Q. In some embodi a neutral (uncharged) residue. In some embodiments, the 50 ments, the isolated C-terminal domain comprises two or more modification includes the replacement of a positively charged of the following amino acid replacements: K777Q, K778Q. residue with a residue having no charge or a negative charge. K788Q, R789Q, R792Q, R793Q, R801Q. In some embodi In some embodiments, the net charge of the isolated N-ter ments, the isolated C-terminal domain comprises three or minal domain and/or of the isolated C-terminal domain pro more of the following amino acid replacements: K777Q. vided herein is less than or equal to +10, less than or equal to 55 K778Q, K788Q, R789Q, R792Q, R793Q, R801Q. In some +9, less than or equal to +8, less than or equal to +7, less than embodiments, the isolated C-terminal domain comprises four or equal to +6, less than or equal to +5, less than or equal to +4. or more of the following amino acid replacements: K777Q. 
less than or equal to +3, less than or equal to +2, less than or K778Q, K788Q, R789Q, R792Q, R793Q, R801Q. In some equal to +1, less than or equal to 0, less than or equal to -1, embodiments, the isolated C-terminal domain comprises five less than or equal to -2, less than or equal to -3, less than or 60 or more of the following amino acid replacements: K777Q. equal to -4, or less than or equal to -5, or less than or equal to K778Q, K788Q, R789Q, R792Q, R793Q, R801Q. In some -10 at physiological pH. In some embodiments, the net embodiments, the isolated C-terminal domain comprises six charge of the isolated N-terminal domain and/or of the iso or more of the following amino acid replacements: K777Q. lated C-terminal domain is between +5 and -5, between +2 K778Q, K788Q, R789Q, R792Q, R793Q, R801Q. In some and -7, between 0 and -5, between 0 and -10, between -1 65 embodiments, the isolated C-terminal domain comprises all and -10, or between -2 and -15 at physiological pH. In some seven of the following amino acid replacements: K777Q, embodiments, the net charge of the isolated N-terminal TALE K778Q, K788Q, R789Q, R792Q, R793Q, R801Q. In some US 9,359,599 B2 33 34 embodiments, the isolated C-terminal domain comprises a TALEN and a pharmaceutically acceptable excipient. In Q3 variant sequence (K788Q, R792Q, R801Q, see SEQ ID Some embodiments, the pharmaceutical composition is for NO. 23). In some embodiments, the isolated C-terminal mulated for administration to a subject. In some embodi domain comprises a Q7 variant sequence (K777Q, K778Q. ments, the pharmaceutical composition comprises an effec K788Q, R789Q, R792Q, R793Q, R801Q, see SEQID NO: tive amount of the TALEN for cleaving a target sequence in a 24). cell in the subject. 
In some embodiments, the TALEN binds a target sequence within a gene known to be associated with a disease or disorder and wherein the composition comprises an effective amount of the TALEN for alleviating a symptom associated with the disease or disorder.

For example, some embodiments provide pharmaceutical compositions comprising a TALEN as provided herein, or a nucleic acid encoding such a nuclease, and a pharmaceutically acceptable excipient. Pharmaceutical compositions may optionally comprise one or more additional therapeutically active substances.

Formulations of the pharmaceutical compositions described herein may be prepared by any method known or hereafter developed in the art of pharmacology. In general, such preparatory methods include the step of bringing the active ingredient into association with an excipient and/or one or more other accessory ingredients, and then, if necessary and/or desirable, shaping and/or packaging the product into a desired single- or multi-dose unit.

Pharmaceutical formulations may additionally comprise a pharmaceutically acceptable excipient, which, as used herein, includes any and all solvents, dispersion media, diluents, or other liquid vehicles, dispersion or suspension aids, surface active agents, isotonic agents, thickening or emulsifying agents, preservatives, solid binders, lubricants and the like, as suited to the particular dosage form desired. Remington's The Science and Practice of Pharmacy, 21st Edition, A. R. Gennaro (Lippincott, Williams & Wilkins, Baltimore, Md., 2006; incorporated herein by reference) discloses various excipients used in formulating pharmaceutical compositions and known techniques for the preparation thereof. Except insofar as any conventional excipient medium is incompatible with a substance or its derivatives, such as by producing any undesirable biological effect or otherwise interacting in a deleterious manner with any other component(s) of the pharmaceutical composition, its use is contemplated to be within the scope of this invention.

In some embodiments, a composition provided herein is administered to a subject, for example, to a human subject, in order to effect a targeted genomic modification within the subject. In some embodiments, cells are obtained from the subject and contacted with a nuclease or a nuclease-encoding nucleic acid ex vivo, and re-administered to the subject after the desired genomic modification has been effected or detected in the cells.

In some embodiments, an isolated N-terminal TALE domain is provided that is a truncated version of the canonical N-terminal domain. In some embodiments, an isolated C-terminal TALE domain is provided that is a truncated version of the canonical C-terminal domain. In some embodiments, the truncated N-terminal domain and/or the truncated C-terminal domain comprises less than 90%, less than 80%, less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, or less than 25% of the residues of the canonical domain. In some embodiments, the truncated C-terminal domain comprises less than 60, less than 50, less than 40, less than 30, less than 29, less than 28, less than 27, less than 26, less than 25, less than 24, less than 23, less than 22, less than 21, or less than 20 amino acid residues. In some embodiments, the truncated C-terminal domain comprises 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, or 10 residues. In some embodiments, an isolated N-terminal TALE domain and/or an isolated C-terminal domain is provided herein that is/are truncated and comprise(s) one or more amino acid replacement(s). In some embodiments, the isolated N-terminal TALE domains comprise an amino acid sequence as provided in any of SEQ ID NOs 2-5. In some embodiments, the isolated C-terminal TALE domains comprise an amino acid sequence as provided in any of SEQ ID NOs 23-25.

It will be apparent to those of skill in the art that the isolated C- and N-terminal TALE domains provided herein may be used in the context of any TALE effector molecule, e.g., as part of a TALE nuclease, a TALE transcriptional activator, a TALE transcriptional repressor, a TALE recombinase, a TALE epigenome modification enzyme, or any other suitable TALE effector molecule. In some embodiments, a TALE domain provided herein is used in the context of a TALE molecule comprising or consisting essentially of the following structure
[N-terminal domain]-[TALE repeat array]-[C-terminal domain]-[effector domain] or
[effector domain]-[N-terminal domain]-[TALE repeat array]-[C-terminal domain],
wherein the effector domain may, in some embodiments, be a nuclease domain, a transcriptional activator or repressor domain, a recombinase domain, or an epigenetic modification enzyme domain.
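As an illustration only, the effect of the charge-reducing C-terminal replacements named in this description (the Q3 and Q7 sets) on net charge can be sketched in a few lines of Python. Everything in the sketch apart from the replacement names is an assumption made for the example: the 34-residue sequence is a hypothetical stand-in and is not SEQ ID NO: 22, its first residue is assumed to sit at position 770, and net charge is approximated by counting Lys/Arg as +1 and Asp/Glu as -1 with His treated as uncharged. The function names are likewise hypothetical.

```python
import re
from typing import Iterable

# Simplifying assumption (not from the disclosure): at physiological pH,
# Lys (K) and Arg (R) count as +1, Asp (D) and Glu (E) as -1, and every
# other residue, including His (H), counts as 0.
POSITIVE = frozenset("KR")
NEGATIVE = frozenset("DE")

# Replacement sets as named in the text for the Q3 and Q7 variants.
Q3 = ["K788Q", "R792Q", "R801Q"]
Q7 = ["K777Q", "K778Q", "K788Q", "R789Q", "R792Q", "R793Q", "R801Q"]

# Hypothetical stand-in for a C-terminal domain; NOT SEQ ID NO: 22.
# Its first residue is assumed to be position 770 so that the listed
# positions land on K/R residues of this toy sequence.
TOY_DOMAIN = "SGALTNDKKALESIVAQLKRPDRRALTNDHLRGS"
TOY_FIRST_POSITION = 770


def net_charge(seq: str) -> int:
    """Integer net charge under the simple +1/-1 model above."""
    return sum((aa in POSITIVE) - (aa in NEGATIVE) for aa in seq)


def apply_replacements(seq: str, replacements: Iterable[str],
                       first_position: int) -> str:
    """Apply replacements written like 'K788Q' to seq.

    first_position is the protein-level number of seq[0]. A ValueError is
    raised if a replacement is malformed, falls outside seq, or names a
    residue that is not actually present at that position.
    """
    residues = list(seq)
    for rep in replacements:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", rep)
        if match is None:
            raise ValueError(f"malformed replacement: {rep!r}")
        old, position, new = match.group(1), int(match.group(2)), match.group(3)
        index = position - first_position
        if not 0 <= index < len(residues):
            raise ValueError(f"{rep}: position outside the domain")
        if residues[index] != old:
            raise ValueError(f"{rep}: found {residues[index]} at {position}")
        residues[index] = new
    return "".join(residues)


for name, reps in (("canonical (toy)", []), ("Q3", Q3), ("Q7", Q7)):
    variant = apply_replacements(TOY_DOMAIN, reps, TOY_FIRST_POSITION)
    print(f"{name:16s} net charge {net_charge(variant):+d}")
# canonical (toy)  net charge +3
# Q3               net charge +0
# Q7               net charge -4
```

With the toy sequence, each K-to-Q or R-to-Q replacement lowers the net charge by one unit, which is the behavior the replacement lists are intended to produce. Applying the same functions to a real domain would require the actual sequence and its true starting position, neither of which is reproduced here.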
It will also be apparent to those of skill in the art that it is desirable, in some embodiments, to adjust the DNA spacer length in TALE effector molecules comprising such a spacer, when using a truncated domain, e.g., truncated C-terminal domain as provided herein, in order to accommodate the truncation.

Some aspects of this disclosure provide compositions comprising a TALEN provided herein, e.g., a TALEN monomer. In some embodiments, the composition comprises the TALEN monomer and a different TALEN monomer that can form a heterodimer with the TALEN, wherein the dimer exhibits nuclease activity.

In some embodiments, the TALEN is provided in a composition formulated for administration to a subject, e.g., to a human subject. For example, in some embodiments, a pharmaceutical composition is provided that comprises the

Although the descriptions of pharmaceutical compositions provided herein are principally directed to pharmaceutical compositions which are suitable for administration to humans, it will be understood by the skilled artisan that such compositions are generally suitable for administration to animals of all sorts. Modification of pharmaceutical compositions suitable for administration to humans in order to render the compositions suitable for administration to various animals is well understood, and the ordinarily skilled veterinary pharmacologist can design and/or perform such modification with no more than routine experimentation. Subjects to which administration of the pharmaceutical compositions is contemplated include, but are not limited to, humans and/or other primates; mammals, including, but not limited to, cattle, pigs, horses, sheep, cats, dogs, mice, and/or rats; and/or birds, including commercially relevant birds such as chickens, ducks, geese, and/or turkeys.

The scope of this disclosure embraces methods of using the TALENs provided herein. It will be apparent to those of skill in the art that the TALENs provided herein can be used in any method suitable for the application of TALENs, including, but not limited to, those methods and applications known in the art. Such methods may include TALEN-mediated cleavage of DNA, e.g., in the context of genome manipulations such as, for example, targeted gene knockout through non-homologous end joining (NHEJ) or targeted genomic sequence replacement through homology-directed repair (HDR) using an exogenous DNA template, respectively. The improved features of the TALENs provided herein, e.g., the improved specificity of some of the TALENs provided herein, will typically allow for such methods and applications to be carried out with greater efficiency. All methods and applications suitable for the use of TALENs, and performed with the TALENs provided herein, are contemplated and are within the scope of this disclosure. For example, the instant disclosure provides the use of the TALENs provided herein in any method suitable for the use of TALENs as described in Boch, Jens (February 2011). "TALEs of genome targeting". Nature Biotechnology 29 (2): 135-6. doi:10.1038/nbt.1767. PMID 21301438; Boch, Jens; et al. (December 2009). "Breaking the Code of DNA Binding Specificity of TAL-Type III Effectors". Science 326 (5959): 1509-12. Bibcode:2009Sci...326.1509B. doi:10.1126/science.1178811. PMID 19933107; Moscou, Matthew J.; Adam J. Bogdanove (December 2009). "A Simple Cipher Governs DNA Recognition by TAL Effectors". Science 326 (5959): 1501. Bibcode:2009Sci...326.1501M. doi:10.1126/science.1178817. PMID 19933106; Christian, Michelle; et al. (October 2010). "Targeting DNA Double-Strand Breaks with TAL Effector Nucleases". Genetics 186 (2): 757-61. doi:10.1534/genetics.110.120717. PMC 2942870. PMID 20660643; Li, Ting; et al. (August 2010). "TAL nucleases (TALNs): hybrid proteins composed of TAL effectors and FokI DNA-cleavage domain". Nucleic Acids Research 39: 1-14. doi:10.1093/nar/gkq704. PMC 3017587. PMID 20699274; Mahfouz, Magdy M.; et al. (February 2010). "De novo-engineered transcription activator-like effector (TALE) hybrid nuclease with novel DNA binding specificity creates double-strand breaks". PNAS 108 (6): 2623-8. Bibcode:2011PNAS..108.2623M. doi:10.1073/pnas.1019533108. PMC 3038751. PMID 21262818; Cermak, T.; Doyle, E. L.; Christian, M.; Wang, L.; Zhang, Y.; Schmidt, C.; Baller, J. A.; Somia, N. V. et al. (2011). "Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting". Nucleic Acids Research. doi:10.1093/nar/gkr218; Miller, Jeffrey; et al. (February 2011). "A TALE nuclease architecture for efficient genome editing". Nature Biotechnology 29 (2): 143-8. doi:10.1038/nbt.1755. PMID 21179091; Hockemeyer, D.; Wang, H.; Kiani, S.; Lai, C. S.; Gao, Q.; Cassady, J. P.; Cost, G. J.; Zhang, L. et al. (2011). "Genetic engineering of human pluripotent cells using TALE nucleases". Nature Biotechnology 29 (8). doi:10.1038/nbt.1927; Wood, A. J.; Lo, T.-W.; Zeitler, B.; Pickle, C. S.; Ralston, E. J.; Lee, A. H.; Amora, R.; Miller, J. C. et al. (2011). "Targeted Genome Editing Across Species Using ZFNs and TALENs". Science 333 (6040): 307. doi:10.1126/science.1207773. PMC 3489282. PMID 21700836; Tesson, L.; Usal, C.; Menoret, S. V.; Leung, E.; Niles, B. J.; Remy, S. V.; Santiago, Y.; Vincent, A. I. et al. (2011). "Knockout rats generated by embryo microinjection of TALENs". Nature Biotechnology 29 (8): 695. doi:10.1038/nbt.1940; Huang, P.; Xiao, A.; Zhou, M.; Zhu, Z.; Lin, S.; Zhang, B. (2011). "Heritable gene targeting in zebrafish using customized TALENs". Nature Biotechnology 29 (8): 699. doi:10.1038/nbt.1939; Doyon, Y.; Vo, T. D.; Mendel, M. C.; Greenberg, S. G.; Wang, J.; Xia, D. F.; Miller, J. C.; Urnov, F. D. et al. (2010). "Enhancing zinc-finger nuclease activity with improved obligate heterodimeric architectures". Nature Methods 8 (1): 74-79. doi:10.1038/nmeth.1539. PMID 21131970; Szczepek, M.; Brondani, V.; Büchel, J.; Serrano, L.; Segal, D. J.; Cathomen, T. (2007). "Structure-based redesign of the dimerization interface reduces the toxicity of zinc-finger nucleases". Nature Biotechnology 25 (7): 786. doi:10.1038/nbt1317. PMID 17603476; Guo, J.; Gaj, T.; Barbas III, C. F. (2010). "Directed Evolution of an Enhanced and Highly Efficient FokI Cleavage Domain for Zinc Finger Nucleases". Journal of Molecular Biology 400 (1): 96. doi:10.1016/j.jmb.2010.04.060. PMC 2885538. PMID 20447404; Mussolino, C.; Morbitzer, R.; Lütge, F.; Dannemann, N.; Lahaye, T.; Cathomen, T. (2011). "A novel TALE nuclease scaffold enables high genome editing activity in combination with low toxicity". Nucleic Acids Research. doi:10.1093/nar/gkr597; Zhang, Feng; et al. (February 2011). "Efficient construction of sequence-specific TAL effectors for modulating mammalian transcription". Nature Biotechnology 29 (2): 149-53. doi:10.1038/nbt.1775. PMC 3084533. PMID 21248753; Morbitzer, R.; Elsaesser, J.; Hausner, J.; Lahaye, T. (2011). "Assembly of custom TALE-type DNA binding domains by modular cloning". Nucleic Acids Research. doi:10.1093/nar/gkr151; Li, T.; Huang, S.; Zhao, X.; Wright, D. A.; Carpenter, S.; Spalding, M. H.; Weeks, D. P.; Yang, B. (2011). "Modularly assembled designer TAL effector nucleases for targeted gene knockout and gene replacement in eukaryotes". Nucleic Acids Research. doi:10.1093/nar/gkr188; Geißler, R.; Scholze, H.; Hahn, S.; Streubel, J.; Bonas, U.; Behrens, S. E.; Boch, J. (2011). "Transcriptional Activators of Human Genes with Programmable DNA-Specificity". In Shiu, Shin-Han. PLoS ONE 6 (5): e19509. doi:10.1371/journal.pone.0019509; Weber, E.; Gruetzner, R.; Werner, S.; Engler, C.; Marillonnet, S. (2011). "Assembly of Designer TAL Effectors by Golden Gate Cloning". In Bendahmane, Mohammed. PLoS ONE 6 (5): e19722. doi:10.1371/journal.pone.0019722; Sander et al. Targeted gene disruption in somatic zebrafish cells using engineered TALENs. Nature Biotechnology Vol 29:697-98 (5 Aug. 2011); Sander, J. D.; Cade, L.; Khayter, C.; Reyon, D.; Peterson, R. T.; Joung, J. K.; Yeh, J. R. J. (2011). "Targeted gene disruption in somatic zebrafish cells using engineered TALENs". Nature Biotechnology 29 (8): 697. doi:10.1038/nbt.1934; the entire contents of each of which are incorporated herein by reference.

In some embodiments, the TALENs, TALEN domains, TALEN-encoding or TALEN domain-encoding nucleic acids, compositions, and reagents described herein are isolated. In some embodiments, the TALENs, TALEN domains, TALEN-encoding or TALEN domain-encoding nucleic acids, compositions, and reagents described herein are purified, e.g., at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% pure.

Some aspects of this disclosure provide methods of cleaving a target sequence in a nucleic acid molecule using an inventive TALEN as described herein. In some embodiments, the method comprises contacting a nucleic acid molecule comprising the target sequence with a TALEN binding the target sequence under conditions suitable for the TALEN to bind and cleave the target sequence. In some embodiments, the TALEN is provided as a monomer. In some embodiments, the inventive TALEN monomer is provided in a composition comprising a different TALEN monomer that can dimerize with the first inventive TALEN monomer to form a heterodimer having nuclease activity. In some embodiments, the inventive TALEN is provided in a pharmaceutical composition. In some embodiments, the target sequence is in a cell.
In some embodiments, the target sequence is in the genome of a cell. In some embodiments, the target sequence is in a subject. In some embodiments, the method comprises administering a composition, e.g., a pharmaceutical composition, comprising the TALEN to the subject in an amount sufficient for the TALEN to bind and cleave the target site.

Some aspects of this disclosure provide methods of preparing engineered TALENs. In some embodiments, the method comprises replacing at least one amino acid in the canonical N-terminal TALEN domain and/or the canonical C-terminal TALEN domain with an amino acid having no charge or a negative charge at physiological pH; and/or truncating the N-terminal TALEN domain and/or the C-terminal TALEN domain to remove a positively charged fragment; thus generating an engineered TALEN having an N-terminal domain and/or a C-terminal domain of decreased net charge. In some embodiments, the at least one amino acid being replaced comprises a cationic amino acid or an amino acid having a positive charge at physiological pH. In some embodiments, the amino acid replacing the at least one amino acid is a cationic amino acid or a neutral amino acid. In some embodiments, the truncated N-terminal TALEN domain and/or the truncated C-terminal TALEN domain comprises less than 90%, less than 80%, less than 70%, less than 60%, less than 50%, less than 40%, less than 30%, or less than 25% of the residues of the respective canonical domain. In some embodiments, the truncated C-terminal domain comprises less than 60, less than 50, less than 40, less than 30, less than 29, less than 28, less than 27, less than 26, less than 25, less than 24, less than 23, less than 22, less than 21, or less than 20 amino acid residues. In some embodiments, the truncated C-terminal domain comprises 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, or 10 amino acid residues. In some embodiments, the method comprises replacing at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, or at least 15 amino acids in the canonical N-terminal TALEN domain and/or in the canonical C-terminal TALEN domain with an amino acid having no charge or a negative charge at physiological pH. In some embodiments, the amino acid being replaced is arginine (R) or lysine (K). In some embodiments, the amino acid residue having no charge or a negative charge at physiological pH is glutamine (Q) or glycine (G). In some embodiments, the method comprises replacing at least one lysine or arginine residue with a glutamine residue.

In some embodiments, the improved TALENs provided herein are designed and/or generated by recombinant technology. In some embodiments, designing and/or generating comprises designing a TALE repeat array that specifically binds a desired target sequence, or a half-site thereof.

Some aspects of this disclosure provide kits comprising an engineered TALEN as provided herein, or a composition (e.g., a pharmaceutical composition) comprising such a TALEN. In some embodiments, the kit comprises an excipient and instructions for contacting the TALEN with the excipient to generate a composition suitable for contacting a nucleic acid with the TALEN. In some embodiments, the excipient is a pharmaceutically acceptable excipient.

Typically, the kit will comprise a container housing the components of the kit, as well as written instructions stating how the components of the kit should be stored and used.

The function and advantage of these and other embodiments of the present invention will be more fully understood from the Examples below. The following Examples are intended to illustrate the benefits of the present invention and to describe particular embodiments, but are not intended to exemplify the full scope of the invention. Accordingly, it will be understood that the Examples are not meant to limit the scope of the invention.

EXAMPLES

Example 1

Materials and Methods

Oligonucleotides, PCR and DNA Purification

All oligonucleotides were purchased from Integrated DNA Technologies (IDT). Oligonucleotide sequences are listed in Table 10. PCR was performed with 0.4 µL of 2 U/µL Phusion Hot Start II DNA polymerase (Thermo-Fisher) in 50 µL with 1x HF Buffer, 0.2 mM dNTP mix (0.2 mM dATP, 0.2 mM dCTP, 0.2 mM dGTP, 0.2 mM dTTP) (NEB), 0.5 µM to 1 µM of each primer and a program of 98° C., 1 min; 35 cycles of 98° C., 15 s; 62° C., 15 s; 72° C., 1 min unless otherwise noted. Many DNA reactions were purified with a QIAquick PCR Purification Kit (Qiagen) referred to below as Q-column purification or MinElute PCR Purification Kit (Qiagen) referred to below as M-column purification.

TALEN Construction

The canonical TALEN plasmids were constructed by the FLASH method with each TALEN targeting 10-18 base pairs. N-terminal mutations were cloned by PCR with Q5 Hot Start Master Mix (NEB) (98° C., 22 s; 62° C., 15 s; 72° C., 7 min) using phosphorylated TAL-N1fwd (for N1), phosphorylated TAL-N2fwd (for N2), or phosphorylated TAL-N3fwd (for N3) and phosphorylated TAL-Nrev as primers. 1 µL DpnI (NEB) was added and the reaction was incubated at 37° C. for 30 min then M-column purified. ~25 ng of eluted DNA was blunt-end ligated intramolecularly in 10 µL 2x Quick Ligase Buffer, 1 µL of Quick Ligase (NEB) in a total volume of 20 µL at room temperature (~21° C.) for 15 min. 1 µL of this ligation reaction was transformed into Top10 chemically competent cells (Invitrogen). C-terminal domain mutations were cloned by PCR using TAL-Cifwd and TAL-Cirev primers, then Q-column purified. ~1 ng of this eluted DNA was used as the template for PCR with TAL-Cifwd and either TAL-Q3 (for Q3) or TAL-Q7 (for Q7) for primers, then Q-column purified. ~1 ng of this eluted DNA was used as the template for PCR with TAL-Cifwd and TAL-Cirev for primers, then Q-column purified. ~1 µg of this DNA fragment was digested with HpaI and BamHI in 1x NEBuffer 4 and cloned into ~2 µg of desired TALEN plasmid pre-digested with HpaI and BamHI.

In Vitro TALEN Expression

TALEN proteins, all containing a 3xFLAG tag, were expressed by in vitro transcription/translation. 800 ng of TALEN-encoding plasmid or no plasmid ("empty lysate" control) was added to an in vitro transcription/translation reaction using the TNT® Quick Coupled Transcription/Translation System, T7 Variant (Promega) in a final volume of 20 µL at 30° C. for 1.5 h. Western blots were used to visualize protein using the anti-FLAG M2 monoclonal antibody (Sigma-Aldrich). TALEN concentrations were calculated by comparison to standard curve of 1 ng to 16 ng N-terminally FLAG-tagged bacterial alkaline phosphatase (Sigma-Aldrich).

In Vitro Selection for DNA Cleavage

Pre-selection libraries were prepared with 10 pmol of oligo libraries containing partially randomized target half-site sequences (CCR5A, ATM, or CCR5B) and fully randomized 10- to 24-bp spacer sequences (Table 10). Oligonucleotide libraries were separately circularized by incubation with 100 units of CircLigase II ssDNA Ligase (Epicentre) in 1x CircLigase II Reaction Buffer (33 mM Tris-acetate, 66 mM potassium acetate, 0.5 mM dithiothreitol, pH 7.5) supplemented with 2.5 mM MnCl2 in 20 µL total for 16 h at 60° C. then incubated at 80° C. for 10 min. 2.5 µL of each circularization reaction was used as a substrate for rolling-circle amplification at 30° C. for 16 h in a 50-µL reaction using the Illustra TempliPhi 100 Amplification Kit (GE Healthcare). The resulting concatemerized libraries were quantified with Quant-iT™ PicoGreen® dsDNA Kit (Invitrogen) and libraries with different spacer lengths were combined in an equimolar ratio.

For selections on the CCR5B sequence libraries, 500 ng of pre-selection library was digested for 2 h at 37° C. in 1x NEBuffer 3 with in vitro transcribed/translated TALEN plus empty lysate (30 µL total). For all CCR5B TALENs, in vitro transcribed/translated TALEN concentrations were quantified by Western blot (during the blot, TALENs were stored for 16 h at 4° C.) and then TALEN was added to 40 nM final concentration per monomer. For selections on CCR5A and ATM sequence libraries, the combined pre-selection library was further purified in a 300,000 MWCO spin column (Sartorius) with three 500-µL washes in 1x NEBuffer 3. 125 ng pre-selection library was digested for 30 min at 37° C. in 1x NEBuffer 3 with a total 24 µL of fresh in vitro transcribed/translated TALENs and empty lysate. For all CCR5A and ATM TALENs, 6 µL of in vitro transcription/translation left TALEN and 6 µL of right TALEN were used, corresponding to a final concentration in a cleavage reaction of 16 nM ± 2 nM or 12 nM ± 1.5 nM for CCR5A or ATM TALENs, respectively. These TALEN concentrations were quantified by Western blot performed in parallel with digestion.

For all selections, the TALEN-digested library was incubated with 1 µL of 100 µg/µL RNase A (Qiagen) for 2 min and then Q-column purified.

sequences) were isolated and purified by Q-column. ~2 ng of eluted DNA was amplified by PCR for 5 to 8 cycles with #2B adapter primers (Table 10A) and purified by M-column. 10 µL of eluted DNA was purified using 12 µL of AMPure XP beads (Agencourt) and quantified with an Illumina/Universal Library Quantification Kit (Kapa Biosystems). DNA was prepared for high-throughput DNA sequencing according to Illumina instructions and sequenced using a MiSeq DNA Sequencer (Illumina) using a 12 pM final solution and 156-bp paired-end reads. To prepare the pre-selection library for sequencing, the pre-selection library was digested with 1 µL to 4 µL of the appropriate restriction enzyme (CCR5A=Tsp45I, ATM=Acc65I, CCR5B=AvaI (NEB)) for 1 h at 37° C. then ligated as described above with 2 pmol of heated and cooled #1 library adapters (Table 10A). Pre-selection library DNA was prepared as described above using #2A library adapter primers and #2B library adapter primers in place of #2A adapter primers and #2B adapter primers, respectively (Table 10A). The resulting pre-selection library DNA was sequenced together with the TALEN-digested samples.

Discrete In Vitro TALEN Cleavage Assays

Discrete DNA substrates for TALEN digestion were constructed by combining pairs of oligonucleotides as specified in Table 9B with restriction cloning¹⁴ into pUC19 (NEB). Corresponding cloned plasmids were amplified by PCR (59° C. annealing for 15 s) for 24 cycles with pUC19Ofwd and pUC19Orev primers (Table 10B) and Q-column purified. 50 ng of amplified DNA was digested in 1x NEBuffer 3 with 3 µL each of in vitro transcribed/translated TALEN left and right monomers (corresponding to a ~16 nM to ~12 nM final TALEN concentration), and 6 µL of empty lysate in a total reaction volume of 120 µL. The digestion reaction was incubated for 30 min at 37° C., then incubated with 1 µL of 100 µg/µL RNase A (Qiagen) for 2 min and purified by M-column. The entire 10 µL of eluted DNA with glycerol added to 15% was analyzed on a 5% TBE 18-well Criterion PAGE gel (Bio-Rad) for 45 min at 200 V, then stained with 1x SYBR Gold (Invitrogen) for 10 min. Bands were visualized and quantified on an AlphaImager HP (Alpha Innotech).

Cellular TALEN Cleavage Assays
50 uL of purified DNA was incu TALENs were cloned into mammalian expression vectors bated with 3 ul of 10 mM dNTP mix (10 mM dATP, 10 mM 12 and the resulting TALEN vectors transfected into U2OS dCTP, 10 mM dGTP, 10 mM dTTP) (NEB), 6 uL of 10x EGFP cells as previously described.' Genomic DNA was NEBuffer 2, and 1 uL of 5 U?uL Klenow Fragment DNA isolated after 2 days as previously described.' For each Polymerase (NEB) for 30 min at room temperature and 50 assay, 50 ng of isolated genomic DNA was amplified by PCR Q-column purified. 50LL of the eluted DNA was ligated with 98° C., 15 s 67.5°C., 15 s; 72° C., 22 s for 35 cycles with 2 pmol of heated and cooled #1 adapters containing barcodes pairs of primers with or without 4% DMSO as specified in corresponding to each sample (selections with different Table 10C. The relative DNA content of the PCR reaction for TALEN concentrations or constructs) (Table 10A). Ligation each genomic site was quantified with Quant-iTTM was performed in 1xT4 DNA Ligase Buffer (50 mM Tris 55 Pico Green(R) dsDNA Kit (Invitrogen) and then pooled into an HCl, 10 mM MgCl2, 1 mM ATP, 10 mM DTT, pH 7.5) with equimolar mixture, keeping no-TALEN and all TALEN 1 uL of 400U/uLT4DNA ligase (NEB) in 60L total volume treated samples separate. DNA corresponding to 150 to 350 for 16 hat room temperature, then Q-column purified. bp was purified by PAGE as described above. 6 uL of the eluted DNA was amplified by PCR in 150 uL 44 uL of eluted DNA was incubated with 5 uL of 1xT4 total reaction volume (divided into 3x50 uL reactions) for 14 60 DNA Ligase Buffer and 1 uL of 10 U/LL Polynucleotide to 22 cycles using the #2A adapter primers in Table 10A. The kinase (NEB) for 30 min at 37° C. and Q-column purified. 43 PCR products were purified by Q-column. 
Each DNA sample uL of eluted DNA was incubated with 1 uL of 10 mM dATP was quantified with Quant-iTTM PicoGreen(R) dsDNA Kit (In (NEB), 5 L of 10x NEBuffer 2, and 1 uL of 5 U/uL DNA vitrogen) and then pooled into an equimolar mixture. 500 ng Klenow Fragment (3'->5' exo-) (NEB) for 30 min at 37° C. of pooled DNA was run a 5% TBE 18-well Criterion PAGE 65 and purified by M-column. 10 u, of eluted DNA was ligated gel (BioRad) for 30min at 200 V and DNAs of length-230 bp as above with 10 pmol of heated and cooled G (genomic) (corresponding to 1.5 target site repeats plus adapter adapters (Table 10A). 8 ul of eluted DNA was amplified by US 9,359,599 B2 41 42 PCR for 6 to 8 cycles with G-B primers containing barcodes LFLANK=Left Flank Sequence (designed as a single random corresponding to each sample. Each sample DNA was quan base) tified with Quant-iTTM PicoGreen(R) dsDNA Kit (Invitrogen) LHS-Left Half Site Sequence and then pooled into an equimolar mixture. The combined RHS-Right Half Site Sequence DNA was subjected to high throughput sequencing using a 5 RFLANK-Right Flank Sequence (designed as a single ran dom base) MiSeq as described above. CONSTANT=Constant Sequence Data Analysis (CCR5A=CGTCACGCTCACCACT (SEQ ID NO: 166), CCR5B=CCTCGGGACTCCACGCT (SEQID NO:167), ATM=GGTACCCCACTCCGCGT (SEQ ID NO: 168)) Illumina sequencing reads were filtered and parsed with 10 4) Search for best instances of each half site in the full site, scripts written in Unix Bash as outlined in the Algorithms accept any sequences with proper left and right half-site section. The Source code is available upon request. Specificity order of left then right. 
scores were calculated as previously described."' Statistical 5) With half site positions determine corresponding spacer analysis on the distribution of number of mutations in various (sequence between the two half sites), left flank and right TALEN selections in Table 3 was performed as previously 15 flank sequences (sequence between half sites and constant described." Statistical analysis of modified sites in Table 7 sequences). was performed as previously described.'' 6) Determine sequence end by taking sequence from the start Algorithms of read after the 5bp barcode sequence to the beginning of All scripts were written in bash or MATLAB. the constant sequence. SEQUENCESTART-RHS-RFLANK-CONSTANT Computational Filtering of Pre-Selection Sequences 7) Filter by sequencing read quality scores, accepting and Selected Sequences sequences with quality scores of A or better across three fourths of the half site positions. For Pre-selection Sequences 8) Selected sequences were filtered by sequence end, by 1) Search for 16 bp constant sequence 25 accepting only sequences with sequence ends in the spacer that were 2.5-fold more abundant than the amount of (CCR5A=CGTCACGCTCACCACT (SEQ ID NO: 166), sequence end background calculated as the mean of the CCR5B=CCTCGGGACTCCACGCT (SEQID NO:167), number of sequences with ends Zero to five base pairs into ATM=GGTACCCCACTCCGCGT (SEQ ID NO: 168)) each half-site from the spacer side (sequence end back immediately after first 4 bases read (random bases), accept ground number was calculated for both half sites with the ing only sequences with the 16 bp constant sequence allow- 30 closest half site to the sequence end utilized as sequence ing for one mutation. end background for comparison). 
2) Search for 9 bp final sequence at a position at least the Computational Search for Genomic Off-Target Sites minimum possible full site length away and up to the max Related to the CCR5B Target Site full site length away from constant sequence to confirm the 1) The Patmatch program was used to search the human presence of a full site, accept only sequences with this 9bp 35 genome (GRCh37/hg19 build) for pattern sequences as final sequence. (Final sequence: CCR5A=CGTCACGCT, follows: CCR5B left half-site sequence (L16, L13 or L10) CCR5B=CCTCGGGAC, ATM=GGTACGTGC) NNNNNNNNN ... CCR5B right half-site sequence (R16, 3) Search for best instances of each half site in the full site, R13 or R10)M,0,0) where number of Ns varied from 12 to accept any sequences with proper left and right half-site 25 order of left then right. 40 and M (indicating mutations allowed) varied from 0 to 14. 4) Determine DNA spacer sequence between the two half 2) The number of output off-target sites were de-cumulated sites, the single flanking nucleotide to left of the left half since the program outputs all sequences with X or fewer site and single flanking nucleotide to right of the right mutations, resulting in the number of off-target sites in the half-site (sequence between half sites and constant human genome that are a specific number of mutations sequences). 45 away from the target site. 5) Filter by sequencing read quality scores, accepting Identification of Indels in Sequences of Genomic Sites sequences with quality scores of A or better across three 1) For each sequence the primer sequence was used to iden fourths of the half site positions. tify the genomic site. 
For Selected Sequences 2) Sequences containing the reference genomic sequence 1) Output to separate files all sequence reads and position 50 corresponding to 8 bp to the left of the target site and quality scores of all sequences starting with correct 5 bp reference genomic sequence 8bp (or 6 bp for genomic sites barcodes corresponding to different selection conditions. at the very end of sequencing reads) to the right of the full 2) Search for the initial 16 bp sequence immediately after the target site were considered target site sequences. 5bp barcode repeated at a position at least the minimum 3) Any target site sequences corresponding to the same size as possible full site length away and up to the max full site 55 the reference genomic site were considered unmodified length away from initial sequence to confirm the presence and any sequences not the reference size were aligned with of a full site with repeated sequence, accept only sequences ClustalW to the reference genomic site. with a 16 bp repeat allowing for 1 mutation. 4) Aligned sequences with more than two insertions or two 3) Search for 16 bp constant sequence within the full site, deletions in the DNA spacer sequence between the two accept only sequences with a constant sequence allowing 60 half-site sequences were considered indels. for one mutation. Parse sequence to start with constant Results sequence plus 5' sequence to second instance of repeated sequence then initial sequence after barcode to constant Specificity Profiling of TALENs Targeting CCR5 sequence resulting in constant sequences Sandwiching the and ATM equivalent of one full site: 65 CONSTANT-LFLANK-LHS-SPACER-RHS-RFLANK We profiled the specificity of 41 heterodimeric TALEN CONSTANT pairs (hereafter referred to as TALENs) in total, comprising US 9,359,599 B2 43 44 TALENs targeting left and right half-sites of various lengths the in vitro selection results to train a machine-learning algo and TALENs with different domain variants. 
Each of the 41 rithm to generate potential TALEN off-target sites in the TALENs was designed to target one of three distinct human genome. This computational step was necessary sequences, which we refer to as CCR5A, CCR5B, or ATM, in because the preselection libraries coverall sequences with six two different human genes, CCR5 and ATM (FIG. 7). We or fewer mutations, while almost all potential off-target sites used an improved version of a previously described in vitro in the human genome for CCR5 and ATM sequences differ at selection method' with modifications that increase the more than six positions relative to the target sequence. The throughput and sensitivity of the selection (FIG. 1B). Briefly, preselection libraries of >10' DNA sequences algorithm calculates the posterior probability of each nucle each were digested with 3 nM to 40 nM of an in vitro trans otide in each position of a target to occur in a sequence that lated TALEN. These concentrations correspond to -20 to 10 was cleaved by the TALENs in opposition to sequences from -200 dimeric TALEN molecules perhuman cell nucleus,' a the target library that were not observed to be cleaved." relatively low level of cellular protein expression.’’ These posterior probabilities were then used to score the Cleaved library members contained a free 5' monophosphate likelihood that the TALEN used to train the algorithm would that was captured by adapter ligation and isolated by gel cleave every possible target sequence in the human genome purification (FIG. 1B). In the control sample, all members of 15 with monomer spacing of 10 to 30 bps. Using the machine the pre-selection library were cleaved by a restriction endo learning algorithm, we identified 36 CCR5A and 36 ATM nuclease at a constant sequence to enable them to be captured TALEN off-target sites that differ from the on-target by adapter ligation and isolated by gel purification. High sequence at seven to fourteen positions (Table 6). 
throughput sequencing of TALEN-treated or control samples The 72 best-scoring genomic off-target sites for CCR5A Surviving this selection process and computational analysis and ATMTALENs were amplified from genomic DNA puri revealed the abundance of all TALEN-cleaved sequences as fied from human U2OS-EGFP cells 12 expressing either well as the abundance of the corresponding sequences before CCR5A or ATMTALENs. Sequences containing insertions selection. The enrichment value for each library member or deletions of three or more base pairs in the DNA spacer of Surviving selection was calculated by dividing its post-selec the potential genomic off-target sites and present in signifi tion sequence abundance by its preselection abundance. The 25 cantly greater numbers in the TALEN-treated samples versus pre-selection DNA libraries were sufficiently large that they the untreated control sample were considered TALEN-in each contain, in theory, at least ten copies of all possible DNA duced modifications. Of the 35 CCR5A off-target sites that sequences with six or fewer mutations relative to the on-target we successfully amplified, we identified six off-target sites Sequence. For all 41 TALENs tested, the DNA that survived the with TALEN-induced modifications; likewise, of the 31 ATM selection contained significantly fewer mean mutations in the 30 off-target sites that we successfully amplified, we observed targeted half-sites than were present in the pre-selection seven off-target sites with TALEN-induced modifications libraries (Table 3 and 4). For example, the mean number of (FIG.3 and Table 7). The inspection of modified on-target and mutations in DNA sequences Surviving selection after treat off-target sites yielded a prevalence of deletions ranging from ment with TALENs targeting 18-bp left and right half-sites three to dozens of base pairs (FIG. 
3), consistent with previ was 4.06 for CCR5A and 3.18 for ATM sequences, respec 35 ously described characteristics of TALEN-induced genomic tively, compared to 7.54 and 6.82 mutations in the corre modification. sponding pre-selection libraries (FIGS. 2A and 2B). For all These results collectively indicate that the in vitro selection selections, the on-target sequences were enriched by 8- to data, processed through a machine-learning algorithm, can 640-fold (Table 5). To validate our selection results in vitro, predict bona fide off-target substrates that undergo TALEN we assayed the ability of the CCR5B TALENs targeting 40 induced modification in human cells. TALE Repeats Produc 13-bp left and right half-sites (L13+R13) to cleave each of 16 tively Bind Base Pairs with Relative Independence The exten diverse off-target substrates (FIGS. 2E and 2F). The resulting sive number of quantitatively characterized off-target discrete in vitro cleavage efficiencies correlated well with the substrates in the selection data enabled us to assess whether observed enrichment values (FIG. 2G). mutations at one position in the target sequence affect the To determine the specificity at each position in the TALEN 45 ability of TALEN repeats to productively bind other posi target site for all four possible base pairs, a specificity score tions. We generated an expected enrichment value for every was calculated as the difference between pre-selection and possible double-mutant sequence for the L13+R13 CCR5B post-selection base pair frequencies, normalized to the maxi TALENs assuming independent contributions from the two mum possible change of the pre-selection frequency from corresponding single-mutation enrichments. In general, the complete specificity (defined as 1.0) to complete anti-speci 50 predicted enrichment values closely resembled the actual ficity (defined as -1.0). 
For all TALENs tested, the targeted observed enrichment values for each double-mutant base pair at every position in both half-sites is preferred, with sequence (FIG. 14A), Suggesting that component single the sole exception of the base pair closest to the spacer for mutations independently contributed to the overall cleavabil some ATM TALENs at the right-half site (FIG. 2C, 2D and ity of double-mutant sequences. The difference between the FIGS. 8 through 13). The 5'T nucleotide recognized by the 55 observed and predicted double-mutant enrichment values N-terminal domain is highly specified, and the 5' DNA end was relatively independent of the distance between the two (the N-terminal TALEN end) generally exhibits higher speci mutations, except that two neighboring mismatches were ficity than the 3' DNA end; both observations are consistent slightly better tolerated than would be expected (FIG. 14B). with previous reports. 'Taken together, these results show To determine the potential interdependence of more than that the selection data accurately predicts the efficiency of 60 two mutations, we evaluated the relationship between selec off-target TALEN cleavage in vitro, and that TALENs are tion enrichment values and the number of mutations in the overall highly specific across the entire target sequence. post-selection target for the L13+R13 CCR5B TALEN (FIG. 4A, black line). For 0 to 5 mutations, enrichment values TALEN Off Target Cleavage in Cells closely followed a simple exponential function of the mean 65 number of mutations (m) (Table 8). This relationship is con To test if off-target cleavage activities reported by the sistent with a model in which each Successive mutation selection are relevant to off-target cleavage in cells, we used reduces the binding energy by a constant amount (AG), result US 9,359,599 B2 45 46 ing in an exponential decrease in TALEN binding (Keq (m)) 4C). As a result, longer TALENs are predicted to be more such that Keq (m)-eAG*m. 
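The read filter, enrichment value, specificity score, and independence test described in this section can be illustrated with a short Python sketch. This is not the authors' code (their scripts were written in bash or MATLAB and are not reproduced here), and every function name is hypothetical. Two points are assumptions: the two-branch scaling in `specificity_score` is one reading of "normalized to the maximum possible change", and `expected_double_mutant` assumes that independent contributions mean single-mutation enrichments multiply after normalization to the on-target enrichment.

```python
def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def has_constant(read, constant, offset, max_mismatches=1):
    """True if `constant` sits at `offset` in `read` with at most
    `max_mismatches` substitutions (the text allows one mutation)."""
    window = read[offset:offset + len(constant)]
    return len(window) == len(constant) and hamming(window, constant) <= max_mismatches


def enrichment(post_count, post_total, pre_count, pre_total):
    """Post-selection abundance divided by pre-selection abundance.
    Abundance is taken as a fraction of reads (an assumption)."""
    return (post_count / post_total) / (pre_count / pre_total)


def specificity_score(pre_freq, post_freq):
    """Change in base-pair frequency scaled so that 1.0 is complete
    specificity and -1.0 is complete anti-specificity."""
    if not 0.0 < pre_freq < 1.0:
        raise ValueError("pre-selection frequency must lie strictly between 0 and 1")
    change = post_freq - pre_freq
    if change >= 0:
        return change / (1.0 - pre_freq)  # largest possible increase
    return change / pre_freq              # largest possible decrease


def expected_double_mutant(on_target, single_a, single_b):
    """Expected double-mutant enrichment if the two single mutations act
    independently (assumed multiplicative model, relative to on-target)."""
    return single_a * single_b / on_target
```

For instance, `has_constant(read, "CGTCACGCTCACCACT", offset=4)` would check a pre-selection read for the CCR5A constant sequence right after the four random bases, and comparing `expected_double_mutant(...)` with a measured double-mutant enrichment is the kind of comparison the independence analysis relies on.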
The observed exponential relationship therefore suggests that the mean reduction in binding energy from a typical mismatch is independent of the number of mismatches already present in the TALEN:DNA interaction. Collectively, these results indicate that TALE repeats bind their respective DNA base pairs independently beyond a slightly increased tolerance for adjacent mismatches.

Longer TALENs are Less Specific Per Recognized Base Pair

The independent binding of TALE repeats simplistically predicts that TALEN specificity per base pair is independent of target-site length. To experimentally characterize the relationship between TALE array length and off-target cleavage, we constructed TALENs targeting 10, 13, or 16 bps (including the 5' T) for both the left (L10, L13, L16) and right (R10, R13, R16) half-sites. TALENs representing all nine possible combinations of left and right CCR5B TALENs were subjected to in vitro selection. The results revealed that shorter TALENs have greater specificity per targeted base pair than longer TALENs (Table 3). For example, sequences cleaved by the L10+R10 TALEN contained a mean of 0.032 mutations per recognized base pair, while those cleaved by the L16+R16 TALEN contained a mean of 0.067 mutations per recognized base pair. For selections with the longest CCR5B TALENs targeting 16+16 base pairs or CCR5A and ATM TALENs targeting 18+18 bp, the mean selection enrichment values do not follow a simple exponential decrease as a function of mutation number (FIG. 4A and Table 8).

We hypothesized that excess binding energy from the larger number of TALE repeats in longer TALENs reduces specificity by enabling the cleavage of sequences with more mutations, without a corresponding increase in the cleavage of sequences with fewer mutations, because the latter are already nearly completely cleaved. Indeed, the in vitro cleavage efficiencies of discrete DNA sequences for these longer TALENs are independent of the presence of a small number of mutations in the target site (FIGS. 5C-5F), suggesting there is nearly complete binding and cleavage of sequences containing few mutations. Likewise, higher TALEN concentrations also result in decreased enrichment values of sequences with few mutations while increasing the enrichment values of sequences with many mutations (Table 5). These results together support a model in which excessive TALEN binding arising from either long TALE arrays or high TALEN concentrations decreases observed TALEN DNA cleavage specificity of each recognized base pair.

Longer TALENs Induce Less Off-Target Cleavage in a Genomic Context

Although longer TALENs are more tolerant of mismatched sequences (FIG. 4A) than shorter TALENs, in the human genome there are far fewer closely related off-target sites for a longer target site than for a shorter target site (FIG. 4B). Since off-target site abundance and cleavage efficiency both contribute to the number of off-target cleavage events in a genomic context, we calculated overall genome cleavage specificity as a function of TALEN length by multiplying the extrapolated mean enrichment value of mutant sequences of a given length with the number of corresponding mutant sequences in the human genome. The decrease in potential off-target site abundance resulting from the longer target site length is large enough to outweigh the decrease in specificity per recognized base pair observed for longer TALENs (FIG.

specific against the set of potential cleavage sites in the human genome than shorter TALENs for the tested TALEN lengths targeting 20- to 32-bp sites.

Engineering TALENs with Improved Specificity

The findings above suggest that TALEN specificity can be improved by reducing non-specific DNA binding energy beyond what is needed to enable efficient on-target cleavage. The most widely used 63-aa C-terminal domain between the TALE repeat array and the FokI nuclease domain contains ten cationic residues. We speculated that reducing the cationic charge of the canonical TALE C-terminal domain would decrease non-specific DNA binding and improve TALEN specificity.

We constructed two C-terminal domain variants in which three ("Q3", K788Q, R792Q, R801Q) or seven ("Q7", K777Q, K778Q, K788Q, R789Q, R792Q, R793Q, R801Q) cationic Arg or Lys residues in the canonical 63-aa C-terminal domain were mutated to Gln. We performed in vitro selections on CCR5A and ATM TALENs containing the canonical, engineered Q3, and engineered Q7 C-terminal domains, as well as a previously reported 28-aa truncated C-terminal domain with a theoretical net charge identical to that of the Q7 C-terminal domain (-1).

The on-target sequence enrichment values for the CCR5A and ATM selections increased substantially as the net charge of the C-terminal domain decreased (FIGS. 5A and 5B). For example, the ATM selections resulted in on-target enrichment values of 510, 50, and 20 for the Q7, Q3, and canonical 63-aa C-terminal variants, respectively. These results suggest that the TALEN variants in which cationic residues in the C-terminal domain have been partially replaced by neutral residues or completely removed are substantially more specific in vitro than the TALENs that contain the canonical 63-aa C-terminal domain. Similarly, mutating one, two, or three cationic residues in the TALEN N-terminus to Gln also increased cleavage specificity (Table 5, and FIGS. 8-11).

In order to confirm the greater DNA cleavage specificity of Q7 over canonical 63-aa C-terminal domains in vitro, a representative collection of 16 off-target DNA substrates was digested in vitro with TALENs containing either canonical or engineered Q7 C-terminal domains. ATM and CCR5A TALENs with the canonical 63-aa C-terminal domain demonstrate comparable in vitro cleavage activity on target sites with zero, one, or two mutations (FIGS. 5C-5F). In contrast, for 11 of the 16 off-target substrates tested, the engineered Q7 TALEN variants showed substantially higher (~4-fold or greater) discrimination against off-target DNA substrates with one or two mutations than the canonical 63-aa C-terminal domain TALENs, even though the Q7 TALENs cleaved their respective on-target sequences with comparable or greater efficiency than TALENs with the canonical 63-aa C-terminal domains (FIGS. 5C-5F). Overall, the discrete cleavage assays are consistent with the selection results and indicate that TALENs with engineered Q7 C-terminal domains are substantially more specific than TALENs with canonical 63-aa C-terminal domains in vitro.

Improved Specificity of Engineered TALENs in Human Cells

To determine if the increased specificity of the engineered TALENs observed in vitro also applies in human cells, TALEN-induced modification rates of the on-target and top 36 predicted off-target sites were measured for CCR5A and ATM TALENs containing all six possible combinations of the canonical 63-aa, Q3, or Q7 C-terminal domains and the EL/KK or ELD/KKR FokI domains (12 TALENs total).

interact with the base pair adjacent to the spacer (targeted by the most C-terminal TALE repeat) (FIGS. 10 and 11). To
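The genome-context estimate described in this section combines two quantities: how strongly sequences with a given number of mutations are enriched, and how many such sequences the genome contains. The Python sketch below shows one way these pieces could fit together. It rests on several assumptions that do not come from the source: enrichment is extrapolated with a log-linear fit (matching the simple exponential dependence reported for 0 to 5 mutations), cumulative site counts of the kind a pattern search returns are de-cumulated into exact counts, and the products are summed over mutation counts. All names are hypothetical and no real data are used.

```python
import math


def decumulate(cumulative_counts):
    """Turn counts of sites with <= m mutations (m = 0, 1, 2, ...)
    into counts of sites with exactly m mutations."""
    exact, previous = [], 0
    for count in cumulative_counts:
        exact.append(count - previous)
        previous = count
    return exact


def fit_log_linear(mutations, enrichments):
    """Least-squares fit of ln(enrichment) = intercept + slope * m.
    Returns (slope, intercept)."""
    logs = [math.log(e) for e in enrichments]
    n = len(mutations)
    mean_m = sum(mutations) / n
    mean_log = sum(logs) / n
    sxx = sum((m - mean_m) ** 2 for m in mutations)
    sxy = sum((m - mean_m) * (y - mean_log) for m, y in zip(mutations, logs))
    slope = sxy / sxx
    return slope, mean_log - slope * mean_m


def extrapolated_enrichment(slope, intercept, m):
    """Enrichment predicted by the exponential model for m mutations."""
    return math.exp(intercept + slope * m)


def genomic_offtarget_load(slope, intercept, exact_site_counts):
    """Sum over m of (sites with exactly m mutations) x (predicted enrichment).
    Index m of `exact_site_counts` is the mutation count; pass 0 at index 0
    to leave the on-target site out of the total."""
    return sum(n * extrapolated_enrichment(slope, intercept, m)
               for m, n in enumerate(exact_site_counts))
```

As a usage note, `decumulate([1, 3, 10, 25])` returns `[1, 2, 7, 15]`. Running the same calculation for target sites of different lengths, each with its own fitted slope and its own site counts, would give the kind of length comparison the text describes.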
compare the broad specificity profiles of canonical TALENs For both FokI variants, the TALENs with Q3 C-terminal with those containing engineered C-terminal or N-terminal domains demonstrate significant on-target activities ranging 5 domains, the specificity scores of each target base pair from from 8% to 24% modification, comparable to the activity of selections using CCR5A and ATMTALENs with the canoni TALENs with the canonical 63-aa C-terminal domains. TAL cal, Q3, or Q7 C-terminal domains and N1, N2, or N3 N-ter ENs with canonical 63-aa or Q3 C-terminal domains and the minal domains were subtracted by the corresponding speci ELD/KKR FokI domain are both more active in modifying ficity scores from selections on the canonical TALEN the CCR5A and ATM on-target site in cells than the corre 10 (canonical 63-aa C-terminal domain, wild-type N-terminal sponding TALENs with the Q7 C-terminal domain by ~5-fold domain). (FIG. 6 and Table 7). The results are shown in FIG. 15. Mutations in the C-ter Consistent with the improved specificity observed in vitro, minal domain that increase specificity did so most strongly in the engineered Q7 TALENs are more specific than the Q3 the middle and at the C-terminal end of each half-site. Like variants, which in turn are more specific than the canonical 15 wise, the specificity-increasing mutations in the N-terminus 63-aa C-terminal domain TALENs. Compared to the canoni tended to increase specificity most strongly at positions near cal 63-aa C-terminal domains, TALENs with Q3 C-terminal the TALENN-terminus (5' DNA end) although mutations in domains demonstrate a mean increase in cellular specificity the N-terminus of ATM TALEN targeting the right half-site (defined as the ratio of the cellular modification percentage did not significantly alter specificity. 
These results are con for on-target to off-target sites) of more than 13-fold and more sistent with a local binding compensation model in which than 9-fold for CCR5A and ATM sites, respectively, with the weaker binding at either terminus demands increased speci ELD/KKR FokI domain (Table 7). These mean improve ficity in the TALE repeats near this terminus. To characterize ments can only be expressed as lower limits due to the the effects of TALEN concentration on specificity, the speci absence or near-absence of observed cleavage events by the ficity scores from selections of ATM and CCR5A TALENs engineered TALENs for many off-target sequences. For the 25 performed at three different concentrations ranging from 3 most abundantly cleaved off-target site (CCR5A off-target nM to 16 nM were each subtracted by the specificity scores of site #5), the Q3 C-terminal domain is 34-fold more specific corresponding selections performed at the highest TALEN (FIG. 6), and the Q7 C-terminal domain is >1 16-fold more concentration assayed, 24 nM for ATM, or 32 nM for specific, than the canonical 63-aa C-terminal domain. CCR5A. The results (FIG. 15) indicate that specificity scores Together, these results reveal that for targeting the CCR5 30 increase fairly uniformly across the half-sites as the concen and ATM sequences, replacing the canonical 63-aa C-termi tration of TALEN is decreased. nal domain with the engineered Q3 C-terminal domain results in comparable activity for the on-target site in cells, a 34-fold DNA Spacer-Length and Cut-Site Preferences improvement in specificity in cells for the most readily cleaved off-target site, and a consistent increase in specificity 35 To assess the spacer-length preference of various TALEN for other off-target sites. When less activity is required, the architectures (C-terminal mutations, N-terminal mutations, engineered Q7 C-terminal domain offers additional gains in and FokI variants) and various TALEN concentrations, the specificity. 
enrichment values of library members with 10- to 24-base pairspacer lengths in each of the selections with CCR5A and Engineering N-Terminal Domains for Improved 40 ATM TALEN with various combinations of the canonical, TALEN DNA Cleavage Specificity Q3, Q7, or 28-aa C-terminal domains; N1, N2, or N3 N-ter minal mutations; and the EL/KK or ELD/KKR FokI variants The model of TALEN binding and specificity described at 4 nM to 32 nM CCR5A and ATMTALEN were calculated herein predicts that reducing excess TALEN binding energy (FIG. 16). All of the tested concentrations, N-terminal vari will increase TALEN DNA cleavage specificity. To further 45 ants, C-terminal variants, and FokI variants demonstrated a test this prediction and potentially further augment TALEN broad DNA spacer-length preference ranging from 14- to specificity, we mutated one (“N1, K150O), two (“N2, 24-base pairs with three notable exceptions. First, the K150O and K153Q), or three (“N3, K150O, K153Q, and CCR5A 28-aa C-terminal domain exhibited a much narrower R154CR) Lys or Arg residues to Gln in the N-terminal domain DNA spacer-length preference than the broader DNA spacer of TALENs targeting CCR5A and ATM. These N-terminal 50 length preference of the canonical C-terminal domain, con residues have been shown in previous studies to bind non sistent with previous reports. Second, the CCR5ATAL specifically to DNA, and mutations at these specific residues ENs containing Q7 C-terminal domains showed an increased to neutralize the cationic charge decrease non-specific DNA tolerance for 12-base spacers compared to the canonical binding energy. We hypothesized the reduction in non-spe C-terminal domain variant (FIG.16). This slightly broadened cific binding energy from these N-terminal mutations would 55 spacer-length preference may reflect greater conformational decrease excess TALEN binding energy resulting in flexibility in the Q7 C-terminal domain, perhaps resulting increased specificity. 
In vitro selections on these three from a smaller number of non-specific protein: DNA interac TALEN variants revealed that the less cationic N-terminal tions along the TALEN:DNA interface. Third, the ATMTAL TALENs indeed exhibit greater enrichment values of on ENs with Q7 C-terminal domains and the ATMTALENs with target cleavage (Table 5). 60 N3 mutant N-terminal domains showed a narrowed spacer preference. Effects of N-Terminal and C-Terminal Domains and These more specific TALENs (Table 5) with lower DNA TALEN Concentration on Specificity binding affinity may have faster off-rates that are competitive with the rate of cleavage of non-optimal DNA spacer lengths, All TALEN constructs tested specifically recognize the 65 altering the observed spacer-length preference. While previ intended base pair across both half-sites (FIGS. 8 to 13), ous reports have focused on the length of the TALEN C-ter except that some of the ATM TALENs do not specifically minal domain as a primary determinant of DNA spacer US 9,359,599 B2 49 50 length preference, these results suggest the net charge of the Our model and the resulting improved TALENs would have C-terminal domain as well as overall DNA-binding affinity been difficult to derive from cellular off-target cleavage meth can also affect TALEN spacer-length preference. ods, which are intrinsically limited by the small number of We also characterized the location of TALEN DNA cleav sequences closely related to a target sequence of interest that age within the spacer. We created histograms reporting the are present in a genome, or from SELEX experiments with number of spacer DNA bases observed preceding the right monomeric TALE repeat arrays, which do not measure DNA half-site in each of the sequences from the selections with cleavage activity and therefore does not characterize active, CCR5A and ATMTALEN with various combinations of the dimeric TALENs. 
In contrast, each TALEN in this study was canonical, Q3, Q7, or 28-aa C-terminal domains; N1, N2, or evaluated for its ability to cleave any of 10' close variants of N3 N-terminal mutations; and the EL/KK or ELD/KKR FokI 10 variants (FIG. 17). The peaks in the histogram were inter its on-target sequence, a library size several orders of magni preted to represent the most likely locations of DNA cleavage tude greater than the number of different sequences in a within the spacer. The cleavage positions are dependent on mammalian genome. This dense coverage of off-target the length of the DNA spacer between the TALEN binding sequence space enabled the elucidation of detailed relation half-sites, as might be expected from conformational con 15 ships between DNA-cleavage specificity and target base pair straints imposed by the TALEN C-terminal domain and DNA position, TALE repeat length, TALEN concentration, mis spacer lengths. match location, and engineered TALEN domain composi Discussion tion. The in vitro selection of 41 TALENs challenged with 10'’ closed related off-target sequences and Subsequent analysis Example 2 inform our understanding of TALENspecificity through four key findings: (i) TALENs are highly specific for their intended target base pair at all positions with specificity A number of TALENs were generated in which at least one increasing near the N-terminal TALEN end of each TALE cationic amino acid residue of the canonical N-terminal repeat array (corresponding to the 5' end of the bound DNA); 25 domain sequence was replaced with an amino acid residue (ii) longer TALENs are more specific in a genomic context that exhibits no charge or a negative charge at physiological while shorter TALENs have higher specificity per nucleotide; pH. The TALENs comprised substitutions of glycine (G) (iii) TALE repeats each bind their respective base pair rela and/or glutamine (Q) in their N-terminal domains (see FIG. 
tively independently; and (iv) excess DNA-binding affinity 18). An evaluation of the cutting preferences of the engi leads to increased TALEN activity against off-target sites and 30 neered TALENs demonstrated that mutations to glycine (G) therefore decreased specificity. are equivalent to glutamine (Q). Mutating the positively The observed decrease in specificity for TALENs with charged amino acids in the TALEN N-terminal domain more TALE repeats or more cationic residues in the C-termi (K150O, K153Q, and R154C) result in similar decreases in nal domain or N-terminus are consistent with a model in binding affinity and off-target cleavage for mutations to either which excess TALEN binding affinity leads to increased pro 35 miscuity. Excess binding energy could also explain the pre Q or G. For example, TALENs comprising the M3 and M4 viously reported promiscuity at the 5' terminal T of TALENs N-terminus, which comprises the same amino acid (R154) with longer C-terminal domains' and is also consistent with mutated to either Q or G, respectively, demonstrated roughly a report of higher TALEN protein concentrations resulting in equivalent amounts of cleavage. Similarly TALENs compris more off-target site cleavage in vivo. While decreasing 40 ing the M6 and M8 N-terminus, varying only in whether Q or TALEN protein expression in cells in theory could reduce G substitutions were introduced at positions K150 and R154, off-target cleavage, the Kd values of some TALEN constructs and TALENs comprising the M9 and M10 N-terminus, vary for their target DNA sequences are likely already comparable ing only in whether Q or G substitutions were introduced at to, or below, the theoretical minimum protein concentration positions K150, K153, and R154, showed similar cleavage in a human cell nucleus, -0.2 nM.' 45 activity. 
The difficulty of improving the specificity of such TALENs by lowering their expression levels, coupled with the need to Example 3 maintain sufficient TALEN concentrations to effect desired levels of on-target cleavage, highlight the value of engineer ing TALENs with higher intrinsic specificity such as those 50 A plasmid was generated for cloning and expression of described in this work. Our findings suggest that mutant engineered TALENs as provided herein. A map of the plas C-terminal domains with reduced non-specific DNA binding mid is shown in FIG. 19. The plasmid allows for the modular may be used to fine-tune the DNA-binding affinity of TAL cloning of N-terminal and C-terminal domains, e.g., engi ENs such that on-target sequences are cleaved efficiently but neered domains as provided herein, and for TALE repeats, with minimal excess binding energy, resulting in better dis 55 thus generating a recombinant nucleic acid encoding the crimination between on-target and off-target sites. Since desired engineered TALEN. The plasmid also encodes amino TALENs targeting up to 46 total base pairs have been shown acid tags, e.g., an N-terminal FLAG tag and a C-terminal V5 to be active in cells,' the results presented here are consistent tag, which can, optionally be utilized for purification or detec with the notion that specificity may be even further improved tion of the encoded TALEN. Use of these tags is optional and one of skill in the art will understand that the TALEN-encod by engineering TALENs with a combination of mutant N-ter 60 minal and C-terminal domains that impart reduced non-spe ing sequences will have to be cloned in-frame with the tag cific DNA binding, a greater number of TALE repeats to encoding sequences in order to result in a tagged TALEN contribute additional on-target DNA binding, and the more protein being encoded. 
specific (but lower-affinity) NK RVD to recognize G.''' An exemplary sequence of a as illustrated in Our study has identified more bona fide TALEN genomic 65 FIG. 19 is provided below. Those of skill in the art will off-target sites than other studies using methods such as understand that the sequence below is illustrative of an exem SELEX or integrase-deficient lentiviral vectors (IDLVs). plary embodiment and does not limit this disclosure.

AAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCT

CCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACC

AGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTT

CTCCCTTCGGGAAGCGTGGCGCTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAA

GCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACC

CGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCT

ACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGACAGTATTTGGTATCTGCGCTCTGCTGAAGCC

AGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTT

GCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAG

TGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTA

AAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGG

CACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGG

GAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAAT

AAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTT

GCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTG

TCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTT

GTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGG

TTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACC

AAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACA

TAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGA

GATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGA

GCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCT

TTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATA

AACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGACGTC

REFERENCES

1. Moscou, M. J. & Bogdanove, A. J. A simple cipher governs DNA recognition by TAL effectors. Science 326, 1501 (2009).
2. Boch, J. et al. Breaking the code of DNA binding specificity of TAL-type III effectors. Science 326, 1509-1512 (2009).
3. Doyon, Y. et al. Enhancing zinc-finger-nuclease activity with improved obligate heterodimeric architectures. Nat Methods 8, 74-79 (2011).
4. Cade, L. et al. Highly efficient generation of heritable zebrafish gene mutations using homo- and heterodimeric TALENs. Nucleic Acids Res 40, 8001-8010 (2012).
5. Miller, J. C. et al. A TALE nuclease architecture for efficient genome editing. Nat Biotechnol 29, 143-148 (2011).
6. Bedell, V. M. et al. In vivo genome editing using a high-efficiency TALEN system. Nature 491, 114-118 (2012).
7. Hockemeyer, D. et al. Genetic engineering of human pluripotent cells using TALE nucleases. Nat Biotechnol 29, 731-734 (2011).
8. Cermak, T. et al. Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting. Nucleic Acids Res 39, e82 (2011).
9. Tesson, L. et al. Knockout rats generated by embryo microinjection of TALENs. Nat Biotechnol 29, 695-696 (2011).
10. Moore, F. E. et al. Improved somatic mutagenesis in zebrafish using transcription activator-like effector nucleases (TALENs). PLoS One 7, e37877 (2012).
11. Wood, A. J. et al. Targeted genome editing across species using ZFNs and TALENs. Science 333, 307 (2011).
12. Reyon, D. et al. FLASH assembly of TALENs for high-throughput genome editing. Nat Biotechnol 30, 460-465 (2012).
13. Mussolino, C. et al. A novel TALE nuclease scaffold enables high genome editing activity in combination with low toxicity. Nucleic Acids Res 39, 9283-9293 (2011).
14. Pattanayak, V., Ramirez, C. L., Joung, J. K. & Liu, D. R. Revealing off-target cleavage specificities of zinc-finger nucleases by in vitro selection. Nat Methods 8, 765-770 (2011).
15. Li, T. et al. Modularly assembled designer TAL effector nucleases for targeted gene knockout and gene replacement in eukaryotes. Nucleic Acids Res 39, 6315-6325 (2011).
16. Ding, Q. et al. A TALEN Genome-Editing System for Generating Human Stem Cell-Based Disease Models. Cell Stem Cell (2012).
17. Lei, Y. et al. Efficient targeted gene disruption in Xenopus embryos using engineered transcription activator-like effector nucleases (TALENs). Proc Natl Acad Sci USA 109, 17484-17489 (2012).
18. Kim, Y. et al. A library of TAL effector nucleases spanning the human genome. Nat Biotechnol 31, 251-258 (2013).
19. Dahlem, T. J. et al. Simple methods for generating and detecting locus-specific mutations induced with TALENs in the zebrafish genome. PLoS Genet 8, e1002861 (2012).
20. Osborn, M. J. et al. TALEN-based Gene Correction for Epidermolysis Bullosa. Molecular Therapy (2013).
21. Maul, G. G. & Deaven, L. Quantitative determination of nuclear pore complexes in cycling cells with differing DNA content. J Cell Biol 73, 748-760 (1977).
22. Huang, B. et al. Counting low-copy number proteins in a single cell. Science 315, 81-84 (2007).
23. Beck, M. et al. The quantitative proteome of a human cell line. Mol Syst Biol 7, 549 (2011).
24. Meckler, J. F. et al. Quantitative analysis of TALE-DNA interactions suggests polarity effects. Nucleic Acids Res (2013).
25. Christian, M. L. et al. Targeting G with TAL effectors: a comparison of activities of TALENs constructed with NN and NK repeat variable di-residues. PLoS One 7, e45383 (2012).
26. Sander, J. D. et al. Abstraction of zinc finger nuclease cleavage profiles reveals an expanded landscape of off-target mutations. Submitted (2013).
27. Witten, I. H. & Frank, E. Data mining: practical machine learning tools and techniques, Edn. 2nd. (Morgan Kaufman, San Francisco: 2005).
28. Kim, Y., Kweon, J. & Kim, J. S. TALENs and ZFNs are associated with different mutation signatures. Nat Methods 10, 185 (2013).
29. McNaughton, B. R., Cronican, J. J., Thompson, D. B. & Liu, D. R. Mammalian cell penetration, siRNA transfection, and DNA transfection by supercharged proteins. Proc Natl Acad Sci USA 106, 6111-6116 (2009).
30. Sun, N., Liang, J., Abil, Z. & Zhao, H. Optimized TAL effector nucleases (TALENs) for use in treatment of sickle cell disease. Mol Biosyst 8, 1255-1263 (2012).
31. Cong, L., Zhou, R., Kuo, Y. C., Cunniff, M. & Zhang, F. Comprehensive interrogation of natural TALE DNA-binding modules and transcriptional repressor domains. Nat Commun 3, 968 (2012).
32. Gabriel, R. et al. An unbiased genome-wide analysis of zinc-finger nuclease specificity. Nat Biotechnol 29, 816-823 (2011).
33. Gao, H., Wu, X., Chai, J. & Han, Z. Crystal structure of a TALE protein reveals an extended N-terminal DNA binding region. Cell Res 22, 1716-1720 (2012).
34. Li, T. et al. Modularly assembled designer TAL effector nucleases for targeted gene knockout and gene replacement in eukaryotes. Nucleic Acids Res 39, 6315-6325 (2011).
35. Miller, J. C. et al. A TALE nuclease architecture for efficient genome editing. Nat Biotechnol 29, 143-148 (2011).
36. Mahfouz, M. M. et al. De novo-engineered transcription activator-like effector (TALE) hybrid nuclease with novel DNA binding specificity creates double-strand breaks. Proc Natl Acad Sci USA 108, 2623-2628 (2011).
37. Pattanayak, V., Ramirez, C. L., Joung, J. K. & Liu, D. R. Revealing off-target cleavage specificities of zinc-finger nucleases by in vitro selection. Nat Methods 8, 765-770 (2011).
38. Sander, J. D. et al. Abstraction of zinc finger nuclease cleavage profiles reveals an expanded landscape of off-target mutations. Submitted (2013).
39. Yan, T. et al. PatMatch: a program for finding patterns in peptide and nucleotide sequences. Nucleic Acids Res 33, W262-266 (2005).
40. Larkin, M. A. et al. Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947-2948 (2007).

All publications, patents, patent applications, publication, and database entries (e.g., sequence database entries) mentioned herein, e.g., in the Background, Summary, Detailed Description, Examples, and/or References sections, are hereby incorporated by reference in their entirety as if each individual publication, patent, patent application, publication, and database entry was specifically and individually incorporated herein by reference. In case of conflict, the present application, including any definitions herein, will control.

EQUIVALENTS AND SCOPE

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. The scope of the present invention is not intended to be limited to the above description, but rather is as set forth in the appended claims.

In the claims articles such as "a," "an," and "the" may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include "or" between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention also includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process.

Furthermore, it is to be understood that the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, descriptive terms, etc., from one or more of the claims or from relevant portions of the description is introduced into another claim. For example, any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim. Furthermore, where the claims recite a composition, it is to be understood that methods of using the composition for any of the purposes disclosed herein are included, and methods of making the composition according to any of the methods of making disclosed herein or other methods known in the art are included, unless otherwise indicated or unless it would be evident to one of ordinary skill in the art that a contradiction or inconsistency would arise.

Where elements are presented as lists, e.g., in Markush group format, it is to be understood that each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It is also noted that the term "comprising" is intended to be open and permits the inclusion of additional elements or steps. It should be understood that, in general, where the invention, or aspects of the invention, is/are referred to as comprising particular elements, features, steps, etc., certain embodiments of the invention or aspects of the invention consist, or consist essentially of, such elements, features, steps, etc. For purposes of simplicity those embodiments have not been specifically set forth in haec verba herein. Thus for each embodiment of the invention that comprises one or more elements, features, steps, etc., the invention also provides embodiments that consist or consist essentially of those elements, features, steps, etc.

Where ranges are given, endpoints are included. Furthermore, it is to be understood that unless otherwise indicated or otherwise evident from the context and/or the understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise. It is also to be understood that unless otherwise indicated or otherwise evident from the context and/or the understanding of one of ordinary skill in the art, values expressed as ranges can assume any subrange within the given range, wherein the endpoints of the subrange are expressed to the same degree of accuracy as the tenth of the unit of the lower limit of the range.

In addition, it is to be understood that any particular embodiment of the present invention may be explicitly excluded from any one or more of the claims. Where ranges are given, any value within the range may explicitly be excluded from any one or more of the claims. Any embodiment, element, feature, application, or aspect of the compositions and/or methods of the invention, can be excluded from any one or more claims. For purposes of brevity, all of the embodiments in which one or more elements, features, purposes, or aspects is excluded are not set forth explicitly herein.

TABLES

TABLE 2

TALEN constructs and concentrations used in the selections.

                                              Target  Left + Right  Site    N-terminal  C-terminal  FokI     TALEN
Selection name                                site    half-site     length  domain      domain      domain   conc. (nM)

A.
CCR5A 32 nM canonical                         CCR5A   L18 + R18     36      canonical   Canonical   EL/KK    32
CCR5A 16 nM canonical (or CCR5A32 canonical)  CCR5A   L18 + R18     36      canonical   Canonical   EL/KK    16
CCR5A 8 nM canonical                          CCR5A   L18 + R18     36      canonical   Canonical   EL/KK    8
CCR5A 4 nM canonical                          CCR5A   L18 + R18     36      canonical   Canonical   EL/KK    4
CCR5A Q3                                      CCR5A   L18 + R18     36      canonical   Q3          EL/KK    16
CCR5A 32 nM Q7                                CCR5A   L18 + R18     36      canonical   Q7          EL/KK    32
CCR5A 16 nM Q7                                CCR5A   L18 + R18     36      canonical   Q7          EL/KK    16
CCR5A 8 nM Q7                                 CCR5A   L18 + R18     36      canonical   Q7          EL/KK    8
CCR5A 4 nM Q7                                 CCR5A   L18 + R18     36      canonical   Q7          EL/KK    4
CCR5A 26-aa                                   CCR5A   L18 + R18     36      canonical   26-aa       EL/KK    16
CCR5A N1                                      CCR5A   L18 + R18     36      N1          Canonical   EL/KK    16
CCR5A N2                                      CCR5A   L18 + R18     36      N2          Canonical   EL/KK    16
CCR5A N3                                      CCR5A   L18 + R18     36      N3          Canonical   EL/KK    16
CCR5A canonical ELD/KKR                       CCR5A   L18 + R18     36      canonical   Canonical   ELD/KKR  16
CCR5A Q3 ELD/KKR                              CCR5A   L18 + R18     36      canonical   Q3          ELD/KKR  16
CCR5A Q7 ELD/KKR                              CCR5A   L18 + R18     36      canonical   Q7          ELD/KKR  16
CCR5A N2 ELD/KKR                              CCR5A   L18 + R18     36      N2          Canonical   ELD/KKR  16

B.
ATM 32 nM canonical                           ATM     L18 + R18     36      canonical   Canonical   EL/KK    24
ATM 16 nM canonical (or ATM canonical)        ATM     L18 + R18     36      canonical   Canonical   EL/KK    12
ATM 8 nM canonical                            ATM     L18 + R18     36      canonical   Canonical   EL/KK    6
ATM 4 nM canonical                            ATM     L18 + R18     36      canonical   Canonical   EL/KK    3
ATM Q3                                        ATM     L18 + R18     36      canonical   Q3          EL/KK    12
ATM 32 nM Q7                                  ATM     L18 + R18     36      canonical   Q7          EL/KK    24
ATM 16 nM Q7 (or ATM Q7)                      ATM     L18 + R18     36      canonical   Q7          EL/KK    12
ATM 6 nM Q7                                   ATM     L18 + R18     36      canonical   Q7          EL/KK    6
ATM 4 nM Q7                                   ATM     L18 + R18     36      canonical   Q7          EL/KK    3
ATM 26-aa                                     ATM     L18 + R18     36      canonical   26-aa       EL/KK    12
ATM N1                                        ATM     L18 + R18     36      N1          Canonical   EL/KK    12
ATM N2                                        ATM     L18 + R18     36      N2          Canonical   EL/KK    12
ATM N3                                        ATM     L18 + R18     36      N3          Canonical   EL/KK    12
ATM canonical ELD/KKR                         ATM     L18 + R18     36      canonical   Canonical   ELD/KKR  12
ATM Q3 ELD/KKR                                ATM     L18 + R18     36      canonical   Q3          ELD/KKR  12
ATM Q7 ELD/KKR                                ATM     L18 + R18     36      canonical   Q7          ELD/KKR  12
ATM N2 ELD/KKR                                ATM     L18 + R18     36      N2          Canonical   ELD/KKR  12

C.
L16 + R16 CCR5B                               CCR5B   L16 + R16     32      canonical   Canonical   EL/KK    10
L16 + R13 CCR5B                               CCR5B   L16 + R13     29      canonical   Canonical   EL/KK    10
L16 + R10 CCR5B                               CCR5B   L16 + R10     26      canonical   Canonical   EL/KK    10
L13 + R16 CCR5B                               CCR5B   L13 + R16     29      canonical   Canonical   EL/KK    10
L13 + R13 CCR5B                               CCR5B   L13 + R13     26      canonical   Canonical   EL/KK    10
L13 + R10 CCR5B                               CCR5B   L13 + R10     23      canonical   Canonical   EL/KK    10
L10 + R16 CCR5B                               CCR5B   L10 + R16     26      canonical   Canonical   EL/KK    10
L10 + R13 CCR5B                               CCR5B   L10 + R13     23      canonical   Canonical   EL/KK    10
L10 + R10 CCR5B                               CCR5B   L10 + R10     20      canonical   Canonical   EL/KK    10

For each selection using TALENs targeting the CCR5A target sequence (A), ATM target sequence (B), and CCR5B target sequence (C), the selection name, the target DNA site, the TALEN N-terminal domain, the TALEN C-terminal domain, the TALEN FokI domain, and the TALEN concentration (conc.) are shown.

TABLE 3

Statistics of sequences selected by TALEN digestion.

Selection name    Seq. count    Mean mut.    Stdev mut.    Mut./bp    P-value vs. library    P-value vs. other TALENs

A.

CCRSA 32M S3883 4.327 1463 0.120 3.3E-10 vs. CCR5A canonical canonical ELDKKR = 0.260 CCRSA 16 nM 2894O 4.061 1436 O. 113 5.4E-10 vs. CCR5A Q3 canonical ELDKKR = 0.026 CCRSA 8 nM 29568 3.751 1.394 O.104 3.3E-10 canonical CCRSA 4 nM 343SS 3.347 1.355 O.093 15E-10 canonical CCR5A Q3 S1694 3.841 1380 0.107 CCR5A32nM Q7 48473 2.718 1.197 0.076 CCR5A 16 nM Q7 56593 2.559 1.154 0.071 CCR5A 8 nM Q7 43895 2.303 1.157 0.064 CCR5A 4 nM Q7 43737 2.018 1.234 0.056 CCRSA 28-aa. 47395 2.614 1203 0.073 CCRSAN 1 642S7 3.721 1379 0.103 wS. CCRSA 8 nM canonical = 0.039 CCRSAN2 4S467 3.148 1.306 0.087 CCRSAN3 24O64 2.474 1493 O.O69 CCRSA 46998 4.336 1491 (0.120 canonical ELDKKR CCR5A Q3 S6978 4.098 1.415 0.114 22E-10 ELDKKR CCR5A Q7 S4903 3.234 1330 O.O90 7.3E ELDKKR CCRSAN2 79632 3.286 1.341 0.091 5.2E ELDKKR B

ATM 24 nM 89571 3.262 1.36O 0.091 6.54E- vs. ATM canonical canonical ELDKKR = 0.012 ATM 12 nM 96.703 3.181 1.307 O.O88 S.36E canonical (or ATM canonical) ATM 6 nM 786S2 2.736 1.259 O.O76 3.63E canonical ATM3 nM 82527 2.552 1.258 0.071 2.71E canonical ATM Q3 96.582 2.SS1 1.248 0.071 2.31E- wS. ATM 4 nM canonical = 0.222 ATM 24 nM Q7 10166 1885 2.125 O.OS2 2.06E-10 ATM 12 nM Q7 4662. 1626 2.083 0.045 5.31E-10 (or ATM Q7) ATM 6 nM Q7 129O 1700 2.376 O.047 7.16E-09 wS. ATM 16 nM Q7 = 0.035 ATMN1 844O2 2.627 1318 0.073 2.92E-11 ATMN2 62470 2.317 1516 0.064 2.69E-11 ATMN3 16OS 2.720 2.363 0.076 2.69E-08 US 9,359,599 B2 63 TABLE 3-continued Statistics of Sequences Selected by TALEN digestion. Seq. Mean Stdev P-value P-value Selection name count mut. mut. Mut./bp vs. library vs. other TALENs ATM 10797O 3.279 1.329 O.O91 5.48E canonical ELDKKR ATM Q3 104099 2.846 1244 O.O79 3.15E ELDKKR ATM Q7 ELD/KKR 21108 1.444 1.56 0.040 3.02 ATMN2 ELD, KKR 701.85 2.45 1444 OO68OS 2.82 C

6 - R16 CCRSB 34904 2.134 1.168 OO67 6 - R13 CCRSB 38229 1581 1.142 O.OSS 6 - R1O CCRSB 378O1 1.187 0.949 O.O46 3 - R16 CCRSB 46608 1. SOS 1.090 O.OS2 3 - R13 CCRSB S3973 0.996 1.025 O.O38 3 - R1O CCRSB 60SSO 0.737 0.684 O.O32 O - R16 CCRSB 36927 1.387 0.971 O.OS3 O - R13 CCRSB S817O O.839 O.882 0.036 9. O - R1O CCRSB S7331 O.646 0.779 0.032 1.OE

Statistics are shown for each TALEN selection on the CCR5A target sequence (A), ATM target sequence (B), and CCR5B target sequences (C). Seq. count: total counts of high-throughput sequenced and computationally filtered selection sequences. Mean mut.: mean mutations in selected sequences. Stdev mut.: standard deviation of mutations in selected sequences. Mut./bp: mean mutation normalized to target site length (bp). P-value vs. library: P-values between the TALEN selection sequence distributions and the corresponding pre-selection library sequence distributions (Supplementary Table 4) were determined as previously reported. 5 P-value vs. other TALENs: all pair-wise comparisons between all TALEN digestions were calculated, and P-values between 0.01 and 0.5 are shown. Note that for the 3 nM Q7 ATM and the 28-aa ATM selections, not enough sequences were obtained to interpret, although these selections were performed.

TABLE 4

Statistics of sequences from pre-selection libraries.

                  Target  Left + Right  Site    Seq.     Mean   Stdev
Library name      site    half-site     length  count    mut.   mut.   Mut./bp

CCR5A Library     CCR5A   L18 + R18     36      158643   7.539  2.475  0.209
ATM Library       ATM     L18 + R18     36      212661   6.820  2.327  0.189
CCR5B Library     CCR5B   L16 + R16     32      280223   6.500  2.441  0.203
CCR5B Library     CCR5B   L16 + R13     29      280223   5.914  2.336  0.204
CCR5B Library     CCR5B   L16 + R10     26      280223   5.273  2.218  0.203
CCR5B Library     CCR5B   L13 + R16     29      280223   5.969  2.340  0.206
CCR5B Library     CCR5B   L13 + R13     26      280223   5.383  2.230  0.207
CCR5B Library     CCR5B   L13 + R10     23      280223   4.742  2.106  0.206
CCR5B Library     CCR5B   L10 + R16     26      280223   5.396  2.217  0.208
CCR5B Library     CCR5B   L10 + R13     23      280223   4.810  2.100  0.209
CCR5B Library     CCR5B   L10 + R10     20      280223   4.169  1.971  0.208

Statistics are shown for each pre-selection library containing a distribution of mutant sequences of the CCR5A target sequence, ATM target sequence, and CCR5B target sequences. Seq. count: total counts of high-throughput sequenced and computationally filtered selection sequences. Mean mut.: mean mutations of sequences. Stdev mut.: standard deviation of mutations of sequences. Mut./bp: mean mutation normalized to target site length (bp).
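The Mut./bp column is described as the mean number of mutations normalized to target site length. The following is a minimal Python sketch of that normalization, checked against a few rows of Table 4; the helper name `mutations_per_bp` and the data layout are illustrative assumptions and are not part of the disclosure.

```python
def mutations_per_bp(mean_mutations: float, site_length_bp: int) -> float:
    """Mean mutations normalized to target site length (hypothetical helper)."""
    if site_length_bp <= 0:
        raise ValueError("site length must be positive")
    return mean_mutations / site_length_bp


# (library name, left + right half-site, site length in bp, mean mutations),
# values taken from Table 4.
table4_rows = [
    ("CCR5A Library", "L18 + R18", 36, 7.539),
    ("ATM Library", "L18 + R18", 36, 6.820),
    ("CCR5B Library", "L16 + R16", 32, 6.500),
    ("CCR5B Library", "L10 + R10", 20, 4.169),
]

# Round to three decimals, the precision used in the table.
mut_per_bp = {
    (name, half_sites): round(mutations_per_bp(mean, length), 3)
    for name, half_sites, length, mean in table4_rows
}

for key, value in mut_per_bp.items():
    print(key, value)
```

For these rows the computed values agree with the tabulated Mut./bp entries, which is also a convenient consistency check when digits in a scanned table are in doubt.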

TABLE 5 Enrichment values of sequences as a function of number of mutations. Enrichment value

Selection    0 Mut.    1 Mut.    2 Mut.    3 Mut.    4 Mut.    5 Mut.    6 Mut.    7 Mut.    8 Mut.

A.

CCRSA 32M 9.879 9.191 8.335 6.149 4.205 2.269 1.OOS O.325 O.O85 canonical CCRSA 16 nM 12.182 13.2OO 10.322 7.195 4.442 2.127 O.748 O.216 O.OS2 canonical CCRSA 8 nM 19.673 17.935 13.731 8. SOS 4.512 1756 O.S31 O-116 O.O28 canonical CCRSA 4 nM 36.737 29.407 19.224 9.958 4.047 1242 O.302 0.058 O.O14 canonical CCR5A Q3 18.5SO 16.466 12.024 8.07O 4.632 1.938 O.S72 O.126 O.O26 CCR5A32nM Q7 60.583 S4.117 31.08.2 11.031 2.640 O469 O.O73 O.O13 O.OO6 CCR5A 16 nM Q7 62.294 64.689 35.036 10.538 2.163 O.322 O.046 O.O10 O.OO6 CCR5A 8 nM Q7 97.02O 91.633 38.634 8.974 1485 O.189 O.O29 O.O10 O.007 US 9,359,599 B2 65 66 TABLE 5-continued Enrichment values of Sequences as a function of number of mutations. Enrichment value

Selection O Mut. 1 Mut. 2 Mut. 3 Mut. 4 Mut. 5 Mut. 6 Mut. 7 Mut. 8 Mut. CCR5A 4 nM Q7 197239 130.497 38.361 6.535 O896 O.12O O.O2S O.019 CCRSA 28-aa. 70.441 62.213 33.481 10.486 2.317 O4O2 O.064 O.O12 CCRSAN 1 19.038 16.052 13.858 8.788 4.546 1.697 O499 O.115 CCRSAN2 41.715 35.752 22.638 10.424 3.777 O.989 O.194 O.O38 CCRSAN3 173.897 86.392 31.503 8.770 1853 O.3SO O.O89 O.036 CCRSA 8.101 10.012 8.22O 6.147 4.1.19 2.291 1.019 O.330 canonical ELDKKR CCR5A Q3 14.664 12.975 9.409 6.819 4.544 2.235 0.797 O.198 ELDKKR CCR5A Q7 37.435 32.922 21.033 10.397 3.867 1.087 O.238 O.046 ELDKKR CCRSAN2 35.860 31.459 20.13S 10.1.89 3.983 1155 O.260 O.OSO ELDKKR

ATM 24 nM 19.900 16.881 2.162 6.318 2.629 O.884 O.226 0.057 canonical ATM2 nM 20.472 17645 2.724 6.549 2.606 O.803 O. 189 O.007 canonical ATM 6 nM 41.141 29.522 7.153 1872 O431 O.062 canonical ATM3 nM 56.152 37.152 8.530 6.196 1.562 O.308 canonical ATM Q3 SO 403 36.687 9.031 6.245 1.513 O.294 0.057 O.O16 O.O10 ATM 24 nM Q7 353.148 90.3SO 3.475 1531 O.186 O.128 O. 116 O.118 O. 103 ATM 12 nM Q7 513.385 89.962 1310 O.860 O.190 O.093 O. 115 O.092 O. 111 ATM 6 nM Q7 644.427 82.074 7.650 0.677 O.170 O.2OS O.163 O.164 O.O71 ATMN1 57.218 35.388 7.808 6.124 1.644 O.383 O.O76 O.O23 O.O11 ATMN2 119.240 53-618 8.977 4.742 O.992 O.233 O.O76 O.044 O.O37 ATMN3 2O1.158 55.468 5.244 3.187 O.764 0.307 O.154 0.173 0.267 ATM 19356 15.692 1855 6.403 2.706 O.899 O.224 O.OS4 O.O11 canonical E LDKKR ATM Q3 32.816 25.151 6.172 6.727 2.095 O.S.06 O.095 O.018 O.004 LDKKR ATM Q7 447.509 93.166 3.505 1.543 O.170 O.049 O.045 LDKKR ATMN2 90.625 45.525 8.683 5.369 1.267 O.274 0.075 E LDKKR C

L16 - R16 CCR5B 59.422 35.499 3.719 3.770 L16 - R13 CCRSB 80.852 31434 7.754 1.380 L16 - R1O CCR5B 64.944 2O.OS6 3.867 0.515 L13 - R16 CCRSB 101.929 34.2SS 8.131 1.299 L13 - R13 CCRSB 113102 22.582 3.037 O.315 L13 - R1O CCRSB 74.085 11483 1.270 O.121 L1O - R16 CCR5B 60.186 22.393 S.286 0.777 L10 - R13 CCRSB 74.204 13.696 1673 O.152 L10 - R1O CCRSB 43.983 7.018 O.740 O.061

For each TALEN selection on the CCR5A target sequence (A), ATM target sequence (B), and CCR5B target sequence (C), enrichment values were calculated by dividing the fractional abundance of post-selection sequences from a TALEN digestion by the fractional abundance of pre-selection sequences as a function of total mutations (Mut.) in the half-sites.
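The enrichment calculation described in this legend can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the toy mutation counts are hypothetical, and the sequencing, filtering, and mutation-counting steps that would produce the inputs are not shown.

```python
from collections import Counter


def fractional_abundance(mutation_counts):
    """Map each mutation count to its fraction of all sequences."""
    tally = Counter(mutation_counts)
    total = sum(tally.values())
    if total == 0:
        raise ValueError("no sequences supplied")
    return {n_mut: count / total for n_mut, count in tally.items()}


def enrichment_values(pre_selection, post_selection):
    """Post-selection fraction divided by pre-selection fraction, per mutation count."""
    pre = fractional_abundance(pre_selection)
    post = fractional_abundance(post_selection)
    # The ratio is only defined for mutation counts present in the pre-selection library.
    return {n_mut: post.get(n_mut, 0.0) / pre[n_mut] for n_mut in sorted(pre)}


# Hypothetical toy data: number of half-site mutations in each sequenced library member.
pre_library = [0] * 1 + [1] * 4 + [2] * 10 + [3] * 5
post_library = [0] * 5 + [1] * 10 + [2] * 5

values = enrichment_values(pre_library, post_library)
print(values)
```

In this toy example, sequences with few mutations make up a larger fraction of the post-selection pool than of the pre-selection library, so their enrichment values exceed 1, while sequences with more mutations are depleted.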

TABLE 6 Predicted off-target sites in the human genome.

A.

CCR5A Site    Score    Mut.    Left half-site    Spacer length    Right half-site    Gene

OnCCR5A O. OO8 O TTCATTACACCTGCAGCT 18 AGTATCAATTCTGGAAGA CCR5

Off C-1 Of 47 9 TaCATCACAtaTGCAaat 29 tdTATCAt TTCTGGgAGA ARL17A & LRRC37A

Off C-2 Of 47 9 TaCATCACAtaTGCAaaT 29 tdTATCAt TTCTGGgAGA ARL17A &

Off C-3 Of 47 TaCATCACAtaTGCAaat 29 toTATCAt TTCTGGgAGA ARL17A & LRRC37A

Off C-4 Of 47 cCATaaCACaTottt CT 10 tGcATCAtTcCTGGAAGA ZSCANSA

Off C-5 O. 804 TcCAaTACct CTGCCaCa 14 AGgAgCAAcTCTGGgAGA

Off C-6 O. 818 (TCAgTcCAtCTGaAaac 16 gGTATCAt TTCTGGAgGA KL

OffC-7 O. 834 aCAaaACc CitTGCcaaa. 27 taTATCAATTtgGGgAGA

Off C-8 O. 837 cCAagACACCTGCttac 26 to TATCAATTtgGGgAGA

Off C-9 O874 TTCATaACAt CTtaAaaT 27 AaTAcCAAcTCTGGAt GA. ZEB2

Off C-10 cCAaaACAt CTGaAaaT 25 tdgATCAAaTtgGGAAGA

Off C-11 O. 896 (TCAgaACACaTGactac 21 tdTATCAgTTaTGGAtGA GABPA

Off C-12 O. 904 TcCATaAtAt CTt Cdt CT 28 gGgATtAATTtgGGAgGA

Off C-13 O. 905 TgCAaTAtACCTGttgat 16 citcATCAATTCTGGgtoA

Off C-14 O 906 TTCATaaCACtccacctT 16 gGTATCAAaTCTGGggGA SYN3

Off C-15 O 906 TTCAgaACACaTGactac 26 gCTATCtATcCTGGAAtA SPOCK3

Off C-16 O 906 TTCCTTcCACCagtgtcc 28 AGCATCAATCCTGGAAGA

Off C-17 O. 907 TTaaTaaCAt CTCCAaCT 24 gGcAcCAAaTCTGGAtGA ATP13A5

Off C-18 O. 909 TcCATCACCCCTC CotCc 10 gGTgcCAgcTCTGGAgGA

Off C-19 O. 909 TTCATTACtCCTCCtt CT 3 O ctTATCAcTTtTGGAAGA

Off C-2O O. 912 TgCATTACACaTtatdtg 17 AGcAgCAcTTCTGGAAGA

Off C-21 O. 913 TTCAaaACACaTaCAt CT 28 AacAaCAtTcCTGtAAGA PRKAG2

Off C-22 O. 913 TcCATTACcaGTGCAGat 25 gacATCAgTTaTGGAtGA

Off C-23 O. 925 TTC cagACcCCTt Cotca 13 gacATCAAaTCTGGgAGA

Off C-24 O. 927 TTC calaACACCCGCtt Co 26 taTATCctTTCTGGAAtA

Off C-25 gaAaTACACCTGCctaT 13 gGccTCAAgg CTGGAtGA IL15

Off C-26 gCoaaACctCTGtcaCo 22 AGgATCAcTTCTGGAAGA

Off C-27 O. 931 gCoaaACctCTGtcaCo 22 AGgATCAcTTCTGGAAGA

Off C-28 O. 931 TTtATTACACtTcCAGat 19 gaTATCctTTCTGGAAGA ADIPOR2

Off C-29 O. 932 CaCAaaAaACtTt CtGag 27 tdTATCAATTtgGGgAGA FBXL17

Off C-3 O O. 932 cCAaaACACCCaCAGac 19 gGTATagATTgTGGAAGA ZNF365

Off C-31 O. 93 4 TTCATTcCACaTc Cocac 25 gtTATCAACatgGGAAGA MYO18B

Off C-32 O. 93 4 (TCAaTAtgCCaaCAGCT 11 AGctTCAATctgCGAgGA

Off C-33 O. 93 4 TTCAaTACACtTG totaT 12 toTgTCAt TTCTGGgttA

Off C-34 O. 935 TCAacACACCTt CAaaa. 12 td.TgTCAt TaaTGGAAGA

Off C-35 O. 935 TCAaaACAt CTGacat 10 AaTAgaAATTCTGGAAGA

Off C-36 O. 935 cTCcTaAtACCTGCAaat 21 CATtAt TTCTGGAggA

B.

ATM Site Score Mut. Left half-site Spacer length Right half-site Gene

On ATM 0.000 TGAATTGGGATGCTGTTT 18 TTTATTTTACTGTCTTTA ATM

OffA-1 O. 595 TGAATaGGaAata TaTTT TTTATTTTACTGTt TTTA

OffA-2 O. 697 TGgATTcagATaCTcTTT 10 TTTATTTTtt Ta Tt TTTA

OffA-3 O. 697 9 TGgATTcagATaCTcTTT O TTTATTTTtt Ta Tt TTTA

OffA-4 O. 697 9 TGgATTcagATaCTcTTT O TTTATTTTtt Ta Tt TTTA

OffA-5 O. 697 9 TGgATTcagATaCTcTTT O TTTATTTTtt Ta Tt TTTA

OffA-6 O. 697 9 TGgATTcagATaCTcTTT O TTTATTTTtt Ta Tt TTTA

OffA-7 O. 697 9 TGgATTcagATaCTcTTT O TTTATTTTtt Ta Tt TTTA

OffA-8 Of 8 TGcATaGGaATGCTaaTT O TTTATTTTACT'a TtTaTA MGAT4C

OffA-9 O. 708 10 TGAATTaaa.A.TcCTGcTT 9 gTTATaTgACTaTtTTTA BRCA2

OffA-1 O O. 711 10 TccATTaaa.A.TaCTaTTT 8 TTTATTTTAt TaTtTTTA CPNE4

OffA-11 O. 715 10 TGAATTGaGAgaagcaTT 6 TTTATTTTAt Ta Tt TTTA

OffA-12 O. 725 10 TGAAgTGGGATaCTGTTa 29 ggTATaTTAtaa TtTTTA

OffA-13 O. 729 9 TGAATTatGAaGCTacTT 7 TTTATTgTAaTaTtTTTA NAALADL2

OffA-14 O. 731 9 TGAATTatGAaGCTacTT 25 TTTATTTatt TaTtTTTA

OffA-15 O. 744 10 TGAATgGGGAcacago.ca 29 TTTATTTTAt Ta Tt TTTA

OffA-16 O. 752 9 TaAATgGaaATGCTGTTc 24 a TTATTTTAt TTT't

OffA-17 O. 761 9 gCAAaTGGGATaCTGagT 15 TTTATgTTACTaTtTcTA

OffA-18 O. 781 1 TGgATcGaagTGaTtaTT 23 TTTATTTTAt TaTtTTTA. CIDEC

OffA-19 O. 792 1 TGAATTGaGATtcacago 23 TTTATTTTtt Ta Tt TTTA

OffA-2O O. 8O3 8 TGAATTaGGAat CTGaTT O TTTATTTTAt TaTtaTTA. THSD7B

OffA-21 O. 807 2 TaAATTaaaATaCTccag 23 a TTATTTTAaTGTtTTTA ARID1B

OffA-22 O. 811 O TGAATaGGaAT at TcTTT 2 TTTATTTatt TaTtTTTA

OffA-23 O. 811 9 TagATTGaaATGCTGTTT 5 TTTtTaTTAt TaTtTTTA KLHL4

OffA-24 O. 816 O TGAcTaGaaATGaTGaTT 25 TTTATTTTct Ta Tt TTTA

OffA-25 O 817 2 TGAATT taalAaaaTGTcc 3 a TTATTTTAt TaTtTTTA

OffA-26 O 817 2 TGAATT taalAaaaTGTcc 3 a TTATTTTAt TaTtTTTA

OffA-27 O 817 o TGgATccagATaCTcTTT O TTTATTTTtt Ta Tt TTTA

OffA-28 O. 819 7 TGgAgTGaGAT ccTGTTT 21 TTTATTTTAtTGTtaTTA

OffA-29 O. 824 8 TGAACTtGGATGaTaTaT 24 TTTATTTgAt TaTCTTTA

OffA-3 O O. 832 9 TGtATTGGGATaCoalTT 26 TCTATTTTAt Ta Tt TTT't

OffA-31 O. 833 9 TcAATTGGGATGaTcaTa. 23 TTTATTTAt Tt Tt TTTA

OffA-32 O.835 9 TGAAagGGaAagtTGgaT 23 TTTATTTTAC TaTt TTTA

OffA-33 O. 841 9 TGgtTTGGGAT ccTGTgt 27 TTTATgTTtt TaTtTTTA PTCHD2

OffA-34 O. 841 9 TGAAaTGGGATGagcTTg 28 TTTATTTTAt TaTtTTaa

OffA-35 O. 844 10 TGAATTGGGATaCTGTag 29 cTTAaaTaAaTaTtTTTA. ST6 GALNAC3

OffA-36 O. 844 10 TGAATTGtogTatTGccT 18 TTTATggTttTGTCTTTA

(A) Using a machine learning "classifier" algorithm trained on the output of the in vitro CCR5A TALEN selection,⁶ mutant sequences of the target site allowing for spacer lengths of 10 to 30 base pairs were scored. The resulting 36 predicted off-target sites with the best scores for the CCR5A TALENs are shown with classifier scores, mutation numbers, left and right half-site sequences (mutations from on-target in lower case), the length of the spacer between half-sites in base pairs, and the gene (including introns) in which the predicted off-target site occurs, if it lies within a gene. (B) Same as (A) for ATM TALENs. Sequences correspond to SEQ ID NOS: 44, 169-204 (left half-site column of Table 6A); SEQ ID NOS: 46, 205-240 (right half-site column of Table 6A); SEQ ID NOS: 128, 242-276 (left half-site column of Table 6B); and SEQ ID NOS: 137, 277-312 (right half-site column of Table 6B).
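The lower-case convention used in Table 6 (mutations from on-target shown in lower case) can be reproduced with a short helper. This is only a sketch of the notation, not the classifier itself: the function names and the candidate left half-site are hypothetical, while the two on-target CCR5A half-sites are taken from the on-target row of Table 6A.

```python
def mark_mutations(on_target, candidate):
    """Return the candidate half-site with mismatches in lower case."""
    if len(on_target) != len(candidate):
        raise ValueError("half-sites must have the same length")
    marked = []
    for ref, base in zip(on_target.upper(), candidate.upper()):
        # Matching bases stay upper case; mismatches are lower-cased.
        marked.append(base if base == ref else base.lower())
    return "".join(marked)


def count_mutations(marked_left, marked_right):
    """Count lower-case (mutated) positions across both half-sites."""
    return sum(ch.islower() for ch in marked_left + marked_right)


# On-target CCR5A half-sites from Table 6A.
ON_LEFT = "TTCATTACACCTGCAGCT"
ON_RIGHT = "AGTATCAATTCTGGAAGA"

# Hypothetical candidate site, for illustration only.
left = mark_mutations(ON_LEFT, "TTCATAACACCTGCAGCA")
right = mark_mutations(ON_RIGHT, "AGTATCAATTCTGGAAGA")
print(left, right, count_mutations(left, right))
```

Comparing against the on-target sequence, rather than trusting letter case in a transcribed table, also gives a simple consistency check on the Mut. column.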

TABLE 7-continued Cellular modification induced by TALENs at on-target and predicted off-target genomic sites.

C-terminal domain: No TALEN, Q7, Q7, Q3, Q3, Canonical, Canonical, Canonical
FokI domain: No TALEN, ELKK, ELDKKR, ELKK, ELDKKR, ELKK, ELDKKR, Homo

Off-21

indels O O O O O O O O Total 343O2 27573 31694 24451 2S826 27.192 1811O 21161 Modified <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% P-value Specificity Off-22

indels 1 O O O O O O O Tota 81.037 86687 74274 79004 93477 92089 75359 104.857 Modified <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% P-value Specificity Off-23

indels O O O O O O O O Tota 18812 19337 23O34 2S603 25O23 28615 17172 21033 Modified <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% P-value Specificity Off-24

indels O O 1 O O O O 1 Tota 23.538 21673 24594 27687 18343 291.13 21709 2661O Modified <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% P-value Specificity OffC-25

indels O O O O O O O O Tota 28941 2S326 25871 10641 21422 20171 18946 18711 Modified <0.006% <0.006% <0.006% <0.009% <0.006% <0.006% <0.006% <0.006% P-value Specificity Off-26

indels O O 1 O O O O O Tota 71831 484.94 626SO 458O1 6O175 65137 28795 64632 Modified <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% P-value Specificity OffC-27

indels O O O O O O O O Tota 1218.1 2423 11258 7188 S126 4003 2116 4503 % Modified <0.008% <0.04.1% <0.009% <0.014% <0.020% <0.025% <0.047% <0.022% P-value Specificity Off-28

indels O O O O 6 1 12 5 Tota 10651 6410 16179 1398O 13022 7232 7379 8998 % Modified <0.009% <0.016% <0.006% 131 >835 >1187 526 712 170 843 Off-29

indels O O O O O O O O Tota 4262 3766 4228 6960 3234 1516 2466 1810 % Modified <0.023% <0.027% <0.024% <0.014% <0.031% <0.066% <0.04.1% <0.055% P-value Specificity Off-30

indels 0 0 0 0 0 0 0 0 Total 11840 12257 9617 34097 20507 5029 22248 6285 % Modified <0.008% <0.008% <0.010% <0.006% <0.006% <0.020% <0.006% <0.016% P-value Specificity

Off-31

indels O O O O O O O O Tota 64522 67791 50O85 5.0056 S6241 48287 72230 100410 % Modified <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% P-value Specificity Off-32

indels O O O O O O O O Tota 1944 6888 9330 32O7 4591 6699 13607 1911S % Modified <0.05.1% <0.015% <0.01.1% <0.031% <0.022% <0.015% <0.007% <0.006% P-value Specificity Off-33

indels O O O O O O O O Tota 34.475 27039 18547 33467 15745 17075 4 18844 % Modified <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <25.000% <0.006% P-value Specificity Off-34

indels O O O O O O O O Tota 9052 18858 13647 11796 6945 6114 4979 9072 % Modified <0.01.1% <0.006%

indels O O O O O O O O Tota 23839 22290 25133 24190 10 10459 22554 11897 % Modified <0.006% <0.006% <0.006% <0.006% <10.000% <0.010% <0.006% <0.008% P-value Specificity Off-36

indels 1 O O 1 2 1 19 5 Tota 23412 24394 23427 24132 19723 28369 12461 18052 Modified <0.006% <0.006% <0.006% <0.006% <0.01.0% <0.006% O.1.52% O.O.28% P-value 2.6E-05 Specificity >307 >835 >1274 2392 >1476 181 1690 B ATM Sites On-A

indels 3 O 46 104 309 1289 410 909 Tota 6886 1869 252O 1198 1808 19025 2533 5003 Modified O.03% O.00% 1.83% 8.68% 17.09% 6.78% 16.19% 18.17% P-value O 2.2E-11 3.2E-26 4.9E-81 6.4E-276 4SE-105 15E-228 Specificity OffA-1

indels O O 1 O 1 O 13 34 Tota S2490 45383 341.9S 32325 47589 39704 SO349 44056 Modified <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% O.OO26% 0.077% P-value 3.1E-04 S.SE-09 Specificity >O >274 >1302 >2564 >1016 627 235 OffA-2

indels O O O O O O O O Tota 6777 11846 11362 12273 2O704 3776 5650 5025 Modified <0.01.1% <0.006% <0.009% <0.008% <0.006% <0.026% <0.018% <0.020% P-value Specificity OffA-3

indels O O O O 1 O O O Total 47338 14352 21253 17777 26512 19483 43728 29469 Modified <0.006%

OffA-4

indels O O O O O O O O Total 12292 532 1383 2597 861 2598 1356 3573 Modified <0.008% <0.188% <0.072% <0.039% <0.1.16% <0.038% O.O74% <0.028% P-value Specificity OffA-5

indels O O O O O O O O Tota 60859 22846 25573 19054 25315 31754 666.22 60925 Modified <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% P-value Specificity OffA-6

indels O O O O O O O O Tota 60859 22846 25573 19054 25315 31754 666.22 60925 Modified <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% P-value Specificity OffA-7

indels O O O O O O O O Tota 60859 22846 25573 19054 25315 31754 666.22 60925 Modified <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% P-value Specificity OffA-8

indels O O O O O O O O Tota 9170 1614 S934 3215 24SO 12750 1012O 130O3 Modified <0.01.1% <0.062% <0.017% <0.031% <0.04.1% <0.008% <0.010% <0.008% P-value Specificity OffA-9

indels O O O O O O O 3 Tota 8753 12766 9SO4 1.0114 11086 10676 9013 11110 Modified <0.01.1% <0.008% <0.01.1% <0.01.0% <0.009% <0.009% <0.01.1% O.O27% P-value Specificity OffA-10

indels 1 O O 2 2 3 5 7 Tota 8151 16888 8804 7061 6891 3.2138 14889 40120 Modified O.O12% <0.006% <0.01.1% O.O2.8% O.O22% O.009% O.O.34% O.017% P-value Specificity OffA-11

indels O O 1 O O O 9 76 Tota 41343 32352 26834 28709 26.188 3.2519 24894 19586 Modified <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% O.O.36% O.388% P-value 2.7E-03 2.5E-18 Specificity >O >274 >1302 >2564 >1016 448 47 OffA-12

indels O O O O O O O O Tota 13186 2326 13961 12911 21134 922O 7792 806.8 Modified <0.008% <0.043%

indels 0 0 0 0 0 2 9 0 Total 32704 32015 12312 23645 26315 24078 36111 22364 Modified <0.006% <0.006% <0.006% <0.006% <0.006% 0.008% 0.025% <0.006% P-value 2.7E-03 Specificity >0 >225 >1302 >2564 616 649 >2725

OffA-15

indels O O O O 1 O O O Total 14654 15934 12313 6581 13053 18996 10916 21519 Modified <0.007% <0.006% <0.008% <0.015% O.OO8% <0.006% <0.009% <0.006% P-value Specificity OffA-16

indels 1 O O O O O O 12 Tota 65190 35633 37252 3O378 3.1469 22590 13594 20922 Modified <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% O >274 >1302 >2564 >1016 >2200 317 OffA-17

indels O O O O O O O 6 Tota 1972 606 1439 2113 2862 728 597 636 Modified <0.05.1% <0.165% <0.069% <0.047% <0.035% <0.137% <0.168% O.94.3% P-value 1.4E-02 Specificity >O >26 >183 >489 >49 >97 19 OffA-18

indels O O O O O O O O Tota 5425 995 1453 1831 3132 1934 1534 S816 Modified <0.018% <0.1019/o <0.069% <0.055% <0.03.2% <0.052% <0.065% <0.017% P-value Specificity OffA-19

indels 1 2 O 1 1 1 1 3 Tota 31094 412S2 332.13 29518 32337 25904 27575 38711 Modified <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.008% P-value Specificity OffA-21

indels O O O O O O O O Tota 15297 9710 16719 12119 15483 21692 16558 15418 Modified <0.007% <0.01.0% <0.006% <0.008% <0.006% <0.006% <0.006% <0.006% P-value Specificity OffA-22

indels 27 41 38 46 32 50 55 57 Tota 94O6 11150 11516 10269 13814 14057 11685 14291 Modified O.267% O.368% O.33.0% O448% O.23.2% O.356% O.471% O.399% P-value Specificity OffA-23

indels 1 O O O O O 10 2O Tota 5671 9363 2203 7011 7078 12O68 3484 8619 Modified O.O1.8% <0.01.1% <0.045% <0.014% <0.014% <0.008% O.287% O.23.2% P-value 3.SE-03 9.1E-OS Specificity >O >40 >609 >1210 >818 56 78 OffA-24

indels 4 O O 1 O 1 O 2 Tota 17288 7909 14261 29936 6943 6333 14973 1995.3 Modified O.O23% <0.01.3%

indels 0 0 0 0 0 0 0 0 Total 20089 45320 50758 108581 11574 20948 123827 74151 Modified <0.006% <0.006% <0.006% <0.006% <0.009% <0.006% <0.006% <0.006% P-value Specificity


OffA-27

indels O O O O 1 O O O Tota 47338 14352 21253 17777 26512 19483 43728 29469 Modified <0.006%

indels O O O O O O O O Tota 5174 12618 36909 18063 16486 17934 9999 35072 Modified <0.01.9% <0.008% <0.006% <0.006% <0.006% <0.006% <0.010% <0.006% P-value Specificity OffA-30

indels 4 4 O 7 4 4 O 3 Tota 45082 56531 35333 88651 69652 2O362 2918O 21350 Modified O.009% O.OO7% <0.006% O.OO8% <0.006% O.O2.0% <0.006% O.O14% P-value Specificity OffA-32

indels O O O O O O O O Tota 13405 6721 14013 7513 1413S 22376 64O7 13.720 Modified <0.007% <0.015% <0.007% <0.01.3% <0.007% <0.006% <0.016%

indels O O O O 1 1 O 4 Tota 106222 46866 157329 48611 92.559 152094 2O1408 225805 Modified <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% P-value Specificity OffA-34

indels O O O O O O O 2 Tota 3889 3158 2903 2235 2112 3022 2322 2481 Modified <0.0026%

indels O O O 1 O O O 33 Tota 46462 37431 38043 31033 44803 37257 41073 47273 Modified <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% O.O70% P-value 9.2E-09 Specificity >O >274 >1302 >2564 >1016 >2428 260 OffA-36

indels O O 2 O O O O O Total 27115 17075 45.425 35059 22298 1961O 1262O 27170 Modified <0.006% <0.006% <0.006% <0.006% <0.006% <0.006% <0.008% <0.006% P-value Specificity

(A) Results from sequencing the CCR5A on-target site and each predicted genomic off-target site that amplified from genomic DNA isolated from human cells treated with either no TALEN or TALENs containing canonical, Q3, or Q7 C-terminal domains and either ELKK heterodimeric, ELDKKR heterodimeric, or homodimeric (Homo) FokI domains. Indels: the number of observed sequences containing insertions or deletions consistent with TALEN-induced cleavage. Total: total number of sequence counts. Modified: number of indels divided by total number of sequences, as a percentage. Upper limits of potential modification were calculated for sites with no observed indels by assuming there is less than one indel and dividing by the total sequence count to arrive at an upper-limit modification percentage, or by taking the theoretical limit of detection (1/16,400), whichever value was more conservative (larger). P-values: calculated as previously reported⁵ between each TALEN-treated sample and the untreated control sample. P-values less than 0.05 are shown. Specificity: the ratio of on-target to off-target genomic modification frequency for each site. (B) Same as (A) for the ATM target sites.

TABLE 8 Exponential fitting of enrichment values as function of mutation number.

TALEN selection a b R²

L13-R10 CCR5B 1.00 -1.88 0.999937
L10-R10 CCR5B 1.00 -1.85 0.999901
L10-R13 CCR5B 1.00 -1.71 0.999822
L13-R13 CCR5B 1.00 -1.64 0.999771
L13-R16 CCR5B 1.00 -1.15 0.998286
L16-R10 CCR5B 1.00 -1.24 0.998252
L10-R16 CCR5B 1.01 -1.08 0.996343
L16-R13 CCR5B 1.01 -1.04 0.995844
L16-R16 CCR5B 1.03 -0.70 0.977880
L18-R18 ATM 1.08 -0.36 0.913087
L18-R18 CCR5A 1.13 -0.21 0.798923

Enrichment values of post-selection sequences as function of mutation were normalized relative to on-target enrichment (= 1.0 by definition). Normalized enrichment values of sequences with zero to four mutations were fit to an exponential function, a·e^(b·x) in the mutation number x, with R² reported using the non-linear least squares method.

TABLE 9 Exponential fitting and extrapolation of enrichment values as function of mutation number.

TALEN selection Range a b R²

L16-R16 CCR5B 3-5 1.00 -1.638 0.99998
L16-R13 CCR5B 2-4 1.00 -1.733 0.99998
L16-R10 CCR5B 2-4 1.00 -2.023 0.99999
L13-R16 CCR5B 2-4 1.00 -1.844 0.99997
L13-R13 CCR5B 1-3 1.00 -2.014 0.99998
L13-R10 CCR5B 1-3 1.00 -2.205 0.99999
L10-R16 CCR5B 2-4 1.00 -1.929 0.99995
L10-R13 CCR5B 1-3 1.00 -2.110 0.99998
L10-R10 CCR5B 1-3 1.00 -2.254 0.99999

Enrichment values of all sequences from all nine of the CCR5B selections as function of mutation number were normalized relative to enrichment values of sequences with the lowest mutation number in the range shown (= 1.0 by definition). Normalized enrichment values of sequences from the range of mutations specified were fit to an exponential function, a·e^(b·x) in the mutation number x, with R² reported utilizing the non-linear least squares method. These exponential decreases, b, were used to extrapolate all mean enrichment values beyond five mutations,
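The exponential fits summarized in Tables 8 and 9 can be illustrated with a small non-linear least squares routine. The sketch below assumes the fitted form is a·e^(b·x), with x the mutation number, and uses a plain Gauss-Newton iteration in place of whatever fitting software was actually used; the function name `fit_exponential` and the synthetic data are hypothetical.

```python
import math


def fit_exponential(xs, ys, iterations=50):
    """Fit y = a * exp(b * x) by Gauss-Newton; return (a, b, r_squared)."""
    n = len(xs)
    # Starting guess from a log-linear regression (requires all ys > 0).
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs)) / sxx
    a = math.exp(mean_log - b * mean_x)
    for _ in range(iterations):
        e = [math.exp(b * x) for x in xs]
        r = [y - a * ei for y, ei in zip(ys, e)]      # residuals
        ja = e                                        # d(model)/da
        jb = [a * x * ei for x, ei in zip(xs, e)]     # d(model)/db
        # Solve the 2x2 normal equations (J^T J) delta = J^T r.
        saa = sum(u * u for u in ja)
        sab = sum(u * v for u, v in zip(ja, jb))
        sbb = sum(v * v for v in jb)
        ra = sum(u * v for u, v in zip(ja, r))
        rb = sum(u * v for u, v in zip(jb, r))
        det = saa * sbb - sab * sab
        if abs(det) < 1e-300:
            break
        da = (ra * sbb - sab * rb) / det
        db = (saa * rb - sab * ra) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-12:
            break
    mean_y = sum(ys) / n
    ss_res = sum((y - a * math.exp(b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Synthetic normalized enrichment values for zero to four mutations.
xs = [0, 1, 2, 3, 4]
ys = [math.exp(-1.5 * x) for x in xs]
a, b, r2 = fit_exponential(xs, ys)
print(round(a, 3), round(b, 3), round(r2, 6))
```

An undamped Gauss-Newton step is adequate for smooth, near-exponential data like these; noisier data may call for a damped variant such as Levenberg-Marquardt.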

TABLE 10 Oligonucleotides used in this study.

A.

oligonucleotide name oligonucleotide sequence (5'->3')

TA -New 5Phos/CAGCAGCTGCCCGGT

TA 5Phos/cAGATCGCGAAGAGAGGGGGAGTAACAGCGGTAG TA 5Phos/cAGATCGCGcAGAGAGGGGGAGTAACAGCGGTAG TA 5Phos/cAGATCGCGcAGcago.GGGGAGTAACAGCGGTAG TA ATC GTA GCC CAA TTG TCC A TA GTTGGTTCTTTGGATCAATGCG TA AAGTTCTCTCGGGAATCCGTTGGTTGGTTCTTTGGATCA TA GAAGTTCTCTCGGGAATTTGTTGGTTGGTTTGTTGGATCAATGCGGGAGCATGAGGCAGACCTT GTTGGACTGCATC TA - Ciirew CTTTTGACTAGTTGGGATCCCCGCGACTTGATGGGAAGTTCTCTCTTTAAT CC R5A Library 10 Phos/CACCACTNT T C A T T A C A C C T G C A C G C T NNNNNNNNNN CC R5A Library12 hos/ CC R5A Library 14 hos/ CC R5A Library 16 hos/ CC R5A Library 18 hos/ CC R5A Library 20 hos/ CC R5A Library 22 hos/ CC R5A Library 24 hos/ CC R5B Library 10 hos/ CC R5B Library 12 hos/ CC R5B Library 14 hos/ CC R5B Library 16 hos/ CC R5B Library 18 hos/ CC R5B Library 20 hos/ CC R5B Library 22 hos/ CC R5B Library 24 hos/ ATM Library 10 hos/ ATM Library 12 hos/ ATM Library 14 hos/ ATM Library 16 hos/ ATM Library 18 hos/ ATM Library 20 hos/ ATM Library 22 hos/ ATM Library 24 hos/ adapter-fwd* * 1 AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTACTGT

adapter-rev * * 1 ACAGTAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGG

adapter-fwd** 2 AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCTGAA

adapter-rev * * 2 TTCAGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGG

adapter-fwd* * 3 AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTGCAA

adapter-rev * * 3 TTGCAAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGG

adapter-fwd* * 4 AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTGACT


OffA-35 TGGGAATGTAAATCTGACTGGCTG CTGGAACTCTGGGCATGGCT

OffA-36 GCTGCAATTGCTTTTTGGCA TGGACCCCTCCCTTACACC

(A) All oligonucleotides were purchased from Integrated DNA Technologies (SEQ ID NOS: 313-447 from top to bottom). /5Phos/ indicates 5'-phosphorylated oligonucleotides. A 3 symbol indicates that the preceding nucleotide was incorporated as a mixture of phosphoramidites consisting of 79 mol% of the phosphoramidite corresponding to the preceding nucleotide and 7 mol% of each of the other three canonical phosphoramidites. An (*) indicates that the oligonucleotide primer was specific to a selection sequence (either CCR5A, ATM, or CCR5B). An (**) indicates that the oligonucleotide adapter or primer had a unique sequence identifier to distinguish between different samples (selection conditions or cellular TALEN treatment). (B) Combinations of oligonucleotides used to construct discrete DNA substrates used in TALEN digestion assays. (C) Primer pairs for PCR amplifying on-target and off-target genomic sites. +DMSO: DMSO was used in the PCR; ND: no correct DNA product was detected from the PCR reaction. Sequences correspond to SEQ ID NOS: 472-545 (Fwd primers from top to bottom); and SEQ ID NOS: 546-619 (Rev primers from top to bottom).
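The 79/7/7/7 mol% phosphoramidite mixture implies a per-position mutation rate of 3 × 7% = 21% at each doped position. A minimal sketch of the resulting mutation-count distribution follows; it assumes each doped position mutates independently and uses 36 doped positions (two 18-base half-sites) purely as an illustrative value. The names are hypothetical.

```python
from math import comb

MATCH_FRACTION = 0.79          # intended base, from the text
MISMATCH_FRACTION_EACH = 0.07  # each of the other three bases, from the text


def per_position_mutation_rate():
    """Probability that a doped position carries a non-target base."""
    return 3 * MISMATCH_FRACTION_EACH


def probability_of_k_mutations(n_positions, k, rate):
    """Binomial probability of exactly k mutations in n doped positions."""
    return comb(n_positions, k) * rate ** k * (1 - rate) ** (n_positions - k)


def expected_mutations(n_positions, rate):
    """Mean number of mutations per library member."""
    return n_positions * rate


rate = per_position_mutation_rate()
n = 36  # assumed number of doped positions, for illustration only
distribution = [probability_of_k_mutations(n, k, rate) for k in range(n + 1)]
print(round(expected_mutations(n, rate), 2))
```

Under these assumptions most library members carry several mutations, which is what lets a single selection sample sequences many mutations away from the on-target site.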

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 619

<210> SEQ ID NO 1
<211> LENGTH: 136
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 1

Val Asp Leu Arg Thr Leu Gly Tyr Ser Gln Gln Gln Gln Glu Lys Ile
1 5 10 15
Lys Pro Lys Val Arg Ser Thr Val Ala Gln His His Glu Ala Leu Val
20 25 30
Gly His Gly Phe Thr His Ala His Ile Val Ala Leu Ser Gln His Pro
35 40 45
Ala Ala Leu Gly Thr Val Ala Val Lys Tyr Gln Asp Met Ile Ala Ala
50 55 60
Leu Pro Glu Ala Thr His Glu Ala Ile Val Gly Val Gly Lys Gln Trp
Ser Gly Ala Arg Ala Leu Glu Ala Leu Leu Thr Val Ala Gly Glu Leu
85 90 95
Arg Gly Pro Pro Leu Gln Leu Asp Thr Gly Gln Leu Leu Lys Ile Ala
100 105 110
Lys Arg Gly Gly Val Thr Ala Val Glu Ala Val His Ala Trp Arg Asn
115 120 125
Ala Leu Thr Gly Ala Pro Leu Asn
130 135

<210> SEQ ID NO 2
<211> LENGTH: 136
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 2

Val Asp Leu Arg Thr Leu Gly Tyr Ser Gln Gln Gln Gln Glu Lys Ile
1 5 10 15
Lys Pro Lys Val Arg Ser Thr Val Ala Gln His His Glu Ala Leu Val
20 25 30
Gly His Gly Phe Thr His Ala His Ile Val Ala Leu Ser Gln His Pro
35 40 45
Ala Ala Leu Gly Thr Val Ala Val Lys Tyr Gln Asp Met Ile Ala Ala
50 55 60
Leu Pro Glu Ala Thr His Glu Ala Ile Val Gly Val Gly Lys Gln Trp
65 70 75
Ser Gly Ala Arg Ala Leu Glu Ala Leu Leu Thr Val Ala Gly Glu Leu
85 90 95
Arg Gly Pro Pro Leu Gln Leu Asp Thr Gly Gln Leu Leu Gln Ile Ala
100 105 110
Lys Arg Gly Gly Val Thr Ala Val Glu Ala Val His Ala Trp Arg Asn
115 120 125
Ala Leu Thr Gly Ala Pro Leu Asn
130 135

<210> SEQ ID NO 3
<211> LENGTH: 136
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 3

Val Asp Leu Arg Thr Leu Gly Tyr Ser Gln Gln Gln Gln Glu Lys Ile
1 5 10 15
Lys Pro Lys Val Arg Ser Thr Val Ala Gln His His Glu Ala Leu Val
20 25 30
Gly His Gly Phe Thr His Ala His Ile Val Ala Leu Ser Gln His Pro
35 40 45
Ala Ala Leu Gly Thr Val Ala Val Lys Tyr Gln Asp Met Ile Ala Ala
50 55 60
Leu Pro Glu Ala Thr His Glu Ala Ile Val Gly Val Gly Gln Trp
65 70 75
Ser Gly Ala Arg Ala Leu Glu Ala Leu Leu Thr Val Ala Gly Glu Leu
85 90 95
Arg Gly Pro Pro Leu Gln Leu Asp Thr Gly Gln Leu Leu Gln Ile Ala
100 105 110
Gln Arg Gly Gly Val Thr Ala Val Glu Ala Val His Ala Trp Arg Asn
115 120 125
Ala Leu Thr Gly Ala Pro Leu Asn
130 135

<210> SEQ ID NO 4
<211> LENGTH: 136
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 4

Val Asp Leu Arg Thr Leu Gly Ser Gln Gln Gln Gln Glu Lys Ile
1 5 10 15
Lys Pro Lys Val Arg Ser Thr Val Ala Gln His His Glu Ala Leu Val
25 30
Gly His Gly Phe Thr His Ala His Ile Val Ala Leu Ser Gln His Pro
35 40 45
Ala Ala Leu Gly Thr Val Ala Val Lys Tyr Gln Asp Met Ile Ala Ala
50 55 60
Leu Pro Glu Ala Thr His Glu Ala Ile Val Gly Val Gly Gln Trp
65 70 75
Ser Gly Ala Arg Ala Leu Glu Ala Leu Leu Thr Val Ala Gly Glu Leu
85 90 95
Arg Gly Pro Pro Leu Gln Leu Asp Thr Gly Gln Leu Leu Gln Ile Ala
100 105 110
Gln Gln Gly Gly Val Thr Ala Val Glu Ala Val His Ala Trp Arg Asn
115 120 125
Ala Leu Thr Gly Ala Pro Leu Asn
130 135

<210> SEQ ID NO 5
<211> LENGTH: 33
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 5

Met Thr Pro Asp Gln Val Val Ala Ile Ala Ser Asn Gly Gly Gly Lys
1 5 10 15
Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Asp
20 25 30
His

<210> SEQ ID NO 6
<211> LENGTH: 34
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 6

Gly Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser His Asp Gly Gly
1 5 10 15
Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
20 25 30
Ala His

<210> SEQ ID NO 7
<211> LENGTH: 34
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 7

Gly Leu Thr Pro Asp Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly
1 5 10 15
Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
20 25 30
Ala His

<210> SEQ ID NO 8
<211> LENGTH: 34
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 8

Gly Leu Thr Pro Ala Gln Val Val Ala Ile Ala Ser Asn Gly Gly Gly
1 5 10 15
Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
20 25 30
Asp His

<210> SEQ ID NO 9
<211> LENGTH: 34
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 9

Gly Leu Thr Pro Asp Gln Val Val Ala Ile Ala Ser Asn Gly Gly Gly
1 5 10 15
Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
20 25 30
Asp His

<210> SEQ ID NO 10
<211> LENGTH: 34
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 10

Gly Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly
1 5 10 15
Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
20 25 30
Ala His

<210> SEQ ID NO 11
<211> LENGTH: 34
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 11

Gly Leu Thr Pro Asp Gln Val Val Ala Ile Ala Ser His Asp Gly Gly
1 5 10 15
Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
20 25 30
Ala His

<210> SEQ ID NO 12
<211> LENGTH: 34
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 12

Gly Leu Thr Pro Ala Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly
1 5 10 15
Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
20 25 30
Asp His

<210> SEQ ID NO 13
<211> LENGTH: 34
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 13

Gly Leu Thr Pro Asp Gln Val Val Ala Ile Ala Ser His Asp Gly Gly
1 5 10 15
Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
20 25 30
Asp His

<210> SEQ ID NO 14
<211> LENGTH: 34
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 14

Gly Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser His Asp Gly Gly
1 5 10 15
Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
20 25 30
Ala His

<210> SEQ ID NO 15
<211> LENGTH: 34
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 15

Gly Leu Thr Pro Asp Gln Val Val Ala Ile Ala Ser Asn Gly Gly Gly
1 5 10 15
Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
20 25 30
Ala His

<210> SEQ ID NO 16
<211> LENGTH: 34
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 16

Gly Leu Thr Pro Ala Gln Val Val Ala Ile Ala Asn Asn Asn Gly Gly
1 5 10 15
Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
20 25 30
Asp His

<210> SEQ ID NO 17
<211> LENGTH: 34
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 17

Gly Leu Thr Pro Asp Gln Val Val Ala Ile Ala Ser His Asp Gly Gly
1 5 10 15
Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
20 25 30
Asp His

<210> SEQ ID NO 18
<211> LENGTH: 34
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 18

Gly Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly
1 5 10 15
Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
20 25 30
Ala His

<210> SEQ ID NO 19
<211> LENGTH: 34
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 19

Gly Leu Thr Pro Asp Gln Val Val Ala Ile Ala Asn Asn Asn Gly Gly
1 5 10 15
Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
20 25 30
Ala His

<210> SEQ ID NO 20
<211> LENGTH: 34
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 20

Gly Leu Thr Pro Ala Gln Val Val Ala Ile Ala Ser His Asp Gly Gly
1 5 10 15
Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
20 25 30
Asp His

<210> SEQ ID NO 21
<211> LENGTH: 21
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 21

Gly Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Gly Gly Gly
1 5 10 15
Arg Pro Ala Leu Glu
20

<210> SEQ ID NO 22
<211> LENGTH: 63
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 22

Ser Ile Val Ala Gln Leu Ser Arg Pro Asp Pro Ala Leu Ala Ala Leu
1 5 10 15
Thr Asn Asp His Leu Val Ala Leu Ala Cys Leu Gly Gly Arg Pro Ala
20 25 30
Leu Asp Ala Val Lys Lys Gly Leu Pro His Ala Pro Ala Leu Ile Lys
35 40 45
Arg Thr Asn Arg Arg Ile Pro Glu Arg Thr Ser His Arg Val Ala
50 55 60

<210> SEQ ID NO 23
<211> LENGTH: 63
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 23

Ser Ile Val Ala Gln Leu Ser Arg Pro Asp Pro Ala Leu Ala Ala Leu
1 5 10 15
Thr Asn Asp His Leu Val Ala Leu Ala Cys Leu Gly Gly Arg Pro Ala
20 25 30
Leu Asp Ala Val Lys Lys Gly Leu Pro His Ala Pro Ala Leu Ile Gln
35 40 45
Arg Thr Asn Gln Arg Ile Pro Glu Arg Thr Ser His Gln Val Ala
50 55 60

<210> SEQ ID NO 24
<211> LENGTH: 63
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 24

Ser Ile Val Ala Gln Leu Ser Arg Pro Asp Pro Ala Leu Ala Ala Leu
1 5 10 15
Thr Asn Asp His Leu Val Ala Leu Ala Cys Leu Gly Gly Arg Pro Ala
20 25 30
Leu Asp Ala Val Gln Gln Gly Leu Pro His Ala Pro Ala Leu Ile Gln
35 40 45
Gln Thr Asn Gln Gln Ile Pro Glu Arg Thr Ser His Gln Val Ala
50 55 60

<210> SEQ ID NO 25
<211> LENGTH: 28
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 25

Ser Ile Val Ala Gln Leu Ser Arg Pro Asp Pro Ala Leu Ala Ala Leu
1 5 10 15
Thr Asn Asp His Leu Val Ala Leu Ala Cys Leu Gly
20 25

<210> SEQ ID NO 26
<211> LENGTH: 198
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 26
Gly Ser Gln Leu Ser Glu Leu Glu Glu Ser Glu Leu 1 15
Arg His Lys Leu Val Pro His Glu Tyr Ile Glu Leu Ile Glu 25
Ile Ala Arg Asn Ser Thr Gln Asp Arg Ile Leu Glu Met Val Met 35 40 45
Glu Phe Phe Met Lys Val Tyr Gly Arg Gly Lys His Leu Gly Gly 50 55 60
Ser Arg Pro Asp Gly Ala Ile Thr Val Gly Ser Pro Ile Asp 65 70
Gly Val Ile Val Asp Thr Ala Tyr Ser Gly Gly Asn Leu 85 90 95
Pro Ile Gly Gln Ala Asp Glu Met Gln Arg Tyr Val Glu Glu Asn Gln 105 110
Thr Arg Asn His Ile Asn Pro Asn Glu Trp Trp Lys Val Pro 115 120 125
Ser Ser Val Thr Glu Phe Lys Phe Leu Phe Val Ser Gly His Phe 130 135 140
Gly Asn Ala Gln Leu Thr Arg Leu Asn His Ile Thr Asn Cys 145 150 155 160
Asn Gly Ala Val Leu Ser Val Glu Glu Leu Leu Ile Gly Gly Glu Met 165 170
Ile Ala Gly Thr Leu Thr Leu Glu Glu Val Arg Arg Lys Phe Asn 180 185 190
Asn Gly Glu Ile Asn Phe 195

<210> SEQ ID NO 27
<211> LENGTH: 198
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 27
Gly Ser Gln Leu Ser Glu Leu Glu Glu Ser Glu Leu 1 10 15
Arg His Lys Leu Val Pro His Glu Tyr Ile Glu Leu Ile Glu 25 30
Ile Ala Arg Asn Ser Thr Gln Asp Arg Ile Leu Glu Met Val Met 35 40 45
Glu Phe Phe Met Lys Val Tyr Gly Tyr Gly Lys His Leu Gly Gly 50 55 60
Ser Arg Pro Asp Gly Ala Ile Thr Val Gly Ser Pro Ile Asp 65 70
Gly Val Ile Val Asp Thr Ala Tyr Ser Gly Gly Asn Leu 85 90 95
Pro Ile Gly Gln Ala Asp Glu Met Glu Arg Tyr Val Glu Glu Asn Gln 100 105 110
Thr Arg Asn Lys His Leu Asn Pro Asn Glu Trp Trp Lys Val Pro 115 120 125
Ser Ser Val Thr Glu Phe Lys Phe Leu Phe Val Ser Gly His Phe 130 135 140
Gly Asn Tyr Lys Ala Gln Leu Thr Arg Leu Asn His Ile Thr Asn Cys 145 150 155 160
Asn Gly Ala Val Leu Ser Val Glu Glu Leu Leu Ile Gly Gly Glu Met 165 170
Ile Lys Ala Gly Thr Leu Thr Leu Glu Glu Val Arg Arg Lys Phe Asn 180 185 190
Asn Gly Glu Ile Asn Phe 195

<210> SEQ ID NO 28
<211> LENGTH: 198
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 28
Gly Ser Gln Leu Val Lys Ser Glu Leu Glu Glu Lys Ser Glu Leu 1 15
Arg His Lys Leu Lys Tyr Val Pro His Glu Tyr Ile Glu Leu Ile Glu 20 25
Ile Ala Arg Asn Ser Thr Gln Asp Arg Ile Leu Glu Met Val Met 35 40 45
Glu Phe Phe Met Lys Val Tyr Gly Tyr Arg Gly Lys His Leu Gly Gly 50 55 60
Ser Arg Lys Pro Asp Gly Ala Ile Tyr Thr Val Gly Ser Pro Ile Asp 65
Tyr Gly Val Ile Val Asp Thr Lys Ala Tyr Ser Gly Gly Tyr Asn Leu 85 90 95
Pro Ile Gly Gln Ala Asp Glu Met Gln Arg Tyr Val Glu Asn Gln 100 105 110
Thr Arg Asn Lys His Ile Asn Pro Asn Glu Trp Trp Lys Val Pro 115 120 125
Ser Ser Val Thr Glu Phe Lys Phe Leu Phe Val Ser Gly His Phe 130 135 140
Gly Asn Tyr Lys Ala Gln Leu Thr Arg Leu Asn His Thr Asn Cys 145 150 155 160
Asn Gly Ala Val Leu Ser Val Glu Glu Leu Leu Ile Gly Gly Glu Met 165 170 175
Ile Lys Ala Gly Thr Leu Thr Leu Glu Glu Val Arg Arg Lys Phe Asn 180 185 190
Asn Gly Glu Ile Asn Phe 195

<210> SEQ ID NO 29
<211> LENGTH: 198
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 29
Gly Ser Gln Leu Ser Glu Leu Glu Glu Lys Lys Ser Glu Leu 15
Arg His Leu Val Pro His Glu Tyr Ile Glu Leu Ile Glu 25 30
Ile Ala Arg Asn Ser Thr Gln Asp Arg Ile Leu Glu Met Val Met 35 40 45
Glu Phe Phe Met Lys Val Tyr Gly Arg Gly Lys His Leu Gly Gly 50 55 60
Ser Arg Pro Asp Gly Ala Ile Thr Val Gly Ser Pro Ile Asp 65 70
Gly Val Ile Val Asp Thr Ala Tyr Ser Gly Gly Tyr Asn Leu 85 90 95
Pro Ile Gly Gln Ala Asp Glu Met Glu Arg Tyr Val Glu Glu Asn Gln 105 110
Thr Arg Asp His Leu Asn Pro Asn Glu Trp Trp Lys Val Pro 115 120 125
Ser Ser Val Thr Glu Phe Lys Phe Leu Phe Val Ser Gly His Phe 130 135 140
Gly Asn Ala Gln Leu Thr Arg Leu Asn His Ile Thr Asn Cys 145 150 155 160
Asn Gly Ala Val Leu Ser Val Glu Glu Leu Leu Ile Gly Gly Glu Met 165 170
Ile Ala Gly Thr Leu Thr Leu Glu Glu Val Arg Arg Lys Phe Asn 180 185 190
Asn Gly Glu Ile Asn Phe 195

<210> SEQ ID NO 30
<211> LENGTH: 198
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 30
Gly Ser Gln Leu Ser Glu Leu Glu Glu Ser Glu Leu 1 10 15
Arg His Lys Leu Val Pro His Glu Tyr Ile Glu Leu Ile Glu 25
Ile Ala Arg Asn Ser Thr Gln Asp Arg Ile Leu Glu Met Val Met 35 40 45
Glu Phe Phe Met Lys Val Tyr Gly Tyr Gly Lys His Leu Gly Gly 50 55 60
Ser Arg Pro Asp Gly Ala Ile Thr Val Gly Ser Pro Ile Asp 65 70 75
Gly Val Ile Val Asp Thr Ala Tyr Ser Gly Gly Tyr Asn Leu 85 90 95
Pro Ile Gly Gln Ala Asp Glu Met Gln Arg Tyr Val Glu Asn Gln 100 105 110
Thr Arg Asn His Ile Asn Pro Asn Glu Trp Trp Lys Val Pro 115 120 125
Ser Ser Val Thr Glu Phe Lys Phe Leu Phe Val Ser Gly His Phe 130 135 140
Gly Asn Ala Gln Leu Thr Arg Leu Asn Arg Thr Asn Cys 145 150 155 160
Asn Gly Ala Val Leu Ser Val Glu Glu Leu Leu Ile Gly Gly Glu Met 165 170 175
Ile Lys Ala Gly Thr Leu Thr Leu Glu Glu Val Arg Arg Lys Phe Asn 180 185 190
Asn Gly Glu Ile Asn Phe 195

<210> SEQ ID NO 31
<211> LENGTH: 23
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 31
Leu Gln Leu Asp Thr Gly Gln Leu Leu Lys Ile Ala Lys Arg Gly Gly
1               5                   10                  15
Thr Val Thr Ala Val Glu Ala
            20

<210> SEQ ID NO 32
<211> LENGTH: 23
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 32
Leu Gln Leu Asp Thr Gly Gln Leu Leu Gln Ile Ala Lys Arg Gly Gly
1               5                   10                  15
Thr Val Thr Ala Val Glu Ala
            20

<210> SEQ ID NO 33
<211> LENGTH: 23
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 33
Leu Gln Leu Asp Thr Gly Gln Leu Leu Lys Ile Ala Gln Arg Gly Gly
1               5                   10                  15
Thr Val Thr Ala Val Glu Ala
            20

<210> SEQ ID NO 34
<211> LENGTH: 23
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 34
Leu Gln Leu Asp Thr Gly Gln Leu Leu Lys Ile Ala Lys Gln Gly Gly
1               5                   10                  15
Thr Val Thr Ala Val Glu Ala
            20

<210> SEQ ID NO 35
<211> LENGTH: 23
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 35
Leu Gln Leu Asp Thr Gly Gln Leu Leu Lys Ile Ala Lys Gly Gly Gly
1               5                   10                  15
Thr Val Thr Ala Val Glu Ala
            20

<210> SEQ ID NO 36
<211> LENGTH: 23
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 36
Leu Gln Leu Asp Thr Gly Gln Leu Leu Gln Ile Ala Gln Arg Gly Gly
1               5                   10                  15
Thr Val Thr Ala Val Glu Ala
            20

<210> SEQ ID NO 37
<211> LENGTH: 23
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 37
Leu Gln Leu Asp Thr Gly Gln Leu Leu Gln Ile Ala Lys Gln Gly Gly
1               5                   10                  15
Thr Val Thr Ala Val Glu Ala
            20

<210> SEQ ID NO 38
<211> LENGTH: 23
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 38
Leu Gln Leu Asp Thr Gly Gln Leu Leu Lys Ile Ala Gln Gln Gly Gly
1               5                   10                  15
Thr Val Thr Ala Val Glu Ala
            20

<210> SEQ ID NO 39
<211> LENGTH: 23
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Polypeptide

<400> SEQUENCE: 39
Leu Gln Leu Asp Thr Gly Gln Leu Leu Gly Ile Ala Lys Gly Gly Gly
1               5                   10                  15
Thr Val Thr Ala Val Glu Ala
            20

<210> SEQ ID NO 40
<211> LENGTH: 23
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence