USOO91 63284B2

(12) United States Patent (10) Patent No.: US 9,163,284 B2 Liu et al. (45) Date of Patent: Oct. 20, 2015

(54) METHODS FOR IDENTIFYING ATARGET 2013,01582.45 A1 6, 2013 Russell et al. SITE OF ACAS9 NUCLEASE 2014, OO17214 A1 1/2014 Cost 2014f0065711 A1 3/2014 Liu et al. 2014.0068797 A1 3/2014 Doudna et al. (71) Applicant: President and Fellows of Harvard 2014/O127752 A1 5, 2014 Zhou et al. College, Cambridge, MA (US) 2014/0234289 A1* 8, 2014 Liu et al...... 424/94.61 2014/0273O37 A1 9, 2014 Wu (72) Inventors: David R. Liu, Lexington, MA (US); 2014/0273226 A1 9, 2014 Wu Vikram Pattanayak, Cambridge, MA 2014/0273230 A1 9/2014 Chen et al. 2014/0295556 A1 10/2014 Joung et al. (US) 2014/03494.00 A1 11/2014 Jakimo et al. 2014/0356867 A1 12/2014 Peter et al. (73) Assignee: President and Fellows of Harvard 2014/0357523 A1 12/2014 Zeiner et al. College, Cambridge, MA (US) 2015, 0010526 A1 1/2015 Liu et al. (*) Notice: Subject to any disclaimer, the term of this FOREIGN PATENT DOCUMENTS patent is extended or adjusted under 35 AU 2012244264 11, 2012 U.S.C. 154(b) by 0 days. CN 103233028 A 8, 2013 CN 103388006 A 11, 2013 (21) Appl. No.: 14/320,370 CN 103614415 A 3, 2014 CN 103.642836 A 3, 2014 (22) Filed: Jun. 30, 2014 CN 103668472 A 3, 2014 CN 103820441 A 5, 2014 (65) Prior Publication Data CN 103820454. A 5, 2014 CN 10391 1376. A T 2014 US 2015/OO44191A1 Feb. 12, 2015 CN 103923.911 A T 2014 CN 104004778 A 8, 2014 CN 104.109687. A 10/2014 Related U.S. Application Data CN 104178461 A 12/2014 WO WO 2007/O2SO97 A2 3, 2007 (60) Provisional application No. 61/864.289, filed on Aug. WO WO 2007.136815 A2 11/2007 9, 2013. WO WO 2007 143574 A1 12/2007 WO WO 2009,134808 A2 11/2009 (51) Int. Cl. WO WO 2010/054108 A2 5, 2010 CI2O I/68 (2006.01) WO WO 2010/0541.54 A2 5, 2010 CI2N 9/22 (2006.01) WO WO 2010.075424 A2 T 2010 (52) U.S. Cl. WO WO 2010, 129023 A2 11/2010 CPC ...... CI2O I/6874 (2013.01); C12N 9/22 (Continued) (2013.01) (58) Field of Classification Search OTHER PUBLICATIONS None See application file for complete search history. Guilanger et al. (Fusion of catalytically inactive Cas9 to FokI nuclease improves the specificity of genome modification, Nature (56) References Cited Biotech., col. 32, No. 6, pp. 577-583, Apr. 25, 2014).* (Continued) U.S. PATENT DOCUMENTS 6,453,242 B1 9/2002 Eisenberg et al. 6,503,717 B2 1/2003 Case et al. Primary Examiner — Stephanie K Mummert 6,534,261 B1 3/2003 Cox, III et al. 6,599,692 B1 7/2003 Case et al. Assistant Examiner — Aaron Priest 6,607,882 B1 8/2003 Cox, III et al. 6,824,978 B1 1 1/2004 Cox, III et al. (74) Attorney, Agent, or Firm — Wolf, Greenfield & Sacks, 6,933,113 B2 8, 2005 Case et al. P.C. 6,979,539 B2 12/2005 Cox, III et al. 7,013,219 B2 3, 2006 Case et al. 7,163,824 B2 1/2007 Cox, III et al. (57) ABSTRACT 7.919,277 B2 4/2011 Russell et al. 8,361,725 B2 1/2013 Russell et al. Some aspects of this disclosure provide Strategies, methods, 8,546,553 B2 10/2013 Terns et al. 8,680,069 B2 3/2014 de Fougerolles et al. and reagents for determining nuclease target site preferences 8,697.359 B1 * 4/2014 Zhang ...... 435, 6.1 and specificity of site-specific endonucleases. Some methods 8,846,578 B2 9/2014 McCray et al. provided herein utilize a novel “one-cut’ strategy for screen 2008/O124725 A1 5/2008 Barrangou et al. 2008. O1822.54 A1 7/2008 Hall et al. ing a library of concatemers comprising repeat units of can 2009,0130718 A1 5, 2009 Short didate nuclease target sites and constant insert regions to 2010/0076057 A1 3/2010 Sontheimer et al. identify library members that can been cut by a nuclease of 2010/0093617 A1 4/2010 Barrangou et al. interest via sequencing of an intact target site adjacent and 2010, 0104690 A1 4/2010 Barrangou et al. 2011/O189776 A1 8, 2011 Terns et al. identical to a cut target site. 2011/0217739 A1 9, 2011 Terns et al. 2012fO141523 A1 6/2012 Castado et al. 2013,0130248 A1 5, 2013 Haurwitz et al. 24 Claims, 55 Drawing Sheets US 9,163.284 B2 Page 2

(56) References Cited WO WO 2014/099744 A1 6, 2014 WO WO 2014/O99750 A2 6, 2014 FOREIGN PATENT DOCUMENTS WO WO 2014/113493 A1 T 2014 WO WO 2014/124226 A1 8, 2014 WO WO 2011/O17293 A2 2, 2011 WO WO 2014f127287 A1 8, 2014 WO WO 2011/O53982 A2 5, 2011 WO WO 2014, 128659 A1 8, 2014 WO WO 2012/O54726 A1 4/2012 WO WO 2014f130955 A1 8, 2014 WO WO 2012/1389.27 A2 10, 2012 WO WO 2014f131833 A1 9, 2014 WO WO 2012/158985 A2 11/2012 WO WO 2014, 143381 A1 9, 2014 WO WO 2012/158986 A2 11/2012 WO WO 2014, 144094 A1 9, 2014 WO WO 2012, 164565 A1 12/2012 WO WO 2014f1441:55 A1 9, 2014 WO WO 2013/O12674 A1 1, 2013 WO WO 2014, 144288 A1 9, 2014 WO WO 2013,066438 A2 5, 2013 WO WO 2014, 144592 A2 9, 2014 WO WO2O13066438 A2 * 5, 2013 WO WO 2014, 144761 A2 9, 2014 WO WO 2013,098244 A1 T 2013 WO WO 2014f145599 A2 9, 2014 WO WO 2013/126794 A1 8, 2013 WO WO 2014/150624 A1 9, 2014 WO WO 2013, 130824 A1 9, 2013 WO WO 2014/152432 A2 9, 2014 WO WO 2013/141680 A1 9, 2013 WO WO 2014/153470 A2 9, 2014 WO WO 2013, 142378 A9 9, 2013 WO WO 2014, 164466 A1 10/2014 WO WO 2013, 142578 A2 9, 2013 WO WO 2014, 165349 A1 10/2014 WO WO 2013, 169398 A2 11/2013 WO WO 2014, 165825 A2 10/2014 WO WO 2013, 1698O2 A1 11, 2013 WO WO 2014, 172458 A1 10/2014 WO WO 2013, 176772 A2 11/2013 WO WO 2014f172470 A 10, 2014 WO WO 2013, 176915 A1 11/2013 WO WO 2014, 182700 A1 11, 2014 WO WO 2013, 17691.6 A1 11, 2013 WO WO 2014, 184143 A1 11, 2014 WO WO 2013, 181440 A1 12/2013 WO WO 2014, 184741 A1 11, 2014 WO WO 2013, 186754 A2 12/2013 WO WO 2014, 184744 A1 11, 2014 WO WO 2013, 188037 A2 12/2013 WO WO 2014, 186585 A2 11/2014 WO WO 2013, 188522 A2 12/2013 WO WO 2014, 186686 A2 11/2014 WO WO 2013, 188638 A2 12/2013 WO WO 2014, 190181 A1 11, 2014 WO WO 2014/O11237 A1 1, 2014 WO WO 2014, 191128 A1 12/2014 WO WO 2014/011901 A2 1, 2014 WO WO 2014, 191518 A1 12/2014 WO WO 2014/O18423 A2 1, 2014 WO WO 2014, 191521 A2 12/2014 WO WO 2014/022120 A1 2, 2014 WO WO 2014, 191525 A1 12/2014 WO WO 2014/0227 O2 A2 2, 2014 WO WO 2014, 191527 A1 12/2014 WO WO 2014/036219 A2 3, 2014 WO WO 2014, 197568 A2 12/2014 WO WO 2014/0395.13 A2 3, 2014 WO WO 2014, 197748 A2 12/2014 WO WO 2014/O39523 A1 3, 2014 WO WO 2014/039684 A1 3f2014 OTHER PUBLICATIONS WO WO 2014/0396.92 A2 3, 2014 M WO WO 2014/O397O2 A2 3, 2014 Pattanayak et al. (Revealing off-target cleavage specificities of zinc WO WO 2014/039872 A1 3, 2014 finger nucleases by in vitro selection, Nature Methods, vol. 8, No. 9. WO WO 2014/O3997O A1 3, 2014 pp. 765-770, Aug. 11, 2011).* WO WO 2014/041327 A1 3, 2014 WO WO 2014/043143 A1 3, 2014 Jinek et al. (A Programmable Dual-RNA-Guided DNA WO WO 2014/0471.03 A2 3, 2014 Endonuclease in Adaptive Bacterial Immunity, Science, vol. 337 No. WO WO 2014/059173 A2 4, 2014 6096 pp. 816-821, Jun. 28, 2012).* WO WO 2014/059255 A1 4/2014 Carroll (ACRISPR Approach to Targeting, Mol Ther. Sep. WO WO 2014/065596 A1 5, 2014 2012:20(9): 1658-60).* WO WO 2014/066505 A1 5, 2014 Barrangou (RNA-mediated programmable DNA cleavage, Nature WO WO 2014/068.346 A2 5, 2014 Biotechnology 30, 836-838, Sep. 10, 2012).* WO WO 2014/070887 A1 5, 2014 Cong et al. (Multiplex Genome Engineering Using CRISPR/Cas WO WO 2014/071006 A1 5, 2014 Systems, Science, vol. 339 No. 6121 pp. 819-823, Jan. 3, 2013).* WO WO 2014/071219 A1 5, 2014 Mali et al. (RNA-Guided Engineering via Cas9. WO WO 2014/071235 A1 5, 2014 Science, vol. 339, No. 6121 pp. 823-826, Jan. 3, 2013).* WO WO 2014/072941 A1 5, 2014 WO WO 2014/081729 A1 5, 2014 Kim et al. (Hybrid restriction enzymes: Zinc finger fusions to Fok I WO WO 2014/081730 A1 5, 2014 cleavage domain, Proc. Natl. Acad. Sci. USA vol.93, pp. 1156-1160, WO WO 2014/081855 A1 5, 2014 Feb. 1996).* WO WO 2014/082644 A1 6, 2014 Doyon et al. (Enhancing zinc-finger-nuclease activity with improved WO WO 2014/085261 A1 6, 2014 obligate heterodimeric architectures, Nature Methods, vol. 8, No. 1, WO WO 2014/085593 A1 6, 2014 pp. 74-79, Dec. 5, 2010).* WO WO 2014/085830 A2 6, 2014 Semenova et al. (Interference by clustered regularly interspaced short WO WO 2014/0892.12 A1 6, 2014 palindromic repeat (CRISPR) RNA is governed by a seed sequence, WO WO 2014/08929.0 A1 6, 2014 Proc Natl AcadSci U S A. Jun. 21, 2011:108025): 10098-103).* WO WO 2014/089348 A1 6, 2014 Qi et al. (Repurposing CRISPR as an RNA-guided platform for WO WO 2014/089513 A1 6, 2014 WO WO 2014/089533 A2 6, 2014 sequence-specific control of gene expression, Cell. Feb. 28, WO WO 2014/089541 A2 6, 2014 2013; 152(5): 1173-83).* WO WO 2014/093479 A1 6, 2014 Pennisi (The CRISPR craze, Science. Aug. 23, 2013:341 (6148):833 WO WO 2014/093595 A1 6, 2014 6).* WO WO 2014/093622 A2 6, 2014 Hwang et al. (Efficient genome editing in zebrafish using a CRISPR WO WO 2014/O93635 A1 6, 2014 Cas system, Nat Biotechnol. Mar. 2013:31(3):227-9).* WO WO 2014/O93655 A2 6, 2014 Zhang et al. (CRISPR/Cas9 for genome editing: progress, implica WO WO 2014/093661 A2 6, 2014 tions and challenges, Hum Mol Genet. Sep. 15, 2014:23(R1):R40 WO WO 2014/093694 A1 6, 2014 6).* WO WO 2014/O93701 A1 6, 2014 Mali et al. (Cas9 transcriptional activators for target specificity WO WO 2014/O93709 A1 6, 2014 Screening and paired nickases for cooperative genome engineering, WO WO 2014/O93712 A1 6, 2014 Nature Biotechnology 31, 833-838, Aug. 1, 2013).* US 9,163.284 B2 Page 3

(56) References Cited Bibikova et al., Stimulation of homologous recombination through targeted cleavage by chimeric nucleases. Mol Cell Biol. Jan. OTHER PUBLICATIONS 2001:21(1):289-97. Bibikova et al., Targeted chromosomal cleavage and mutagenesis in U.S. Appl. No. 61/716.256, filed Oct. 19, 2012, Jinek et al. Drosophila using zinc-finger nucleases. Genetics. Jul. U.S. Appl. No. 61/717,324, filed Oct. 23, 2012, Cho et al. 2002; 161(3): 1169-75. U.S. Appl. No. 61/734,256, filed Dec. 6, 2012, Chen et al. Bochet al., Breaking the code of DNA binding specificity of TAL U.S. Appl. No. 61/758,624, filed Jan. 30, 2013, Chen et al. type III effectors. Science. Dec. 11, 2009:326(5959): 1509-12. Doi: U.S. Appl. No. 61/761,046, filed Feb. 5, 2013, Knight et al. 10.1126/science. 11788.11. U.S. Appl. No. 61/794,422, filed Mar. 15, 2013, Knight et al. Boch. TALEs of genome targeting. Nat Biotechnol. Feb. U.S. Appl. No. 61/803,599, filed Mar. 20, 2013, Kim et al. 2011:29(2):135-6. Doi:10.1038/nbt. 1767. U.S. Appl. No. 61/837,481, filed Jun. 20, 2013, Cho et al. Brown et al., Serine recombinases as tools for genome engineering. U.S. Appl. No. 14/258.458, filed Apr. 22, 2014, Cong. Methods. Apr. 2011:53(4):372-9. doi:10.1016/jymeth.2010.12.031. International Search Report and Written Opinion for PCT/US2012/ Epub Dec. 30, 2010. 047778, mailed May 30, 2013. Bulyket al., Exploring the DNA-binding specificities of zinc fingers International Preliminary Report on Patentability for PCT/US2012/ with DNA microarrays. Proc Natl Acad Sci U S A. Jun. 19. 047778, mailed Feb. 6, 2014. 2001:98(13):7158-63. Epub Jun. 12, 2001. International Search Report for PCT/US2013/032589, mailed Jul. Cade et al., Highly efficient generation of heritable zebrafish gene 26, 2013. mutations using homo- and heterodimeric TALENs. Nucleic Acids Genbank Submission; NIH/NCBI, Accession No. J04623. Kita et al., Res. Sep. 2012:40(16):8001-10. Doi:10.1093/margks518. Epub Jun. Aug. 26, 1993. 2 pages. 7, 2012. Genbank Submission; NIH/NCBI, Accession No. NC 002737.1. Carroll et al., Gene targeting in Drosophila and Caenorhabditis Ferretti et al., Jun. 27, 2013. 1 page. elegans with zinc-finger nucleases. Methods Mol Biol. 2008:435:63 Genbank Submission; NIH/NCBI, Accession No. NC 015683.1. 77 doi:10.1007/978-1-59745-232-8 5. Trost et al., Jul. 6, 2013. 1 page. Carroll et al., Progress and prospects: Zinc-finger nucleases as gene Genbank Submission; NIH/NCBI, Accession No. NC 016782.1. therapy agents. Gene Ther. Nov. 2008:15(22): 1463-8. doi:10.1038/ Trost et al., Jun. 11, 2013. 1 page. gt.2008. 145. Epub Sep. 11, 2008. Genbank Submission; NIH/NCBI, Accession No. NC 016786.1. Cermak et al. Efficient design and assembly of custom TALEN and Trost et al., Aug. 28, 2013. 1 page. other TAL effector-based constructs for DNA targeting. Nucleic Genbank Submission; NIH/NCBI, Accession No. NC 017053.1. Acids Res. Jul. 2011:39(12):e82. Doi. 10.1093/margkr218. Epub Fittipaldi et al., Jul. 6, 2013. 1 page. Apr. 14, 2011. Genbank Submission; NIH/NCBI, Accession No. NC 017317.1. Charpentier et al., Biotechnology: Rewriting a genome. Nature. Mar. Trost et al., Jun. 11, 2013. 1 page. 7, 2013:495(7439):50-1. doi:10.1038/495.050a. Genbank Submission; NIH/NCBI, Accession No. NC 017861.1. Cho et al., Analysis of off-target effects of CRISPR/Cas-derived Heidelberg et al., Jun. 11, 2013. 1 page. RNA-guided endonucleases and nickases. Genome Res. Jan. Genbank Submission; NIH/NCBI, Accession No. NC 0180 10.1. 2014:24(1): 132-41. doi: 10.1101/gr. 162339.113. Epub Nov. 19, Lucas et al., Jun. 11, 2013. 2 pages. 2013. Genbank Submission; NIH/NCBI, Accession No. NC 018721.1. Christian et al. Targeting G with TAL effectors: a comparison of Feng et al., Jun. 11, 2013. 1 pages. activities of TALENs constructed with NN and NK repeat variable Genbank Submission; NIH/NCBI, Accession No. NC 021284.1. di-residues. PLoS One. 2012;7(9):e45383. doi: 10.1371journal. Ku et al., Jul. 12, 2013. 1 page. pone.0045383. Epub Sep. 24, 2012. Genbank Submission; NIH/NCBI, Accession No. NC 021314.1. Christian et al., Targeting DNA double-strand breaks with TAL effec Zhang et al., Jul. 15, 2013. 1 page. tor nucleases. Genetics. Oct. 2010; 186(2):757-61. Doi:10.1534/ge Genbank Submission; NIH/NCBI, Accession No. NC 021846.1. Lo netics. 110.120717. Epub Jul. 26, 2010. et al., Jul. 22, 2013. 1 page. Chylinski et al., The tracrRNA and Cas9 families of type IICRISPR Genbank Submission; NIH/NCBI, Accession No. NP 472073.1. Cas immunity systems. RNA Biol. May 2013; 10(5):726-37, doi: Glaser et al., Jun. 27, 2013. 2 pages. 10.4161/rna.24321. Epub Apr. 5, 2013. Genbank Submission; NIH/NCBI, Accession No. P42212. Prasher et Cong et al., Comprehensive interrogation of natural TALE DNA al., Mar. 19, 2014. 7 pages. binding modules and transcriptional repressor domains. Nat Com Genbank Submission; NIH/NCBI, Accession No.YP 002342100.1. mun. Jul. 24, 2012:3:968. doi:10.1038/ncomms 1962. Bernardini et al., Jun. 10, 2013. 2 pages. Cong et al., Multiplex genome engineering using CRISPR/Cas sys Genbank Submission; NIH/NCBI, Accession No.YP 002344900.1. tems. Science. Feb. 15, 2013:339(6121):819-23. doi:10.1126/sci Gundogdu et al., Mar. 19, 2014. 2 pages. ence. 1231 143. Epub Jan. 3, 2013. Genbank Submission; NIH/NCBI, Accession No. YP 820832.1. Cornu et al., DNA-binding specificity is a major determinant of the Makarova et al., Aug. 27, 2013. 2 pages. activity and toxicity of zinc-finger nucleases. Mol Ther. Feb. UniprotSubmission; UniProt, Accession No. P04275. Last modified 2008; 16(2):352-8. Epub Nov. 20, 2007. Jul. 9, 2014, version 107. 29 pages. Cradick et al., CRISPR/Cas9 systems targeting B-globin and CCR5 UniprotSubmission; UniProt, Accession No. P04264. Last modified have Substantial off-target activity. Nucleic Acids Res. Nov 1. Jun. 11, 2014, version 6.15 pages. 2013:41 (20):9584-92. doi:10.1093/nar/gkt714. Epub Aug. 11, 2013. UniprotSubmission; UniProt, Accession No. P01011. Last modified Cradick et al., Zinc-finger nucleases as a novel therapeutic strategy Jun. 11, 2014, version 2. 15 pages. for targeting hepatitis B virus DNAs. MolTher. May 2010;18(5):947 Akopian et al., Chimeric recombinases with designed DNA sequence 54. Doi:10.1038/mt.2010.20. Epub Feb. 16, 2010. recognition. Proc Natl AcadSci U S A. Jul. 22, 2003: 100(15):8688 Cui et al., Targeted integration in rat and mouse embryos with zinc 91. Epub Jul. 1, 2003. finger nucleases. Nat Biotechnol. Jan. 2011:29(1):64-7. Doi: Barrangou et al., CRISPR provides acquired resistance against 10.1038/nbt. 1731. Epub Dec. 12, 2010. viruses in prokaryotes. Science. Mar. 23, 2007:315(5819): 1709-12. Dahlem et al., Simple methods for generating and detecting locus Bedell et al. In vivo genome editing using a high-efficiency TALEN specific mutations induced with TALENs in the zebrafish genome. system. Nature. Nov. 1, 2012:491 (7422): 114-8. Doi:10.1038/na PLoS Genet. 2012:8(8):e 1002861. doi: 10.1371/journalipgen. ture 11537. Epub Sep. 23, 2012. 1002861. Epub Aug. 16, 2012. Beumer et al., Efficient gene targeting in Drosophila with zinc-finger De Souza, Primer: genome editing with engineered nucleases. Nat nucleases. Genetics. Apr. 2006; 172(4):2391-403. Epub Feb. 1, 2006. Methods. Jan. 2012:9(1):27. US 9,163.284 B2 Page 4

(56) References Cited Gupta et al., Zinc finger -dependent and -independent contri butions to the in vivo off-target activity of Zinc finger nucleases. OTHER PUBLICATIONS Nucleic Acids Res. Jan. 2011:39(1):381-92. doi: 10.1093/nar/ gkg787. Epub Sep. 14, 2010. Deltcheva et al., CRISPR RNA maturation by trans-encoded small Hale et al., RNA-guided RNA cleavage by a CRISPR RNA-Cas RNA and host factor RNase III. Nature. Mar. 31, protein complex. Cell. Nov. 25, 2009:139(5):945-56. doi:10.1016/j. 2011:471 (7340):602-7. doi:10.1038/nature09886. ce11.2009.07.040. Dicarlo et al., Genome engineering in Saccharomyces cerevisiae Händel et al., Expanding or restricting the target site repertoire of using CRISPR-Cas systems. Nucleic Acids Res. Apr. zinc-finger nucleases: the inter-domain linker as a major determinant 2013:41(7):4336-43, doi:10.1093/nar/gkt135. Epub Mar. 4, 2013. of target site selectivity. Mol Ther. Jan. 2009; 17(1):104-11. doi: Ding et al., ATALEN genome-editing system for generating human 10.1038/mt.2008.233. Epub Nov. 11, 2008. stem cell-based disease models. Cell Stem Cell. Feb. 7, Hartung et al., Cre mutants with altered DNA binding properties. J 2013; 12(2):238-51. Doi:10.1016/j.stem.2012. 11.011. Epub Dec. 13, Biol Chem. Sep. 4, 1998:273(36):22884-91. 2012. Hirano et al., Site-specific recombinases as tools for heterologous Doyon et al., Enhancing zinc-finger-nuclease activity with improved gene integration. Appl Microbiol Biotechnol. Oct. 2011:92(2):227 obligate heterodimeric architectures. Nat Methods. Jan. 39, doi:10.1007/s00253-011-3519-5. Epub Aug. 7, 2011. Review. 2011;8(1):74–9. Doi:10.1038/nmeth. 1539. Epub Dec. 5, 2010. Hockemeyer et al., Efficient targeting of expressed and silent genes in Doyon et al., Heritable targeted gene disruption in Zebrafish using human ESCs and iPSCs using zinc-finger nucleases. Nat Biotechnol. designed zinc-finger nucleases. Nat Biotechnol. Jun. Sep. 2009:27(9):851-7, doi:10.1038/nbt. 1562. Epub Aug. 13, 2009. 2008:26(6):702-8. Doi:10.1038/nbt 1409. Epub May 25, 2008. Hockemeyer et al., Genetic engineering of human pluripotent cells ESvelt et al., Genome-scale engineering for systems and synthetic using TALE nucleases. Nat Biotechnol. Jul. 7, 2011:29(8):731-4. biology. Mol Syst Biol. 2013:9:641. doi:10.1038/msb.2012.66. doi:10.1038/nbt. 1927. Esvelt et al., Orthogonal Cas9 for RNA-guided gene regu Horvath et al., CRISPR/Cas, the immune system of bacteria and lation and editing. Nat Methods. Nov. 2013:10(11): 1116-21. doi: archaea. Science. Jan. 8, 2010:327(5962): 167-70. doi:10.1126/sci 10.1038/nmeth.2681. Epub Sep. 29, 2013. ence.11795.55. Fu et al., Improving CRISPR-Cas nuclease specificity using trun Hsu et al., DNA targeting specificity of RNA-guided Cas9 nucleases. cated guide RNAS. Nat Biotechnol. Mar. 2014:32(3):279-84. doi: Nat Biotechnol. Sep. 2013:31(9):827-32. doi: 10.1038/nbt.2647. 10.1038/nbt.2808. Epub Jan. 26, 2014. Epub Jul. 21, 2013. Fu et al., High-frequency off-target mutagenesis induced by Huang et al., Heritable gene targeting in Zebrafish using customized CRISPR-Cas nucleases in human cells. Nat Biotechnol. Sep. TALENs. Nat Biotechnol. Aug. 5, 2011:29(8):699-700. doi: 2013:31 (9):822-6. doi:10.1038/nbt.2623. Epub Jun. 23, 2013. 10.1038/nbt. 1939. Gabriel et al. An unbiased genome-wide analysis of zinc-finger Humbert et al., Targeted gene therapies: tools, applications, optimi nuclease specificity. Nat Biotechnol. Aug. 7, 2011:29(9):816-23. doi: zation. Crit Rev Biochem Mol Biol. May-Jun. 2012:47(3):264-81. 10.1038/nbt. 1948. doi:10.3109/10409238.2012.658112. Gaj et al., Structure-guided reprogramming of serine recombinase Hurt et al., Highly specific Zinc finger proteins obtained by directed DNA sequence specificity. Proc Natl Acad Sci U S A. Jan. 11, domain shuffling and cell-based selection. Proc Natl AcadSci USA. 2011:10802):498-503, doi: 10.1073/pnas. 1014214108. Epub Dec. Oct. 14, 2003: 100(21): 12271-6. Epub Oct. 3, 2003. 27, 2010. Hwang et al., Efficient genome editing in zebrafish using a CRISPR Gaj et al., ZFN, TALEN, and CRISPR/Cas-based methods for Cas system. Nat Biotechnol. Mar. 2013:31(3):227-9. doi:10.1038/ genome engineering. Trends Biotechnol. Jul. 2013:31(7):397-405. nbt. 2501. Epub Jan. 29, 2013. doi:10.1016/j.tibtech.2013.04.004. Epub May 9, 2013. Jamieson et al., Drug discovery with engineered zinc-finger proteins. Gao et al., Crystal structure of a TALE protein reveals an extended Nat Rev Drug Discov. May 2003:2(5):361-8. N-terminal DNA binding region. Cell Res. Dec. 2012:22(12): 1716 Jiang et al., RNA-guided editing of bacterial genomes using 20. doi:10.1038/cr2012.156. Epub Nov. 13, 2012. CRISPR-Cas systems. Nat Biotechnol. Mar. 2013:31(3):233-9, doi: Gasiunas et al., Cas9-crRNA ribonucleoprotein complex mediates 10.1038/nbt.2508. Epub Jan. 29, 2013. specific DNA cleavage for adaptive immunity in bacteria. Proc Natl Jinek et al. A programmable dual-RNA-guided DNA endonuclease Acad Sci U S A. Sep. 25, 2012:109(39):E2579-86. Epub Sep. 4, in adaptive bacterial immunity. Science. Aug. 17. 2012. Supplementary materials included. 2012:337(6096):816-21. doi:10.1126/science. 1225829. Epub Jun. Gasiunas et al., RNA-dependent DNA endonuclease Cas9 of the 28, 2012. CRISPR system: Holy Grail of genome editing? Trends Microbiol. Jineket al., RNA-programmed genome editing in human cells. Elife. Nov. 2013:21(11):562-7. doi:10.1016/j.tim.2013.09.001. Epub Oct. Jan. 29, 2013:2:e00471. doi:10.7554/eLife.00471. 1, 2013. Jinek et al., Structures of Cas9 endonucleases reveal RNA-mediated Gilbert et al., CRISPR-mediated modular RNA-guided regulation of conformational activation. Science. Mar. 14, transcription in eukaryotes. Cell. 2013 154(2):442-51. 2014:343(6176): 1247997, doi:10.1126/science. 1247997. Epub Feb. Gordley et al., Evolution of programmable Zinc finger-recombinases 6, 2014. with activity in human cells. J Mol Biol. Mar. 30, 2007:367(3):802 Jore et al., Structural basis for CRISPRRNA-guided DNA recogni 13. Epub Jan. 12, 2007. tion by Cascade. Nat Struct Mol Biol. May 2011:18(5):529-36. doi: Guilinger et al., Broad specificity profiling of TALENs results in 10.1038/nsmb.2019. Epub Apr. 3, 2011. engineered nucleases with improved DNA-cleavage specificity. Nat Joung et al...TALENs: a widely applicable technology for targeted Methods. Apr. 2014; 11(4):429-35. doi:10.1038/nmeth.2845. Epub genome editing. Nat Rev Mol Cell Biol. Jan. 2013;14(1):49-55. doi: Feb. 16, 2014. 10.1038/nrm3486. Epub Nov. 21, 2012. Guilinger et al., Fusion of catalytically inactive Cas9 to FokI nuclease Kaiser et al., Gene therapy. Putting the fingers on gene repair. Sci improves the specificity of genome modification. Nat Biotechnol. ence. Dec. 23, 2005:310(5756): 1894-6. Jun. 2014:32(6):577-82. doi:10.1038/nbt.2909. Epub Apr. 25, 2014. Kandavelou et al., Targeted manipulation of mammalian genomes Guo et al., Directed evolution of an enhanced and highly efficient using designed Zinc finger nucleases. Biochem BiophyS Res Com FokI cleavage domain for zinc finger nucleases. J Mol Biol. Jul. 2, mun. Oct. 9, 2009:388(1):56-61. doi:10.1016/j.bbrc.2009.07.112. 2010:400(1):96-107. doi:10.1016/j.ijmb.2010.04.060. Epub May 4, Epub Jul. 25, 2009. 2010. Karpenshifetal. From yeast to mammals: recent advances in genetic Guo et al., Structure of Cre recombinase complexed with DNA in a control of homologous recombination. DNA Repair (Amst). Oct. 1, site-specific recombination synapse. Nature. Sep. 4. 2012; 11(10):781-8. doi:10.1016/j.dnarep.2012.07.001. Epub Aug. 1997:389(6646):40-6. 11, 2012. Review. US 9,163.284 B2 Page 5

(56) References Cited Mali et al., Cas9 transcriptional activators for target specificity Screening and paired nickases for cooperative genome engineering. OTHER PUBLICATIONS Nat Biotechnol. Sep. 2013:31(9):833-8. doi: 10.1038/nbt.2675. Epub Aug. 1, 2013. Kilbride et al., Determinants of product topology in a hybrid Cre-Tn3 Mali et al., RNA-guided human genome engineering via Cas9. Sci resolvase site-specific recombination system. J Mol Biol. Jan. 13, ence. Feb. 15, 2013:339(6121):823-6. doi: 10.1126/science. 2006:355(2):185-95. Epub Nov. 9, 2005. 1232033. Epub Jan. 3, 2013. Kim et al. A library of TAL effector nucleases spanning the human Mani et al., Design, engineering, and characterization of Zinc finger genome. Nat Biotechnol. Mar. 2013:31(3):251-8. Doi:10.1038/nbt. nucleases. Biochem Biophys Res Commun. Sep. 23, 2517. Epub Feb. 17, 2013. 2005:335(2):447-57. Kim et al., Highly efficient RNA-guided genome editing in human Meckler et al., Quantitative analysis of TALE-DNA interactions sug cells via delivery of purified Cas9 ribonucleoproteins. Genome Res. gests polarity effects. Nucleic Acids Res. Apr. 2013:41(7):41 18-28. Jun. 2014:24(6):10 12-9, doi:10.1101/gr. 171322.113. Epub Apr. 2, doi:10.1093/nar/gkt085. Epub Feb. 13, 2013. 2014. Meng et al., Profiling the DNA-binding specificities of engineered Kim et al., Hybrid restriction enzymes: zinc finger fusions to Fok I Cys2His2 Zinc finger domains using a rapid cell-based method. cleavage domain. Proc Natl AcadSci USA. Feb. 6, 1996:93(3): 1156 Nucleic Acids Res. 2007:35(11):e81. Epub May 30, 2007. 60. Meng et al., Targeted gene inactivation in Zebrafish using engineered Kim et al., TALENs and ZFNs are associated with different zinc-finger nucleases. Nat Biotechnol. Jun. 2008:26(6):695-701. doi: mutationsignatures. Nat Methods. Mar. 2013; 10(3):185. doi: 10.1038/nbt1398. Epub May 25, 2008. 10.1038/nmeth.2364. Epub Feb. 10, 2013. Miller et al., A Tale nuclease architecture for efficient genome edit Kim et al., Targeted genome editing in human cells with Zinc finger ing. Nat Biotechnol. Feb. 2011:29(2): 143-8. doi:10.1038/nbt. 1755. nucleases constructed via modular assembly. Genome Res. Jul. Epub Dec. 22, 2010. 2009;19(7): 1279-88. doi: 10.1101/gr.0894.17.108. Epub May 21, Miller et al., An improved zinc-finger nuclease architecture for highly 2009. specific genome editing. Nat Biotechnol. Jul. 2007:25(7):778-85. Kluget al., Zinc fingers: a novel protein fold for nucleic acid recog Epub Jul. 1, 2007. nition. Cold Spring Harb Symp Quant Biol. 1987:52:473-82. Moore et al., Improved somatic mutagenesis in Zebrafish using tran Krishna et al., Structural classification of Zinc fingers: Survey and scription activator-like effector nucleases (TALENs). PloS One. Summary. Nucleic Acids Res. Jan. 15, 2003:31(2):532-50. 2012;7(5):e37877. Doi:10.1371journal.pone.0037877. Epub May Larson et al., CRISPR interference (CRISPRi) for sequence-specific 24, 2012. control of gene expression. Nat Protoc. Nov. 2013;8(11):2180-96. Morbitzer et al., Assembly of custom TALE-type DNA binding doi:10.1038/nprot.2013.132. Epub Oct. 17, 2013. domains by modular cloning. Nucleic Acids Res. Jul. Lei et al., Efficient targeted gene disruption in Xenopus embryos 2011:39(13):5790-9. doi:10.1093/nar/gkr151. Epub Mar. 18, 2011. using engineered transcription activator-like effector nucleases Moscou et al. A simple cipher governs DNA recognition by TAL (TALENs). Proc Natl AcadSci USA. Oct. 23, 2012:109(43): 17484 effectors. Science. Dec. 11, 2009:326(5959): 1501. doi:10.1126/sci 9. Doi:10.1073/pnas. 1215421 109. Epub Oct. 8, 2012. ence. 1178817. Li et al., Modularly assembled designer TAL effector nucleases for Mussolino et al. A novel Tale nuclease scaffold enables high genome targeted gene knockout and gene replacement in eukaryotes. Nucleic editing activity in combination with low toxicity. Nucleic Acids Res. Acids Res. Aug. 2011:39(14):6315-25. doi: 10.1093/nar/gkr188. Nov. 2011:39(21):9283-93. Doi:10.1093/nar/gkrS97. Epub Aug. 3, Epub Mar. 31, 2011. 2011. Lietal. TAL nucleases (TALNs): hybrid proteins composed of TAL Narayanan et al., Clamping down on weak terminal base pairs: effectors and FokI DNA-cleavage domain. Nucleic Acids Res. Jan. oligonucleotides with molecular caps as fidelity-enhancing elements 2011:39(1):359-72. doi:10.1093/nar/gkg704. Epub Aug. 10, 2010. at the 5’- and 3'-terminal residues. Nucleic Acids Res. May 20, Liu et al., Cell-penetrating peptide-mediated delivery of TALEN 2004:32(9): 2901-11. Print 2004. proteins via bioconjugation for genome engineering. PLoS One. Jan. Nishimasu et al., Crystal structure of Cas9 in complex with guide 20, 2014:9(1):e85755. doi:10.1371journal.pone.0085755.. eCollec RNA and target DNA. Cell. Feb. 27, 2014; 156(5):935-49. doi: tion 2014. 10.1016/j.cell.2014.02.001. Epub Feb. 13, 1014. Liu et al., Design of polydactyl zinc-finger proteins for unique Osborn et al., TALEN-based gene correction for epidermolysis addressing within complex genomes. Proc Natl AcadSci USA. May bullosa. Mol Ther. Jun. 2013:21(6): 1151-9. doi:10.1038/mt.2013. 27, 1997;94(11):5525-30. 56. Epub Apr. 2, 2013. Lombardo et al., Gene editing in human stem cells using Zinc finger Pabo et al., Design and selection of novel Cys2His2 zinc finger nucleases and integrase-defective lentiviral vector delivery. Nat proteins. Annu Rev Biochem. 2001:70:313-40. Biotechnol. Nov. 2007:25(11): 1298–306. Epub Oct. 28, 2007. Pan et al., Biological and biomedical applications of engineered Maeder et al., CRISPRRNA-guided activation of endogenous human nucleases. Mol Biotechnol. Sep. 2013:55(1):54-62. doi: 10.1007/ genes. Nat Methods. Oct. 2013:10(10):977-9. doi:10.1038/nmeth. S12033-012-9613-9. 2598. Epub Jul. 25, 2013. Pattanayak et al., High-throughput profiling of off-target DNA cleav Maeder et al., Rapid 'open-source' engineering of customized zinc age reveals RNA-programmed Cas9 nuclease specificity. Nat finger nucleases for highly efficient gene modification. Mol Cell. Jul. Biotechnol. Sep. 2013:31(9):839-43, doi:10.1038/nbt.2673. Epub 25, 2008:31(2):294-301. doi:10.1016/j.molcel.2008.06.016. Aug. 11, 2013. Maeder et al., Robust, Synergistic regulation of human gene expres Pattanayak et al., Revealing off-target cleavage specificities of zinc sion using TALE activators. Nat Methods. Mar. 2013; 10(3):243-5. finger nucleases by in vitro selection. Nat Methods. Aug. 7. doi:10.1038/nmeth.2366. Epub Feb. 10, 2013. 2011;8(9):765-70. doi:10.1038/nmeth. 1670. Mahfouz et al., De novo-engineered transcription activator-like Pavletich et al., Zinc finger-DNA recognition: crystal structure of a effector (TALE) hybrid nuclease with novel DNA binding specificity Zif268-DNA complex at 2.1 A. Science. May 10, creates double-strand breaks. Proc Natl Acad Sci U S A. Feb. 8, 1991:252(5007):809-17. 2011:10806):2623-8. doi:10.1073/pnas. 1019533108. Epub Jan. 24, Pennisi et al., The CRISPR craze. Science. Aug. 23. 2011. 2013:341 (6148):833-6. doi:10.1126/science.341.6148.833. Mak et al., The crystal structure of TAL effector PthXol bound to its Pennisi et al., The tale of the TALEs. Science. Dec. 14, DNA target. Science. Feb. 10, 2012:335(6069):716-9, doi:10.1126/ 2012:338(6113): 1408-11. doi:10.1126/science.338.6113. 1408. science.1216211. Epub Jan. 5, 2012. Perez et al., Establishment of HIV-1 resistance in CD4+ T cells by Mali et al., Cas9 as a versatile tool for engineeringbiology. Nat genome editing using zinc-finger nucleases. Nat Biotechnol. Jul. Methods. Oct. 2013; 10(10):957-63. doi:10.1038/nmeth.2649. 2008:26(7):808-16. Doi:10.1038/nbt1410. Epub Jun. 29, 2008. US 9,163.284 B2 Page 6

(56) References Cited Segal et al., Toward controlling gene expression at will: Selection and design of Zinc finger domains recognizing each of the 5'-GNN-3' OTHER PUBLICATIONS DNA target sequences. Proc Natl Acad Sci U S A. Mar. 16. 1999;96(6):2758-63. Perez-Pinera et al., Advances in targeted genome editing. Curr Opin Semenova et al., Interference by clustered regularly interspaced short Chem Biol. Aug. 2012:16(3-4):268-77. doi:10.1016/j.cbpa. 2012.06. palindromic repeat (CRISPR) RNA is governed by a seed sequence. 007. Epub Jul. 20, 2012. Proc Natl Acad Sci U S A. Jun. 21, 2011; 108(25):10098-103. doi: Perez-Pinera et al., RNA-guided gene activation by CRISPR-Cas9 10.1073/pnas. 1104144108. Epub Jun. 6, 2011. based transcription factors. Nat Methods. Oct. 2013:10(10):973-6. Serganov et al., Structural basis for discriminative regulation of gene doi:10.1038/nmeth.2600. Epub Jul. 25, 2013. expression by adenine-and guanine-sensing mRNAs. Chem Biol. Peteket al., Frequent endonuclease cleavage at off-target locations in Dec. 2004; 11(12): 1729-41. vivo. Mol Ther. May 2010:18(5):983-6. Doi: 10.1038/mt. 2010.35. Shaikh et al., Chimeras of the Flp and Cre recombinases: tests of the Epub Mar. 9, 2010. mode of cleavage by Flp and Cre. J Mol Biol. Sep. 8, 2000;302(1):27 Porteus, Design and testing of Zinc finger nucleases for use in mam 48. malian cells. Methods Mol Biol. 2008:435:47-61. doi:10.1007/978 Shalem et al., Genome-scale CRISPR-Cas9 knockout screening in 1-59745-232-8 4. human cells. Science. Jan. 3, 2014:343(6166): 84-7. doi: 10.1126/ Proudfoot et al., Zinc finger recombinases with adaptable DNA science. 1247005. Epub Dec. 12, 2013. sequence specificity. PLoS One. Apr. 29, 2011;6(4):e 19537. doi: Sheridan, First CRISPR-Caspatent opens race to stake out intellec 10.1371journal.pone.0019537. tual property. Nat Biotechnol. 2014:32(7):599-601. Qi et al., Repurposing CRISPR as an RNA-guided platform for Sheridan, Gene therapy finds its niche. Nat Biotechnol. Feb. sequence-specific control of gene expression. Cell. Feb. 28, 2011:29(2): 121-8. doi:10.1038/nbt. 1769. 2013; 152(5): 1173-83, doi:10.1016/j.ce 11.2013.02.022. Shimizu et al., Adding fingers to an engineered Zinc finger nuclease Ramirez et al., Engineered Zinc finger nickases induce homology can reduce activity. Biochemistry. Jun. 7, 2011:50(22):5033-41. doi: directed repair with reduced mutagenic effects. Nucleic Acids Res. 10.1021/bi200393g. Epub May 11, 2011. Jul. 2012:40(12):5560-8. doi: 10.1093/nar/gks 179. Epub Feb. 28, Siebert et al. An improved PCR method for walking in uncloned 2012. genomic DNA. Nucleic Acids Res. Mar. 25, 1995:23(6):1087-8. Ramirez et al., Unexpected failure rates for modular assembly of Sun et al., Optimized TAL effector nucleases (TALENs) for use in engineered zinc fingers. Nat Methods. May 2008:5(5):374-5. Doi: treatment of sickle cell disease. Mol Biosyst. Apr. 2012;8(4): 1255 10.1038/nmeth0508-374. 63. doi:10.1039/c2mb05461b. Epub Feb. 3, 2012. Ran et al., Double nicking by RNA-guided CRISPR Cas9 for Szczepek et al., Structure-based redesign of the dimerization inter enhanced genome editing specificity. Cell. Sep. 12, face reduces the toxicity of zinc-finger nucleases. Nat Biotechnol. 2013:154(6): 1380-9. doi:10.1016/j.cell.2013.08.021. Epub Aug. 29, Jul. 2007:25(7):786-93. Epub Jul. 1, 2007. 2013. Tebas et al., Gene editing of CCR5 in autologous CD4 T cells of Reyonet al., Flashassembly of TALENs for high-throughput genome persons infected with HIV.NEngl J Med. Mar. 6, 2014:370(10):901 editing. Nat Biotechnol. May 2012:30(5):460-5. doi:10.1038/nbt. 10. doi:10.1056/NEJMOa130.0662. 217 O. Tsai et al., Dimeric CRISPRRNA-guided FokI nucleases for highly Saleh-Gohari et al., Conservative homologous recombination pref specific genome editing. Nat Biotechnol. Jun. 2014:32(6):569-76. erentially repairs DNA double-strand breaks in the Sphase of the cell doi:10.1038/nbt.2908. Epub Apr. 25, 2014. cycle in human cells. Nucleic Acids Res. Jul. 13, 2004:32(12):3683 Urnov et al., Genome editing with engineered Zinc finger nucleases. 8. Print 2004. Nat Rev. Genet. Sep. 2010; 11 (9):636-46, doi:10.1038/nrg2842. Sander et al., CRISPR-Cas systems for editing, regulating and tar Urnov et al., Highly efficient endogenous human gene correction geting genomes. Nat Biotechnol. Apr. 2014:32(4):347-55. doi: using designed zinc-finger nucleases. Nature. Jun. 2. 10.1038/nbt.2842. Epub Mar. 2, 2014. 2005:435(7042):646-51. Epub Apr. 3, 2005. Sander et al. In silico abstraction of Zinc finger nuclease cleavage Van Duyne et al., Teaching Cre to follow directions. Proc Natl Acad profiles reveals an expanded landscape of off-target sites. Nucleic Sci U S A. Jan. 6, 2009; 106(1):4-5. doi:10.1073/pnas.081 1624 106. Acids Res. Oct. 2013:41 (19): e181. doi:10.1093/margkt716. Epub Epub Dec. 31, 2008. Aug. 14, 2013. Vanamee et al., FokI requires two specific DNA sites for cleavage. J Sander et al., Targeted gene disruption in Somatic Zebrafish cells Mol Biol. May 25, 2001:309(1):69-78. using engineered TALENs. Nat Biotechnol. Aug. 5, 2011:29(8):697 Wah et al., Structure of FokI has implications for DNA cleavage. Proc 8. doi:10.1038/nbt. 1934. Natl AcadSci U S A. Sep. 1, 1998:95(18): 10564-9. Santiago et al., Targeted gene knockout in mammalian cells by using Wang et al., Genetic screens in human cells using the CRISPR-Cas9 engineered zinc-finger nucleases. Proc Natl AcadSci USA. Apr. 15, system. Science. Jan. 3, 2014:343(6166):80-4. doi:10.1126/science. 2008:105(15):5809-14. doi:10.1073/pnas. 0800940 105. Epub Mar. 1246981. Epub Dec. 12, 2013. 21, 2008. Wang et al., One-step generation of mice carrying mutations in mul Sapranauskas et al., The Streptococcus thermophilus CRISPR/Cas tiple genes by CRISPR/Cas-mediated genome engineering. Cell. system provides immunity in Escherichia coli. Nucleic Acids Res. May 9, 2013; 153(4):910-8. doi:10.1016/j.cell.2013.04.025. Epub Nov. 2011:39(21):9275-82. doi:10.1093/nar/gkró06. Epub Aug. 3, May 2, 2013. 2011. Wang et al., Targeted gene addition to a predetermined site in the Sashital et al., Mechanism of foreign DNA selection in a bacterial human genome using a ZFN-based nicking enzyme. Genome Res. adaptive immune system. Mol Cell. Jun. 8, 2012:46(5):606-15. doi: Jul. 2012:22(7): 1316-26. doi:10.1101/gr. 122879.111. Epub Mar. 20, 10.1016/j.molcel.2012.03.020. Epub Apr. 19, 2012. 2012. Schriefer et al., Low pressure DNA shearing: a method for random Weber et al., Assembly of designer TAL effectors by Golden Gate DNA sequence analysis. Nucleic Acids Res. Dec. 25. cloning. PLoS One. 2011;6(5):e 19722. doi:10.1371journal.pone. 1990:18(24):7455-6. 00.19722. Epub May 19, 2011. Schwank et al., Functional repair of CFTR by CRISPR/Cas9 in Wiedenheft et al., RNA-guided genetic silencing systems in bacteria intestinal stem cell organoids of cystic fibrosis patients. Cell Stem and archaea. Nature. Feb. 15, 2012:482(7385):331-8. doi:10.1038/ Cell. Dec. 5, 2013:13(6):653-8. doi:10.1016/j.stem. 2013. 11.002. nature 10886. Review. Schwartz et al., Post-translational enzyme activation in an animal via Wolfe et al., Analysis of Zinc fingers optimized via phage display: optimized conditional protein splicing. Nat Chem Biol. Jan. evaluating the utility of a recognition code. J Mol Biol. Feb. 5, 2007:3(1):50-4. Epub Nov. 26, 2006. 1999:285(5):1917-34. Segal et al., Evaluation of a modular strategy for the construction of Wood et al., Targeted genome editing across species using ZFNs and novel polydactyl Zinc finger DNA-binding proteins. Biochemistry. TALENs. Science. Jul. 15, 2011:333(6040):307. doi: 10.1126/sci Feb. 25, 2003:42(7):2137-48. ence. 1207773. Epub Jun. 23, 2011. US 9,163.284 B2 Page 7

(56) References Cited 2009:5(1):97-110. doi:10.1016/j.stem.2009.05.023. Epub Jun. 18, 2009. OTHER PUBLICATIONS International Search Report and Written Opinion for PCT/US2014/ 052231, mailed Dec. 4, 2014. Wu et al., Correction of a genetic disease in mouse via use of Invitation to Pay Additional Fees for PCT/US2014/054291, mailed CRISPR-Cas9. Cell Stem Cell. Dec. 5, 2013; 13(6):659-62. doi: 10.1016/j.stem.2013.10.016. Dec. 18, 2014. Yanover et al., Extensive protein and DNA backbone sampling Genbank Submission; NIH/NCBI, Accession No. J04623. Kita et al., improves structure-based specificity prediction for C2H2 zinc fin Apr. 26, 1993. 2 pages. gers. Nucleic Acids Res. Jun. 2011:39(11):4564-76. doi:10.1093/ NCBI Reference Sequence: NM 002427.3. Wu et al., May 3, 2014. nar/gkr048. Epub Feb. 22, 2011. 5 pages. Yin et al., Genome editing with Cas9 in adult mice corrects a disease Fuchs et al., Polyarginine as a multifunctional fusion tag. Protein Sci. mutation and phenotype. Nat Biotechnol. Jun. 2014:32(6):551-3. Jun. 2005;14(6): 1538-44. doi:10.1038/nbt.2884. Epub Mar. 30, 2014. Mussolino et al., TALE nucleases: tailored genome engineering Zhang et al., Conditional gene manipulation: Cre-ating a new bio made easy. Curr Opin Biotechnol. Oct. 2012:23(5):644-50. doi: logical era. J Zhejiang Univ Sci B. Jul. 2012; 13(7):51 1-24. doi: 10.1016/j.copbio.2012.01.013. Epub Feb. 17, 2012. 10. 1631/jzus. B1200042. Review. O'Connell et al., Programmable RNA recognition and cleavage by Zhang et al., Efficient construction of sequence-specific TAL effec CRISPR/Cas9. Nature. Sep. 28, 2014. doi:10.1038/nature 13769. tors for modulating mammalian transcription. Nat Biotechnol. Feb. International Search Report and Written Opinion for PCT/US2014/ 2011:29(2):149-53. doi:10.1038/nbt. 1775. Epub Jan. 19, 2011. 050283, mailed Nov. 6, 2014. Zou et al., Gene targeting of a disease-related gene in human induced pluripotent stem and embryonic stem cells. Cell Stem Cell. Jul. 2, * cited by examiner U.S. Patent Oct. 20, 2015 Sheet 1 of 55 US 9,163,284 B2

FG. A U.S. Patent Oct. 20, 2015 Sheet 2 of 55 US 9,163,284 B2

(E1. i 8.8 tigia igation

p.8 with two printers: adapter 2-constant sequence a

2 adapter E.

Apicens containing cleavesi ibrary meters Gei purification

is Capitatiosa analysis Cassisg38& D8& ceavage specificity profie FIG 1B U.S. Patent Oct. 20, 2015 Sheet 3 Of 55 US 9,163,284 B2

specificity sce 8 to -9 -S

-S is 3 +3.3 k +3. --. -8.

-3.3 c. -8.3 83 ic: 8& -3.3 c. 8.f -8 ics -8, -3.3 c. 3,

s FG, 2A

(3 in CassCA sgriA w2., singia-Kitant seese:ces oxy

A A T ceases A. FG. 2B U.S. Patent Oct. 20, 2015 Sheet 4 of 55 US 9,163,284 B2

F.G. 2C 000 ni Cassic Asgr8A w.0, s.

AG C C CA; C : C S S Crag F.G. 2D 200 at NA, (3 nt; Case:CiA sgr8A y2

FG. 2E sgr. A w.x., singierstant sequences oriy

FG. 2F

U.S. Patent Oct. 20, 2015 Sheet 6 of 55 US 9,163,284 B2

is a 3' ar.get NiA

gRNA

FG, 3A

Wig::::::::::::::::::::::::::::::::88-3 3 -ika:ggggia:Cassassikasegas:::::::::::g-

Wa::::::::::::::::::::::::::::::::::::: M3

ax: is a 8883.88:38:::::::::8-3 3 A::::::::::CC:{A::::::CXies 8.83

FG. 3B U.S. Patent Oct. 20, 2015 Sheet 7 Of 55 US 9,163,284 B2

his A ei 8: -M Me cra 5 ------ascertagecoccascage------s y 3 wax w w w w a -caggagsagascarcerce--- A v' ' ' w A. A 3 * w s’ -gxisticci: exict: coccaccataxi sagasca 3’ -ge:3883ceak:88-89 (ex:

C^2 S3'-aces ------crecce:Caxesex8Excecce:{x:- agrics.ccessecsacc-s’ ------3 f{} : ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; 5'-cycocticaascagecoccacGutawa&Agca & 4 3 w88:38.88:88:38: 88. & cra 3 r-seaagascarcassassis------3 3 wwww.www.wiki:C&x:::::::::: w w w w w w w x X w.S. f & * 3:688:338&:388:::::::::::: & 3'-GCCUGAuccGAAusAAAtiu{ coatia 88.8. A ------Casasagascaca ------3. y W. MWW MW M. W888&8888 W M W W. M. W. W. M. W M3 s’ -graaataracteristickcheuuwasheeta k & 3 M{{{3x38x8:3388.88:33 (383 &&. FG. 3C

U.S. Patent Oct. 20, 2015 Sheet 9 Of 55 US 9,163,284 B2

a s

8. 38as ; : 3:3: 388: 88a. * 3%. 8, :38, 38 83%

3:38;&

skis&

FIG. 4 U.S. Patent Oct. 20, 2015 Sheet 10 Of 55 US 9,163,284 B2

§§§§§§

FG 5A

ae

FG, SB U.S. Patent Oct. 20, 2015 Sheet 11 Of 55 US 9,163,284 B2

:: &

8 : : 3 & 8 & 3 & 8 : : : *:::::::::::::::::::::::: F.G. SC

3 : 3 & : : : 3: 8 & :: : *:::::::x: {x :::::::::::::::: F.G. SID U.S. Patent Oct. 20, 2015 Sheet 12 Of 55 US 9,163,284 B2

:: &

8:88 x 8:388

F.G. S.E U.S. Patent Oct. 20, 2015 Sheet 13 Of 55 US 9,163,284 B2

specificity Six: -- c. 3-3 -- is: - &. C. --S

--, -8.

F.G. 6A

ntv Cassic A2 sgrinia va. , single-mutant sequences any T

c Ä. C T C C C T C A A G C A G G C C C C GC Nige F.G. 6B U.S. Patent Oct. 20, 2015 Sheet 14 of 55 US 9,163,284 B2

C C C & A is { A, C { { . . . . . N. & Af F.G. 6G {{3} f ass: gRNA v1.0, sin

{ c

& C A & { . A S S C C C C { { N S is Ay F.G. 6)

2}{ i 38A, it was:C.A. sgrixia we, U.S. Patent Oct. 20, 2015 Sheet 15 Of 55 US 9,163,284 B2

( :83. A

(s c T A C as CAG co is FG, 7B U.S. Patent Oct. 20, 2015 Sheet 16 of 55 US 9,163,284 B2

200 nM DNA, 1000 rhi Cass: Ci A2 sRNA wi.

FIG.7C

c cc c fore cac occeeded Ai FIG.7D

FG. 7E

{{x} at Cass: A2 sgrinia, w8., single-mutant $8guierces or

c cc c T cae case c cc c ec Ngc A. F.G. 7F U.S. Patent Oct. 20, 2015 Sheet 17 Of 55 US 9,163,284 B2

specticity SCE -- : 8 S:

38 is 8.f

A. x G C A G A T G T A 3 T G T C C A C A N 3 G A. FG. 8A

1: gi Cas: A4 s RNA 2.

{ A. D { C A G A C & A G is röy FG. 8B U.S. Patent Oct. 20, 2015 Sheet 18 Of 55 US 9,163,284 B2

200 nivi N.A., 30 rik Cass. C.A4 sgr. A y

&

& C A (3 A T G T A G T G T R T C C A C A N G 3 F.G. 8E y

ce. occ Acc ji FG, 8F U.S. Patent Oct. 20, 2015 Sheet 19 Of 55 US 9,163,284 B2

U.S. Patent Oct. 20, 2015 Sheet 20 Of 55 US 9,163,284 B2

s: cifA, variable basepairs Cii A, 2 variable basepairs} 3. 1.8- : 1.3 y (38 x (8 3 Q.8 ; 38 39.4 0.4 (3.2 C.2 500is uu"; M a8 sois uv; III. Tay positio of base air position of F.G. OA F.G. B.

8 Ci.TAt 3 variabie basepairs 8. Ci. At 4 variate basepairs) $8: {} {} 0.8 is 0.8 as o 3,4- 3: S 3.2 502 is 0.0 RA Soosuu-, A& 8. position of base pair s positio8 of base pai

Ci.A, 5 variable basepairis) CA, 8 variaixie basepairs 1.0 (.8 3 (.8 0.3. 8, 3.

A8 Af position of kase pair position of base pair F.G. E F.G. F U.S. Patent Oct. 20, 2015 Sheet 21 Of 55 US 9,163,284 B2

s C.A. 7 wariate asepairs (i.iiki, 8 variatie basepairs : 1. 10 8.8 3e 8 0.6 is 0.8 5 (.4 : 8, 2 8.2 s 8. . . Af . A. 8: position of base pair position of tase pair FG. OG F.G. OH

Cii A, 3 variate asepairs E3 CiAt, it wariable basepairs is to $s 1.0 8.8 8,8. is .8 is 8 3. 4 5 (.4 5 oz s (3.2

Sc. i. 8A S 3. i X Rosition of base pair position of base pair FIG. O. FG. I.

C. At it variate asepairis 3. CitAt, 2 variatie basepairs to id: 3.0 0.8 es 8.8 is 8.8 is 8 83 3.4 o.4 S 3.2 So 2 ; , III III88: 500 8 &A. 8.8 positio of base pair S. position of base pair FG, OK F.G. O. U.S. Patent Oct. 20, 2015 Sheet 22 of 55 US 9,163.284 B2

CA2, it wariate basepairs CTA2, 2 variabie base airs

is . - 8: position of base pair positic? of base gai FG. A F.G. B

CifA3, 3 yari&is basspairs CA2, & Wariabie base pair(s)

4.

8w & 88: position of base pair Af position- - of base pai PAM FG, C FG D

3. CiA2, 5 variable tasepairs 3. (if A2, 6 wariable basepairs 3 to 1.0 g {38 0.8 0.6 $3.8 0.4 (.4 s 8.2 0.2 8.8 00 position of base pai position of base pair FG, E F.G. F U.S. Patent Oct. 20, 2015 Sheet 23 Of 55 US 9,163,284 B2

CiA3, 7 wariabie basegairts 3. CiA2, 3 variable basepair(s) 3 10 : 1. a 0.8 0.6- 3.8 is {3,4 5 0.2- 0.2 {{. 3A i.0.0 A: position of base pair gosition of base pair FG. H.

8. (i.A2, 9 wariable basepairs s CA2, variabie basepair(s) s 3.0 : 1.0 8 a 0.8 0.6 a to 4 So 2 .2 su,0-. â. is 00- A. 8. position of base pair position of base pair

(i.A2, variahie asepairs 3. C. A2, 2 variable basepairs to 1.0 N 38 .8 is .8 G.8 0.4 0.4 5 02 is 2 solo 88:X C. & 8 A. is uu"; is: position of base pair c positio of tase pair F.G. K FG. L. U.S. Patent Oct. 20, 2015 Sheet 24 of 55 US 9,163.284 B2

CiA3, variabie asepair(s) 3. CiA3, 2 variable basepairs} 3 3 10 X 3.8 a 3.8 8.is 8.4 0.2. oo a& A. 8A position of base pair position of base paif FIG. 2A FG. 2B

1.0 to - 8 i 8.8 0.6 5 (.6 8x i (4- 82w 0.2 Xtood R. soo3x sa2. t 1 Af 8 8 position of base pair FG. 2C F.G. 2D

CiA3, 5 variable tast pairs g cifA3, 6 variabie basepair(s) 1.0 c: 1.8 isit 0.8is 8- 508a n is. 0.6 3 .8 $3.4 : 4. 50.2 5 02 0.0 is 0.0 AR: 8 positic of tase pair position of base pair FG. 2E F.G. 2F U.S. Patent Oct. 20, 2015 Sheet 25 Of 55 US 9,163,284 B2

CiA3, it wariate asepairs g CiA3, 8 variable basspair(s) : 1.0- s a 3.8 an .88 s: 3 .8 3 (8- 8.is .4 8. 8.4 50.2 {3. 3. U.9- A: 3. - i 8. position of base paif positio: {f base pair

3. CiA3, 9 Wariatsie asepairs C.A3, it wariate basepairs 5 : 10 to en 3.8 as (38 B.S. s (3.8 4 (3.4 So 2 3.

too &W& 8X . p A. position of base pair positio of base pair

CiA3, variabie basspairs 3 CiA3, 3 variabie basspairs 3 s 1.8 : . es .8 oth 38 3 (.8 as .6 8.is 8.4 8.is 8.4 8, 8,

isx W : solo6.0- A. s position of base pair s position of base pair FG. 2K U.S. Patent Oct. 20, 2015 Sheet 26 of 55 US 9,163,284 B2

CfA4, variable basepairs} (ii A4, 2 variable basepair(s) 1.0 is 0.8 0.8- .4 3. s ).3

8. Af Af position of base pair position of base air FG, 3A F.G. 3B

if A4, 3 variabis basepair(s) CiA4, 4 variable basepair(s) : 3 e 0.8 .8 3.4 8 3 (.3 3.8 A88 (.3 position of tase aif position of base paif FG, 3C FG, 3D

Ci A4, 5 variable basepair(s) ci.A4, 6 variable basepair(s)

6

3, position of base air position of ase air F.G. 3E FG. 3F U.S. Patent Oct. 20, 2015 Sheet 27 Of 55 US 9,163,284 B2

8. CA4, 7 variable tasepairs CiA4, 8 variable basepairs) it 8 3 .( 0.8 is 3.8 s (38 3 (.8 8. (.4 8. .4 02 S 3.2 C. .C. 88: UU- &&. i position of base pair E position of base pair FG, 3G FG. 3H

3. CiA4, 8 variatsie aaSepairs 8 CiA4, variable assepairs ka 1.0 ar 8 to a 'way 0.2 s:as 30u U- Ä8 issolo-is uu-, WWYY & position of base aii position of tase pair

s i.A., 11 variabie basepairs s Ci.A., 2 variable basepairs 1.0 8 8,8 is 8.8 S.is 8.4 8.2 8 : a issaas 0.0- A. is 0.0- x Ai position of base pair 8. position of base air FG. I3K FG, 3. U.S. Patent Oct. 20, 2015 Sheet 28 Of 55 US 9,163,284 B2

CiA, variate tasepairs CA, 2 variable basepairis

33% pie-setectio:> gre-selectics: 80% post-selection 80%- post-Seiecki is 83% 60% res *: 40% 40% as 2% c. 2%

s R&f 3 stations a fixes of 838taiis F.G. 4A F.G.r 4B

C.A., 3 variabie basepairs C.A., 4 variate a sepairis

I pre-select: 83 m m pie-selectionlect post-selection 8 8. post-selectics: is 63% *S 40% Rs 2.8% {3 3 { 1 8 s 4. auties of it:tations SR:::ies of Ritairs F.G. 4C

Ci. At, 3 variable basepairs C.A. Swariate asepairs

100% pie-seectio: 3% pre-se-pre-S88&to: - - - - - t------80% post-selection 80% post-setector 80%gro 80% s res 40% & 2% 8 23%,

2 3. 4. s G 2 3 3 S 6 sites tails 338 Bf Ratifs F.G. 4E FG. 4F U.S. Patent Oct. 20, 2015 Sheet 29 Of 55 US 9,163,284 B2

CiA, variate asepairs (i.A1, 8 variate basepairs 8% 38%. pre-selection pre-selection g 8% st-selection 80% host-Seieiatio is 60% is 60% *:: 48% 40% -: 3: ; c: 3,

8 2 3 4 5 & if 3 3 : S 3 8 Re; if statists airie of Rutations FG, 4G F.G. 4H.

CA, 9 variable basepairs CA, Q wariable asepairs 13.3% I. pse-selection 88% pie-setectio: 8 3C: past-selection 88% post-selectic 60% 83% 40% s 2 48% sts 239: & 3.8% {} 2 3 4 5 8 8 9. & 2 4 3 & 18 use of Rations is bef of titatio:S FG, 4 F.G. 4

C.A, it wariable basepairs ci. A 2 variate asepairs iii, 3:38, ------?ection88-888:338): g;8-selectics 80% post-selection post-selection is 80% is 38% 83 28% { { 2 4 & 8 to (3 2 : 8 3 : 3 tifief of statics fixes of titations F.G. 4K F.G. 4L U.S. Patent Oct. 20, 2015 Sheet 30 Of 55 US 9,163,284 B2

CA2, 1 waristis basspairs) CiA2, 2 variate basepairs

18% eit (38% ict se-Seieition pre-selection 880% ost-selection: g 80% post-selection is 83% 80% " res 40% 40% s s 3 23% 8; 3% 3 2 $83.8 if Ratio:S 38e of titatics F.G. 5A F.G. SB

CA2, 3 variaixie asepairis A3, 4 Waristie basepairs 188% pre-selection 83% se-Seie: 38-selection 3 8% ost-selectio: A post-setection is 80% : 80% w; 40% 40% 33% Š 23%,

8: 2 3 re; if 383S it bef if 8:38:3ts FG, SC F.G. 5D

CA2, 5 variabie basepairs A2, 6 variabie basepairs

{{3% 38% rur pre-selection 8% age-Seieiof 880% B post-se:ectics: gbu% post-selection s 83% s 38% S 4.3% S 30% is 23%* w is 28%

? 2 3 4. * 2 3 a S. 8 afte: of its tastic:3S $3rise of filiatics F.G. 5E FG, SF U.S. Patent Oct. 20, 2015 Sheet 31 Of 55 US 9,163,284 B2

ci.A2, variable basepairs CitA2, 8 variable asepairs

83% ruwww. 83% pse-set&ction It pre-selector g 33% post-selection post-selectiof 5. 83% 43% s 8 238

3 2 3 4 5 & if 2 3 4 5 8 8 8 Eye of Ritatiss size iguatics FG, 5G FG, SH

(i.A2, 8 variable basepairs CiA2, it wariate basepairs

{% r 3. 80% pre-selectix pre-selectio: a post-Seieiti post-section 83% 40% - is 23%

1 2 3 4 8 8 3 { 2 & 8 8 : bef if Rutations 838 of statio:S F.G. 5 F.G. 5,

A2, it wariabie basepairs CA2, t2 variaisie basepairs

80%r " pre-se-----taaaaaaaaa. 188% - - - - 8-S&S 3. 80% pre-selection 80% post-setection post-selectio 3. 83% 3. 83% 43% $8% s its 23% 3 23%

3 4 S 8 3 & 8 8 8, 2 38e of 3:333s. tries of titatos F.G. 5K F.G. 5, U.S. Patent Oct. 20, 2015 Sheet 32 Of 55 US 9,163,284 B2

CiA3, variable basepairs CiA3, 2 variable basepair(s) 8% e 80% pre-Seieition pre-seection as 80% 80% B post-selection &s B post-selection is 60% 80% s 43% 43% 3 C is 23% st 28%

g use of ratists 3e of stations F.G. 6A F.G. 6B

CiA3, 3 variable basepairs CiA3, Wariaisie basegairs {8%, 80% pre-se-re-se:ects: - - - - -t------It pre-selection 80% B. gest-setection 80% past-seier:to: is 60% 60% “s * 40% 48% s(3 2%.w s: 23% G ; 2 3 3. 3 A. ittie if satists i.e. of titatio:S F.G. 6)

CiA3, 5 variabie basepairs CiA3, 8 variable basepairs

38% ------lect ------3C: &^ K. te-selection 8e-selection 3.80% post-seiection 8 83% post-setection is 60%- - is are68% res s 48% 43% s & 20% 8ts 20%

2. 3 a. 3 8 2 3 4 3 S i:8&f ratics ;8&t of stafs F.G. 6E F.G. 6F U.S. Patent Oct. 20, 2015 Sheet 33 Of 55 US 9,163,284 B2

ifas, variate basepairs (CiA3, 8 variaixie asepairs 30% 8% pre-selectics 88% pie-setection: 3 80% post-Seieitici gave A post-selection is 60% 83% . 4.3% 40% c & 28. is 23% 8 2 3 ; 8 S 2 3 : 6 8 is as a ?tations 888&f f :Jt8tiaf's F.G. 6G

CiA3, 9 variable basepairs Cita3, variatie asspair(s)

13% ------lect ------83% ------ipre-setecto -----to pie-Select3: 80% post-selection 3 80% post-selection 83% 63% e res 43% 40% s: 20% { { 3 : S 3 8 {} 4 8 8 : isities of Rutatists :::::ite of testi:38 F.G. 6 F.G. 6.

CiA3, 1 variaihie basepairs CiA3, 2 variabie basepairs 83% pie-seiection fe-88tect: 3 83% post-Seer:tio: 380% post-seisctin are63% Ye 3. 83% 43% 4.3% c 3 & 28% 33 23% C 8 : 2 4 6 & 8 2 i 8 8 2 fiber of friations 8; bef if Jiati&S F.G. 6K F.G. 6L U.S. Patent Oct. 20, 2015 Sheet 34 of 55 US 9,163,284 B2

CA4, wariabie basepairs A4, 2 variaisle asepairs

;33%. rumorpre-selection pre-selectio 3. 88% post-seection gest-selection 80% o res 48% s st 23%

;38. Of Yigios: ity&f of 333i:S F.G. 7A FG. 7B

CA4, 3 waiiaixie basepairs CiA4, 4 variate basepairs

pre-selection {}{3% presel'''''tmife-Seiecki A post-selectii: g 80% A post-selection is 80% *: 43%

2 3 s ifier of 38tors Rise fatiggs FG. 7C FG. 7D

Cilla4, 5 variabie basepairs CA, 8 variatie asepairs

8% 3% e: pre-selectic 3e-selector 8 88% post-seection 83% post-sei:tion 5%. 8% 48% 40% s: 23, is 28%

{ 4 g 3. 3. 5 3 3 3 is ise of taxis :::ities & 3 statiots F.G. 7E F.G. 7F U.S. Patent Oct. 20, 2015 Sheet 35 of 55 US 9,163,284 B2

CiA4, f Wariate asepairs CiA4, 8 wariabie basepair(s)

33% pre-selection&c. 3% pre-selection&c. 3 8% post-selectio: 80% it post-selection 88% 80% 2 . * 43% 48% a: c. 23% Rs 2.3%

1 2 3 ; : 8 7 2 3 : 3 8 8 3:3: ; 3 siggs f:33 & rustics FG, 7G FG 7

CiA4, 9 waiiage basepairs CA4, 10 variabie basepairs 188% 3: pre-Selectif pre-selector g 80% B post-selection g80% post-selectio: 60% S. 40% &: 28%

2 3 4 8 8 8 2 : 8 8 : fibe of friations giftier of 8taticis FG. T. FG. 7.

CA4, it variable basepairs CiA4, 2 variable tasepairs

38% pie-Seiectio: 83% p8-S88c3:3:88 8 8% post-seection 83% post-selector: is 80% 68% res 40% 43% &S 28% cs 28% { 2 : 8 8 } 3 & 4 & 8 2 338 of statis tuber of titations F.G. 7K FG. 7 U.S. Patent Oct. 20, 2015 Sheet 36 of 55 US 9,163.284 B2

200 rv ONA, OO in C 388 CIA sgrinia v2.1 -

position of base pair FG. 8A

2O in NA. O. rhi Cass:w CLA2 sgrMA v2.

Avi positio of base pair

FG. 8B U.S. Patent Oct. 20, 2015 Sheet 37 of 55 US 9,163.284 B2

200 nM DNA, 100 nivi Cass: CTA3 sgrMA v2.

. 8.

8

.

,

t positio...' ' ' '.' '.' * of* *...* r * .base pairw 2. Af FG. 8C

200 nM DNA, 100 nM Casg:CLTA4 sgrMA v2.

i. 8.

posities of base pair FG, 8 D U.S. Patent Oct. 20, 2015 Sheet 38 of 55 US 9,163.284 B2

200 nivi DNA, 000 na Cass}:Ci. A sgxi.A w. 3.

8

0.0- posities of base pair FG. 9A

200 nivi DNA, 000 nM. Cass-CA2 sgrinA wo

swawass 8.

position of base pair FG, 9B U.S. Patent Oct. 20, 2015 Sheet 39 Of 55 US 9,163,284 B2

28 g \A 80 ; Cas8: C. TA3 sgrMA w.0

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ position& of ase air F.G. 19C

200 nivi DiNA, 000 nM Cass: CTA4 sgriiA w.(

y 2 : of ci &se pa: FG. 9D U.S. Patent Oct. 20, 2015 Sheet 40 of 55 US 9,163.284 B2

200 nM DNA, 1000 nivi Cass-CTA sgriiA v2.1 -:

8.

O. w

8, - positionA -- of base pair 2: A FIG. 20A

2008. DNA, 1000 at Case CLA2 sgria v2.

. :

position of base pair FG 20B U.S. Patent Oct. 20, 2015 Sheet 41 of 55 US 9,163.284 B2

23 8 :-A, 38 it. Oas:Qi A3 sgrixia v. - &. Se 0.8

S. g 3. 4" s 8: 8 S

0.0 positiona a of base pair 2O PA FG, 2C

200 at DNA, 1000 at Cass: Cli A4 sgri-Aw2.

se 0.8

3.8- 3. 3.: (.4- s 0.2- w

position of base pair FG, 2.0D

U.S. Patent Oct. 20, 2015 Sheet 43 of 55 US 9,163,284 B2

--- Saaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa. NR CA3 gre-selectic 300 & Oxia 3. { w Cassi:CiA3 sgr. A v.2. { 000 m. Case:C.A: srhia w8.

i. 3 9 w 5 8 3 Sw Sx 3 8 kgw. i. 93. 28 w- & its::::::::S FG. 2C

Q. A4 pre-seiectio: 23 & Chia 8 Cas::::A3 sg3NA, w?, k .338 & Casa:C. A sgr8Awa,

3 & 2 is- 3x 8{ 8 isM 5 - 3 isa is 9 2 a- A disciectices FG, 2D U.S. Patent Oct. 20, 2015 Sheet 44 of 55 US 9,163,284 B2

?. A pse-seiectixx (288 . xxia 8 Cass: a sgria, y2. 300 & Casci. At sgri-Ay3. : 8. s 8 s:

s: s

oo------3 9 - 5 8 3 5 8 5 9 2 - & citieices

M. i. A pse-seection: 2: & Rhia { i Cass:Qi A3 sgrkAw2. ( ; 888 & Cass: Asghia y2. K: & & *: s s s cs:

{. 3 - 3 8is 3 &5 R. a cit::::itectities

FG. 22B

U.S. Patent Oct. 20, 2015 Sheet 47 of 55 US 9,163,284 B2

------30 at Casa C.; A3 sg. 8:A v3. { {}00 & Cas&CiA3 sgxi.Aw2. 8. s saaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.

8

f 3 RS a 3c 8: 8 as <5 &3 8 as & 983 it2 w A: i:3:geois FG, 23C

{{ 8: Casa:CiA3 sigrxia was { {{{ ai: Qass.C.I.A.4 sgri-Aw8.

-3. i

r < { w S. 3 & ( & & 3 is { . PA: i::cies:cies FG. 23)

U.S. Patent Oct. 20, 2015 Sheet 50 of 55 US 9,163,284 B2

C. A sg.8A-degendent specifixity charge {{{ ai: wa. - 888 as w. 3.

s & 8 $3. &

w M

S. X X s X “::::::::: 0- 8 is eact- at a & 33sixt of 33&e sai 2: 3:

C. As $gr8&-tieperce: speciicity charge : « va. - 38: 8: wa. } &

e six- XXX

s: 3- *. & & six : 2 3.

, ei:3, Ex :::::::::: : IIe s: 38sitio of ease 33: 3 &: FG. 25B U.S. Patent Oct. 20, 2015 Sheet 51 of 55 US 9,163,284 B2

C. 83 sgi-ik-8 epetitiestsg:cificity charge {{{X} : y. - (30 888 w8. $* x X e 3 8. & is c 2 X w8& s x x & 8. » ** a II arc s positios (; base ga: £8

FG, 25C

C3: &4 sgikhia-tiege:88 specificity :38ge {{{3 & w8. ~ : {}{33:8: wa.

3 3 &. X X

s

3. X &s. X- sixx g * x x.

&

s est-e-a-dotgossix of ase 3a; 2: 3:8: - FG, 2.5D U.S. Patent Oct. 20, 2015 Sheet 52 of 55 US 9,163,284 B2

C. A sgr8&-tieperscient specificity charge {{}{{ 88 y 3.3 - 333 nt w?. k

e X s: 3

83 X 2 s:

e 8, ...... e.

s position of iase pair :: S-88 w F.G. 26A

CiA2sgr8&-depexiest specificity chang: 3000 8 y. - (30 88 w8. d

s: x X 3 *

3. X

s 8 &

s X X 8X X *

E. : X s 0 w: (IIa. { ; positiox: 3 xase pair 88 & . FG. 26B U.S. Patent Oct. 20, 2015 Sheet 53 of 55 US 9,163,284 B2

Cili A3 sg8&A-depetitia: specificity charge {{{83 nt; w8 - 800 88 w2.

4. &. X

2 3 'S s & c 2 X s & 3. ~ & g & X& & ... x w * x x 8 3. & s X 8 El-Eco-aircacao-e-- r s positics of ase 38 2. P&: - F.G. 26C

{i : Axisgithiaxiepertiest specificity :38ge {{338 88 y 3 - {:}} 8: wa.

ki 8

&X 3 8xX. & &s X

g *& *& 3. X

8: X s: 0- or--tex8

F.G. 26 D

U.S. Patent Oct. 20, 2015 Sheet 55 of 55 US 9,163,284 B2

tigiit is sgrix A w?, sgr. x, y : cut A cit a

F.G. 28 US 9,163,284 B2 1. 2 METHODS FOR IDENTIFYING ATARGET reconstruction of target sites from sequences of two half-sites SITE OF ACAS9 NUCLEASE resulting from two adjacent cuts of the nuclease of a library member nucleic acid (see e.g., PCT Application WO 2013/ RELATED APPLICATION 066438; and Pattanayak, V., Ramirez, C. L., Joung, J. K. & Liu, D. R. Revealing off-target cleavage specificities of zinc This application claims priority under 35 U.S.C. S 119(e) to finger nucleases by in vitro selection. Nature methods 8, U.S. provisional patent application, U.S. Ser. No. 61/864, 765-770 (2011), the entire contents of each of which are 289, filed Aug. 9, 2013, the entire contents of which are incorporated herein by reference). In contrast to such “two incorporated herein by reference. cut strategies, the methods of the present disclosure utilize 10 an optimized “one cut Screening strategy, which allows for GOVERNMENT SUPPORT the identification of library members that have been cut at least once by the nuclease. As explained in more detail else This invention was made with U.S. Government support where herein, the “one-cut’ selection strategies provided undergrant numbers HR0011-11-2-0003 and N66001-12-C- herein are compatible with single end high-throughput 4207, awarded by the Defense Advanced Research Projects 15 sequencing methods and do not require computational recon Agency. The U.S. Government has certain rights in the inven struction of cleaved target sites from cut half-sites, thus tion. streamlining the nuclease profiling process. Some aspects of this disclosure provide in vitro selection BACKGROUND OF THE INVENTION methods for evaluating the cleavage specificity of endonu cleases and for selecting nucleases with a desired level of Site-specific endonucleases theoretically allow for the tar specificity. Such methods are useful, for example, for char geted manipulation of a single site within a genome and are acterizing an endonuclease of interest and for identifying a useful in the context of gene targeting as well as for therapeu nuclease exhibiting a desired level of specificity, for example, tic applications. In a variety of organisms, including mam for identifying a highly specific endonuclease for clinical mals, site-specific endonucleases have been used for genome 25 applications. engineering by stimulating either non-homologous end join Some aspects of this disclosure provide methods of iden ing or homologous recombination. In addition to providing tifying Suitable nuclease target sites that are sufficiently dif powerful research tools, site-specific nucleases also have ferent from any other site within a genome to achieve specific potential as gene therapy agents, and two site-specific endo cleavage by a given nuclease without any or at least minimal nucleases have recently entered clinical trials: one, CCR5 30 off-target cleavage. Such methods are useful for identifying 2246, targeting a human CCR-5 allele as part of an anti-HIV candidate nuclease target sites that can be cleaved with high therapeutic approach (NCT00842634, NCT01044654, specificity on a genomic background, for example, when NCT01252641), and the other one, VF24684, targeting the choosing a target site for genomic manipulation in vitro or in human VEGF-A promoter as part of an anti-cancer therapeu vivo. tic approach (NCT01082926). 35 Some aspects of this disclosure provide methods of evalu Specific cleavage of the intended nuclease target site with ating, selecting, and/or designing site-specific nucleases with out or with only minimal off-target activity is a prerequisite enhanced specificity as compared to current nucleases. For for clinical applications of site-specific endonuclease, and example, provided herein are methods that are useful for also for high-efficiency genomic manipulations in basic selecting and/or designing site-specific nucleases with mini research applications, as imperfect specificity of engineered 40 mal off-target cleavage activity, for example, by designing site-specific binding domains has been linked to cellular tox variant nucleases with binding domains having decreased icity and undesired alterations of genomic loci other than the binding affinity, by lowering the final concentration of the intended target. Most nucleases available today, however, nuclease, by choosing target sites that differ by at least three exhibit significant off-target activity, and thus may not be base pairs from their closest sequence relatives in the genome, Suitable for clinical applications. Technology for evaluating 45 and, in the case of RNA-programmable nucleases, by select nuclease specificity and for engineering nucleases with ing a guide RNA that results in the fewest off-target sites improved specificity are therefore needed. being bound and/or cut. Compositions and kits useful in the practice of the methods SUMMARY OF THE INVENTION described herein are also provided. 50 Some aspects of this disclosure provide methods for iden Some aspects of this disclosure are based on the recogni tifying a target site of a nuclease. In some embodiments, the tion that the reported toxicity of some engineered site-specific method comprises (a) providing a nuclease that cuts a double endonucleases is based on off-target DNA cleavage, rather Stranded nucleic acid target site, wherein cutting of the target than on off-target binding alone. Some aspects of this disclo site results in cut nucleic acid strands comprising a 5' phos Sure provide strategies, compositions, systems, and methods 55 phate moiety; (b) contacting the nuclease of (a) with a library to evaluate and characterize the sequence specificity of site of candidate nucleic acid molecules, wherein each nucleic specific nucleases, for example, RNA-programmable endo acid molecule comprises a concatemer of a sequence com nucleases, such as Cas9 endonucleases, Zinc finger nucleases prising a candidate nuclease target site and a constant insert (ZNFs), homing endonucleases, or transcriptional activator sequence, under conditions Suitable for the nuclease to cut a like element nucleases (TALENs). 60 candidate nucleic acid molecule comprising a target site of The strategies, methods, and reagents of the present dis the nuclease; and (c) identifying nuclease target sites cut by closure represent, in Some aspects, an improvement over pre the nuclease in (b) by determining the sequence of an uncut vious methods for assaying nuclease specificity. For example, nuclease target site on the nucleic acid strand that was cut by Some previously reported methods for determining nuclease the nuclease in step (b). In some embodiments, the nuclease target site specificity profiles by Screening libraries of nucleic 65 creates blunt ends. In some embodiments, the nuclease cre acid molecules comprising candidate target sites relied on a ates a 5' overhang. In some embodiments, the determining of “two-cut” in vitro selection method which requires indirect step (c) comprises ligating a first nucleic acid adapter to the 5' US 9,163,284 B2 3 4 end of a nucleic acid strand that was cut by the nuclease in step nuclease comprises an enediyne functional group. In some (b) via 5'-phosphate-dependent ligation. In some embodi embodiments, the nuclease is an antibiotic. In some embodi ments, the nucleic acid adapter is provided in double-stranded ments, the compound is dynemicin, neocarzinostatin, cali form. In some embodiments, the 5'-phosphate-dependent cheamicin, esperamicin, bleomycin, or a derivative thereof. ligation is a blunt end ligation. In some embodiments, the In some embodiments, the nuclease is a homing endonu method comprises filling in the 5'-overhang before ligating clease. the first nucleic acid adapter to the nucleic acid strand that was Some aspects of this disclosure provide libraries of nucleic cut by the nuclease. In some embodiments, the determining of acid molecules, in which each nucleic acid molecule com step (c) further comprises amplifying a fragment of the con prises a concatemer of a sequence comprising a candidate catemer cut by the nuclease that comprises an uncut target site 10 nuclease target site and a constant insert sequence of 10-100 via a PCR reaction using a PCR primer that hybridizes with nucleotides. In some embodiments, the constant insert the adapter and a PCR primer that hybridizes with the con sequence is at least 15, at least 20, at least 25, at least 30, at stant insert sequence. In some embodiments, the method fur least 35, at least 40, at least 45, at least 50, at least 55, at least ther comprises enriching the amplified nucleic acid mol 60, at least 65, at least 70, at least 75, at least 80, or at least 95 ecules for molecules comprising a single uncut target 15 nucleotides long. In some embodiments, the constant insert sequence. In some embodiments, the step of enriching com sequence is not more than 15, not more than 20, not more than prises a size fractionation. In some embodiments, the deter 25, not more than 30, not more than 35, not more than 40, not mining of step (c) comprises sequencing the nucleic acid more than 45, not more than 50, not more than 55, not more Strand that was cut by the nuclease in step (b), or a copy than 60, not more than 65, not more than 70, not more than 75, thereof obtained via PCR. In some embodiments, the library not more than 80, or not more than 95 nucleotides long. In of candidate nucleic acid molecules comprises at least 10, at Some embodiments, the candidate nuclease target sites are least 10, at least 10', at least 10', or at least 10' different sites that can be cleaved by an RNA-programmable nuclease, candidate nuclease cleavage sites. In some embodiments, the a Zinc Finger Nuclease (ZFN), a Transcription Activator-Like nuclease is a therapeutic nuclease which cuts a specific Effector Nuclease (TALEN), a homing endonuclease, an nuclease target site in a gene associated with a disease. In 25 organic compound nuclease, or an enediyne antibiotic (e.g., Some embodiments, the method further comprises determin dynemicin, neocarzinostatin, calicheamicin, esperamicin, ing a maximum concentration of the therapeutic nuclease at bleomycin). In some embodiments, the candidate nuclease which the therapeutic nuclease cuts the specific nuclease target site can be cleaved by a Cas9 nuclease. In some target site, and does not cut more than 10, more than 5, more embodiments, the library comprises at least 10, at least 10, than 4, more than 3, more than 2, more than 1, or no additional 30 at least 107, at least 10, at least 10, at least 10', at least 10'', nuclease target sites. In some embodiments, the method fur or at least 10" different candidate nuclease target sites. In ther comprises administering the therapeutic nuclease to a some embodiments, the library comprises nucleic acid mol Subject in an amount effective to generate a final concentra ecules of a molecular weight of at least 0.5 kDa, at least 1 kDa, tion equal or lower than the maximum concentration. In some at least 2 kDa, at least 3 kDa, at least 4 kDa, at least 5 kDa, at embodiments, the nuclease is an RNA-programmable 35 least 6 kDa, at least 7 kDa, at least 8 kDa, at least 9 kDa, at nuclease that forms a complex with an RNA molecule, and least 10 kDa, at least 12 kDa, or at least 15 kDa. In some wherein the nuclease:RNA complex specifically binds a embodiments, the library comprises candidate nuclease tar nucleic acid sequence complementary to the sequence of the get sites that are variations of a known target site of a nuclease RNA molecule. In some embodiments, the RNA molecule is of interest. In some embodiments, the variations of a known a single-guide RNA (sgRNA). In some embodiments, the 40 nuclease target site comprise 10 or fewer, 9 or fewer, 8 or sgRNA comprises 5-50 nucleotides, 10-30 nucleotides, fewer, 7 or fewer, 6 or fewer, 5 or fewer, 4 or fewer, 3 or fewer, 15-25 nucleotides, 18-22 nucleotides, 19-21 nucleotides, or 2 or fewer mutations as compared to a known nuclease e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, target site. In some embodiments, the variations differ from 25, 26, 27, 28, 29, or 30 nucleotides. In some embodiments, the known target site of the nuclease of interest by more than the nuclease is a Cas9 nuclease. In some embodiments, the 45 5%, more than 10%, more than 15%, more than 20%, more nuclease target site comprises a sgRNA-complementary than 25%, or more than 30% on average, distributed binomi sequence-protospacer adjacent motif (PAM) structure, and ally. In some embodiments, the variations differ from the the nuclease cuts the target site within the sgRNA-comple known target site by no more than 10%, no more than 15%, no mentary sequence. In some embodiments, the sgRNA more than 20%, no more than 25%, nor more than 30%, no complementary sequence comprises 10, 11, 12, 13, 14, 15. 50 more than 40%, or no more than 50% on average, distributed 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 binomially. In some embodiments, the nuclease of interest is nucleotides. In some embodiments, the nuclease comprises a Cas9 nuclease, a Zinc finger nuclease, a TALEN, a homing an unspecific nucleic acid cleavage domain. In some embodi endonuclease, an organic compound nuclease, or an enediyne ments, the nuclease comprises a FokI cleavage domain. In antibiotic (e.g., dynemicin, neocarzinostatin, calicheamicin, Some embodiments, the nuclease comprises a nucleic acid 55 esperamicin, bleomycin). In some embodiments, the candi cleavage domain that cleaves a target sequence upon cleavage date nuclease target sites are Cas9 nuclease target sites that domain dimerization. In some embodiments, the nuclease comprise asgRNA-complementary sequence-PAM struc comprises a binding domain that specifically binds a nucleic ture, wherein the sgRNA-complementary sequence com acid sequence. In some embodiments, the binding domain prises 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, comprises a Zinc finger. In some embodiments, the binding 60 25, 26, 27, 28, 29, or 30 nucleotides. domain comprises at least 2, at least 3, at least 4, or at least 5 Some aspects of this disclosure provide methods for select Zinc fingers. In some embodiments, the nuclease is a Zinc ing a nuclease that specifically cuts a consensus target site Finger Nuclease. In some embodiments, the binding domain from a plurality of nucleases. In some embodiments, the comprises a Transcriptional Activator-Like Element. In some method comprises (a) providing a plurality of candidate embodiments, the nuclease is a Transcriptional Activator 65 nucleases that cut the same consensus sequence; (b) for each Like Element Nuclease (TALEN). In some embodiments, the of the candidate nucleases of step (a), identifying a nuclease nuclease is an organic compound. In some embodiments, the target site cleaved by the candidate nuclease that differ from US 9,163,284 B2 5 6 the consensus target site using a method provided herein; (c) nuclease. In some embodiments, the kit further comprises a selecting a nuclease based on the nuclease target site(s) iden nucleic acid molecule comprising a target site of the isolated tified in step (b). In some embodiments, the nuclease selected nuclease. In some embodiments, the kit comprises an excipi in step (c) is the nuclease that cleaves the consensus target site ent and instructions for contacting the nuclease with the with the highest specificity. In some embodiments, the excipient to generate a composition Suitable for contacting a nuclease that cleaves the consensus target site with the high nucleic acid with the nuclease. In some embodiments, the est specificity is the candidate nuclease that cleaves the lowest composition is suitable for contacting a nucleic acid within a number of target sites that differ from the consensus site. In genome. In some embodiments, the composition is Suitable Some embodiments, the candidate nuclease that cleaves the for contacting a nucleic acid within a cell. In some embodi consensus target site with the highest specificity is the candi 10 date nuclease that cleaves the lowest number of target sites ments, the composition is suitable for contacting a nucleic that are different from the consensus site in the context of a acid within a subject. In some embodiments, the excipient is target genome. In some embodiments, the candidate nuclease a pharmaceutically acceptable excipient. selected in step (c) is a nuclease that does not cleave any target Some aspects of this disclosure provide pharmaceutical site other than the consensus target site. In some embodi 15 compositions that are Suitable for administration to a Subject. ments, the candidate nuclease selected in step (c) is a nuclease In some embodiments, the composition comprises an isolated that does not cleave any target site other than the consensus nuclease as provided herein. In some embodiments, the com target site within the genome of a Subject at a therapeutically position comprises a nucleic acid encoding such a nuclease. effective concentration of the nuclease. In some embodi In some embodiments, the composition comprises a pharma ments, the method further comprises contacting a genome ceutically acceptable excipient. with the nuclease selected in step (c). In some embodiments, Other advantages, features, and uses of the invention will the genome is a vertebrate, mammalian, human, non-human be apparent from the detailed description of certain non primate, rodent, mouse, rat, hamster, goat, sheep, cattle, dog, limiting embodiments of the invention; the drawings, which cat, reptile, amphibian, fish, nematode, insect, or fly genome. are schematic and not intended to be drawn to Scale; and the In some embodiments, the genome is within a living cell. In 25 claims. Some embodiments, the genome is within a Subject. In some embodiments, the consensus target site is within an allele that BRIEF DESCRIPTION OF THE DRAWINGS is associated with a disease or disorder. In some embodi ments, cleavage of the consensus target site results in treat FIGS. 1A-B. In vitro selection overview. (a) Cas9 com ment or prevention of a disease or disorder, e.g., amelioration 30 plexed with a short guide RNA (sgRNA) recognizes -20 or prevention of at least one sign and/or symptom of the bases of a target DNA substrate that is complementary to the disease or disorder. In some embodiments, cleavage of the sgRNA sequence and cleaves both DNA strands. The white consensus target site results in the alleviation of a sign and/or triangles represent cleavage locations. (b) A modified version symptom of the disease or disorder. In some embodiments, of our previously described in vitro selection was used to cleavage of the consensus target site results in the prevention 35 comprehensively profile Cas9 specificity. A concatemeric of the disease or disorder. In some embodiments, the disease pre-selection DNA library in which each molecule contains is HIV/AIDS. In some embodiments, the allele is a CCR5 one of 10' distinct variants of a target DNA sequence (white allele. In some embodiments, the disease is a proliferative rectangles) was generated from synthetic DNA oligonucle disease. In some embodiments, the disease is cancer. In some otides by ligation and rolling-circle amplification. This embodiments, the allele is a VEGFA allele. 40 library was incubated with a Cas9:sgRNA complex of inter Some aspects of this disclosure provide isolated nucleases est. Cleaved library members contain 5' phosphate groups that have been selected according to a method provided (circles) and therefore are substrates for adapter ligation and herein. In some embodiments, the nuclease has been engi PCR. The resulting amplicons were subjected to high neered to cleave a target site within a genome. In some throughput DNA sequencing and computational analysis. embodiments, the nuclease is a Cas9 nuclease comprising an 45 FIGS. 2A-H. In vitro selection results for Cas'9:CLTA1 sgRNA that is complementary to the target site within the sgRNA. Heat maps' show the specificity profiles of Cas9: genome. In some embodiments, the nuclease is a Zinc Finger CLTA 1 sgRNA v2.1 under enzyme-limiting conditions (a,b), Nuclease (ZFN) or a Transcription Activator-Like Effector Cas9:CLTA 1 sgRNA v1.0 under enzyme-saturating condi Nuclease (TALEN), a homing endonuclease, or an organic tions (c. d), and Cas9:CLTA 1 sgRNA v2.1 under enzyme compound nuclease (e.g., an enediyne, an antibiotic nuclease, 50 saturating conditions (e., f). Heat maps show all post-selection dynemicin, neocarzinostatin, calicheamicin, esperamicin, sequences (a,c, e) or only those sequences containing a single bleomycin, or a derivative thereof). In some embodiments, mutation in the 20-base pairsgRNA-specified target site and the nuclease has been selected based on cutting no other two-base pair PAM (b, d, f). Specificity scores of 1.0 and -1.0 candidate target site, not more than one candidate target site, corresponds to 100% enrichment for and against, respec not more than two candidate target sites, not more than three 55 tively, a particular base pair at a particular position. Black candidate target sites, not more than four candidate target boxes denote the intended target nucleotides. (g) Effect of sites, not more than five candidate target sites, not more than Cas9:sgRNA concentration on specificity. Positional speci six candidate target sites, not more than seven candidate ficity changes between enzyme-limiting (200 nM DNA, 100 target sites, not more than eight candidate target sites, not nM. Cas9:sgRNA v2.1) and enzyme-saturating (200 nM more than eight candidate target sites, not more than nine 60 DNA, 1000 nM Cas9:sgRNA v2.1) conditions, normalized to candidate target sites, or not more than ten candidate target the maximum possible change in positional specificity, are sites in addition to its known nuclease target site. shown for CLTA 1. (h) Effect of sgRNA architecture on speci Some aspects of this disclosure provide kits comprising a ficity. Positional specificity changes between sgRNA v1.0 library of nucleic acid molecules comprising candidate and sgRNA v2.1 under enzyme-saturating conditions, nor nuclease target sites as provided herein. Some aspects of this 65 malized to the maximum possible change in positional speci disclosure provide kits comprising an isolated nuclease as ficity, are shown for CLTA1. See FIGS. 6-8, 25, and 26 for provided herein. In some embodiments, the nuclease is a Cas9 corresponding data for CLTA2, CLTA3, and CLTA4. US 9,163,284 B2 7 8 FIGS. 3A-D. Target sites profiled in this study. (A) The 5' B), Cas9:CLTA4sgRNA v1.0 under enzyme-excess condi end of the sgRNA has 20 nucleotides that are complementary tions (C, D), and Cas9:CLTA4sgRNA v2.1 under enzyme to the target site. The target site contains an NGG motif saturating conditions (E. F). Heat maps show all post-selec (PAM) adjacent to the region of RNA:DNA complementarity. tion sequences (A, C, E) or only those sequences containing (B) Four human clathrin gene (CLTA) target sites are shown. 5 a single mutation in the 20-base pairsgRNA-specified target Sequences correspond, from top to bottom, to SEQID NOs: site and two-base pair PAM (B, D, F). Specificity scores of 1.0 2-7, respectively. (C, D) Four human clathrin gene (CLTA) and -1.0 corresponds to 100% enrichment for and against, target sites are shown with sgRNAs. SgRNA v1.0 is shorter respectively, a particular base pair at a particular position. than sgRNA v2.1. The PAM is shown for each site. The Black boxes denote the intended target nucleotides. non-PAM end of the target site corresponds to the 5' end of the 10 sgRNA. Sequences in FIG. 3C correspond, from top to bot FIGS. 9A-D. In vitro selection results as sequence logos. tom, to SEQID NOs: 8-19, respectively. Sequences in FIG. Information content is plotted for each target site position 3D correspond, from top to bottom, to SEQID NOs: 20-31, (1-20) specified by CLTA1 (A), CLTA2 (B), CLTA3 (C), and respectively. CLTA4 (D) sgRNA v2.1 under enzyme-limiting conditions. FIG. 4. Cas9:guide RNA cleavage of on-target DNA 15 Positions in the PAM are labelled “P1,” “P2, and “P3. Infor sequences in vitro. Discrete DNA cleavage assays on an mation content is plotted in bits. 2.0 bits indicates absolute approximately 1-kb linear substrate were performed with 200 specificity and 0 bits indicates no specificity. nM on-target site and 100 nM Cas9:v1.0 sgRNA, 100 nM FIGS. 10A-L. Tolerance of mutations distal to the PAM for Cas9:v2.1 sgRNA, 1000 nM Cas9:v1.0 sgRNA, and 1000 nM CLTA1. The maximum specificity scores at each position are Cas9:v2.1 sgRNA for each of four CLTA target sites. For shown for the Cas9:CLTA 1 v2.1 sgRNA selections when CLTA1, CLTA2, and CLTA4, Cas9:v2.1 sgRNA shows considering only those sequences with on-target base pairs in higher activity than Cas9:v1.0 sgRNA. For CLTA3, the activi gray, while allowing mutations in the first 1-12 base pairs ties of the Cas9:v1.0 sgRNA and Cas9:v2.1 sgRNA were (a-l). The positions that are not constrained to on-target base comparable. pairs are indicated by dark bars. Higher specificity score FIGS. 5A-E. In vitro selection results for four target sites. 25 values indicate higher specificity at a given position. The In vitro selections were performed on 200 nM pre-selection positions that were not allowed to contain any mutations library with 100 nM Cas9:sgRNA v2.1, 1000 nM Cas9: (gray) were plotted with a specificity score of +1. For all sgRNA v1.0, or 1000 nM Cas9:sgRNA v.2.1. (A) Post-selec panels, specificity scores were calculated from pre-selection tion PCR products are shown for the 12 selections performed. library sequences and post-selection library sequences with DNA containing 1.5 repeats were quantified for each selec 30 an na5,130 and na74,538, respectively. tion and pooled in equimolar amounts before gel purification FIGS. 11 A-L. Tolerance of mutations distal to the PAM for and sequencing. (B-E) Distributions of mutations are shown CLTA2. The maximum specificity scores at each position are for pre-selection (black) and post-selection libraries (col shown for the Cas9:CLTA2 v2.1 sgRNA selections when ored). The post-selection libraries are enriched for sequences considering only those sequences with on-target base pairs in with fewer mutations than the pre-selection libraries. Muta 35 gray, while allowing mutations in the first 1-12 base pairs tions are counted from among the 20 base pairs specified by (a-l). The positions that are not constrained to on-target base the sgRNA and the two-base pair PAM. P-values are <0.01 for pairs are indicated by dark bars. Higher specificity score all pairwise comparisons between distributions in each panel. values indicate higher specificity at a given position. The P-values were calculated using t-tests, assuming unequal size positions that were not allowed to contain any mutations and unequal variance. 40 (gray) were plotted with a specificity score of +1. For all FIGS. 6A-F. In vitro selection results for Cas'9:CLTA2 panels, specificity scores were calculated from pre-selection sgRNA. Heat maps show the specificity profiles of Cas9: library sequences and post-selection library sequences with CLTA2 sgRNA v2.1 under enzyme-limiting conditions (A, an na3,190 and na25,365, respectively. B), Cas9:CLTA2 sgRNA v1.0 under enzyme-excess condi FIGS. 12A-L. Tolerance of mutations distal to the PAM for tions (C, D), and Cas9:CLTA2 sgRNA v2.1 under enzyme 45 CLTA3. The maximum specificity scores at each position are excess conditions (E. F). Heat maps show all post-selection shown for the Cas9:CLTA3 v2.1 sgRNA selections when sequences (A, C, E) or only those sequences containing a considering only those sequences with on-target base pairs in single mutation in the 20-base pair sgRNA-specified target gray, while allowing mutations in the first 1-12 base pairs site and two-base pair PAM (B, D, F). Specificity scores of 1.0 (a-l). The positions that are not constrained to on-target base and -1.0 corresponds to 100% enrichment for and against, 50 pairs are indicated by dark bars. Higher specificity score respectively, a particular base pair at a particular position. values indicate higher specificity at a given position. The Black boxes denote the intended target nucleotides. positions that were not allowed to contain any mutations FIGS. 7A-F. In vitro selection results for Cas'9:CLTA3 (gray) were plotted with a specificity score of +1. For all sgRNA. Heat maps show the specificity profiles of Cas9: panels, specificity scores were calculated from pre-selection CLTA3 sgRNA v2.1 under enzyme-limiting conditions (A, 55 library sequences and post-selection library sequences with B), Cas9:CLTA3 sgRNA v1.0 under enzyme-excess condi an na5,604 and na 158,424, respectively. tions (C, D), and Cas9:CLTA3 sgRNA v2.1 under enzyme FIGS. 13 A-I. Tolerance of mutations distal to the PAM for saturating conditions (E. F). Heat maps show all post-selec CLTA4. The maximum specificity scores at each position are tion sequences (A, C, E) or only those sequences containing shown for the Cas9:CLTA4 v2.1 sgRNA selections when a single mutation in the 20-base pairsgRNA-specified target 60 considering only those sequences with on-target base pairs in site and two-base pair PAM (B, D, F). Specificity scores of 1.0 gray, while allowing mutations in the first 1-12 base pairs and -1.0 corresponds to 100% enrichment for and against, (a-i). The positions that are not constrained to on-target base respectively, a particular base pair at a particular position. pairs are indicated by dark bars. Higher specificity score Black boxes denote the intended target nucleotides. values indicate higher specificity at a given position. The FIGS. 8A-F. In vitro selection results for Cas'9:CLTA4 65 positions that were not allowed to contain any mutations sgRNA. Heat maps show the specificity profiles of Cas9: (gray) were plotted with a specificity score of +1. For all CLTA4 sgRNA v2.1 under enzyme-limiting conditions (A, panels, specificity scores were calculated from pre-selection US 9,163,284 B2 10 library sequences and post-selection library sequences with is lowest in the middle of the target site and in the several an n-2.323 and na21,819, respectively. nucleotides most distal to the PAM. FIGS. 14A-L. Tolerance of mutations distal to the PAM in FIGS. 19A-D. Positional specificity patterns for 1000 nM CLTA1 target sites. Distributions of mutations are shown for Cas9:sgRNA v1.0. Positional specificity, defined as the sum in vitro selection on 200 nM pre-selection library with 1000 of the magnitude of the specificity score for each of the four nM. Cas9:CLTA 1 sgRNA v2.1. The number of mutations possible base pairs recognized at a certain position in the shown are in a 1-12 base pair target site Subsequence farthest target site, is plotted for each target site under enzyme-excess from the PAM (a-1) when the rest of the target site, including conditions with sgRNA v1.0 (A-D). The positional specificity the PAM, contains only on-target base pairs. The pre-selec is shown as a value normalized to the maximum positional tion and post-selection distributions are similar for up to three 10 specificity value of the target site. Positional specificity is base pairs, demonstrating tolerance for target sites with muta relatively constant across the target site but is lowest in the tions in the three base pairs farthest from the PAM when the middle of the target site and in the several nucleotides most rest of the target sites have optimal interactions with the distal to the PAM. Cas9:sgRNA. For all panels, graphs were generated from FIGS. 20A-D. Positional specificity patterns for 1000 nM pre-selection library sequences and post-selection library 15 Cas9:sgRNA v2.1. Positional specificity, defined as the sum sequences with an na5,130 and na74,538, respectively. of the magnitude of the specificity score for each of the four FIGS. 15A-L. Tolerance of mutations distal to the PAM in possible base pairs recognized at a certain position in the CLTA2 target sites. Distributions of mutations are shown for target site, is plotted for each target site under enzyme-excess in vitro selection on 200 nM pre-selection library with 1000 conditions with sgRNA v2.1 (A-D). The positional specificity nM. Cas9:CLTA2 sgRNA v2.1. The number of mutations is shown as a value normalized to the maximum positional shown are in a 1-12 base pair target site Subsequence farthest specificity value of the target site. Positional specificity is from the PAM (a-1) when the rest of the target site, including relatively constant across the target site but is lowest in the the PAM, contains only on-target base pairs. The pre-selec middle of the target site and in the several nucleotides most tion and post-selection distributions are similar for up to three distal to the PAM. base pairs, demonstrating tolerance for target sites with muta 25 FIGS. 21A-D. PAM nucleotide preferences. The abun tions in the three base pairs farthest from the PAM when the dance in the pre-selection library and post-selection libraries rest of the target sites have optimal interactions with the under enzyme-limiting or enzyme-excess conditions are Cas9:sgRNA. For all panels, graphs were generated from shown for all 16 possible PAM dinucleotides for selections pre-selection library sequences and post-selection library with CLTA1 (a), CLTA2 (b), CLTA3 (c), and CLTA4 (d) sequences with an na3,190 and na21,265, respectively. 30 sgRNA v2.1. GG dinucleotides increased in abundance in the FIGS. 16A-L. Tolerance of mutations distal to PAM in post-selection libraries, while the other possible PAM CLTA3 target sites. Distributions of mutations are shown for dinucleotides decreased in abundance after the selection. in vitro selection on 200 nM pre-selection library with 1000 FIGS. 22A-D. PAM nucleotide preferences for on-target nM. Cas9:CLTA3 sgRNA v2.1. The number of mutations sites. Only post-selection library members containing no shown are in a 1-12 base pair target site Subsequence farthest 35 mutations in the 20 base pairs specified by the guide RNAs from the PAM (a-1) when the rest of the target site, including were included in this analysis. The abundance in the pre the PAM, contains only on-target base pairs. The pre-selec selection library and post-selection libraries under enzyme tion and post-selection distributions are similar for up to three limiting and enzyme-excess conditions are shown for all 16 base pairs, demonstrating tolerance for target sites with muta possible PAM dinucleotides for selections with CLTA 1 (A), tions in the three base pairs farthest from the PAM when the 40 CLTA2 (B), CLTA3 (C), and CLTA4 (D) sgRNA v.2.1. GG rest of the target sites have optimal interactions with the dinucleotides increased in abundance in the post-selection Cas9:sgRNA. For all panels, graphs were generated from libraries, while the other possible PAM dinucleotides gener pre-selection library sequences and post-selection library ally decreased in abundance after the selection, although this sequences with an nes,604 and ne158,424, respectively. effect for the enzyme-excess concentrations of Cas9:sgRNA FIGS. 17A-L. Tolerance of mutations distal to PAM in 45 was modest or non-existent for many dinucleotides. CLTA4 target sites. Distributions of mutations are shown for FIGS. 23 A-D. PAM dinucleotide specificity scores. The in vitro selection on 200 nM pre-selection library with 1000 specificity scores under enzyme-limiting and enzyme-excess nM. Cas9:CLTA4 sgRNA v2.1. The number of mutations conditions are shown for all 16 possible PAM dinucleotides shown are in a 1-12 base pair target site Subsequence farthest (positions 2 and 3 of the three-nucleotide NGG PAM) for from the PAM (a-1) when the rest of the target site, including 50 selections with CLTA1 (A), CLTA2 (B), CLTA3 (C), and the PAM, contains only on-target base pairs. The pre-selec CLTA4 (D) sgRNA v2.1. The specificity score indicates the tion and post-selection distributions are similar for up to three enrichment of the PAM dinucleotide in the post-selection base pairs, demonstrating tolerance for target sites with muta library relative to the pre-selection library, normalized to the tions in the three base pairs farthest from the PAM when the maximum possible enrichment of that dinucleotide. A speci rest of the target sites have optimal interactions with the 55 ficity score of +1.0 indicates that a dinucleotide is 100% Cas9:sgRNA. For all panels, graphs were generated from enriched in the post-selection library, and a specificity score pre-selection library sequences and post-selection library of-1.0 indicates that a dinucleotide is 100% de-enriched. GG sequences with an ne2.323 and ne21,819, respectively. dinucleotides were the most enriched in the post-selection FIGS. 18A-D. Positional specificity patterns for 100 nM libraries, and AG, GA, GC, GT, and TG show less relative Cas9:sgRNA v2.1. Positional specificity, defined as the sum 60 de-enrichment compared to the other possible PAM dinucle of the magnitude of the specificity score for each of the four otides. possible base pairs recognized at a certain position in the FIGS. 24A-D. PAM dinucleotide specificity scores for on target site, is plotted for each target site under enzyme-limit target sites. Only post-selection library members containing ing conditions for sgRNA v2.1 (A-D). The positional speci no mutations in the 20 base pairs specified by the guide RNAs ficity is shown as a value normalized to the maximum posi 65 were included in this analysis. The specificity Scores under tional specificity value of the target site. Positional specificity enzyme-limiting and enzyme-excess conditions are shown is highest at the end of the target site proximal to the PAM and for all 16 possible PAM dinucleotides (positions 2 and 3 of the US 9,163,284 B2 11 12 three-nucleotide NGGPAM) for selections with CLTA1 (A), thereof. A Cas9 nuclease is also referred to sometimes as a CLTA2 (B), CLTA3 (C), and CLTA4 (D) sgRNA v2.1. The casnil nuclease or a CRISPR (clustered regularly interspaced specificity score indicates the enrichment of the PAM short palindromic repeat)-associated nuclease. CRISPR is an dinucleotide in the post-selection library relative to the pre adaptive immune system that provides protection against selection library, normalized to the maximum possible mobile genetic elements (e.g., viruses, transposable elements enrichment of that dinucleotide. A specificity score of +1.0 and conjugative plasmids). CRISPR clusters contain spacers, indicates that a dinucleotide is 100% enriched in the post sequences complementary to antecedent mobile elements, selection library, and a specificity score of -1.0 indicates that and target invading nucleic acids. CRISPR clusters are tran a dinucleotide is 100% de-enriched. GG dinucleotides were scribed and processed into CRISPRRNA (crRNA). In type II the most enriched in the post-selection libraries, AG and GA 10 CRISPR systems correct processing of pre-crRNA requires a nucleotides were neither enriched or de-enriched in at least trans-encoded small RNA (tracrRNA), endogenous ribonu one selection condition, and GC, GT, and TG show less clease 3 (rinc) and a Cas9 protein. The tracrRNA serves as a relative de-enrichment compared to the other possible PAM guide for ribonuclease 3-aided processing of pre-crRNA. dinucleotides. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically FIGS. 25A-D. Effects of Cas9:sgRNA concentration on 15 cleaves linear or circular dsDNA target complementary to the specificity. Positional specificity changes between enzyme spacer. The target strand not complementary to crRNA is first limiting (200 nM DNA, 100 nM Cas9:sgRNA v2.1) and cut endonucleolytically, then trimmed 3'-5' exonucleolyti enzyme-excess (200 nM DNA, 1000 nM Cas9:sgRNA v2.1) cally. In nature, DNA-binding and cleavage typically requires conditions are shown for selections with SgRNAS targeting protein and both RNA species. However, single guide RNAs CLTA1 (A), CLTA2 (B), CLTA3 (C), and CLTA4 (D) target (“sgRNA, or simply “gNRA) can be engineered so as to sites. Lines indicate the maximum possible change in posi incorporate aspects of both the crRNA and tracrRNA into a tional specificity for a given position. The highest changes in single RNA molecule. See, e.g., Jinek M. Chylinski K., Fon specificity occur proximal to the PAM as enzyme concentra fara I., Hauer M., Doudna J. A., Charpentier E. Science 337: tion is increased. 816-821 (2012), the entire contents of which is hereby incor FIGS. 26A-D. Effects of sgRNA architecture on specific 25 porated by reference. Cas9 recognizes a short motif in the ity. Positional specificity changes between Cas9:sgRNA v1.0 CRISPR repeat sequences (the PAM or protospacer adjacent and Cas9:sgRNA v2.1 under enzyme-excess (200 nM DNA, motif) to help distinguish self versus non-self. Cas9 nuclease 1000 nM Cas9:sgRNA v2.1) conditions are shown for selec sequences and structures are well known to those of skill in tions with sgRNAs targeting CLTA1 (A), CLTA2 (B), CLTA3 the art (see, e.g., "Complete genome sequence of an M1 Strain (C), and CLTA4 (D) target sites. Lines indicate the maximum 30 of Streptococcus pyogenes.” Ferretti J. J., McShan W. M., possible change in positional specificity for a given position. Ajdic D. J., Savic D. J., Savic G. Lyon K., Primeaux C. FIG. 27. Cas9:guide RNA cleavage of off-target DNA Sezate S., Suvorov A.N., Kenton S. Lai H.S., Lin S. P., Qian sequences in vitro. Discrete DNA cleavage assays on a 96-bp Y., Jia H. G., Najar F. Z., Ren Q., Zhu H., Song L. expand/ linear substrate were performed with 200 nM DNA and 1000 collapse author list McLaughlin R. E. Proc. Natl. Acad. Sci. nM. Cas9:CLTA4 v2.1 sgRNA for the on-target CLTA4 site 35 U.S.A. 98:4658-4663(2001); “CRISPR RNA maturation by (CLTA4-0) and five CLTA4 off-target sites identified by in trans-encoded small RNA and host factor RNase III.' vitro selection. Enrichment values shown are from the invitro Deltcheva E., Chylinski K., Sharma C.M., Gonzales K. Chao selection with 1000 nM Cas9:CLTA4 v2.1 sgRNA. CLTA4-1 Y., Pirzada Z. A., Eckert M. R., Vogel J., Charpentier E. and CLTA4-3 were the most highly enriched sequences under Nature 471:602-607(2011); and “A programmable dual these conditions. CLTA4-2a, CLTA4-2b, and CLTA4-2c are 40 RNA-guided DNA endonuclease in adaptive bacterial immu two-mutation sequences that represent a range of enrichment nity.” Jinek M. Chylinski K., Fonfara I., Hauer M., Doudna J. values from high enrichment to no enrichment to high de A., Charpentier E. Science 337:816-821 (2012), the entire enrichment. Lowercase letters indicate mutations relative to contents of each of which are incorporated herein by refer the on-target CLTA4 site. The enrichment values are qualita ence). Cas9 orthologs have been described in various species, tively consistent with the observed amount of cleavage in 45 including, but not limited to, S. pyogenes and S. thermophilus. vitro. Additional suitable Cas9 nucleases and sequences will be FIG. 28. Effect of guide RNA architecture and Cas9: apparent to those of skill in the art based on this disclosure, sgRNA concentration on in vitro cleavage of an off-target and Such Cas9 nucleases and sequences include Cas9 site. Discrete DNA cleavage assays on a 96-bp linear sub sequences from the organisms and loci disclosed in Chylin strate were performed with 200 nM DNA and 100 nM Cas9: 50 ski, Rhun, and Charpentier, “The tracrRNA and Cas9 families v1.0 sgRNA, 100 nM Cas9:v2.1 sgRNA, 1000 nM Cas9:v1.0 oftype IICRISPR-Cas immunity systems” (2013)RNA Biol sgRNA, or 1000 nM Cas9:v2.1 sgRNA for the CLTA4-3 ogy 10:5, 726-737; the entire contents of which are incorpo off-target site (5' GggGATGTAGTGTTTCCACtGGG mu rated herein by reference. In some embodiments, proteins tations are shown in lowercase letters). DNA cleavage is comprising Cas9 or fragments thereof proteins are referred to observed under all four conditions tested, and cleavage rates 55 as “Cas9 variants.” A Cas9 variant shares homology to Cas9, are higher under enzyme-excess conditions, or with V2.1 or a fragment thereof. For example a Cas9 variant is at least sgRNA compared with v1.0 sgRNA. about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least DEFINITIONS about 98% identical, at least about 99% identical, at least 60 about 99.5% identical, or at least about 99.9% to wild type As used herein and in the claims, the singular forms 'a' Cas9. In some embodiments, the Cas9 variant comprises a “an and “the include the singular and the plural reference fragment of Cas9 (e.g., a gRNA binding domain or a DNA unless the context clearly indicates otherwise. Thus, for cleavage domain). Such that the fragment is at least about 70% example, a reference to “an agent' includes a single agent and identical, at least about 80% identical, at least about 90% a plurality of Such agents. 65 identical, at least about 95% identical, at least about 98% The term “Cas9 or “Cas9 nuclease refers to an RNA identical, at least about 99% identical, at least about 99.5% guided nuclease comprising a Cas9 protein, or a fragment identical, or at least about 99.9% to the corresponding frag US 9,163,284 B2 13 14 ment of wild type Cas9. In some embodiments, wild type - Continued Cas9 corresponds to Cas9 from Streptococcus pyogenes TTTTATATTTAGCTAGTCATTATGAAAAGTTGAAGGGTAGTCCAGAAGAT AACGAACAAAAACAATTGTTTGTGGAGCAGCATAAGCATTATTTAGATGA (NCBI Reference Sequence: NC 017053.1, SEQID NO:40 GATTATTGAGCAAATCAGTGAATTTTCTAAGCGTGTTATTTTAGCAGATG (nucleotide); SEQID NO:41 (amino acid)). CCAATTTAGATAAAGTTCTTAGTGCATATAACAAACATAGAGACAAACCA ATACGTGAACAAGCAGAAAATATTATTCATTTATTTACGTTGACGAATCT TGGAGCTCCCGCTGCTTTTAAATATTTTGATACAACAATTGATCGTAAAC (SEQ ID NO: 40) GATATACGTCTACAAAAGAAGTTTTAGATGCCACTCTTATCCATCAATCC ATGGATAAGAAATACT CAATAGGCTTAGATATCGGCACAAATAGCGTCGG ATCACTGGTCTTTATGAAACACGCATTGATTTGAGTCAGCTAGGAGGTGA ATGGGCGGTGATCACTGATGATTATAAGGTTCCGTCTAAAAAGTTCAAGG CTGA TTCTGGGAAATACAGACCGCCACAGTATCAAAAAAAATCTTATAGGGGCT CTTTTATTTGGCAGTGGAGAGACAGCGGAAGCGACTCGTCTCAAACGGAC 10 (SEQ ID NO: 41) AGCTCGTAGAAGGTATACACGTCGGAAGAATCGTATTTGTTATCTACAGG KKYSIGLDIGTNSWGWAWITDDYKWPSKKFKWLGNTDRHSIKKNLIGA AGATTTTTTCAAATGAGATGGCGAAAGTAGATGATAGTTTCTTTCATCGA LFGSGETAEATRLKRTARRRYTRRKNRICYLOEIFSNEMAKVDDSFFHR CTTGAAGAGTCTTTTTTGGTGGAAGAAGACAAGAAGCATGAACGTCATCC EESFLWEEDKKHERHPIFGNIWDEWAYHEKYPTIYHLRKKLADSTDKAD TATTTTTGGAAATATAGTAGATGAAGTTGCTTATCATGAGAAATATCCAA LIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIOLVOIYNOLFEENP CTATCTATCATCTGCGAAAAAAATTGGCAGATTCTACTGATAAAGCGGAT sASRWDAKAILSARLSKSRRLENLIAQLPGEKRNGLFGNLIALSLGLTP TTGCGCTTAATCTATTTGGCCTTAGCGCATATGATTAAGTTTCGTGGTCA 15 FKSNFDLAEDAKLOLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAI TTTTTTGATTGAGGGAGATTTAAATCCTGATAATAGTGATGTGGACAAAC DILRVNSEITKAPLSASMIKRYDEHHODLTLLKALWRQOLPEKYKEI TATTTATCCAGTTGGTACAAATCTACAATCAATTATTTGAAGAAAACCCT OSKINGYAGYIDGGASOEEFYKFIKPILEKMDGTEELLWKLNREDLLR ATTAACGCAAGTAGAGTAGATGCTAAAGCGATTCTTTCTGCACGATTGAG TFDNGSIPHOIHLGELHAILRROEDFYPFLKDNREKIEKILTFRIPY TAAATCAAGACGATTAGAAAATCTCATTGCTCAGCTCCCCGGTGAGAAGA PLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAOSFIERMTNFDK GAAATGGCTTGTTTGGGAATCTCATTGCTTTGTCATTGGGATTGACCCCT NEKVLPKHSLLYEYFTWYNELTKWKYVTEGMRKPAFLSGEOKKAI AATTTTAAATCAAATTTTGATTTGGCAGAAGATGCTAAATTACAGCTTTC KTNRKVTVKOLKEDYFKKIECFDSVEISGVEDRFNASLGAYHDLLKI AAAAGATACTTACGATGATGATTTAGATAATTTATTGGCGCAAATTGGAG KDFLDNEENEDILEDIVLTLTLFEDRGMIEERLKTYAHLFDDKWMKO ATCAATATGCTGATTTGTTTTTGGCAGCTAAGAATTTATCAGATGCTATT RRYTGWGRLSRKLINGIRDKOSGKTILDFLKSDGFANRNFMOLI TTACTTTCAGATATCCTAAGAGTAAATAGTGAAATAACTAAGGCTCCCCT FKEDIOKAQVSGOGHSLHEQIANLAGSPAIKKGILOTWKIVDEL ATCAGCTTCAATGATTAAGCGCTACGATGAACATCATCAAGACTTGACTC KPENIVIEMARENOTTOKGOKNSRERMKRIEEGIKELGSOI TTTTAAAAGCTTTAGTTCGACAACAACTTCCAGAAAAGTATAAAGAAATC TOLONEKLYLYYLONGRDMYWDOELDINRLSDYDVDHIVPOS TTTTTTGATCAATCAAAAAACGGATATGCAGGTTATATTGATGGGGGAGC KVLTRSDKNRGKSDNVPSEEVVKKMKNYWROLLNAKLITORK TAGCCAAGAAGAATTTTATAAATTTATCAAACCAATTTTAGAAAAAATGG 25 RGGLSELDKAGFIKROLVETROITKHVAOILDSRMNTKYDENDKL ATGGTACTGAGGAATTATTGGTGAAACTAAATCGTGAAGATTTGCTGCGC KVITLKSKLVSDFRKDFOFYKVREINNYHHAHDAYLNAVVGTALIKKY AAGCAACGGACCTTTGACAACGGCTCTATTCCCCATCAAATTCACTTGGG LESEFWYGDYKVYDVRKMIAKSEOEIGKATAKYFFYSNIMNFF TGAGCTGCATGCTATTTTGAGAAGACAAGAAGACTTTTATCCATTTTTAA ANGEIRKRPLIETNGETGEIWWDKGRDFATWRKVLSMPOVNIVK AAGACAATCGTGAGAAGATTGAAAAAATCTTGACTTTTCGAATTCCTTAT GFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTWAYSWLWWAKWEK TATGTTGGTCCATTGGCGCGTGGCAATAGTCGTTTTGCATGGATGACTCG SKKLKSWKELLGITIMERSSFEKNPIDFILEAKGYKEWKKDLIIKLP GAAGTCTGAAGAAACAATTACCCCATGGAATTTTGAAGAAGTTGTCGATA 30 FELENGRKRMLASAGELOKGNELALPSKYWNFLYLASHYEKLKGSPEDR AAGGTGCTTCAGCTCAATCATTTATTGAACGCATGACAAACTTTGATAAA E OKOLFVEOHKHYLDEIIEOISEFSKRVILADANLDKVLSAYNKHRDH AATCTTCCAAATGAAAAAGTACTACCAAAACATAGTTTGCTTTATGAGTA REOAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEWLDATLIHOS TTTTACGGTTTATAACGAATTGACAAAGGTCAAATATGTTACTGAGGGAA ITGLYETRIDLSQLGGD TGCGAAAACCAGCATTTCTTTCAGGTGAACAGAAGAAAGCCATTGTTGAT TTACTCTTCAAAACAAATCGAAAAGTAACCGTTAAGCAATTAAAAGAAGA The term “concatemer, as used herein in the context of TTATTTCAAAAAAATAGAATGTTTTGATAGTGTTGAAATTTCAGGAGTTG 35 nucleic acid molecules, refers to a nucleic acid molecule that AAGATAGATTTAATGCTTCATTAGGCGCCTACCATGATTTGCTAAAAATT ATTAAAGATAAAGATTTTTTGGATAATGAAGAAAATGAAGATATCTTAGA contains multiple copies of the same DNA sequences linked GGATATTGTTTTAACATTGACCTTATTTGAAGATAGGGGGATGATTGAGG in a series. For example, a concatemer comprising ten copies AAAGACTTAAAACATATGCTCACCTCTTTGATGATAAGGTGATGAAACAG of a specific sequence of nucleotides (e.g., XYZo), would CTTAAACGTCGCCGTTATACTGGTTGGGGACGTTTGTCTCGAAAATTGAT comprise ten copies of the same specific sequence linked to TAATGGTATTAGGGATAAGCAATCTGGCAAAACAATATTAGATTTTTTGA 40 AATCAGATGGTTTTGCCAATCGCAATTTTATGCAGCTGATCCATGATGAT each other in series, e.g., 5'-XYZXYZXYZXYZXYZX AGTTTGACATTTAAAGAAGATATTCAAAAAGCACAGGTGTCTGGACAAGG YZXYZXYZXYZXYZ-3'. A concatemer may comprise any CCATAGTTTACATGAACAGATTGCTAACTTAGCTGGCAGTCCTGCTATTA number of copies of the repeat unit or sequence, e.g., at least AAAAAGGTATTTTACAGACTGTAAAAATTGTTGATGAACTGGTCAAAGTA 2 copies, at least 3 copies, at least 4 copies, at least 5 copies, ATGGGGCATAAGCCAGAAAATATCGTTATTGAAATGGCACGTGAAAATCA. GACAACT CAAAAGGGCCAGAAAAATTCGCGAGAGCGTATGAAACGAATCG at least 10 copies, at least 100 copies, at least 1000 copies, etc. AAGAAGGTATCAAAGAATTAGGAAGTCAGATTCTTAAAGAGCATCCTGTT 45 An example of a concatemer of a nucleic acid sequence GAAAATACT CAATTGCAAAATGAAAAGCTCTATCTCTATTATCTACAAAA comprising a nuclease target site and a constant insert TGGAAGAGACATGTATGTGGACCAAGAATTAGATATTAATCGTTTAAGTG sequence would be (target site)-(constant insert ATTATGATGTCGATCACATTGTTCCACAAAGTTTCATTAAAGACGATTCA ATAGACAATAAGGTACTAACGCGTTCTGATAAAAATCGTGGTAAATCGGA sequence). A concatemer may be a linear nucleic acid TAACGTTCCAAGTGAAGAAGTAGTCAAAAAGATGAAAAACTATTGGAGAC molecule, or may be circular. AACTTCTAAACGCCAAGTTAATCACTCAACGTAAGTTTGATAATTTAACG 50 The terms "conjugating.” “conjugated, and "conjugation' AAAGCTGAACGTGGAGGTTTGAGTGAACTTGATAAAGCTGGTTTTATCAA ACGCCAATTGGTTGAAACTCGCCAAATCACTAAGCATGTGGCACAAATTT refer to an association of two entities, for example, of two TGGATAGTCGCATGAATACTAAATACGATGAAAATGATAAACTTATTCGA molecules Such as two proteins, two domains (e.g., a binding GAGGTTAAAGTGATTACCTTAAAATCTAAATTAGTTTCTGACTTCCGAAA domain and a cleavage domain), or a protein and an agent, AGATTTCCAATTCTATAAAGTACGTGAGATTAACAATTACCATCATGCCC e.g., a protein binding domain and a small molecule. In some ATGATGCGTATCTAAATGCCGTCGTTGGAACTGCTTTGATTAAGAAATAT 55 aspects, the association is between a protein (e.g., RNA CCAAAACTTGAATCGGAGTTTGTCTATGGTGATTATAAAGTTTATGATGT TCGTAAAATGATTGCTAAGTCTGAGCAAGAAATAGGCAAAGCAACCGCAA programmable nuclease) and a nucleic acid (e.g., a guide AATATTTCTTTTACTCTAATATCATGAACTTCTTCAAAACAGAAATTACA RNA). The association can be, for example, via a direct or CTTGCAAATGGAGAGATTCGCAAACGCCCTCTAATCGAAACTAATGGGGA indirect (e.g., via a linker) covalent linkage or via non-cova AACTGGAGAAATTGTCTGGGATAAAGGGCGAGATTTTGCCACAGTGCGCA lent interactions. In some embodiments, the association is AAGTATTGTCCATGCCCCAAGTCAATATTGTCAAGAAAACAGAAGTACAG 60 ACAGGCGGATTCTCCAAGGAGTCAATTTTACCAAAAAGAAATTCGGACAA covalent. In some embodiments, two molecules are conju GCTTATTGCTCGTAAAAAAGACTGGGATCCAAAAAAATATGGTGGTTTTG gated via a linker connecting both molecules. For example, in ATAGTCCAACGGTAGCTTATTCAGTCCTAGTGGTTGCTAAGGTGGAAAAA Some embodiments where two proteins are conjugated to GGGAAATCGAAGAAGTTAAAATCCGTTAAAGAGTTACTAGGGATCACAAT each other, e.g., a binding domain and a cleavage domain of TATGGAAAGAAGTTCCTTTGAAAAAAATCCGATTGACTTTTTAGAAGCTA AAGGATATAAGGAAGTTAAAAAAGACTTAATCATTAAACTACCTAAATAT an engineered nuclease, to form a protein fusion, the two AGTCTTTTTGAGTTAGAAAACGGTCGTAAACGGATGCTGGCTAGTGCCGG 65 proteins may be conjugated via a polypeptide linker, e.g., an AGAATTACAAAAAGGAAATGAGCTGGCTCTGCCAAGCAAATATGTGAATT amino acid sequence connecting the C-terminus of one pro tein to the N-terminus of the other protein. US 9,163,284 B2 15 16 The term “consensus sequence, as used herein in the con The term “library, as used herein in the context of nucleic text of nucleic acid sequences, refers to a calculated sequence acids or proteins, refers to a population of two or more dif representing the most frequent nucleotide residues found at ferent nucleic acids or proteins, respectively. For example, a each position in a plurality of similar sequences. Typically, a library of nuclease target sites comprises at least two nucleic consensus sequence is determined by sequence alignment in acid molecules comprising different nuclease target sites. In which similar sequences are compared to each other and some embodiments, a library comprises at least 10", at least similar sequence motifs are calculated. In the context of 10, at least 10, at least 10, at least 10, at least 10, at least nuclease target site sequences, a consensus sequence of a 107, at least 10, at least 10, at least 10', at least 10', at least nuclease target site may, in Some embodiments, be the 10', at least 10", at least 10", or at least 10" different sequence most frequently bound, or bound with the highest 10 nucleic acids or proteins. In some embodiments, the members affinity, by a given nuclease. With respect to RNA-program of the library may comprise randomized sequences, for mable nuclease (e.g., Cas9) target site sequences, the consen example, fully or partially randomized sequences. In some Sus sequence may, in some embodiments, be the sequence or embodiments, the library comprises nucleic acid molecules region to which a gRNA, or a plurality of gRNAs, is expected that are unrelated to each other, e.g., nucleic acids comprising or designed to bind, e.g., based on complementary base pair 15 fully randomized sequences. In other embodiments, at least 1ng. some members of the library may be related, for example, The term “effective amount, as used herein, refers to an they may be variants or derivatives of a particular sequence, amount of a biologically active agent that is Sufficient to elicit Such as a consensus target site sequence. a desired biological response. For example, in Some embodi The term “linker, as used herein, refers to a chemical ments, an effective amount of a nuclease may refer to the group or a molecule linking two adjacent molecules or moi amount of the nuclease that is sufficient to induce cleavage of eties, e.g., a binding domain and a cleavage domain of a a target site specifically bound and cleaved by the nuclease. nuclease. Typically, the linker is positioned between, or As will be appreciated by the skilled artisan, the effective flanked by, two groups, molecules, or other moieties and amount of an agent, e.g., a nuclease, a hybrid protein, or a connected to each one via a covalent bond, thus connecting polynucleotide, may vary depending on various factors as, for 25 the two. In some embodiments, the linker is an amino acid or example, on the desired biological response, the specific a plurality of amino acids (e.g., a peptide or protein). In some allele, genome, target site, cell, or tissue being targeted, and embodiments, the linker is an organic molecule, group, poly the agent being used. mer, or chemical moiety. The term “enediyne.” as used herein, refers to a class of The term “nuclease, as used herein, refers to an agent, for bacterial natural products characterized by either nine- and 30 example a protein or a small molecule, capable of cleaving a ten-membered rings containing two triple bonds separated by phosphodiester bond connecting nucleotide residues in a a double bond (see, e.g., K. C. Nicolaou: A. L. Smith; E. W. nucleic acid molecule. In some embodiments, a nuclease is a Yue (1993). “Chemistry and biology of natural and designed protein, e.g., an enzyme that can bind a nucleic acid molecule enediynes'. PNAS 90 (13): 5881–5888; the entire contents of and cleave a phosphodiester bond connecting nucleotide resi which are incorporated herein by reference). Some enediynes 35 dues within the nucleic acid molecule. A nuclease may be an are capable of undergoing Bergman cyclization, and the endonuclease, cleaving a phosphodiester bonds within a resulting diradical, a 1.4-dehydrobenzene derivative, is polynucleotide chain, or an exonuclease, cleaving a phos capable of abstracting hydrogen atoms from the Sugar back phodiester bond at the end of the polynucleotide chain. In bone of DNA which results in DNA strand cleavage (see, e.g., Some embodiments, a nuclease is a site-specific nuclease, S. Walker; R. Landovitz: W. D. Ding: G. A. Ellestad; D. 40 binding and/or cleaving a specific phosphodiester bond Kahne (1992). “Cleavage behavior of calicheamicingamma 1 within a specific nucleotide sequence, which is also referred and calicheamicin T. Proc Natl Acad Sci U.S.A. 89 (10): to herein as the “recognition sequence, the “nuclease target 4608-12; the entire contents of which are incorporated herein site.” or the “target site. In some embodiments, a nuclease is by reference). Their reactivity with DNA confers an antibiotic a RNA-guided (i.e., RNA-programmable) nuclease, which character to many enediynes, and some enediynes are clini 45 complexes with (e.g., binds with) an RNA having a sequence cally investigated as anticancer antibiotics. Nonlimiting that complements a target site, thereby providing the examples of enediynes are dynemicin, neocarzinostatin, cali sequence specificity of the nuclease. In some embodiments, a cheamicin, esperamicin (see, e.g., Adrian L. Smith and K. C. nuclease recognizes a single stranded target site, while in Bicolaou, “The Enediyne Antibiotics' J. Med. Chem., 1996, other embodiments, a nuclease recognizes a double-stranded 39 (11), pp. 2103-2117; and Donald Borders, “Enediyne anti 50 target site, for example a double-stranded DNA target site. biotics as antitumor agents.” Informa Healthcare; 1 edition The target sites of many naturally occurring nucleases, for (Nov. 23, 1994, ISBN-10: 082478.9385; the entire contents of example, many naturally occurring DNA restriction which are incorporated herein by reference). nucleases, are well known to those of skill in the art. In many The term “homing endonuclease, as used herein, refers to cases, a DNA nuclease, such as EcoRI, HindIII, or BamHI, a type of restriction enzymes typically encoded by introns or 55 recognize a palindromic, double-stranded DNA target site of inteins Edgell DR (February 2009). “Selfish DNA: homing 4 to 10 base pairs in length, and cut each of the two DNA endonucleases find a home'. Curr Biol 19 (3): R115-R117: Strands at a specific position within the target site. Some Jasin M (June 1996). “Genetic manipulation of genomes with endonucleases cut a double-stranded nucleic acid target site rare-cutting endonucleases”. Trends Genet 12 (6): 224-8; symmetrically, i.e., cutting both strands at the same position Burt A. KoufopanouV (December 2004). “Homing endonu 60 so that the ends comprise base-paired nucleotides, also clease genes: the rise and fall and rise again of a selfish referred to herein as blunt ends. Other endonucleases cut a element”. Curr Opin Genet Dev 14 (6): 609-15; the entire double-stranded nucleic acid target sites asymmetrically, i.e., contents of which are incorporated herein by reference. Hom cutting each Strand at a different position so that the ends ing endonuclease recognition sequences are long enough to comprise unpaired nucleotides. Unpaired nucleotides at the occur randomly only with a very low probability (approxi 65 end of a double-stranded DNA molecule are also referred to mately once every 7x10'bp), and are normally found in only as “overhangs, e.g., as “5'-overhang' or as “3'-overhang.” one instance per genome. depending on whether the unpaired nucleotide(s) form(s) the US 9,163,284 B2 17 18 5' or the 3' end of the respective DNA strand. Double-stranded ine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, DNA molecule ends ending with unpaired nucleotide(s) are O(6)-methylguanine, and 2-thiocytidine); chemically modi also referred to as sticky ends, as they can “stick to other fied bases; biologically modified bases (e.g., methylated double-stranded DNA molecule ends comprising comple bases); intercalated bases; modified Sugars (e.g. 2'-fluorori mentary unpaired nucleotide(s). A nuclease protein typically bose, ribose, 2'-deoxyribose, arabinose, and hexose); and/or comprises a “binding domain that mediates the interaction modified phosphate groups (e.g., phosphorothioates and of the protein with the nucleic acid Substrate, and also, in 5'-N-phosphoramidite linkages). Some cases, specifically binds to a target site, and a “cleavage The term “pharmaceutical composition, as used herein, domain that catalyzes the cleavage of the phosphodiester refers to a composition that can be administrated to a subject bond within the nucleic acid backbone. In some embodiments 10 in the context of treatment of a disease or disorder. In some a nuclease protein can bind and cleave a nucleic acid mol embodiments, a pharmaceutical composition comprises an ecule in a monomeric form, while, in other embodiments, a active ingredient, e.g., a nuclease or a nucleic acid encoding a nuclease protein has to dimerize or multimerize in order to nuclease, and a pharmaceutically acceptable excipient. cleave a target nucleic acid molecule. Binding domains and The term “proliferative disease.” as used herein, refers to cleavage domains of naturally occurring nucleases, as well as 15 any disease in which cell or tissue homeostasis is disturbed in modular binding domains and cleavage domains that can be that a cell or cell population exhibits an abnormally elevated fused to create nucleases binding specific target sites, are well proliferation rate. Proliferative diseases include hyperprolif known to those of skill in the art. For example, Zinc fingers or erative diseases, such as pre-neoplastic hyperplastic condi transcriptional activator like elements can be used as binding tions and neoplastic diseases. Neoplastic diseases are charac domains to specifically binda desired target site, and fused or terized by an abnormal proliferation of cells and include both conjugated to a cleavage domain, for example, the cleavage benign and malignant neoplasias. Malignant neoplasia is also domain of FokI, to create an engineered nuclease cleaving the referred to as cancer. target site. The terms “protein.” “peptide.” and “polypeptide' are used The terms “nucleic acid' and “nucleic acid molecule, as interchangeably herein, and refer to a polymer of amino acid used herein, refer to a compound comprising a nucleobase 25 residues linked together by peptide (amide) bonds. The terms and an acidic moiety, e.g., a nucleoside, a nucleotide, or a refer to a protein, peptide, or polypeptide of any size, struc polymer of nucleotides. Typically, polymeric nucleic acids, ture, or function. Typically, a protein, peptide, or polypeptide e.g., nucleic acid molecules comprising three or more nucle will be at least three amino acids long. A protein, peptide, or otides are linear molecules, in which adjacent nucleotides are polypeptide may refer to an individual protein or a collection linked to each other via a phosphodiester linkage. In some 30 of proteins. One or more of the amino acids in a protein, embodiments, “nucleic acid refers to individual nucleic acid peptide, or polypeptide may be modified, for example, by the residues (e.g. nucleotides and/or nucleosides). In some addition of a chemical entity such as a carbohydrate group, a embodiments, “nucleic acid refers to an oligonucleotide hydroxyl group, a phosphate group, a farnesyl group, an chain comprising three or more individual nucleotide resi isofarnesyl group, a fatty acid group, a linker for conjugation, dues. As used herein, the terms "oligonucleotide' and “poly 35 functionalization, or other modification, etc. A protein, pep nucleotide' can be used interchangeably to refer to a polymer tide, or polypeptide may also be a single molecule or may be of nucleotides (e.g., a string of at least three nucleotides). In a multi-molecular complex. A protein, peptide, or polypep Some embodiments, “nucleic acid encompasses RNA as tide may be just a fragment of a naturally occurring protein or well as single and/or double-stranded DNA. Nucleic acids peptide. A protein, peptide, or polypeptide may be naturally may be naturally occurring, for example, in the context of a 40 occurring, recombinant, or synthetic, or any combination genome, a transcript, an mRNA, tRNA, rRNA, siRNA, thereof. A protein may comprise different domains, for SnRNA, a plasmid, cosmid, , chromatid, or other example, a nucleic acid binding domain and a nucleic acid naturally occurring nucleic acid molecule. On the other hand, cleavage domain. In some embodiments, a protein comprises a nucleic acid molecule may be a non-naturally occurring a proteinaceous part, e.g., an amino acid sequence constitut molecule, e.g., a recombinant DNA or RNA, an artificial 45 ing a nucleic acid binding domain, and an organic compound, chromosome, an engineered genome, or fragment thereof, or e.g., a compound that can act as a nucleic acid cleavage agent. a synthetic DNA, RNA, DNA/RNA hybrid, or including non In some embodiments, a protein is in a complex with, or is in naturally occurring nucleotides or nucleosides. Furthermore, association with, a nucleic acid, e.g., RNA. the terms “nucleic acid, “DNA “RNA and/or similar The term "randomized as used herein in the context of terms include nucleic acid analogs, i.e. analogs having other 50 nucleic acid sequences, refers to a sequence or residue within than a phosphodiester backbone. Nucleic acids can be puri a sequence that has been synthesized to incorporate a mixture fied from natural sources, produced using recombinant of free nucleotides, for example, a mixture of all four nucle expression systems and optionally purified, chemically Syn otides A, T, G, and C. Randomized residues are typically thesized, etc. Where appropriate, e.g., in the case of chemi represented by the letter N within a nucleotide sequence. In cally synthesized molecules, nucleic acids can comprise 55 Some embodiments, a randomized sequence or residue is nucleoside analogs such as analogs having chemically modi fully randomized, in which case the randomized residues are fied bases or Sugars, and backbone modifications. A nucleic synthesized by adding equal amounts of the nucleotides to be acid sequence is presented in the 5' to 3’ direction unless incorporated (e.g., 25% T. 25% A, 25% G, and 25% C) during otherwise indicated. In some embodiments, a nucleic acid is the synthesis step of the respective sequence residue. In some or comprises natural nucleosides (e.g. adenosine, thymidine, 60 embodiments, a randomized sequence or residue is partially guanosine, cytidine, uridine, deoxyadenosine, deoxythymi randomized, in which case the randomized residues are syn dine, deoxyguanosine, and deoxycytidine); nucleoside ana thesized by adding non-equal amounts of the nucleotides to logs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyr be incorporated (e.g., 79% T.7% A, 7% G, and 7% C) during rolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, the synthesis step of the respective sequence residue. Partial 2-aminoadenosine, C5-bromouridine, C5-fluorouridine, 65 randomization allows for the generation of sequences that are C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cyti templated on a given sequence, but have incorporated muta dine, C5-methylcytidine, 2-aminoadenosine, 7-deazaadenos tions at a desired frequency. For example, ifa known nuclease US 9,163,284 B2 19 20 target site is used as a synthesis template, partial randomiza The terms “small molecule' and “organic compound are tion in which at each step the nucleotide represented at the used interchangeably herein and refer to molecules, whether respective residue is added to the synthesis at 79%, and the naturally-occurring or artificially created (e.g., via chemical other three nucleotides are added at 7% each, will result in a synthesis) that have a relatively low molecular weight. Typi mixture of partially randomized target sites being synthe cally, an organic compound contains carbon. An organic com sized, which still represent the consensus sequence of the pound may contain multiple carbon-carbon bonds, Stereo original target site, but which differ from the original target centers, and other functional groups (e.g., amines, hydroxyl, site at each residue with a statistical frequency of 21% for carbonyls, or heterocyclic rings). In some embodiments, each residue so synthesized (distributed binomially). In some organic compounds are monomeric and have a molecular embodiments, a partially randomized sequence differs from 10 weight of less than about 1500 g/mol. In certain embodi the consensus sequence by more than 5%, more than 10%, ments, the molecular weight of the Small molecule is less than more than 15%, more than 20%, more than 25%, or more than about 1000 g/mol or less than about 500 g/mol. In certain 30% on average, distributed binomially. In some embodi embodiments, the Small molecule is a drug, for example, a ments, a partially randomized sequence differs from the con drug that has already been deemed safe and effective for use sensus site by no more than 10%, no more than 15%, no more 15 in humans or animals by the appropriate governmental than 20%, no more than 25%, nor more than 30%, no more agency or regulatory body. In certain embodiments, the than 40%, or no more than 50% on average, distributed bino organic molecule is known to bind and/or cleave a nucleic mially. acid. In some embodiments, the organic compound is an The term “RNA-programmable nuclease, and “RNA enediyne. In some embodiments, the organic compound is an guided nuclease' are used interchangeably herein and refer to antibiotic drug, for example, an anticancer antibiotic Such as a nuclease that forms a complex with (e.g., binds or associates dynemicin, neocarzinostatin, calicheamicin, esperamicin, with) one or more RNA that is not a target for cleavage. In bleomycin, or a derivative thereof. Some embodiments, an RNA-programmable nuclease, when The term “subject as used herein, refers to an individual in a complex with an RNA, may be referred to as a nuclease: organism, for example, an individual mammal. In some RNA complex. Typically, the bound RNA(s) is referred to as 25 embodiments, the Subject is a human. In some embodiments, a guide RNA (gRNA) or a single-guide RNA (sgRNA). The the Subject is a non-human mammal. In some embodiments, gRNA?sgRNA comprises a nucleotide sequence that comple the Subject is a non-human primate. In some embodiments, ments a target site, which mediates binding of the nucleasef the Subject is a rodent. In some embodiments, the Subject is a RNA complex to said target site and providing the sequence sheep, a goat, a cattle, a cat, or a dog. In some embodiments, specificity of the nuclease:RNA complex. In some embodi 30 the Subject is a vertebrate, an amphibian, a reptile, a fish, an ments, the RNA-programmable nuclease is the (CRISPR insect, a fly, or a nematode. associated system) Cas9 endonuclease, for example Cas9 The terms “target nucleic acid,” and “target genome as (Csn1) from Streptococcus pyogenes (see, e.g., "Complete used herein in the context of nucleases, refer to a nucleic acid genome sequence of an M1 Strain of Streptococcus pyo molecule or a genome, respectively, that comprises at least genes.” Ferretti J. J., McShan W. M., Ajdic D.J., Savic D.J., 35 one target site of a given nuclease. Savic G. Lyon K., Primeaux C., Sezate S., Suvorov A. N., The term “target site.” used herein interchangeably with Kenton S. Lai H.S., Lin S. P., Qian Y., Jia H. G., Najar F. Z. the term “nuclease target site refers to a sequence within a Ren Q., Zhu H., Song L. expand/collapse author list nucleic acid molecule that is bound and cleaved by a nuclease. McLaughlin R. E., Proc. Natl. Acad. Sci. U.S.A. 98:4658 A target site may be single-stranded or double-stranded. In 4663(2001): “CRISPR RNA maturation by trans-encoded 40 the context of nucleases that dimerize, for example, nucleases small RNA and host factor RNase III. Deltcheva E., Chylin comprising a FokI DNA cleavage domain, a target sites typi ski K., Sharma C. M., Gonzales K., Chao Y., Pirzada Z.A., cally comprises a left-halfsite (bound by one monomer of the Eckert M. R., Vogel J., Charpentier E., Nature 471:602-607 nuclease), a right-half site (bound by the second monomer of (2011); and “A programmable dual-RNA-guided DNA endo the nuclease), and a spacer sequence between the half sites in nuclease in adaptive bacterial immunity.” Jinek M. Chylinski 45 which the cut is made. This structure (left-half site-spacer K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Sci sequence-right-half site) is referred to herein as an LSR ence 337:816-821 (2012), the entire contents of each of which structure. In some embodiments, the left-half site and/or the are incorporated herein by reference right-half site is between 10-18 nucleotides long. In some Because RNA-programmable nucleases (e.g., Cas9) use embodiments, either or both half-sites are shorter or longer. In RNA:DNA hybridization to determine target DNA cleavage 50 Some embodiments, the left and right half sites comprise sites, these proteins are able to cleave, in principle, any different nucleic acid sequences. In the context of Zinc finger sequence specified by the guide RNA. Methods of using nucleases, target sites may, in Some embodiments comprise RNA-programmable nucleases, such as Cas9, for site-spe two half-sites that are each 6-18 bp long flanking a non cific cleavage (e.g., to modify a genome) are known in the art specified spacer region that is 4-8 bp long. In the context of (See e.g., Cong, L. etal. Multiplex genome engineering using 55 TALENs, target sites may, in some embodiments, comprise CRISPR/Cas systems. Science 339,819-823 (2013); Mali, P. two half-sites sites that are each 10-23 bp long flanking a et al. RNA-guided human genome engineering via Cas9. non-specified spacer region that is 10-30 bp long. In the Science 339, 823-826 (2013); Hwang, W.Y. et al. Efficient context of RNA-guided (e.g., RNA-programmable) genome editing in zebrafish using a CRISPR-Cas system. nucleases, a target site typically comprises a nucleotide Nature biotechnology 31, 227-229 (2013); Jinek, M. et al. 60 sequence that is complementary to the sgRNA of the RNA RNA-programmed genome editing in human cells. eLife 2, programmable nuclease, and a protospacer adjacent motif e(00471 (2013); Dicarlo, J. E. et al. Genome engineering in (PAM) at the 3' end adjacent to the sgRNA-complementary Saccharomyces cerevisiae using CRISPR-Cas systems. sequence. For the RNA-guided nuclease Cas9, the target site Nucleic acids research (2013); Jiang, W. et al. RNA-guided may be, in Some embodiments, 20 base pairs plus a 3 base pair editing of bacterial genomes using CRISPR-Cas systems. 65 PAM (e.g., NNN, wherein N represents any nucleotide). Nature biotechnology 31, 233-239 (2013); the entire contents Typically, the first nucleotide of a PAM can be any nucleotide, of each of which are incorporated herein by reference). while the two downstream nucleotides are specified depend US 9,163,284 B2 21 22 ing on the specific RNA-guided nuclease. Exemplary target Zhang, Y.; Schmidt, C.; Baller, J. A.; Somia, N. V. et al. sites for RNA-guided nucleases, such as Cas9, are known to (2011). “Efficient design and assembly of custom TALEN those of skill in the art and include, without limitation, NNG, and other TAL effector-based constructs for DNA targeting. NGN, NAG, and NGG, wherein N represents any nucleotide. Nucleic Acids Research; Morbitzer, R.; Elsaesser, J.; Haus In addition, Cas9 nucleases from different species (e.g., S. ner, J.; Lahaye, T. (2011). “Assembly of custom TALE-type thermophilus instead of S. pyogenes) recognizes a PAM that DNA binding domains by modular cloning. Nucleic Acids comprises the sequence NGGNG. Additional PAM Research; Li, T.; Huang, S.; Zhao, X. Wright, D.A.: Carpen sequences are known, including, but not limited to NNA ter, S.; Spalding, M. H.; Weeks, D. P.; Yang, B. (2011). GAAW and NAAR (see, e.g., Esvelt and Wang, Molecular “Modularly assembled designer TAL effector nucleases for Systems Biology, 9:641 (2013), the entire contents of which 10 targeted gene knockout and gene replacement in eukaryotes'. are incorporated herein by reference). For example, the target Nucleic Acids Research. Weber, E.; Gruetzner, R.; Werner, site of an RNA-guided nuclease, such as, e.g., Cas9, may S.: Engler, C.; Marillonnet, S. (2011). Bendahmane, Moham comprise the structure N-PAM), where each N is, inde med. ed. “Assembly of Designer TAL Effectors by Golden pendently, any nucleotide, and 2 is an integer between 1 and Gate Cloning”. PLoS ONE 6 (5): e19722; the entire contents 50. In some embodiments, 2 is at least 2, at least 3, at least 4, 15 of each of which are incorporated herein by reference). at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, The terms “treatment,” “treat,” and “treating.” refer to a at least 11, at least 12, at least 13, at least 14, at least 15, at least clinical intervention aimed to reverse, alleviate, delay the 16, at least 17, at least 18, at least 19, at least 20, at least 25, onset of, or inhibit the progress of a disease or disorder, or one at least 30, at least 35, at least 40, at least 45, or at least 50. In or more symptoms thereof, as described herein. As used some embodiments, is 5, 6,7,8,9, 10, 11, 12, 13, 14, 15, 16, herein, the terms “treatment,” “treat,” and “treating refer to a 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,33, clinical intervention aimed to reverse, alleviate, delay the 34, 35,36, 37,38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or onset of, or inhibit the progress of a disease or disorder, or one 50. In some embodiments, Z is 20. or more symptoms thereof, as described herein. In some The term “Transcriptional Activator-Like Effector.” embodiments, treatment may be administered after one or (TALE) as used herein, refers to bacterial proteins comprising 25 more symptoms have developed and/or after a disease has a DNA binding domain, which contains a highly conserved been diagnosed. In other embodiments, treatment may be 33-34 amino acid sequence comprising a highly variable administered in the absence of symptoms, e.g., to prevent or two-amino acid motif (Repeat Variable Diresidue, RVD). The delay onset of a symptom or inhibit onset or progression of a RVD motif determines binding specificity to a nucleic acid disease. For example, treatment may be administered to a sequence, and can be engineered according to methods well 30 Susceptible individual prior to the onset of symptoms (e.g., in known to those of skill in the art to specifically bind a desired light of a history of symptoms and/or in light of genetic or DNA sequence (see, e.g., Miller, Jeffrey; et. al. (February other susceptibility factors). Treatment may also be contin 2011). “A TALE nuclease architecture for efficient genome ued after symptoms have resolved, for example to prevent or editing. Nature Biotechnology 29 (2): 143-8; Zhang, Feng; delay their recurrence. et. al. (February 2011). “Efficient construction of sequence 35 The term "Zinc finger, as used herein, refers to a small specific TAL effectors for modulating mammalian transcrip nucleic acid-binding protein structural motif characterized by tion’. Nature Biotechnology 29 (2): 149-53; Gei?ler, R.; a fold and the coordination of one or more Zinc ions that Scholze, H.; Hahn, S.; Streubel, J.; Bonas, U.: Behrens, S. E.; stabilize the fold. Zinc fingers encompass a wide variety of Boch, J. (2011), Shiu, Shin-Han. ed. “Transcriptional Activa differing protein structures (see, e.g., Klug A, Rhodes D tors of Human Genes with Programmable DNA-Specificity'. 40 (1987). “Zinc fingers: a novel protein fold for nucleic acid PLoS ONE 6 (5): e19509; Boch, Jens (February 2011). recognition'. Cold Spring Harb. Symp. Quant. Biol. 52:473 “TALEs of genome targeting. Nature Biotechnology 29 (2): 82, the entire contents of which are incorporated herein by 135-6; Boch, Jens; et al. (December 2009). “Breaking the reference). Zinc fingers can be designed to bind a specific Code of DNA Binding Specificity of TAL-Type III Effec sequence of nucleotides, and Zinc finger arrays comprising tors'. Science 326 (5959): 1509-12; and Moscou, Matthew J.; 45 fusions of a series of Zinc fingers, can be designed to bind Adam J. Bogdanove (December 2009). ''A Simple Cipher virtually any desired target sequence. Such Zinc finger arrays Governs DNA Recognition by TAL Effectors”. Science 326 can form a binding domain of a protein, for example, of a (5959): 1501; the entire contents of each of which are incor nuclease, e.g., if conjugated to a nucleic acid cleavage porated herein by reference). The simple relationship domain. Different type of zinc finger motifs are known to between amino acid sequence and DNA recognition has 50 those of skill in the art, including, but not limited to, allowed for the engineering of specific DNA binding domains Cys-His, Gag knuckle, Treble clef, Zinc ribbon, Zn/Cys, by selecting a combination of repeat segments containing the and TAZ2 domain-like motifs (see, e.g., Krishna SS. Majum appropriate RVDs. dar I, Grishin NV (January 2003). "Structural classification The term “Transcriptional Activator-Like Element of zinc fingers: survey and summary'. Nucleic Acids Res. 31 Nuclease.” (TALEN) as used herein, refers to an artificial 55 (2): 532–50). Typically, a single Zinc finger motif binds 3 or 4 nuclease comprising a transcriptional activator like effector nucleotides of a nucleic acid molecule. Accordingly, a Zinc DNA binding domain to a DNA cleavage domain, for finger domain comprising 2 Zinc finger motifs may bind 6-8 example, a FokI domain. A number of modular assembly nucleotides, a Zinc finger domain comprising 3 Zinc finger schemes for generating engineered TALE constructs have motifs may bind 9-12 nucleotides, a Zinc finger domain com been reported (see e.g., Zhang, Feng, et. al. (February 2011). 60 prising 4 Zinc finger motifs may bind 12-16 nucleotides, and “Efficient construction of sequence-specific TAL effectors So forth. Any Suitable protein engineering technique can be for modulating mammalian transcription’. Nature Biotech employed to alter the DNA-binding specificity of zinc fingers nology 29 (2): 149-53; Gei?ler, R.; Scholze, H.; Hahn, S.; and/or design novel Zinc finger fusions to bind virtually any Streubel, J.; Bonas, U.: Behrens, S. E.; Boch, J. (2011), Shiu, desired target sequence from 3-30 nucleotides in length (see, Shin-Han. ed. “Transcriptional Activators of Human Genes 65 e.g., Pabo CO, Peisach E. Grant RA (2001). “Design and with Programmable DNA-Specificity”. PLoS ONE 6 (5): selection of novel cys2His2 Zinc finger proteins’. Annual e19509; Cermak, T.; Doyle, E. L.; Christian, M.; Wang, L.; Review of Biochemistry 70:313-340; Jamieson AC, Miller J US 9,163,284 B2 23 24 C, Pabo CO (2003). “Drug discovery with engineered zinc Some embodiments, the cleavage domain of a Zinc finger finger proteins”. Nature Reviews Drug Discovery 2 (5): 361 nuclease has to dimerize in order to cut a bound nucleic acid. 368; and Liu Q, Segal DJ. Ghiara J. B. Barbas C F (May In some such embodiments, the dimer is a heterodimer of two 1997). “Design of polydactyl zinc-finger proteins for unique monomers, each of which comprise a different Zinc finger addressing within complex genomes. Proc. Natl. Acad. Sci. binding domain. For example, in Some embodiments, the U.S.A. 94 (11); the entire contents of each of which are incor dimer may comprise one monomer comprising Zinc finger porated herein by reference). Fusions between engineered domain A conjugated to a FokI cleavage domain, and one Zinc finger arrays and protein domains that cleave a nucleic monomer comprising Zinc finger domain B conjugated to a acid can be used to generate a "Zinc finger nuclease. A Zinc FokI cleavage domain. In this nonlimiting example, Zinc fin finger nuclease typically comprises a Zinc finger domain that 10 binds a specific target site within a nucleic acid molecule, and ger domain Abinds a nucleic acid sequence on one side of the a nucleic acid cleavage domain that cuts the nucleic acid target site, Zinc finger domain B binds a nucleic acid sequence molecule within or in proximity to the target site bound by the on the other side of the target site, and the dimerize FokI binding domain. Typical engineered Zinc finger nucleases domain cuts the nucleic acid in between the Zinc finger comprise a binding domain having between 3 and 6 indi 15 domain binding sites. vidual Zinc finger motifs and binding target sites ranging from 9 base pairs to 18 base pairs in length. Longer target sites are DETAILED DESCRIPTION OF CERTAIN particularly attractive in situations where it is desired to bind EMBODIMENTS OF THE INVENTION and cleave a target site that is unique in a given genome. The term “zinc finger nuclease.” as used herein, refers to a Introduction nuclease comprising a nucleic acid cleavage domain conju Site-specific nucleases are powerful tools for targeted gated to a binding domain that comprises a Zinc finger array. genome modification in vitro or in vivo. Some site specific In some embodiments, the cleavage domain is the cleavage nucleases can theoretically achieve a level of specificity for a domain of the type II restriction endonuclease FokI. Zinc target cleavage site that would allow one to target a single finger nucleases can be designed to target virtually any 25 unique site in a genome for cleavage without affecting any desired sequence in a given nucleic acid molecule for cleav other genomic site. It has been reported that nuclease cleav age, and the possibility to the design Zinc finger binding age in living cells triggers a DNA repair mechanism that domains to bind unique sites in the context of complex frequently results in a modification of the cleaved, repaired genomes allows for targeted cleavage of a single genomic site genomic sequence, for example, via homologous recombina in living cells, for example, to achieve a targeted genomic 30 tion. Accordingly, the targeted cleavage of a specific unique alteration of therapeutic value. Targeting a double-strand sequence within a genome opens up new avenues for gene break to a desired genomic locus can be used to introduce targeting and gene modification in living cells, including cells frame-shift mutations into the coding sequence of a gene due that are hard to manipulate with conventional gene targeting to the error-prone nature of the non-homologous DNA repair methods, Such as many human Somatic or embryonic stem pathway. Zinc finger nucleases can be generated to target a 35 cells. Nuclease-mediated modification of disease-related site of interest by methods well known to those of skill in the sequences, e.g., the CCR-5 allele in HIV/AIDS patients, or of art. For example, Zinc finger binding domains with a desired genes necessary for tumor neovascularization, can be used in specificity can be designed by combining individual Zinc the clinical context, and two site specific nucleases are cur finger motifs of known specificity. The structure of the Zinc rently in clinical trials. finger protein Zif268 bound to DNA has informed much of 40 One important aspect in the field of site-specific nuclease the work in this field and the concept of obtaining Zinc fingers mediated modification are off-target nuclease effects, e.g., the for each of the 64 possible base pair triplets and then mixing cleavage of genomic sequences that differ from the intended and matching these modular Zinc fingers to design proteins target sequence by one or more nucleotides. Undesired side with any desired sequence specificity has been described effects of off-target cleavage range from insertion into (Pavletich N P. Pabo CO (May 1991). “Zinc finger-DNA 45 unwanted loci during a gene targeting event to severe com recognition: crystal structure of a Zif268-DNA complex at plications in a clinical scenario. Off-target cleavage of 2.1 A. Science 252 (5007): 809-17, the entire contents of sequences encoding essential gene functions or tumor Sup which are incorporated herein). In some embodiments, sepa pressor genes by an endonuclease administered to a subject rate Zinc fingers that each recognize a 3 base pair DNA may result in disease or even death of the Subject. Accord sequence are combined to generate 3-, 4-, 5-, or 6-finger 50 ingly, it is desirable to characterize the cleavage preferences arrays that recognize target sites ranging from 9 base pairs to of a nuclease before using it in the laboratory or the clinic in 18 base pairs in length. In some embodiments, longer arrays order to determine its efficacy and safety. Further, the char are contemplated. In other embodiments, 2-finger modules acterization of nuclease cleavage properties allows for the recognizing 6-8 nucleotides are combined to generate 4-, 6-, selection of the nuclease best suited for a specific task from a or 8-Zinc finger arrays. In some embodiments, bacterial or 55 group of candidate nucleases, or for the selection of evolution phage display is employed to develop a Zinc finger domain products obtained from a plurality of nucleases. Such a char that recognizes a desired nucleic acid sequence, for example, acterization of nuclease cleavage properties may also inform a desired nuclease target site of 3-30 bp in length. Zinc finger the de-novo design of nucleases with enhanced properties, nucleases, in some embodiments, comprise a Zinc finger Such as enhanced specificity or efficiency. binding domain and a cleavage domain fused or otherwise 60 In many scenarios where a nuclease is employed for the conjugated to each other via a linker, for example, a polypep targeted manipulation of a nucleic acid, cleavage specificity is tide linker. The length of the linker determines the distance of a crucial feature. The imperfect specificity of Some engi the cut from the nucleic acid sequence bound by the Zinc neered nuclease binding domains can lead to off-target cleav finger domain. If a shorterlinker is used, the cleavage domain age and undesired effects both in vitro and in vivo. Current will cut the nucleic acid closer to the bound nucleic acid 65 methods of evaluating site-specific nuclease specificity, sequence, while a longer linker will result in a greater dis including ELISA assays, microarrays, one-hybrid systems, tance between the cut and the bound nucleic acid sequence. In SELEX, and its variants, and Rosetta-based computational US 9,163,284 B2 25 26 predictions, are all premised on the assumption that the bind bleomycin). Suitable nucleases in addition to the ones ing specificity of the nuclease is equivalent or proportionate described herein will be apparent to those of skill in the art to their cleavage specificity. based on this disclosure. It was previously discovered that the prediction of nuclease Further, the methods, reagents, and strategies provided off-target binding effects constitute an imperfect approxima 5 herein allow those of skill in the art to identify, design, and/or tion of a nucleases off-target cleavage effects that may result select nucleases with enhanced specificity and minimize the in undesired biological effects (see PCT Application WO off-target effects of any given nuclease (e.g., site-specific 2013/066438; and Pattanayak, V., Ramirez, C. L., Joung, J. K. nucleases such as ZFNs, and TALENS which produce cleav & Liu, D. R. Revealing off-target cleavage specificities of age products with Sticky ends, as well as RNA-programmable 10 nucleases, for example Cas9, which produce cleavage prod Zinc-finger nucleases by in vitro selection. Nature methods 8, ucts having bluntends). While of particular relevance to DNA 765-770 (2011), the entire contents of each of which are and DNA-cleaving nucleases, the inventive concepts, meth incorporated herein by reference). This finding was consis ods, strategies, and reagents provided herein are not limited in tent with the notion that the reported toxicity of some site this respect, but can be applied to any nucleic acidinuclease specific DNA nucleases results from off-target DNA cleav 15 pair. age, rather than off-target binding alone. Identifying Nuclease Target Sites Cleaved by a Site-Specific The methods and reagents of the present disclosure repre Nuclease sent, in some aspects, an improvement over previous methods Some aspects of this disclosure provide improved methods and allow for an accurate evaluation of a given nuclease's and reagents to determine the nucleic acid target sites cleaved target site specificity and provide Strategies for the selection by any site-specific nuclease. The methods provided herein of Suitable unique target sites and the design or selection of can be used for the evaluation of target site preferences and highly specific nucleases for the targeted cleavage of a single specificity of both nucleases that create blunt ends and site in the context of a complex genome. For example, some nucleases that create Sticky ends. In general. Such methods previously reported methods for determining nuclease target comprise contacting a given nuclease with a library of target site specificity profiles by screening libraries of nucleic acid 25 sites under conditions suitable for the nuclease to bind and cut molecules comprising candidate target sites relied on a “two a target site, and determining which target sites the nuclease cut in vitro selection method which requires indirect recon actually cuts. A determination of a nuclease's target site pro struction of target sites from sequences of two half-sites file based on actual cutting has the advantage over methods resulting from two adjacent cuts of the nuclease of a library that rely on binding in that it measures a parameter more member nucleic acid (see e.g., Pattanayak, V. et al., Nature 30 relevant for mediating undesired off-target effects of site Methods 8, 765-770 (2011)). In contrast to such “two-cut” specific nucleases. In general, the methods provided herein strategies, the methods of the present disclosure utilize a "one comprise ligating an adapter of a known sequence to nucleic cut screening strategy, which allows for the identification of acid molecules that have been cut by a nuclease of interest via library members that have been cut at least once by the 5'-phosphate-dependent ligation. Accordingly, the methods nuclease. The “one-cut’ selection strategies provided herein 35 provided herein are particularly useful for identifying target are compatible with single end high-throughput sequencing sites cut by nucleases that leave a phosphate moiety at the methods and do not require computational reconstruction of 5'-end of the cut nucleic acid strand when cleaving their target cleaved target sites from cut half-sites because they feature, in site. Afterligating an adapter to the 5'-end of a cut nucleic acid Some embodiments, direct sequencing of an intact target Strand, the cut strand can directly be sequenced using the nuclease sequence in a cut library member nucleic acid. 40 adapter as a sequencing linker, or a part of the cut library Additionally, the presently disclosed “one-cut’ screening member concatemer comprising an intact target site identical methods utilize concatemers of a candidate nuclease target to the cut target site can be amplified via PCR and the ampli site and constant insert region that are about 10-fold shorter fication product can then be sequenced. than previously reported constructs used for two-cut strate In Some embodiments, the method comprises (a) providing gies (~50 bp repeat sequence length versus ~500 bp repeat 45 a nuclease that cuts a double-stranded nucleic acid target site, sequence length in previous reports). This difference in repeat wherein cutting of the target site results in cut nucleic acid sequence length in the concatemers of the library allows for Strands comprising a 5'-phosphate moiety; (b) contacting the the generation of highly complex libraries of candidate nuclease of (a) with a library of candidate nucleic acid mol nuclease target sites, e.g., of libraries comprising 10" differ ecules, wherein each nucleic acid molecule comprises a con ent candidate nuclease target sequences. As described herein, 50 catemer of a sequence comprising a candidate nuclease target an exemplary library of Such complexity has been generated, site and a constant insert sequence, under conditions Suitable templated on a known Cas9 nuclease target site by varying the for the nuclease to cut a candidate nucleic acid molecule sequence of the known target site. The exemplary library comprising a target site of the nuclease; and (c) identifying demonstrated that a greater than 10-fold coverage of all nuclease target sites cut by the nuclease in (b) by determining sequences with eight or fewer mutations of the known target 55 the sequence of an uncut nuclease target site on the nucleic site can be achieved using the strategies provided herein. The acid strand that was cut by the nuclease in step (b). use of a shorter repeat sequence also allows the use of single In some embodiments, the method comprises providing a end sequencing, since both a cut half-site and an adjacent nuclease and contacting the nuclease with a library of candi uncut site of the same library member are contained within a date nucleic acid molecules comprising candidate target sites. 100 nucleotide sequencing read. 60 In some embodiments, the candidate nucleic acid molecules The strategies, methods, libraries, and reagents provided are double-stranded nucleic acid molecules. In some embodi hereincan be utilized to analyze the sequence preferences and ments, the candidate nucleic acid molecules are DNA mol specificity of any site-specific nuclease, for example, to Zinc ecules. In some embodiments, each nucleic acid molecule in Finger Nucleases (ZFNs). Transcription Activator-Like the library comprises a concatemer of a sequence comprising Effector Nucleases (TALENs), homing endonucleases, 65 a candidate nuclease target site and a constant insert organic compound nucleases, and enediyne antibiotics (e.g., sequence. For example, in Some embodiments, the library dynemicin, neocarzinostatin, calicheamicin, esperamicin, comprises nucleic acid molecules that comprise the structure US 9,163,284 B2 27 28 R-(candidate nuclease target site)-(constant insert double-stranded nucleic acid target site and creates blunt sequence).-R, wherein R and R are, independently, ends. In other embodiments, the nuclease creates a 5'-over nucleic acid sequences that may comprise a fragment of the hang. In some such embodiments, the target site comprises a (candidate nuclease target site)-(constant insert sequence) left-half site-spacer sequence-right-half site (LSR) structure, and n is an integer between 2 and y. In some 5 structure, and the nuclease cuts the target site within the embodiments, y is at least 10", at least 10, at least 10, at least Spacer Sequence. 10, at least 10, at least 10, at least 107, at least 10, at least 10, at least 10', at least 10', at least 10', at least 10", at In some embodiments, a nuclease cuts a double-stranded least 10", or at least 10'. In some embodiments, y is less target site and creates blunt ends. In some embodiments, a than 10°, less than 10, less than 10, less than 10, less than 10 nuclease cuts a double-stranded target site and creates an 10, less than 107, less than 10, less than 10, less than 10'. overhang, or Sticky end, for example, a 5'-overhang. In some less than 10'', less than 10°, less than 10", less than 10", or Such embodiments, the method comprises filling in the less than 10" 5'-overhangs of nucleic acid molecules produced from a For example, in some embodiments, the candidate nucleic nucleic acid molecule that has been cut once by the nuclease, acid molecules of the library comprise a candidate nuclease 15 wherein the nucleic acid molecules comprise a constant insert target site of the structure (N2)-(PAM), and, thus, the sequence flanked by a left or right half-site and cut spacer nucleic acid molecules of the library comprise the structure sequence on one side, and an uncut target site sequence on the R—(N2)-(PAM)-(constant region)-R, wherein R and other side, thereby creating blunt ends. Rare, independently, nucleic acid sequences that may com In some embodiments, the determining of step (c) com prise afragment of the (N2)-(PAM)-(constant region) repeat 20 prises ligating a first nucleic acid adapter to the 5' end of a unit; each N represents, independently, any nucleotide; Z is an nucleic acid strand that was cut by the nuclease in step (b) via integer between 1 and 50; and X is an integer between 2 and 5'-phosphate-dependent ligation. In some embodiments, the y. In some embodiments, y is at least 10", at least 10, at least nuclease creates bluntends. In Such embodiments, an adapter 10, at least 10, at least 10, at least 10, at least 107, at least can directly be ligated to the blunt ends resulting from the 10, at least 10, at least 10', at least 10', at least 10', at 25 nuclease cut of the target site by contacting the cut library least 10", at least 10", or at least 10'. In some embodi members with a double-stranded, blunt-ended adapter lack ments, y is less than 10°, less than 10, less than 10, less than ing 5' phosphorylation. In some embodiments, the nuclease 10, less than 10, less than 107, less than 10, less than 10, creates an overhang (sticky end). In some Such embodiments, less than 10", less than 10', less than 10', less than 10", an adapter may be ligated to the cut site by contacting the cut less than 10", or less than 10'. In some embodiments, Z is at 30 library member with an excess of adapter having a compatible least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, Sticky end. If a nuclease is used that cuts within a constant at least 14, at least 15, at least 16, at least 17, at least 18, at least spacer sequence between variable half-sites, the Sticky end 19, at least 20, at least 25, at least 30, at least 35, at least 40, can be designed to match the 5' overhang created from the at least 45, or at least 50. In some embodiments, Z is 20. Each 35 spacer sequence. In embodiments, where the nuclease cuts N represents, independently, any nucleotide. Accordingly, a within a variable sequence, a population of adapters having a sequence provided as N2 with 2 would be NN, with each N. variable overhang sequence and a constant annealed independently, representing A, T, G, or C. Accordingly, NZ sequence (for use as a sequencing linker or PCR primer) may with 2 can represent AA, AT AG, AC, TA, TT, TG, TC, GA, be used, or the 5' overhangs may be filled into form bluntends GT, GG, GC, CA, CT, CG, and CC. 40 before adapter ligation. In other embodiments, the candidate nucleic acid mol In some embodiments, the determining of step (c) further ecules of the library comprise a candidate nuclease target site comprises amplifying a fragment of the concatemer cut by the of the structure left-half site-spacer sequence-right-half nuclease that comprises an uncut target site via PCR using a site (“LSR), and, thus, the nucleic acid molecules of the PCR primer that hybridizes with the adapter and a PCR library comprise the structure R-(LSR)-(constant 45 primer that hybridizes with the constant insert sequence. region)-R, wherein R and Rare, independently, nucleic Typically, the amplification of concatemers via PCR will acid sequences that may comprise a fragment of the (LSR)- yield amplicons comprising at least one intact candidate tar (constant region) repeat unit, and X is an integer between 2 get site identical to the cut target sites because the target sites andy. In some embodiments, y is at least 10", at least 10, at in each concatemer are identical. For single-direction least 10, at least 10, at least 10, at least 10, at least 107, at 50 sequencing, an enrichment of amplicons that comprise one least 10, at least 10, at least 10', at least 10', at least 10', intact target site, no more than two intact target sites, no more at least 10", at least 10", or at least 10'. In some embodi than three intact target sites, no more than four intact target ments, y is less than 10°, less than 10, less than 10, less than sites, or no more than five intact target sites may be desirable. 10, less than 10, less than 107, less than 10, less than 10, In embodiments where PCR is used for amplification of cut less than 10", less than 10', less than 10', less than 10", 55 nucleic acid molecules, the PCR parameters can be optimized less than 10", or less than 10'. The constant region, in some to favor the amplification of short sequences and disfavor the embodiments, is of a length that allows for efficient self amplification of longer sequences, e.g., by using a short elon ligation of a single repeat unit. Suitable lengths will be appar gation time in the PCR cycle. Another possibility for enrich ent to those of skill in the art. For example, in some embodi ment of short amplicons is size fractionation, e.g., via gel ments, the constant region is between 5 and 100 base pairs 60 electrophoresis or size exclusion chromatography. Size frac long, for example, about 5 base pairs, about 10 base pairs, tionation can be performed before and/or after amplification. about 15 base pairs, about 20 base pairs, about 25 base pairs, Other suitable methods for enrichment of short amplicons about 30 base pairs, about 35 base pairs, about 40 base pairs, will be apparent to those of skill in the art and the disclosure about 50 base pairs, about 60 base pairs, about 70 base pairs, is not limited in this respect. about 80 base pairs, about 90 base pairs, or about 100 base 65 In some embodiments, the determining of step (c) com pairs long. In some embodiments, the constant region is 16 prises sequencing the nucleic acid strand that was cut by the base pairs long. In some embodiments, the nuclease cuts a nuclease in step (b), or a copy thereof obtained via amplifi US 9,163,284 B2 29 30 cation, e.g., by PCR. Sequencing methods are well known to Incubation of the nuclease with the library nucleic acids those of skill in the art. The disclosure is not limited in this will result in cleavage of those concatemers in the library that respect. comprise target sites that can be bound and cleaved by the In some embodiments, the nuclease being profiled using nuclease. If a given nuclease cleaves a specific target site with the inventive system is an RNA-programmable nuclease that high efficiency, a concatemer comprising target sites will be forms a complex with an RNA molecule, and wherein the cut, e.g., once or multiple times, resulting in the generation of nuclease:RNA complex specifically binds a nucleic acid fragments comprising a cut target site adjacent to one or more sequence complementary to the sequence of the RNA mol repeat units. Depending on the structure of the library mem ecule. In some embodiments, the RNA molecule is a single bers, an exemplary cut nucleic acid molecule released from a guide RNA (sgRNA). In some embodiments, the sgRNA 10 library member concatemer by a single nuclease cleavage comprises 5-50 nucleotides, 10-30 nucleotides, 15-25 nucle may, for example, be of the structure (cut target site)-(con otides, 18-22 nucleotides, 19-21 nucleotides, e.g., 10, 11, 12, stant region)-(target site)-(constant region).-R. For 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, example, in the context of an RNA-guided nuclease, an exem or 30 nucleotides. In some embodiments, the sgRNA com plary cut nucleic acid molecule released from a library mem prises 5-50 nucleotides, 10-30 nucleotides, 15-25 nucle 15 ber concatemer by a single nuclease cleavage may, for otides, 18-22 nucleotides, 19-21 nucleotides, e.g., 10, 11, 12, example, be of the structure (PAM)-(constant region)-(N2)- 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, (PAM)-(constant region).-R. And in the context of a or 30 nucleotides that are complementary to a sequence of the nuclease cutting an LSR structure within the spacer region, an nuclease target site. In some embodiments, the sgRNA com exemplary cut nucleic acid molecule released from a library prises 20 nucleotides that are complementary to the nuclease member concatemer by a single nuclease cleavage may, for target site. In some embodiments, the nuclease is a Cas9 example, be of the structure (cut spacer region)-(right half nuclease. In some embodiments, the nuclease target site com site)-(constant region)-(LSR)-(constant region).-R. Such prises a sgRNA-complementary sequence-protospacer cut fragments released from library candidate molecules can adjacent motif (PAM) structure, and the nuclease cuts the then be isolated and/or the sequence of the target site cleaved target site within the sgRNA-complementary sequence. In 25 by the nuclease identified by sequencing an intact target site Some embodiments, the sgRNA-complementary sequence (e.g., an intact (N2)-(PAM) site of released repeat units. See, comprises 5-50 nucleotides, 10-30 nucleotides, 15-25 nucle e.g., FIG. 1B for an illustration. otides, 18-22 nucleotides, 19-21 nucleotides, e.g., 10, 11, 12, Suitable conditions for exposure of the library of nucleic 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, acid molecules will be apparent to those of skill in the art. In or 30 nucleotides. 30 Some embodiments, Suitable conditions do not result in dena In some embodiments, the RNA-programmable nuclease turation of the library nucleic acids or the nuclease and allow is a Cas9 nuclease. The RNA-programmable Cas9 endonu for the nuclease to exhibit at least 50%, at least 60%, at least clease cleaves double-stranded DNA (dsDNA) at sites adja 70%, at least 80%, at least 90%, at least 95%, or at least 98% cent to a two-base-pair PAM motif and complementary to a of its nuclease activity. guide RNA sequence (sgRNA). Typically, the sgRNA 35 Additionally, if a given nuclease cleaves a specific target sequence that is complementary to the target site sequence is site, some cleavage products will comprise a cut half site and about 20 nucleotides long, but shorter and longer comple an intact, or uncut target site. As described herein, such prod mentary SgRNA sequences can be used as well. For example, ucts can be isolated by routine methods, and because the in some embodiments, the sgRNA comprises 5-50 nucle insert sequence, in Some aspects, is less than 100 base pairs, otides, 10-30 nucleotides, 15-25 nucleotides, 18-22 nucle 40 Such isolated cleavage products may be sequenced in a single otides, 19-21 nucleotides, e.g., 10, 11, 12, 13, 14, 15, 16, 17. read-through, allowing identification of the target site 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides. sequence without reconstructing the sequence, e.g., from cut The Cas9 system has been used to modify genomes in mul half sites. tiple cell types, demonstrating its potential as a facile Any method suitable for isolation and sequencing of the genome-engineering tool. 45 repeat units can be employed to elucidate the LSR sequence In some embodiments, the nuclease comprises an unspe cleaved by the nuclease. For example, since the length of the cific nucleic acid cleavage domain. In some embodiments, the constant region is known, individual released repeat units can nuclease comprises a FokI cleavage domain. In some embodi be separated based on their size from the larger uncut library ments, the nuclease comprises a nucleic acid cleavage nucleic acid molecules as well as from fragments of library domain that cleaves a target sequence upon cleavage domain 50 nucleic acid molecules that comprise multiple repeat units dimerization. In some embodiments, the nuclease comprises (indicating non-efficient targeted cleavage by the nuclease). a binding domain that specifically binds a nucleic acid Suitable methods for separating and/or isolating nucleic acid sequence. In some embodiments, the binding domain com molecules based on their size are well-knownto those of skill prises a Zinc finger. In some embodiments, the binding in the art and include, for example, size fractionation meth domain comprises at least 2, at least 3, at least 4, or at least 5 55 ods, such as gel electrophoresis, density gradient centrifuga Zinc fingers. In some embodiments, the nuclease is a Zinc tion, and dialysis over a semi-permeable membrane with a Finger Nuclease. In some embodiments, the binding domain suitable molecular cutoff value. The separated/isolated comprises a Transcriptional Activator-Like Element. In some nucleic acid molecules can then be further characterized, for embodiments, the nuclease is a Transcriptional Activator example, by ligating PCR and/or sequencing adapters to the Like Element Nuclease (TALEN). In some embodiments, the 60 cut ends and amplifying and/or sequencing the respective nuclease is a homing endonuclease. In some embodiments, nucleic acids. Further, if the length of the constant region is the nuclease is an organic compound. In some embodiments, selected to favor self-ligation of individual released repeat the nuclease comprises an enediyne functional group. In units, such individual released repeat units may be enriched Some embodiments, the nuclease is an antibiotic. In some by contacting the nuclease treated library molecules with a embodiments, the compound is dynemicin, neocarzinostatin, 65 ligase and Subsequent amplification and/or sequencing based calicheamicin, esperamicin, bleomycin, or a derivative on the circularized nature of the self-ligated individual repeat thereof. units. US 9,163,284 B2 31 32 In some embodiments, where a nuclease is used that gen tion, and the methods described herein can be used to deter erates 5'-overhangs as a result of cutting a target nucleic acid, mine a concentration at which a given nuclease efficiently the 5'-overhangs of the cut nucleic acid molecules are filled in. cuts its intended target site, but does not efficiently cut any Methods for filling in 5'-overhangs are well known to those of off-target sequences. In some embodiments, a maximum con skill in the art and include, for example, methods using DNA centration of a therapeutic nuclease is determined at which polymerase I Klenow fragment lacking exonuclease activity the therapeutic nuclease cuts its intended nuclease target site (Klenow (3'->5' exo-)). Filling in 5'-overhangs results in the but does not cut more than 10, more than 5, more than 4, more overhang-templated extension of the recessed strand, which, than 3, more than 2, more than 1, or any additional sites. In in turn, results in blunt ends. In the case of single repeat units Some embodiments, a therapeutic nuclease is administered to released from library concatemers, the resulting structure is a 10 a Subject in an amount effective to generate a final concen blunt-ended SR-(constant region)-LS', with S' and S' tration equal or lower than the maximum concentration deter comprising blunt ends. PCR and/or sequencing adapters can mined as described above. then be added to the ends by blunt end ligation and the In Some embodiments, the library of candidate nucleic acid respective repeat units (including SR and LS' regions) can molecules used in the methods provided herein comprises at be sequenced. From the sequence data, the original LSR 15 least 10, at least 10, at least 10', at least 10', or at least region can be deduced. Blunting of the overhangs created 10' different candidate nuclease target sites. during the nuclease cleavage process also allows for distin In some embodiments, the nuclease is a therapeutic guishing between target sites that were properly cut by the nuclease which cuts a specific nuclease target site in a gene respective nuclease and target sites that were non-specifically associated with a disease. In some embodiments, the method cut, e.g., based on non-nuclease effects Such as physical further comprises determining a maximum concentration of shearing. Correctly cleaved nuclease target sites can be rec the therapeutic nuclease at which the therapeutic nuclease ognized by the existence of complementary S.R and LS cuts the specific nuclease target site and does not cut more regions, which comprise a duplication of the overhang nucle than 10, more than 5, more than 4, more than 3, more than 2, otides as a result of the overhang fill in while target sites that more than 1, or no additional sites. In some embodiments, the were not cleaved by the respective nuclease are unlikely to 25 method further comprises administering the therapeutic comprise overhang nucleotide duplications. In some embodi nuclease to a Subject in an amount effective to generate a final ments, the method comprises identifying the nuclease target concentration equal or lower than the maximum concentra site cut by the nuclease by determining the sequence of the tion. left-half site, the right-half-site, and/or the spacer sequence of Nuclease Target Site Libraries a released individual repeat unit. Any suitable method for 30 Some embodiments of this disclosure provide libraries of amplifying and/or sequencing can be used to identify the LSR nucleic acid molecules for nuclease target site profiling. In sequence of the target site cleaved by the respective nuclease. some embodiments, the candidate nucleic acid molecules of Methods for amplifying and/or sequencing nucleic acids are the library comprise the structure R—(N2)-(PAM)-(con well knownto those of skill in the art and the disclosure is not stant region)-R, wherein R and R are, independently, limited in this respect. In the case of nucleic acids released 35 nucleic acid sequences that may comprise a fragment of the from library concatemers that comprise a cut half site and an (N2)-(PAM)-(constant region) repeat unit; each N repre uncut target site (e.g., comprises at least about 1.5 repeat sents, independently, any nucleotide; Z is an integer between sequences), filling in the 5'-overhangs also provides for assur 1 and 50; and X is an integer between 2 and y. In some ance that the nucleic acid was cleaved by the nuclease. embodiments, y is at least 10", at least 10, at least 10, at least Because the nucleic acid also comprises an intact, or uncut 40 10, at least 10, at least 10, at least 107, at least 10, at least target site, the sequence of said site can be determined without 10, at least 10', at least 10', at least 10', at least 10", at having to reconstruct the sequence from a left-half site, right least 10", or at least 10'. In some embodiments, y is less half site, and/or spacer sequence. than 10°, less than 10, less than 10, less than 10, less than Some of the methods and strategies provided herein allow 10, less than 107, less than 10, less than 10, less than 10'. for the simultaneous assessment of a plurality of candidate 45 less than 10'', less than 10°, less than 10", less than 10", or target sites as possible cleavage targets for any given less than 10'. In some embodiments, Zis at least 2, at least 3, nuclease. Accordingly, the data obtained from Such methods at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, can be used to compile a list of target sites cleaved by a given at least 10, at least 11, at least 12, at least 13, at least 14, at least nuclease, which is also referred to herein as a target site 15, at least 16, at least 17, at least 18, at least 19, at least 20, profile. If a sequencing method is used that allows for the 50 at least 25, at least 30, at least 35, at least 40, at least 45, or at generation of quantitative sequencing data, it is also possible least 50. In some embodiments, Z is 20. Each N represents, to record the relative abundance of any nuclease target site independently, any nucleotide. Accordingly, a sequence pro detected to be cleaved by the respective nuclease. Target sites vided as N2 with 2 would be NN, with each N, indepen that are cleaved more efficiently by the nuclease will be dently, representing A, T, G, or C. Accordingly, N2 with 2 detected more frequently in the sequencing step, while target 55 can represent AA, AT AG, AC, TA, TT, TG, TC, GA, GT, GG, sites that are not cleaved efficiently will only rarely release an GC, CA, CT, CG, and CC. individual repeat unit from a candidate concatemer, and thus, In some embodiments, a library is provided comprising will only generate few, if any, sequencing reads. Such quan candidate nucleic acid molecules that comprise target sites titative sequencing data can be integrated into a target site with a partially randomized left-half site, a partially random profile to generate a ranked list of highly preferred and less 60 ized right-half site, and/or a partially randomized spacer preferred nuclease target sites. sequence. In some embodiments, the library is provided com The methods and strategies of nuclease target site profiling prising candidate nucleic acid molecules that comprise target provided herein can be applied to any site-specific nuclease, sites with a partially randomized left half site, a fully random including, for example, ZFNs, TALENs, homing endonu ized spacer sequence, and a partially randomized right half cleases, and RNA-programmable nucleases, such as Cas9 65 site. In some embodiments, a library is provided comprising nucleases. As described in more detail herein, nuclease speci candidate nucleic acid molecules that comprise target sites ficity typically decreases with increasing nuclease concentra with a partially or fully randomized sequence, wherein the US 9,163,284 B2 33 34 target sites comprise the structure N-(PAM), for example In some embodiments, a library of candidate nucleic acid as described herein. In some embodiments, partially random molecules is provided that comprises at least 10, at least 10. ized sites differ from the consensus site by more than 5%, at least 107, at least 10, at least 10, at least 10', at least 10', more than 10%, more than 15%, more than 20%, more than at least 10', at least 10", at least 10", or at least 10" 25%, or more than 30% on average, distributed binomially. different candidate nuclease target sites. In some embodi In some embodiments such a library comprises a plurality ments, the candidate nucleic acid molecules of the library are of nucleic acid molecules, each comprising a concatemer of a concatemers produced from a secularized templates by roll candidate nuclease target site and a constant insert sequence, ing cycle amplification. In some embodiments, the library also referred to herein as a constant region. For example, in comprises nucleic acid molecules, e.g., concatemers, of a 10 molecular weight of at least 5 kDa, at least 6 kDa, at least 7 Some embodiments, the candidate nucleic acid molecules of kDa, at least 8 kDa, at least 9 kDa, at least 10 kDa, at least 12 the library comprise the structure R-(sgRNA-complemen kDa, or at least 15 kDa. In some embodiments, the molecular tary sequence)-(PAM)-(constant region)-R, or the struc weight of the nucleic acid molecules within the library may be ture R-(LSR)-(constant region)-R, wherein the structure larger than 15 kDa. In some embodiments, the library com in square brackets ("... I') is referred to as a repeat unit or 15 prises nucleic acid molecules within a specific size range, for repeat sequence; R and R2 are, independently, nucleic acid example, within a range of 5-7 kDa, 5-10 kDa, 8-12 kDa, sequences that may comprise a fragment of the repeat unit, 10-15 kDa, or 12-15 kDa, or 5-10 kDa or any possible sub and X is an integer between 2 and y. In some embodiments, y range. While Some methods suitable for generating nucleic is at least 10", at least 10, at least 10, at least 10, at least 10, acid concatemers according to some aspects of this disclosure at least 10, at least 107, at least 10, at least 10, at least 10", result in the generation of nucleic acid molecules of greatly at least 10'', at least 10', at least 10", at least 10", or at least different molecular weights, such mixtures of nucleic acid 10'. In some embodiments, y is less than 10°, less than 10, molecules may be size fractionated to obtain a desired size less than 10, less than 10, less than 10°, less than 107, less distribution. Suitable methods for enriching nucleic acid mol than 10, less than 10, less than 10", less than 10'', less than ecules of a desired size or excluding nucleic acid molecules of 10', less than 10", less than 10", or less than 10". The 25 a desired size are well knownto those of skill in the art and the constant region, in Some embodiments, is of a length that disclosure is not limited in this respect. allows for efficient self-ligation of a single repeat unit. In In some embodiments, partially randomized sites differ Some embodiments, the constant region is of a length that from the consensus site by no more than 10%, no more than allows for efficient separation of single repeat units from 15%, no more than 20%, no more than 25%, nor more than fragments comprising two or more repeat units. In some 30 30%, no more than 40%, or no more than 50% on average, embodiments, the constant region is of a length allows for distributed binomially. For example, in some embodiments efficient sequencing of a complete repeat unit in one sequenc partially randomized sites differ from the consensus site by ing read. Suitable lengths will be apparent to those of skill in more than 5%, but by no more than 10%; by more than 10%, the art. For example, in some embodiments, the constant but by no more than 20%; by more than 20%, but by no more region is between 5 and 100 base pairs long, for example, 35 than 25%; by more than 5%, but by no more than 20%, and so about 5 base pairs, about 10 base pairs, about 15 base pairs, on. Using partially randomized nuclease target sites in the about 20 base pairs, about 25 base pairs, about 30 base pairs, library is useful to increase the concentration of library mem about 35 base pairs, about 40 base pairs, about 50 base pairs, bers comprising target sites that are closely related to the about 60 base pairs, about 70 base pairs, about 80 base pairs, consensus site, for example, that differ from the consensus about 90 base pairs, or about 100 base pairs long. In some 40 sites in only one, only two, only three, only four, or only five embodiments, the constant region is 1, 2, 3, 4, 5, 6, 7, 8, 9, 0. residues. The rationale behind this is that a given nuclease, for 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, example a given ZFN or RNA-programmable nuclease, is 29, 30, 31, 32,33,34, 35,36, 37,38, 39, 40, 41,42, 43,44, 45, likely to cut its intended target site and any closely related 46,47, 48,49, 50, 51, 52,53,54, 55,56, 57,58, 59, 60, 61, 62, target sites, but unlikely to cut a target sites that is vastly 63,64, 65,66, 67,68, 69,70, 71,72, 73,74, 75,76, 77,78,79, 45 different from or completely unrelated to the intended target or 80 base pairs long. site. Accordingly, using a library comprising partially ran An LSR site typically comprises a left-half site-spacer domized target sites can be more efficient than using libraries sequence-right-half site structure. The lengths of the half comprising fully randomized target sites without compromis size and the spacer sequence will depend on the specific ing the sensitivity in detecting any off-target cleavage events nuclease to be evaluated. In general, the half-sites will be 6-30 50 for any given nuclease. Thus, the use of partially randomized nucleotides long, and preferably 10-18 nucleotides long. For libraries significantly reduces the cost and effort required to example, each half site individually may be 6, 7, 8, 9, 10, 11, produce a library having a high likelihood of covering virtu 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, ally all off-target sites of a given nuclease. In some embodi 29, or 30 nucleotides long. In some embodiments, an LSR site ments however it may be desirable to use a fully randomized may be longer than 30 nucleotides. In some embodiments, the 55 library of target sites, for example, in embodiments, where the left half site and the right half site of an LSR are of the same specificity of a given nuclease is to be evaluated in the context length. In some embodiments, the left half site and the right of any possible site in a given genome. half site of an LSR are of different lengths. In some embodi Selection and Design of Site-Specific Nucleases ments, the left half site and the right halfsite of an LSR are of Some aspects of this disclosure provide methods and strat different sequences. In some embodiments, a library is pro 60 egies for selecting and designing site-specific nucleases that vided that comprises candidate nucleic acids which comprise allow the targeted cleavage of a single, unique sites in the LSRs that can be cleaved by a FokI cleavage domain, a Zinc context of a complex genome. In some embodiments, a Finger Nuclease (ZFN), a Transcription Activator-Like method is provided that comprises providing a plurality of Effector Nuclease (TALEN), a homing endonuclease, or an candidate nucleases that are designed or known to cut the organic compound (e.g., an enediyne antibiotic Such as dyne 65 same consensus sequence; profiling the target sites actually micin, neocarzinostatin, calicheamicin, and esperamicinl; cleaved by each candidate nuclease, thus detecting any and bleomycin). cleaved off-target sites (target sites that differ from the con US 9,163,284 B2 35 36 sensus target site); and selecting a candidate nuclease based Transcription Activator-Like Effector Nucleases (TALENs). on the off-target site(s) so identified. In some embodiments, a homing endonuclease, an organic compound nuclease, oran this method is used to select the most specific nuclease from enediyne antibiotic (e.g., dynemicin, neocarzinostatin, cali a group of candidate nucleases, for example, the nuclease that cheamicin, esperamicin, bleomycin). cleaves the consensus target site with the highest specificity, Isolated Nucleases the nuclease that cleaves the lowest number of off-target sites, Some aspects of this disclosure provide isolated site-spe the nuclease that cleaves the lowest number of off-target sites cific nucleases with enhanced specificity that are designed in the context of a target genome, or a nuclease that does not using the methods and strategies described herein. Some cleave any target site other than the consensus target site. In embodiments, of this disclosure provide nucleic acids encod Some embodiments, this method is used to select a nuclease 10 ing Such nucleases. Some embodiments of this disclosure that does not cleave any off-target site in the context of the provide expression constructs comprising Such encoding genome of a subject at concentration that is equal to or higher nucleic acids. For example, in some embodiments an isolated thana therapeutically effective concentration of the nuclease. nuclease is provided that has been engineered to cleave a The methods and reagents provided herein can be used, for desired target site within a genome, and has been evaluated example, to evaluate a plurality of different nucleases target 15 according to a method provided herein to cut less than 1, less ing the same intended targets site, for example, a plurality of than 2, less than 3, less than 4, less than 5, less than 6, less than variations of a given site-specific nuclease, for example a 7, less than 8, less than 9 or less than 10 off-target sites at a given Zinc finger nuclease. Accordingly, such methods may concentration effective for the nuclease to cut its intended be used as the selection step in evolving or designing a novel target site. In some embodiments an isolated nuclease is pro site-specific nucleases with improved specificity. vided that has been engineered to cleave a desired unique Identifying Unique Nuclease Target Sites within a Genome target site that has been selected to differ from any other site Some embodiments of this disclosure provide a method for within a genome by at least 3, at least 4, at least 5, at least 6, selecting a nuclease target site within a genome. As described at least 7, at least 8, at least 9, or at least 10 nucleotide in more detail elsewhere herein, it was Surprisingly discov residues. In some embodiments, the isolated nuclease is an ered that off target sites cleaved by a given nuclease are 25 RNA-programmable nuclease, such as a Cas9 nuclease; a typically highly similar to the consensus target site, e.g., Zinc Finger Nuclease (ZFN); or a Transcription Activator differing from the consensus target site in only one, only two, Like Effector Nuclease (TALEN), a homing endonuclease, an only three, only four, or only five nucleotide residues. Based organic compound nuclease, or an enediyne antibiotic (e.g., on this discovery, a nuclease target sites within the genome dynemicin, neocarzinostatin, calicheamicin, esperamicin, can be selected to increase the likelihood of a nuclease tar 30 bleomycin). In some embodiments, the isolated nuclease geting this site not cleaving any off target sites within the cleaves a target site within an allele that is associated with a genome. For example, in some embodiments, a method is disease or disorder. In some embodiments, the isolated provided that comprises identifying a candidate nuclease tar nuclease cleaves a target site the cleavage of which results in get site; and comparing the candidate nuclease target site to treatment or prevention of a disease or disorder. In some other sequences within the genome. Methods for comparing 35 embodiments, the disease is HIV/AIDS, or a proliferative candidate nuclease target sites to other sequences within the disease. In some embodiments, the allele is a CCR5 (for genome are well known to those of skill in the art and include treating HIV/AIDS) or a VEGFA allele (for treating a prolif for example sequence alignment methods, for example, using erative disease). a sequence alignment software or algorithm such as BLAST In some embodiments, the isolated nuclease is provided as on a general purpose computer. A suitable unique nuclease 40 part of a pharmaceutical composition. For example, some target site can then be selected based on the results of the embodiments provide pharmaceutical compositions com sequence comparison. In some embodiments, if the candidate prising a nuclease as provided herein, or a nucleic acid encod nuclease target site differs from any other sequence within the ing Such a nuclease, and a pharmaceutically acceptable genome by at least 3, at least 4, at least 5, at least 6, at least 7. excipient. Pharmaceutical compositions may optionally com at least 8, at least 9, or at least 10 nucleotides, the nuclease 45 prise one or more additional therapeutically active Sub target site is selected as a unique site within the genome, StanceS. whereas if the site does not fulfill this criteria, the site may be In some embodiments, compositions provided herein are discarded. In some embodiments, once a site is selected based administered to a Subject, for example, to a human Subject, in on the sequence comparison, as outlined above, a site-specific order to effect a targeted genomic modification within the nuclease targeting the selected site is designed. For example, 50 subject. In some embodiments, cells are obtained from the a Zinc finger nuclease may be designed to target any selected Subject and contacted with a nuclease or a nuclease-encoding nuclease target site by constructing a Zinc finger array binding nucleic acid ex vivo, and re-administered to the Subject after the target site, and conjugating the Zinc finger array to a DNA the desired genomic modification has been effected or cleavage domain. In embodiments where the DNA cleavage detected in the cells. Although the descriptions of pharma domain needs to dimerize in order to cleave DNA, to Zinc 55 ceutical compositions provided herein are principally finger arrays will be designed, each binding a half site of the directed to pharmaceutical compositions which are Suitable nuclease target site, and each conjugated to a cleavage for administration to humans, it will be understood by the domain. In some embodiments, nuclease designing and/or skilled artisan that Such compositions are generally Suitable generating is done by recombinant technology. Suitable for administration to animals of all sorts. Modification of recombinant technologies are well known to those of skill in 60 pharmaceutical compositions suitable for administration to the art, and the disclosure is not limited in this respect. humans in order to render the compositions suitable for In some embodiments, a site-specific nuclease designed or administration to various animals is well understood, and the generated according to aspects of this disclosure is isolated ordinarily skilled Veterinary pharmacologist can design and/ and/or purified. The methods and strategies for designing or perform Such modification with merely ordinary, if any, site-specific nucleases according to aspects of this disclosure 65 experimentation. Subjects to which administration of the can be applied to design or generate any site-specific pharmaceutical compositions is contemplated include, but nuclease, including, but not limited to Zinc Finger Nucleases, are not limited to, humans and/or other primates; mammals, US 9,163,284 B2 37 38 including commercially relevant mammals such as cattle, incubated with nickel-nitriloacetic acid (nickel-NTA) resin pigs, horses, sheep, cats, dogs, mice, and/or rats; and/or birds, (Qiagen) at 4°C. for 20 minto capture His-tagged Cas9. The including commercially relevant birds such as chickens, resin was transferred to a 20-mL column and washed with 20 ducks, geese, and/or turkeys. column volumes of lysis buffer. Cas9 was eluted in 20 mM Formulations of the pharmaceutical compositions Tris-HCl (pH 8), 0.1 M KC1, 20% glycerol, 1 mMTCEP and described herein may be prepared by any method known or 250 mM imidazole, and concentrated by Amicon ultra cen hereafter developed in the art of pharmacology. In general, trifugal filter (Millipore, 30-kDa molecular weight cut-off) to Such preparatory methods include the step of bringing the -50 mg/mL. The 6x His tag and maltose-binding protein active ingredient into association with an excipient and/or one were removed by TEV protease treatment at 4°C. for 20h and or more other accessory ingredients, and then, if necessary 10 captured by a second Ni-affinity purification step. The eluent, and/or desirable, shaping and/or packaging the product into a containing Cas9, was injected into a HiTrap SPFF column desired single- or multi-dose unit. (GE Healthcare) in purification buffer containing 20 mM Pharmaceutical formulations may additionally comprise a Tris-HCl (pH 8), 0.1 M KC1, 20% glycerol, and 1 mMTCEP. pharmaceutically acceptable excipient, which, as used Cas9 was eluted with purification buffer containing a linear herein, includes any and all solvents, dispersion media, dilu 15 KCl gradient from 0.1 M to 1 M over five column volumes. ents, or other liquid vehicles, dispersion or Suspension aids, The eluted Cas9 was further purified by a HiLoad Superdex Surface active agents, isotonic agents, thickening or emulsi 200 column in purification buffer, snap-frozen in liquid nitro fying agents, preservatives, Solid binders, lubricants and the gen, and stored in aliquots at -80° C. like, as Suited to the particular dosage form desired. Reming In Vitro RNA Transcription. 100 pmol CLTA(ii) v2.1 fivd ton's The Science and Practice of Pharmacy, 21 Edition, A. and v2.1 template rev were incubated at 95°C. and cooled at R. Gennaro (Lippincott, Williams & Wilkins, Baltimore, 0.1° C./s to 37° C. in NEBuffer2 (50 mM sodium chloride, 10 Md., 2006; incorporated in its entirety herein by reference) mM Tris-HCl, 10 mM magnesium chloride, 1 mM dithio discloses various excipients used in formulating pharmaceu threitol, pH 7.9) supplemented with 10 uM dNTP mix (Bio tical compositions and known techniques for the preparation Rad). 10 U of Klenow Fragment (3'->5' exo) (NEB) were thereof. See also PCT application PCT/US2010/055131, 25 added to the reaction mixture and a double-stranded CLTA(#) incorporated in its entirety herein by reference, for additional v2.1 template was obtained by overlap extension for 1 h at 37° Suitable methods, reagents, excipients and solvents for pro C. 200 nMCLTA(ii) v2.1 template alone or 100 nMCLTA(ii) ducing pharmaceutical compositions comprising a nuclease. template with 100 nMT7 promoter oligo was incubated over Except insofar as any conventional excipient medium is night at 37° C. with 0.16U/uL of T7 RNA Polymerase (NEB) incompatible with a Substance or its derivatives, such as by 30 in NEB RNAPol Buffer (40 mM Tris-HCl, pH 7.9, 6 mM producing any undesirable biological effect or otherwise magnesium chloride, 10 mM dithiothreitol, 2 mM spermi interacting in a deleterious manner with any other component dine) supplemented with 1 mMrNTP mix (1 mMrATP, 1 mM (s) of the pharmaceutical composition, its use is contemplated rCTP, 1 mM rGTP, 1 mM rUTP). In vitro transcribed RNA to be within the scope of this disclosure. was precipitated with ethanol and purified by gel electro The function and advantage of these and other embodi 35 phoresis on a Criterion 10% polyacrylamide TBE-Urea gel ments of the present invention will be more fully understood (Bio-Rad). Gel-purified sgRNA was precipitated with etha from the Examples below. The following Examples are nol and redissolved in water. intended to illustrate the benefits of the present invention and In Vitro Library Construction. 10 pmol of CLTA(i) lib to describe particular embodiments, but are not intended to oligonucleotides were separately circularized by incubation exemplify the full scope of the invention. Accordingly, it will 40 with 100 units of CircIligase II ssDNA Ligase (Epicentre) in be understood that the Examples are not meant to limit the 1x CircIligase II Reaction Buffer (33 mM Tris-acetate, 66 Scope of the invention. mM potassium acetate, 0.5 mM dithiothreitol, pH 7.5) Supplemented with 2.5 mM manganese chloride in a total EXAMPLES reaction volume of 20 uL for 16 hours at 60°C. The reaction 45 mixture was incubated for 10 minutes at 85°C. to inactivate Materials and Methods the enzyme. 5 uL (5 pmol) of the crude circular single Oligonucleotides. All oligonucleotides used in this study stranded DNA were converted into the concatemeric pre were purchased from Integrated DNA Technologies. Oligo selection libraries with the illustra TempliPhi Amplification nucleotide sequences are listed in Table 9. Kit (GE Healthcare) according to the manufacturer's proto Expression and Purification of S. pyogenes Cas9. E. coli 50 col. Concatemeric pre-selection libraries were quantified Rosetta (DE3) cells were transformed with plasmid with the Quant-it PicoGreen dsDNA Assay Kit (Invitrogen). pMJ806', encoding the S. pyogenes cas9 gene fused to an In Vitro Cleavage of on-Target and Off-Target Substrates. N-terminal 6x His-tag/maltose binding protein. The resulting Plasmid templates for PCR were constructed by ligation of expression strain was inoculated in Luria-Bertani (LB) broth annealed oligonucleotides CLTA(#) sitefwd/rev into HindIII/ containing 100 ug/mL of amplicillin and 30 ug/mL of 55 Xbal double-digested puC19 (NEB). On-target substrate chloramphenicol at 37° C. overnight. The cells were diluted DNAs were generated by PCR with the plasmid templates 1:100 into the same growth medium and grown at 37°C. to and test fivd and test rev primers, then purified with the ODoo-0.6. The culture was incubated at 18°C. for 30 min, QIAquick PCR Purification Kit (Qiagen). Off-target sub and isopropyl B-D-1-thiogalactopyranoside (IPTG) was strate DNAs were generated by primer extension. 100 pmol added at 0.2 mM to induce Cas9 expression. After-17 h, the 60 off-target (ii) fwd and off-target (ii) rev primers were incu cells were collected by centrifugation at 8,000 g and resus bated at 95°C. and cooled at 0.1° C./s to 37°C. in NEBuffer2 pended in lysis buffer (20 mM tris(hydroxymethyl)-ami (50 mM sodium chloride, 10 mM Tris-HCl, 10 mM magne nomethane (Tris)-HCl, pH 8.0, 1 MKC1, 20% glycerol, 1 mM sium chloride, 1 mM dithiothreitol, pH 7.9) supplemented tris(2-carboxyethyl)phosphine (TCEP)). The cells were lysed with 10 uM dNTP mix (Bio-Rad). 10 U of Klenow Fragment by sonication (10 sec pulse-on and 30 sec pulse-off for 10 min 65 (3'->5' exo-) (NEB) were added to the reaction mixture and total at 6 W output) and the soluble lysate was obtained by double-stranded off-target templates were obtained by over centrifugation at 20,000 g for 30 min. The cell lysate was lap extension for 1 h at 37°C. followed by enzyme inactiva US 9,163,284 B2 39 40 tion for 20 min at 75° C., then purified with the QIAquick ~70%, based on the fraction of GFP-positive cells observed PCR Purification Kit (Qiagen). 200 nM substrate DNAs were by fluorescence microscopy. 48 h after transfection, cells incubated with 100 nM Cas9 and 100 nM (v1.0 or v2.1) were washed with phosphate buffered saline (PBS), pelleted sgRNA or 1000 nM Cas9 and 1000 nM (v1.0 or v2.1) sgRNA and frozen at -80° C. Genomic DNA was isolated from 200 in Cas9 cleavage buffer (200 mM HEPES, pH 7.5, 1.5 M uL cell lysate using the DNeasy Blood and Tissue Kit potassium chloride, 100 mM magnesium chloride, 1 mM (Qiagen) according to the manufacturer's protocol. EDTA, 5 mM dithiothreitol) for 10 min at 37° C. On-target cleavage reactions were purified with the QIAquick PCR Off-Target Site Sequence Determination. 100 ng genomic Purification Kit (Qiagen), and off-target cleavage reactions DNA isolated from cells treated with Cas9 expression plas were purified with the QIAquick Nucleotide Removal Kit 10 mid and single-strand RNA expression plasmid (treated cells) (Qiagen) before electrophoresis in a Criterion 5% polyacry or Cas9 expression plasmid alone (control cells) were ampli lamide TBE gel (Bio-Rad). fied by PCR with 10 s 72° C. extension for 35 cycles with In Vitro Selection. 200 nM concatemeric pre-selection primers CLTA(#)-(#)-(ii) fwd and CLTA(#)-(#)-(ii) rev and libraries were incubated with 100 nM Cas 9 and 100 nM Phusion Hot Start Flex DNA Polymerase (NEB) in Buffer GC sgRNA or 1000 nM Cas9 and 1000 nM sgRNA in Cas9 15 (NEB), supplemented with 3% DMSO. Relative amounts of cleavage buffer (200 mM HEPES, pH 7.5, 1.5 M potassium crude PCR products were quantified by gel, and Cas9-treated chloride, 100 mM magnesium chloride, 1 mM EDTA, 5 mM (control) and Cas9:sgRNA-treated PCRs were separately dithiothreitol) for 10 min at 37° C. Pre-selection libraries pooled in equimolar concentrations before purification with were also separately incubated with 2U of BspMI restriction the QIAquick PCR Purification Kit (Qiagen). Purified DNA endonuclease (NEB) in NEBuffer 3 (100 mM NaCl, 50 mM was amplified by PCR with primers PE1-barcodei and PE2 Tris-HCl, 10 mM MgCl2, 1 mM dithiothreitol, pH 7.9) for 1 barcodeii for 7 cycles with Phusion Hot Start Flex DNA h at 37° C. Blunt-ended post-selection library members or Polymerase (NEB) in Buffer HF (NEB). Amplified control sticky-ended pre-selection library members were purified and treated DNA pools were purified with the QIAquick PCR with the QIAQuick PCR Purification Kit (Qiagen) and ligated Purification Kit (Qiagen), followed by purification with to 10 pmol adapter1/2(AACA) (Cas9:v2.1 sgRNA, 100 nM), 25 adapter1/2(TTCA) (Cas9:v2.1 sgRNA, 1000 nM), Agencourt AMPure XP (Beckman Coulter). Purified control adapter1/2 (Cas9:v2.1 sgRNA, 1000 nM), or lib adapter1/ and treated DNAs were quantified with the KAPA Library CLTA(ii) lib adapter 2 (pre-selection) with 1,000 U of T4 Quantification Kit-Illumina (KAPA Biosystems), pooled in a DNA Ligase (NEB) in NEBT4 DNA Ligase Reaction Buffer 1:1 ratio, and Subjected to paired-end sequencing on an Illu (50 mM Tris-HCl, pH 7.5, 10 mM magnesium chloride, 1 mM 30 mina MiSeq. ATP, 10 mM dithiothreitol) overnight (>10 h) at room tem Statistical Analysis. Statistical analysis was performed as perature. Adapter-ligated DNA was purified with the previously described. P-values in Table 1 and Table 6 were QIAquick PCR Purification Kit and PCR-amplified for 10-13 calculated for a one-sided Fisher exact test. cycles with Phusion Hot Start Flex DNA Polymerase (NEB) Algorithms in Buffer HF (NEB) and primers CLTA(ii)sel PCR/PE2 short 35 (post-selection) or CLTA(ii) lib seqPCR/lib fivd PCR (pre All scripts were written in C++. Algorithms used in this selection). Amplified DNAs were gel purified, quantified study are as previous reported (reference) with modification. with the KAPA Library Quantification Kit-Illumina (KAPA Sequence binning. 1) designate sequence pairs starting Biosystems), and Subjected to single-read sequencing on an with the barcode “AACA” or “TTCA' as post-selection Illumina MiSeq or Rapid Run single-read sequencing on an 40 library members. 2) for post-selection library members (with Illumina HiSeq 2500 (Harvard University FAS Center for illustrated example): Systems Biology Core facility, Cambridge, Mass.). Example Read: Selection Analysis. Pre-selection and post-selection sequencing data were analyzed as previously described', with modification (Algorithms) using scripts written in C++. 45 (SEQ ID NO: 42) Raw sequence data is not shown; see Table 2 for a curated AACACATGGGTCGACACAAACACAACTCGGCAGGTACTTGCAGATGTAGT summary. Specificity scores were calculated with the formu lae: positive specificity score (frequency of base pair at posi CTTTCCACATGGGTCGACACAAACACAACTCGGCAGGTATCTCGTATGCC tion post-selection-frequency of base pair at position pre i) search both paired reads for the positions, pos1 and pos2, of selection)/(1-frequency of base pair at position pre 50 the constant sequence “CTCGGCAGGT (SEQ ID selection) and negative specificity Score (frequency of base NO:43). ii) keep only sequences that have identical pair at positionpost-selection-frequency of base pair at sequences between the barcode and pos1 and preceding position pre-selection)/(frequency of base pair at position pos2.iii) keep the region between the two instances of the pre-selection). Normalization for sequence logos was per constant sequence (the region between the barcode and formed as previously described’. 55 pos1 contains a cut half-site; the region that is between the Cellular Cleavage Assays. HEK293T cells were split at a two instances of the constant sequence contains a full site) density of 0.8x10 per well (6-well plate) before transcription Example: and maintained in Dulbecco's modified eagle medium (DMEM) supplemented with 10% fetal bovine serum (FBS) in a 37° C. humidified incubator with 5% CO. After 1 day, 60 (SEQ ID NO: 44) cells were transiently transfected using Lipofectamine 2000 ACTTGCAGATGTAGTCTTTCCACATGGGTCGACACAAACACAA (Invitrogen) following the manufacturer's protocols. HEK293T cells were transfected at 70% confluency in each ii) search the sequence for a selection barcode (“ well of 6-well plate with 1.0 g of the Cas9 expression plas TGTGTTTGTGTT (SEQ ID NO:45) for CLTA1, “ mid (Cas9-HA-2xNLS-GFP-NLS) and 2.5 lug of the single 65 AGAAGAAGAAGA (SEQ ID NO:46) for CLTA2, “ strand RNA expression plasmid pSiliencer-CLTA (version TTCTCTTTCTCT (SEQ ID NO:47) for CLTA3, “ 1.0 or 2.1). The transfection efficiencies were estimated to be ACACAAACACAA (SEQID NO:48) for CLTA4) US 9,163,284 B2 41 42 Example: 2) search all sequence reads for the flanking sequences to identify the potential off-target site (the sequence between the flanking sequences) (SEQ ID NO: 49) ACTTGCAGATGTAGTCTTTCCACATGGGTCGACACAAACACAA Example Potential Off-target Site:

CLTA4 (SEO ID NO : 55) iii) the sequence before the barcode is the full post-selection CCCTGGAAACACACATCGC library member (first four and last four nucleotides are 3) if the potential off-target site contains indels (length is less fully randomized flanking sequence) 10 than 23), keep sequence as potential off-target site if all Example: corresponding FASTQ quality characters for the sequence are 2 or higher in ASCII code (Phred quality score >=30) Example Potential Off-target Site Length=22 (SEQ ID NO: 15 example corresponding FASTQ quality characters: ACTT GCAGATGTAGTCTTTCCACATGG GTCG HHGHHHHHHHHHHHHHHHHHHH 4) bin and manually inspect all sequences that pass steps 2 and iv) parse the quality scores for the positions corresponding to 3 and keep sequences as potential modified sequences if the 23 nucleotide post-selection library member they have at least one deletion involving position 16, 17, or Example Read: 18 (of 20 counting from the non-PAM end) of if they have an insertion between position 17 and 18, consistent with the most frequent modifications observed for the intended (SEQ ID NO: 51) target site (FIG. 3) AACACATGGGTCGACACAAACACAACTCGGCAGGTACTTGCAGATGTAGT Example Potential Off-target Site (Reverse Complement, CTTTCCACATGGGTCGACACAAACACAACTCGGCAGGTATCTCGTATGCC 25 with Positions Labeled) with Reference Sequence:

CCCFFFFFHHHHH.JJJJJJJJJJJJJJJJJJJJJGIJJJJIJIJJJIIIH 11111111112222 IIJJJHHHGHAEFCDDDDDDDDDDDDDDDDDDDDDDDPCDDEDD non-PAM end 123456789 O123456789 O123 PAM end aDCCCD 30 GCAGATGTAGTGTTTC-ACAGGG (SEO ID NO. 56) V) keep sequences only if the corresponding quality score GCAGATGTAGTGTTTCCACAGGG (SEO ID NO : 57) string (underlined) FASTQ quality characters for the 4) repeat steps 1-3 for read2 and keep only if the sequence is sequence are ?' or higher in ASCII code (Phred quality the same score >=30) 35 5) compare overall counts in Cas9+sgRNA treated sample to NHEJ Sequence Calling Cas9 alone sample to identify modified sites Filter Based on Cleavage Site (for Post-selection Example Read: Sequences) 1) tabulate the cleavage site locations across the recogni (SEQ ID NO: 52) 40 tion site by identifying the first position in the full CAATCTCCCGCATGCGCTCAGTCCTCATCTCCCTCAAGCAGGCCCCGCTG sequenced recognition site (between the two constant sequences) that is identical to the first position in the GTGCACTGAAGAGCCACCCTGTGAAACACACACGCAATATCTTAATC sequencing read after the barcode (before the first con CTACTCAGTGAAGCTCTTCACAGTCATTGGATTAATTATGTTGAGTTCTT stant sequence). 45 2) after tabulation, repeat step 1, keeping only sequences TTGGACCAAACC with cleavage site locations that are presentinat least 5% of the sequencing reads. Example Quality Scores: Results Broad Off-target DNA Cleavage Profiling Reveals RNA-pro 50 grammed Cas9 Nuclease Specificity. CCBCCFFFFCCCGGGGGGGGGGHHHHHHHHHHHHHHHHHHHHGGGGGGGG Sequence-specific endonucleases including zinc-finger GHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHH nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs) have become important tools to modify HHHHHHHHFHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH genes in induced pluripotent stem cells (iPSCs),' in multi 55 cellular organisms." and in ex vivo gene therapy clinical HHHGHFHHHHHF trials. 'Although ZFNs and TALENs have proved effective 1) identify the 20 base pairs flanking both sides of 20 base pair for such genetic manipulation, a new ZFN or TALEN protein target site--three base pair PAM for each target site must be generated for each DNA target site. In contrast, the Example Flanking Sequences: RNA-guided Cas9 endonuclease uses RNA:DNA hybridiza 60 tion to determine target DNA cleavage sites, enabling a single monomeric protein to cleave, in principle, any sequence (SEQ ID NO: 53) specified by the guide RNA.'' GCTGGTGCACTGAAGAGCCA Previous studies'''' demonstrated that Cas9 mediates genome editing at sites complementary to a 20-nucleotide (SEQ ID NO: 54) 65 sequence in a bound guide RNA. In addition, target sites must AATATCTTAATCCTACT CAG include a protospacer adjacent motif (PAM) at the 3' end adjacent to the 20-nucleotide target site; for Streptococcus US 9,163,284 B2 43 44 pyogenes Cas9, the PAM sequence is NGG. Cas9-mediated mize the impact of errors during DNA amplification or DNA cleavage specificity both in vitro and in cells has been sequencing, only sequences with two identical copies of the inferred previously based on assays against Small collections repeated cut half-site were analyzed. of potential single-mutation off-target sites. These studies Pre-selection libraries were incubated under enzyme-lim Suggested that perfect complementarity between guide RNA iting conditions (200 nM target site library, 100 nM Cas9: and target DNA is required in the 7-12 base pairs adjacent to sgRNA v2.1) or enzyme-saturating conditions (200 nM target the PAM end of the target site (3' end of the guide RNA) and site library, 1000 nM Cas9:sgRNA v2.1) for each of the four mismatches are tolerated at the non-PAM end (5' end of the guide RNAs targets tested (CLTA1, CLTA2, CLTA3, and guide RNA). 12, 17-9 CLTA4) (FIGS. 3C and 3D). A second guide RNA construct, Although Such a limited number of nucleotides specifying 10 Cas9:guide RNA target recognition would predict multiple sgRNA v1.0, which is less active than sgRNA v2.1, was sites of DNA cleavage in genomes of moderate to large size assayed under enzyme-saturating conditions alone for each of (>~10 bp). Cas9:guide RNA complexes have been success the four guide RNA targets tested (200 nM target site library, fully used to modify both cells' ' ' and organisms." A 1000 nM Cas9:sgRNA v1.0). The two guide RNA constructs study using Cas9:guide RNA complexes to modify Zebrafish 15 differ in their length (FIG. 3) and in their DNA cleavage embryos observed toxicity at a rate similar to that of ZFNs activity level under the selection conditions, consistent with and TALENs.'" A recent, broad study of the specificity of previous reports' (FIG. 4). Both pre-selection and post-se DNA binding (transcriptional repression) in E. coli of a cata lection libraries were characterized by high-throughput DNA lytically inactive Cas9 mutant using high-throughput sequencing and computational analysis. As expected, library sequencing found no detectable off-target transcriptional members with fewer mutations were significantly enriched in repression in the relatively small E. coli transcriptome.' post-selection libraries relative to pre-selection libraries While these studies have substantially advanced our basic (FIG. 5). understanding of Cas9, a systematic and comprehensive pro Pre- and Post-Selection Library Composition. The pre file of Cas9:guide RNA-mediated DNA cleavage specificity selection libraries for CLTA1, CLTA2, CLTA3, and CLTA4 generated from measurements of Cas9 cleavage on a large 25 had observed mean mutation rates of 4.82 (n=1,129,593), number of related mutant target sites has not been described. 5.06 (n=847,618), 4.66 (n=692,997), and 5.00 (n=951,503) Such aspecificity profile is needed to understand and improve mutations per 22-base pair target site, including the two-base the potential of Cas9:guide RNA complexes as research tools pair PAM, respectively. The post-selection libraries treated and future therapeutic agents. under enzyme-limiting conditions with Cas9 plus CLTA1, We modified our previously published in vitro selection, 30 adapted to process the blunt-ended cleavage products pro CLTA2, CLTA3, or CLTA4 v.2.1 sgRNAs contained means of duced by Cas9 compared to the overhang-containing prod 1.14 (n=1,206,268), 1.21 (n=668,312), 0.91 (n=1,138,568), ucts of ZFN cleavage, to determine the off-target DNA cleav and 1.82 (n=560,758) mutations per 22-base pair target site. age profiles of Cas9:single guide RNA (sgRNA)'' Under enzyme-excess conditions, the mean number of muta complexes. Each selection experiment used DNA substrate 35 tions among sequences Surviving selection increased to 1.61 libraries containing ~10" sequences, a size sufficiently large (n=640,391), 1.86 (n=399,560), 1.46 (n=936,414), and 2.24 to include ten-fold coverage of all sequences with eight or (n=506,179) mutations per 22-base pair target site, respec fewer mutations relative to each 22-base pair target sequence tively, for CLTA1, CLTA2, CLTA3, or CLTA4 v2.1 sgRNAs. (including the two-base pair PAM) (FIG.1). We used partially These results reveal that the selection significantly enriched randomized nucleotide mixtures at all 22 target-site base 40 library members with fewer mutations for all Cas9:sgRNA pairs to create a binomially distributed library of mutant complexes tested, and that enzyme-excess conditions target sites with an expected mean of 4.62 mutations per resulted in the putative cleavage of more highly mutated target site. In addition, target site library members were library members compared with enzyme-limiting conditions flanked by four fully randomized base pairs on each side to (FIG. 5). test for specificity patterns beyond those imposed by the 45 We calculated specificity scores to quantify the enrichment canonical 20-base pair target site and PAM. level of each base pair at each position in the post-selection Pre-selection libraries of 10' individual potential off-tar library relative to the pre-selection library, normalized to the get sites were generated for each of four different target maximum possible enrichment of that base pair. Positive sequences in the human clathrin light chain A (CLTA) gene specificity Scores indicate base pairs that were enriched in the (FIG. 3). Synthetic 5'-phosphorylated 53-base oligonucle 50 post-selection library and negative specificity Scores indicate otides were self-ligated into circular single-stranded DNA in base pairs that were de-enriched in the post-selection library. vitro, then converted into concatemeric 53-base pair repeats For example, a score of +0.5 indicates that a base pair is through rolling-circle amplification. The resulting pre-selec enriched to 50% of the maximum enrichment value, while a tion libraries were incubated with their corresponding Cas9: score of -0.5 indicates that a base pair is de-enriched to 50% sgRNA complexes. Cleaved library members containing free 55 of the maximum de-enrichment value. 5' phosphates were separated from intact library members In addition to the two base pairs specified by the PAM, all through the 5' phosphate-dependent ligation of non-phospho 20 base pairs targeted by the guide RNA were enriched in the rylated double-stranded sequencing adapters. The ligation sequences from the CLTA 1 and CLTA2 selections (FIG. 2, tagged post-selection libraries were amplified by PCR. The FIGS. 6 and 9, and Table 2). For the CLTA3 and CLTA4 PCR step generated a mixture of post-selection DNA frag 60 selections (FIGS. 7 and 8, and Table 2), guide RNA-specified ments containing 0.5, 1.5, or 2.5, etc. repeats of library mem base pairs were enriched at all positions except for the two bers cleaved by Cas9, resulting from amplification of an most distal base pairs from the PAM (5' end of the guide adapter-ligated cut half-site with or without one or more RNA), respectively. At these non-specified positions farthest adjacent corresponding full sites (FIG. 1). Post-selection from the PAM, at least two of the three alternate base pairs library members with 1.5 target-sequence repeats were iso 65 were nearly as enriched as the specified base pair. Our finding lated by gel purification and analyzed by high-throughput that the entire 20 base-pair target site and two base pair PAM sequencing. In a final computational selection step to mini can contribute to Cas9:sgRNADNA cleavage specificity con US 9,163,284 B2 45 46 trasts with the results from previous single-substrate assays each target site, normalized to the same Sum for the most Suggesting that only 7-12 base pairs and two base pair PAM highly specified position (FIG. 18-20). Under both enzyme are specified.' ' ' ' limiting and enzyme-excess conditions, the PAM end of the All single-mutant pre-selection (ne 14,569) and post-selec target site is highly specified. Under enzyme-limiting condi tion library members (nd 103,660) were computationally ana tions, the PAM end of the molecule is almost absolutely lyzed to provide a selection enrichment value for every pos specified (specificity score a+0.9 for guide RNA-specified sible single-mutant sequence. The results of this analysis base pairs) by CLTA1, CTLA2, and CLTA3 guide RNAs (FIG. 2 and FIGS. 6 and 8) show that when only single (FIG. 2 and FIG. 6-9), and highly specified by CLTA4 guide mutant sequences are considered, the six to eight base pairs RNA (specificity score of +0.7 to +0.9). Within this region of closest to the PAM are generally highly specified and the 10 high specificity, specific single mutations, consistent with non-PAM end is poorly specified under enzyme-limiting con wobble pairing between the guide RNA and target DNA, that ditions, consistent with previous findings.' ' '7' Under are tolerated. For example, under enzyme-limiting conditions enzyme-saturating conditions, however, single mutations for single-mutant sequences, a dA:dT off-target base pair and even in the six to eight base pairs most proximal to the PAM a guide RNA-specified dO:dC base pair are equally tolerated are tolerated, Suggesting that the high specificity at the PAM 15 at position 17 out of 20 (relative to the non-PAM end of the end of the DNA target site can be compromised when enzyme target site) of the CLTA3 target site. At this position, anrG:dT concentrations are high relative to substrate (FIG. 2). The wobble RNA:DNA base pair may be formed, with minimal observation of high specificity against single mutations close apparent loss of cleavage activity. to the PAM only applies to sequences with a single mutation Importantly, the selection results also reveal that the choice and the selection results do not Support a model in which any of guide RNA hairpin affects specificity. The shorter, less combination of mutations is tolerated in the region of the active sgRNA v1.0 constructs are more specific than the target site farthest from the PAM (FIG. 10-15). Analyses of longer, more-active sgRNA v2.1 constructs when assayed pre- and post-selection library composition are described under identical, enzyme-saturating conditions that reflect an elsewhere herein, position-dependent specificity patterns are excess of enzyme relative to Substrate in a cellular context illustrated in FIGS. 18-20, PAM nucleotide specificity is 25 (FIG.2 and FIGS. 5-8). The higher specificity of sgRNA v1.0 illustrated in FIGS. 21-24, and more detailed effects of Cas9: oversgRNA v2.1 is greater for CLTA1 and CLTA2 (-40-90% sgRNA concentration on specificity are described in FIG. 2G difference) than for CLTA3 and CLTA4 (<40% difference). and FIG.25). Interestingly, this specificity difference is localized to differ Specificity at the Non-PAM End of the Target Site. To ent regions of the target site for each target sequence (FIGS. assess the ability of Cas9:v2.1 sgRNA under enzyme-excess 30 2H and 26). Collectively, these results indicate that different conditions to tolerate multiple mutations distal to the PAM, guide RNA architectures result in different DNA cleavage we calculated maximum specificity scores at each position specificities, and that guide RNA-dependent changes in for sequences that contained mutations only in the region of specificity do not affect all positions in the target site equally. one to 12 base pairs at the end of the target site most distal Given the inverse relationship between Cas9:sgRNA concen from the PAM (FIG. 10-17). 35 tration and specificity described above, we speculate that the The results of this analysis show no selection (maximum differences in specificity between guide RNA architectures specificity Score ~0) against sequences with up to three muta arises from differences in their overall level of DNA-cleavage tions, depending on the target site, at the end of the molecule activities. farthest from the PAM when the rest of the sequence contains Effects of Cas9:sgRNA Concentration on DNA Cleavage no mutations. For example, when only the three base pairs 40 Specificity. To assess the effect of enzyme concentration on farthest from the PAM are allowed to vary (indicated by dark patterns of specificity for the four target sites tested, we cal bars in FIG. 11C) in the CLTA2 target site, the maximum culated the concentration-dependent difference in positional specificity scores at each of the three variable positions are specificity and compared it to the maximal possible change in close to Zero, indicating that there was no selection for any of positional specificity (FIG. 25). In general, specificity was the four possible base pairs at each of the three variable 45 higher under enzyme-limiting conditions than enzyme-ex positions. However, when the eight base pairs farthest from cess conditions. A change from enzyme-excess to enzyme the PAM are allowed to vary (FIG. 11H), the maximum limiting conditions generally increased the specificity at the specificity scores at positions 4-8 are all greater than +0.4. PAM end of the target by >80% of the maximum possible indicating that the Cas9:SgRNA has a sequence preference at change in specificity. Although a decrease in enzyme concen these positions even when the rest of the substrate contains 50 tration generally induces Small (~30%) increases in specific preferred, on-target base pairs. ity at the end of the target sites farthest from the PAM, con We also calculated the distribution of mutations (FIG. centration decreases induce much larger increases in 15-17), in both pre-selection and v2.1 sgRNA-treated post specificity at the end of the target site nearest the PAM. For selection libraries under enzyme-excess conditions, when CLTA4, a decrease in enzyme concentration is accompanied only the first 1-12 base pairs of the target site are allowed to 55 by a small (-30%) decrease in specificity at some base pairs vary. There is significant overlap between the pre-selection near the end of the target site farthest from the PAM. and post-selection libraries for only a subset of the data (FIG. Specificity of PAM Nucleotides. To assess the contribution 15-17. a-c), demonstrating minimal to no selection in the of the PAM to specificity, we calculated the abundance of all post-selection library for sequences with mutations only in 16 possible PAM dinucleotides in the pre-selection and post the first three base pairs of the target site. These results col 60 selection libraries, considering all observed post-selection lectively show that Cas9:sgRNA can tolerate a small number target site sequences (FIG. 21) or considering only post of mutations (-one to three) at the end of the sequence farthest selection target site sequences that contained no mutations in from the PAM when provided with maximal sgRNA:DNA the 20 base pairs specified by the guide RNA (FIG. 22). interactions in the rest of the target site. Considering all observed post-selection target site sequences, Specificity at the PAM End of the Target Site. We plotted 65 under enzyme-limiting conditions, GG dinucleotides repre positional specificity as the Sum of the magnitudes of the sented 99.8%, 99.9%, 99.8%, and 98.5% of the post-selection specificity scores for all four base pairs at each position of PAM dinucleotides for selections with CLTA1, CLTA2, US 9,163,284 B2 47 48 CLTA3, and CLTA4 v2.1 sgRNAs, respectively. In contrast, CLTA4-3-1, CLTA4-3-3, and CLTA4-4-8) (Tables 1 and 6-8). under enzyme-excess conditions, GG dinucleotides repre The CLTA4 target site was modified by Cas9:CLTA4 v2.1 sented 97.7%, 98.3%,95.7%, and 87.0% of the post-selection sgRNA at a frequency of 76%, while off-target sites, CLTA4 PAM dinucleotides for selections with CLTA1, CLTA2, 3-1 CLTA4-3-3, and CLTA4-4-8, were modified at frequen CLTA3, and CLTA4 v2.1 sgRNAs, respectively. These data cies of 24%, 0.47% and 0.73%, respectively. The CLTA1 demonstrate that an increase in enzyme concentration leads to target site was modified by Cas9:CLTA 1 v2.1 sgRNA at a increased cleavage of Substrates containing non-canonical frequency of 0.34%, while off-target sites, CLTA 1-1-1 and PAM dinucleotides. CLTA 1-2-2, were modified at frequencies of 0.09% and To account for the pre-selection library distribution of 0.16%, respectively. PAM dinucleotides, we calculated specificity scores for the 10 Under enzyme-saturating conditions with the v2.1 sgRNA, PAM dinucleotides (FIG. 23). When only on-target post the two verified CLTA 1 off-target sites, CLTA 1-1-1 and selection sequences are considered under enzyme-excess CLTA 1-2-2, were two of the three most highly enriched conditions (FIG. 24), non-canonical PAM dinucleotides with sequences identified in the in vitro selection. CLTA4-3-1 and a single G rather than two Gs are relatively tolerated. Under CLTA4-3-3 were the highest and third-highest enriched enzyme-excess conditions, Cas9:CLTA4sgRNA 2.1 exhib 15 sequences of the seven CLTA4 three-mutation sequences ited the highest tolerance of non-canonical PAM dinucle enriched in the in vitro selection that are also present in the otides of all the Cas9:sgRNA combinations tested. AG and genome. The in vitro selection enrichment values of the four GA dinucleotides were the most tolerated, followed by GT, mutation sequences were not calculated, since 12 out of the TG, and CG PAM dinucleotides. In selections with Cas9: 14CLTA4 sequences in the genome containing four muta CLTA1, 2, or 3 sgRNA 2.1 under enzyme-excess conditions, tions, including CLTA4-4-8, were observed at a level of only AG was the predominate non-canonical PAM (FIGS. 23 and one sequence count in the post-selection library. Taken 24). Our results are consistent with another recent study of together, these results confirm that several of the off-target PAM specificity, which shows that Cas9:sgRNA can recog substrates identified in the in vitro selection that are present in nize AG PAM dinucleotides. In addition, our results show the human genome are indeed cleaved by Cas9:sgRNA com that under enzyme-limiting conditions, GG PAM dinucle 25 plexes in human cells, and also suggest that the most highly otides are highly specified, and under enzyme-excess condi enriched genomic off-target sequences in the selection are tions, non-canonical PAM dinucleotides containing a single modified in cells to the greatest extent. G can be tolerated, depending on the guide RNA context. The off-target sites we identified in cells were among the To confirm that the in vitro selection results accurately most-highly enriched in our in vitro selection and contain up reflect the cleavage behavior of Cas9 in vitro, we performed 30 to four mutations relative to the intended target sites. While it discrete cleavage assays of six CLTA4 off-target Substrates is possible that heterochromatin or covalent DNA modifica containing one to three mutations in the target site. We cal tions could diminish the ability of a Cas9:guide RNA com culated enrichment values for all sequences in the post-selec plex to access genomic off-target sites in cells, the identifica tion libraries for the Cas9:CLTA4 v2.1 sgRNA under tion of five out of 49 tested cellular off-target sites in this enzyme-saturating conditions by dividing the abundance of 35 study, rather than Zero or many, strongly suggests that Cas9 each sequence in the post-selection library by the calculated mediated DNA cleavage is not limited to specific targeting of abundance in the pre-selection library. Under enzyme-satu only a 7-12-base pair target sequence, as Suggested in recent rating conditions, the single one, two, and three mutation studies.' ' ' sequences with the highest enrichment values (27.5, 43.9, and The cellular genome modification data are also consistent 95.9) were cleaved to >71% completion (FIG. 27). A two 40 with the increase in specificity of sgRNA v1.0 compared to mutation sequence with an enrichment value of 1.0 was sgRNA v2.1 sgRNAs observed in the in vitro selection data cleaved to 35%, and a two-mutation sequence with an enrich and discrete assays. Although the CLTA 1-2-2, CLTA 4-3-3, ment value near Zero (0.064) was not cleaved. The three and CLTA 4-4-8 sites were modified by the Cas9-sgRNA v2.1 mutation sequence, which was cleaved to 77% by CLTA4 complexes, no evidence of modification at any of these three v2.1 sgRNA, was cleaved to a lower efficiency of 53% by 45 sites was detected in Cas9:sgRNA v1.0-treated cells. The CLTA4 v1.0 sgRNA (FIG. 28). These results indicate that the CLTA4-3-1 site, which was modified at 32% of the frequency selection enrichment values of individual sequences are pre of on-target CLTA4 site modification in Cas9:v2.1 sgRNA dictive of in vitro cleavage efficiencies. treated cells, was modified at only 0.5% of the on-target To determine if results of the in vitro selection and in vitro modification frequency in V1.0 sgRNA-treated cells, repre cleavage assays pertain to Cas9:guide RNA activity in human 50 senting a 62-fold change in selectivity. Taken together, these cells, we identified 51 off-target sites (19 for CLTA 1 and 32 results demonstrate that guide RNA architecture can have a for CLTA4) containing up to eight mutations that were both significant influence on Cas9 specificity in cells. Our speci enriched in the in vitro selection and present in the human ficity profiling findings present an important caveat to recent genome (Tables 3-5). We expressed Cas9:CLTA 1 sgRNA and ongoing efforts to improve the overall DNA modification v1.0, Cas9:CLTA1 sgRNA v2.1, Cas9:CLTA4sgRNA v1.0, 55 activity of Cas9:guide RNA complexes through guide RNA Cas9:CLTA4 sgRNA v2.1, or Cas9 without sgRNA in engineering.' ' HEK293T cells by transient transfection and used genomic Overall, the off-target DNA cleavage profiling of Cas9 and PCR and high-throughput DNA sequencing to look for evi Subsequent analyses show that (i) Cas9:guide RNA recogni dence of Cas9:sgRNA modification at 46 of the 51 off-target tion extends to 18-20 specified target site base pairs and a sites as well as at the on-target loci; no specific amplified 60 two-base pair PAM for the four target sites tested; (ii) increas DNA was obtained for five of the 51 predicted off-target sites ing Cas9:guide RNA concentrations can decrease DNA (three for CLTA1 and two for CLTA4). cleaving specificity in vitro; (iii) using more active sgRNA Deep sequencing of genomic DNA isolated from architectures can increase DNA-cleavage specificity both in HEK293T cells treated with Cas9:CLTA1 sgRNA or Cas9: vitro and in cells but impair DNA-cleavage specificity both in CLTA4sgRNA identified sequences evident of non-homolo 65 vitro and in cells; and (iv) as predicted by our in vitro results, gous end-joining (NHEJ) at the on-target sites and at five of Cas9:guide RNA can modify off-target sites in cells with up the 49 tested off-target sites (CLTA 1-1-1, CLTA 1-2-2, to four mutations relative to the on-target site. Our findings US 9,163,284 B2 49 50 provide key insights to our understanding of RNA-pro unit of the lower limit of the range, unless the context clearly grammed Cas9 specificity, and reveal a previously unknown dictates otherwise. It is also to be understood that unless role for sgRNA architecture in DNA-cleavage specificity. The otherwise indicated or otherwise evident from the context principles revealed in this study may also apply to Cas9-based and/or the understanding of one of ordinary skill in the art, effectors engineered to mediate functions beyond DNA 5 values expressed as ranges can assume any Subrange within cleavage. the given range, wherein the endpoints of the Subrange are expressed to the same degree of accuracy as the tenth of the Equivalents and Scope unit of the lower limit of the range. In addition, it is to be understood that any particular Those skilled in the art will recognize, or be able to ascer 10 embodiment of the present invention may be explicitly tain using no more than routine experimentation, many excluded from any one or more of the claims. Where ranges equivalents to the specific embodiments of the invention are given, any value within the range may explicitly be described herein. The scope of the present invention is not excluded from any one or more of the claims. Any embodi intended to be limited to the above description, but rather is as ment, element, feature, application, or aspect of the compo set forth in the appended claims. 15 sitions and/or methods of the invention, can be excluded from In the claims articles such as “a,” “an, and “the may mean any one or more claims. For purposes of brevity, all of the one or more than one unless indicated to the contrary or embodiments in which one or more elements, features, pur otherwise evident from the context. Claims or descriptions poses, or aspects is excluded are not set forth explicitly that include 'or' between one or more members of a group are herein. considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to Tables a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes Table 1. Cellular modification induced by Cas9:CLTA4 embodiments in which exactly one member of the group is sgRNA. present in, employed in, or otherwise relevant to a given 25 33 human genomic DNA sequences were identified that product or process. The invention also includes embodiments were enriched in the Cas9:CLTA4 v2.1 sgRNA in vitro selec in which more than one, or all of the group members are tions under enzyme-limiting or enzyme-saturating condi present in, employed in, or otherwise relevant to a given tions. Sites shown with underline contain insertions or dele product or process. tions (indels) that are consistent with significant Cas9: Furthermore, it is to be understood that the invention 30 sgRNA-mediated modification in HEK293T cells. In vitro encompasses all variations, combinations, and permutations enrichment values for selections with Cas9:CLTA4 v1.0 in which one or more limitations, elements, clauses, descrip sgRNA or Cas9:CLTA4 v2.1 sgRNA are shown for sequences tive terms, etc., from one or more of the claims or from with three or fewer mutations. Enrichment values were not relevant portions of the description is introduced into another calculated for sequences with four or more mutations due to claim. For example, any claim that is dependent on another 35 low numbers of in vitro selection sequence counts. Modifi claim can be modified to include one or more limitations cation frequencies (number of sequences with indels divided found in any other claim that is dependent on the same base by total number of sequences) in HEK293T cells treated with claim. Furthermore, where the claims recite a composition, it Cas9 without sgRNA (“no sgRNA), Cas9 with CLTA4 v1.0 is to be understood that methods of using the composition for sgRNA, or Cas9 with CLTA4 v2.1 sgRNA. P-values are listed any of the purposes disclosed herein are included, and meth 40 for those sites that show significant modification in v1.0 ods of making the composition according to any of the meth sgRNA- or v2.1 sgRNA-treated cells compared to cells ods of making disclosed herein or other methods known in the treated with Cas9 without sgRNA. “Not tested (n.t.)” indi art are included, unless otherwise indicated or unless it would cates that PCR of the genomic sequence failed to provide be evident to one of ordinary skill in the art that a contradic specific amplification products. tion or inconsistency would arise. 45 Table 2: Raw selection sequence counts. Positions -4 to -1 Where elements are presented as lists, e.g., in Markush are the four nucleotides preceding the 20-base pair target site. group format, it is to be understood that each Subgroup of the PAM1, PAM2, and PAM3 are the PAM positions immediately elements is also disclosed, and any element(s) can be following the target site. Positions+4 to +7 are the four nucle removed from the group. It is also noted that the term “com otides immediately following the PAM. prising is intended to be open and permits the inclusion of 50 Table 3: CITA1 genomic off-target sequences. 20 human additional elements or steps. It should be understood that, in genomic DNA sequences were identified that were enriched general, where the invention, or aspects of the invention, in the Cas9:CLTA 1 v2.1 sgRNA in vitro selections under is/are referred to as comprising particular elements, features, enzyme-limiting or enzyme-excess conditions. “m” refers to steps, etc., certain embodiments of the invention or aspects of number of mutations from on-target sequence with mutations the invention consist, or consistessentially of Such elements, 55 shown in lower case. Sites shown with underline contain features, steps, etc. For purposes of simplicity those embodi insertions or deletions (indels) that are consistent with sig ments have not been specifically set forth in haec verba nificant Cas9:sgRNA-mediated modification in HEK293T herein. Thus for each embodiment of the invention that com cells. Human genome coordinates are shown for each site prises one or more elements, features, steps, etc., the inven (assembly GRCh37). CLTA 1-0-1 is present at two loci, and tion also provides embodiments that consist or consist essen 60 sequence counts were pooled from both loci. Sequence tially of those elements, features, steps, etc. counts are shown for amplified and sequenced DNA for each Where ranges are given, endpoints are included. Further site from HEK293T cells treated with Cas9 without sgRNA more, it is to be understood that unless otherwise indicated or (“no sgRNA), Cas9 with CLTA1 v1.0 sgRNA, or Cas9 with otherwise evident from the context and/or the understanding CLTA1 v2.1 sgRNA. of one of ordinary skill in the art, values that are expressed as 65 Table 4: CLTA4 genomic off-target sequences. 33 human ranges can assume any specific value within the stated ranges genomic DNA sequences were identified that were enriched in different embodiments of the invention, to the tenth of the in the Cas9:CLTA4 v2.1 sgRNA in vitro selections under US 9,163,284 B2 51 52 enzyme-limiting or enzyme-excess conditions. “m” refers to indicates that the site was not tested or PCR of the genomic number of mutations from on-target sequence with mutations sequence failed to provide specific amplification products. shown in lower case. Sites shown with underline contain Table 7: CLTA1 genomic off-target indel sequences. Inser tion and deletion-containing sequences from cells treated insertions or deletions (indels) that are consistent with sig with amplified and sequenced DNA for the on-target genomic nificant Cas9:sgRNA-mediated modification in HEK293T sequence (CLTA 1-0-1) and each modified off-target site from cells. Human genome coordinates are shown for each site HEK293T cells treated with Cas9 without sgRNA (“no (assembly GRCh37). Sequence counts are shown for ampli sgRNA), Cas9 with CLTA1 v1.0 sgRNA, or Cas9 with fied and sequenced DNA for each site from HEK293T cells CLTA 1 V2.1 sgRNA. “ref refers to the human genome ref treated with Cas9 without sgRNA (“no sgRNA), Cas9 with erence sequence for each site, and the modified sites are listed CLTA4 v1.0 sgRNA, or Cas9 with CLTA4 v2.1 sgRNA. 10 below. Mutations relative to the on-target genomic sequence Table 5: genomic coordinates of CLTA 1 and CLTA4 off are shown in lowercase letters. Insertions and deletions are target sites. 54 human genomic DNA sequences were identi shown in underlined bold letters or dashes, respectively. fied that were enriched in the Cas9:CLTA 1 v2.1 sgRNA and Modification percentages are shown for those conditions Cas9:CLTA4 v2.1 sgRNA in vitro selections under enzyme (v1.0 sgRNA or v2.1 sgRNA) that show statistically signifi 15 cant enrichment of modified sequences compared to the con limiting or enzyme-excess conditions. Human genome coor trol (no sgRNA). dinates are shown for each site (assembly GRCh37). Table 8: CLTA4 genomic off-target indel sequences. Inser Table 6: Cellular modification induced by Cas9:CLTA1 tion and deletion-containing sequences from cells treated sgRNA. 20 human genomic DNA sequences were identified with amplified and sequenced DNA for the on-target genomic that were enriched in the Cas9:CLTA 1 v2.1 sgRNA in vitro sequence (CLTA4-0-1) and each modified off-target site from selections under enzyme-limiting or enzyme-excess condi HEK293T cells treated with Cas9 without sgRNA (“no tions. Sites shown with underline contain insertions or dele sgRNA), Cas9 with CLTA4 v1.0 sgRNA, or Cas9 with tions (indels) that are consistent with significant Cas9: CLTA4 v2.1 sgRNA. “ref refers to the human genome ref sgRNA-mediated modification in HEK293T cells. In vitro erence sequence for each site, and the modified sites are listed enrichment values for selections with Cas9:CLTA1 v 1.0 25 below. Mutations relative to the on-target genomic sequence sgRNA or Cas9:CLTA 1 V2.1 sgRNA are shown for sequences are shown in lowercase letters. Insertions and deletions are with three or fewer mutations. Enrichment values were not shown in underlined bold letters or dashes, respectively. calculated for sequences with four or more mutations due to Modification percentages are shown for those conditions low numbers of in vitro selection sequence counts. Modifi (v1.0 sgRNA or v2.1 sgRNA) that show statistically signifi cation frequencies (number of sequences with indels divided 30 cant enrichment of modified sequences compared to the con by total number of sequences) in HEK293T cells treated with trol (no sgRNA). Cas9 without sgRNA (“no sgRNA), Cas9 with CLTA1 v1.0 Table 9: Oligonucleotides used in this study. All oligo sgRNA, or Cas9 with CLTA 1 v2.1 sgRNA. P-values of sites nucleotides were purchased from Integrated DNA Technolo that show significant modification in v1.0 sgRNA- or v2.1 gies. An asterisk (*) indicates that the preceding nucleotide sgRNA-treated cells compared to cells treated with Cas9 35 was incorporated as a hand mix of phosphoramidites consist without sgRNA were 1.1E-05 (v1.0) and 6.9E-55 (v2.1) for ing of 79 mol% of the phosphoramidite corresponding to the CLTA1-0-1, 2.6E-03 (v1.0) and 2.0E-10 (v2.1) for CLTA1 preceding nucleotide and 4 mol % of each of the other three 1-1, and 4.6E-08 (v2.1) for CLTA 1-2-2. P-values were calcu canonical phosphoramidites. “/SPhos/ denotes a 5" phos lated using a one-sided Fisher exact test. “Not tested (n.t.)” phate group installed during synthesis. TABLE 1.

in vitro of enrichment

Mutations sequence SEQ ID NO. gene W1. O W2.1

CLTA4 - O - 1 O GCAGATGTAGTGTTTCCACAGGG SEQ NO: 58 CLTA 20 7. E.

CLTA4 - 3 - 1 3 aCAtATGTAGTaTTTCCACAGGG SEQ NO : 59 le B 12 B

CLTA4 - 3-2 3 GCAtATGTAGTGTTTCCAaATGt SEO NO : 60 2.99 697

CLTA4-3-3 3 cCAGATGTAGTa TTCCCACAGGG SEQ NO: 61 CELF1 1 O9. a 92

CLTA4 - 3 - 4 3 GCAGtTtTAGTGTTTt CACAGGG SEO NO : 62 BCO738Of O.79 3.12

CLTA4 - 3 - 5 3 GCAGAgtTAGTGTTTCCACACaG SEQ NO : 63 MPPED2 O 1.22

CLTA4 - 3-6 3 GCAGATGgAGgGTTTtCACAGGG SEQ NO : 64 DCHS2 1.57 1.17

CLTA4 - 3 - 7 3 GgAaATtTAGTGTTTCCACAGGG SEQ NO : 65 O. 43 O. 42

CLTA4 - 4 - 1 4. aaAGAaGTAGTaTTTCCACATGG SEQ NO: 66

CLTA4 - 4-2 4. aaAGATGTAGTcaTTCCACAAGG SEQ NO : 67

CLTA4 - 4-3 4. aaAtATGTAGTcTTTCCACAGGG SEQ NO : 68

CLTA4 - 4 - 4 4. atAGATGTAGTGTTTCCAaAGGa SEQ NO : 69 NR1H4

CLTA4 - 4 - 5 4. cCAGAgGTAGTGcTcCCACAGGG SEQ NO : 70

US 9,163,284 B2 69 70 TABL

no sgRNA v1. O sgRNA v2.1 s gRNA

modified total modified total modified total m sequence sequences sequences sequences sequences sequences sequences

A1 - O - 1 OAGTCCTCATCTCCCTCAAGCAGG 2 58889 18 42 683 178 52845 (SEQ ID NO: 91)

A1-1-1 1AGTCCTCAaCTCCCTCAAGCAGG 1. 39.804 9 29 OOO 37 4 O588 (SEQ ID NO: 92)

A1-2-1 2 AGCCCTCATtTCCCTCAAGCAGG O 16276 O 15 O32 O 18277 (SEO ID NO: 93)

A1-2-2 2AcTCCTCATCCCCCTCAAGCCGG 3 21.267 1. 2OO42 33 22579 (SEQ ID NO: 94)

A1-2-3 2 AGTCaTCATCTCCCTCAAGCAGa O O O O O O (SEO ID NO: 95)

A1-3 - 1 3 COSTCCTCCTCTCCCoCAAGCAGG 2 539 O1 O 42.194 O 522O5 (SEQ ID NO: 96)

A1-3-2 3 to TCCTCitTCTCCCTCAAGCAGa O 14890 O 14231 O 15937 (SEO ID NO: 97)

CA1-4-1 4 AagctTCATCTCtCTCAAGCTGG O 495.79 2 31413 O 41234 (SEO ID NO: 98)

CA1-4-2 4AGTaCTCtTtTCCCTCAgQCTGG 2 3 OO13 1. 23 470 4. 26.999 (SEO ID NO: 99)

A1 - 4 - 3 4 AGTCtTaAatTCCCTCAAGCAGG 2 63792 O 52321 1. 73 OOf (SEQ ID NO: 1.OO)

CA1- 4- 4 4 AGTg CTCATCTaCCagAAGCTGG 1. 12585 O 11339 O 12O66 (SEQ ID NO: 101)

CA1-4-5 4 ccTCCTCATCTCCCTgcAGCAGG 4. 3 O 568 1. 23810 O 2787 O (SEQ ID NO: 102)

A1 - 4 - 6 4 ctaCatCATCTCCCTCAAGCTGG O 132OO 1. 12886 2 1284.3 (SEQ ID NO: 103)

CA1- 4-7 4 gCTCCTCATCTCCCTaAAaCAGa 1. 8697 3 8188 O 8783 (SEQ ID NO O4)

CA1- 4-8 4 to TCCTCATCdgCCTCAgQCAGG O 13169 O 88 OS 2 1283 O (SEQ ID NO O5)

CA1-5-1 5AGaCacCATCTCCCTtgAGCTGG O 46109 1. 325.15 2 35567 (SEQ ID NO: 106)

CA1-5-2 5AGgCaTCATCTaCaTCAAG-tTGG O 4128O O 28896 O 351.52 (SEO ID NO : 107)

A1-5-3 5AGTaaTCActTCCaTCAAGCCGG O O O O O O (SEQ ID NO: 108)

A1-5-4 5 to c(CTCAcCTCCCTaaAGCAGG 2 24169 5 17512 1. 23 483 (SEQ ID NO: 109)

A1-5-5 is to TCitTitATtTCCCTCtAGCTGG O 11527 O 10481 1. 11027 (SEQ ID NO: 110)

A1 - 6-1 6AGTCCTCATCTCCCTCAAGCAGG O 6537 O 565.4 O 6741 (SEQ ID NO: 111) US 9,163,284 B2 71 72 TABLE 4

no scRNA V1. O scRNA V2. 1 scRNA

modified total modified total modified total m sequence sequences sequences sequences sequences sequences sequences

A4 - O - 1 O GCAGATGTAGTGTTTCCACAGGG 6 29.191 2005 1864 O 14970 9661 (SEQ ID NO: 2)

A4 - 3 - 1 3 acAtATGTAGTaTTTCCACAGGG 2 341.65 ill 2 OO18 3874 6082 (SEQ ID NO: 3)

A4-3-2 3 GCAtATGTAGTGTTTCCAaATGt 3 17923 O 11688 2 388 O (SEQ ID NO: 4)

A4-3-3 3 cAGATGTAGTaTTcCCACAGGG O 16559 0. 12 OOf 52. 1082 (SEQ ID NO: 5)

A4 - 3 - 4 3 GCAGtTtTAGTGTTT't CACAGGG O 21722 O 12831 O 57.26 (SEQ ID NO: 6)

A4 - 3 - 5 3 GCAGAgtTAGTGTTTCCACACag 1. 21222 2 13555 3 6.425 (SEQ ID NO: 7)

A4 - 3 - 6 3 GCAGATGgAGgGTTTtcACAGGG 3 2O342 3 12804 3 4O68 (SEQ ID NO: 8)

A4 - 3 - 7 3 GgAaATtTAGTGTTTCCACAGGG 2 38 894 3 24 O17 1. 29347 (SEQ ID NO: 9)

A4 - 4 - 1 4 a.a.AGAaGTAGTaTTTCCACATGG O O O O O O (SEQ ID NO: 12O)

A4 - 4-2 4 a.a.AGATGTAGTcaTTCCACAAGG 1. 27326 O 17365 1. 18941 (SEQ ID NO: 121)

A4 - 4-3 4 a.a.AtaTGTAGTcTTTCCACAGGG 2 46.232 3 32264 O 32638 (SEQ ID NO: 122)

A4 - 4 - 4 4 atAGATGTAGTGTTTCCAaAGGa. 9 27821 1. 16223 8 15388 (SEQ ID NO: 123)

A4 - 4 - 5 4 ccAGAgGTAGTGcTcCCACAGGG 1. 20979 1. 15674 1. 15086 (SEQ ID NO: 124)

A4 - 4 - 6 4 ccAGATGTgagGTTTCCACAAGG 4. 22 O21 O 15691 1. 14253 (SEQ ID NO: 125)

A4 - 4 - 7 4 ctacATGTAGTGTTTCCAtATGG 2 35942 O 23. Of 6 1. 11867 (SEQ ID NO: 126)

A4 - 4 - 8 4 ctAGATGaAGTGcTTCCACATGG 1. 1O 692 l 7609 59 8077 (SEO ID NO : 127) : :

A4 - 4 - 9 4 GaAaATGgAGTGTTTaCACATGG O 34 616 O 223 O2 1. 24 671 (SEQ ID NO: 128)

A4 - 4 - 10 4 GCAaATGaAGTGT caGCACAAGG 1. 25210 O 161.87 O 16974 (SEQ ID NO: 129)

A4 - 4 - 11 4 GCAaATGTAt TaTTTCCACtAGG O 34144 1. 2477O O 22547 (SEQ ID NO: 130)

CA4 - 4-12 4 GCAGATGTAGctTTTgtACATGG O 14254 O 961.6 O 99.94 (SEQ ID NO: 131)

A4 - 4 - 13 4 GCAGcTtaAGTGTTTt CACATGG 8 39 466 1. 7609 5 16525 (SEQ ID NO: 132)

A4 - 4 - 14 4ttAcATGTAGTGTTTaCACACGG O O O 223 O2 O O (SEQ ID NO: 133)

A4 - 5 - 5 GaAGAgGaAGTGTTTgCoCAGGG 1. 27 616 1. 16319 1. 1614 O (SEQ ID NO: 134)

A4 - 5-2 5 GaAGATGTgGagTTgaCACATGG 1. 22533 O 14292 O 15O13 (SEQ ID NO: 135)

A4 - 5-3 5 GCAGAagTAcTGTTgttACAAGG 1. 44.243 1. 29391 1. 29734 (SEQ ID NO: 136)