US009388430B2

(12) United States Patent
Liu et al.
(10) Patent No.: US 9,388,430 B2
(45) Date of Patent: Jul. 12, 2016

(54) CAS9-RECOMBINASE FUSION PROTEINS AND USES THEREOF

(71) Applicant: President and Fellows of Harvard College, Cambridge, MA (US)

(72) Inventors: David R. Liu, Lexington, MA (US); John Paul Guilinger, Ridgway, CO (US); David B. Thompson, Cambridge, MA (US)

(73) Assignee: President and Fellows of Harvard College, Cambridge, MA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 14/320,467

(22) Filed: Jun. 30, 2014

(65) Prior Publication Data
US 2015/0071898 A1 Mar. 12, 2015

Related U.S. Application Data

(60) Provisional application No. 61/980,315, filed on Apr. 16, 2014, provisional application No. 61/915,414, filed on Dec. 12, 2013, provisional application No. 61/874,609, filed on Sep. 6, 2013.

(51) Int. Cl.
C12N 9/22 (2006.01)
C12N 15/63 (2006.01)
C12N 15/90 (2006.01)
C12N 9/12 (2006.01)
A61K 47/00 (2006.01)
A61K 38/00 (2006.01)

(52) U.S. Cl.
CPC ...... C12N 15/907 (2013.01); C12N 9/1241 (2013.01); C12N 9/22 (2013.01); A61K 38/00 (2013.01); A61K 47/00 (2013.01); C07K 2319/80 (2013.01)

(58) Field of Classification Search
CPC ...... C12N 9/22; C12N 15/63
See application file for complete search history.

Primary Examiner: Maryam Monshipouri

(74) Attorney, Agent, or Firm: Wolf, Greenfield & Sacks, P.C.

(57) ABSTRACT

Some aspects of this disclosure provide compositions, methods, and kits for improving the specificity of RNA-programmable endonucleases, such as Cas9. Also provided are variants of Cas9, e.g., Cas9 dimers and fusion proteins, engineered to have improved specificity for cleaving nucleic acid targets. Also provided are compositions, methods, and kits for site-specific recombination, using Cas9 fusion proteins (e.g., nuclease-inactivated Cas9 fused to a recombinase catalytic domain). Such Cas9 variants are useful in clinical and research settings involving site-specific modification of DNA, for example, genomic modifications.

7 Claims, 35 Drawing Sheets

Mar.- Branden and Tooze. Introduction to Protein Structure. 1999; 2nd Apr. 2002:4(2): 195-204. edition. Garland Science Publisher: 3-12. Wacey et al., Disentangling the perturbational effects of Cameron, Recent advances in transgenic technology. Mol substitutions in the DNA-binding domain of p53. Hum Genet. Jan. Biotechnol. Jun. 1997;7(3):253-65. 1999; 104(1): 15-22. Caron et al., Intracellular delivery of a Tat-eGFP fusion protein into Wadia et al., Modulation of cellular function by TAT mediated muscle cells. Mol Ther. Mar. 2001:3(3):310-8. transduction of full length proteins. Curr Protein Pept Sci. Apr. Chung-Il et al., Artificial control of gene expression in mammalian 2003;4(2): 97-104. cells by modulating RNA interference through aptamer-small mol Wadia et al., Transducible TAT-HA fusogenic peptide enhances ecule interaction. RNA. May 2006; 12(5):710-6. Epub Apr. 10, 2006. escape of TAT-fusion proteins after lipid raft macropinocytosis. Nat Cradick et al., ZFN-site searches genomes for zinc finger nuclease Med. Mar. 2004; 10(3):310-5. Epub Feb. 8, 2004. target sites and off-target sites. BMC Bioinformatics. May 13, Supplementary European Search Report for Application No. EP 2011:12:152. doi:10.1186/1471-2105-12-152. 1284.5790.0, mailed Oct. 12, 2015. Gilleron et al., Image-based analysis of lipid nanoparticle-mediated No Author Listed, EMBL Accession No. Q99ZW2. Nov. 2012. 2 siRNA delivery, intracellular trafficking and endosomal escape. Nat pages. Biotechnol. Jul. 2013:31(7):638-46. doi:10.1038/nbt.2612. Epub No Author Listed, Invitrogen LipofectamineTM 2000 product sheets, Jun. 23, 2013. 2002. 2 pages. Guo et al., Protein tolerance to random amino acid change. J Gene No Author Listed, Invitrogen LipofectamineTM 2000 product sheets, Med. Mar.-Apr. 2002:4(2): 195-204. 2005.3 pages. Hasadsri et al., Functional protein delivery into neurons using poly No Author Listed, Invitrogen LipofectamineTMLTX product sheets, meric nanoparticles. J Biol Chem. Mar. 
13, 2009:284(11):6972-81. 2011. 4 pages. doi:10.1074/jbc.M805956200. Epub Jan. 7, 2009. No Author Listed. Thermo Fisher Scientific How Cationic Lipid Hill et al., Functional analysis of conserved histidines in ADP-glu Mediated Transfection Works, retrieved from the internet Aug. 27, cose pyrophosphorylase from Escherichia coli.Biochem Biophys 2015. 2 pages. Res Commun. Mar 17, 1998:244(2):573-7. UniProt Submission; UniProt, Accession No. P01011. Last modified Houdebine. The methods to generate transgenic animals and to con Sep. 18, 2013, version 2. 15 pages. trol transgene expression. J Biotechnol. Sep. 25, 2002:98(2-3): 145 Alexandrov et al., Signatures of mutational processes in human can 60. cer. Nature. Aug. 22, 2013:500(7463):415-21. doi: 10.1038/na Kappel et al., Regulating gene expression in transgenic animals.Curr ture 12477. Epub Aug. 14, 2013. Opin Biotechnol. Oct. 1992:3(5):548-53. Li et al., Current approaches for engineering proteins with diverse Klauser et al. An engineered small RNA-mediated genetic switch biological properties. Adv Exp Med Biol. 2007:620:18-33. based on a ribozyme expression platform. Nucleic Acids Res. May 1, 2013:41 (10):5542-52. doi:10.1093/nar/gkt253. Epub Apr. 12, 2013. * cited by examiner U.S. Patent Jul. 12, 2016 Sheet 1 of 35 US 9,388,430 B2

CAS9-RECOMBINASE FUSION PROTEINS AND USES THEREOF

RELATED APPLICATION

This application claims the benefit of the filing date of U.S. provisional applications 61/874,609, filed Sep. 6, 2013; 61/915,414, filed Dec. 12, 2013; and 61/980,315, filed Apr. 16, 2014; all entitled “Cas9 Variants and Uses Thereof,” the entire contents of each of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

Site-specific endonucleases theoretically allow for the targeted manipulation of a single site within a genome and are useful in the context of gene targeting for therapeutic and research applications. In a variety of organisms, including mammals, site-specific endonucleases have been used for genome engineering by stimulating either non-homologous end joining or homologous recombination. In addition to providing powerful research tools, site-specific nucleases also have potential as gene therapy agents, and two site-specific endonucleases have recently entered clinical trials: (1) CCR5-2246, targeting a human CCR-5 allele as part of an anti-HIV therapeutic approach (NCT00842634, NCT01044654, NCT01252641); and (2) VF24684, targeting the human VEGF-A as part of an anti-cancer therapeutic approach (NCT01082926).

Specific cleavage of the intended nuclease target site without or with only minimal off-target activity is a prerequisite for clinical applications of site-specific endonucleases, and also for high-efficiency genomic manipulations in basic research applications. For example, imperfect specificity of engineered site-specific binding domains has been linked to cellular toxicity and undesired alterations of genomic loci other than the intended target. Most nucleases available today, however, exhibit significant off-target activity, and thus may not be suitable for clinical applications. An emerging nuclease platform for use in clinical and research settings are the RNA-guided nucleases, such as Cas9. While these nucleases are able to bind guide RNAs (gRNAs) that direct cleavage of specific target sites, off-target activity is still observed for certain Cas9:gRNA complexes (Pattanayak et al., “High-throughput profiling of off-target DNA cleavage reveals RNA-programmed Cas9 nuclease specificity.” Nat. Biotechnol. 2013; doi:10.1038/nbt.2673). Technology for engineering nucleases with improved specificity is therefore needed.

Another class of enzymes useful for targeted genetic manipulations are site-specific recombinases (SSRs). These enzymes perform rearrangements of DNA segments by recognizing and binding to short DNA sequences, at which they cleave the DNA backbone, exchange the two DNA helices involved and rejoin the DNA strands. Such rearrangements allow for the targeted insertion, inversion, excision, or translocation of DNA segments. However, like site-specific endonucleases, naturally-occurring SSRs typically recognize and bind specific consensus sequences, and are thus limited in this respect. Technology for engineering recombinases with altered and/or improved specificity is also needed.

SUMMARY OF THE INVENTION

Some aspects of this disclosure are based on the recognition that the reported toxicity of some engineered site-specific endonucleases is based on off-target DNA cleavage. Thus certain aspects described herein relate to the discovery that increasing the number of sequences (e.g., having a nuclease bind at more than one site at a desired target), and/or splitting the activities (e.g., target binding and target cleaving) of a nuclease between two or more proteins, will increase the specificity of a nuclease and thereby decrease the likelihood of off-target effects. Accordingly, some aspects of this disclosure provide strategies, compositions, systems, and methods to improve the specificity of site-specific nucleases, in particular, RNA-programmable endonucleases, such as Cas9 endonuclease. Certain aspects of this disclosure provide variants of Cas9 endonuclease engineered to have improved specificity.

Other aspects of this disclosure are based on the recognition that site-specific recombinases (SSRs) available today are typically limited to recognizing and binding distinct consensus sequences. Thus certain aspects described herein relate to the discovery that fusions between RNA-programmable (nuclease-inactivated) nucleases (or RNA-binding domains thereof), and a recombinase domain, provide novel recombinases theoretically capable of binding and recombining DNA at any site chosen, e.g., by a practitioner (e.g., sites specified by guide RNAs (gRNAs) that are engineered or selected according to the sequence of the area to be recombined). Such novel recombinases are therefore useful, inter alia, for the targeted insertion, deletion, inversion, translocation or other genomic modifications. Thus, also provided are methods of using these inventive recombinase fusion proteins, e.g., for such targeted genomic manipulations.

Accordingly, one embodiment of the disclosure provides fusion proteins and dimers thereof, for example, fusion proteins comprising two domains: (i) a nuclease-inactivated Cas9 domain; and (ii) a nuclease domain (e.g., a monomer of the FokI DNA cleavage domain). See e.g., FIGS. 1A, 6D. The fusion protein may further comprise a nuclear localization signal (NLS) domain, which signals for the fusion proteins to be transported into the nucleus of a cell. In some embodiments, one or more domains of the fusion proteins are separated by a linker. In certain embodiments, the linker is a non-peptidic linker. In certain embodiments, the linker is a peptide linker. In the case of peptide linkers, the peptide linker may comprise an XTEN linker, an amino acid sequence comprising one or more repeats of the tri-peptide GGS, or any sequence as provided in FIG. 12A. In some embodiments, the fusion protein is encoded by a nucleotide sequence set forth as SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, or SEQ ID NO:12, or a variant or fragment of any one of SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, or SEQ ID NO:12. The nuclease-inactivated Cas9 domain is capable of binding a guide RNA (gRNA). In certain embodiments, having dimers of such fusion protein each comprising a gRNA binding two distinct regions of a target nucleic acid provides for improved specificity, for example as compared to monomeric RNA-guided nucleases comprising a single gRNA to direct binding to the target nucleic acid.

According to another aspect of the invention, methods for site-specific DNA cleavage using the inventive Cas9 variants are provided. The methods typically comprise (a) contacting DNA with a fusion protein of the invention (e.g., a fusion protein comprising a nuclease-inactivated Cas9 domain and a FokI DNA cleavage domain), wherein the inactive Cas9 domain binds a gRNA that hybridizes to a region of the DNA; (b) contacting the DNA with a second fusion protein (e.g., a fusion protein comprising a nuclease-inactivated Cas9 and FokI DNA cleavage domain), wherein the inactive Cas9 domain of the second fusion protein binds a second gRNA that hybridizes to a second region of DNA; wherein the binding of the fusion proteins in steps (a) and (b) results in the dimerization of the nuclease domains of the fusion proteins, such that the DNA is cleaved in a region between the bound fusion proteins. In some embodiments, the gRNAs of steps (a) and (b) hybridize to the same strand of the DNA, or the gRNAs of steps (a) and (b) hybridize to opposite strands of the DNA. In some embodiments, the gRNAs of steps (a) and (b) hybridize to regions of the DNA that are no more than 10, no more than 15, no more than 20, no more than 25, no more than 30, no more than 40, no more than 50, no more than 60, no more than 70, no more than 80, no more than 90, or no more than 100 base pairs apart. In some embodiments, the DNA is in a cell, for example, a eukaryotic cell or a prokaryotic cell, which may be in an individual, such as a human.

According to another embodiment, a complex comprising a dimer of fusion proteins of the invention (e.g., a dimer of a fusion protein comprising a nuclease-inactivated Cas9 and a FokI DNA cleavage domain) is provided. In some embodiments, the nuclease-inactivated Cas9 domain of each fusion protein of the dimer binds a single extended gRNA, such that one fusion protein of the dimer binds a portion of the gRNA, and the other fusion protein of the dimer binds another portion of the gRNA. See e.g., FIG. 1B. In some embodiments, the gRNA is at least 50, at least 75, at least 100, at least 150, at least 200, at least 250, or at least 300 nucleotides in length. In some embodiments, the regions of the extended gRNA that hybridize to a target nucleic acid comprise 15-25, 19-21, or 20 nucleotides.

In another embodiment, methods for site-specific DNA cleavage are provided comprising contacting a DNA with a complex of two inventive fusion proteins bound to a single extended gRNA. In some embodiments, the gRNA contains two portions that hybridize to two separate regions of the DNA to be cleaved; the complex binds the DNA as a result of the portions of the gRNA hybridizing to the two regions; and binding of the complex results in dimerization of the nuclease domains of the fusion proteins, such that the domains cleave the DNA in a region between the bound fusion proteins. In some embodiments, the two portions of the gRNA hybridize to the same strand of the DNA. In other embodiments, the two portions of the gRNA hybridize to opposing strands of the DNA. In some embodiments, the two portions of the gRNA hybridize to regions of the DNA that are no more than 10, no more than 15, no more than 20, no more than 25, no more than 30, no more than 40, no more than 50, no more than 60, no more than 70, no more than 80, no more than 90, or no more than 100 base pairs apart. In some embodiments, the DNA is in a cell, for example, a eukaryotic cell or a prokaryotic cell, which may be in an individual, such as a human.

According to another embodiment of the invention, split Cas9 proteins (including fusion proteins comprising a split Cas9 protein) comprising fragments of a Cas9 protein are provided. In some embodiments, a protein is provided that includes a gRNA binding domain of Cas9 but does not include a DNA cleavage domain. In other embodiments, proteins comprising a DNA cleavage domain of Cas9, but not a gRNA binding domain, are provided. In some embodiments, a fusion protein comprising two domains: (i) a nuclease-inactivated Cas9 domain, and (ii) a gRNA binding domain of Cas9 are provided, for example, wherein domain (ii) does not include a DNA cleavage domain. See e.g., FIG. 2B. In some embodiments, fusion proteins comprising two domains: (i) a nuclease-inactivated Cas9 domain, and (ii) a DNA cleavage domain are provided, for example, wherein domain (ii) does not include a gRNA binding domain. See e.g., FIG. 2C (fusion protein on right side, comprising a “B” domain). In some embodiments, protein dimers of any of the proteins described herein are provided. For example, in some embodiments, a dimer comprises two halves of a split Cas9 protein, for example, (i) a protein comprising a gRNA binding domain of Cas9, but not a DNA cleavage domain, and (ii) a protein comprising a DNA cleavage domain of Cas9, but not a gRNA binding domain. See e.g., FIG. 2A. In some embodiments, a dimer comprises one half of a split Cas9 protein, and a fusion protein comprising the other half of the split Cas9 protein. See e.g., FIG. 2B. For example, in certain embodiments such a dimer comprises (i) a protein comprising a gRNA binding domain of Cas9, but not a DNA cleavage domain, and (ii) a fusion protein comprising a nuclease-inactivated Cas9 and a DNA cleavage domain. In other embodiments, the dimer comprises (i) a protein comprising a DNA cleavage domain of Cas9, but not a gRNA binding domain, and (ii) a fusion protein comprising a nuclease-inactivated Cas9 and a gRNA binding domain of Cas9. In some embodiments, a dimer is provided that comprises two fusion proteins, each fusion protein comprising a nuclease-inactivated Cas9 and one half of a split Cas9. See e.g., FIG. 2C. For example, in certain embodiments, such a dimer comprises: (i) a fusion protein comprising a nuclease-inactivated Cas9 and a gRNA binding domain of Cas9, and (ii) a fusion protein comprising a nuclease-inactivated Cas9 and a DNA cleavage domain. In some embodiments, any of the provided protein dimers is associated with one or more gRNA(s).

In some embodiments, methods for site-specific DNA cleavage utilizing the inventive protein dimers are provided. For example, in some embodiments, such a method comprises contacting DNA with a protein dimer that comprises (i) a protein comprising a gRNA binding domain of Cas9, but not a DNA cleavage domain, and (ii) a protein comprising a DNA cleavage domain of Cas9, but not a gRNA binding domain, wherein the dimer binds a gRNA that hybridizes to a region of the DNA, and cleavage of the DNA occurs. See e.g., FIG. 2A. In some embodiments, the protein dimer used for site-specific DNA cleavage comprises (i) a protein comprising a gRNA binding domain of Cas9, but not a DNA cleavage domain, and (ii) a fusion protein comprising a nuclease-inactivated Cas9 and a DNA cleavage domain. See e.g., FIG. 2B. In some embodiments, the dimer used for site-specific DNA cleavage comprises (i) a protein comprising a DNA cleavage domain of Cas9, but not a gRNA binding domain, and (ii) a fusion protein comprising a nuclease-inactivated Cas9 and a gRNA binding domain of Cas9. In some embodiments, the protein dimer binds two gRNAs that hybridize to two regions of the DNA, and cleavage of the DNA occurs. See e.g., FIG. 2B. In some embodiments, the two gRNAs hybridize to regions of the DNA that are no more than 10, no more than 15, no more than 20, no more than 25, no more than 30, no more than 40, no more than 50, no more than 60, no more than 70, no more than 80, no more than 90, or no more than 100 base pairs apart. In some embodiments, the dimer used for site-specific DNA cleavage comprises two fusion proteins: (i) a fusion protein comprising a nuclease-inactivated Cas9 and a gRNA binding domain of Cas9, and (ii) a fusion protein comprising a nuclease-inactivated Cas9 and a DNA cleavage domain. In some embodiments, the protein dimer binds three gRNAs that hybridize to three regions of the DNA, and cleavage of the DNA occurs. Having such an arrangement, e.g., targeting more than one region of a target nucleic acid, for example using dimers associated with more than one gRNA (or a gRNA comprising more than one region that hybridizes to the target) increases the specificity of cleavage as compared to a nuclease binding a single region of a target nucleic acid. In some embodiments, the three gRNAs hybridize to regions of the DNA that are no more than 10, no more than 15, no more than 20, no more than 25, no more than 30, no more than 40, no more than 50, no more than 60, no more than 70, no more than 80, no more than 90, or no more than 100 base pairs apart between the first and second, and the second and third regions. In some embodiments, the DNA is in a cell, for example, a eukaryotic cell or a prokaryotic cell, which may be in an individual, such as a human.

According to another embodiment, minimal Cas9 proteins are provided, for example, wherein the protein comprises N- and/or C-terminal truncations and retains RNA binding and DNA cleavage activity. In some embodiments, the N-terminal truncation removes at least 5, at least 10, at least 15, at least 20, at least 25, at least 40, at least 40, at least 50, at least 75, at least 100, or at least 150 amino acids. In some embodiments, the C-terminal truncation removes at least 5, at least 10, at least 15, at least 20, at least 25, at least 40, at least 40, at least 50, at least 75, at least 100, or at least 150 amino acids. In some embodiments, the minimized Cas9 protein further comprises a bound gRNA.

In some embodiments, methods for site-specific DNA cleavage are provided comprising contacting a DNA with a minimized Cas9 protein:gRNA complex.

According to another embodiment, dimers of Cas9 (or fragments thereof) wherein the dimer is coordinated through a single gRNA are provided. In some embodiments, the single gRNA comprises at least two portions that (i) are each able to bind a Cas9 protein and (ii) each hybridize to a target nucleic acid sequence (e.g., DNA sequence). In some embodiments, the portions of the gRNA that hybridize to the target nucleic acid each comprise no more than 5, no more than 10, or no more than 15 nucleotides complementary to the target nucleic acid sequence. In some embodiments, the portions of the gRNA that hybridize to the target nucleic acid are separated by a linker sequence. In some embodiments, the linker sequence hybridizes to the target nucleic acid. See e.g., FIG. 4. In some embodiments, methods for site-specific DNA cleavage are provided comprising contacting DNA with a dimer of Cas9 proteins coordinated through a single gRNA.

According to another embodiment, the disclosure provides fusion proteins and dimers and tetramers thereof, for example, fusion proteins comprising two domains: (i) a nuclease-inactivated Cas9 domain; and (ii) a recombinase catalytic domain. See, e.g., FIG. 5. The recombinase catalytic domain, in some embodiments, is derived from the recombinase catalytic domain of Hin recombinase, Gin recombinase, or Tn3 resolvase. The nuclease-inactivated Cas9 domain is capable of binding a gRNA, e.g., to target the fusion protein to a target nucleic acid sequence. The fusion proteins may further comprise a nuclear localization signal (NLS) domain, which signals for the fusion proteins to be transported into the nucleus of a cell. In some embodiments, one or more domains of the fusion proteins are separated by a linker. In certain embodiments, the linker is a non-peptidic linker. In certain embodiments, the linker is a peptide linker. In the case of peptide linkers, the peptide linker may comprise an XTEN linker, an amino acid sequence comprising one or more repeats of the tri-peptide GGS, or any sequence as provided in FIG. 12A.

In another embodiment, methods for site-specific recombination are provided, which utilize the inventive RNA-guided recombinase fusion proteins described herein. In some embodiments, the method is useful for recombining two separate DNA molecules, and comprises (a) contacting a first DNA with a first RNA-guided recombinase fusion protein, wherein the nuclease-inactivated Cas9 domain binds a first gRNA that hybridizes to a region of the first DNA; (b) contacting the first DNA with a second RNA-guided recombinase fusion protein, wherein the nuclease-inactivated Cas9 domain of the second fusion protein binds a second gRNA that hybridizes to a second region of the first DNA; (c) contacting a second DNA with a third RNA-guided recombinase fusion protein, wherein the nuclease-inactivated Cas9 domain of the third fusion protein binds a third gRNA that hybridizes to a region of the second DNA; and (d) contacting the second DNA with a fourth RNA-guided recombinase fusion protein, wherein the nuclease-inactivated Cas9 domain of the fourth fusion protein binds a fourth gRNA that hybridizes to a second region of the second DNA, wherein the binding of the fusion proteins in steps (a)-(d) results in the tetramerization of the recombinase catalytic domains of the fusion proteins, under conditions such that the DNAs are recombined. In some embodiments, methods for site-specific recombination between two regions of a single DNA molecule are provided. In some embodiments, the method comprises (a) contacting a DNA with a first RNA-guided recombinase fusion protein, wherein the nuclease-inactivated Cas9 domain binds a first gRNA that hybridizes to a region of the DNA; (b) contacting the DNA with a second RNA-guided recombinase fusion protein, wherein the nuclease-inactivated Cas9 domain of the second fusion protein binds a second gRNA that hybridizes to a second region of the DNA; (c) contacting the DNA with a third RNA-guided recombinase fusion protein, wherein the nuclease-inactivated Cas9 domain of the third fusion protein binds a third gRNA that hybridizes to a third region of the DNA; (d) contacting the DNA with a fourth RNA-guided recombinase fusion protein, wherein the nuclease-inactivated Cas9 domain of the fourth fusion protein binds a fourth gRNA that hybridizes to a fourth region of the DNA; wherein the binding of the fusion proteins in steps (a)-(d) results in the tetramerization of the recombinase catalytic domains of the fusion proteins, under conditions such that the DNA is recombined. In some embodiments involving methods for site-specific recombination, gRNAs hybridizing to the same DNA molecule hybridize to opposing strands of the DNA molecule. In some embodiments, e.g., involving site-specific recombination of a single DNA molecule, two gRNAs hybridize to one strand of the DNA, and the other two gRNAs hybridize to the opposing strand. In some embodiments, the gRNAs hybridize to regions of their respective DNAs (e.g., on the same strand) that are no more than 10, no more than 15, no more than 20, no more than 25, no more than 30, no more than 40, no more than 50, no more than 60, no more than 70, no more than 80, no more than 90, or no more than 100 base pairs apart. In some embodiments, the DNA is in a cell, for example, a eukaryotic cell or a prokaryotic cell, which may be in or obtained from an individual, such as a human.

According to another embodiment, polynucleotides are provided, for example, that encode any of the Cas9 proteins described herein (e.g., Cas9 variants, Cas9 dimers, Cas9 fusion proteins, Cas9 fragments, minimized Cas9 proteins, Cas9 variants without a cleavage domain, Cas9 variants without a gRNA domain, Cas9-recombinase fusions, etc.). In some embodiments, polynucleotides encoding any of the gRNAs described herein are provided. In some embodiments, polynucleotides encoding any inventive Cas9 protein described herein and any combination of gRNA(s) as described herein are provided. In some embodiments, vectors that comprise a polynucleotide described herein are provided. In some embodiments, vectors for recombinant protein expression comprising a polynucleotide encoding any of the Cas9 proteins and/or gRNAs described herein are provided. In some embodiments, cells comprising genetic constructs for expressing any of the Cas9 proteins and/or gRNAs described herein are provided.

In some embodiments, kits are provided. For example, kits comprising any of the Cas9 proteins and/or gRNAs described herein are provided. In some embodiments, kits comprising any of the polynucleotides described herein, e.g., those encoding a Cas9 protein and/or gRNA, are provided. In some embodiments, kits comprising a vector for recombinant protein expression, wherein the vectors comprise a polynucleotide encoding any of the Cas9 proteins and/or gRNAs described herein, are provided. In some embodiments, kits comprising a cell comprising genetic constructs for expressing any of the Cas9 proteins and/or gRNAs described herein are provided.

Other advantages, features, and uses of the invention will be apparent from the Detailed Description of Certain Non-Limiting Embodiments of the Invention; the Drawings, which are schematic and not intended to be drawn to scale; and the Claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic detailing certain embodiments of the invention. (A) In this embodiment, nuclease-inactivated Cas9 protein is fused to a monomer of the FokI nuclease domain. Double-strand DNA-cleavage is achieved through dimerization of FokI monomers at the target site and is dependent on the simultaneous binding of two distinct Cas9:gRNA complexes. (B) In this embodiment, an alternate configuration is provided, wherein two Cas9-FokI fusions are coordinated through the action of a single extended gRNA containing two distinct gRNA motifs. The gRNA motifs comprise regions that hybridize the target in distinct regions, as well as regions that bind each fusion protein. The extended gRNA may enhance cooperative binding and alter the specificity profile of the fusions.

FIG. 2 is a schematic detailing certain embodiments of the invention. (A) In this embodiment, dimeric split Cas9 separates A) gRNA-binding ability from B) dsDNA cleavage. DNA cleavage occurs when both halves of the protein are co-localized to associate and refold into a nuclease-active

Specific deletions and/or truncations decrease the size of Cas9 without affecting its activity (e.g., gRNA binding and DNA cleavage activity). The minimized Cas9 protein increases the efficacy of, for example, delivery to cells using viral vectors such as AAV (accommodating sequences ~<4700 bp) or lentivirus (accommodating sequences ~<9 kb), or when pursuing multiplexed gRNA/Cas9 approaches.

FIG. 4 shows how two Cas9 proteins can be coordinated through the action of a single extended RNA containing two distinct gRNA motifs. Each gRNA targeting region is shortened, such that a single Cas9:gRNA unit cannot bind efficiently by itself. The normal 20 nt targeting sequence has been altered so that some portion (e.g., the 5' initial 10 nt) has been changed to some non-specific linker sequence, such as AAAAAAAAAA (SEQ ID NO:13), with only 10 nt of the gRNA remaining to direct target binding (alternatively this 5' 10 nt is truncated entirely). This “low-affinity” gRNA unit exists as part of a tandem gRNA construct with a second, distinct low-affinity gRNA unit downstream, separated by a linker sequence. In some embodiments, there are more than two low-affinity gRNA units (e.g., at least 3, at least 4, at least 5, etc.). In some embodiments, the linker comprises a target nucleic acid complementary sequence (e.g., as depicted by the linker region contacting the DNA target).

FIG. 5 shows schematically how Cas9-recombinase fusions can be coordinated through gRNAs to bind and recombine target DNAs at desired sequences (sites). (A, B) Nuclease-inactivated Cas9 (dCas9) protein is fused to a monomer of a recombinase domain (Rec). Site-specific recombination is achieved through dimerization (A) of the recombinase catalytic domain monomers at the target site, and then tetramerization (B) of two dimers assembled on separate Cas9-recombination sites. The fusion to dCas9:gRNA complexes determines the sequence identity of the flanking target sites while the recombinase catalytic domain determines the identity of the core sequence (the sequence between the two dCas9-binding sites). (B) Recombination proceeds through strand cleavage, exchange, and re-ligation within the dCas9-recombinase tetramer complex.

FIG. 6 shows architectures of Cas9 and FokI-dCas9 fusion variants.
(A) Cas9 protein in complex with a guide RNA state. (B) In this embodiment, nuclease-inactivated Cas9 (gRNA) binds to target DNA. The S. pyogenes Cas9 protein mutant is fused to the A-half (or in some embodiments, the recognizes the PAM sequence NGG, initiating unwinding of B-half) of the split Cas9 nuclease. Upon binding of both the dsDNA and gRNA:DNA base pairing. (B) FokI-dCas9 fusion Cas9-A-half (or Cas9-B-half) fusion and the inactive gRNA 45 architectures tested. Four distinct configurations of NLS, binding Cas9 B-half (or A-half, respectively) at the target site, FokI nuclease, and dCas9 were assembled. Seventeen (17) dsDNA is enabled following split protein reassembly. This protein linker variants were also tested. (C) gRNA target sites split Cas9-pairing can use two distinct gRNA-binding Cas9 tested within GFP. Seven gRNA target sites were chosen to proteins to ensure the split nuclease-active Cas9 reassembles test FokI-dCas9 activity in an orientation in which the PAM is only on the correct target sequence. (C) In this embodiment, 50 distal from the cleaved spacer sequence (orientation A). nuclease-inactivated Cas9 mutant is fused to the A-half of the Together, these seven gRNAs enabled testing of FokI-dCas9 split Cas9 nuclease. A separate nuclease-inactivated Cas9 fusion variants across spacer lengths ranging from 5 to 43 bp. mutant is fused to the B-half of the split Cas9 nuclease. Upon See FIG.9 for guide RNAs used to test orientation B, in which binding of one nuclease-inactivated Cas9 mutant to a gRNA the PAM is adjacent to the spacer sequence. (D) Monomers of target site and binding of the other nuclease-inactivated Cas9 55 FokI nuclease fused to dCas9 bind to separate sites within the mutant to a second gRNA target site, the split Cas9 halves can target locus. 
Only adjacently bound FokI-dCas9 monomers dimerize and bind a third gRNA target to become a fully can assemble a catalytically active FokI nuclease dimer, trig active Cas9 nuclease that can cleave dsDNA. This split Cas9 gering dsDNA cleavage. The sequences shown in (C) are pairing uses three distinct gRNA-binding Cas9 proteins to identified as follows: “EmCFP (bp 326-415) corresponds to ensure the split nuclease-active Cas9 reassembles only on the 60 SEQ ID NO:204; “G1 corresponds to SEQ ID NO:205; correct target sequence. Any other DNA-binding domain in “G2 corresponds to SEQID NO:206: “G3” corresponds to place of the inactive Cas9 (zinc fingers, TALE proteins, etc.) SEQ ID NO:207; “G4” corresponds to SEQ ID NO:208; can be used to complete the reassembly of the split Cas9 “G5” corresponds to SEQID NO:209; “G6” corresponds to nuclease. SEQID NO:210; and “G7 corresponds to SEQID NO:211. FIG. 3 shows schematically a minimal Cas9 protein that 65 FIG. 7 shows genomic DNA modification by f(cas9, Cas9 comprises the essential domains for Cas9 activity. Full-length nickase, and wild-type Cas9. (A) shows a graph depicting Cas9 is a 4.1 kb gene which results in a protein of 150 kDa. GFP disruption activity offGas9, Cas9 nickase, or wild-type US 9,388,430 B2 9 10 Cas9 with either no gRNA, or gRNA pairs of variable spacer fusion variants across six spacer lengths ranging from 4 to 42 length targeting the GFP gene in orientation A. (B) is an bp. 
The sequences shown are identified as follows: “EmCFP image of a gel showing Indel modification efficiency from (bp 297-388)” corresponds to SEQID NO:212: “G8” corre PAGE analysis of a Surveyor cleavage assay of renatured sponds to SEQ ID NO:213: “G9” corresponds to SEQ ID target-site DNA amplified from cells treated with f(cas9, Cas9 5 NO:214; “G10” corresponds to SEQID NO:215; “G11” cor nickase, or wild-type Cas9 and two gRNAs spaced 14 bp responds to SEQID NO:216; “G12” corresponds to SEQID apart targeting the GFP site (gRNAs G3 and G7: FIG. 6C), NO:217; “G13” corresponds to SEQID NO:218; and “G14” each gRNA individually, or no gRNAs. The Indel modifica corresponds to SEQID NO:219. tion percentage is shown below each lane for samples with FIG. 10 shows a GFP disruption assay for measuring modification above the detection limit (-2%). (C–G) show 10 graphs depicting Indel modification efficiency for (C) two genomic DNA-modification activity. (A) depicts Schemati pairs of gRNAs spaced 14 or 25 bp apart targeting the GFP cally a HEK293-derived cell line constitutively expressing a site, (D) one pair of gRNAS spaced 19bp apart targeting the genomically integrated EmGFP gene used to test the activity CLTA site, (E) one pair of gRNAs spaced 23 bp apart target of candidate FokI-dCas'9 fusion constructs. Co-transfection ing the EMX site, (F) one pair of gRNAs spaced 16 bp apart 15 of these cells with appropriate nuclease and gRNA expression targeting the HBB site, and (G) two pairs of gRNAs spaced 14 plasmids leads to dsDNA cleavage within the EmGFP coding or 16 bp apart targeting the VEGF site. Error bars reflect sequence, stimulating error-prone NHEJ and generating standard error of the mean from three biological replicates indels that can disrupt the expression of GFP, leading to loss performed on different days. of cellular fluorescence. The fraction of cells displaying a loss FIG. 
8 shows the DNA modification specificity of f(cas9, of GFP fluorescence is then quantitated by flow cytometry. Cas9 nickase, and wild-type Cas9. (A) shows a graph depict (B) shows typical epifluorescence microscopy images at ing GFP gene disruption by wild-type Cas9, Cas9 nickase, 200x magnification of EmGFP-HEK293 cells before and and f(cas9 using gRNA pairs in orientation A. High activity of after co-transfection with wild-type Cas9 and gRNA expres fCas9 requires spacer lengths of ~15 and 25 bp, roughly one sion plasmids. DNA helical turn apart. (B) shows a graph depicting GFP 25 FIG. 11 shows a graph depicting the activities of FokI gene disruption using gRNA pairs in orientation B. Cas9 dCas'9 fusion candidates combined with gRNA pairs of dif nickase, but not fas9, accepts either orientation of gRNA ferent orientations and varying spacer lengths. The fusion pairs. (C) shows a graph depicting GFP gene disruption by architectures described in FIG. 6B were tested for function fCas9, but not Cas9 nickase or wild-type Cas9, which ality by flow cytometry using the GFP loss-of-function depends on the presence of two gRNAs. Four single gRNAs 30 reporter across all (A) orientation A gRNA spacers and (B) were tested along with three gRNA pairs of varying spacer orientation B gRNA spacers (FIG. 6C and FIG.9). All FokI length. In the presence of gRNA pairs in orientation A with dCas9 fusion data shown are the results of single trials. Wild spacer lengths of 14 or 25bp (gRNAs 1+5, and gRNAs 3+7. type Cas9 and Cas9 nickase data are the average of two respectively), fcas9 is active, but not when a gRNA pair with replicates, while the no treatment negative control data is the a 10-bp spacer (gRNAs 1+4) is used. In (A-C), “no treatment' 35 average of 6 replicates, with error bars representing one stan refers to cells receiving no plasmid DNA. (D-F) show graphs dard deviation. 
The grey dotted line across the Y-axis corre depicting the indel mutation frequency from high-throughput sponds to the average of the no treatment controls per DNA sequencing of amplified genomic on-target sites and formed on the same day. The sequence shown as “(GGS)x3' off-target sites from human cells treated with f(cas9, Cas9 corresponds to SEQID NO:14. nickase, or wild-type Cas9 and (D) two gRNAs spaced 19 bp 40 FIG. 12 shows the optimization of protein linkers in NLS apart targeting the CLTA site (gRNAs C1 and C2), (E) two FokI-dCas9. (A) shows a table of all linker variants tested. gRNAs spaced 23 bp apart targeting the EMX site (gRNAS E1 Wild-type Cas9 and Cas9 nickase were included for compari and E2), or (F,G) two gRNAS spaced 14bp apart targeting the son. The initial active construct NLS-FokI-dCas'9 with a VEGF site (gRNAs V1 and V2). (G) shows a graph depicting (GGS) (SEQID NO:14) linker between FokI and dCas'9 was two in-depth trials to measure genome modification at VEGF 45 tested across a range of alternate linkers. The final choice of off-target site 1. Trial 1 used 150 ng of genomic input DNA linkers for fas9 is highlighted. (B) shows a graph depicting and >8x10 sequence reads for each sample; trial 2 used 600 the activity of FokI-dCas9 fusions with linker variants. Each ng of genomic input DNA and >23x10 sequence reads for variant was tested across a range of spacer lengths from 5 to each sample. In (D-G), all significant (Pvalues0.005 Fisher's 43 bp using gRNA pair orientation A. A control lacking Exact Test) indel frequencies are shown. P values are listed in 50 gRNA (“no gRNA) was included for each separate fusion Table 3. For (D-F) each on- and off-target sample was construct. NLS-FokI-dCas9 variant L8 showed the best activ sequenced once with >10,000 sequences analyzed per on ity, approaching the activity of Cas9 nickase. 
Variants L4 target sample and an average of 76.260 sequences analyzed through L9 show peak activity with 14- and 25-bp spacer per off-target sample (Table 3). The sequences shown in (C) lengths, suggesting two optimal spacer lengths roughly one are identified as follows, from top to bottom: the sequence 55 helical turn of dsDNA apart. The sequences shown in (A) are found at the top of FIG. 8C corresponds to SEQID NO:204: identified as follows: GGSGGSGGS corresponds to SEQ ID “G1” corresponds to SEQID NO:205; “G3” corresponds to NO:14; GGSGGSGGSGGSGGSGGS corresponds to SEQ SEQ ID NO:207; “G5” corresponds to SEQ ID NO:209; ID NO:15: MKIIEQLPSA corresponds to SEQ ID NO:22: “G7 corresponds to SEQID NO:211: “G1+4' corresponds VRHKLKRVGS corresponds to SEQID NO:23; VPFLLEP to SEQ ID NO:205 and SEQ ID NO:208; “G1+5” corre 60 DNINGKTC corresponds to SEQ ID NO:19; GHGTG sponds to SEQ ID NO:205 and SEQ ID NO:209; “G3+7 STGSGSS corresponds to SEQID NO:24; MSRPDPA cor corresponds to SEQID NO:207 and SEQID NO:211. responds to SEQID NO:25; GSAGSAAGSGEF corresponds FIG.9 shows the target DNA sequences in a genomic GFP to SEQID NO:20: SGSETPGTSESA corresponds to SEQID gene. Seven gRNA target sites were chosen to test FokI NO:17; SGSETPGTSESATPES corresponds to SEQ ID dCas'9 candidate activity in an orientation in which the PAM 65 NO:16: SGSETPGTSESATPEGGSGGS corresponds to is adjacent to the cleaved spacer sequence (orientation B). SEQID NO:18; GGSM corresponds to SEQID NO:301; and Together, these seven gRNAs enabled testing of FokI-dCas9 SIVAQLSRPDPA corresponds to SEQID NO:21. US 9,388,430 B2 11 12 FIG. 13 shows target DNA sequences in endogenous on-target sites amplified from 150 ng genomic DNA isolated human EMX, VEGF, CLTA, and HBB genes. The gRNA from human cells treated with a plasmid expressing either target sites tested within endogenous human EMX, VEGF, wild-type Cas9, Cas9 nickase, or f(cas9 and either a single CLTA, and HBB genes are shown. 
Thirteen gRNA target sites plasmid expressing a single gRNAs (G1, G3, G5 or G7), or were chosen to test the activity of the optimized focas9 fusion two plasmids each expressing a different gRNA (G1+G5, or in an orientation in which the PAM is distal from the cleaved G3+G7). As a negative control, transfection and sequencing spacer sequence (orientation A). Together, these 13 gRNAS were performed in triplicate as above without any gRNA enabled testing of fas9 fusion variants across eight spacer expression plasmids. Error bars represents.d. Sequences with lengths ranging from 5 to 47 bp. The sequences shown are more than one insertion or deletion at the GFP target site (the identified as follows: “CLTA-1” corresponds to SEQ ID 10 start of the G1 binding site to the end of the G7 binding site) NO:220; “C1 corresponds to SEQID NO:221; “C2 corre were considered indels. Indel percentages were calculated by sponds to SEQ ID NO:222; “C3” corresponds to SEQ ID dividing the number of indels by total number of sequences. NO:224; “C4” corresponds to SEQID NO:225; “HBC cor While wild-type Cas9 produced indels across all gRNA treat responds to SEQ ID NO:226: “H1' corresponds to SEQ ID ments, fcas9 and Cas9 nickase produced indels efficiently NO:227: “H2 corresponds to SEQID NO:228: “H3” corre 15 (>1%) only when paired gRNAs were present. Indels induced sponds to SEQ ID NO:229; “H4” corresponds to SEQ ID by f(cas9 and single gRNAs were not detected above the NO:230; “H5” corresponds to SEQID NO:231; “H6” corre no-gRNA control, while Cas9 nickase and single gRNAs sponds to SEQ ID NO:232; “H7” corresponds to SEQ ID modified the target GFP sequence at an average rate of 0.12%. NO:233; “EMX corresponds to SEQID NO:234; “E1 cor FIG. 17 shows a graph depicting how f(cas9 indel fre responds to SEQ ID NO:235; “E2 corresponds to SEQ ID quency of genomic targets reflects gRNA pair spacer length NO:236; “E3” corresponds to SEQ ID NO:237; “VEGF preference. 
The graph shows the relationship between spacer corresponds to SEQ ID NO:238; “V1” corresponds to SEQ length (number of by between two gRNAs) and the indel ID NO:239; “V2 corresponds to SEQ ID NO:240; “V3” modification efficiency of f(cas9 normalized to the indel corresponds to SEQ ID NO:241; and “V4 corresponds to modification efficiency of the same gRNAs co-expressed SEQID NO:242. 25 with wild-type Cas9 nuclease. Colored triangles below the FIG. 14 shows graphs depicting spacer length preference X-axis denote spacer lengths that were tested but which of genomic DNA modification by f(cas'9, Cas9 nickase, and yielded no detectable indels for the indicated target gene. wild-type Cas9. Indel modification efficiency for (A) pairs of These results suggest that f(cas9 requires ~15bp or ~25 bp gRNAs targeting the GFP site. (B) pairs of gRNAs targeting between half-sites to efficiently cleave DNA. the CLTA site, (C) pairs of gRNAs targeting the EMX site (D) 30 FIG. 18 shows modifications induced by Cas9 nuclease, pairs of gRNAs targeting the HBB site, and (E) pairs of Cas9 nickases, or f(as9 nucleases at endogenous loci. (A) gRNAs targeting the VEGF site. Error bars reflect standard shows examples of modified sequences at the VEGF on-target error of the mean from three biological replicates performed site with wild-type Cas9 nuclease, Cas9 nickases, or f(cas9 on different days. nucleases and a single plasmid expressing two gRNAS target FIG. 15 shows graphs depicting the efficiency of genomic 35 ing the VEGF on-target site (gRNA V1 and gRNAV2). For DNA modification by f(cas9, Cas9 nickase, and wild-type each example shown, the unmodified genomic site is the first Cas9 with varying amounts of Cas9 and gRNA expression sequence, followed by the top eight sequences containing plasmids. Indel modification efficiency from a Surveyor deletions. The numbers before each sequence indicate assay of renatured target-site DNA amplified from a popula sequencing counts. 
The gRNA target sites are bold and capi tion of cells treated with f(cas9, Cas9 nickase, or wild-type 40 talized. (B) is an identical analysis as in (A) for VEGF off Cas9 and two target site gRNAs. Either 700 ng of Cas9 target site 1 VEG Offl. (C) shows the potential binding mode expression plasmid with 250 ng of gRNA expression plasmid of two gRNAs to VEGF off-target site 1. The top strand is (950 ng total),350 ng of Cas9 expression plasmid with 125 ng bound in a canonical mode, while the bottom strand binds the of gRNA expression plasmid (475 ng in total), 175ng of Cas9 second gRNA, gRNAV2, through gRNA:DNA base pairing expression plasmid with 62.5 ng of gRNA expression plasmid 45 that includes G:U base pairs. The sequences shown in (A) are (238 ng in total) or 88 ng of Cas9 expression plasmid with 31 identified, top to bottom, as follows: SEQID NO:243: SEQ ng of gRNA expression plasmid (119 ng in total) were trans ID NO:244: SEQ ID NO:245; SEQ ID NO:246: SEQ ID fected with an appropriate amount of inert, carrier plasmid to NO:247; SEQ ID NO:248: SEQ ID NO:249; SEQ ID ensure uniform transfection of 950 ng of plasmid across all NO:250, SEQ ID NO:251; SEQ ID NO:252: SEQ ID treatments. Indel modification efficiency for (A) gRNAs 50 NO:253; SEQ ID NO:254; SEQ ID NO:255; SEQ ID spaced 19-bp apart targeting the CLTA site. (B) gRNAs NO:256; SEQ ID NO:257; SEQ ID NO:258; SEQ ID spaced 23 bp apart targeting the EMX site, and (C) gRNAs NO:259; SEQ ID NO:260; SEQ ID NO:261; SEQ ID spaced 14 bp apart targeting the VEGF site. Error bars repre NO:262; SEQ ID NO:263; SEQ ID NO:264; SEQ ID sent the standard error of the mean from three biological NO:265; SEQ ID NO:266; SEQ ID NO:267: SEQ ID replicates performed on separate days. 55 NO:268; and SEQID NO:269. The sequences shown in (B) FIG. 
16 shows the ability of f(cas9, Cas9 nickase, and are identified, top to bottom, as follows: SEQ ID NO:270; wild-type Cas9 to modify genomic DNA in the presence of a SEQID NO:271; SEQID NO:272: SEQIDNO:273; SEQID single gRNA. (A) shows images of gels depicting Surveyor NO:274; SEQ ID NO:275; SEQ ID NO:276, SEQ ID assay of a genomic GFP target from DNA ofcells treated with NO:277; SEQ ID NO:278; SEQ ID NO:279; SEQ ID the indicated combination of Cas9 protein and gRNA(s). 60 NO:280; SEQ ID NO:281; SEQ ID NO:282: SEQ ID Single gRNAS do not induce genome modificationata detect NO:283: SEQ ID NO:284; SEQ ID NO:285; SEQ ID able level (<2% modification) for both f(cas9 and Cas9 nick NO:286; SEQ ID NO:287; SEQ ID NO:288: SEQ ID ase. Wild-type Cas9 effectively modifies the GFP target for all NO:289; SEQ ID NO:290; SEQ ID NO:291; SEQ ID tested single and paired gRNAs. For both f(cas9 and Cas9 NO:292; SEQ ID NO:293; SEQ ID NO:294; SEQ ID nickase, appropriately paired gRNAS induce genome modi 65 NO:295; and SEQID NO:296. The sequences shown in (C) fication at levels comparable to those of wild-type Cas9. (B) are identified, top to bottom, as follows: SEQ ID NO:297: shows a graph depicting the results from sequencing GFP SEQID NO:298; SEQID NO:299; and SEQ ID NO:300. US 9,388,430 B2 13 14 FIG. 19 shows the target DNA sequences in a genomic M., Ajdic D. J., Savic D. J., Savic G. Lyon K., Primeaux C. CCR5 gene. (A) Eight gRNA target sites were identified for Sezate S., Suvorov A.N., Kenton S. Lai H.S., Lin S. P., Qian testing Cas9 variant (e.g., FokI-dCas9) activity in an orienta Y., Jia H. G., Najar F. Z., Ren Q., Zhu H., Song L., White J. tion in which the PAM is adjacent to the cleaved spacer Yuan X., Clifton S. W., Roe B. A., McLaughlin R. E., Proc. sequence (orientation A). (B) Six gRNA target sites were Natl. Acad. Sci. U.S.A. 
98:4658-4663 (2001); “CRISPR identified for testing Cas9 variant (e.g., FokI-dCas9) activity RNA maturation by trans-encoded small RNA and host factor in an orientation in which the PAM is adjacent to the cleaved RNase III.” Deltcheva E., Chylinski K., Sharma C. M., spacer sequence (orientation B). Together, these fourteen Gonzales K. Chao Y., Pirzada Z.A., Eckert M. R., Vogel J. gRNAS enable testing of Cas9 fusion variants across spacer Charpentier E., Nature 471:602-607 (2011); and “A program lengths ranging from 0 to 74 bp. The sequences shown in (A) 10 mable dual-RNA-guided DNA endonuclease in adaptive bac are identified as follows: “CRA’ corresponds to SEQ ID terial immunity.” Jinek M. Chylinski K., Fonfara I., Hauer NO:302: “CRA-1” corresponds to SEQID NO:303: “CRA M., Doudna J. A., Charpentier E. Science 337:816-821 2” corresponds to SEQID NO:304; “CRA-3” corresponds to (2012), the entire contents of each of which are incorporated SEQID NO:305; “CRA-4” corresponds to SEQID NO:306; herein by reference). Cas9 orthologs have been described in “CRA-5” corresponds to SEQ ID NO:307: “CRA-6” corre 15 various species, including, but not limited to, S. pyogenes and sponds to SEQID NO:308: “CRA-7” corresponds to SEQID S. thermophilus. Additional suitable Cas9 nucleases and NO:309; and “CRA-8” corresponds to SEQID NO:310. The sequences will be apparent to those of skill in the art based on sequences shown in (B) are identified as follows: “CRB' this disclosure, and Such Cas9 nucleases and sequences corresponds to SEQID NO:311:“CB-1” corresponds to SEQ include Cas9 sequences from the organisms and loci dis ID NO:312: “CB-2” corresponds to SEQID NO:313; “CB-3” closed in Chylinski, Rhun, and Charpentier, “The tracrRNA corresponds to SEQID NO:314:“CB-4” corresponds to SEQ and Cas9 families oftype IICRISPR-Cas immunity systems’ ID NO:315; “CB-5” corresponds to SEQ ID NO:316; and (2013) RNA Biology 10:5, 726-737; the entire contents of “CB-6” corresponds to SEQID NO:317. 
which are incorporated herein by reference. In some embodi FIG. 20 depicts a vector map detailing an exemplary plas ments, a Cas9 nuclease has an inactive (e.g., an inactivated) mid containing a Fok1-dCas9 (f(cas9) construct. 25 DNA cleavage domain. A nuclease-inactivated Cas9 protein may interchangeably be referred to as a “dCas9 protein (for DEFINITIONS nuclease “dead Cas9). In some embodiments, dCas9 corre sponds to, or comprises in part or in whole, the amino acid set As used herein and in the claims, the singular forms 'a' forth as SEQID NO:5, below. In some embodiments, variants “an and “the include the singular and the plural reference 30 of dCas9 (e.g., variants of SEQID NO:5) are provided. For unless the context clearly indicates otherwise. Thus, for example, in some embodiments, variants having mutations example, a reference to “an agent” includes a single agent and other than D10A and H840A are provided, which e.g., result a plurality of Such agents. in nuclease inactivated Cas9 (dCas9). Such mutations, by The term “Cas9 or “Cas9 nuclease refers to an RNA way of example, include other amino acid Substitutions at guided nuclease comprising a Cas9 protein, or a fragment 35 D10 and H840, or other substitutions within the nuclease thereof (e.g., a protein comprising an active or inactive DNA domains of Cas9 (e.g., substitutions in the HNH nuclease cleavage domain of Cas9, and/or the gRNA binding domain subdomain and/or the RuvC1 subdomain). In some embodi of Cas9). A Cas9 nuclease is also referred to sometimes as a ments, variants or homologues of dCas9 (e.g., variants of casn1 nuclease or a CRISPR (clustered regularly interspaced SEQ ID NO:5) are provided which are at least about 70% short palindromic repeat)-associated nuclease. 
CRISPR is an 40 identical, at least about 80% identical, at least about 90% adaptive immune system that provides protection against identical, at least about 95% identical, at least about 98% mobile genetic elements (viruses, transposable elements and identical, at least about 99% identical, at least about 99.5% conjugative plasmids). CRISPR clusters contain spacers, identical, or at least about 99.9% to SEQ ID NO:5. In some sequences complementary to antecedent mobile elements, embodiments, variants of dCas9 (e.g., variants of SEQ ID and target invading nucleic acids. CRISPR clusters are tran 45 NO:5) are provided having amino acid sequences which are scribed and processed into CRISPRRNA (crRNA). In type II shorter, or longer than SEQID NO:5, by about 5 amino acids, CRISPR systems correct processing of pre-crRNA requires a by about 10 amino acids, by about 15 amino acids, by about trans-encoded small RNA (tracrRNA), endogenous ribonu 20 amino acids, by about 25 amino acids, by about 30 amino clease 3 (rinc) and a Cas9 protein. The tracrRNA serves as a acids, by about 40 amino acids, by about 50 amino acids, by guide for ribonuclease 3-aided processing of pre-crRNA. 50 about 75 amino acids, by about 100 amino acids or more. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically dCas9 (D10A and H840A): cleaves linear or circular dsDNA target complementary to the spacer. The target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3'-5' exonucleolyti (SEO ID NO; 5) cally. In nature, DNA-binding and cleavage typically requires 55 MDKKYSIGLAIGTNSWGWAWITDEYKWPSKKFKWLGNTDRHSIKKNLIGA protein and both RNA. However, single guide RNAs (“sgRNA, or simply “gNRA) can be engineered so as to LLFDSGETAEATRLKRTARRRYTRRKNRICYLOEIFSNEMAKVDDSFFHR incorporate aspects of both the crRNA and tracrRNA into a LEESFLWEEDKKHERHPIFGNIWDEWAYHEKYPTIYHLRKKLVDSTDKAD single RNA species. See e.g.,Jinek M. Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. 
Science 337:816 60 LRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIOLVOTYNOLFEENP 821 (2012), the entire contents of which is hereby incorpo INASGWDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTP rated by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent NFKSNFDLAEDAKLOLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAI motif) to help distinguish self versus non-self. Cas9 nuclease LLSDILRVNTEITKAPLSASMIKRYDEHHODLTLLKALWRQOLPEKYKEI sequences and structures are well known to those of skill in 65 the art (see, e.g., "Complete genome sequence of an M1 Strain FFDOSKINGYAGYIDGGASOEEFYKFIKPILEKMDGTEELLWKLNREDLLR of Streptococcus pyogenes.” Ferretti et al., J. J., McShan W. US 9,388,430 B2 15 16 - Continued about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% to the corre KORTFDNGSIPHOIHLGELHAILRROEDFYPFLKDNREKIEKILTFRIPY sponding fragment of wild type Cas9. In some embodiments, YVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAOSFIERMTNFDK wild type Cas9 corresponds to Cas9 from Streptococcus pyo genes (NCBI Reference Sequence: NC 017053.1, SEQ ID NLPNEKVLPKHSLLYEYFTWYNELTKWKYVTEGMRKPAFLSGEOKKAIVD NO:1 (nucleotide); SEQID NO:2 (amino acid)). LLFKTNRKVTVKOLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKI

IKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKWMKO
LKRRRYTGWGRLSRKLINGIRDKOSGKTILDFLKSDGFANRNFMOLIHDD
SLTFKEDIOKAQVSGQGDSLHEHIANLAGSPAIKKGILOTWKVVDELVKW
MGRHKPENIVIEMARENOTTOKGOKNSRERMKRIEEGIKELGSOILKEHP
VENTOLONEKLYLYYLONGRDMYWDOELDINRLSDYDVDAIVPOSFLKDD
SIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWROLLNAKLITORKFDNL
TKAERGGLSELDKAGFIKROLVETROITKHVAOILDSRMNTKYDENDKLI
REVKVITLKSKLVSDFRKDFOFYKVREINNYHHAHDAYLNAVVGTALIKK
YPKLESEFWYGDYKVYDWRKMIAKSEOEIGKATAKYFFYSNIMNFFKTEI
TLANGEIRKRPLIETNGETGEIWWDKGRDFATVRKVLSMPOVNIVKKTEV
OTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVE
KGKSKKLKSWKELLGITIMERSSFEKNPIDFILEAKGYKEWKKDLIIKLPK
YSLFELENGRKRMLASAGELOKGNELALPSKYWNFLYLASHYEKLKGSPE
DNEOKOLFVEOHKHYLDEIIEOISEFSKRVILADANLDKVLSAYNKHRDK
PIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEWLDATLIHO
SITGLYETRIDLSQLGGD

Methods for generating a Cas9 protein (or a fragment thereof) having an inactive DNA cleavage domain are known (See, e.g., the Examples; and Jinek et al., Science. 337:816-821 (2012); Qi et al., “Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression” (2013) Cell. 28; 152(5):1173-83, the entire contents of each of which are incorporated herein by reference). For example, the DNA cleavage domain of Cas9 is known to include two subdomains, the HNH nuclease subdomain and the RuvC1 subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, whereas the RuvC1 subdomain cleaves the non-complementary strand. Mutations within these subdomains can silence the nuclease activity of Cas9. For example, the mutations D10A and H840A completely inactivate the nuclease activity of S. pyogenes Cas9 (See e.g., the Examples; and Jinek et al., Science. 337:816-821 (2012); Qi et al., Cell. 28; 152(5):1173-83 (2013)). In some embodiments, proteins comprising fragments of Cas9 are provided. For example, in some embodiments, a protein comprises one of two Cas9 domains: (1) the gRNA binding domain of Cas9; or (2) the DNA cleavage domain of Cas9. In some embodiments, proteins comprising Cas9 or fragments thereof are referred to as “Cas9 variants.” A Cas9 variant shares homology to Cas9, or a fragment thereof. For example a Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% to wild type Cas9. In some embodiments, the Cas9 variant comprises a fragment of Cas9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least

(SEQ ID NO: 1)
ATGGATAAGAAATACTCAATAGGCTTAGATATCGGCACAAATAGCGTCGG
ATGGGCGGTGATCACTGATGATTATAAGGTTCCGTCTAAAAAGTTCAAGG
TTCTGGGAAATACAGACCGCCACAGTATCAAAAAAAATCTTATAGGGGCT
CTTTTATTTGGCAGTGGAGAGACAGCGGAAGCGACTCGTCTCAAACGGAC
AGCTCGTAGAAGGTATACACGTCGGAAGAATCGTATTTGTTATCTACAGG
AGATTTTTTCAAATGAGATGGCGAAAGTAGATGATAGTTTCTTTCATCGA
CTTGAAGAGTCTTTTTTGGTGGAAGAAGACAAGAAGCATGAACGTCATCC
TATTTTTGGAAATATAGTAGATGAAGTTGCTTATCATGAGAAATATCCAA
CTATCTATCATCTGCGAAAAAAATTGGCAGATTCTACTGATAAAGCGGAT
TTGCGCTTAATCTATTTGGCCTTAGCGCATATGATTAAGTTTCGTGGTCA
TTTTTTGATTGAGGGAGATTTAAATCCTGATAATAGTGATGTGGACAAAC
TATTTATCCAGTTGGTACAAATCTACAATCAATTATTTGAAGAAAACCCT
ATTAACGCAAGTAGAGTAGATGCTAAAGCGATTCTTTCTGCACGATTGAG
TAAATCAAGACGATTAGAAAATCTCATTGCTCAGCTCCCCGGTGAGAAGA
GAAATGGCTTGTTTGGGAATCTCATTGCTTTGTCATTGGGATTGACCCCT
AATTTTAAATCAAATTTTGATTTGGCAGAAGATGCTAAATTACAGCTTTC
AAAAGATACTTACGATGATGATTTAGATAATTTATTGGCGCAAATTGGAG
ATCAATATGCTGATTTGTTTTTGGCAGCTAAGAATTTATCAGATGCTATT
TTACTTTCAGATATCCTAAGAGTAAATAGTGAAATAACTAAGGCTCCCCT
ATCAGCTTCAATGATTAAGCGCTACGATGAACATCATCAAGACTTGACTC
TTTTAAAAGCTTTAGTTCGACAACAACTTCCAGAAAAGTATAAAGAAATC
TTTTTTGATCAATCAAAAAACGGATATGCAGGTTATATTGATGGGGGAGC
TAGCCAAGAAGAATTTTATAAATTTATCAAACCAATTTTAGAAAAAATGG
ATGGTACTGAGGAATTATTGGTGAAACTAAATCGTGAAGATTTGCTGCGC
AAGCAACGGACCTTTGACAACGGCTCTATTCCCCATCAAATTCACTTGGG
TGAGCTGCATGCTATTTTGAGAAGACAAGAAGACTTTTATCCATTTTTAA
AAGACAATCGTGAGAAGATTGAAAAAATCTTGACTTTTCGAATTCCTTAT
TATGTTGGTCCATTGGCGCGTGGCAATAGTCGTTTTGCATGGATGACTCG
GAAGTCTGAAGAAACAATTACCCCATGGAATTTTGAAGAAGTTGTCGATA
AAGGTGCTTCAGCTCAATCATTTATTGAACGCATGACAAACTTTGATAAA
AATCTTCCAAATGAAAAAGTACTACCAAAACATAGTTTGCTTTATGAGTA
TTTTACGGTTTATAACGAATTGACAAAGGTCAAATATGTTACTGAGGGAA
TGCGAAAACCAGCATTTCTTTCAGGTGAACAGAAGAAAGCCATTGTTGAT
TTACTCTTCAAAACAAATCGAAAAGTAACCGTTAAGCAATTAAAAGAAGA
TTATTTCAAAAAAATAGAATGTTTTGATAGTGTTGAAATTTCAGGAGTTG
AAGATAGATTTAATGCTTCATTAGGCGCCTACCATGATTTGCTAAAAATT
ATTAAAGATAAAGATTTTTTGGATAATGAAGAAAATGAAGATATCTTAGA
GGATATTGTTTTAACATTGACCTTATTTGAAGATAGGGGGATGATTGAGG
AAAGACTTAAAACATATGCTCACCTCTTTGATGATAAGGTGATGAAACAG
CTTAAACGTCGCCGTTATACTGGTTGGGGACGTTTGTCTCGAAAATTGAT
TAATGGTATTAGGGATAAGCAATCTGGCAAAACAATATTAGATTTTTTGA
AATCAGATGGTTTTGCCAATCGCAATTTTATGCAGCTGATCCATGATGAT
AGTTTGACATTTAAAGAAGATATTCAAAAAGCACAGGTGTCTGGACAAGG
CCATAGTTTACATGAACAGATTGCTAACTTAGCTGGCAGTCCTGCTATTA
AAAAAGGTATTTTACAGACTGTAAAAATTGTTGATGAACTGGTCAAAGTA
ATGGGGCATAAGCCAGAAAATATCGTTATTGAAATGGCACGTGAAAATCA
GACAACTCAAAAGGGCCAGAAAAATTCGCGAGAGCGTATGAAACGAATCG
AAGAAGGTATCAAAGAATTAGGAAGTCAGATTCTTAAAGAGCATCCTGTT
GAAAATACTCAATTGCAAAATGAAAAGCTCTATCTCTATTATCTACAAAA
TGGAAGAGACATGTATGTGGACCAAGAATTAGATATTAATCGTTTAAGTG
ATTATGATGTCGATCACATTGTTCCACAAAGTTTCATTAAAGACGATTCA
ATAGACAATAAGGTACTAACGCGTTCTGATAAAAATCGTGGTAAATCGGA
TAACGTTCCAAGTGAAGAAGTAGTCAAAAAGATGAAAAACTATTGGAGAC
AACTTCTAAACGCCAAGTTAATCACTCAACGTAAGTTTGATAATTTAACG
AAAGCTGAACGTGGAGGTTTGAGTGAACTTGATAAAGCTGGTTTTATCAA
ACGCCAATTGGTTGAAACTCGCCAAATCACTAAGCATGTGGCACAAATTT
TGGATAGTCGCATGAATACTAAATACGATGAAAATGATAAACTTATTCGA
GAGGTTAAAGTGATTACCTTAAAATCTAAATTAGTTTCTGACTTCCGAAA
AGATTTCCAATTCTATAAAGTACGTGAGATTAACAATTACCATCATGCCC
ATGATGCGTATCTAAATGCCGTCGTTGGAACTGCTTTGATTAAGAAATAT
CCAAAACTTGAATCGGAGTTTGTCTATGGTGATTATAAAGTTTATGATGT
TCGTAAAATGATTGCTAAGTCTGAGCAAGAAATAGGCAAAGCAACCGCAA
AATATTTCTTTTACTCTAATATCATGAACTTCTTCAAAACAGAAATTACA
CTTGCAAATGGAGAGATTCGCAAACGCCCTCTAATCGAAACTAATGGGGA
AACTGGAGAAATTGTCTGGGATAAAGGGCGAGATTTTGCCACAGTGCGCA
AAGTATTGTCCATGCCCCAAGTCAATATTGTCAAGAAAACAGAAGTACAG
ACAGGCGGATTCTCCAAGGAGTCAATTTTACCAAAAAGAAATTCGGACAA
GCTTATTGCTCGTAAAAAAGACTGGGATCCAAAAAAATATGGTGGTTTTG
ATAGTCCAACGGTAGCTTATTCAGTCCTAGTGGTTGCTAAGGTGGAAAAA
GGGAAATCGAAGAAGTTAAAATCCGTTAAAGAGTTACTAGGGATCACAAT
TATGGAAAGAAGTTCCTTTGAAAAAAATCCGATTGACTTTTTAGAAGCTA

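The substitution labels used in the passage above (original residue, position, new residue, as in D10A and H840A) can be applied to a sequence string mechanically. The following is a minimal sketch, not a method from this patent: the function name `apply_substitutions` is hypothetical, and the 13-residue test string is simply the translation of the first 13 codons of SEQ ID NO:1 as printed here (ATG GAT AAG AAA TAC TCA ATA GGC TTA GAT ATC GGC ACA), used only as a toy input. The sketch checks that the residue named in each label is actually present at the stated 1-based position before replacing it.

```python
import re

# A label such as "D10A": original residue, 1-based position, replacement residue.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(protein: str, mutations: list[str]) -> str:
    """Return a copy of `protein` with each point substitution applied.

    Raises ValueError if a label is malformed, the position is out of range,
    or the residue at that position is not the one the label names.
    """
    residues = list(protein)
    for label in mutations:
        match = _MUTATION.match(label)
        if match is None:
            raise ValueError(f"unrecognized mutation label: {label!r}")
        original, replacement = match.group(1), match.group(3)
        index = int(match.group(2)) - 1  # labels are 1-based
        if not 0 <= index < len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[index] != original:
            raise ValueError(
                f"{label}: expected {original} at position {index + 1}, "
                f"found {residues[index]}"
            )
        residues[index] = replacement
    return "".join(residues)


# Toy input: translation of the first 13 codons of SEQ ID NO:1 (illustrative only).
n_terminus = "MDKKYSIGLDIGT"
print(apply_substitutions(n_terminus, ["D10A"]))  # MDKKYSIGLAIGT
```

Applying H840A in the same way would need the full-length protein sequence; the check on the original residue guards against off-by-one numbering when a sequence carries an extra tag or lacks the initiator methionine.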
-continued
DNEOKOLFVEOHKHYLDEIIEOISEFSKRVILADANLDKVLSAYNKHRDK
PIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEWLDATLIHO
SITGLYETRIDLSQLGGD

In some embodiments, Cas9 refers to Cas9 from: Corynebacterium ulcerans (NCBI Refs: NC_015683.1, NC_017317.1); Corynebacterium diphtheria (NCBI Refs: NC_016782.1, NC_016786.1); Spiroplasma syrphidicola (NCBI Ref: NC_021284.1); Prevotella intermedia (NCBI Ref: NC_017861.1); Spiroplasma taiwanense (NCBI Ref: NC_021846.1); Streptococcus iniae (NCBI Ref: NC_021314.1); Belliella baltica (NCBI Ref: NC_018010.1); Psychroflexus torquis (NCBI Ref: NC_018721.1); Streptococcus thermophilus (NCBI Ref: YP_820832.1), Listeria innocua (NCBI Ref: NP_472073.1), Campylobacter jejuni (NCBI Ref: YP_002344900.1) or Neisseria meningitidis (NCBI Ref: YP_002342100.1).

The terms “conjugating,” “conjugated,” and “conjugation” refer to an association of two entities, for example, of two molecules such as two proteins, two domains (e.g., a binding domain and a cleavage domain), or a protein and an agent, e.g., a protein binding domain and a small molecule. In some aspects, the association is between a protein (e.g., RNA-programmable nuclease) and a nucleic acid (e.g., a guide RNA). The association can be, for example, via a direct or indirect (e.g., via a linker) covalent linkage. In some embodiments, the association is covalent. In some embodiments, two molecules are conjugated via a linker connecting both molecules. For example, in some embodiments where two proteins are conjugated to each other, e.g., a binding domain and a cleavage domain of an engineered nuclease, to form a protein fusion, the two proteins may be conjugated via a polypeptide linker, e.g., an amino acid sequence connecting the C-terminus of one protein to the N-terminus of the other protein.

The term “consensus sequence,” as used herein in the context of nucleic acid sequences, refers to a calculated sequence representing the most frequent nucleotide residues found at each position in a plurality of similar sequences. Typically, a consensus sequence is determined by sequence alignment in which similar sequences are compared to each other and similar sequence motifs are calculated. In the context of nuclease target site sequences, a consensus sequence of a nuclease target site may, in some embodiments, be the sequence most frequently bound, or bound with the highest affinity, by a given nuclease. In the context of recombinase target site sequences, a consensus sequence of a recombinase target site may, in some embodiments, be the sequence most frequently bound, or bound with the highest affinity, by a given recombinase.

The term “engineered,” as used herein refers to a protein molecule, a nucleic acid, complex, substance, or entity that has been designed, produced, prepared, synthesized, and/or manufactured by a human. Accordingly, an engineered product is a product that does not occur in nature.

The term “effective amount,” as used herein, refers to an amount of a biologically active agent that is sufficient to elicit a desired biological response. For example, in some embodiments, an effective amount of a nuclease may refer to the amount of the nuclease that is sufficient to induce cleavage of a target site specifically bound and cleaved by the nuclease. In some embodiments, an effective amount of a recombinase may refer to the amount of the recombinase that is sufficient to induce recombination at a target site specifically bound and recombined by the recombinase. As will be appreciated by the skilled artisan, the effective amount of an agent, e.g., a nuclease, a recombinase, a hybrid protein, a fusion protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide, may vary depending on various factors as, for example, on the desired biological response, the specific allele, genome, target site, cell, or tissue being targeted, and the agent being used.

The term “homologous,” as used herein is an art-understood term that refers to nucleic acids or polypeptides that are highly related at the level of nucleotide and/or amino acid sequence. Nucleic acids or polypeptides that are homologous to each other are termed “homologues.” Homology between two sequences can be determined by sequence alignment methods known to those of skill in the art. In accordance with the invention, two sequences are considered to be homologous if they are at least about 50-60% identical, e.g., share identical residues (e.g., amino acid residues) in at least about 50-60% of all residues comprised in one or the other sequence, at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical, for at least one stretch of at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 120, at least 150, or at least 200 amino acids.

The term “linker,” as used herein, refers to a chemical group or a molecule linking two adjacent molecules or moieties, e.g., a binding domain (e.g., dCas9) and a cleavage domain of a nuclease (e.g., FokI). In some embodiments, a linker joins a nuclear localization signal (NLS) domain to another protein (e.g., a Cas9 protein or a nuclease or recombinase or a fusion thereof). In some embodiments, a linker joins a gRNA binding domain of an RNA-programmable nuclease and the catalytic domain of a recombinase. In some embodiments, a linker joins a dCas9 and a recombinase. Typically, the linker is positioned between, or flanked by, two groups, molecules, or other moieties and connected to each one via a covalent bond, thus connecting the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is a peptide linker. In some embodiments, the peptide linker is any stretch of amino acids having at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 40, at least 50, or more amino acids. In some embodiments, the peptide linker comprises repeats of the tri-peptide Gly-Gly-Ser, e.g., comprising the sequence (GGS)n, wherein n represents at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more repeats. In some embodiments, the linker comprises the sequence (GGS) (SEQ ID NO:15). In some embodiments, the peptide linker is the 16 residue “XTEN” linker, or a variant thereof (See, e.g., the Examples; and Schellenberger et al. A recombinant polypeptide extends the in vivo half-life of peptides and proteins in a tunable manner. Nat. Biotechnol. 27, 1186-1190 (2009)). In some embodiments, the XTEN linker comprises the sequence SGSETPGTSESATPES (SEQ ID NO:16), SGSETPGTSESA (SEQ ID NO:17), or SGSETPGTSESATPEGGSGGS (SEQ ID NO:18). In some embodiments, the peptide linker is any linker as provided in FIG. 12A, for example, one or more selected from VPFLLEPDNINGKTC (SEQ ID NO:19), GSAGSAAGSGEF (SEQ ID NO:20), SIVAQLSRPDPA (SEQ ID NO:21), MKIIEQLPSA (SEQ ID NO:22), VRHKLKRVGS (SEQ ID NO:23), GHGTGSTGSGSS (SEQ ID NO:24), MSRPDPA (SEQ ID NO:25); or GGSM (SEQ ID NO:301).
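The definition of “homologous” given in this section combines an identity threshold with a minimum stretch length (for example, at least about 70% identical over at least one stretch of at least 20 amino acids). A minimal sketch of such a check follows. It rests on several assumptions that are not part of the patent text: the two sequences are already aligned to equal length, gap characters are written as `-` and never count as identical, and only windows of exactly the minimum length are examined, which is a simplification of “at least one stretch of at least” that length. The function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions carrying the same residue in both sequences.

    Both inputs must be pre-aligned to the same, non-zero length. A gap ('-')
    never counts as a match.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def has_identical_stretch(a: str, b: str,
                          min_identity: float = 70.0,
                          min_length: int = 20) -> bool:
    """True if some window of `min_length` aligned positions reaches `min_identity`.

    Simplification: only windows of exactly `min_length` positions are scanned.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    for start in range(len(a) - min_length + 1):
        window_a = a[start:start + min_length]
        window_b = b[start:start + min_length]
        if percent_identity(window_a, window_b) >= min_identity:
            return True
    return False


# Toy sequences: identical over the first 20 positions, different afterwards.
first = "A" * 30
second = "A" * 20 + "C" * 10
print(has_identical_stretch(first, second))  # True
```

In practice the alignment itself would come from an established alignment tool; this sketch covers only the thresholding step applied after alignment.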
The term “mutation,” as used herein, refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art, and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)).

The term “nuclease,” as used herein, refers to an agent, for example, a protein, capable of cleaving a phosphodiester bond connecting two nucleotide residues in a nucleic acid molecule. In some embodiments, “nuclease” refers to a protein having an inactive DNA cleavage domain, such that the nuclease is incapable of cleaving a phosphodiester bond. In some embodiments, a nuclease is a protein, e.g., an enzyme that can bind a nucleic acid molecule and cleave a phosphodiester bond connecting nucleotide residues within the nucleic acid molecule. A nuclease may be an endonuclease, cleaving phosphodiester bonds within a polynucleotide chain, or an exonuclease, cleaving a phosphodiester bond at the end of the polynucleotide chain. In some embodiments, a nuclease is a site-specific nuclease, binding and/or cleaving a specific phosphodiester bond within a specific nucleotide sequence, which is also referred to herein as the “recognition sequence,” the “nuclease target site,” or the “target site.” In some embodiments, a nuclease is an RNA-guided (i.e., RNA-programmable) nuclease, which is associated with (e.g., binds to) an RNA (e.g., a guide RNA, “gRNA”) having a sequence that complements a target site, thereby providing the sequence specificity of the nuclease. In some embodiments, a nuclease recognizes a single-stranded target site, while in other embodiments, a nuclease recognizes a double-stranded target site, for example, a double-stranded DNA target site. The target sites of many naturally occurring nucleases, for example, many naturally occurring DNA restriction nucleases, are well known to those of skill in the art. In many cases, a DNA nuclease, such as EcoRI, HindIII, or BamHI, recognizes a palindromic, double-stranded DNA target site of 4 to 10 base pairs in length, and cuts each of the two DNA strands at a specific position within the target site. Some endonucleases cut a double-stranded nucleic acid target site symmetrically, i.e., cutting both strands at the same position so that the ends comprise base-paired nucleotides, also referred to herein as blunt ends. Other endonucleases cut a double-stranded nucleic acid target site asymmetrically, i.e., cutting each strand at a different position so that the ends comprise unpaired nucleotides. Unpaired nucleotides at the end of a double-stranded DNA molecule are also referred to as “overhangs,” e.g., as “5'-overhang” or as “3'-overhang,” depending on whether the unpaired nucleotide(s) form(s) the 5' or the 3' end of the respective DNA strand. Double-stranded DNA molecule ends ending with unpaired nucleotide(s) are also referred to as sticky ends, as they can “stick to” other double-stranded DNA molecule ends comprising complementary unpaired nucleotide(s). A nuclease protein typically comprises a “binding domain” that mediates the interaction of the protein with the nucleic acid substrate, and also, in some cases, specifically binds to a target site, and a “cleavage domain” that catalyzes the cleavage of the phosphodiester bond within the nucleic acid backbone. In some embodiments a nuclease protein can bind and cleave a nucleic acid molecule in a monomeric form, while, in other embodiments, a nuclease protein has to dimerize or multimerize in order to cleave a target nucleic acid molecule. Binding domains and cleavage domains of naturally occurring nucleases, as well as modular binding domains and cleavage domains that can be fused to create nucleases binding specific target sites, are well known to those of skill in the art. For example, the binding domain of RNA-programmable nucleases (e.g., Cas9), or a Cas9 protein having an inactive DNA cleavage domain, can be used as a binding domain (e.g., that binds a gRNA to direct binding to a target site) to specifically bind a desired target site, and fused or conjugated to a cleavage domain, for example, the cleavage domain of FokI, to create an engineered nuclease cleaving the target site. In some embodiments, Cas9 fusion proteins provided herein comprise the cleavage domain of FokI, and are therefore referred to as “fCas9” proteins. In some embodiments, the cleavage domain of FokI, e.g., in a fCas9 protein, corresponds to, or comprises in part or whole, the amino acid sequence (or variants thereof) set forth as SEQ ID NO:6, below. In some embodiments, variants or homologues of the FokI cleavage domain include any variant or homologue capable of dimerizing (e.g., as part of a fCas9 fusion protein) with another FokI cleavage domain at a target site in a target nucleic acid, thereby resulting in cleavage of the target nucleic acid. In some embodiments, variants of the FokI cleavage domain (e.g., variants of SEQ ID NO:6) are provided which are at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to SEQ ID NO:6. In some embodiments, variants of the FokI cleavage domain (e.g., variants of SEQ ID NO:6) are provided having an amino acid sequence which is shorter, or longer than SEQ ID NO:6, by about 5 amino acids, by about 10 amino acids, by about 15 amino acids, by about 20 amino acids, by about 25 amino acids, by about 30 amino acids, by about 40 amino acids, by about 50 amino acids, by about 75 amino acids, by about 100 amino acids or more.

Cleavage Domain of FokI:

(SEQ ID NO: 6)
GSOLVKSELEEKKSELRHKLKYWPHEYIELIEIARNSTODRILEMKVMEF
FMKWYGYRGKHLGGSRKPDGAIYTVGSPIDYGVIVDTKAYSGGYNLPIGO
ADEMORYWEENOTRNKHINPNEWWKVYPSSVTEFKFLFVSGHFKGNYKAO
LTRLINHITNCNGAWLSWEELLIGGEMIKAGTLTLEEWRRKFNNGEINF

The terms “nucleic acid” and “nucleic acid molecule,” as used herein, refer to a compound comprising a nucleobase and an acidic moiety, e.g., a nucleoside, a nucleotide, or a polymer of nucleotides. Typically, polymeric nucleic acids, e.g., nucleic acid molecules comprising three or more nucleotides are linear molecules, in which adjacent nucleotides are linked to each other via a phosphodiester linkage. In some embodiments, “nucleic acid” refers to individual nucleic acid residues (e.g. nucleotides and/or nucleosides). In some embodiments, “nucleic acid” refers to an oligonucleotide chain comprising three or more individual nucleotide residues. As used herein, the terms “oligonucleotide” and “polynucleotide” can be used interchangeably to refer to a polymer of nucleotides (e.g., a string of at least three nucleotides). In some embodiments, “nucleic acid” encompasses RNA as well as single and/or double-stranded DNA. Nucleic acids may be naturally occurring, for example, in the context of a genome, a transcript, an mRNA, tRNA, rRNA, siRNA, snRNA, gRNA, a plasmid, cosmid, chromosome, chromatid, or other naturally occurring nucleic acid molecule. On the other hand, a nucleic acid molecule may be a non-naturally occurring molecule, e.g., a recombinant DNA or RNA, an artificial chromosome, an engineered genome, or fragment thereof, or a synthetic DNA, RNA, DNA/RNA hybrid, or including non-naturally occurring nucleotides or nucleosides. Furthermore, the terms “nucleic acid,” “DNA,” “RNA,” and/or similar terms include nucleic acid analogs, i.e. analogs having other than a phosphodiester backbone. Nucleic acids can be purified from natural sources, produced using recombinant expression systems and optionally purified, chemically synthesized, etc. Where appropriate, e.g., in the case of chemically synthesized molecules, nucleic acids can comprise nucleoside analogs such as analogs having chemically modified bases or sugars, and backbone modifications. A nucleic acid sequence is presented in the 5' to 3' direction unless otherwise indicated. In some embodiments, a nucleic acid is or comprises natural nucleosides (e.g. adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine); nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, 2-aminoadenosine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 2-aminoadenosine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine); chemically modified bases; biologically modified bases (e.g., methylated bases); intercalated bases; modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose); and/or modified phosphate groups (e.g., phosphorothioates and 5'-N-phosphoramidite linkages).

The term “pharmaceutical composition,” as used herein, refers to a composition that can be administered to a subject in the context of treatment and/or prevention of a disease or disorder. In some embodiments, a pharmaceutical composition comprises an active ingredient, e.g., a nuclease or recombinase fused to a Cas9 protein, or fragment thereof (or a nucleic acid encoding such a fusion), and optionally a pharmaceutically acceptable excipient. In some embodiments, a pharmaceutical composition comprises inventive Cas9 variant/fusion (e.g., fCas9) protein(s) and gRNA(s) suitable for targeting the Cas9 variant/fusion protein(s) to a target nucleic acid. In some embodiments, the target nucleic acid is a gene. In some embodiments, the target nucleic acid is an allele associated with a disease, whereby the allele is cleaved by the action of the Cas9 variant/fusion protein(s). In some embodiments, the allele is an allele of the CLTA gene, the EMX gene, the HBB gene, the VEGF gene, or the CCR5 gene. See e.g., the Examples; FIGS. 7, 8, 13, 14, 15, 17 and 19.

The term “proliferative disease,” as used herein, refers to any disease in which cell or tissue homeostasis is disturbed in that a cell or cell population exhibits an abnormally elevated proliferation rate. Proliferative diseases include hyperproliferative diseases, such as pre-neoplastic hyperplastic conditions and neoplastic diseases. Neoplastic diseases are characterized by an abnormal proliferation of cells and include both benign and malignant neoplasias. Malignant neoplasia is also referred to as cancer. In some embodiments, the compositions and methods provided herein are useful for treating a proliferative disease. For example, in some embodiments, pharmaceutical compositions are provided comprising Cas9 (e.g., fCas9) protein(s) and gRNA(s) suitable for targeting the Cas9 protein(s) to a VEGF allele, whereby the allele is inactivated by the action of the Cas9 protein(s). See, e.g., the Examples.

The terms “protein,” “peptide,” and “polypeptide” are used interchangeably herein, and refer to a polymer of amino acid residues linked together by peptide (amide) bonds. The terms refer to a protein, peptide, or polypeptide of any size, structure, or function. Typically, a protein, peptide, or polypeptide will be at least three amino acids long. A protein, peptide, or polypeptide may refer to an individual protein or a collection of proteins. One or more of the amino acids in a protein, peptide, or polypeptide may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or other modification, etc. A protein, peptide, or polypeptide may also be a single molecule or may be a multi-molecular complex. A protein, peptide, or polypeptide may be just a fragment of a naturally occurring protein or peptide. A protein, peptide, or polypeptide may be naturally occurring, recombinant, or synthetic, or any combination thereof. The term “fusion protein” as used herein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) protein, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.

The terms “RNA-programmable nuclease” and “RNA-guided nuclease” are used interchangeably herein and refer to a nuclease that forms a complex with (e.g., binds or associates with) one or more RNA that is not a target for cleavage. In some embodiments, an RNA-programmable nuclease, when in a complex with an RNA, may be referred to as a nuclease:RNA complex. Typically, the bound RNA(s) is referred to as a guide RNA (gRNA). gRNAs can exist as a complex of two or more RNAs, or as a single RNA molecule. gRNAs that exist as a single RNA molecule may be referred to as single-guide RNAs (sgRNAs), though “gRNA” is used interchangeably to refer to guide RNAs that exist as either single molecules or as a complex of two or more molecules. Typically, gRNAs that exist as single RNA species comprise two domains: (1) a domain that shares homology to a target nucleic acid (e.g., and directs binding of a Cas9 complex to the target); and (2) a domain that binds a Cas9 protein. In some embodiments, domain (2) corresponds to a sequence known as a tracrRNA, and comprises a stem-loop structure. For example, in some embodiments, domain (2) is homologous to a tracrRNA as depicted in FIG. 1E of Jinek et al., Science 337:816-821 (2012), the entire contents of which is incorporated herein by reference. Other examples of gRNAs (e.g., those including domain 2) can be found in U.S. Provisional Patent Application, U.S. Ser. No. 61/874,682, filed Sep. 6, 2013, entitled “Switchable Cas9 Nucleases And Uses Thereof;” U.S. Provisional Patent Application, U.S. Ser. No. 61/874,746, filed Sep. 6, 2013, entitled “Delivery System For Functional Nucleases;” PCT Application WO 2013/176722, filed Mar. 15, 2013, entitled “Methods and Compositions for RNA-Directed Target DNA Modification and for RNA-Directed Modulation of Transcription;” and PCT Application WO 2013/142578, filed Mar. 20, 2013, entitled “RNA-Directed DNA Cleavage by the Cas9-crRNA Complex,” the entire contents of each are hereby incorporated by reference in their entirety. Still other examples of gRNAs and gRNA structure are provided herein. See e.g., the Examples. In some embodiments, a gRNA comprises two or more of domains (1) and (2), and may be referred to as an “extended gRNA.” For example, an extended gRNA will e.g., bind two or more Cas9 proteins and bind a target nucleic acid at two or more distinct regions, as described herein. The gRNA comprises a nucleotide sequence that complements a target site, which mediates binding of the nuclease/RNA complex to said target site, providing the sequence specificity of the nuclease:RNA complex. In some embodiments, the RNA-programmable nuclease is the (CRISPR-associated system) Cas9 endonuclease, for example Cas9 (Csn1) from Streptococcus pyogenes (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J. J., McShan W. M., Ajdic D. J., Savic D. J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A. N., Kenton S., Lai H. S., Lin S. P., Qian Y., Jia H. G., Najar F. Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S. W., Roe B. A., McLaughlin R. E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C. M., Gonzales K., Chao Y., Pirzada Z. A., Eckert M. R., Vogel J., Charpentier E., Nature 471:602-607 (2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821 (2012), the entire contents of each of which are incorporated herein by reference).

Because RNA-programmable nucleases (e.g., Cas9) use RNA:DNA hybridization to determine target DNA cleavage sites, these proteins are able to cleave, in principle, any sequence specified by the guide RNA. Methods of using RNA-programmable nucleases, such as Cas9, for site-specific cleavage (e.g., to modify a genome) are known in the art (see e.g., Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013); Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823-826 (2013); Hwang, W. Y. et al. Efficient genome editing in zebrafish using a CRISPR-Cas system. Nature biotechnology 31, 227-229 (2013); Jinek, M. et al. RNA-programmed genome editing in human cells. eLife 2, e00471 (2013); Dicarlo, J. E. et al. Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic acids research (2013); Jiang, W. et al. RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nature biotechnology 31, 233-239 (2013); the entire contents of each of which are incorporated herein by reference).

The term “recombinase,” as used herein, refers to a site-specific enzyme that mediates the recombination of DNA between recombinase recognition sequences, which results in the excision, integration, inversion, or exchange (e.g., translocation) of DNA fragments between the recombinase recognition sequences. Recombinases can be classified into two distinct families: serine recombinases (e.g., resolvases and invertases) and tyrosine recombinases (e.g., integrases). Examples of serine recombinases include, without limitation, Hin, Gin, Tn3, β-six, CinH, ParA, γδ, Bxb1, φC31, TP901, TG1, φBT1, R4, φRV1, φFC1, MR11, A118, U153, and gp29. Examples of tyrosine recombinases include, without limitation, Cre, FLP, R, Lambda, HK101, HK022, and pSAM2. The serine and tyrosine recombinase names stem from the conserved nucleophilic amino acid residue that the recombinase uses to attack the DNA and which becomes covalently linked to the DNA during strand exchange. Recombinases have numerous applications, including the creation of gene knockouts/knock-ins and gene therapy applications. See, e.g., Brown et al., “Serine recombinases as tools for genome engineering.” Methods. 2011; 53(4):372-9; Hirano et al., “Site-specific recombinases as tools for heterologous gene integration.” Appl. Microbiol. Biotechnol. 2011; 92(2):227-39; Chavez and Calos, “Therapeutic applications of the φC31 integrase system.” Curr. Gene Ther. 2011; 11(5):375-81; Turan and Bode, “Site-specific recombinases: from tag-and-target- to tag-and-exchange-based genomic modifications.” FASEB J. 2011; 25(12):4088-107; Venken and Bellen, “Genome-wide manipulations of Drosophila melanogaster with transposons, Flp recombinase, and φC31 integrase.” Methods Mol. Biol. 2012; 859:203-28; Murphy, “Phage recombinases and their applications.” Adv. Virus Res. 2012; 83:367-414; Zhang et al., “Conditional gene manipulation: Creating a new biological era.” J. Zhejiang Univ. Sci. B. 2012; 13(7):511-24; Karpenshif and Bernstein, “From yeast to mammals: recent advances in genetic control of homologous recombination.” DNA Repair (Amst). 2012; 1; 11(10):781-8; the entire contents of each are hereby incorporated by reference in their entirety. The recombinases provided herein are not meant to be exclusive examples of recombinases that can be used in embodiments of the invention. The methods and compositions of the invention can be expanded by mining databases for new orthogonal recombinases or designing synthetic recombinases with defined DNA specificities (See, e.g., Groth et al., “Phage integrases: biology and applications.” J. Mol. Biol. 2004; 335, 667-678; Gordley et al., “Synthesis of programmable integrases.” Proc. Natl. Acad. Sci. USA. 2009; 106, 5053-5058; the entire contents of each are hereby incorporated by reference in their entirety). Other examples of recombinases that are useful in the methods and compositions described herein are known to those of skill in the art, and any new recombinase that is discovered or generated is expected to be able to be used in the different embodiments of the invention. In some embodiments, the catalytic domains of a recombinase are fused to a nuclease-inactivated RNA-programmable nuclease (e.g., dCas9, or a fragment thereof), such that the recombinase domain does not comprise a nucleic acid binding domain or is unable to bind to a target nucleic acid (e.g., the recombinase domain is engineered such that it does not have specific DNA binding activity). Recombinases lacking DNA binding activity and methods for engineering such are known, and include those described by Klippel et al., “Isolation and characterisation of unusual gin mutants.” EMBO J. 1988; 7: 3983-3989; Burke et al., “Activating mutations of Tn3 resolvase marking interfaces important in recombination catalysis and its regulation.” Mol Microbiol. 2004; 51: 937-948; Olorunniji et al., “Synapsis and catalysis by activated Tn3 resolvase mutants.” Nucleic Acids Res. 2008; 36: 7181-7191; Rowland et al., “Regulatory mutations in Sin recombinase support a structure-based model of the synaptosome.” Mol Microbiol. 2009; 74: 282-298; Akopian et al., “Chimeric recombinases with designed DNA sequence recognition.” Proc Natl Acad Sci USA. 2003; 100: 8688-8691; Gordley et al., “Evolution of programmable zinc finger-recombinases with activity in human cells.” J Mol Biol. 2007; 367: 802-813; Gordley et al., “Synthesis of programmable integrases.” Proc Natl Acad Sci USA. 2009; 106: 5053-5058; Arnold et al., “Mutants of Tn3 resolvase which do not require accessory binding sites for recombination activity.” EMBO J. 1999; 18: 1407-1414; Gaj et al., “Structure-guided reprogramming of serine recombinase DNA sequence specificity.” Proc Natl Acad Sci USA. 2011; 108(2):498-503; and Proudfoot et al., “Zinc finger recombinases with adaptable DNA sequence specificity.” PLoS One. 2011; 6(4):e19537; the entire contents of each are hereby incorporated by reference. For example, serine recombinases of the resolvase-invertase group, e.g., Tn3 and γδ resolvases and the Hin and Gin invertases, have modular structures with autonomous catalytic and DNA-binding domains (See, e.g., Grindley et al., “Mechanism of site-specific recombination.” Ann Rev Biochem. 2006; 75: 567-605, the entire contents of which are incorporated by reference). The catalytic domains of these recombinases are thus amenable to being recombined with nuclease-inactivated RNA-programmable nucleases (e.g., dCas9, or a fragment thereof) as described herein, e.g., following the isolation of activated recombinase mutants which do not require any accessory factors (e.g., DNA binding activities) (See, e.g., Klippel et al., “Isolation and characterisation of unusual gin mutants.” EMBO J. 1988; 7: 3983-3989; Burke et al., “Activating mutations of Tn3 resolvase marking interfaces important in recombination catalysis and its regulation.” Mol Microbiol. 2004; 51: 937-948; Olorunniji et al., “Synapsis and catalysis by activated Tn3 resolvase mutants.” Nucleic Acids Res. 2008; 36: 7181-7191; Rowland et al., “Regulatory mutations in Sin recombinase support a structure-based model of the synaptosome.” Mol Microbiol. 2009; 74: 282-298; Akopian et al., “Chimeric recombinases with designed DNA sequence recognition.” Proc Natl Acad Sci USA. 2003; 100: 8688-8691). Additionally, many other natural serine recombinases having an N-terminal catalytic domain and a C-terminal DNA binding domain are known (e.g., phiC31 integrase, TnpX transposase, IS607 transposase), and their catalytic domains can be co-opted to engi

molecule, are modified by the action of a recombinase protein (e.g., an inventive recombinase fusion protein provided herein). Recombination can result in, inter alia, the insertion, inversion, excision or translocation of nucleic acids, e.g., in or between one or more nucleic acid molecules.

The term “subject,” as used herein, refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, a goat, a cattle, a cat, or a dog. In some embodiments, the subject is a vertebrate, an amphibian, a reptile, a fish, an insect, a fly, or a nematode. In some embodiments, the subject is a research animal. In some embodiments, the subject is genetically engineered, e.g., a genetically engineered non-human subject. The subject may be of either sex and at any stage of development.

The terms “target nucleic acid,” and “target genome,” as used herein in the context of nucleases, refer to a nucleic acid molecule or a genome, respectively, that comprises at least one target site of a given nuclease. In the context of fusions comprising a (nuclease-inactivated) RNA-programmable nuclease and a recombinase domain, a “target nucleic acid” and a “target genome” refers to one or more nucleic acid molecule(s), or a genome, respectively, that comprises at least one target site.
In some embodiments, the target nucleic neer programmable site-specific recombinases as described acid(s) comprises at least two, at least three, or at least four herein (See, e.g., Smith et al., “Diversity in the serine recom target sites. In some embodiments, the target nucleic acid(s) binases.” Mol Microbiol. 2002; 44; 299-307, the entire con- 30 comprise four target sites. tents of which are incorporated by reference). Similarly, the The term “target site' refers to a sequence within a nucleic core catalytic domains of tyrosine recombinases (e.g., Cre, w acid molecule that is either (1) bound and cleaved by a integrase) are known, and can be similarly co-opted to engi nuclease (e.g., Cas9 fusion proteins provided herein), or (2) neer programmable site-specific recombinases as described bound and recombined (e.g., at or nearby the target site) by a herein (See, e.g., Guo et al., “Structure of Cre recombinase 35 recombinase (e.g., a dCas9-recombinase fusion protein pro complexed with DNA in a site-specific recombination syn vided herein). A target site may be single-stranded or double apse Nature. 1997: 389:40-46; Hartung et al., “Cre mutants stranded. In the context of RNA-guided (e.g., RNA-program with altered DNA binding properties.” J Biol Chem 1998: mable) nucleases (e.g., a protein dimer comprising a Cas9 273:22884-22891; Shaikh et al., “Chimeras of the Flp and gRNA binding domain and an active Cas9 DNA cleavage Cre recombinases: Tests of the mode of cleavage by Flp and 40 domain or other nuclease domain Such as FokI), a target site Cre. J Mol Biol. 2000: 302:27-48: Rongrong et al., “Effect of typically comprises a nucleotide sequence that is comple deletion mutation on the recombination activity of Cre mentary to the gRNA(s) of the RNA-programmable nuclease, recombinase.” Acta Biochim Pol. 
2005:52:541-544: Kilbride and a protospacer adjacent motif (PAM) at the 3' end adjacent et al., “Determinants of product topology in a hybrid Cre-Tm3 to the gRNA-complementary sequence(s). In some embodi resolvase site-specific recombination system.” J Mol Biol. 45 ments, such as those involving fas9, a target site can encom 2006; 355:185-195; Warren et al., “A chimeric cre recombi pass the particular sequences to which f(as9 monomers bind, nase with regulated directionality.” Proc Natl AcadSci USA. and/or the intervening sequence between the bound mono 2008 105:18278-18283: Van Duyne, “Teaching Creto follow mers that are cleaved by the dimerized FokI domains (See directions.” Proc Natl AcadSci USA. 2009 Jan. 6: 106(1):4-5: e.g., the Examples; and FIGS. 1A, 6D). In the context of Numrych et al., “A comparison of the effects of single-base 50 fusions between RNA-guided (e.g., RNA-programmable, and triple-base changes in the integrase arm-type binding nuclease-inactivated) nucleases and a recombinase (e.g., a sites on the site-specific recombination of bacteriophage W.” catalytic domain of a recombinase), a target site typically Nucleic Acids Res. 1990; 18:3953-3959; Tirumalai et al., comprises a nucleotide sequence that is complementary to the “The recognition of core-type DNA sites by w integrase.” J gRNA of the RNA-programmable nuclease domain, and a Mol Biol. 1998: 279:513-527; Aihara et al., “A conforma- 55 protospacer adjacent motif (PAM) at the 3' end adjacent to the tional switch controls the DNA cleavage activity of inte gRNA-complementary sequence. For example, in some grase.” Mol Cell. 2003: 12:187-198: Biswas et al., “A struc embodiments, four recombinase monomers are coordinated tural basis for allosteric control of DNA recombination by J. to recombine a target nucleic acid(s), each monomer being integrase.” Nature. 
2005; 435:1059-1066; and Warren et al., fused to a (nuclease-inactivated) Cas9 protein guided by a “Mutations in the amino-terminal domain of w-integrase have 60 gRNA. In such an example, each Cas9 domain is guided by a differential effects on integrative and excisive recombina distinct gRNA to bind a target nucleic acid(s), thus the target tion.” Mol Microbiol. 2005; 55:1104-11 12; the entire con nucleic acid comprises four target sites, each site targeted by tents of each are incorporated by reference). a separate dCas9-recombinase fusion (thereby coordinating The term “recombine, or “recombination, in the context four recombinase monomers which recombine the target of a nucleic acid modification (e.g., a genomic modification), 65 nucleic acid(s)). For the RNA-guided nuclease Cas9 (or is used to refer to the process by which two or more nucleic gRNA-binding domain thereof) and inventive fusions of acid molecules, or two or more regions of a single nucleic acid Cas9, the target site may be, in some embodiments, 17-20 US 9,388,430 B2 29 30 base pairs plus a 3 PAM (e.g., NNN, wherein N ments, either or both half-sites are shorter or longer than e.g., independently represents any nucleotide). Typically, the first a typical region targeted by Cas9, for example shorter or nucleotide of a PAM can be any nucleotide, while the two longer than 20 nucleotides. In some embodiments, the left downstream nucleotides are specified depending on the spe and right half sites comprise different nucleic acid sequences. cific RNA-guided nuclease. Exemplary target sites (e.g., In some embodiments involving inventive nucleases, the tar comprising a PAM) for RNA-guided nucleases, such as Cas9, get site is a sequence comprising three (3) RNA-guided are known to those of skill in the art and include, without nuclease target site sequences, for example, three sequences limitation, NNG, NGN, NAG, and NGG, wherein Nindepen corresponding to Cas9 target site sequences (See, e.g., FIG. 
dently represents any nucleotide. In addition, Cas9 nucleases 2C), in which the first and second, and second and third Cas9 from different species (e.g., S. thermophilus instead of S. 10 target site sequences are separated by a spacer sequence. In pyogenes) recognizes a PAM that comprises the sequence Some embodiments, the spacer sequence is at least 5, at least NGGNG. Additional PAM sequences are known, including, 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least but not limited to, NNAGAAW and NAAR (see, e.g., Esvelt 12, at least 13, at least 14, at least 15, at least 16, at least 17, and Wang, Molecular Systems Biology, 9:641 (2013), the at least 18, at least 19, at least 20, at least 25, at least 30, at least entire contents of which are incorporated herein by refer 15 35, at least 40, at least 45, at least 50, at least 60, at least 70, ence). In some aspects, the target site of an RNA-guided at least 80, at least 90, at least 100, at least 125, at least 150, nuclease, Such as, e.g., Cas9, may comprise the structure at least 175, at least 200, or at least 250 bp long. In some N-PAM), where each N is, independently, any nucleotide, embodiments, the spacer sequence is between approximately and Z is an integer between 1 and 50, inclusive. In some 15bp and approximately 25bp long. In some embodiments, embodiments, Z is at least 2, at least 3, at least 4, at least 5, at the spacer sequence is approximately 15bp long. In some least 6, at least 7, at least 8, at least 9, at least 10, at least 11, embodiments, the spacer sequence is approximately 25 bp at least 12, at least 13, at least 14, at least 15, at least 16, at least long. 17, at least 18, at least 19, at least 20, at least 25, at least 30, The term “Transcriptional Activator-Like Effector.” at least 35, at least 40, at least 45, or at least 50. 
In some (TALE) as used herein, refers to bacterial proteins comprising embodiments, Z is 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 25 a DNA binding domain, which contains a highly conserved 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,33, 34, 33-34 amino acid sequence comprising a highly variable 35,36, 37,38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50. two-amino acid motif (Repeat Variable Diresidue, RVD). The In some embodiments, Z is 20. In some embodiments, “target RVD motif determines binding specificity to a nucleic acid site' may also refer to a sequence within a nucleic acid mol sequence and can be engineered according to methods known ecule that is bound but not cleaved by a nuclease. For 30 to those of skill in the art to specifically bind a desired DNA example, certain embodiments described herein provide pro sequence (see, e.g., Miller, Jeffrey; et. al. (February 2011). “A teins comprising an inactive (or inactivated) Cas9 DNA TALE nuclease architecture for efficient genome editing. cleavage domain. Such proteins (e.g., when also including a Nature Biotechnology 29 (2): 143-8; Zhang, Feng; et. al. Cas9 RNA binding domain) are able to bind the target site (February 2011). “Efficient construction of sequence-specific specified by the gRNA; however, because the DNA cleavage 35 TAL effectors for modulating mammalian transcription” site is inactivated, the target site is not cleaved by the particu Nature Biotechnology 29 (2): 149-53; Gei?ler, R.; Scholze, lar protein. However, such proteins as described herein are H.; Hahn, S.; Streubel, J.; Bonas, U.: Behrens, S. E.; Boch, J. typically conjugated, fused, or bound to another protein (e.g., (2011), Shiu, Shin-Han. ed. “Transcriptional Activators of a nuclease) or molecule that mediates cleavage of the nucleic Human Genes with Programmable DNA-Specificity”. PLOS acid molecule. In other embodiments, such proteins are con 40 ONE 6 (5): e19509; Boch, Jens (February 2011). 
“TALEs of jugated, fused, or bound to a recombinase (or a catalytic genome targeting. Nature Biotechnology 29 (2): 135-6: domain of a recombinase), which mediates recombination of Boch, Jens; et. al. (December 2009). “Breaking the Code of the target nucleic acid. In some embodiments, the sequence DNA Binding Specificity of TAL-Type III Effectors”. Sci actually cleaved or recombined will depend on the protein ence 326 (5959): 1509-12; and Moscou, Matthew J.; Adam J. (e.g., nuclease or recombinase) or molecule that mediates 45 Bogdanove (December 2009). A Simple Cipher Governs cleavage or recombination of the nucleic acid molecule, and DNA Recognition by TAL Effectors' Science 326 (5959): in some cases, for example, will relate to the proximity or 1501; the entire contents of each of which are incorporated distance from which the inactivated Cas9 protein(s) is/are herein by reference). The simple relationship between amino bound. acid sequence and DNA recognition has allowed for the engi In the context of inventive proteins that dimerize (or mul 50 neering of specific DNA binding domains by selecting a timerize), for example, dimers of a protein comprising a combination of repeat segments containing the appropriate nuclease-inactivated Cas9 (or a Cas9 RNA binding domain) RVDS. and a DNA cleavage domain (e.g., FokI cleavage domain or The term “Transcriptional Activator-Like Element an active Cas9 cleavage domain), or fusions between a Nuclease.” (TALEN) as used herein, refers to an artificial nuclease-inactivated Cas9 (or a Cas9 gRNA binding domain) 55 nuclease comprising a transcriptional activator-like effector and a recombinase (or catalytic domain of a recombinase), a DNA binding domain to a DNA cleavage domain, for target site typically comprises a left-half site (bound by one example, a FokI domain. 
A number of modular assembly protein), a right-half site (bound by the second protein), and a schemes for generating engineered TALE constructs have spacer sequence between the half sites in which the cut or been reported (see e.g., Zhang, Feng, et. al. (February 2011). recombination is made. In some embodiments, either the 60 “Efficient construction of sequence-specific TAL effectors left-half site or the right half-site (and not the spacer for modulating mammalian transcription’. Nature Biotech sequence) is cut or recombined. In other embodiments, the nology 29 (2): 149-53; Gei?ler, R.; Scholze, H.; Hahn, S.; spacer sequence is cut or recombined. This structure (left Streubel, J.; Bonas, U.; Behrens, S. E.; Boch, J. (2011), Shiu, half site-spacer sequence-right-half site) is referred to Shin-Han. ed. “Transcriptional Activators of Human Genes herein as an LSR structure. In some embodiments, the left 65 with Programmable DNA-Specificity”. PLoS ONE 6 (5): half site and/or the right-half site correspond to an RNA e19509; Cermak, T.; Doyle, E. L.; Christian, M.; Wang, L.; guided target site (e.g., a Cas9 target site). In some embodi Zhang, Y.; Schmidt, C.; Baller, J. A.; Somia, N. V. et al. US 9,388,430 B2 31 32 (2011). “Efficient design and assembly of custom TALEN The term "Zinc finger, as used herein, refers to a small and other TAL effector-based constructs for DNA targeting. nucleic acid-binding protein structural motif characterized by Nucleic Acids Research: Morbitzer, R.; Elsaesser, J.; Haus a fold and the coordination of one or more Zinc ions that ner, J.; Lahaye, T. (2011). “Assembly of custom TALE-type stabilize the fold. Zinc fingers encompass a wide variety of DNA binding domains by modular cloning. Nucleic Acids differing protein structures (see, e.g., Klug A, Rhodes D Research; Li, T.; Huang, S.; Zhao, X. Wright, D.A.: Carpen (1987). “Zinc fingers: a novel protein fold for nucleic acid ter, S.; Spalding, M. H.; Weeks, D. P.; Yang, B. (2011). 
recognition'. Cold Spring Harb. Symp. Ouant. Biol. 52: 473 “Modularly assembled designer TAL effector nucleases for 82, the entire contents of which are incorporated herein by targeted gene knockout and gene replacement in eukaryotes'. reference). Zinc fingers can be designed to bind a specific 10 sequence of nucleotides, and Zinc finger arrays comprising Nucleic Acids Research. Weber, E.; Gruetzner, R.; Werner, fusions of a series of Zinc fingers, can be designed to bind S.: Engler, C.; Marillonnet, S. (2011). Bendahmane, Moham virtually any desired target sequence. Such Zinc finger arrays med. ed. “Assembly of Designer TAL Effectors by Golden can form a binding domain of a protein, for example, of a Gate Cloning”. PLoS ONE 6 (5): e19722; the entire contents nuclease, e.g., if conjugated to a nucleic acid cleavage of each of which are incorporated herein by reference). 15 domain. Different types of zinc finger motifs are known to The terms “treatment,” “treat,” and “treating refer to a those of skill in the art, including, but not limited to, clinical intervention aimed to reverse, alleviate, delay the Cys-His, Gag knuckle, Treble clef, Zinc ribbon, Zn/Cys, onset of, or inhibit the progress of a disease or disorder, or one and TAZ2 domain-like motifs (see, e.g., Krishna SS. Majum or more symptoms thereof, as described herein. As used dar I, Grishin NV (January 2003). "Structural classification herein, the terms “treatment,” “treat,” and “treating refer to a of zinc fingers: survey and summary'. Nucleic Acids Res. 31 clinical intervention aimed to reverse, alleviate, delay the (2): 532–50). Typically, a single Zinc finger motif binds 3 or 4 onset of, or inhibit the progress of a disease or disorder, or one nucleotides of a nucleic acid molecule. Accordingly, a Zinc or more symptoms thereof, as described herein. 
In some finger domain comprising 2 Zinc finger motifs may bind 6-8 embodiments, treatment may be administered after one or nucleotides, a Zinc finger domain comprising 3 Zinc finger more symptoms have developed and/or after a disease has 25 motifs may bind 9-12 nucleotides, a Zinc finger domain com been diagnosed. In other embodiments, treatment may be prising 4 Zinc finger motifs may bind 12-16 nucleotides, and administered in the absence of symptoms, e.g., to prevent or So forth. Any Suitable protein engineering technique can be delay onset of a symptom or inhibit onset or progression of a employed to alter the DNA-binding specificity of zinc fingers disease. For example, treatment may be administered to a and/or design novel Zinc finger fusions to bind virtually any Susceptible individual prior to the onset of symptoms (e.g., in 30 desired target sequence from 3-30 nucleotides in length (see, light of a history of symptoms and/or in light of genetic or e.g., Pabo CO, Peisach E. Grant RA (2001). “Design and other susceptibility factors). Treatment may also be contin selection of novel cys2His2 Zinc finger proteins’. Annual ued after symptoms have resolved, for example, to prevent or Review of Biochemistry 70:313-340; Jamieson AC, Miller J delay their recurrence. C, Pabo CO (2003). “Drug discovery with engineered zinc The term “vector” refers to a polynucleotide comprising 35 finger proteins”. Nature Reviews Drug Discovery 2 (5): 361 one or more recombinant polynucleotides of the present 368; and Liu Q, Segal DJ. Ghiara J. B. Barbas C F (May invention, e.g., those encoding a Cas9 protein (or fusion 1997). “Design of polydactyl zinc-finger proteins for unique thereof) and/or gRNA provided herein. Vectors include, but addressing within complex genomes. Proc. Natl. Acad. Sci. are not limited to, plasmids, viral vectors, cosmids, artificial U.S.A. 94 (11); the entire contents of each of which are incor chromosomes, and phagemids. 
The vector is able to replicate 40 porated herein by reference). Fusions between engineered in a host cell and is further characterized by one or more Zinc finger arrays and protein domains that cleave a nucleic endonuclease restriction sites at which the vector may be cut acid can be used to generate a "Zinc finger nuclease. A Zinc and into which a desired nucleic acid sequence may be finger nuclease typically comprises a Zinc finger domain that inserted. Vectors may contain one or more marker sequences binds a specific target site within a nucleic acid molecule, and suitable for use in the identification and/or selection of cells 45 a nucleic acid cleavage domain that cuts the nucleic acid which have or have not been transformed or genomically molecule within or in proximity to the target site bound by the modified with the vector. Markers include, for example, binding domain. Typical engineered Zinc finger nucleases genes encoding proteins which increase or decrease either comprise a binding domain having between 3 and 6 indi resistance or sensitivity to antibiotics (e.g., kanamycin, ampi vidual Zinc finger motifs and binding target sites ranging from cillin) or other compounds, genes which encode enzymes 50 9 base pairs to 18 base pairs in length. Longer target sites are whose activities are detectable by standard assays known in particularly attractive in situations where it is desired to bind the art (e.g., B-galactosidase, alkaline phosphatase, or and cleave a target site that is unique in a given genome. luciferase), and genes which visibly affect the phenotype of The term “zinc finger nuclease,” as used herein, refers to a transformed or transfected cells, hosts, colonies, or plaques. nuclease comprising a nucleic acid cleavage domain conju Any vector Suitable for the transformation of a host cell (e.g., 55 gated to a binding domain that comprises a Zinc finger array. E. coli, mammalian cells such as CHO cell, insect cells, etc.) 
In some embodiments, the cleavage domain is the cleavage as embraced by the present invention, for example, vectors domain of the type II restriction endonuclease FokI. Zinc belonging to the puC series, pGEM series, pFT series, pFBAD finger nucleases can be designed to target virtually any series, pTET series, or pGEX series. In some embodiments, desired sequence in a given nucleic acid molecule for cleav the vector is suitable for transforming a host cell for recom 60 age, and the possibility to design Zinc finger binding domains binant protein production. Methods for selecting and engi to bind unique sites in the context of complex genomes allows neering vectors and host cells for expressing proteins (e.g., for targeted cleavage of a single genomic site in living cells, those provided herein), transforming cells, and expressing/ for example, to achieve a targeted genomic alteration of thera purifying recombinant proteins are well known in the art, and peutic value. Targeting a double-strand break to a desired are provided by, for example, Green and Sambrook, Molecu 65 genomic locus can be used to introduce frame-shift mutations lar Cloning: A Laboratory Manual (4" ed., Cold Spring Har into the coding sequence of a gene due to the error-prone bor Laboratory Press, Cold Spring Harbor, N.Y. (2012)). nature of the non-homologous DNA repair pathway. Zinc US 9,388,430 B2 33 34 finger nucleases can be generated to target a site of interest by genes necessary for tumor neovascularization, can be used in methods well known to those of skill in the art. For example, the clinical context, and two site specific nucleases are cur Zinc finger binding domains with a desired specificity can be rently in clinical trials (Perez, E. E. et al., “Establishment of designed by combining individual Zinc finger motifs of HIV-1 resistance in CD4+ T cells by genome editing using known specificity. The structure of the Zinc finger protein 5 zinc-finger nucleases.” Nature biotechnology. 
26, 808–816 Zif268 bound to DNA has informed much of the work in this (2008); ClinicalTrials.gov identifiers: NCT00842634, field and the concept of obtaining Zinc fingers for each of the NCT01044654, NCT01252641, NCT01082926). Accord 64 possible base pair triplets and then mixing and matching ingly, nearly any genetic disease can be treated using site these modular Zinc fingers to design proteins with any desired specific nucleases and/or recombinases and include, for sequence specificity has been described (Pavletich NP, Pabo 10 example, diseases associated with triplet expansion (e.g., C O (May 1991). “Zinc finger-DNA recognition: crystal Huntington's disease, myotonic dystrophy, spinocerebellar structure of a Zif268-DNA complex at 2.1 A. Science 252 ataxias, etc.), cystic fibrosis (by targeting the CFTR gene), (5007): 809-17, the entire contents of which are incorporated hematological disease (e.g., hemoglobinopathies), cancer, herein). In some embodiments, separate Zinc fingers that each autoimmune diseases, and viral infections. Other diseases recognizes a 3 base pair DNA sequence are combined to 15 that can be treated using the inventive compositions and/or generate 3-, 4-, 5-, or 6-finger arrays that recognize target sites methods provided herein include, but are not limited to, ranging from 9 base pairs to 18 base pairs in length. In some achondroplasia, achromatopsia, acid maltase deficiency, embodiments, longer arrays are contemplated. In other adenosine deaminase deficiency, adrenoleukodystrophy, embodiments, 2-finger modules recognizing 6-8 nucleotides aicardi syndrome, alpha-1 antitrypsin deficiency, alpha are combined to generate 4-, 6-, or 8-Zinc finger arrays. 
In thalassemia, androgen insensitivity syndrome, apert Syn Some embodiments, bacterial orphage display is employed to drome, arrhythmogenic right ventricular, dysplasia, ataxia develop a Zinc finger domain that recognizes a desired nucleic telangictasia, barth syndrome, beta-thalassemia, blue rubber acid sequence, for example, a desired nuclease target site of bleb nevus syndrome, canavan disease, chronic granuloma 3-30 bp in length. Zinc finger nucleases, in some embodi tous diseases (CGD), cri du chat syndrome, dercum’s disease, ments, comprise a Zinc finger binding domain and a cleavage 25 ectodermal dysplasia, fanconi anemia, fibrodysplasia ossifi domain fused or otherwise conjugated to each other via a cans progressive, fragile X syndrome, galactosemis, Gauch linker, for example, a polypeptide linker. The length of the er's disease, generalized gangliosidoses (e.g., GM1), hemo linker determines the distance of the cut from the nucleic acid chromatosis, the hemoglobin C mutation in the 6th codon of sequence bound by the Zinc finger domain. If a shorter linker beta-globin (HbO), hemophilia, Hurler Syndrome, hypo is used, the cleavage domain will cut the nucleic acid closer to 30 phosphatasia, Klinefleter syndrome, Krabbes Disease, the bound nucleic acid sequence, while a longer linker will Langer-Giedion Syndrome, leukocyte adhesion deficiency result in a greater distance between the cut and the bound (LAD), leukodystrophy, long QT syndrome, Marfan syn nucleic acid sequence. In some embodiments, the cleavage drome, Moebius syndrome, mucopolysaccharidosis (MPS), domain of a Zinc finger nuclease has to dimerize in order to nail patella syndrome, nephrogenic diabetes insipdius, neu cut a bound nucleic acid. In some Such embodiments, the 35 rofibromatosis, Neimann-Pick disease, osteogenesis imper dimer is a heterodimer of two monomers, each of which fecta, porphyria, Prader-Willi syndrome, progeria, Proteus comprise a different Zinc finger binding domain. 
For example, in some embodiments, the dimer may comprise one monomer comprising zinc finger domain A conjugated to a FokI cleavage domain, and one monomer comprising zinc finger domain B conjugated to a FokI cleavage domain. In this non-limiting example, zinc finger domain A binds a nucleic acid sequence on one side of the target site, zinc finger domain B binds a nucleic acid sequence on the other side of the target site, and the dimerized FokI domain cuts the nucleic acid in between the zinc finger domain binding sites.

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS OF THE INVENTION

Site-specific nucleases and site-specific recombinases are powerful tools for targeted genome modification in vitro and in vivo. It has been reported that nuclease cleavage in living cells triggers a DNA repair mechanism that frequently results in a modification of the cleaved and repaired genomic sequence, for example, via homologous recombination. Accordingly, the targeted cleavage of a specific unique sequence within a genome opens up new avenues for gene targeting and gene modification in living cells, including cells that are hard to manipulate with conventional gene targeting methods, such as many human somatic or embryonic stem cells. Another approach utilizes site-specific recombinases, which possess all the functionality required to bring about efficient, precise integration, deletion, inversion, or translocation of specified DNA segments.

Nuclease-mediated modification of disease-related sequences, e.g., the CCR-5 allele in HIV/AIDS patients, or of

syndrome, retinoblastoma, Rett syndrome, Rubinstein-Taybi syndrome, Sanfilippo syndrome, severe combined immunodeficiency (SCID), Shwachman syndrome, sickle cell disease (sickle cell anemia), Smith-Magenis syndrome, Stickler syndrome, Tay-Sachs disease, Thrombocytopenia Absent Radius (TAR) syndrome, Treacher Collins syndrome, trisomy, tuberous sclerosis, Turner's syndrome, urea cycle disorder, von Hippel-Lindau disease, Waardenburg syndrome, Williams syndrome, Wilson's disease, Wiskott-Aldrich syndrome, and X-linked lymphoproliferative syndrome (XLP).

One aspect of site-specific genomic modification is the possibility of off-target nuclease or recombinase effects, e.g., the cleavage or recombination of genomic sequences that differ from the intended target sequence by one or more nucleotides. Undesired side effects of off-target cleavage/recombination range from insertion into unwanted loci during a gene targeting event to severe complications in a clinical scenario. Off-target cleavage or recombination of sequences encoding essential gene functions or tumor suppressor genes by an endonuclease or recombinase administered to a subject may result in disease or even death of the subject. Accordingly, it is desirable to employ new strategies in designing nucleases and recombinases having the greatest chance of minimizing off-target effects.

The methods and compositions of the present disclosure represent, in some aspects, an improvement over previous methods and compositions providing nucleases (and methods of their use) and recombinases (and methods of their use) engineered to have improved specificity for their intended targets. For example, nucleases and recombinases known in the art, both naturally occurring and those engineered, typically have a target (e.g., DNA) binding domain that recognizes a particular sequence. Additionally, known nucleases and recombinases may comprise a DNA binding domain and a catalytic domain in a single protein capable of inducing cleavage or recombination, and as such the chance for off-target effects is increased as cleavage or recombination likely occurs upon off-target binding of the nuclease or recombinase, respectively. Aspects of the present invention relate to the recognition that increasing the number of sequences (e.g., having a nuclease bind at more than one site at a desired target), and/or splitting the activities (e.g., target binding and target cleaving) of a nuclease between two or more proteins, will increase the specificity of a nuclease and thereby decrease the likelihood of off-target effects. Other aspects of the present invention relate to the recognition that fusions between the catalytic domain of recombinases (or recombinases having inactive DNA binding domains) and nuclease-inactivated RNA-programmable nucleases allow for the targeted recombination of DNA at any location.

In the context of site-specific nucleases, the strategies, methods, compositions, and systems provided herein can be utilized to improve the specificity of any site-specific nuclease, for example, variants of the Cas9 endonuclease, Zinc Finger Nucleases (ZFNs) and Transcription Activator Like Effector Nucleases (TALENs). Suitable nucleases for modification as described herein will be apparent to those of skill in the art based on this disclosure.

In certain embodiments, the strategies, methods, compositions, and systems provided herein are utilized to improve the specificity of the RNA-guided (e.g., RNA-programmable) endonuclease Cas9. Whereas typical endonucleases recognize and cleave a single target sequence, Cas9 endonuclease uses RNA:DNA hybridization to determine target DNA cleavage sites, enabling a single monomeric protein to cleave, in principle, any sequence specified by the guide RNA (gRNA). While Cas9:guide RNA complexes have been successfully used to modify both cells (Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science. 339, 819-823 (2013); Mali, P. et al. RNA-guided human genome engineering via Cas9. Science. 339, 823-826 (2013); Jinek, M. et al. RNA-programmed genome editing in human cells. eLife 2, e00471 (2013)) and organisms (Hwang, W. Y. et al. Efficient genome editing in zebrafish using a CRISPR-Cas system. Nature Biotechnology. 31, 227-229 (2013)), a study using Cas9:guide RNA complexes to modify zebrafish embryos observed toxicity (e.g., off-target effects) at a rate similar to that of ZFNs and TALENs (Hwang, W. Y. et al. Nature Biotechnology. 31, 227-229 (2013)). Further, while recently engineered variants of Cas9 that cleave only one DNA strand ("nickases") enable double-stranded breaks to be specified by two distinct gRNA sequences (Cho, S. W. et al. Analysis of off-target effects of CRISPR/Cas-derived RNA-guided endonucleases and nickases. Genome Res. 24, 132-141 (2013); Ran, F. A. et al. Double Nicking by RNA-Guided CRISPR Cas9 for Enhanced Genome Editing Specificity. Cell 154, 1380-1389 (2013); Mali, P. et al. CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nat. Biotechnol. 31, 833-838 (2013)), these variants still suffer from off-target cleavage activity (Ran, F. A. et al. Double Nicking by RNA-Guided CRISPR Cas9 for Enhanced Genome Editing Specificity. Cell 154, 1380-1389 (2013); Fu, Y., et al., Improving CRISPR-Cas nuclease specificity using truncated guide RNAs. Nat. Biotechnol. (2014)) arising from the ability of each monomeric nickase to remain active when individually bound to DNA (Cong, L. et al. Multiplex Genome Engineering Using CRISPR/Cas Systems. Science 339, 819-823 (2013); Jinek, M. et al. Science 337, 816-821 (2012); Gasiunas, G., et al., Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proc. Natl. Acad. Sci. 109, E2579-E2586 (2012)). Accordingly, aspects of the present disclosure aim at reducing the chances for Cas9 off-target effects using novel engineered Cas9 variants. In one example, a Cas9 variant (e.g., fCas9) is provided which has improved specificity as compared to the Cas9 nickases or wild type Cas9, exhibiting, e.g., >10-fold, >50-fold, >100-fold, >140-fold, >200-fold, or more, higher specificity than wild type Cas9 (see e.g., the Examples).

Other aspects of the present disclosure provide strategies, methods, compositions, and systems utilizing inventive RNA-guided (e.g., RNA-programmable) Cas9-recombinase fusion proteins. Whereas typical recombinases recognize and recombine distinct target sequences, the Cas9-recombinase fusions provided herein use RNA:DNA hybridization to determine target DNA recombination sites, enabling the fusion proteins to recombine, in principle, any region specified by the gRNA(s).

While of particular relevance to DNA and DNA-cleaving nucleases and/or recombinases, the inventive concepts, methods, strategies and systems provided herein are not limited in this respect, but can be applied to any nucleic acid:nuclease or nucleic acid:recombinase system.

Nucleases

Some aspects of this disclosure provide site-specific nucleases with enhanced specificity that are designed using the methods and strategies described herein. Some embodiments of this disclosure provide nucleic acids encoding such nucleases. Some embodiments of this disclosure provide expression constructs comprising such encoding nucleic acids (See, e.g., FIG. 20). For example, in some embodiments an isolated nuclease is provided that has been engineered to cleave a desired target site within a genome. In some embodiments, the isolated nuclease is a variant of an RNA-programmable nuclease, such as a Cas9 nuclease.

In one embodiment, fusion proteins are provided comprising two domains: (i) an RNA-programmable nuclease (e.g., Cas9 protein, or fragment thereof) domain fused or linked to (ii) a nuclease domain. For example, in some aspects, the Cas9 protein (e.g., the Cas9 domain of the fusion protein) comprises a nuclease-inactivated Cas9 (e.g., a Cas9 lacking DNA cleavage activity; "dCas9") that retains RNA (gRNA) binding activity and is thus able to bind a target site complementary to a gRNA. In some aspects, the nuclease fused to the nuclease-inactivated Cas9 domain is any nuclease requiring dimerization (e.g., the coming together of two monomers of the nuclease) in order to cleave a target nucleic acid (e.g., DNA). In some embodiments, the nuclease fused to the nuclease-inactivated Cas9 is a monomer of the FokI DNA cleavage domain, e.g., thereby producing the Cas9 variant referred to as fCas9. The FokI DNA cleavage domain is known, and in some aspects corresponds to amino acids 388-583 of FokI (NCBI accession number J04623). In some embodiments, the FokI DNA cleavage domain corresponds to amino acids 300-583, 320-583, 340-583, or 360-583 of FokI. (See also Wah et al., "Structure of FokI has implications for DNA cleavage" Proc. Natl. Acad. Sci. USA. 1998; 1:95(18):10564-9; Li et al., "TAL nucleases (TALNs): hybrid proteins composed of TAL effectors and FokI DNA-cleavage domain" Nucleic Acids Res. 2011; 39(1):359-72; Kim et al., "Hybrid restriction enzymes: Zinc finger fusions to FokI cleavage domain" Proc. Natl. Acad. Sci. USA. 1996; 93:1156-1160; the entire contents of each are herein incorporated by reference). In some embodiments, the FokI DNA cleavage domain corresponds to, or comprises in part or whole, the amino acid sequence set forth as SEQ ID NO:6. In some embodiments, the FokI DNA cleavage domain is a variant of FokI (e.g., a variant of SEQ ID NO:6), as described herein.

In some embodiments, a dimer of the fusion protein is provided, e.g., dimers of fCas9. For example, in some embodiments, the fusion protein forms a dimer with itself to mediate cleavage of the target nucleic acid. In some embodiments, the fusion proteins, or dimers thereof, are associated with one or more gRNAs. In some aspects, because the dimer contains two fusion proteins, each having a Cas9 domain having gRNA binding activity, a target nucleic acid is targeted using two distinct gRNA sequences that complement two distinct regions of the nucleic acid target. See, e.g., FIGS. 1A, 6D. Thus, in this example, cleavage of the target nucleic acid does not occur until both fusion proteins bind the target nucleic acid (e.g., as specified by the gRNA:target nucleic acid base pairing), and the nuclease domains dimerize (e.g., the FokI DNA cleavage domains; as a result of their proximity based on the binding of the Cas9:gRNA domains of the fusion proteins) and cleave the target nucleic acid, e.g., in the region between the bound Cas9 fusion proteins (the "spacer sequence"). This is exemplified by the schematics shown in FIGS. 1A and 6D. This approach represents a notable improvement over wild type Cas9 and other Cas9 variants, such as the nickases (Ran et al. Double Nicking by RNA-Guided CRISPR Cas9 for Enhanced Genome Editing Specificity. Cell 154, 1380-1389 (2013); Mali et al. CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nat. Biotechnol. 31, 833-838 (2013)), which do not require the dimerization of nuclease domains to cleave a nucleic acid. These nickase variants, as described in the Examples, can induce cleaving, or nicking, upon binding of a single nickase to a nucleic acid, which can occur at on- and off-target sites, and nicking is known to induce mutagenesis. An exemplary nucleotide encoding a Cas9 nickase (SEQ ID NO:7) and an exemplary amino acid sequence of Cas9 nickase (SEQ ID NO:8) are provided below. As the variants provided herein require the binding of two Cas9 variants in proximity to one another to induce target nucleic acid cleavage, the chances of inducing off-target cleavage are reduced. See, e.g., the Examples. For example, in some embodiments, a Cas9 variant fused to a nuclease domain (e.g., fCas9) has an on-target:off-target modification ratio that is at least 2-fold, at least 5-fold, at least 10-fold, at least 20-fold, at least 30-fold, at least 40-fold, at least 50-fold, at least 60-fold, at least 70-fold, at least 80-fold, at least 90-fold, at least 100-fold, at least 110-fold, at least 120-fold, at least 130-fold, at least 140-fold, at least 150-fold, at least 175-fold, at least 200-fold, at least 250-fold, or more higher than the on-target:off-target modification ratio of a wild type Cas9 or other Cas9 variant (e.g., nickase). In some embodiments, a Cas9 variant fused to a nuclease domain (e.g., fCas9) has an on-target:off-target modification ratio that is between about 60- to 180-fold, between about 80- to 160-fold, between about 100- to 150-fold, or between about 120- to 140-fold higher than the on-target:off-target modification ratio of a wild type Cas9 or other Cas9 variant. Methods for determining on-target:off-target modification ratios are known, and include those described in the Examples. In certain embodiments, the on-target:off-target modification ratios are determined by measuring the number or amount of modifications of known Cas9 off-target sites in certain genes. For example, the Cas9 off-target sites of the CLTA, EMX, and VEGF genes are known, and modifications at these sites can be measured and compared between test proteins and controls. The target site and its corresponding known off-target sites (see, e.g., Table 5 for CLTA, EMX, and VEGF off-target sites) are amplified from genomic DNA isolated from cells (e.g., HEK293) treated with a particular Cas9 protein or variant. The modifications are then analyzed by high-throughput sequencing. Sequences containing insertions or deletions of two or more base pairs in potential genomic off-target sites and present in significantly greater numbers (P value < 0.005, Fisher's exact test) in the target gRNA-treated samples versus the control gRNA-treated samples are considered Cas9 nuclease-induced genome modifications.

Cas9 Nickase (Nucleotide Sequence):

(SEQ ID NO: 7)
ATGGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTA
CAAAGACGATGACGATAAGATGGCCCCAAAGAAGAAGCGGAAGGTCGGTA
TCCACGGAGTCCCAGCAGCCGATAAAAAGTATTCTATTGGTTTAGCTATC
GGCACTAATTCCGTTGGATGGGCTGTCATAACCGATGAATACAAAGTACC
TTCAAAGAAATTTAAGGTGTTGGGGAACACAGACCGTCATTCGATTAAAA
AGAATCTTATCGGTGCCCTCCTATTCGATAGTGGCGAAACGGCAGAGGCG
ACTCGCCTGAAACGAACCGCTCGGAGAAGGTATACACGTCGCAAGAACCG
AATATGTTACTTACAAGAAATTTTTAGCAATGAGATGGCCAAAGTTGACG
ATTCTTTCTTTCACCGTTTGGAAGAGTCCTTCCTTGTCGAAGAGGACAAG
AAACATGAACGGCACCCCATCTTTGGAAACATAGTAGATGAGGTGGCATA
TCATGAAAAGTACCCAACGATTTATCACCTCAGAAAAAAGCTAGTTGACT
CAACTGATAAAGCGGACCTGAGGTTAATCTACTTGGCTCTTGCCCATATG
ATAAAGTTCCGTGGGCACTTTCTCATTGAGGGTGATCTAAATCCGGACAA
CTCGGATGTCGACAAACTGTTCATCCAGTTAGTACAAACCTATAATCAGT
TGTTTGAAGAGAACCCTATAAATGCAAGTGGCGTGGATGCGAAGGCTATT
CTTAGCGCCCGCCTCTCTAAATCCCGACGGCTAGAAAACCTGATCGCACA
ATTACCCGGAGAGAAGAAAAATGGGTTGTTCGGTAACCTTATAGCGCTCT
CACTAGGCCTGACACCAAATTTTAAGTCGAACTTCGACTTAGCTGAAGAT
GCCAAATTGCAGCTTAGTAAGGACACGTACGATGACGATCTCGACAATCT
ACTGGCACAAATTGGAGATCAGTATGCGGACTTATTTTTGGCTGCCAAAA
ACCTTAGCGATGCAATCCTCCTATCTGACATACTGAGAGTTAATACTGAG
ATTACCAAGGCGCCGTTATCCGCTTCAATGATCAAAAGGTACGATGAACA
TCACCAAGACTTGACACTTCTCAAGGCCCTAGTCCGTCAGCAACTGCCTG
AGAAATATAAGGAAATATTCTTTGATCAGTCGAAAAACGGGTACGCAGGT
TATATTGACGGCGGAGCGAGTCAAGAGGAATTCTACAAGTTTATCAAACC
CATATTAGAGAAGATGGATGGGACGGAAGAGTTGCTTGTAAAACTCAATC
GCGAAGATCTACTGCGAAAGCAGCGGACTTTCGACAACGGTAGCATTCCA
CATCAAATCCACTTAGGCGAATTGCATGCTATACTTAGAAGGCAGGAGGA
TTTTTATCCGTTCCTCAAAGACAATCGTGAAAAGATTGAGAAAATCCTAA
CCTTTCGCATACCTTACTATGTGGGACCCCTGGCCCGAGGGAACTCTCGG
TTCGCATGGATGACAAGAAAGTCCGAAGAAACGATTACTCCATGGAATTT
TGAGGAAGTTGTCGATAAAGGTGCGTCAGCTCAATCGTTCATCGAGAGGA
TGACCAACTTTGACAAGAATTTACCGAACGAAAAAGTATTGCCTAAGCAC

-Continued
PIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEWLDATLIHO
SITGLYETRIDLSQLGGD

In some embodiments, the gRNAs which bind the Cas9 variants (e.g., fCas9) can be oriented in one of two ways, with respect to the spacer sequence, deemed the "A" and "B" orientations. In orientation A, the region of the gRNAs that bind the PAM motifs is distal to the spacer sequence with the 5' ends of the gRNAs adjacent to the spacer sequence (FIG. 6C); whereas in orientation B, the region of the gRNAs that bind the PAM motifs is adjacent to the spacer sequence (FIG. 9). In some embodiments, the gRNAs are engineered or selected to bind (e.g., as part of a complex with a Cas9 variant, such as fCas9) to a target nucleic acid in the A or B orientation. In some embodiments, the gRNAs are engineered or selected to bind (e.g., as part of a complex with a Cas9 variant such as fCas9) to a target nucleic acid in the A orientation. In some embodiments, the gRNAs are engineered or selected to bind (e.g., as part of a complex with a Cas9 variant, such as fCas9) to a target nucleic acid in the B orientation.

In some embodiments, the domains of the fusion protein are linked via a linker, e.g., as described herein. In certain embodiments, the linker is a peptide linker. In other embodiments, the linker is a non-peptidic linker. In some embodiments, a functional domain is linked via a peptide linker (e.g., fused) or a non-peptidic linker to an inventive fusion protein. In some embodiments, the functional domain is a nuclear localization signal (NLS) domain. An NLS domain comprises an amino acid sequence that "tags" or signals a protein for import into the cell nucleus by nuclear transport. Typically, this signal consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface. NLS sequences are well known in the art (See e.g., Lange et al., "Classical nuclear localization signals: definition, function, and interaction with importin alpha." J Biol. Chem. 2007 Feb. 23; 282(8):5101-5; the entire contents of which is hereby incorporated by reference), and include, for example those described in the Examples section. In some embodiments, the NLS sequence comprises, in part or in whole, the amino acid sequence MAPKKKRKVGIHRGVP (SEQ ID NO:318). The domains (e.g., two or more of a gRNA binding domain (dCas9 domain), a catalytic nuclease domain, and a NLS domain) associated via a linker can be linked in any orientation or order. For example, in some embodiments, any domain can be at the N-terminus, the C-terminus, or in between the domains at the N- and C-termini of the fusion protein. In some embodiments, the orientation or order of the domains in an inventive fusion protein are as provided in FIG. 6B. In some embodiments, wherein the fusion protein comprises three domains (e.g., a gRNA binding domain (e.g., dCas9 domain), a nuclease domain (e.g., FokI), and an NLS domain), each domain is connected via a linker, as provided herein. In some embodiments, the domains are not connected via a linker. In some embodiments, one or more of the domains is/are connected via a linker.

In some embodiments, an inventive fusion protein (e.g., fCas9) comprising three domains (e.g., a gRNA binding domain (e.g., dCas9 domain), a nuclease domain (e.g., FokI), and an NLS domain) is encoded by a nucleotide sequence (or fragment or variant thereof) set forth as SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, or SEQ ID NO:319, as shown below.

fCas9 (e.g., dCas9-NLS-GGS3linker-FokI):

(SEQ ID NO: 9)
ATGGATAAAAAGTATTCTATTGGTTTAGCTATCGGCACTAATTCCGTTGG
ATGGGCTGTCATAACCGATGAATACAAAGTACCTTCAAAGAAATTTAAGG
TGTTGGGGAACACAGACCGTCATTCGATTAAAAAGAATCTTATCGGTGCC
CTCCTATTCGATAGTGGCGAAACGGCAGAGGCGACTCGCCTGAAACGAAC
CGCTCGGAGAAGGTATACACGTCGCAAGAACCGAATATGTTACTTACAAG
AAATTTTTAGCAATGAGATGGCCAAAGTTGACGATTCTTTCTTTCACCGT
TTGGAAGAGTCCTTCCTTGTCGAAGAGGACAAGAAACATGAACGGCACCC
CATCTTTGGAAACATAGTAGATGAGGTGGCATATCATGAAAAGTACCCAA
CGATTTATCACCTCAGAAAAAAGCTAGTTGACTCAACTGATAAAGCGGAC
CTGAGGTTAATCTACTTGGCTCTTGCCCATATGATAAAGTTCCGTGGGCA
CTTTCTCATTGAGGGTGATCTAAATCCGGACAACTCGGATGTCGACAAAC
TGTTCATCCAGTTAGTACAAACCTATAATCAGTTGTTTGAAGAGAACCCT
ATAAATGCAAGTGGCGTGGATGCGAAGGCTATTCTTAGCGCCCGCCTCTC
TAAATCCCGACGGCTAGAAAACCTGATCGCACAATTACCCGGAGAGAAGA
AAAATGGGTTGTTCGGTAACCTTATAGCGCTCTCACTAGGCCTGACACCA
AATTTTAAGTCGAACTTCGACTTAGCTGAAGATGCCAAATTGCAGCTTAG
TAAGGACACGTACGATGACGATCTCGACAATCTACTGGCACAAATTGGAG
ATCAGTATGCGGACTTATTTTTGGCTGCCAAAAACCTTAGCGATGCAATC
CTCCTATCTGACATACTGAGAGTTAATACTGAGATTACCAAGGCGCCGTT
ATCCGCTTCAATGATCAAAAGGTACGATGAACATCACCAAGACTTGACAC
TTCTCAAGGCCCTAGTCCGTCAGCAACTGCCTGAGAAATATAAGGAAATA
TTCTTTGATCAGTCGAAAAACGGGTACGCAGGTTATATTGACGGCGGAGC
GAGTCAAGAGGAATTCTACAAGTTTATCAAACCCATATTAGAGAAGATGG
ATGGGACGGAAGAGTTGCTTGTAAAACTCAATCGCGAAGATCTACTGCGA
AAGCAGCGGACTTTCGACAACGGTAGCATTCCACATCAAATCCACTTAGG
CGAATTGCATGCTATACTTAGAAGGCAGGAGGATTTTTATCCGTTCCTCA
AAGACAATCGTGAAAAGATTGAGAAAATCCTAACCTTTCGCATACCTTAC
TATGTGGGACCCCTGGCCCGAGGGAACTCTCGGTTCGCATGGATGACAAG
AAAGTCCGAAGAAACGATTACTCCATGGAATTTTGAGGAAGTTGTCGATA
AAGGTGCGTCAGCTCAATCGTTCATCGAGAGGATGACCAACTTTGACAAG
AATTTACCGAACGAAAAAGTATTGCCTAAGCACAGTTTACTTTACGAGTA
TTTCACAGTGTACAATGAACTCACGAAAGTTAAGTATGTCACTGAGGGCA
TGCGTAAACCCGCCTTTCTAAGCGGAGAACAGAAGAAAGCAATAGTAGAT
CTGTTATTCAAGACCAACCGCAAAGTGACAGTTAAGCAATTGAAAGAGGA
CTACTTTAAGAAAATTGAATGCTTCGATTCTGTCGAGATCTCCGGGGTAG
AAGATCGATTTAATGCGTCACTTGGTACGTATCATGACCTCCTAAAGATA
ATTAAAGATAAGGACTTCCTGGATAACGAAGAGAATGAAGATATCTTAGA
AGATATAGTGTTGACTCTTACCCTCTTTGAAGATCGGGAAATGATTGAGG
AAAGACTAAAAACATACGCTCACCTGTTCGACGATAAGGTTATGAAACAG
-Continued
ACCAACTTTGACAAGAATTTACCGAACGAAAAAGTATTGCCTAAGCACAG
TTTACTTTACGAGTATTTCACAGTGTACAATGAACTCACGAAAGTTAAGT
ATGTCACTGAGGGCATGCGTAAACCCGCCTTTCTAAGCGGAGAACAGAAG
AAAGCAATAGTAGATCTGTTATTCAAGACCAACCGCAAAGTGACAGTTAA
GCAATTGAAAGAGGACTACTTTAAGAAAATTGAATGCTTCGATTCTGTCG
AGATCTCCGGGGTAGAAGATCGATTTAATGCGTCACTTGGTACGTATCAT
GACCTCCTAAAGATAATTAAAGATAAGGACTTCCTGGATAACGAAGAGAA
TGAAGATATCTTAGAAGATATAGTGTTGACTCTTACCCTCTTTGAAGATC
GGGAAATGATTGAGGAAAGACTAAAAACATACGCTCACCTGTTCGACGAT
AAGGTTATGAAACAGTTAAAGAGGCGTCGCTATACGGGCTGGGGACGATT
GTCGCGGAAACTTATCAACGGGATAAGAGACAAGCAAAGTGGTAAAACTA
TTCTCGATTTTCTAAAGAGCGACGGCTTCGCCAATAGGAACTTTATGCAG
CTGATCCATGATGACTCTTTAACCTTCAAAGAGGATATACAAAAGGCACA
GGTTTCCGGACAAGGGGACTCATTGCACGAACATATTGCGAATCTTGCTG
GTTCGCCAGCCATCAAAAAGGGCATACTCCAGACAGTCAAAGTAGTGGAT
GAGCTAGTTAAGGTCATGGGACGTCACAAACCGGAAAACATTGTAATCGA
GATGGCACGCGAAAATCAAACGACTCAGAAGGGGCAAAAAAACAGTCGAG
AGCGGATGAAGAGAATAGAAGAGGGTATTAAAGAACTGGGCAGCCAGATC
TTAAAGGAGCATCCTGTGGAAAATACCCAATTGCAGAACGAGAAACTTTA
CCTCTATTACCTACAAAATGGAAGGGACATGTATGTTGATCAGGAACTGG
ACATAAACCGTTTATCTGATTACGACGTCGATGCCATTGTACCCCAATCC
TTTTTGAAGGACGATTCAATCGACAATAAAGTGCTTACACGCTCGGATAA
GAACCGAGGGAAAAGTGACAATGTTCCAAGCGAGGAAGTCGTAAAGAAAA
TGAAGAACTATTGGCGGCAGCTCCTAAATGCGAAACTGATAACGCAAAGA
AAGTTCGATAACTTAACTAAAGCTGAGAGGGGTGGCTTGTCTGAACTTGA
CAAGGCCGGATTTATTAAACGTCAGCTCGTGGAAACCCGCCAAATCACAA
AGCATGTTGCACAGATACTAGATTCCCGAATGAATACGAAATACGACGAG
AACGATAAGCTGATTCGGGAAGTCAAAGTAATCACTTTAAAGTCAAAATT
GGTGTCGGACTTCAGAAAGGATTTTCAATTCTATAAAGTTAGGGAGATAA
ATAACTACCACCATGCGCACGACGCTTATCTTAATGCCGTCGTAGGGACC
GCACTCATTAAGAAATACCCGAAGCTAGAAAGTGAGTTTGTGTATGGTGA
TTACAAAGTTTATGACGTCCGTAAGATGATCGCGAAAAGCGAACAGGAGA
TAGGCAAGGCTACAGCCAAATACTTCTTTTATTCTAACATTATGAATTTC
TTTAAGACGGAAATCACTCTGGCAAACGGAGAGATACGCAAACGACCTTT
AATTGAAACCAATGGGGAGACAGGTGAAATCGTATGGGATAAGGGCCGGG
ACTTCGCGACGGTGAGAAAAGTTTTGTCCATGCCCCAAGTCAACATAGTA
AAGAAAACTGAGGTGCAGACCGGAGGGTTTTCAAAGGAATCGATTCTTCC
AAAAAGGAATAGTGATAAGCTCATCGCTCGTAAAAAGGACTGGGACCCGA
AAAAGTACGGTGGCTTCGATAGCCCTACAGTTGCCTATTCTGTCCTAGTA
GTGGCAAAAGTTGAGAAGGGAAAATCCAAGAAACTGAAGTCAGTCAAAGA

-Continued
ATTATTGGGGATAACGATTATGGAGCGCTCGTCTTTTGAAAAGAACCCCA
TCGACTTCCTTGAGGCGAAAGGTTACAAGGAAGTAAAAAAGGATCTCATA
ATTAAACTACCAAAGTATAGTCTGTTTGAGTTAGAAAATGGCCGAAAACG
GATGTTGGCTAGCGCCGGAGAGCTTCAAAAGGGGAACGAACTCGCACTAC
CGTCTAAATACGTGAATTTCCTGTATTTAGCGTCCCATTACGAGAAGTTG
AAAGGTTCACCTGAAGATAACGAACAGAAGCAACTTTTTGTTGAGCAGCA
CAAACATTATCTCGACGAAATCATAGAGCAAATTTCGGAATTCAGTAAGA
GAGTCATCCTAGCTGATGCCAATCTGGACAAAGTATTAAGCGCATACAAC
AAGCACAGGGATAAACCCATACGTGAGCAGGCGGAAAATATTATCCATTT
GTTTACTCTTACCAACCTCGGCGCTCCAGCCGCATTCAAGTATTTTGACA
CAACGATAGATCGCAAACGATACACTTCTACCAAGGAGGTGCTAGACGCG
ACACTGATTCACCAATCCATCACGGGATTATATGAAACTCGGATAGATTT
GTCACAGCTTGGGGGTGAC

In some embodiments, an inventive fusion protein (e.g., fCas9) corresponds to or is encoded by a homologue of any one of SEQ ID NO:9-12 or SEQ ID NO:319.

In some embodiments, an inventive fusion protein (e.g., fCas9) comprises, in part or in whole, one or more of the amino acid sequences set forth as SEQ ID NO:5, SEQ ID NO:320, SEQ ID NO:6, SEQ ID NO:16, SEQ ID NO:318, and SEQ ID NO:321, as provided herein and shown below. The various domains corresponding to SEQ ID NO:5, SEQ ID NO:320, SEQ ID NO:6, SEQ ID NO:16, SEQ ID NO:318, and SEQ ID NO:321 may be arranged in any order with respect to each other. For example, in some embodiments, a dCas9 domain (e.g., SEQ ID NO:5 or SEQ ID NO:320) is at the amino or carboxy terminus, or is somewhere in between the amino and carboxy termini. Similarly, each of the other domains corresponding to SEQ ID NO:6, SEQ ID NO:16, SEQ ID NO:318, and SEQ ID NO:321 may be at the amino or carboxy terminus, or somewhere in between the amino and carboxy termini of an inventive fusion protein (e.g., fCas9). Examples of inventive fusion proteins having various domain arrangements include the inventive fusion proteins corresponding to SEQ ID NOs:9-12 and SEQ ID NO:319. In some embodiments, an inventive fusion protein comprises additional or other domains, such as other linkers, other NLS domains, other nuclease domains, or other Cas9 domains, which may be in addition to or substituted for any of the domains as provided herein.

FokI Cleavage Domain:

(SEQ ID NO: 6)
GSOLVKSELEEKKSELRHKLKYWPHEYIELIEIARNSTODRILEMKVMEF
FMKWYGYRGKHLGGSRKPDGAIYTVGSPIDYGVIVDTKAYSGGYNLPIGO
ADEMORYWEENOTRNKHINPNEWWKVYPSSVTEFKFLFVSGHFKGNYKAO
LTRLINHITNCNGAWLSWEELLIGGEMIKAGTLTLEEWRRKFNNGEINF

dCas9:

(SEQ ID NO: 320)
DKKYSIGLAIGTNSWGWAWITDEYKWPSKKFKWLGNTDRHSIKKNLIGAL
LFDSGETAEATRLKRTARRRYTRRKNRICYLOEIFSNEMAKVDDSFFHRL
EESFLWEEDKKHERHPIFGNIWDEWAYHEKYPTIYHLRKKLVDSTDKADL
RLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIOLVOTYNOLFEENPI
NASGWDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN
FKSNFDLAEDAKLOLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAIL
LSDILRVNTEITKAPLSASMIKRYDEHHODLTLLKALWRQOLPEKYKEIF
FDOSKNGYAGYIDGGASOEEFYKFIKPILEKMDGTEELLWKLNREDLLRK
ORTFDNGSIPHOIHLGELHAILRROEDFYPFLKDNREKIEKILTFRIPYYH
WGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAOSFIERMTNFDKN
LPNEKVLPKHSLLYEYFTWYNELTKWKYVTEGMRKPAFLSGEOKKAIVDL
LFKTNRKWTVKOLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKII
KDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKOL
KRRRYTGWGRLSRKLINGIRDKOSGKTILDFLKSDGFANRNFMOLIHDDS
LTFKEDIOKAOVSGOGDSLHEHIANLAGSPAIKKGILOTVKVVDELVKVM
GRHKPENIVIEMARENOTTOKGOKNSRERMKRIEEGIKELGSOILKEHPV
ENTOLONEKLYLYYLONGRDMYWDOELDINRLSDYDVDAIVPOSFLKDDS
IDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWROLLNAKLITORKFDNLT
KAERGGLSELDKAGFIKROLVETROITKHVAOILDSRMNTKYDENDKLIR
EVKVITLKSKLVSDFRKDFOFYKVREINNYHHAHDAYLNAVVGTALIKKY
PKLESEFWYGDYKWYDWRKMIAKSEOEIGKATAKYFFYSNIMNFFKTEIT
LANGEIRKRPLIETNGETGEIWWDKGRDFATWRKVLSMPOVNIVKKTEVO
TGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTWAYSWLWWAKWEK
GKSKKLKSWKELLGITIMERSSFEKNPIDFILEAKGYKEWKKDLIIKLPKY
SLFELENGRKRMLASAGELOKGNELALPSKYWNFLYLASHYEKLKGSPED
NEOKOLFVEOHKHYLDEIIEOISEFSKRVILADANLDKVLSAYNKHRDKP
IREOAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEWLDATLIHOS
ITGLYETRIDLSQLGGD

3xFLAG TAG:

(SEQ ID NO: 321)
MDYKDHDGDYKDHDIDYKDDDDK

NLS Domain:

(SEQ ID NO: 318)
MAPKKKRKVGIHRGVP

XTEN Linker:

(SEQ ID NO: 16)
SGSETPGTSESATPES

In some embodiments, the fusion proteins forming the dimer are coordinated through the action of a single extended gRNA (e.g., as opposed to two separate gRNAs, each binding a monomer of the fusion protein dimer). Thus, in some aspects, the single extended gRNA contains at least two portions, separated by a linker sequence, that complement the target nucleic acid (e.g., bind the target nucleic acid at two distinct sites), and the gRNA is able to bind at least two fusion proteins, as described herein. This is exemplified by the schematic shown in FIG. 1B. In some embodiments, the linker sequence separating the two portions in the extended gRNA has complementarity with the target sequence. In some embodiments, the extended gRNA is at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 125, at least 150, at least 175, at least 200, at least 250, at least 300, at least 350, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, or at least 1000 or more nucleotides in length. Whether the fusion proteins are coordinated through separate or a single gRNA, to form dimers that can cleave a target nucleic acid, it is expected that the specificity of such cleavage is enhanced (e.g., reduced or no off-target cleavage) as compared to nucleases having a single target nucleic acid binding site. Methods for determining the specificity of a nuclease are known (see e.g., published PCT Application, WO 2013/066438; pending provisional application U.S. 61/864,289; and Pattanayak, V., Ramirez, C. L., Joung, J. K. & Liu, D. R. Revealing off-target cleavage specificities of zinc-finger nucleases by in vitro selection. Nature Methods 8, 765-770 (2011), the entire contents of each of which are incorporated herein by reference).

According to another embodiment, dimers of Cas9 protein are provided. In some embodiments, the dimers are coordinated through the action of a single extended gRNA that comprises at least two portions that complement the target nucleic acid. In some embodiments, the portions complementary to the target nucleic acid comprise no more than 25, no more than 24, no more than 23, no more than 22, no more than 21, no more than 20, no more than 19, no more than 18, no more than 17, no more than 16, no more than 15, no more than 14, no more than 13, no more than 12, no more than 11, no more than 10, no more than 9, no more than 8, no more than 7, no more than 6, or no more than 5 nucleotides that complement the target nucleic acid. In some embodiments, the portions complementary to the target nucleic acid comprise 5-30, 5-25, or 5-20 nucleotides. In some embodiments, the portions complementary to the target nucleic acid comprise 15-25, 19-21, or 20 nucleotides. In some embodiments, the portions comprise the same number of nucleotides that complement the target nucleic acid. In some embodiments, the portions comprise different numbers of nucleotides that complement the target nucleic acid. For example, in some embodiments, the extended gRNA comprises two portions that complement (e.g., and hybridize to) the target nucleic acid, each portion comprising 5-19, 10-15, or 10 nucleotides that complement the target nucleic acid. Without wishing to be bound by any particular theory, having the portions comprise fewer than approximately 20 nucleotides typical of gRNAs (e.g., having the portions comprise approximately 5-19, 10-15, or 10 complementary nucleotides), ensures that a single Cas9:gRNA unit cannot bind efficiently by itself. Thus the cooperative binding between Cas9 proteins coordinated by such an extended gRNA improves the specificity and cleavage of intended target nucleic acids. In some embodiments, the linker sequence separating the two portions of the extended gRNA has complementarity with the target sequence. For example, in some embodiments, the linker sequence has at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 20, at least 25, at least 30, at least 40, or at least 50 nucleotides that complement the target nucleic acid. Without wishing to be bound by any particular theory, it is believed that having an extended gRNA that comprises multiple binding sites (e.g., multiple low-affinity binding sites), including those that are bound by a Cas9 protein as well as those in the linker sequence, provides for increased specificity by promoting cooperative binding. Certain aspects of this embodiment are shown in FIG. 4. In some embodiments, any of the Cas9 proteins described herein may be coordinated through a single extended gRNA.

In another embodiment, proteins comprising a fragment of an RNA-programmable nuclease (e.g., Cas9) are provided. For example, in some embodiments, a protein comprising the gRNA binding domain of Cas9 is provided. In some embodiments, the protein comprising the gRNA binding domain of Cas9 does not comprise a DNA cleavage domain (referred to herein as the "A-half" of Cas9). In other embodiments, proteins comprising the DNA cleavage domain(s) (e.g., the HNH, RuvC1 subdomains) of Cas9 are provided. In some embodiments, the "DNA cleavage domain" refers collectively to the portions of Cas9 required for double-stranded DNA cleavage (e.g., the HNH, RuvC1 subdomains). In some embodiments, the protein comprising the DNA cleavage domain of Cas9 does not comprise a gRNA binding domain (referred to herein as the "B-half" of Cas9). In some embodiments, dimers are provided that comprise (i) a protein comprising the gRNA binding domain of Cas9 (e.g., the A-half), and (ii) a protein comprising the DNA cleavage domain of Cas9 (e.g., the B-half). In some embodiments, the dimer is bound by a gRNA. For example, such dimers are expected to recapitulate the binding and cleaving activities of a full length Cas9 protein. In some embodiments, such dimers are referred to herein as "dimeric split Cas9." Using a dimeric split Cas9 to cleave a target nucleic acid is expected to provide for increased specificity as compared to a single full length Cas9 protein because both halves of the protein must be co-localized to associate and re-fold into a nuclease-active state. This strategy is shown in the schematic of FIG. 2A.

In some embodiments, fusion proteins comprising two domains are provided: (i) a protein capable of specifically binding a target nucleic acid (e.g., a nuclease-inactivated RNA programmable nuclease, such as a nuclease-inactivated Cas9, as described herein) fused or linked to (ii) a fragment of an RNA-programmable nuclease (e.g., the A- or B-half of Cas9, as described herein). In some embodiments, domain (i) of the aforementioned fusion protein comprises a DNA binding domain, for example, a DNA binding domain of a zinc finger or TALE protein. In some embodiments, the fusion protein comprises (i) a nuclease-inactivated Cas9, and (ii) a gRNA binding domain of Cas9 (e.g., Cas9 A-half). In some embodiments, domain (ii) of the fusion protein does not include a DNA cleavage domain. In other embodiments, the fusion protein comprises (i) a nuclease-inactivated Cas9, and (ii) a DNA cleavage domain (e.g., Cas9 B-half). In some embodiments, domain (ii) of the fusion protein does not include a gRNA binding domain.

In some embodiments, dimers are provided that comprise two proteins: (i) a fusion protein comprising a nuclease-inactivated Cas9 and a gRNA binding domain of Cas9 (e.g., nuclease-inactivated Cas9 fused to Cas9 A-half), and (ii) a protein comprising the DNA cleavage domain of Cas9 (e.g., Cas9 B-half). In other embodiments, the dimer comprises (i) a fusion protein comprising a nuclease-inactivated Cas9 and a DNA cleavage domain of Cas9 (e.g., nuclease-inactivated Cas9 fused to Cas9 B-half), and (ii) a protein comprising the gRNA binding domain of Cas9 (e.g., Cas9 A-half). In some

to have improved specificity compared to e.g., a Cas9 protein having a single gRNA. This strategy is shown in FIG. 2B.

In some embodiments, a protein dimer is provided that comprises two fusion proteins: (i) a fusion protein comprising a nuclease-inactivated Cas9 and a gRNA binding domain of Cas9 (e.g., a nuclease-inactivated Cas9 fused to a Cas9 A-half), and (ii) a fusion protein comprising a nuclease-inactivated Cas9 and a DNA cleavage domain (e.g., a nuclease-inactivated Cas9 fused to a Cas9 B-half). In some embodiments, the dimer is associated with (e.g., binds) one or more distinct gRNAs. For example, in some embodiments, the dimer is associated with two or three gRNAs. In some embodiments, the dimer is associated with three gRNAs. For example, upon binding of one nuclease-inactivated Cas9:gRNA to a region of a nucleic acid target, and binding of the other nuclease-inactivated Cas9:gRNA to a second region of the nucleic acid target, the split Cas9 halves (e.g., A-half and B-half of the fusion proteins) can dimerize and bind a third gRNA complementary to a third region of the nucleic acid target, to become a fully active Cas9 nuclease, which can cleave dsDNA. This strategy is illustrated in FIG. 2C.

According to another aspect of the invention, minimized Cas9 proteins are provided. By "minimized," it is meant that the Cas9 protein comprises amino acid deletions and/or truncations, as compared to the wild type protein, but retains gRNA binding activity, DNA cleavage activity, or both. Any of the embodiments herein describing Cas9 proteins (e.g., split Cas9 proteins, Cas9 A-half, Cas9 B-half, nuclease-inactivated Cas9 fusion proteins, etc.) can utilize a minimized Cas9 protein. In some embodiments, minimized Cas9 proteins comprising N-terminal deletions and/or truncations are provided. In some embodiments, minimized Cas9 proteins comprising C-terminal deletions and/or truncations are provided. In some embodiments, minimized Cas9 proteins are provided that comprise N- and/or C-terminal deletions and/or truncations. In some embodiments, the minimized Cas9 protein retains both gRNA binding and DNA cleavage activities. In some embodiments, the minimized Cas9 protein comprises an N-terminal truncation that removes at least 5, at least 10, at least 15, at least 20, at least 25, at least 40, at least 50, at least 75, at least 100, at least 150, at least 200, at least 250, at least 300, at least 350, at least 400, at least 450, or at least 500 amino acids. In some embodiments, the minimized Cas9 protein comprises a C-terminal truncation that removes at least 5, at least 10, at least 15, at least 20, at least 25, at least 40, at least 50, at least 75, at least 100, at least 150, at least 200, at least 250, at least 300, at least 350, at least 400, at least 450, or at least 500 amino acids. In some embodiments, deletions are made within Cas9, for example in regions not affecting gRNA binding and/or DNA cleavage. In some embodiments, the minimized Cas9 protein is associated with one or more gRNAs. In certain embodiments, the minimized Cas9 protein is associated with one gRNA.

Recombinases

Some aspects of this disclosure provide RNA-guided recombinase fusion proteins that are designed using the methods and strategies described herein. Some embodiments of this disclosure provide nucleic acids encoding such recombinases.
Some embodiments of this disclosure provide expres embodiments, the protein dimers include one or more 60 sion constructs comprising Such encoding nucleic acids. For gRNAS. For example, in some embodiments, the dimers example, in Some embodiments an isolated recombinase is include two gRNAs: one bound by the nuclease-inactivated provided that has been engineered to recombine a desired Cas9 domain of the fusion protein; the other bound by the target site (e.g., a site targeted by one or more gRNAS bound A-half domain (e.g., either the A-half of the fusion protein, or to one or more of the engineered recombinases) within a the A-half of the dimer not part of the fusion protein). Such a 65 genome, e.g., with another site in the genome or with an dimer (e.g., associated with two gRNAS having sequences exogenous nucleic acid. In some embodiments, the isolated binding separate regions of a target nucleic acid) is expected recombinase comprises a variant of an RNA-programmable US 9,388,430 B2 59 60 nuclease, such as a Cas9 nuclease. In some embodiments, the Cas9 variant is a nuclease-inactivated Cas9 (e.g., dCas9). In (SEQ ID NO: 322) ATGGCCCTGTTTGGCTACGCACGCGTGTCTACCAGTCAACAGTCACTCGA some embodiments, dCas9 is encoded by a nucleotide TTTGCAAGTGAGGGCTCTTAAAGATGCCGGAGTGAAGGCAAACAGAATTT sequence comprising in part or in whole, SEQ ID NO:5 or TTACTGATAAGGCCAGCGGAAGCAGCACAGACAGAGAGGGGCTGGATCTC SEQID NO:320. In some embodiments, dCas9 is encoded by 5 CTGAGAATGAAGGTAAAGGAGGGTGATGTGATCTTGGTCAAAAAATTGGA TCGACTGGGGAGAGACACAGCTGATATGCTTCAGCTTATTAAAGAGTTTG a nucleotide sequence comprising a variant of SEQID NO:5 ACGCTCAGGGTGTTGCCGTGAGGTTTATCGATGACGGCATCTCAACCGAC or SEQID NO:320. TCCTACATTGGTCTTATGTTTGTGACAATTTTGTCCGCTGTGGCTCAGGC In one embodiment, an RNA-guided recombinase fusion TGAGCGGAGAAGGATTCTCGAAAGGACGAATGAGGGACGGCAAGCAGCTA protein is provided. Typically, the fusion protein comprises AGTTGAAAGGTATCAAATTTGGCAGACGAAGG 10 two or more domains. 
In some embodiments, the fusion pro (SEQ ID NO: 325) tein comprises two domains. In some embodiments, one of MALFGYARVSTSQQSLDLOVRALKDAGVKANRIFTDKASGSSTDREGLDL the two or more domains is a nuclease-inactivated Cas9 (or LRMKVKEGDWILVKKLDRLGRDTADMLOLIKEFDAQGVAVRFIDDGISTD fragment thereof, e.g., Cas9 A-half), for example, those SYIGLMFVTILSAVAOAERRRILERTNEGROAAKLKGIKFGRRR described herein (e.g., dCas9). The Cas9 domain of the 15 Hin Recombinase (Nucleotide: SEQ ID NO:323; Amino recombinase fusion protein is capable of binding one or more Acid: SEQID NO:326): gRNAs, and thereby directs or targets the recombinase fusion protein(s) to a target nucleic acid, e.g., as described herein. Another domain of the two or more domains is a recombi (SEQ ID NO: 323) ATGGCAACCATTGGCTACATAAGGGTGTCTACCATCGACCAAAATATCGA nase, or a fragment thereof, e.g., a catalytic domain of a CCTGCAGCGCAACGCTCTGACATCCGCCAACTGCGATCGGATCTTCGAGG recombinase. By “catalytic domain of a recombinase it is ATAGGATCAGTGGCAAGATCGCCAACCGGCCCGGTCTGAAGCGGGCTCTG meant that a fusion protein includes a domain comprising an AAGTACGTGAATAAGGGCGATACTCTGGTTGTGTGGAAGTTGGATCGCTT GGGTAGATCAGTGAAGAATCTCGTAGCCCTGATAAGCGAGCTGCACGAGA amino acid sequence of (e.g., derived from) a recombinase, GGGGTGCACATTTCCATTCTCTGACCGATTCCATCGATACGTCTAGCGCC Such that the domain is sufficient to induce recombination ATGGGCCGATTCTTCTTTTACGTCATGTCCGCCCTCGCTGAAATGGAGCG when contacted with a target nucleic acid (either alone or with 25 CGAACTTATTGTTGAACGGACTTTGGCTGGACTGGCAGCGGCTAGAGCAC additional factors including other recombinase catalytic AGGGCCGACTTGGA domains which may or may not form part of the fusion pro (SEQ ID NO: 326) tein). In some embodiments, a catalytic domain of a recom MATIGYIRVSTIDONIDLORNAL TSANCDRIFEDRISGKIANRPGLKRAL binase excludes a DNA binding domain of the recombinase. 
KYWNKGDTLWWWKLDRLGRSWKNLWALISELHERGAHFHSLTDSIDTSSA In some embodiments, the catalytic domain of a recombinase 30 MGRFFFYVMSALAEMERELIVERTLAGLAAARAOGRLG includes part or all of a recombinase, e.g., the catalytic Gin Beta Recombinase (Nucleotide: SEQID NO:324; Amino domain may include a recombinase domain and a DNA bind Acid: SEQID NO:327): ing domain, or parts thereof, or the catalytic domain may include a recombinase domain and a DNA binding domain 35 (SEQ ID NO: 324) that is mutated or truncated to abolish DNA binding activity. ATGCTCATTGGCTATGTAAGGGTCAGCACCAATGACCAAAACACAGACTT Recombinases and catalytic domains of recombinases are GCAACGCAATGCTTTGGTTTGCGCCGGATGTGAACAGATATTTGAAGATA known to those of skill in the art, and include, for example, AACTGAGCGGCACTCGGACAGACAGACCTGGGCTTAAGAGAGCACTGAAA AGACTGCAGAAGGGGGACACCCTGGTCGTCTGGAAACTGGATCGCCTCGG those described herein. In some embodiments, the catalytic ACGCAGCATGAAACATCTGATTAGCCTGGTTGGTGAGCTTAGGGAGAGAG domain is derived from any recombinase. In some embodi 40 GAATCAACTTCAGAAGCCTGACCGACTCCATCGACACCAGTAGCCCCATG ments, the recombinase catalytic domain is a catalytic domain GGACGATTCTTCTTCTATGTGATGGGAGCACTTGCTGAGATGGAAAGAGA of a Tn3 resolvase, a Hin recombinase, or a Gin recombinase. GCTTATTATCGAAAGAACTATGGCTGGTATCGCTGCTGCCCGGAACAAAG In some embodiments, the catalytic domain comprises a Tn3 GCAGACGGTTCGGCAGACCGCCGAAGAGCGGC resolvase (e.g., Stark Tn3 recombinase) that is encoded by a (SEO ID NO: 327) nucleotide sequence comprising, in part or in whole, SEQID 45 MLIGYWRVSTNDONTDLORNALVCAGCEOIFEDKLSGTRTDRPGLKRALK NO:322, as provided below. In some embodiments, a Tn3 RLOKGDTLVWWKLDRLGRSMKHLISLVGELRERGINFRSLTDSIDTSSPM catalytic domain is encoded by a variant of SEQID NO:322. 
GRFFFYWMGALAEMERELIIERTMAGIAAARNKGRRFGRPPKSG In some embodiments, a Tn3 catalytic domain is encoded by In some embodiments, the recombinase catalytic domain is a polynucleotide (or a variant thereof) that encodes the fused to the N-terminus, the C-terminus, or somewhere in polypeptide corresponding to SEQ ID NO:325. In some 50 between the N- and C-terminiofa Cas9 protein (e.g., SOas9). embodiments, the catalytic domain comprises a Hin recom In some embodiments, the fusion protein further comprises a binase that is encoded by a nucleotide sequence comprising, nuclear localization signal (NLS; e.g., any of those provided in part or in whole, SEQID NO:323, as provided below. In herein). For example, in some embodiments, the general Some embodiments, a Hin catalytic domain is encoded by a architecture of exemplary RNA-guided recombinase fusion variant of SEQ ID NO:323. In some embodiments, a Hin 55 proteins (e.g., Cas9-recombinase fusions) comprise one of catalytic domain is encoded by a polynucleotide (or a variant the following structures: thereof) that encodes the polypeptide corresponding to SEQ NH-Cass-recombinase-COOH, IDNO:326. In some embodiments, the catalytic domain com NH2-recombinase-Cas9. prises a Gin recombinase (e.g., Gin beta recombinase) that is NH-NLS-Cas9-recombinase-COOH, encoded by a nucleotide sequence comprising, in part or in 60 NH-NLS-recombinase-Cas9-COOH, whole, SEQID NO:324, as provided below. In some embodi NH-Cas9-NLS-recombinase-COOH), ments, a Gin catalytic domain is encoded by a variant of SEQ NH-recombinase-NLS-Cas9-COOH, ID NO:324. In some embodiments, a Gin catalytic domain is NH-Cas9-recombinase-NLS-COOH, or encoded by a polynucleotide (or a variant thereof) that NH-recombinase-Cas9-NLS-COOH encodes the polypeptide corresponding to SEQID NO:327. 
65 wherein NLS is a nuclear localization signal, NH is the Stark Tn3 Recombinase (Nucleotide: SEQ ID NO:322: N-terminus of the fusion protein, and COOH is the C-termi Amino Acid: SEQID NO:325): nus of the fusion protein. In some embodiments, a linker is US 9,388,430 B2 61 62 inserted between the Cas9 domain and the recombinase incompatible with a Substance or its derivatives, such as by domain, e.g., any linker provided herein. Additional features, producing any undesirable biological effect or otherwise Such as sequence tags (e.g., any of those provided herein), interacting in a deleterious manner with any other compon may also be present. ent(s) of the pharmaceutical composition, its use is contem Pharmaceutical Compositions plated to be within the scope of this disclosure. In some embodiments, any of the nucleases (e.g., fusion In some embodiments, compositions in accordance with proteins comprising nucleases or nuclease domains) and the present invention may be used for treatment of any of a recombinases (e.g., fusion proteins comprising recombinases variety of diseases, disorders, and/or conditions, including or recombinase catalytic domains) described herein are pro but not limited to one or more of the following: autoimmune vided as part of a pharmaceutical composition. For example, 10 disorders (e.g. diabetes, lupus, multiple Sclerosis, psoriasis, Some embodiments provide pharmaceutical compositions rheumatoid arthritis); inflammatory disorders (e.g. arthritis, comprising a nuclease and/or recombinase as provided pelvic inflammatory disease); infectious diseases (e.g. viral herein, or a nucleic acid encoding Such a nuclease and/or infections (e.g., HIV. HCV, RSV), bacterial infections, fungal recombinase, and a pharmaceutically acceptable excipient. infections, sepsis); neurological disorders (e.g. 
Alzheimer's Pharmaceutical compositions may optionally comprise one 15 disease, Huntington's disease; autism; Duchenne muscular or more additional therapeutically active Substances. dystrophy); cardiovascular disorders (e.g. atherosclerosis, In some embodiments, compositions provided herein are hypercholesterolemia, thrombosis, clotting disorders, angio administered to a Subject, for example, to a human Subject, in genic disorders such as macular degeneration); proliferative order to effect a targeted genomic modification within the disorders (e.g. cancer, benign neoplasms); respiratory disor subject. In some embodiments, cells are obtained from the ders (e.g. chronic obstructive pulmonary disease); digestive Subject and are contacted with a nuclease and/or recombinase disorders (e.g. inflammatory bowel disease, ulcers); muscu ex vivo. In some embodiments, cells removed from a subject loskeletal disorders (e.g. fibromyalgia, arthritis); endocrine, and contacted ex vivo with an inventive nuclease and/or metabolic, and nutritional disorders (e.g. diabetes, osteoporo recombinase are re-introduced into the Subject, optionally sis); urological disorders (e.g. renal disease); psychological after the desired genomic modification has been effected or 25 disorders (e.g. depression, Schizophrenia); skin disorders detected in the cells. Although the descriptions of pharma (e.g. wounds, eczema); blood and lymphatic disorders (e.g. ceutical compositions provided herein are principally anemia, hemophilia); etc. directed to pharmaceutical compositions which are Suitable Methods for Site-Specific Nucleic Acid Cleavage for administration to humans, it will be understood by the In another embodiment of this disclosure, methods for skilled artisan that such compositions are generally Suitable 30 site-specific nucleic acid (e.g., DNA) cleavage are provided. for administration to animals of all sorts. 
Modification of In some embodiments, the methods comprise contacting a pharmaceutical compositions suitable for administration to DNA with any of the Cas9:gRNA complexes described humans in order to render the compositions suitable for herein. For example, in some embodiments, the method com administration to various animals is well understood, and the prises contacting a DNA with a fusion protein (e.g., f(as9) ordinarily skilled Veterinary pharmacologist can design and/ 35 that comprises two domains: (i) a nuclease-inactivated Cas9 or perform Such modification with merely ordinary, if any, (dCas9); and (ii) a nuclease (e.g., a FokI DNA cleavage experimentation. Subjects to which administration of the domain), wherein the wherein the inactive Cas9 domain binds pharmaceutical compositions is contemplated include, but a gRNA that hybridizes to a region of the DNA. In some are not limited to, humans and/or other primates; mammals, embodiments, the method further comprises contacting the domesticated animals, pets, and commercially relevant mam 40 DNA with a second fusion protein described herein (e.g., mals such as cattle, pigs, horses, sheep, cats, dogs, mice, fCas9), wherein the nuclease-inactivated Cas9 (dCas9) and/or rats; and/or birds, including commercially relevant domain of the second fusion protein binds a second gRNA birds such as chickens, ducks, geese, and/or turkeys. that hybridizes to a second region of DNA, wherein the bind Formulations of the pharmaceutical compositions ing of the fusion proteins results in the dimerization of the described herein may be prepared by any method known or 45 nuclease domains of the fusion proteins, such that the DNA is hereafter developed in the art of pharmacology. In general, cleaved in a region between the bound fusion proteins. See Such preparatory methods include the step of bringing the e.g., FIGS. 1A, 6D. 
In some embodiments, the gRNAs bound active ingredient into association with an excipient, and then, to each fusion protein hybridize to the same strand of the if necessary and/or desirable, shaping and/or packaging the DNA, or they hybridize to opposing strands of the DNA. In product into a desired single- or multi-dose unit. 50 some embodiments, the gRNAs hybridize to regions of the Pharmaceutical formulations may additionally comprise a DNA that are no more than 10, no more than 15, no more than pharmaceutically acceptable excipient, which, as used 20, no more than 25, no more than 30, no more than 40, 50, no herein, includes any and all solvents, dispersion media, dilu more than 60, no more than 70, no more than 80, no more than ents, or other liquid vehicles, dispersion or Suspension aids, 90, or no more than 100 base pairs apart. The region between Surface active agents, isotonic agents, thickening or emulsi 55 the bound Cas9:gRNA complexes may be referred to as the fying agents, preservatives, Solid binders, lubricants and the 'spacer sequence.” which is typically where the target nucleic like, as Suited to the particular dosage form desired. Reming acid is cleaved. See, e.g., FIGS. 6C-D. In some embodiments, ton's The Science and Practice of Pharmacy, 21 Edition, A. the spacer sequence is at least 5, at least 10, at least 15, at least R. Gennaro (Lippincott, Williams & Wilkins, Baltimore, 20, at least 25, at least 30 at least 35, at least 40, at least 45, at Md., 2006; incorporated in its entirety herein by reference) 60 least 50, at least 60, at least 70, at least 80, at least 90, or at discloses various excipients used in formulating pharmaceu least 100 base pairs in length. In some embodiments, the tical compositions and known techniques for the preparation spacer sequence is between about 5 and about 50 base pairs, thereof. 
See also PCT application PCT/US2010/055131, about 10 and about 40, or about 15 and about 30 base pairs in incorporated in its entirety herein by reference, for additional length. In some embodiments, the spacer sequence is about Suitable methods, reagents, excipients and solvents for pro 65 15 to about 25 base pairs in length. In some embodiments, the ducing pharmaceutical compositions comprising a nuclease. spacer sequence is about 15, about 20, or about 25 base pairs Except insofar as any conventional excipient medium is in length. In some embodiments, the Cas9:gRNA complexes US 9,388,430 B2 63 64 are bound in the A orientation, as described herein. In some A-half of the dimer not part of the fusion protein). In some embodiments, the Cas9:gRNA complexes are bound in the B embodiments, the protein dimer comprises (i) a fusion protein orientation, as described herein. In some embodiments, the comprising a nuclease-inactivated Cas9 and a gRNA binding method has an on-target:off-target modification ratio that is at domain of Cas9 (e.g., a nuclease-inactivated Cas9 fused to a least 2-fold, at least 5-fold, at least 10-fold, at least 20-fold, at Cas9 A-half), and (ii) a fusion protein comprising a nuclease least 30-fold, at least 40-fold, at least 50-fold, at least 60-fold, inactivated Cas9 and a DNA cleavage domain (e.g., a at least 70-fold, at least 80-fold, at least 90-fold, at least nuclease-inactivated Cas9 fused to a Cas9 B-half). In some 100-fold, at least 110-fold, at least 120-fold, at least 130-fold, embodiments, the dimer is associated with one or more dis at least 140-fold, at least 150-fold, at least 175-fold, at least tinct gRNAs. For example, in some embodiments, the dimer 200-fold, or at least 250-fold or more higher than the on 10 is associated with two or three gRNAs. In some embodi target:off-target modification ratio of methods utilizing a wild ments, the dimer is associated with three gRNAs. For type Cas9 or other Cas9 variant. 
In some embodiments, the example, upon binding of one nuclease-inactivated Cas9: method has an on-target:off-target modification ratio that is gRNA to a region of a nucleic acid target, and binding of the between about 60-to about 180-fold, between about 80- to other nuclease-inactivated Cas9:gRNA to a second region of about 160-fold, between about 100- to about 150-fold, or 15 the nucleic acid target, the split Cas9 halves (e.g., A-half and between about 120- to about 140-fold higher than the on B-half of the fusion proteins) dimerize and binda third gRNA target:off-target modification ratio of methods utilizing a wild complementary to a third region of the nucleic acid target, to type Cas9 or other Cas9 variant. Methods for determining become a fully active Cas9 nuclease leading to cleave of the on-target:off-target modification ratios are known, and target DNA. include those described in the Examples. In some embodi In some embodiments, a method for site-specific cleavage ments, the fusion proteins are coordinated or associated of a nucleic acid comprises contacting a nucleic acid (e.g., through a single gRNA, e.g., as described herein. DNA) with a minimized Cas9 protein (e.g., as described In some embodiments, the method comprises contacting a herein) associated with a gRNA. nucleic acid with a dimer of Cas9 proteins (or fragments In some embodiments, any of the methods provided herein thereof) coordinated with (e.g., bound by) a single gRNA as 25 can be performed on DNA in a cell, for example a bacterium, described herein. In some embodiments, the single gRNA a yeast cell, or a mammalian cell. In some embodiments, the comprises at least two portions that hybridize to the nucleic DNA contacted by any Cas9 protein provided herein is in a acid. In some embodiments, the portions comprise at least 5, eukaryotic cell. 
In some embodiments, the methods can be at least 10, at least 15, or at least 19 complementary nucle performed on a cell or tissue in vitro or ex vivo. In some otides. In some embodiments, the portions comprise fewer 30 embodiments, the eukaryotic cell is in an individual. Such as than 20 complementary nucleotides. In some embodiments, a a patient or research animal. In some embodiments, the indi linker sequence separates the portions, wherein the linker vidual is a human. sequence also comprises nucleotides complementary to the Methods for Site-Specific Recombination target nucleic acid (e.g., but are not bound by a Cas9 protein). In another embodiment of this disclosure, methods for In some embodiments, the linker sequence does not hybridize 35 site-specific nucleic acid (e.g., DNA) recombination are pro to the target nucleic acid. vided. In some embodiments, the methods are useful for In some embodiments, the methods comprise contacting a inducing recombination of or between two or more regions of DNA with a protein dimer offusion proteins described herein, two or more nucleic acid (e.g., DNA) molecules. In other wherein the fusion proteins are bound by one or more gRNAs. embodiments, the methods are useful for inducing recombi For example, in Some embodiments, one fusion protein of the 40 nation of or between two or more regions in a single nucleic dimer comprises a gRNA binding domain of Cas9 (e.g., Cas9 acid molecule (e.g., DNA). In some embodiments, the recom A-half), wherein the protein does not comprise a DNA cleav bination of one or more target nucleic acid molecules requires age domain (e.g., Cas9 B-half); and the other fusion protein of the formation of a tetrameric complex at the target site. 
Typi the dimer comprises a DNA cleavage domain of Cas9 (e.g., cally, the tetramer comprises four (4) inventive RNA-guided Cas9 B-half), wherein the protein does not comprise a gRNA 45 recombinase fusion proteins (e.g., a complex of any four binding domain (e.g., Cas9 A-half). Thus, in Some embodi inventive recombinase fusion protein provided herein). In ments, the binding of a gRNA (e.g., that hybridizes to a target Some embodiments, each recombinase fusion protein of the nucleic acid) to one or both of the monomers of the dimer tetramer targets a particular DNA sequence via a distinct co-localizes the dimer to the target nucleic acid, allowing the gRNA bound to each recombinase fusion protein (See, e.g., dimer to re-fold into a nuclease-active state and cleave the 50 FIG. 5). target nucleic acid. In some embodiments, the method for site-specific recom In some embodiments, the method comprises contacting a bination between two DNA molecules comprises (a) contact nucleic acid with protein dimers comprising two proteins: (i) ing a first DNA with a first RNA-guided recombinase fusion a fusion protein comprising a nuclease-inactivated Cas9 and protein, wherein the nuclease-inactivated Cas9 domain binds a gRNA binding domain of Cas9 (e.g., nuclease-inactivated 55 a first gRNA that hybridizes to a region of the first DNA; (b) Cas9 fused to Cas9 A-half), and (ii) a protein comprising the contacting the first DNA with a second RNA-guided recom DNA cleavage domain of Cas9 (e.g., Cas9 B-half). 
In other binase fusion protein, wherein the nuclease-inactivated Cas9 embodiments, the dimer comprises (i) a fusion protein com domain of the second fusion protein binds a second gRNA prising a nuclease-inactivated Cas9 and a DNA cleavage that hybridizes to a second region of the first DNA; (c) con domain of Cas9 (e.g., nuclease-inactivated Cas9 fused to 60 tacting a second DNA with a third RNA-guided recombinase Cas9 B-half), and (ii) a protein comprising the gRNA binding fusion protein, wherein the nuclease-inactivated Cas9 domain of Cas9 (e.g., Cas9 A-half). In some embodiments, domain of the third fusion protein binds a third gRNA that the protein dimers are associated with one or more gRNAs. hybridizes to a region of the second DNA; and (d) contacting For example, in some embodiments, the dimers are associated the second DNA with a fourth RNA-guided recombinase with two gRNAs: one bound by the nuclease-inactivated Cas9 65 fusion protein, wherein the nuclease-inactivated Cas9 domain of the fusion protein; the other bound by the A-half domain of the fourth fusion protein binds a fourth gRNA that domain (e.g., either the A-half of the fusion protein, or the hybridizes to a second region of the second DNA. The bind US 9,388,430 B2 65 66 ing of the fusion proteins in steps (a)-(d) results in the tet respect to the target DNA molecule(s)) of the bound RNA ramerization of the recombinase catalytic domains of the guided recombinase fusion protein(s). In some embodiments, fusion proteins, such that the DNAs are recombined. In some the orientation, or direction, in which a RNA-guided recom embodiments, the gRNAs of steps (a) and (b) hybridize to binase fusion protein binds a target nucleic acid can be con opposing strands of the first DNA, and the gRNAs of steps (c) trolled, e.g., by the particular sequence of the gRNA bound to and (d) hybridize to opposing strands of the second DNA. In the RNA-guided recombinase fusion protein(s). 
Methods for some embodiments, the target sites of the gRNAs of steps controlling or directing a particular recombination event are (a)-(d) are spaced to allow for tetramerization of the recom known in the art, and include, for example, those described by binase catalytic domains. For example, in some embodi Turan and Bode, “Site-specific recombinases: from tag-and ments, the target sites of the gRNAS of steps (a)-(d) are no 10 target-to tag-and-exchange-based genomic modifications.” more than 10, no more 15, no more than 20, no more than 25, FASEB.J. 2011; December; 25(12):4088-107, the entire con no more than 30, no more than 40, no more than 50, no more tents of which are hereby incorporated by reference. than 60, no more than 70, no more than 80, no more than 90, In some embodiments, any of the methods for site-specific or no more than 100 base pairs apart. In some embodiments, recombination can be performed in vivo or in vitro. In some the two regions of the two DNA molecules being recombined 15 embodiments, any of the methods for site-specific recombi share homology, such that the regions being recombined are nation are performed in a cell (e.g., recombine genomic DNA at least 80%, at least 90%, at least 95%, at least 98%, or are in a cell). The cell can be prokaryotic or eukaryotic. The cell, 100% homologous. Such as a eukaryotic cell, can be in an individual. Such as a In another embodiment, methods for site-specific recom Subject, as described herein (e.g., a human Subject). The bination between two regions of a single DNA molecule are methods described herein are useful for the genetic modifi provided. 
In some embodiments, the methods comprise (a) contacting a DNA with a first RNA-guided recombinase fusion protein, wherein the nuclease-inactivated Cas9 domain binds a first gRNA that hybridizes to a region of the DNA; (b) contacting the DNA with a second RNA-guided recombinase fusion protein, wherein the nuclease-inactivated Cas9 domain of the second fusion protein binds a second gRNA that hybridizes to a second region of the DNA; (c) contacting the DNA with a third RNA-guided recombinase fusion protein, wherein the nuclease-inactivated Cas9 domain of the third fusion protein binds a third gRNA that hybridizes to a third region of the DNA; and (d) contacting the DNA with a fourth RNA-guided recombinase fusion protein, wherein the nuclease-inactivated Cas9 domain of the fourth fusion protein binds a fourth gRNA that hybridizes to a fourth region of the DNA. The binding of the fusion proteins in steps (a)-(d) results in the tetramerization of the recombinase catalytic domains of the fusion proteins, such that the DNA is recombined. In some embodiments, two of the gRNAs of steps (a)-(d) hybridize to the same strand of the DNA, and the other two gRNAs of steps (a)-(d) hybridize to the opposing strand of the DNA. In some embodiments, the gRNAs of steps (a) and (b) hybridize to regions of the DNA that are no more than 10, no more than 15, no more than 20, no more than 25, no more than 30, no more than 40, no more than 50, no more than 60, no more than 70, no more than 80, no more than 90, or no more than 100 base pairs apart, and the gRNAs of steps (c) and (d) hybridize to regions of the DNA that are no more than 10, no more than 15, no more than 20, no more than 25, no more than 30, no more than 40, no more than 50, no more than 60, no more than 70, no more than 80, no more than 90, or no more than 100 base pairs apart. In some embodiments, the two regions of the DNA molecule being recombined share homology, such that the regions being recombined are at least 80%, at least 90%, at least 95%, at least 98%, or are 100% homologous.

In some embodiments, any of the inventive methods for site-specific recombination are amenable for inducing recombination, such that the recombination results in excision (e.g., a segment of DNA is excised from a target DNA molecule), insertion (e.g., a segment of DNA is inserted into a target DNA molecule), inversion (e.g., a segment of DNA is inverted in a target DNA molecule), or translocation (e.g., the exchange of DNA segments between one or more target DNA molecule(s)). In some embodiments, the particular recombination event (e.g., excision, insertion, inversion, translocation, etc.) depends, inter alia, on the orientation (e.g., with

cation of cells in vitro and in vivo, for example, in the context of the generation of transgenic cells, cell lines, or animals, or in the alteration of genomic sequence, e.g., the correction of a genetic defect, in a cell in or obtained from a subject. In some embodiments, a cell obtained from a subject and modified according to the methods provided herein, is re-introduced into a subject (e.g., the same subject), e.g., to treat a disease, or for the production of genetically modified organisms in agriculture or biological research.

In applications in which it is desirable to recombine two or more nucleic acids so as to insert a nucleic acid sequence into a target nucleic acid, a nucleic acid comprising a donor sequence to be inserted is also provided, e.g., to a cell. By a "donor sequence" it is meant a nucleic acid sequence to be inserted at the target site induced by one or more RNA-guided recombinase fusion protein(s). In some embodiments, e.g., in the context of genomic modifications, the donor sequence will share homology to a genomic sequence at the target site, e.g., 1%, 2%, 3%, 4%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 100% homology with the nucleotide sequences flanking the target site, e.g., within about 100 bases or less of the target site, e.g., within about 90 bases, within about 80 bases, within about 70 bases, within about 60 bases, within about 50 bases, within about 40 bases, within about 30 bases, within about 15 bases, within about 10 bases, within about 5 bases, or immediately flanking the target site. In some embodiments, the donor sequence does not share any homology with the target nucleic acid, e.g., does not share homology to a genomic sequence at the target site. Donor sequences can be of any length, e.g., 10 nucleotides or more, 50 nucleotides or more, 100 nucleotides or more, 250 nucleotides or more, 500 nucleotides or more, 1000 nucleotides or more, 5000 nucleotides or more, 10000 nucleotides or more, 100000 nucleotides or more, etc.

Typically, the donor sequence is not identical to the target sequence that it replaces or is inserted into. In some embodiments, the donor sequence contains at least one or more single base changes, insertions, deletions, inversions or rearrangements with respect to the target sequence (e.g., target genomic sequence). In some embodiments, donor sequences also comprise a vector backbone containing sequences that are not homologous to the DNA region of interest and that are not intended for insertion into the DNA region of interest.

The donor sequence may comprise certain sequence differences as compared to the target (e.g., genomic) sequence, for example restriction sites, nucleotide polymorphisms, selectable markers (e.g., drug resistance genes, fluorescent proteins, enzymes etc.), which can be used to assess for successful insertion of the donor sequence at the target site or in some cases may be used for other purposes (e.g., to signify expression at the targeted genomic locus). In some embodiments, if located in a coding region, such nucleotide sequence differences will not change the amino acid sequence, or will make silent amino acid changes (e.g., changes which do not affect the structure or function of the protein). In some embodiments, these sequence differences may include flanking recombination sequences such as FLPs, loxP sequences, or the like, that can be activated at a later time for removal of e.g., a marker sequence.

The donor sequence may be provided to the cell as single-stranded DNA, single-stranded RNA, double-stranded DNA, or double-stranded RNA. It may be introduced into a cell in linear or circular form. If introduced in linear form, the ends of the donor sequence may be protected (e.g., from exonucleolytic degradation) by methods known to those of skill in the art. For example, one or more dideoxynucleotide residues are added to the 3' terminus of a linear molecule and/or self-complementary oligonucleotides are ligated to one or both ends. See, e.g., Chang et al., Proc. Natl. Acad. Sci. USA. 1987; 84:4959-4963; Nehls et al., Science. 1996; 272:886-889. In some embodiments, a donor sequence can be introduced into a cell as part of a vector molecule having additional sequences such as, for example, replication origins, promoters and genes encoding antibiotic resistance. In some embodiments, donor sequences can be introduced as naked nucleic acid, as nucleic acid complexed with an agent such as a liposome or poloxamer, or can be delivered by viruses (e.g., adenovirus, AAV, etc.).

Polynucleotides, Vectors, Cells, Kits

In another embodiment of this disclosure, polynucleotides encoding one or more of the inventive proteins and/or gRNAs are provided. For example, polynucleotides encoding any of the proteins described herein are provided, e.g., for recombinant expression and purification of isolated nucleases and recombinases, e.g., comprising Cas9 variants. In some embodiments, an isolated polynucleotide comprises one or more sequences encoding a Cas9 half site (e.g., A-half and/or B-half). In some embodiments, an isolated polynucleotide comprises one or more sequences encoding a Cas9 fusion protein, for example, any of the Cas9 fusion proteins described herein (e.g., those comprising a nuclease-inactivated Cas9). In some embodiments, an isolated polynucleotide comprises one or more sequences encoding a gRNA, alone or in combination with a sequence encoding any of the proteins described herein.

In some embodiments, vectors encoding any of the proteins described herein are provided, e.g., for recombinant expression and purification of Cas9 proteins, and/or fusions comprising Cas9 proteins (e.g., variants). In some embodiments, the vector comprises or is engineered to include an isolated polynucleotide, e.g., those described herein. In some embodiments, the vector comprises one or more sequences encoding a Cas9 protein (as described herein), a gRNA, or combinations thereof, as described herein. Typically, the vector comprises a sequence encoding an inventive protein operably linked to a promoter, such that the fusion protein is expressed in a host cell.

In some embodiments, cells are provided, e.g., for recombinant expression and purification of any of the Cas9 proteins provided herein. The cells include any cell suitable for recombinant protein expression, for example, cells comprising a genetic construct expressing or capable of expressing an inventive protein (e.g., cells that have been transformed with one or more vectors described herein, or cells having genomic modifications, for example, those that express a protein provided herein from an allele that has been incorporated in the cell's genome). Methods for transforming cells, genetically modifying cells, and expressing genes and proteins in such cells are well known in the art, and include those provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)) and Friedman and Rossi, Gene Transfer: Delivery and Expression of DNA and RNA, A Laboratory Manual (1st ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2006)).

Some aspects of this disclosure provide kits comprising a Cas9 variant and/or nuclease and/or recombinase, as provided herein. In some embodiments, the kit comprises a polynucleotide encoding an inventive Cas9 variant, nuclease, and/or recombinase, e.g., as provided herein. In some embodiments, the kit comprises a vector for recombinant protein expression, wherein the vector comprises a polynucleotide encoding any of the proteins provided herein. In some embodiments, the kit comprises a cell (e.g., any cell suitable for expressing Cas9 proteins or fusions comprising Cas9 proteins, such as bacterial, yeast, or mammalian cells) that comprises a genetic construct for expressing any of the proteins provided herein. In some embodiments, any of the kits provided herein further comprise one or more gRNAs and/or vectors for expressing one or more gRNAs. In some embodiments, the kit comprises an excipient and instructions for contacting the nuclease and/or recombinase with the excipient to generate a composition suitable for contacting a nucleic acid with the nuclease and/or recombinase such that hybridization to and cleavage and/or recombination of a target nucleic acid occurs. In some embodiments, the composition is suitable for delivering a Cas9 protein to a cell. In some embodiments, the composition is suitable for delivering a Cas9 protein to a subject. In some embodiments, the excipient is a pharmaceutically acceptable excipient.

The function and advantage of these and other embodiments of the present invention will be more fully understood from the Examples below. The following Examples are intended to illustrate the benefits of the present invention and to describe particular embodiments, but are not intended to exemplify the full scope of the invention. Accordingly, it will be understood that the Examples are not meant to limit the scope of the invention.

EXAMPLES

Example 1

Fusion of Inactivated Cas9 to FokI Nuclease Improves Genome Modification Specificity

Methods:

Oligonucleotides and PCR

All oligonucleotides were purchased from Integrated DNA Technologies (IDT). Oligonucleotide sequences are listed in Table 1. PCR was performed with 0.4 µL of 2 U/µL Phusion Hot Start Flex DNA polymerase (NEB) in 50 µL with 1x HF Buffer, 0.2 mM dNTP mix (0.2 mM dATP, 0.2 mM dCTP, 0.2 mM dGTP, 0.2 mM dTTP) (NEB), 0.5 µM of each primer and a program of 98 °C, 1 min; 35 cycles of 98 °C, 15 s; 65 °C, 15 s; 72 °C, 30 s unless otherwise noted.

Construction of FokI-dCas9, Cas9 Nickase and gRNA Expression Plasmids

The human codon-optimized Streptococcus pyogenes Cas9 nuclease with NLS and 3xFLAG tag (Addgene plasmid 43861) was used as the wild-type Cas9 expression plasmid. PCR (72 °C, 3 min) products of wild-type Cas9 expression plasmid as template with Cas9 Exp primers listed in Table 1 below were assembled with Gibson Assembly Cloning Kit (New England Biolabs) to construct Cas9 and FokI-dCas9 variants. Expression plasmids encoding a single gRNA construct (gRNA G1 through G13) were cloned as previously described. Briefly, gRNA oligonucleotides listed in Table 1 containing the 20-bp protospacer target sequence were annealed and the resulting 4-bp overhangs were ligated into BsmBI-digested gRNA expression plasmid. gRNA expression plasmids encoding expression of two separate gRNA constructs from separate promoters on a single plasmid were cloned in a two-step process. First, one gRNA (gRNA E1, V1, C1, C3, H1, G1, G2 or G3) was cloned as above and used as template for PCR (72 °C, 3 min) with PCR Pla-fwd and PCR Pla-rev primers, 1 µL DpnI (NEB) was added, and the reaction was incubated at 37 °C for 30 min and then subjected to QIAquick PCR Purification Kit (Qiagen) for the "1st gRNA + vector DNA". PCR (72 °C, 3 min) of 100 pg of BsmBI-digested gRNA expression plasmid as template with PCR gRNA-fwd1, PCR gRNA-rev1, PCR gRNA-rev2 and appropriate PCR gRNA primer listed in Table 1 was DpnI treated and purified as above for the "2nd gRNA insert DNA". ~200 ng of "1st gRNA + vector DNA" and ~200 ng of "2nd gRNA insert DNA" were blunt-end ligated in 1x T4 DNA Ligase Buffer, 1 µL of T4 DNA Ligase (400 U/µL, NEB) in a total volume of 20 µL at room temperature (~21 °C) for 15 min. For all cloning, 1 µL of ligation or assembly reaction was transformed into Mach1 chemically competent cells (Life Technologies).

TABLE 1. Oligonucleotides. /5Phos/ indicates 5' phosphorylated oligonucleotides.

dCas9-NLS-FokI primers:
Cas9 Exp CNF Fok1 + Plas Fwd            CGGCGAGATAAACTTTTAATGACCGGTCATCATCACCA (SEQ ID NO: 26)

Cas9 Exp CNF Cas9 coD10 Rev             CCAACGGAATTAGTGCCGATAGCTAAACCAATAGAATACTTTTTATC (SEQ ID NO: 27)
Cas9 Exp CNF Cas9 coD10 Fwd             GATAAAAAGTATTCTATTGGTTTAGCTATCGGCACTAATTCCGTTGG (SEQ ID NO: 28)
Cas9 Exp CNF Cas9 coH850 Rev            TTCAAAAAGGATTGGGGTACAATGGCATCGACGTCGTAATCAGATAAAC (SEQ ID NO: 29)
Cas9 Exp CNF Cas9 coH850 Fwd            GTTTATCTGATTACGACGTCGATGCCATTGTACCCCAATCCTTTTTGAA (SEQ ID NO: 30)
Cas9 Exp CNF (Cas9) NLS + GGS-Fok-Rev   TTGGGATCCAGAACCTCCTCCTGCAGCCTTGTCATCG (SEQ ID NO: 31)
Cas9 Exp CNF (Cas9) NLS + GGS3-Fok-Rev  TTGGGATCCAGAACCTCCGCTGCCGCCACTTCCACCTGATCCTGCAGCCTTGTCATCG (SEQ ID NO: 32)
Cas9 Exp CNF (Cas9) NLS + GGS-Fok-Fwd   CGATGACAAGGCTGCAGGAGGAGGTTCTGGATCCCAA (SEQ ID NO: 33)
Cas9 Exp CNF (Cas9) NLS + GGS3-Fok-Fwd  CGATGACAAGGCTGCAGGATCAGGTGGAAGTGGCGGCAGCGGAGGTTCTGGATCCCAA (SEQ ID NO: 34)
Cas9 Exp CNF Fok1 + Plas Rev            TGGTGATGATGACCGGTCATTAAAAGTTTATCTCGCCG (SEQ ID NO: 35)

NLS-dCas9-FokI primers:
Cas9 Exp NCF Fok1 + Plas Fwd            CGGCGAGATAAACTTTTAATGACCGGTCATCATCACCA (SEQ ID NO: 36)
Cas9 Exp NCF Plas + FLAG (NLS-Fok1-Rev  TAGGGAGAGCCGCCACCATGGACTACAAAGACCATGACGG (SEQ ID NO: 37)
Cas9 Exp NCF NLS + Cas9 coD10-Rev       TAAACCAATAGAATACTTTTTATCCATAGGTACCCCGCGGTGAATG (SEQ ID NO: 38)
Cas9 Exp NCF Cas9 coD10 Fwd             GATAAAAAGTATTCTATTGGTTTAGCTATCGGCACTAATTCCGTTGG (SEQ ID NO: 39)
Cas9 Exp NCF Cas9 coH850 Rev            TTCAAAAAGGATTGGGGTACAATGGCATCGACGTCGTAATCAGATAAAC (SEQ ID NO: 40)
Cas9 Exp NCF Cas9 coH850 Fwd            GTTTATCTGATTACGACGTCGATGCCATTGTACCCCAATCCTTTTTGAA (SEQ ID NO: 41)
Cas9 Exp NCF Cas9 End + GGS-Fok-Rev     TTGGGATCCAGAACCTCCGTCACCCCCAAGCTGTG (SEQ ID NO: 42)
Cas9 Exp NCF Cas9 End + GGS3-Fok-Rev    TTGGGATCCAGAACCTCCGCTGCCGCCACTTCCACCTGAGTCACCCCCAAGCTGTG (SEQ ID NO: 43)

TABLE 1 (continued). Oligonucleotides. /5Phos/ indicates 5' phosphorylated oligonucleotides.

HTS EXM ON-rev        TCATCTGTGCCCCTCCCTCC (SEQ ID NO: 143)
HTS EXM Off-rev       CGAGAAGGAGGTGCAGGAG (SEQ ID NO: 144)
HTS EXM Off-rev       CGGGAGCTGTTCAGAGGCTG (SEQ ID NO: 145)
HTS EXM Off-rev       CTCACCTGGGCGAGAAAGGT (SEQ ID NO: 146)
HTS EXM Off-rev       AAAACTCAAAGAAATGCCCAATCA (SEQ ID NO: 147)
HTS VEGF ON-rev       AGACGCTGCTCGCTCCATTC (SEQ ID NO: 148)
HTS EXM Off1-rev      ACAGGCATGAATCACTGCACCT (SEQ ID NO: 149)
HTS EXM Off2-rev      GCGGCAACTTCAGACAACCGA (SEQ ID NO: 150)
HTS EXM Off3-rev      GACCCAGGGGCACCAGTT (SEQ ID NO: 151)
HTS EXM Off4-rev      CTGCCTTCATTGCTTAAAAGTGGAT (SEQ ID NO: 152)
HTS CLTA2 ON-rev      ACAGTTGAAGGAAGGAAACATGC (SEQ ID NO: 153)
HTS CLTA2 Off1-rev    GCTGCATTTGCCCATTTCCA (SEQ ID NO: 154)
HTS CLTA2 Off2-rev    GTTGGGGGAGGAGGAGCTTAT (SEQ ID NO: 155)
HTS CLTA2 Off3-rev    CTAAGAGCTATAAGGGCAAATGACT (SEQ ID NO: 156)
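As a quick consistency check on the first part of Table 1, the Cas9 coD10 and Cas9 coH850 forward and reverse primers appear to be exact reverse complements of one another. The sketch below tests that for the two CNF pairs; the helper names (`reverse_complement`, `pairs_are_complementary`) are illustrative and not from the source, and the sequences are copied from SEQ ID NOs: 27-30.

```python
# Minimal sketch: confirm that two Fwd/Rev primer pairs from Table 1 are
# reverse complements. Helper names are hypothetical, not from the source.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# (forward, reverse) sequences as listed for SEQ ID NOs: 28/27 and 30/29.
PRIMER_PAIRS = {
    "Cas9 coD10": (
        "GATAAAAAGTATTCTATTGGTTTAGCTATCGGCACTAATTCCGTTGG",
        "CCAACGGAATTAGTGCCGATAGCTAAACCAATAGAATACTTTTTATC",
    ),
    "Cas9 coH850": (
        "GTTTATCTGATTACGACGTCGATGCCATTGTACCCCAATCCTTTTTGAA",
        "TTCAAAAAGGATTGGGGTACAATGGCATCGACGTCGTAATCAGATAAAC",
    ),
}


def pairs_are_complementary(pairs):
    """Map each pair name to True when Rev equals the reverse complement of Fwd."""
    return {name: reverse_complement(fwd) == rev for name, (fwd, rev) in pairs.items()}


if __name__ == "__main__":
    for name, ok in pairs_are_complementary(PRIMER_PAIRS).items():
        print(name, "reverse-complementary:", ok)
```

The same check can be run on any other transcribed pair to catch copying errors before ordering oligonucleotides.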

Modification of Genomic GFP

HEK293-GFP stable cells (GenTarget) were used as a cell line constitutively expressing an Emerald GFP gene (GFP) integrated on the genome. Cells were maintained in Dulbecco's modified Eagle medium (DMEM, Life Technologies) supplemented with 10% (vol/vol) fetal bovine serum (FBS, Life Technologies) and penicillin/streptomycin (1x, Amresco). 5x10 HEK293-GFP cells were plated on 48-well collagen coated Biocoat plates (Becton Dickinson). One day following plating, cells at ~75% confluence were transfected with Lipofectamine 2000 (Life Technologies) according to the manufacturer's protocol. Briefly, 1.5 µL of Lipofectamine 2000 was used to transfect 950 ng of total plasmid (Cas9 expression plasmid plus gRNA expression plasmids). 700 ng of Cas9 expression plasmid, 125 ng of one gRNA expression plasmid and 125 ng of the paired gRNA expression plasmid with the pairs of targeted gRNAs listed in FIG. 6D and FIG. 9A. Separate wells were transfected with 1 µg of a near-infrared iRFP670 (Addgene plasmid 45457) as a transfection control. 3.5 days following transfection, cells were trypsinized and resuspended in DMEM supplemented with 10% FBS and analyzed on a C6 flow cytometer (Accuri) with a 488 nm laser excitation and 520 nm filter with a 20 nm band pass. For each sample, transfections and flow cytometry measurements were performed once.

T7 Endonuclease I Surveyor Assays of Genomic Modifications

HEK293-GFP stable cells were transfected with Cas9 expression and gRNA expression plasmids as described above. A single plasmid encoding two separate gRNAs was transfected. For experiments titrating the total amount of expression plasmids (Cas9 expression + gRNA expression plasmid), 700/250, 350/125, 175/62.5, 88/31 ng of Cas9 expression plasmid/ng of gRNA expression plasmid were combined with inert carrier plasmid, pUC19 (NEB), as necessary to reach a total of 950 ng transfected plasmid DNA.

Genomic DNA was isolated from cells 2 days after transfection using a genomic DNA isolation kit, DNAdvance Kit (Agencourt). Briefly, cells in a 48-well plate were incubated with 40 µL of trypsin for 5 min at 37 °C. 160 µL of DNAdvance lysis solution was added and the solution incubated for 2 hr at 55 °C and the subsequent steps in the Agencourt DNAdvance kit protocol were followed. 40 ng of isolated genomic DNA was used as template to PCR amplify the targeted genomic loci with flanking Survey primer pairs specified in Table 1. PCR products were purified with a QIAquick PCR Purification Kit (Qiagen) and quantified with Quant-iT PicoGreen dsDNA Kit (Life Technologies). 250 ng of purified PCR DNA was combined with 2 µL of NEBuffer 2 (NEB) in a total volume of 19 µL and denatured then re-annealed with thermocycling at 95 °C for 5 min, 95 to 85 °C at 2 °C/s; 85 to 20 °C at 0.2 °C/s. The re-annealed DNA was incubated with 1 µL of T7 Endonuclease I (10 U/µL, NEB) at 37 °C for 15 min. 10 µL of 50% glycerol was added to the T7 Endonuclease reaction and 12 µL was analyzed on a 5% TBE 18-well Criterion PAGE gel (Bio-Rad) electrophoresed for 30 min at 150 V, then stained with 1x SYBR Gold (Life Technologies) for 30 min. Cas9-induced cleavage bands and the uncleaved band were visualized on an AlphaImager HP (Alpha Innotech) and quantified using ImageJ software. The peak intensities of the cleaved bands were divided by the total intensity of all bands (uncleaved + cleaved bands) to determine the fraction cleaved, which was used to estimate gene modification levels as previously described. For each sample, transfections and subsequent modification measurements were performed in triplicate on different days.

High-Throughput Sequencing of Genomic Modifications

HEK293-GFP stable cells were transfected with Cas9 expression and gRNA expression plasmids. 700 ng of Cas9 expression plasmid plus 250 ng of a single plasmid expressing a pair of gRNAs were transfected (high levels) and, for just Cas9 nuclease, 88 ng of Cas9 expression plasmid plus 31 ng of a single plasmid expressing a pair of gRNAs were transfected (low levels). Genomic DNA was isolated as above and pooled from three biological replicates. 150 ng or 600 ng of pooled genomic DNA was used as template to amplify by PCR the on-target and off-target genomic sites with flanking HTS primer pairs specified in Table 1. Relative amounts of crude PCR products were quantified by gel electrophoresis and samples treated with different gRNA pairs or Cas9 nuclease types were separately pooled in equimolar concentrations before purification with the QIAquick PCR Purification Kit (Qiagen). ~500 ng of pooled DNA was run on a 5% TBE 18-well Criterion PAGE gel (Bio-Rad) for 30 min at 200 V and DNAs of length ~125 bp to ~300 bp were isolated and purified by QIAquick PCR Purification Kit (Qiagen). Purified DNA was PCR amplified with primers containing sequencing adaptors, purified and sequenced on a MiSeq high-throughput DNA sequencer (Illumina) as described previously.

Data Analysis

Illumina sequencing reads were filtered and parsed with scripts written in Unix Bash. The Patmatch program was used to search the human genome (GRCh37/hg19 build) for pattern sequences corresponding to Cas9 binding sites (CCNN^20-spacer-N^20NGG for Orientation A and N^20NGG-spacer-CCNN^20 for Orientation B). The steps for the identification of indels in sequences of genomic sites can be found below:

1) Sequence reads were initially filtered removing reads of less than 50 bases and removing reads with greater than 10% of the Illumina base scores not being B-J:

Example SeqA-1st Read (SEQ ID NO: 157):
TTCTGAGGGCTGCTACCTGTACATCTGCACAAGATTGCCTTTACTCCAT
GCCTTTCTTCTTCTGCTCTAACTCTGACAATCTGTCTTGCCATGCCATAA
GCCCCTATTCTTTCTGTAACCCCAAGATGGTATAAAAGCATCAATGATTG
GGC

Example SeqA-2nd Read (SEQ ID NO: 158):
AAAACTCAAAGAAATGCCCAATCATTGATGCTTTTATACCATCTTGGG
GTTACAGAAAGAATAGGGGCTTATGGCATGGCAAGACAGATTGTCAGAG
TTAGAGCAGAAGAAGAAAGGCATGGAGTAAAGGCAATCTTGTGCAGATG
TACAGGTAA

2) Find the first 20 bases four bases from the start of the reverse complement of SeqA-2nd read in SeqA-1st read allowing for 1 mismatch:

Reverse complement of SeqA-2nd read (SEQ ID NO: 159):
TTACCTGTACATCTGCACAAGATTGCCTTTACTCCATGCCTTTCTTCT
TCTGCTCTAACTCTGACAATCTGTCTTGCCATGCCATAAGCCCCTATT
CTTTCTGTAACCCCAAGATGGTATAAAAGCATCAATGATTGGGCATTT
CTTTGAGTTTT

Position in SeqA-1st Read (SEQ ID NO: 160):
TTCTGAGGGCTGCTACCTGTACATCTGCACAAGATTGCCTTTACTCCATG
CCTTTCTTCTTCTGCTCTAACTCTGACAATCTGTCTTGCCATGCCATAAG
CCCCTATTCTTTCTGTAACCCCAAGATGGTATAAAAGCATCAATGATTGG

3) Align and then combine sequences, removing any sequence with greater than 5% mismatches in the simple base pair alignment:

Combination of SeqA-1st Read and SeqA-2nd Read (SEQ ID NO: 161):
TTCTGAGGGCTGCTACCTGTACATCTGCACAAGATTGCCTTTACTCCATG
CCTTTCTTCTTCTGCTCTAACTCTGACAATCTGTCTTGCCATGCCATAAG
CCCCTATTCTTTCTGTAACCCCAAGATGGTATAAAAGCATCAATGATTGG
GCATTTCTTTGAGTTTT

4) To identify the target site the flanking genomic sequences were searched for with the Patmatch program allowing for varying amounts of bases from 1 to 300 between the flanking genomic sequences (Table 2):

TABLE 2. Patmatch Sequences

Target Site Downstream genomic sequence              Upstream genomic sequence

EMX On      GGCCTGCTTCGTGGCAATGC (SEQ ID NO: 162)    ACCTGGGCCAGGGAGGGAGG (SEQ ID NO: 163)
EMX Off1    CTCACTTAGACTTTCTCTCC (SEQ ID NO: 164)    CTCGGAGTCTAGCTCCTGCA (SEQ ID NO: 165)
EMX Off2    TGGCCCCAGTCTCTCTTCTA (SEQ ID NO: 166)    CAGCCTCTGAACAGCTCCCG (SEQ ID NO: 167)
EMX Off3    TGACTTGGCCTTTGTAGGAA (SEQ ID NO: 168)    GAGGCTACTGAAACATAAGT (SEQ ID NO: 169)
EMX Off4    TGCTACCTGTACATCTGCAC (SEQ ID NO: 170)    CATCAATGATTGGGCATTTC (SEQ ID NO: 171)
VEG On      ACTCCAGTCCCAAATATGTA (SEQ ID NO: 172)    ACTAGGGGGCGCTCGGCCAC (SEQ ID NO: 173)
VEG Off1    CTGAGTCAACTGTAAGCATT (SEQ ID NO: 174)    GGCCAGGTGCAGTGATTCAT (SEQ ID NO: 175)
VEG Off2    TCGTGTCATCTTGTTTGTGC (SEQ ID NO: 176)    GGCAGAGCCCAGCGGACACT (SEQ ID NO: 177)
VEG Off3    CAAGGTGAGCCTGGGTCTGT (SEQ ID NO: 178)    ATCACTGCCCAAGAAGTGCA (SEQ ID NO: 179)
VEG Off4    TTGTAGGATGTTTAGCAGCA (SEQ ID NO: 180)    ACTTGCTCTCTTTAGAGAAC (SEQ ID NO: 181)
CLT2 On     CTCAAGCAGGCCCCGCTGGT (SEQ ID NO: 182)    TTTTGGACCAAACCTTTTTG (SEQ ID NO: 183)
CLT2 Off1   TGAGGTTATTTGTCCATTGT (SEQ ID NO: 184)    TAAGGGGAGTATTTACACCA (SEQ ID NO: 185)
CLT2 Off2   TCAAGAGCAGAAAATGTGAC (SEQ ID NO: 186)    CTTGCAGGGACCTTCTGATT (SEQ ID NO: 187)
CLT2 Off3   TGTGTGTAGGACTAAACTCT (SEQ ID NO: 188)    GATAGCAGTATGACCTTGGG (SEQ ID NO: 189)

Any target site sequences corresponding to the same size as the reference genomic site in the human genome (GRCh37/hg19 build) were considered unmodified and any sequences not the reference size were considered potential insertions or deletions. Sequences not the reference size were aligned with ClustalW to the reference genomic site. Aligned sequences with more than one insertion or one deletion in the DNA spacer sequence in or between the two half-site sequences were considered indels. Since high-throughput sequencing can result in insertions or deletions of one base pair (mis-phasing) at a low but relevant rate, indels of two bp are more likely to arise from Cas9-induced modifications.

Sample sizes for sequencing experiments were maximized (within practical experimental considerations) to ensure greatest power to detect effects. Statistical analyses for Cas9-modified genomic sites in Table 3 were performed as previously described with multiple comparison correction using the Bonferroni method.

Table 3, referred to in the Results below, shows (A) results from sequencing CLTA on-target and previously reported genomic off-target sites amplified from 150 ng genomic DNA isolated from human cells treated with a plasmid expressing either wild-type Cas9, Cas9 nickase, or fCas9 and a single plasmid expressing two gRNAs targeting the CLTA on-target site (gRNA C3 and gRNA C4). As a negative control, transfection and sequencing were performed as above, but using two gRNAs targeting the GFP gene on-target site (gRNA G1, G2 or G3 and gRNA G4, G5, G6 or G7). Indels: the number of observed sequences containing insertions or deletions consistent with any of the three Cas9 nuclease-induced cleavage. Total: total number of sequence counts while only the first 10,000 sequences were analyzed for the on-target site sequences. Modified: number of indels divided by total number of sequences as percentages. Upper limits of potential modification were calculated for sites with no observed indels by assuming there is less than one indel then dividing by the total sequence count to arrive at an upper limit modification percentage, or taking the theoretical limit of detection (1/49,500), whichever value was larger. P-values: For wild-type Cas9 nuclease, Cas9 nickase or fCas9 nuclease, P-values were calculated as previously reported using a two-sided Fisher's exact test between each sample treated with two gRNAs targeting the CLTA on-target site and the control sample treated with two gRNAs targeting the GFP on-target site. P-values of <0.0045 were considered significant and shown based on conservative multiple comparison correction using the Bonferroni method. On:off specificity is the ratio of on-target to off-target genomic modification frequency for each site. (B) shows experimental and analytic methods as in (A) applied to EMX target sites using a single plasmid expressing two gRNAs targeting the EMX on-target site (gRNA E1 and gRNA E2). (C) shows experimental and analytic methods as in (A) applied to VEGF target sites using a single plasmid expressing two gRNAs targeting the VEGF on-target site (gRNA V1 and gRNA V2). (D) shows experimental and analytic methods as in (A) applied to VEGF on-target and VEGF off-target site 1 amplified from 600 ng genomic DNA to increase detection sensitivity to 1/198,000.

TABLE 3. Cellular modification induced by wild-type Cas9, Cas9 nickase, and fCas9 at on-target and off-target genomic sites.

(A)
Nuclease type:                  wt Cas9      wt Cas9      Cas9 nickase fCas9        wt Cas9      Cas9 nickase fCas9
gRNA pair target:               CLTA         CLTA         CLTA         CLTA         GFP          GFP          GFP

Total expression plasmids (ng): 1000         125          1000         1000         1000         1000         1000

CLTA Sites

CLT2 On
Indels                          3528         1423         3400         575          3            13           5
Total                           10000        10000        10000        10000        10000        10000        10000
Modified (%)                    35.280       14.230       34.000       5.750        0.030        0.130        0.050
P-value                         <10E-300     <10E-300     <10E-300     1.4E-163
On:off specificity              1            1            1

TABLE 3 (continued)

VEG On
Indels                          5253         2454         1230         1041         8            0            1
Total                           10000        10000        10000        10000        10000        10000        10000
Modified (%)                    52.530       24.540       12.300       10.410       0.080        <0.002       0.010
P-value                         <10E-300     <10E-300     <10E-300     6.6E-286
On:off specificity              1            1            1            1

VEG Off1
Indels                          2950         603          22           0            0            4            1
Total                           82198        71163        90434        77557        74765        79738        74109
Modified (%)                    3.589        0.847        0.024        <0.002       <0.002       0.005        <0.002
P-value                         <10E-300     3.2E-188     2.5E-06
On:off specificity              15           29           506          >5150

VEG Off2
Indels                          863          72           3            3            0            2            1
Total                           102501       49836        119702       65107        54247        65753        61556
Modified (%)                    0.842        0.144        0.003        0.005        <0.002       0.003        <0.002
P-value                         3.5E-159     9.6E-24
On:off specificity              62           170          >6090        >5150

VEG Off3
Indels                          260          33           3            2            3            1            0
Total                           91277        83124        90063        84385        62126        68165        69811
Modified (%)                    0.285        0.040        0.003        0.002        0.005        <0.002       <0.002
P-value                         6.8E-54      1.0E-05
On:off specificity              184          618          >6090        >5150

VEG Off4
Indels                          1305         149          3            2            3            2            4
Total                           59827        41203        65964        57828        60906        61219        62162
Modified (%)                    2.181        0.362        0.005        0.003        0.005        0.003        0.006
P-value                         <10E-300     2.7E-54
On:off specificity              24           68           >6090        >5150

(D)
Nuclease type:                  Cas9 nickase fCas9        Cas9 nickase fCas9
gRNA pair:                      VEGF         VEGF         GFP          GFP
Total expression plasmids (ng): 1000         1000         1000         1000

VEGF Sites

VEG On
Indels                          2717         2122         10           13
Total                           10000        10000        10000        10000
Modified (%)                    27.170       21.220       0.100        0.130
P-value                         <10E-300     <10E-300
On:off specificity              1            1

VEG Off1
Indels                          67           30           3            2
Total                           302573       233567       204454       190240
Modified (%)                    0.022        0.013
P-value                         5.9E-12      2.5E-06
On:off specificity              1227         1652
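The read-processing steps 1) to 4) of the Data Analysis section can be sketched roughly as follows. This is a minimal illustration, not the authors' Bash and Patmatch scripts: all function names are hypothetical, the flank search uses exact string matching in place of Patmatch, and the final classification uses length alone, standing in for the ClustalW alignment step. The flanks are called left and right because, in the SeqA example, the "downstream" sequence of Table 2 is the one that occurs first in the merged read.

```python
"""Sketch of the paired-read merging and indel triage steps (assumed names)."""

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
GOOD_QUALITY_CHARS = set("BCDEFGHIJ")  # Illumina base scores B through J


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def passes_read_filter(seq, quals, min_len=50, max_bad_frac=0.10):
    """Step 1: drop reads under 50 bases or with >10% of scores outside B-J."""
    if len(seq) < min_len:
        return False
    bad = sum(q not in GOOD_QUALITY_CHARS for q in quals)
    return bad <= max_bad_frac * len(quals)


def hamming(a, b):
    """Count mismatching positions over the shared length of a and b."""
    return sum(x != y for x, y in zip(a, b))


def find_approx(needle, haystack, max_mismatches=1):
    """Leftmost position where needle matches with at most max_mismatches."""
    for i in range(len(haystack) - len(needle) + 1):
        if hamming(needle, haystack[i:i + len(needle)]) <= max_mismatches:
            return i
    return -1


def merge_pair(read1, read2, skip=4, anchor_len=20, max_mismatch_frac=0.05):
    """Steps 2-3: anchor the reverse complement of read 2 in read 1 and merge."""
    rc2 = reverse_complement(read2)
    anchor = rc2[skip:skip + anchor_len]  # 20 bases, four bases from the start
    pos = find_approx(anchor, read1)
    if len(anchor) < anchor_len or pos < skip:
        return None  # no usable anchor (also covers pos == -1)
    start = pos - skip  # index in read 1 where rc2 begins
    overlap = min(len(read1) - start, len(rc2))
    mismatches = hamming(read1[start:start + overlap], rc2[:overlap])
    if mismatches > max_mismatch_frac * overlap:
        return None  # more than 5% mismatches in the simple alignment
    return read1 + rc2[overlap:]


def extract_target(merged, left_flank, right_flank, max_gap=300):
    """Step 4: return the 1-300 bases lying between the two flanks."""
    i = merged.find(left_flank)
    if i < 0:
        return None
    begin = i + len(left_flank)
    j = merged.find(right_flank, begin)
    if j < 0 or not 1 <= j - begin <= max_gap:
        return None
    return merged[begin:j]


def classify_by_size(target, reference_len):
    """Length-only triage; single-base differences are not counted as indels."""
    diff = abs(len(target) - reference_len)
    if diff == 0:
        return "unmodified"
    if diff == 1:
        return "single_base_difference"
    return "potential_indel"
```

Reads rejected at any stage return `None` or `False`, so a caller can simply skip them when tallying the Indels and Total counts reported in Table 3.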

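The quantities reported in Table 3 follow directly from the definitions in the table description: percent modified, the upper limit used when no indels are observed, the on:off specificity ratio, and a two-sided Fisher's exact test against the GFP-targeted control. The sketch below is one way to compute them using only the standard library; the function names are illustrative, and the Fisher implementation (summing hypergeometric probabilities no larger than the observed one) is a common formulation assumed here rather than the authors' code.

```python
import math

DETECTION_LIMIT = 1 / 49500      # theoretical limit of detection quoted in the text
SIGNIFICANCE_THRESHOLD = 0.0045  # Bonferroni-corrected cutoff quoted in the text


def percent_modified(indels, total):
    """Modified (%): indels divided by total sequences, as a percentage."""
    return 100.0 * indels / total


def upper_limit_percent(total, detection_limit=DETECTION_LIMIT):
    """Upper limit for sites with no observed indels: the larger of
    (fewer than one indel) / total and the detection limit, as a percentage."""
    return 100.0 * max(1.0 / total, detection_limit)


def on_off_specificity(on_percent, off_percent):
    """Ratio of on-target to off-target modification frequency."""
    return on_percent / off_percent


def _log_choose(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are treated and control samples; columns are indel and non-indel reads.
    """
    row1, row2, col1 = a + b, c + d, a + c

    def log_p(x):
        return (_log_choose(row1, x) + _log_choose(row2, col1 - x)
                - _log_choose(row1 + row2, col1))

    observed = log_p(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p = sum(math.exp(log_p(x)) for x in range(lo, hi + 1)
            if log_p(x) <= observed + 1e-7)
    return min(1.0, p)


if __name__ == "__main__":
    # VEG On and VEG Off1, wild-type Cas9 column of Table 3.
    on = percent_modified(5253, 10000)
    off = percent_modified(2950, 82198)
    print(round(on, 3), round(off, 3), round(on_off_specificity(on, off)))
```

Because the computation works in log space, very small P-values simply underflow to 0.0, which matches the way the table reports them only as upper bounds.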
Results

Recently engineered variants of Cas9 that cleave only one DNA strand ("nickases") enable double-stranded breaks to be specified by two distinct gRNA sequences, but still suffer from off-target cleavage activity arising from the ability of each monomeric nickase to remain active when individually bound to DNA. In contrast, the development of a FokI nuclease fusion to a catalytically dead Cas9 that requires simultaneous DNA binding and association of two FokI-dCas9 monomers to cleave DNA is described here. Off-target DNA cleavage of the engineered FokI-dCas9 (fCas9) is further reduced by the requirement that only sites flanked by two gRNAs ~15 or 25 base pairs apart are cleaved, a much more stringent spacing requirement than nickases. In human cells, fCas9 modified target DNA sites with efficiency comparable to that of nickases, and with >140-fold higher specificity than wild-type Cas9. Target sites that conform to the substrate requirements of fCas9 are abundant in the human genome, occurring on average once every 34 bp.

In cells, Cas9:gRNA-induced double strand breaks can result in functional gene knockout through non-homologous end joining (NHEJ) or alteration of a target locus to virtually any sequence through homology-directed repair (HDR) with an exogenous DNA template. Cas9 is an especially convenient genome editing platform, as a genome editing agent for each new target site of interest can be accessed by simply generating the corresponding gRNA. This approach has been widely used to create targeted knockouts and gene insertions in cells and model organisms, and has also been recognized for its potential therapeutic relevance.

While Cas9:gRNA systems provide an unprecedented level of programmability and ease of use, studies have reported the ability of Cas9 to cleave off-target genomic sites, resulting in modification of unintended loci that can limit the usefulness and safety of Cas9 as a research tool and as a potential therapeutic. It was hypothesized that engineering Cas9 variants to cleave DNA only when two simultaneous, adjacent Cas9:DNA binding events take place could substantially improve genome editing specificity since the likelihood of two adjacent off-target binding events is much smaller than the likelihood of a single off-target binding event (approximately 1/n^2 vs. 1/n). Such an approach is distinct from the recent development of mutant Cas9 proteins that cleave only a single strand of dsDNA, such as nickases. Nickases can be used to nick opposite strands of two nearby target sites, generating what is effectively a double strand break, and paired Cas9 nickases can effect substantial on-target DNA modification with reduced off-target modification. Because each of the component Cas9 nickases remains catalytically active and single-stranded DNA cleavage events are weakly mutagenic, nickases can induce genomic modification even when acting as monomers. Indeed, Cas9 nickases have been previously reported to induce off-target modifications in cells. Moreover, since paired Cas9 nickases can efficiently induce dsDNA cleavage-derived modification events when bound up to ~100 bp apart, the statistical number of potential off-target sites for paired nickases is larger than that of a more spatially constrained dimeric Cas9 cleavage system.

To further improve the specificity of the Cas9:gRNA system, an obligate dimeric Cas9 system is provided herein. In this example, fusing the FokI restriction endonuclease cleavage domain to a catalytically dead Cas9 (dCas9) created an obligate dimeric Cas9 that would cleave DNA only when two distinct FokI-dCas9:gRNA complexes bind to adjacent sites ("half-sites") with particular spacing constraints (FIG. 6D). In contrast with Cas9 nickases, in which single-stranded DNA cleavage by monomers takes place independently, the DNA cleavage of FokI-dCas9 requires simultaneous binding of two distinct FokI-dCas9 monomers because monomeric FokI nuclease domains are not catalytically competent. This approach increased the specificity of DNA cleavage relative to wild-type Cas9 by doubling the number of specified target bases contributed by both monomers of the FokI-dCas9 dimer, and offered improved specificity compared to nickases due to inactivity of monomeric FokI-dCas9:gRNA complexes, and the more stringent spatial requirements for assembly of a FokI-dCas9 dimer.

While fusions of Cas9 to short functional peptide tags have been described to enable gRNA-programmed transcriptional regulation, it is believed that no fusions of Cas9 with active enzyme domains have been previously reported. Therefore a wide variety of FokI-dCas9 fusion proteins were constructed and characterized with distinct configurations of a FokI nuclease domain, dCas9 containing inactivating mutations D10A and H840A, and a nuclear localization sequence (NLS). FokI was fused to either the N- or C-terminus of dCas9, and the location of the NLS was varied to be at either terminus or between the two domains (FIG. 6B). The length of the linker sequence was varied as either one or three repeats of Gly-Gly-Ser (GGS) between the FokI and dCas9 domains. Since previously developed dimeric nuclease systems are sensitive to the length of the spacer sequence between half-sites, a wide range of spacer sequence lengths was tested between two gRNA binding sites within a test target gene, Emerald GFP (referred to hereafter as GFP) (FIG. 6C and FIG. 9). Two sets of gRNA binding-site pairs with different orientations were chosen within GFP. One set placed the pair of NGG PAM sequences distal from the spacer sequence, with the 5' end of the gRNA adjacent to the spacer (orientation A) (FIG. 6C), while the other placed the PAM sequences immediately adjacent to the spacer (orientation B) (FIG. 9). In total, seven pairs of gRNAs were suitable for orientation A, and nine were suitable for orientation B. By pairwise combination of the gRNA targets, eight spacer lengths were tested in both dimer orientations, ranging from 5 to 43 bp in orientation A, and 4 to 42 bp in orientation B. In total, DNA constructs corresponding to 104 pairs of FokI-dCas9:gRNA complexes were generated and tested, exploring four fusion architectures, 17 protein linker variants (described below), both gRNA orientations and 13 spacer lengths between half-sites.

To assay the activities of these candidate FokI-dCas9:gRNA pairs, a previously described flow cytometry-based fluorescence assay in which DNA cleavage and NHEJ of a stably integrated constitutively expressed GFP gene in HEK293 cells leads to loss of cellular fluorescence was used (FIG. 10). For comparison, the initial set of FokI-dCas9 variants were assayed side-by-side with the corresponding Cas9 nickases and wild-type Cas9 in the same expression plasmid across both gRNA spacer orientation sets A and B. Cas9 protein variants and gRNA were generated in cells by transient co-transfection of the corresponding Cas9 protein expression plasmids together with the appropriate pair of gRNA expression plasmids. The FokI-dCas9 variants, nickases, and wild-type Cas9 all targeted identical DNA sites using identical gRNAs.

Most of the initial FokI-dCas9 fusion variants were inactive or very weakly active (FIG. 11). The NLS-FokI-dCas9 architecture (listed from N to C terminus), however, resulted in a 10% increase of GFP-negative cells above the corresponding no-gRNA control when used in orientation A, with PAMs distal from the spacer (FIG. 11A). In contrast, NLS-FokI-dCas9 activity was undetectable when used on gRNA pairs with PAMs adjacent to the spacer (FIG. 11B). Examination of the recently reported Cas9 structures reveals that the Cas9 N-terminus protrudes from the RuvC1 domain, which contacts the 5' end of the gRNA:DNA duplex. Without wishing to be bound by any particular theory, it is speculated that this arrangement places an N-terminally fused FokI distal from the PAM, resulting in a preference for gRNA pairs with PAMs distal from the cleaved spacer (FIG. 6D). While other FokI-dCas9 fusion pairings and the other gRNA orientation in some cases showed modest activity (FIG. 11), NLS-FokI-dCas9 with gRNAs in orientation A were chosen for further development.

Next the protein linkers between the NLS and FokI domain, and between the FokI domain and dCas9 in the NLS-FokI-dCas9 architecture were optimized. 17 linkers with a wide range of amino acid compositions, predicted flexibilities, and lengths varying from 9 to 21 residues were tested (FIG. 12A). Between the FokI domain and dCas9 a flexible 18-residue linker, (GGS)6 (SEQ ID NO: 15), and a 16-residue "XTEN" linker (FokI-L8 in FIG. 12A) were identified based on a previously reported engineered protein with an open, extended conformation, as supporting the highest levels of genomic GFP modification (FIG. 12B).

The XTEN protein was originally designed to extend the serum half-life of translationally fused biologic drugs by increasing their hydrodynamic radius, acting as protein-based functional analog to chemical PEGylation. Possess gene.
Because decreasing the amount of Cas9 expression ing a chemically stable, non-cationic, and non-hydrophobic plasmid and gRNA expression plasmid during transfection primary sequence, and an extended conformation, it is generally did not proportionally decrease genomic modifica hypothesized that a portion of XTEN could function as a tion activity for Cas9 nickase and f(cas'9 (FIG. 15A-C), stable, inert linker sequence for fusion proteins. The sequence expression was likely not limiting under the conditions tested. of the XTEN protein tag from E-XTEN was analyzed, and As the gRNA requirements of f(cas9 potentially restricts repeating motifs within the amino acid sequence were the number of potential off-target substrates of f(cas9, the aligned. The sequence used in the FokI-dCas9 fusion con effect of guide RNA orientation on the ability offGas9, Cas9 struct FokI-L8 (FIG. 12A) was derived from the consensus nickase, and wild-type Cas9 to cleave target GFP sequences sequence of a common E-XTEN motif, and a 16 amino acid 10 were compared. Consistent with previous reports.'''Cas9 sequence was chosen from within this motif to test as a nickase efficiently cleaved targets when guide RNAs were FokI-dCas'9 linker. bound either in orientation A or orientation B, similar to Many of the FokI-dCas9 linkers tested including the opti wild-type Cas9 (FIG. 8A, B). In contrast, f(cas9 only cleaved mal XTEN linker resulted in nucleases with a marked pref the GFP target when guide RNAs were aligned in orientation erence for spacer lengths of ~15 and -25 bp between half 15 A (FIG. 8A). This orientation requirement further limits sites, with all other spacer lengths, including 20 bp, showing opportunities for undesired off-target DNA cleavage. substantially lower activity (FIG. 12B). 
This pattern of linker Importantly, no modification was observed by GFP disrup preference is consistent with a model in which the FokI tion or Surveyor assay when any of four single gRNAs were dCas'9 fusions must bind to opposite faces of the DNA double expressed individually with f(cas9, as expected since two helix to cleave DNA, with optimal binding taking place ~1.5 simultaneous binding events are required for FokI activity or 2.5 helical turns apart. The variation of NLS-FokI linkers (FIG. 7B and FIG. 8C). In contrast, GFP disruption resulted did not strongly affect nuclease performance, especially from expression of any single gRNA with wild-type Cas9 (as when combined with the XTEN FokI-dCas9 linker (FIG. expected) and, for two single gRNAs, with Cas9 nickase 12B). (FIG. 8C). Surprisingly, Surveyor assay revealed that In addition to assaying linkers between the FokI domain 25 although GFP was heavily modified by wild-type Cas9 with and dCas'9 in the NLS-FokI-dCas'9 architecture, four linker single gRNAs, neither f(cas9 nor Cas9 nickase showed detect variants between the N-terminal NLS and the FokI domain able modification (<-2%) in cells treated with single gRNAs were also tested (FIG. 12A). Although a NLS-GSAG (FIG. 16A). High-throughput sequencing to detect indels at SAAGSGEF(SEQ ID NO:20)-FokI-dCas9 linker exhibited the GFP target site in cells treated with a single gRNA and nearly 2-fold better GFP gene modification than the other 30 fCas9, Cas9 nickase, or wild-type Cas9 revealed the expected NLS-FokI linkers tested when a simple GGS linker was used substantial level of modification by wild-type Cas9 (3-7% of between the FokI and dCas9 domains (FIG.12B), the GSAG sequence reads). 
Modification by f(Cas9 in the presence of any SAAGSGEF (SEQ ID NO:20) linker did not perform sub of the four single gRNAs was not detected above background stantially better when combined with the XTEN linker (<-0.03% modification), consistent with the requirement of between the FokI and dCas'9 domains. 35 fCas9 to engage two gRNAs in order to cleave DNA. In The NLS-GGS-FokI-XTEN-dCas9 construct consistently contrast, Cas9 nickases in the presence of single gRNAS exhibited the highest activity among the tested candidates, resulted in modification levels ranging from 0.05% to 0.16% inducing loss of GFP in ~15% of cells, compared to ~20% and at the target site (FIG.16B). The detection of bona fide indels ~30% for Cas9 nickases and wild-type Cas9 nuclease, respec at target sites following Cas9 nickase treatment with single tively (FIG. 7A). All subsequent experiments were performed 40 gRNAS confirms the mutagenic potential of genomic DNA using this construct, hereafter referred to as f(as9. To confirm nicking, consistent with previous reports.''' the ability offCas9 to efficiently modify genomic target sites, The observed rate of nickase-induced DNA modification, the T7 endonuclease I Surveyor assay was used to measure however, did not account for the much higher GFP disruption the amount of mutation at each of seven target sites within the signal in the flow cytometry assay (FIG. 8C). Since the integrated GFP gene in HEK293 cells treated with fas9, 45 gRNAs that induced GFP signal loss with Cas9 nickase (gR Cas9 nickase, or wild-type Cas9 and either two distinct NAS G1 and G3) both target the non-template strand of the gRNAS in orientation A or no gRNAS as a negative control. 
GFP gene, and since targeting the non-template strand with Consistent with the flow cytometry-based studies, f(cas9 was dCas9 in the coding region of a gene is known to mediate able to modify the GFP target sites with optimal spacer efficient transcriptional repression,’ it is speculated that lengths of ~15 or ~25bp at a rate of -20%, comparable to the 50 Cas9 nickase combined with the G1 or G3 single guide RNAs efficiency of nickase-induced modification and approxi induced Substantial transcriptional repression, in addition to a mately two-thirds that of wild-type Cas9 (FIG. 7A-C). low level of genome modification. The same effect was not Next the ability of the optimized focas9 to modify four seen for fas9, Suggesting that fas9 may be more easily distinct endogenous genomic loci by Surveyor assay was displaced from DNA by transcriptional machinery. Taken evaluated. CLTA (two sites), EMX (two sites), HBB (six 55 together, these results indicate that f(cas9 can modify sites) VEGF (three sites), and were targeted with two gRNAs genomic DNA efficiently and in a manner that requires simul per site in orientation A spaced at various lengths (FIG. 13). taneous engagement of two guide RNAS targeting adjacent Consistent with the results of the experiments targeting GFP. sites, unlike the ability of wild-type Cas9 and Cas9 nickase to at appropriately spaced target half-sites fas9 induced effi cleave DNA when bound to a single guide RNA. cient modification of all four genes, ranging from 8% to 22% 60 The above results collectively reveal much more stringent target chromosomal site modification (FIG. 7D-G and FIG. spacer, gRNA orientation, and guide RNA pairing require 14). Among the gRNA spacer lengths resulting in the highest ments forfcas9 compared with Cas9 nickase. In contrast with modification at each of the five genes targeted (including fCas9 (FIG. 17), Cas9 nickase cleaved sites across all spacers GFP), fcas9 induced on average 15.6% (+6.3% s.d.) 
modifi assayed (5- to 47-bp in orientation A and 4 to 42 bp in cation, while Cas9 nickase and wild-type Cas9 induced on 65 orientation B in this work) (FIG. 8A, B). These observations average 22.1% (+4.9% s.d.) and 30.4% (+3.1% S.d.) modifi are consistent with previous reports of Cas9 nickases modi cation, respectively, from their optimal gRNA pairs for each fying sites targeted by gRNAs with spacer lengths up to 100 US 9,388,430 B2 91 92 bp apart. The more stringent spacer and gRNA orientation TABLE 4-continued requirements of f(cas9 compared with Cas9 nickase reduces the number of potential genomic off-target sites of the former Paired gRNA target site abundances for fas9 by approximately 10-fold (Table 4). Although the more strin and Cas9 nickase in the human genome. gent spacer requirements offCas9 also reduce the number of 5 17 6647352 6825904 potential targetable sites, sequences that conform to the fas9 18 6103603 697.3590 19 S896092 6349456 spacer and dual PAM requirements exist in the human 2O 6OOO683 5835825 genome on average once every 34 bp (9.2x10 sites in 3.1x 21 5858015 60S6352 10 bp) (Table 4). It is also anticipated that the growing 22 6116108 6531913 number of Cas9 homologs with different PAM specificities 10 23 599.1254 6941816 are amenable for use as described herein, and will further 24 61.14969 6572849 increase the number of targetable sites using the f(cas9 25 613S119 S671641 approach. (B) In Table 4 (A) column 2 shows the number of sites in the human genome with paired gRNA binding sites in orientation 15 Cas9 variant Preferred spacer lengths (bp) Total sites A allowing for a spacer length from -8bp to 25bp (column 1) fCas9 13 to 19, or 22 to 29, in orientation A 923S4891 between the two gRNA binding sites. gRNA binding sites in Cas9 nickase -8 to 100 in orientation A 953048977 orientation A have the NGG PAM sequences distal from the 4 to 42 in orientation B spacer sequence (CCNN-spacer-NoNGG). 
Column 3 shows the number of sites in the human genome with paired gRNA binding sites in orientation B allowing for a spacer To evaluate the DNA cleavage specificity of f(cas9, the length from 4 to 25 bp (column 1) between the two gRNA modification of known Cas9 off-target sites of CLTA, EMX, binding sites. gRNA binding sites in orientation B have the and VEGF genomic target sites were measured.''' The NGG PAM sequences adjacent to the spacer sequence target site and its corresponding known off-target sites (Table 5) were amplified from genomic DNA isolated from HEK293 (NoNGG spacer CCNNo). NC indicates the number of sites 25 in the human genome was not calculated. Negative spacer cells treated with f(cas9, Cas9 nickase, or wild-type Cas9 and lengths refer to target gRNA binding sites that overlap by the two gRNAs spaced 19 bp apart targeting the CLTA site, two indicated number of base pairs. Table 4 (B) shows the sum of gRNAs spaced 23 bp apart targeting the EMX site, two the number of paired gRNA binding sites in orientation A gRNAs spaced 14 bp apart targeting the VEGF site, or two with spacer lengths of 13 to 19 bp, or 22 to 29 bp, the spacer 30 gRNAS targeting an unrelated site (GFP) as a negative con preference of f(cas9 (FIG. 16). Sum of the number of paired trol. In total 11 off-target sites were analyzed by high gRNA binding sites with spacer lengths of -8bp to 100 bp in throughput sequencing. orientation A, or 4 to 42 bp in orientation B, the spacer The sensitivity of the high-throughput sequencing method preference of Cas9 nickases (4 to 42 bp in orientation B is for detecting genomic off-target cleavage is limited by the based on FIG. 8B, C, and -8bp to 100 bp in orientation A is 35 amount genomic DNA (gDNA) input into the PCR amplifi based on previous reports'). cation of each genomic target site. A 1 ng sample of human gDNA represents only ~330 unique genomes, and thus only TABLE 4 -330 unique copies of each genomic site are present. 
PCR amplification for each genomic target was performed on a Paired gRNA target site abundances for fas9 40 total of 150 ng of input gldNA, which provides amplicons and Cas9 nickase in the human genome. derived from at most 50,000 unique g|DNA copies. Therefore, (A) the high-throughput sequencing assay cannot detect rare genome modification events that occur at a frequency of less Number of paired Number of paired Spacer gRNA sites in gRNA sites in than 1 in 50,000, or 0.002%. length (b) orientation A orientation B 45 TABL E 5 -8 6874.293 NC -7 6785996 NC Known off-target substrates of Cas 9 target sites -6 6984064 NC in EMX, VEGF, and CLTA. List of genomic -5 7023260 NC on-target and off-targets sites of the EMX, VEGF, -4 64873O2 NC 50 and CLTA are shown with mutations from on-target -3 64O1348 NC in lower case and bold. -2 698.1383 NC PAMs are shown in upper case bold. -1 723OO98 NC O 7055143 NC Genomic target site 1 6598582 NC 2 6877046 NC 55 EMX On GAGTCCGAGCAGAAGAAGAAGGG 3 6971447 NC (SEQ ID NO: 19 O) 4 650S614 SS42549 5 6098107 5663458 EMX Off1 GAGgCCGAGCAGAAGAAagACGG 6 6254974 6819289 (SEQ ID NO: 1.91) 7 6680118 6061225 8 7687598 5702252 EMX Off2 GAGTCC tAGCAGgAGAAGAAGaG 9 6755736 7306646 60 (SEQ ID NO: 192) 10 6544849 6387485 11 691818.6 6172852 EMX Off3 GAGTCtaAGCAGAAGAAGAAGaG 12 6241723 S7994.96 (SEQ ID NO: 193) 13 6233385 7092283 14 6298717 7882433 EMX Off4 GAGT taCAGCAGAAGAAGAAAGG 15 6181422 7472725 65 (SEQ ID NO: 194) 16 6266909 6294684 US 9,388,430 B2 93 94 TABLE 5- continued larger input DNA samples and a greater number of sequence reads (150 versus 600 ng genomic DNA and >8x10 versus Known off-target substrates of Cas 9 target sites >23x10 reads for the initial and repeated experiments, in EMX, VEGF, and CLTA. List of genomic on-target and off-targets sites of the EMX, VEGF, respectively) to detect off-target cleavage at this site by Cas9 and CLTA are shown with mutations from on-target 5 nickase or fcas9. 
From this deeper interrogation, it was in lower case and bold. observed that Cas9 nickase and f(cas9 both significantly PAMs are shown in upper case bold. modify (P value-10, Fisher's Exact Test) VEGF off-target site 1 (FIG. 8G, Table 3D, FIG. 18B). For both experiments Genomic target site interrogating the modification rates at VEGF off-target site 1, WEG. On GGGTGGGGGGAGTTTGCTCCGG 10 fCas9 exhibited a greater on-target: off-target DNA modifica (SEO ID NO : 195) tion ratio than that of Cas9 nickase (>5,150 and 1,650 for WEG Off1 GGaTGGaGGGAGTTTGCTCCGG fCas9, versus 510 and 1,230 for Cas9 nickase, FIG. 8G). (SEQ ID NO: 196) On either side of VEGF off-target site 1 there exist no other sites with six or fewer mutations from either of the two WEG Off2 GGGaGGGGGAGTTTGCTCCGG 15 half-sites of the VEGF on-target sequence. The first 11 bases (SEO ID NO : 197) of one gRNA (V2) might hybridize to the single-stranded WEG Off3 cGGgGGaGGGAGTTTGCTCCTGG DNA freed by canonical Cas9:gRNA binding within VEGF (SEQ ID NO: 198) off-target site 1 (FIG. 18C). Through this gRNA:DNA WEG Off 4 GGGgaGGGGaAGTTTGCTCCTGG hybridization it is possible that a second Cas9 nickase or (SEQ ID NO: 199) fCas9 could be recruited to modify this off-target site at a very low, but detectable level. Judicious gRNA pair design could CLT2. On GCAGATGTAGTGTTTCCACAGGG eliminate this potential mode of off-target DNA cleavage, as (SEQ ID NO: 2OO) VEGF off-target site 1 is highly unusual in its ability to form CLT2 Off1 a CAaATGTAGTaTTTCCACAGGG 11 consecutive potential base pairs with the second gRNA of (SEQ ID NO: 2O1) 25 a pair. In general, f(as9 was unable to modify the genomic off-target sites tested because of the absence of any adjacent CLT2 Off2 cCAGATGTAGTaTTCCCACAGGG second binding site required to dimerize and activate the FokI (SEQ ID NO: 2O2) nuclease domain. 
CLT2 Off3 CAGATGaAGTGCTTCCACATGG The optimized FokI-dCas9 fusion architecture developed (SEQ ID NO: 2O3) 30 in this work modified all five genomic loci targeted, demon strating the generality of using fas9 to induce genomic Sequences containing insertions or deletions of two or modification in human cells, although modification with more base pairs in potential genomic off-target sites and fCas9 was somewhat less efficient than with wild-type Cas9. present in significantly greater numbers (P values0.005, The use offGas9 is straightforward, requiring only that PAM Fisher's exact test) in the target gRNA-treated samples versus 35 sequences be present with an appropriate spacing and orien the control gRNA-treated samples were considered Cas9 tation, and using the same gRNAs as wild-type Cas9 or Cas9 nuclease-induced genome modifications. For 10 of the 11 nickases. The observed low off-targeton-target modification off-target sites assayed, f(cas9 did not result in any detectable ratios offGas9, >140-fold lower than that of wild-type Cas9, genomic off-target modification within the sensitivity limit of likely arises from the distinct mode of action of dimeric FokI, the assay (<0.002%), while demonstrating substantial on 40 in which DNA cleavage proceeds only if two DNA sites are target modification efficiencies of 5% to 10% (FIG.8D-F and occupied simultaneously by two FokI domains at a specified Table 3). The detailed inspection of fas9-modified VEGF distance (here, ~15 bp or ~25 bp apart) and in a specific on-target sequences (FIG. 18A) revealed a prevalence of dele half-site orientation. 
The resulting unusually low off-target tions ranging from two to dozens of base pairs consistent with activity of f(cas') enable applications of Cas9:gRNA-based cleavage occurring in the DNA spacer between the two target 45 technologies that require a very high degree of target speci binding sites, similar to the effects of FokI nuclease domains ficity, such as ex vivo or in vivo therapeutic modification of fused to zinc finger or TALE DNA-binding domains.' human cells. In contrast, genomic off-target DNA cleavage was observed for wild-type Cas9 at all 11 sites assayed. Using the REFERENCES detection limit of the assay as an upper bound for off-target 50 fCas9 activity, it was calculated that f(cas9 has a much lower 1. Pattanayak, V. et al. High-throughput profiling of off-target off-target modification rate than wild-type Cas9 nuclease. At DNA cleavage reveals RNA-programmed Cas9 nuclease the 11 off-target sites modified by wild-type Cas9 nuclease, specificity. Nat. Biotechnol. 31, 839-843 (2013). fCas9 resulted in on-target:off-target modification ratios at 2. Fu, Y. etal. High-frequency off-target mutagenesis induced least 140-fold higher than that of wild-type Cas9 (FIG.8D-F). by CRISPR-Cas nucleases in human cells. Nat. Biotech Consistent with previous reports.' Cas9 nickase also 55 mol. 31, 822-826 (2013). induced substantially fewer off-target modification events 3. Hsu, P. D. et al. DNA targeting specificity of RNA-guided (1/11 off-target sites modified at a detectable rate) compared Cas9 nucleases. Nat. Biotechnol. 31, 827-832 (2013). to wild-type Cas9. An initial high-throughput sequencing 4. Cradick, T.J., Fine, E.J., Antico, C.J. & Bao, G. CRISPR/ assay revealed significant (Pvalue-10, Fisher's Exact Test) Cas9 systems targeting-globin and CCR5 genes have Sub modification induced by Cas9 nickases in 0.024% of 60 stantial off-target activity. Nucleic Acids Res. 41, 9584 sequences at VEGF off-target site 1. This genomic off-target 9592 (2013). 
site was not modified by f(cas9 despite similar VEGF on 5. Cho, S. W. et al. Analysis of off-target effects of CRISPR/ target modification efficiencies of 12.3% for Cas9 nickase Cas-derived RNA-guided endonucleases and nickases. and 10.4% for fas9 (FIG. 8F and Table 3C). Because Cas9 Genome Res. 24, 132-141 (2013). nickase-induced modification levels were within an order of 65 6. Ran, F. A. et al. Double Nicking by RNA-Guided CRISPR magnitude of the limit of detection and f(cas9 modification Cas9 for Enhanced Genome Editing Specificity. Cell 154, levels were undetected, the experiment was repeated with a 1380-1389 (2013). US 9,388,430 B2 95 96 7. Mali, P. et al. CAS9 transcriptional activators for target 30. Esvelt, K. M. et al. Orthogonal Cas9 proteins for RNA specificity Screening and paired nickases for cooperative guided gene regulation and editing. Nat. Methods 10, genome engineering. Nat. Biotechnol. 31, 833-838 (2013). 1116-1121 (2013). 8. Fu, Y., Sander, J. D., Reyon, D., Cascio, V. M. & Joung, J. 31. Kim, Y., Kweon, J. & Kim, J.-S. TALENs and ZFNs are K. Improving CRISPR-Cas nuclease specificity using 5 associated with different mutation signatures. Nat. Meth truncated guide RNAs. Nat. Biotechnol. (2014). doi: ods 10, 185-185 (2013). 10.1038/nbt.2808 32. Shcherbakova, D. M. & Verkhusha, V. V. Near-infrared 9. Cong, L. et al. Multiplex Genome Engineering Using fluorescent proteins for multicolor in vivo imaging. Nat. CRISPR/Cas Systems. Science 339,819-823 (2013). Methods 10, 751-754 (2013). 10. Jinek, M. etal. A Programmable Dual-RNA-Guided DNA 10 33. Schneider, C.A., Rasband, W. S. & Eliceiri, K. W. NIH Endonuclease in Adaptive Bacterial Immunity. Science Image to Image.J. 25 years of image analysis. Nat. Methods 337,816-821 (2012). 9, 671-675 (2012). 11. Gasiunas, G., Barrangou, R., Horvath, P. & Siksnys, V. 34. Sander, J. D. et al. 
In silico abstraction of zinc finger Cas9-crRNA ribonucleoprotein complex mediates specific nuclease cleavage profiles reveals an expanded landscape DNA cleavage for adaptive immunity in bacteria. Proc. 15 of off-target sites. Nucleic Acids Res. 41, e181-e181 Natl. Acad. Sci. 109, E2579-E2586 (2012). (2013). 12. Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. 35. Schellenberger, V. et al. A recombinant polypeptide Genetic screens in human cells using the CRISPR-Cas9 extends the in vivo half-life of peptides and proteins in a system. Science 343, 80-84 (2014). tunable manner. Nat. Biotechnol. 27, 1186-1190 (2009). 13. Shalem, O. etal. Genome-Scale CRISPR-Cas'9 Knockout 20 36. Mali, P. et al. CAS9 transcriptional activators for target Screening in Human Cells. Science 343, 84-87 (2013). specificity Screening and paired nickases for cooperative 14. Perez, E. E. et al. Establishment of HIV-1 resistance in genome engineering. Nat. Biotechnol. 31, 833-838 (2013). CD4+ T cells by genome editing using zinc-finger 37. Ran, F. A. et al. Double Nicking by RNA-Guided CRISPR nucleases. Nat. Biotechnol. 26, 808–816 (2008). Cas9 for Enhanced Genome Editing Specificity. Cell 154, 15. Jinek, M. et al. RNA-programmed genome editing in 25 1380-1389 (2013). human cells. eLife 2, e()0471-e(00471 (2013). 38. Yan, T. et al. PatMatch: a program for finding patterns in 16. Mali, P. et al. RNA-Guided Human Genome Engineering peptide and nucleotide sequences. Nucleic Acids Res. 33. via Cas9. Science 339, 823-826 (2013). W262-W266 (2005). 17. Mali, P., Esvelt, K.M. & Church, G. M. Cas9 as a versatile 39. Larkin, M. A. et al. Clustal W and Clustal X version 2.0. tool for engineering biology. Nat. Methods 10, 957-963 30 Bioinformatics 23, 2947-2948 (2007). (2013). 18. Ramirez, C. L. et al. Engineered zinc finger nickases Example 2 induce homology-directed repair with reduced mutagenic Targeting CCR5 for Cas9 Variant-Mediated effects. Nucleic Acids Res. 40, 5560-5568 (2012). 
Inactivation 19. Wang, J. et al. Targeted gene addition to a predetermined as site in the human genome using a ZFN-based nicking In addition to providing powerful research tools, site-spe enzyme. Genome Res. 22, 1316-1326 (2012). cific nucleases also have potential as genetherapy agents, and 20. Gaj, T., Gersbach, C. A. & Barbas, C. F. ZFN, TALEN, site-specific Zinc finger endonucleases have recently entered and CRISPR/Cas-based methods for genome engineering. clinical trials: CCR5-2246, targeting a human CCR-5 allele Trends Biotechnol. 31,397-405 (2013). as part of an anti-HIV therapeutic approach. 21. Vanamee, E. S., Santagata, S. & Aggarwal, A. K. FokI 40 In a similar approach, the inventive Cas9 variants of the requires two specific DNA sites for cleavage. J. Mol. Biol. present disclosure may be used to inactivate CCR5, for 309, 69-78 (2001). example in autologous T cells obtained from a subject which, 22. Maeder, M. L. et al. CRISPR RNA-guided activation of once modified by a Cas9 variant, are re-introduced into the endogenous human genes. Nat. Methods 10, 977-979 subject for the treatment or prevention of HIV infection. (2013). 45 In this example, the CCR5 gene is targeted in T cells 23. Pattanayak, V., Ramirez, C. L., Joung, J. K. & Liu, D. R. obtained from a subject. CCR5 protein is required for certain Revealing off-target cleavage specificities of zinc-finger common types of HIV to bind to and enter T cells, thereby nucleases by in vitro selection. Nat. Methods 8, 765-770 infecting them. T cells are one of the white blood cells used by (2011). the body to fight HIV. 24. Guilinger, J. P. et al. Broad specificity profiling of TAL- 50 Some people are born lacking CCR5 expression on their T ENs results in engineered nucleases with improved DNA cells and remain healthy and are resistant to infection with cleavage specificity. Nat. Methods (2014). doi:10.1038/ HIV. 
Others have low expression of CCR5 on their T cells, nmeth.2845 and their HIV disease is less severe and is slower to cause 25. Nishimasu, H. etal. Crystal Structure of Cas9 in Complex disease (AIDS). with Guide RNA and Target DNA. Cell (2014). doi: In order to delete the CCR5 protein on the T cells, large 10.1016/j.cell.2014.02.001 55 numbers of T-cells are isolated from a subject. Cas9 variants 26. Jinek, M. et al. Structures of Cas9 Endonucleases Reveal (e.g., f(cas9) and gRNA capable of inactivating CCR5 are RNA-Mediated Conformational Activation. Science then delivered to the isolated T cells using a viral vector, e.g., (2014), doi:10.1126/science. 1247997 an adenoviral vector. Examples of suitable Cas9 variants 27. Schellenberger, V. et al. A recombinant polypeptide include those inventive fusion proteins provided herein. extends the in vivo half-life of peptides and proteins in a 60 Examples of Suitable target sequences for gRNAS targeting tunable manner. Nat. Biotechnol. 27, 1186-1190 (2009). the CCR5 allele include those described in FIG. 19, e.g., SEQ 28. Guschin, D. Y. et al. in Eng. Zinc Finger Proteins ID NOS:303-310 and 312-317. The viral vector(s) capable of (Mackay, J. P. & Segal, D.J.) 649, 247-256 (Humana Press, expressing the Cas9 variant and gRNA is/are added to the 2010). isolated T cells to knock out the CCR5 protein. When the T 29. Qi, L. S. et al. Repurposing CRISPR as an RNA-guided 65 cells are returned to Subject, there is minimal adenovirus or platform for sequence-specific control of gene expression. Cas9 variant protein present. The removal of the CCR5 pro Cell 152, 1173-1183 (2013). tein on the T cells Subjects receive, however, is permanent.
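
Choosing gRNA pairs for fCas9 against a gene such as CCR5 requires half-sites in orientation A (PAMs distal from the spacer, CCN-N20-spacer-N20-NGG) separated by one of the preferred spacer lengths given in Example 1 (13 to 19 bp or 22 to 29 bp). The following is a minimal sketch of such a scan, not the method used to generate Table 4. The function and constant names are hypothetical, and the sketch assumes 20-nt gRNA targets and NGG PAMs. Because the reverse complement of the orientation A pattern is the same pattern, scanning one strand is sufficient.

```python
# Illustrative sketch only; names below are hypothetical, not from the disclosure.
# Preferred fCas9 spacer lengths (bp), taken from the ranges stated in Example 1.
FCAS9_SPACERS = tuple(range(13, 20)) + tuple(range(22, 30))
PROTOSPACER_LEN = 20  # assumed length of each gRNA target (N20)
PAM_LEN = 3           # NGG, read as CCN for the half-site on the opposite strand


def find_orientation_a_pairs(seq, spacers=FCAS9_SPACERS):
    """Return (left_start, spacer_length) for each CCN-N20-spacer-N20-NGG match."""
    seq = seq.upper()
    half = PAM_LEN + PROTOSPACER_LEN  # 23 bp per half-site, PAM included
    hits = []
    for i in range(len(seq) - 2 * half + 1):
        if seq[i:i + 2] != "CC":      # left PAM appears on this strand as CCN
            continue
        for s in spacers:
            j = i + half + s          # first base of the right-hand N20
            if j + half > len(seq):
                continue              # right half-site would run past the end
            # Right PAM is N-G-G immediately after the right-hand N20.
            if seq[j + PROTOSPACER_LEN + 1:j + half] == "GG":
                hits.append((i, s))
    return hits


if __name__ == "__main__":
    # Synthetic test sequence: CCA + 20 nt + 15-bp spacer + 20 nt + TGG.
    demo = "CCA" + "A" * 20 + "T" * 15 + "C" * 20 + "TGG"
    print(find_orientation_a_pairs(demo))  # [(0, 15)]
```

Passing a different `spacers` tuple (for example, every length from 4 to 42) would approximate the looser spacing tolerated by paired nickases, which is one way to see why the stricter fCas9 window yields fewer candidate sites.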

-continued

LADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTI
DRKRYTSTKEWLDATLIHQSITGLYETRIDLSQLGGD

(underline: nuclear localization signal; bold: linker sequence)

Example 4

Introduction of a Marker Gene by Homologous Recombination Using Cas9-Recombinase Fusion Proteins

A vector carrying a green fluorescent protein (GFP) marker gene flanked by genomic sequence of a host cell gene is introduced into a cell, along with an expression construct encoding a dCas9-recombinase fusion protein (any one of SEQ ID NO:328-339) and four appropriately designed gRNAs targeting the GFP marker gene and the genomic locus into which the GFP marker is recombined. Four dCas9-recombinase fusion proteins are coordinated at the genomic locus along with the GFP marker gene through the binding of the gRNAs (FIG. 5B). The four recombinase domains of the fusion proteins tetramerize, and the recombinase activity of the recombinase domains of the fusion protein results in the recombination between the genomic locus and the marker gene, thereby introducing the marker gene into the genomic locus. Introduction of the marker gene is confirmed by GFP expression and/or by PCR.

EQUIVALENTS AND SCOPE

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. The scope of the present invention is not intended to be limited to the above description, but rather is as set forth in the appended claims.

In the claims articles such as "a," "an," and "the" may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include "or" between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention also includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process.

Furthermore, it is to be understood that the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, descriptive terms, etc., from one or more of the claims or from relevant portions of the description is introduced into another claim. For example, any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim. Furthermore, where the claims recite a composition, it is to be understood that methods of using the composition for any of the purposes disclosed herein are included, and methods of making the composition according to any of the methods of making disclosed herein or other methods known in the art are included, unless otherwise indicated or unless it would be evident to one of ordinary skill in the art that a contradiction or inconsistency would arise.

Where elements are presented as lists, e.g., in Markush group format, it is to be understood that each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It is also noted that the term "comprising" is intended to be open and permits the inclusion of additional elements or steps. It should be understood that, in general, where the invention, or aspects of the invention, is/are referred to as comprising particular elements, features, steps, etc., certain embodiments of the invention or aspects of the invention consist, or consist essentially of, such elements, features, steps, etc. For purposes of simplicity those embodiments have not been specifically set forth in haec verba herein. Thus for each embodiment of the invention that comprises one or more elements, features, steps, etc., the invention also provides embodiments that consist or consist essentially of those elements, features, steps, etc.

Where ranges are given, endpoints are included. Furthermore, it is to be understood that unless otherwise indicated or otherwise evident from the context and/or the understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise. It is also to be understood that unless otherwise indicated or otherwise evident from the context and/or the understanding of one of ordinary skill in the art, values expressed as ranges can assume any subrange within the given range, wherein the endpoints of the subrange are expressed to the same degree of accuracy as the tenth of the unit of the lower limit of the range.

In addition, it is to be understood that any particular embodiment of the present invention may be explicitly excluded from any one or more of the claims. Where ranges are given, any value within the range may explicitly be excluded from any one or more of the claims. Any embodiment, element, feature, application, or aspect of the compositions and/or methods of the invention, can be excluded from any one or more claims. For purposes of brevity, all of the embodiments in which one or more elements, features, purposes, or aspects is excluded are not set forth explicitly herein.

All publications, patents and sequence database entries mentioned herein, including those items listed above, are hereby incorporated by reference in their entirety as if each individual publication or patent was specifically and individually indicated to be incorporated by reference. In case of conflict, the present application, including any definitions herein, will control.

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 339

<210> SEQ ID NO 1
<211> LENGTH: 4104
<212> TYPE: DNA
<213> ORGANISM: Streptococcus pyogenes

<400> SEQUENCE: 1

aaacgaatcg aagaaggtat caaagaatta ggaagtcaga ttcttaaaga gcatcctgtt 2400
gaaaatactc aattgcaaaa tgaaaagctc tatctctatt atctacaaaa tggaagagac 2460
atgtatgtgg accaagaatt agatattaat cgtttaagtg attatgatgt cgatcacatt 2520
gttccacaaa gtttcattaa agacgattca atagacaata aggtactaac gcgttctgat 2580
aaaaatcgtg gtaaatcgga taacgttcca agtgaagaag tagtcaaaaa gatgaaaaac 2640
tattggagac aacttctaaa cgccaagtta atcactcaac gtaagtttga taatttaacg 2700
aaagctgaac gtggaggttt gagtgaactt gataaagctg gttttatcaa acgccaattg 2760
gttgaaactc gccaaatcac taagcatgtg gcacaaattt tggatagtcg catgaatact 2820
aaatacgatg aaaatgataa acttattcga gaggttaaag tgattacctt aaaatctaaa 2880
ttagtttctg acttccgaaa agatttccaa ttctataaag tacgtgagat taacaattac 2940
catcatgccc atgatgcgta tctaaatgcc gtcgttggaa ctgctttgat taagaaatat 3000
ccaaaacttg aatcggagtt tgtctatggt gattataaag tttatgatgt tcgtaaaatg 3060
attgctaagt ctgagcaaga aataggcaaa gcaaccgcaa aatatttctt ttactctaat 3120
atcatgaact tcttcaaaac agaaattaca cttgcaaatg gagagattcg caaacgccct 3180
ctaatcgaaa ctaatgggga aactggagaa attgtctggg ataaagggcg agattttgcc 3240
acagtgcgca aagtattgtc catgccccaa gtcaatattg tcaagaaaac agaagtacag 3300
acaggcggat tctccaagga gtcaatttta ccaaaaagaa attcggacaa gcttattgct 3360
cgtaaaaaag actgggatcc aaaaaaatat ggtggttttg atagtccaac ggtagcttat 3420
tcagtcctag tggttgctaa ggtggaaaaa gggaaatcga agaagttaaa atccgttaaa 3480
gagttactag ggatcacaat tatggaaaga agttcctttg aaaaaaatcc gattgacttt 3540
ttagaagcta aaggatataa ggaagttaaa aaagacttaa tcattaaact acctaaatat 3600
agtctttttg agttagaaaa cggtcgtaaa cggatgctgg ctagtgccgg agaattacaa 3660
aaaggaaatg agctggctct gccaagcaaa tatgtgaatt ttttatattt agctagtcat 3720
tatgaaaagt tgaagggtag tccagaagat aacgaacaaa aacaattgtt tgtggagcag 3780
cataagcatt atttagatga gattattgag caaatcagtg aattttctaa gcgtgttatt 3840
ttagcagatg ccaatttaga taaagttctt agtgcatata acaaacatag agacaaacca 3900
atacgtgaac aagcagaaaa tattattcat ttatttacgt tgacgaatct tggagctccc 3960
gctgctttta aatattttga tacaacaatt gatcgtaaac gatatacgtc tacaaaagaa 4020
gttttagatg ccactcttat ccatcaatcc atcactggtc tttatgaaac acgcattgat 4080
ttgagtcagc taggaggtga ctga 4104
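As a quick consistency check on the listing, the final 24 nucleotides of SEQ ID NO: 1 (positions 4081 to 4104) can be translated with the standard genetic code and compared with the C-terminal residues of the protein sequence shown before Example 4 (ending in LSQLGGD). The following is only a minimal sketch: the helper name `translate` and the way the codon table is built are illustrative assumptions, not part of the patent.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order (assumed table).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; '*' marks a stop codon.

    Hypothetical helper for illustration; trailing bases that do not
    fill a complete codon are ignored.
    """
    dna = dna.lower()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Last 24 nucleotides of SEQ ID NO: 1 as listed above.
tail = "ttgagtcagctaggaggtgactga"
print(translate(tail))
```

Under these assumptions the translated tail ends in a stop codon, which is consistent with the 4104-nucleotide length corresponding to 1367 codons plus one stop.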

<210> SEQ ID NO 2
<211> LENGTH: 1367
<212> TYPE: PRT
<213> ORGANISM: Streptococcus pyogenes

<400> SEQUENCE: 2

Met Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val   1 5 10 15

Gly Trp Ala Val Ile Thr Asp Asp Tyr Lys Val Pro Ser Lys Lys Phe   20 25 30

Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile   35 40 45

Gly Ala Leu Leu Phe Gly Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu   50 55 60

Lys Arg Thr Ala Arg Arg Arg Thr Arg Arg Asn Arg Ile   65 70

Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Val Asp Asp Ser   85 90 95

Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp   105 110

His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala   115 120 125

His Glu Pro Thr Ile His Leu Arg Lys Leu Ala Asp   130 135 140

Ser Thr Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His   145 150 155 160

Met Ile Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro   165 170 175

Asp Asn Ser Asp Val Asp Leu Phe Ile Gln Leu Val Gln Ile   180 185 190

Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Arg Val Asp Ala   195

Ala Ile Leu Ser Ala Arg Leu Ser Ser Arg Arg Leu Glu Asn   210 215

Leu Ile Ala Gln Leu Pro Gly Glu Arg Asn Gly Leu Phe Gly Asn   225 230 235 240

Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Ser Asn Phe   245 250 255

Asp Leu Ala Glu Asp Ala Leu Gln Leu Ser Asp Thr Asp   260 265 270

Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Ala Asp   285

Leu Phe Leu Ala Ala Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp   290 295 300

Ile Leu Arg Val Asn Ser Glu Ile Thr Ala Pro Leu Ser Ala Ser   305 310 315 320

Met Ile Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu   325 330 335

Ala Leu Val Arg Gln Gln Leu Pro Glu Glu Ile Phe Phe   340 345 350

Asp Gln Ser Asn Gly Ala Gly Ile Asp Gly Gly Ala Ser   355 360 365

Gln Glu Glu Phe Tyr Phe Ile Pro Ile Leu Glu Met Asp   370 375

Gly Thr Glu Glu Leu Leu Val Leu Asn Arg Glu Asp Leu Leu Arg   385 390 395 400

Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu   405 410 415

Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe   425 430

Leu Asp Asn Arg Glu Ile Glu Ile Leu Thr Phe Arg Ile   435 440 445

Pro Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp   450 455 460

Met Thr Arg Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu   465 470 475 480

Val Val Asp Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr   485 490 495

Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser   500 505 510

Leu Leu Tyr Glu Tyr Phe Thr Val Asn Glu Leu Thr Val   515 525

Val Thr Glu Gly Met Arg Pro Ala Phe Leu Ser Glu Gln   530 535 540

Lys Ala Ile Val Asp Leu Leu Phe Thr Asn Arg Val Thr   545 550 555 560

Val Gln Leu Lys Glu Asp Phe Lys Ile Glu Phe Asp   565 570 575

Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly   585 590

Ala His Asp Leu Leu Ile Ile Asp Asp Phe Leu Asp   595 605

Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr   610 615 620

Leu Phe Glu Asp Arg Gly Met Ile Glu Glu Arg Leu Thr Ala   625 630 635 640

His Leu Phe Asp Asp Val Met Gln Leu Arg Arg Arg   645 650 655

Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp   660 665 670

Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Ser Asp Gly Phe   675 685

Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe   690 695 700

Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly His Ser Leu   705

His Glu Gln Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Gly   725 730 735

Ile Leu Gln Thr Val Ile Val Asp Glu Leu Val Val Met Gly   740 745 750

His Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr   760 765

Thr Gln Gly Gln Asn Ser Arg Glu Arg Met Arg Ile Glu   770 775 780

Glu Gly Ile Glu Leu Gly Ser Gln Ile Leu Glu His Pro Val   790 795

Glu Asn Thr Gln Leu Gln Asn Glu Leu Leu Leu Gln   805 810 815

Asn Gly Arg Asp Met Val Asp Gln Glu Leu Asp Ile Asn Arg Leu   825 830

Ser Asp Tyr Asp Val His Ile Val Pro Gln Ser Phe Ile Asp   835 840 845

Asp Ser Ile Asp Asn Val Leu Thr Arg Ser Asp Asn Gly   850 855 860

Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Met Asn   865 870

Trp Arg Gln Leu Leu Asn Ala Leu Ile Thr Gln Arg Lys Phe   885 890 895

Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp