US009963689B2

(12) United States Patent
     Doudna et al.

(10) Patent No.: US 9,963,689 B2
(45) Date of Patent: May 8, 2018

(54) CAS9 CRYSTALS AND METHODS OF USE THEREOF

(71) Applicant: The Regents of the University of California, Oakland, CA (US)

(72) Inventors: Jennifer A. Doudna, Oakland, CA (US); Samuel H. Sternberg, Oakland, CA (US); Martin Jinek, Oakland, CA (US); Fuguo Jiang, Oakland, CA (US); Emine Kaya, Oakland, CA (US); David W. Taylor, Jr., Oakland, CA (US)

(73) Assignee: THE REGENTS OF THE UNIVERSITY OF CALIFORNIA, Oakland, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 15/108,545

(22) PCT Filed: Dec. 29, 2014

(86) PCT No.: PCT/US2014/072590
     § 371(c)(1), (2) Date: Jun. 27, 2016

(87) PCT Pub. No.: WO2015/103153
     PCT Pub. Date: Jul. 9, 2015

(65) Prior Publication Data
     US 2016/0319262 A1    Nov. 3, 2016

Related U.S. Application Data

(60) Provisional application No. 61/922,556, filed on Dec. 31, 2013.

(51) Int. Cl.
     C12N 9/22 (2006.01)
     C12N 15/10 (2006.01)
     G06F 19/16 (2011.01)

(52) U.S. Cl.
     CPC ...... C12N 9/22 (2013.01); C12N 15/1031 (2013.01); C12Y 301/00 (2013.01); G06F 19/16 (2013.01); C07K 2299/00 (2013.01)

(58) Field of Classification Search
     None
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

8,260,596 B2      9/2012   Kobilka et al.
2011/0217739 A1   9/2011   Terns et al.
2012/0028287 A1   2/2012   Chen et al.
2014/0179006 A1   6/2014   Zhang

FOREIGN PATENT DOCUMENTS

WO   WO 2013/126794 A1   8/2013
WO   WO 2013/142578 A1   9/2013
WO   WO 2013/176772 A1   11/2013

OTHER PUBLICATIONS

McPherson, A. Current Approaches to Macromolecular Crystallization. European Journal of Biochemistry. 1990. vol. 189, pp. 1-23.*
Kundrot, C. E. Which Strategy for a Crystallization Project? Cellular Molecular Life Science. 2004. vol. 61, pp. 525-536.*
Benevenuti et al., Crystallization of Soluble in Vapor Diffusion for X-ray Crystallography, Nature Protocols, published on-line Jun. 28, 2007, 2(7):1633-1651.*
Cudney R. Protein Crystallization and Dumb Luck. The Rigaku Journal. 1999. vol. 16, No. 1, pp. 1-7.*
Drenth, "Principles of Protein X-Ray Crystallography", 2nd Edition, 1999 Springer-Verlag New York Inc., Chapter 1, p. 1-21.*
Moon et al., "A synergistic approach to protein crystallization: Combination of a fixed-arm carrier with surface entropy reduction", Protein Science, 2010, 19:901-913.*
McPherson & Gavira, "Introduction to protein crystallization", Acta Crystallographica Section F Structural Biology Communications, 2014, F70, pp. 2-20.*
Koo, et al., "Crystal Structure of Streptococcus pyogenes Csn2 Reveals Calcium-Dependent Conformational Changes in Its Tertiary and Quaternary Structure"; PLoS One; vol. 7, No. 3, e33401 (Mar. 30, 2012).
Makarova, et al.; "Evolution and Classification of the CRISPR-Cas Systems"; Nat. Rev. Microbiol., vol. 9, No. 6, pp. 467-477 (Jun. 2011).
Mali, et al.; "Cas9 as a Versatile Tool for Engineering Biology"; Nat. Methods; vol. 10, No. 10, pp. 957-963 (Oct. 2013).
Nishimasu, et al.; "Crystal Structure of Cas9 in Complex With Guide RNA and Target DNA"; Cell; vol. 156, No. 5, pp. 935-949 (Feb. 27, 2014).
Osawa, et al., "Crystallization and Preliminary X-Ray Diffraction Analysis of the Cmr2-Cmr3 Subcomplex in the CRISPR-Cas RNA Silencing Effector Complex"; Acta Crystallogr Sect F Struct Biol Cryst Commun.; vol. 69, No. 5, pp. 585-587 (May 1, 2013).
International Search Report and Written Opinion of PCT Application No. PCT/US2014/072590, dated Apr. 22, 2015.
Barrangou et al., "CRISPR Provides Acquired Resistance Against Viruses in Prokaryotes" Science 315, 1709-1712 (2007).
Berger, S. J. Gamblin, S. C. Harrison, J. C. Wang, "Structure and mechanism of DNA topoisomerase II" Nature 379, 225-232 (1996).
Bikard et al., "Programmable repression and activation of bacterial gene expression using an engineered CRISPR-Cas system" Nucleic Acids Res 41, 7429-7437 (2013).

(Continued)

Primary Examiner: Suzanne M Noakes
(74) Attorney, Agent, or Firm: Bozicevic Field & Francis, LLP; Paula A. Borden

(57) ABSTRACT

The present disclosure provides atomic structures of Cas9 with and without polynucleotides bound thereto. Also provided is a computer-readable medium comprising atomic coordinates for Cas9 polypeptides in both an unbound configuration and a configuration wherein the Cas9 polypeptide is bound to one or more polynucleotides. The present disclosure provides crystals comprising Cas9 polypeptides; and compositions comprising the crystals. The present disclosure provides methods for the engineering of Cas9 polypeptides wherein Cas9 activity has been altered, ablated, or preserved and amended with additional activities.

6 Claims, 58 Drawing Sheets

Page 2

(56) References Cited

OTHER PUBLICATIONS

Brouns et al., "Small CRISPR RNAs Guide Antiviral Defense in Prokaryotes" Science 321, 960-964 (2008).
Chylinski, A. Le Rhun, E. Charpentier, "The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems." RNA Biology 10, 726-737 (2013).
Esvelt et al., "Orthogonal Cas9 proteins for RNA-guided gene regulation and editing" Nat Methods; vol. 10, No. 11, pp. 1116-1121 (Nov. 2013).
Garneau et al., "The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA" Nature 468, 67-71 (2010).
Gasiunas, R. Barrangou, P. Horvath, V. Siksnys, "Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria." Proc Natl Acad Sci USA 109, E2579-86 (2012).
Gilbert et al., "CRISPR-Mediated Modular RNA-Guided Regulation of Transcription in Eukaryotes" Cell 154, 442-451 (2013).
Hou et al., "Efficient genome engineering in human pluripotent stem cells using Cas9 from Neisseria meningitidis" Proc Natl Acad Sci USA; vol. 110, No. 39, pp. 15644-15649 (Sep. 24, 2013).
Jinek et al., "A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity" Science 337, 816-821 (2012).
Makarova, L. Aravind, Y. I. Wolf, E. V. Koonin, "Unification of Cas protein families and a simple scenario for the origin and evolution of CRISPR-Cas systems" Biol Direct 6, 38 (2011).
Mali et al., "CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering" Nat Biotechnol; vol. 31, No. 9, pp. 833-838 (Sep. 2013).
Qi et al., "Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression" Cell 152, 1173-1183 (2013).
Sampson, S. D. Saroj, A. C. Llewellyn, Y.-L. Tzeng, D. S. Weiss, "A CRISPR/Cas system mediates bacterial innate immune evasion and virulence" Nature; vol. 497, No. 7448, pp. 254-257 (May 9, 2013).
Sapranauskas et al., "The Streptococcus thermophilus CRISPR/Cas system provides immunity in Escherichia coli" Nucleic Acids Res 39, 9275-9282 (2011).
Shen, M. Landthaler, D. A. Shub, B. L. Stoddard, "DNA binding and cleavage by the HNH homing endonuclease I-HmuI." J Mol Biol 342, 43-56 (2004).
Wiedenheft et al., "Structural basis for DNase activity of a conserved protein implicated in CRISPR-mediated genome defense"; Structure; vol. 17, pp. 904-912 (Jun. 10, 2009).
Wiedenheft, S. H. Sternberg, J. A. Doudna, "RNA-guided genetic silencing systems in bacteria and archaea" Nature 482, 331-338 (2012).
Zhang et al., "Processing-independent CRISPR RNAs limit natural transformation in Neisseria meningitidis." Mol Cell 50, 488-503 (2013).
Jinek, et al.; "Structures of Cas9 Endonucleases Reveal RNA-Mediated Conformational Activation"; Science; vol. 343, No. 6176, pp. 1-28 (Mar. 14, 2014).

* cited by examiner

U.S. Patent    May 8, 2018    Sheet 1 of 58    US 9,963,689 B2

FIGURE 1
Upper panel: site-directed modifying polypeptide bound to target DNA and a double-molecule DNA-targeting RNA. Molecule one: "targeter-RNA". Molecule two: "activator-RNA". First segment: "DNA-targeting segment". Second segment: "protein-binding segment".
Lower panel: site-directed modifying polypeptide bound to target DNA and a single-molecule DNA-targeting RNA, with "linker nucleotides". First segment: "DNA-targeting segment". Second segment: "protein-binding segment".

Sheet 2 of 58

FIGURE 2A
Structural superposition of SpyCas9 and the T. thermophilus RuvC substrate complex (PDB 4LD0). Figure labels: superposition, rmsd (value illegible), 121 Cα atoms.

Sheet 3 of 58

FIGURE 2B
Figure labels: 5' end, 3' end.

Sheet 4 of 58

FIGURE 2C
Active-site comparison. T. thermophilus RuvC labels: H143, D147. SpyCas9 labels: D10, E762, E766, H982, H983, H986.

Sheet 5 of 58

FIGURE 3
Domain organization of S. pyogenes Cas9 (SpyCas9). Figure labels: alpha-helical lobe, nuclease domain lobe, RuvC, Arg, HNH, Topo, CTD. Residue boundary numbers are not legible.

Sheet 6 of 58

FIGURE 4A
Figure labels: RuvC active site, HNH active site, alpha-helical lobe, RuvC, CTD.

Sheet 7 of 58

FIGURE 4B
Figure labels: HNH, RuvC, CTD.

Sheet 8 of 58

FIGURE 5A
Sequence alignment panel. Sequences: 1. S. pyogenes; 2. S. thermophilus; 3. L. innocua; 4. S. agalactiae; 5. S. mutans; 6. E. faecium. Residue letters are not legible.

Sheet 9 of 58

FIGURE 5B
Sequence alignment panel. Sequences: 1. S. pyogenes; 2. S. thermophilus; 3. L. innocua; 4. S. agalactiae; 5. S. mutans; 6. E. faecium. Annotation: helical lobe. Residue letters are not legible.

Sheet 10 of 58

FIGURE 5C
Sequence alignment panel. Sequences: 1. S. pyogenes; 2. S. thermophilus; 3. L. innocua; 4. S. agalactiae; 5. S. mutans; 6. E. faecium. Annotation: helical lobe. Residue letters are not legible.

Sheet 11 of 58

FIGURE 5D
Sequence alignment panel. Sequences: 1. S. pyogenes; 2. S. thermophilus; 3. L. innocua; 4. S. agalactiae; 5. S. mutans; 6. E. faecium. Annotations: helical lobe, PAM loop 1. Residue letters are not legible.

Sheet 12 of 58

FIGURE 5E
Sequence alignment panel. Sequences: 1. S. pyogenes; 2. S. thermophilus; 3. L. innocua; 4. S. agalactiae; 5. S. mutans; 6. E. faecium. Residue letters are not legible.

Sheet 13 of 58

FIGURE 5F
Sequence alignment panel. Sequences: 1. S. pyogenes; 2. S. thermophilus; 3. L. innocua; 4. S. agalactiae; 5. S. mutans; 6. E. faecium. Residue letters are not legible.

Sheet 14 of 58

FIGURE 5G
Sequence alignment panel. Sequences: 1. S. pyogenes; 2. S. thermophilus; 3. L. innocua; 4. S. agalactiae; 5. S. mutans; 6. E. faecium. Residue letters are not legible.

Sheet 15 of 58

FIGURE 5H
Sequence alignment panel. Sequences: 1. S. pyogenes; 2. S. thermophilus; 3. L. innocua; 4. S. agalactiae; 5. S. mutans; 6. E. faecium. Annotation: RuvC-II. Residue letters are not legible.

Sheet 16 of 58

FIGURE 5I
Sequence alignment panel. Sequences: 1. S. pyogenes; 2. S. thermophilus; 3. L. innocua; 4. S. agalactiae; 5. S. mutans; 6. E. faecium. Annotations: RuvC-III, PAM loop 2. Residue letters are not legible.

Sheet 17 of 58

FIGURE 5J
Sequence alignment panel. Sequences: 1. S. pyogenes; 2. S. thermophilus; 3. L. innocua; 4. S. agalactiae; 5. S. mutans; 6. E. faecium. Residue letters are not legible.

Sheet 18 of 58

FIGURE 5K
Sequence alignment panel. Sequences: 1. S. pyogenes; 2. S. thermophilus; 3. L. innocua; 4. S. agalactiae; 5. S. mutans; 6. E. faecium. Annotation: CTD. Residue letters are not legible.

Sheet 19 of 58

FIGURE 6
Surface views related by 180° rotations. Conservation scale: variable to conserved. Electrostatic scale: -10 kT/e (negatively charged) to +10 kT/e (positively charged). Figure labels: helical lobe cleft, nuclease lobe cleft.

Sheet 20 of 58

FIGURE 7
Figure labels: Arg-rich region; residues R403, H160, R71, K103. Other residue labels are not legible.

Sheet 21 of 58

FIGURE 8
Figure labels: PAM-binding loop 1 (447-502), PAM-binding loop 2 (1102-1137), 3' end, 5' end.

Sheet 22 of 58

FIGURE 9
Schematic labels: Br-dU, DNA, crRNA, PAM, tracrRNA, Cas9, UV, Trypsin, Nuclease S1, Phosphatase, LC-MS/MS.
Gel lanes: Cas9 (WT, d); UV (+, +, +); Trypsin (-, +, +).
Band labels: peptide/DNA heteroconjugate, DNA, cleavage product.

Sheet 23 of 58

FIGURE 10
Gel with three Br-dU substrates. Lane variables: Cas9 (-, WT, d), UV, Trypsin.
Band labels: peptide/DNA heteroconjugate, DNA, cleavage product.

Sheet 24 of 58

FIGURE 11A
Sequence alignment panel. Sequences: Ana, Cje, Tde, Nme, Sth, Smu, Sag, Spy. Annotations: RuvC-I, Arg-rich, alpha-helical lobe. Residue letters are not legible.

Sheet 25 of 58

FIGURE 11B
Sequence alignment panel. Sequences: Ana, Cje, Tde, Nme, Sth, Smu, Sag, Spy. Annotations: alpha-helical lobe; deletions in the alpha-helical lobe of AnaCas9. Residue letters are not legible.

Sheet 26 of 58

FIGURE 11C
Sequence alignment panel. Sequences: Ana, Cje, Tde, Nme, Sth, Smu, Sag, Spy. Annotations: alpha-helical lobe; deletions in the alpha-helical lobe of AnaCas9; PAM binding loop in SpyCas9. Residue letters are not legible.

Sheet 27 of 58

FIGURE 11D
Sequence alignment panel. Sequences: Ana, Cje, Tde, Nme, Sth, Smu, Sag, Spy. Annotation: alpha-helical lobe. Residue letters are not legible.

Sheet 28 of 58

FIGURE 11E
Sequence alignment panel. Sequences: Ana, Cje, Tde, Nme, Sth, Smu, Sag, Spy. Annotations: RuvC-II, HNH domain. Residue letters are not legible.

Sheet 29 of 58

GrsCje LYSthLYSmu FYSag KP.Cje GURNme DIVTde ee. 0 0 . . + . AnaZVGA. # # , # ONPXIAOTTde

+ :

+

+ ALLEELEEEEEJ # * 5*WW4VYVEK : IULIJ

4 - FEI, ERKSATEXVSthVYGD,YPKYNSFR. .ENKATAXKGDYPHFHGHKSmu v.VLDKIDEIFVSXPERKKPCje.

ED.VPKYNSYKTRKSATEKILSag U

U NmeLHOK.KVRAENDRELALDRVVVACETVAMOQKITRFVRYKEMNAEDGKTIDKETGEV 2000 V .

AnaINLIGEKG.PETTVMVYRG87FRAARKAAGIDS i KM.NmeZHIPOPHEFFAQEVMIRVEGKPDGKPEFEEADTPEKLRTLLABKLSSRPEAVHEYVTPL7VSRAFNR .

QQOLBOOM . www.fileCR202 DYKNK.CjeAKDANNHLRAIDRVIIAVANNSIVKAFSDFXKEQESN3AEL+YAKKISEL698 + + +

. 922VBROIEKHVAPILDSPMNTK.YDENDKLIRIEVKVITLXSKLV*SDYRKDFQEYSpy * . REVO-111 101HEVC- RLSAna784REUFEMWRGHMLHLTELF.NERLARDKVYVENIRE RKORIDROKHAVDAEVVALXEMVAKTLARREST.RGERRIT@KEQTWxQYTO 974KVRBINNY.BAHDAYLNAVVGTALIKKYPKLZS2PVYGD«,YKVYDVRKMIAKSELIGKATKYSpy REVO . TG1029.SmuKKDEHISNIKKVLSYPOVIVXKVEBORTDKNGEIIFYSWIMMFFKKDDVW. FYSNIMKFFKTKVTLADGTYVYKDDIEVNNDTGEIV .WDKKKHFATVRKVLSYPONNIVKKTEI2 1035TGSag 000002200. KG.TdeWEKGKTIITVKDMLKRNTPIYIKOAACK1049DVKRNNITA-.NYYKVFDY Score . FYSNIMNFEKTEITLA.NGEIRKRPLIETNGETGEIV1037,WDKGRDFATVRKVLSMPOVNIVKKTEVIGSpy

. FYSNIMKIFKKSISLA.DGRVIERPLIEVNEETGE9VWNKESDLATVRRVLSYPOVNVVXKVEBONHSth1037 668WDTRYVNREICQFVADRMRLTG*.KGKKRVIASNGOIINLLRGFW 922VETROTYXHVARILDXXENTE.-,TDENNKKIROVKIVYLXSNLVENIRKËT 927VETRITKHVARILDEKFNNE.LDSKGRRIRKVKIVOLXSNLVINFRKEF ?MQQQPQAQQQ?0QQQQQQQQC02.221 960VETRQATKVAAKVLEKMFP.*ETKIVY9XABTVSMERNKF929VET2QITKHVARLLDEKFNNK.KDENNREAVRIVKIITLX3TLVSOFRKDF

QULLUQUQ-222•.••••••••••••2002 i

.

#

i 676ESVAWMAXELHARIANAY. 638NDRYIARLVLKYIKDYLDFLPLSDDENTKLNDTOKGSKVHVEAKSOML7SALRHTW Qoll200,00 KCRBINDFHDHDNYIMIVVGNVYNYKFTHNEWN1001 - ????????? RIKEFBPFGFROK753.••••• V . . KVRBINDFORHDRYINAVVASALIXKY 20 FIGURE11F XVREINDYRAHDAYLNAVIGWALLQVYP974 N * VV 981On On 979KIREVNNYIBAHDAYLNAVVAKAILYKYPOLOr 170 U . S . Patent May 8, 2018 Sheet 30 of 58 US 9 , 963 ,689 B2

Sag . *Sag = +

=

= R

= .WKKG, #LPKG.Smu I . E =

L

= .

. +

= + KGLFNANLS.*KPKPSth +

+

.

6 +

I #

# +

- ??????? # .

- +

# + •!- -

- # # + - # # + wwwwwww - # , WWWWWWWWWWWWW 839SGOGHMETV.KSAKRLDEGYSVLRVPISTOLKLKDLBKMVNREREPKLYBALKARLEAHKDDNme $ - . +

- + . LLGITIMERSRFEKN.PŠAFLESKCYLNIRADKLIILPKYSLFELBGRRRLLASAGELSad164

. PAMbindingloop2inSeyCasg. . ???.

: *+-EPPYKYDKAGNRTQQVKAVRV.EQVQKTGVWVRRINGI,Nme - + AIRVHGHEIKSSDYIQVPSKRKKTDSDRDETPPGAIAVRGGFV.RCIGPSIHHANIYRVEOKKPAna .BYGGKEGVLKALRHGWIRXVNOKICie .*+11711-21111:1T%1711415241ETLE41E NSDKLIPRKTKDIYLDPKXYGEDSPIVAYSVLVVADIKXGKAOKLKVTESag . . . UN 1163LLCITIKENSBYEEN,PIDFLEAXGYKEVKKDLIIKLE.KYSLFELENORKRMLASAGELS : LVGVTIMEKMTPERD1145.PVAFLERKGYRNVQEENIIKLPKYSLFKLE,GRKRLLASAREL.Smu - +

• 1

• • + +

I *

H + *NSDKLIPRKTKKFYWDTKKYGGIDSPIVAYSILVIADIEKGESKKLKEVKASmu Excta-hairpindomalt(insertioninAnaCas9) + + = i -NSNENL.VGAKEYLDPKKYGGYAGISNSF&VLVKGTIEKGAKKKITXVLBSth . . = + * 942ADNAEMVRVDVE.*XGDKYYLVPIVOVAKGILPDRAVVQGKDEENme

= + -. 940VYAMLAVFHDL.DLESAVIPPOSISURCA2PKLRKAITTGNATYLAna40RM = + +

= + + + FOGASILDZINYRXD.KLNFLLEKCYKDIELILELEKYSLPELSGORRMLASILSTNNKRGEISth1174 +200000.022000000 , YLVYDIOKOQDVIKSYLTDLLGKKEL.+*FKILVEKIKINSULKIXGEPCHITGKTNDSFLL,Tde1157 = . . : ! = . . . . . VKNGOMFOYDIFXH.*KKTNKYYAVRIYTMDFALKVLPNKAVVQGKDKK3GLIKCje = 822DGNAHTVNPSKLVSHRLGDGLTVOQ.IDRACTPALWCALTREKDFDEKNGLPAREDRAna # + , . . t + + i + 1000-20 i VVAKVEKGXSKXIKSVKESriEDSPTVAYSVILNSDKLIARKKDWDPKKOG.* it - ,

+ + +

+ +

+ + + + # ELIEVEEKGNKIRSLEPIPLTdeHPLKKEGPFSNISKYGGYNKVSAAXY.LGQ . * ...... +

+

- + +

- + +

- + beta-hairpindomain(insertioninAnaCas9) + 1 * - 1 1 + + * : - . . N + 1 7843GÄLHBETF.RKEREFHQ - LLLLLLLL. + .GFSKEI,+ + + 1 *

1102.GPSKEI+ + * • 222

+ # + + #20000000000000000 + . : . FIGURE11G .ELFUOTI1097 .GLDRGKP1104 GFSKESI 1083 P. . s. 1104 899 801 1108 1125 1094 1113 1115 U . S . Patent May 8 , 2018 Sheet 31 of 58 US 9 , 963 ,689 B2

HSmu*H Sag .H Spy Smu PIAna IGTde 1 # the „NCjeGIQ 4 , . . . * # . # # diemense

#

* het

e

#

h Met # 200QQ.*2002• in

# +IENHKYVEN.•*HSth

wwwremont. *

.TNmeHIGKNGILEGI.*GVKEX. *

*

*.DEPXHLDYVDK * ???????????????????????????????,Sth.NHSODBLCSSEIGP

#

#

# * 200000 # # 1240KDEFKELLDVVSNF8KKYTLAEGNLEKIKELYANNG.*EDLKELASSFI # dit :1264VSYFDDILQLINDFSKRVILADANLEKINKLYDNKE,*NISVDBA.NNII : 987DWOLIDDSFNEKFSLH.PYDLVETKKARMEGYFASCHRGTGMINIRIHNme # .EGLE"*PSAVNETVELK 1263KHYLDEIIEQISEF8KRVILADANLDKVLSAYNYHRD.*KPZREDAENII CID iné CID

# # 877DWILMDENYEFCFSLY.-•*KDSLILITKDMQEPELVYFNAFTSSWVSLI.*V9#KHCje # # *

GWVVVGDELEINVD2r.TKYAIGRTL2DFPNTTRWRICCYDINSKITLK # # * *EIEKKOEFVNO.1222itin*QKGNELALPTQFMKFLYTAS.-RYNESKGKPE.DNEOKOLFVEO HYEKLKGSPE1221QKGNSIALPSKYVNFLYLAS- . KNOKILFENA.NEKEVMAXIETLSha..

#

# YKKRPNSATTDTLVKGKEKFKSLIIRNOVTde1277EKEFYDL.LOKKNLEIYDMULTKHKDTI

112 imereka. HAK1

wel :

.RPAVQECCSNNEVLYFKKIIRTSEIRSOREKIGKTISPYEOLSFRSYEKENLWK Helen 200QQQQQQ *

* *HKGNQIFLSKFVKLLYHAK.1237 * QQQQQQQQQQQQQQQQQQQQQQ

*

*

*

* *

* 1041VLAA. 1036DL. 928DNKF. FIGURE11H 1000 * V NW 991 1215NNNNN 1203 U . S . Patent May 8, 2018 Sheet 32 of 58 US 9 , 963 ,689 B2

NmeCje + +

+ w.Ana + + +

+ +

+

+

+

+ +

+ +

M

+

+ +

+

+

RYSSRSNLPTSWTIE.+, +

+ +

+ +

+ .:+ GSERKGLPELIGROSAADFEFLGVKIPRYRDYTOSSLLKDATLIHOSVTGLYETRIDLAKLGESth *NLLT4141GAPATFKFPDKNIDRKRYTSTELLNATLIHOSITGLYETRIDLNKZGGSmu CTD

CRIK+ +

+ KLFSATRNVSDIssIDNCILIYOSITGIFEKRIDLLKVILEIL.1335MZOGSKYSGVAKIONTde +

+

+

+

+

+ A .NLPPPIOLGAPAAPKFPDKIVDRKRYTSTXBVLNSTLINOSITGLYETRIDLGKLGESad .HUTTLINLGAPAAFKYFDTTIDRKEYTSTXEVLDATLIHOSITGLYETRIDLSOLGGS

-4GGGGGGGGGGG1066AINVLT.-KVHPTYVRRDALGHP .

i +

i +

+ + -.el 1055A.+LSFQEYQIDELGKEIRP 4 + + + . + 960L.-KVFEKYIV + . i AnaNme •Cje Tde DSagD Spy FIGURE111 + 1388GSth1345DSmu 1325 1288 1313 1311 1370 1368 atent May 8 , 2018 Sheet 33 of 58 US 9 , 963 ,689 B2

FIGURE 12. DNA cleavage activity assay gel. Lane labels: Cas9 construct (WT and the PWN475-477→AAA and DWD1125-1127→AAA mutants) and time (min), 0 to 30.

FIGURE 13A. Domain organization of A. naeslundii Cas9 (AnaCas9). Labeled regions: RuvC, Arg, alpha-helical lobe, HNH, RuvC-III, β-hairpin, Topo, CTD; the alpha-helical lobe and the nuclease lobe are indicated.

FIGURE 13B. Overall structure of AnaCas9. Labels: alpha-helical lobe (252-468), RuvC-III, Arg-rich region, poorly ordered region.

FIGURE 13C. Superposition of AnaCas9 with SpyCas9. Labels: Zn; insertion in SpyCas9 CTD (1228Spy-1332Spy); insertions in SpyCas9 (residues 205Spy-307Spy and 315Spy-403Spy); 502Spy-713Spy (structurally similar to 252Ana-468Ana); deletion in SpyCas9 (β-hairpin in AnaCas9).

FIGURE 14. Superposition of AnaCas9 and SpyCas9. Panel labels: overall superposition (rmsd 9.1 Å over 654 Cα); HNH-RuvC-Topo (rmsd 3.31 Å over 175 Cα).

FIGURE 14C. Alpha-helical lobe, 252Ana-468Ana vs 502Spy-713Spy, shown as cartoon (rmsd 3.63 Å over 149 Cα).

FIGURE 15. B-factor putty plot of AnaCas9. Labels: HNH, RuvC domain, alpha-helical lobe (residues 252Ana-468Ana), β-hairpin domain, Arg-rich and disordered subdomain in alpha-helical lobe.

FIGURE 16. Active site of the AnaCas9 HNH domain. Residue labels include H582 and D581.

FIGURE 17. Zinc-binding site in the HNH domain of AnaCas9. Residue labels include C566.

FIGURE 18. SpyCas9 and AnaCas9. Residue labels: H840, N863.

FIGURE 19. Surface representations of SpyCas9 and AnaCas9. Labels: Arg-rich, Topo, alpha-helical lobe, CTD, β-hairpin.

FIGURE 20. Surface comparison of AnaCas9 and SpyCas9. Color keys: conserved to variable; positive to negative.

FIGURE 21. Negative-stain EM reconstructions. Dimension label: 140 Å.

FIGURE 22. Cas9-apo, Cas9:RNA:DNA and Cas9:RNA. Dimension labels: 140 Å and 120 Å; views related by 180° and 90° rotations.

FIGURE 23A. Column labels: Class Avg, Reproj.

FIGURE 23B. Column labels: Class Avg, Reproj.

FIGURE 24.

FIGURE 25.

FIGURE 26. Proteolysis gel. Lane labels: apo-Cas9, Cas9:RNA, Cas9:RNA:DNA; Time (min): -, 5, 10, 20 for each sample; size marker: 170 kDa.

FIGURE 27. Row labels: SA-crRNA, Cas9:RNA; SA-tracrRNA, Cas9:RNA; SA-non-PAM end, Cas9:RNA:DNA; SA-both ends, Cas9:RNA:DNA. Column labels include Reproj., 2D Diff Map, 3D Diff Map.

FIGURE 28. Labels: PAM proximal, PAM distal, crRNA, tracrRNA, RuvC, Lobe.

FIGURE 29A. Exonuclease III footprinting gel. Labels: Non-target strand, Target strand; lane labels include DNA, Cas9 and FokI; band labels T42, G39. Key: (1) = targeting crRNA; (2) = non-targeting crRNA.

FIGURE 29B. Nuclease P1 footprinting gel. Labels: Non-target strand, Target strand; lane labels include DNA and Cas9; band labels C32, T22. Key: (1) = targeting crRNA; (2) = non-targeting crRNA.

FIGURE 30. Schematic of the footprinting results. Labels: Non-target strand, Target strand, crRNA. Non-target strand: 5'-GAGTGGAAGGATGCCAGTGATAAGTGGAATGCCATGTGGGCTGTCAAAATTGAGC-3'. The crRNA begins 5'-GUGAUAAGUGGAAUGCCAUG.

FIGURE 31. Table 2: Data collection, refinement and model statistics for SpyCas9.

FIGURE 32. Table 4: Data collection, refinement and model statistics for AnaCas9. Datasets: native, SeMet and Mn soak, collected at ALS beamlines 8.3.1 and 8.2.2; space group P1 21 1 for all three datasets; data extend to 3.19, 2.2 and 2.80 Å resolution.

CAS9 CRYSTALS AND METHODS OF USE THEREOF

CROSS-REFERENCE

This application claims the benefit of U.S. Provisional Patent Application No. 61/922,556, filed Dec. 31, 2013, which application is incorporated herein by reference in its entirety.

INCORPORATION BY REFERENCE OF SEQUENCE LISTING PROVIDED AS A TEXT FILE

A Sequence Listing is provided herewith as a text file, “BERK-231WO SEQLIST_ST25.txt” created on Dec. 23, 2014 and having a size of 188 KB. The contents of the text file are incorporated by reference herein in their entirety.

TABLES PROVIDED IN ELECTRONIC FORM

This application includes Table 1. Table 1 is a text file named “BERK-231PRV Table 1-Atomic coordinates SpyCas9_apo” created on Dec. 30, 2013. The size of the “BERK-231PRV Table 1-Atomic coordinates SpyCas9_apo” text file is 2,465 KB. The information contained in Table 1 is hereby incorporated by reference in this application.

INTRODUCTION

Bacteria and archaea use RNA-guided adaptive immune systems encoded by CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-Cas (CRISPR-associated) genomic loci to recognize and destroy invasive DNA. Upon viral infection or plasmid transformation, short fragments of foreign DNA are integrated into the CRISPR array within the host chromosome. Enzymatic processing of CRISPR transcripts produces mature CRISPR RNAs (crRNAs) that direct Cas protein-mediated targeting of DNA bearing complementary sequences (protospacers). While Type I and III CRISPR-Cas systems rely on large, multi-protein complexes for crRNA-guided DNA targeting, Type II systems employ a single enzyme, Cas9. Cas9 is a dual RNA-guided endonuclease that requires both a mature crRNA and a trans-activating crRNA (tracrRNA) for target DNA recognition and cleavage. The two nuclease active sites in Cas9 act together to generate blunt double-stranded breaks (DSBs). Both a seed sequence and a conserved protospacer adjacent motif (PAM) sequence are crucial for efficient target binding and cleavage by Cas9.

Cas9 proteins are abundant across the bacterial kingdom, but they vary widely in both sequence and size. All known Cas9 enzymes contain an HNH domain that cleaves the DNA strand complementary to the guide RNA sequence (target strand), and RuvC nuclease motifs required for cleaving the non-complementary strand (non-target strand). In addition, Cas9 enzymes contain a highly conserved arginine-rich (Arg-rich) region that has been suggested to mediate nucleic acid binding. Based on CRISPR-Cas locus architecture and protein sequence phylogeny, Cas9 genes have been classified into three subfamilies: Type II-A, II-B, and II-C. The Type II-A and -C subfamilies represent most known Cas9 genes, encoding proteins of ~1400 or ~1100 amino acids in length, respectively.

The ability to program Cas9 for DNA cleavage at sites defined by guide RNAs has led to its adoption as a robust and versatile platform for genome engineering. When directed to target loci in eukaryotes by either a natural dual crRNA:tracrRNA guide or a chimeric single-guide RNA, Cas9 generates site-specific DSBs that are repaired either by non-homologous end joining (NHEJ) or homologous recombination (HR).

Despite these ongoing successes, there is a need in the art for understanding the structural basis for guide RNA recognition and DNA targeting by Cas9.

SUMMARY

The present disclosure provides atomic structures of Cas9 with and without polynucleotides bound thereto. Also provided is a computer-readable medium comprising atomic coordinates for Cas9 polypeptides in both an unbound configuration and a configuration wherein the Cas9 polypeptide is bound to one or more polynucleotides. The computer readable medium may further contain programming for displaying a [...] of a Cas9 polypeptide and for identifying an amino acid residue that binds to a polynucleotide or an amino acid residue that, when substituted with a different amino acid, alters Cas9 function, e.g., polynucleotide binding. The present disclosure also provides crystals comprising Cas9 polypeptides and compositions comprising the crystals.

The present disclosure also provides methods for the engineering of Cas9 polypeptides wherein Cas9 activity has been altered, ablated, or preserved and amended with additional activities. In general terms, methods comprise using the atomic coordinates to computationally identify a site for amino acid residue substitution, insertion, or deletion to alter a function or chemical property of a Cas9 polypeptide. A method is also provided that comprises computationally identifying candidate sites within the Cas9 polypeptide for the insertion of heterologous sequences using the Cas9 atomic structures provided herein. Also provided are methods of engineering chimeric Cas9 polypeptides through the replacement of Cas9 domains with orthologous Cas9 domains or the insertion of heterologous protein domains. Such altered or chimeric Cas9 polypeptides have utility for controlling site-specific gene regulation as well as engineering or editing of prokaryotic and eukaryotic genomes and epigenomes.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-B provide a general schematic drawing of a generalized site-directed modifying polypeptide associated with two exemplary subject DNA-targeting RNAs and target DNA.

FIGS. 2A-C provide a structure of SpyCas9 and structural superposition of SpyCas9 with Thermus thermophilus RuvC resolvase bound to a Holliday junction substrate (PDB entry 4LD0) including close-up views of the RuvC active site (A) without and (B) with six nucleotides of DNA modeled in the active site and (C) close-up views of the catalytic sites.

FIG. 3 provides a cartoon schematic of the polypeptide sequence and domain organization of the Type II-A Cas9 protein from S. pyogenes (SpyCas9).

FIGS. 4A-B provide orthogonal views of the overall structure of SpyCas9 shown in (A) ribbon and (B) surface representations.

FIGS. 5A-K provide multiple sequence alignments of Cas9 proteins associated with Type II-A CRISPR loci. Primary sequences of Cas9 proteins from Streptococcus pyogenes (GI 15675041; SEQ ID NO:1), Streptococcus thermophilus LMD-9 (GI 116628213; SEQ ID NO:2), Listeria innocua Clip 11262 (GI 16801805; SEQ ID NO:3), Streptococcus agalactiae A909 (GI 76788458; SEQ ID NO:4), Streptococcus mutans UA159 (GI 24379809; SEQ ID NO:5), and Enterococcus faecium 1,231,408 (GI 257893735; SEQ ID NO:6) were aligned using MAFFT.

FIG. 6 provides surface representations of the SpyCas9 architecture, electrostatic surface potential, and evolutionary amino acid conservation.

FIG. 7 provides a close-up view of the helical lobe of SpyCas9 including the Arg-rich region and identifying conserved amino acid residues in ball-and-stick format.

FIG. 8 provides a model of a non-complementary DNA strand bound to the RuvC domain based on a superposition with the DNA-bound complex of Thermus thermophilus RuvC Holliday junction resolvase (PDB entry 4LD0) and a zoomed-in view of the modeled DNA binding site showing the modeled non-target DNA strand (stick format) and the predicted path of the downstream (3') sequence containing the PAM.

FIG. 9 provides a cartoon showing the design and workflow of crosslinking experiments with DNA substrates containing 5-bromodeoxyuridine (Br-dU) nucleotides for LC-MS/MS analysis. The denaturing polyacrylamide gel demonstrates the generation of covalent peptide-DNA adducts with Br-dU and catalytically inactive SpyCas9 following UV irradiation and trypsin digestion.

FIG. 10 provides the results of a DNA cleavage assay performed with wild type (WT) and catalytically inactive (d) Cas9 and analyzed by denaturing PAGE.

FIGS. 11A-I provide a multiple sequence alignment of Cas9 orthologs associated with Type II-A CRISPR loci. Primary sequences of Cas9 proteins from Ana (Actinomyces naeslundii str. Howell 279, EJN84392.1; SEQ ID NO:7), Nme (Neisseria meningitidis, WP_019742773.1; SEQ ID NO:8), Cje (Campylobacter jejuni, WP_002876341.1; SEQ ID NO:9), Tde (Treponema denticola, WP_002676671.1; SEQ ID NO:10), Sth (Streptococcus thermophilus LMD-9, YP_820832.1; SEQ ID NO:11), Smu (Streptococcus mutans, WP_019803776.1; SEQ ID NO:12), Sag (Streptococcus agalactiae, WP_001040088.1; SEQ ID NO:13), and Spy (Streptococcus pyogenes, YP_282132.1; SEQ ID NO:14) were aligned using CLUSTALW and generated in ESPript.

FIG. 12 provides DNA cleavage activity assays with SpyCas9 constructs containing mutations in residues identified by crosslinking and LC-MS/MS experiments.

FIGS. 13A-C provide (A) a cartoon schematic of the polypeptide sequence and domain organization of the Cas9 protein from A. naeslundii (AnaCas9), (B) orthogonal views of the overall structure of AnaCas9 shown in ribbon representation, and (C) a superposition of AnaCas9 with SpyCas9.

FIGS. 14A-C provide superpositions of AnaCas9 and SpyCas9 showing the structural alignment of (A) the overall proteins, (B) the catalytic core, and (C) the alpha-helical lobe.

FIG. 15 provides a B-factor putty plot of AnaCas9 wherein thin loops represent low B-values, while broad tubes represent high B-values.

FIG. 16 provides a close-up view of the active site of the AnaCas9 HNH domain superimposed with the structure of the I-HmuI-DNA complex (PDB entry 1U3E).

FIG. 17 provides a close-up view of the zinc-binding site in the HNH domain of AnaCas9 and the coordinating residues.

FIG. 18 provides models showing (A) the overall auto-inhibited conformations of SpyCas9 and AnaCas9 in the unbound state and (B) a zoomed-in view of the HNH domain (yellow) active site in SpyCas9 occluded by the 1049Spy-1059Spy beta-hairpin (black).

FIG. 19 provides a common Cas9 functional core through structural comparison of surface representations of SpyCas9 and AnaCas9 with conserved RuvC, HNH, Arg-rich, Topo-homology and CTD domains.

FIG. 20 provides a surface feature comparison of SpyCas9 and AnaCas9 evolutionary amino acid residue conservation and electrostatic potential.

FIGS. 21A-B provide reconstructions of (A) apo-Cas9 and (B) Cas9:RNA:DNA produced from negative-stain electron microscopy.

FIG. 22 provides cartoon representations and single particle EM reconstructions of negatively stained apo-Cas9, Cas9:RNA:DNA, and Cas9:RNA at 19-, 19-, and 21-Å resolution, respectively.

FIGS. 23A-B provide reference-free 2D class averages of (A) apo-Cas9 and (B) Cas9:RNA:DNA matched to projections of the final reconstructions.

FIG. 24 provides computational [...] of the X-ray crystal structure of S. pyogenes apo-Cas9 into the apo-Cas9 EM density model.

FIG. 25 provides 3D difference maps between the N-terminal MBP-labeled and unlabeled reconstructions of apo-Cas9 and Cas9:RNA:DNA mapped onto the corresponding unlabeled reconstructions.

FIG. 26 provides a proteolysis assay performed on apo-Cas9, Cas9:RNA, and Cas9:RNA:DNA resolved by SDS-PAGE.

FIG. 27 provides single particle EM analyses of streptavidin (SA) labelled nucleic acids bound to Cas9. For each combination of labelled nucleic acid and Cas9, included are schematics of structures and labels, five representative reference-free 2D class averages, the corresponding reference-free 2D class average of unlabeled Cas9 bound to nucleic acid, a 2D difference map between the unlabeled and labeled structures, the corresponding reprojection of the Cas9 bound to nucleic acid, and corresponding reconstructions.

FIG. 28 provides a reconstruction of the central channel of the Cas9:RNA:DNA (transparent surface) with ~25 bp of an A-form duplex. The positions of DNA and guide RNA termini based on our labelling experiments are marked.

FIGS. 29A-B provide footprinting assays resolved by polyacrylamide gel electrophoresis with target DNA bound by Cas9:RNA where a 55-bp DNA substrate was 5'-radiolabeled on either the target (SEQ ID NO:18) or non-target strand (SEQ ID NO:17) and incubated with catalytically inactive Cas9:RNA containing a complementary crRNA (targeting) (SEQ ID NO:16) or a mismatched control crRNA (non-targeting) (SEQ ID NO:24), before being subjected to (A) exonuclease III or (B) nuclease P1 treatment.

FIG. 30 provides a schematic representation of the results from the footprinting assays in FIGS. 29A-B showing the borders of the DNA target protected by Cas9:RNA (gray box) and the nucleotides susceptible to P1 digestion (hash tags) with respect to the crRNA (SEQ ID NO:16), Target strand (SEQ ID NO:25), and Non-target strand (SEQ ID NO:17).

FIG. 31 provides Table 2, Data collection, refinement and model statistics for SpyCas9.

FIG. 32 provides Table 4, Data collection, refinement and model statistics for AnaCas9.

DEFINITIONS

The terms “polynucleotide” and “nucleic acid,” used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. Thus, this term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases. “Oligonucleotide” generally refers to polynucleotides of between about 5 and about 100 nucleotides of single- or double-stranded DNA. However, for the purposes of this disclosure, there is no upper limit to the length of an oligonucleotide. Oligonucleotides are also known as “oligomers” or “oligos” and may be isolated from genes, or chemically synthesized by methods known in the art. The terms “polynucleotide” and “nucleic acid” should be understood to include, as applicable to the embodiments being described, single-stranded (such as sense or antisense) and double-stranded polynucleotides.

By “hybridizable” or “complementary” or “substantially complementary” it is meant that a nucleic acid (e.g. RNA) comprises a sequence of nucleotides that enables it to non-covalently bind, i.e. form Watson-Crick base pairs and/or G/U base pairs, “anneal”, or “hybridize,” to another nucleic acid in a sequence-specific, antiparallel manner (i.e., a nucleic acid specifically binds to a complementary nucleic acid) under the appropriate in vitro and/or in vivo conditions of temperature and solution ionic strength. As is known in the art, standard Watson-Crick base-pairing includes: adenine (A) pairing with thymidine (T), adenine (A) pairing with uracil (U), and guanine (G) pairing with cytosine (C) [DNA, RNA]. In addition, it is also known in the art that for hybridization between two RNA molecules (e.g., dsRNA), guanine (G) base pairs with uracil (U). For example, G/U base-pairing is partially responsible for the degeneracy (i.e., redundancy) of the genetic code in the context of tRNA anti-codon base-pairing with codons in mRNA. In the context of this disclosure, a guanine (G) of a protein-binding segment (dsRNA duplex) of a subject DNA-targeting RNA molecule is considered complementary to a uracil (U), and vice versa. As such, when a G/U base-pair can be made at a given nucleotide position of a protein-binding segment (dsRNA duplex) of a subject DNA-targeting RNA molecule, the position is not considered to be non-complementary, but is instead considered to be complementary.

Hybridization and washing conditions are well known and exemplified in Sambrook, J., Fritsch, E. F. and Maniatis, T. Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (1989), particularly Chapter 11 and Table 11.1 therein; and Sambrook, J. and Russell, W., Molecular Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (2001). The conditions of temperature and ionic strength determine the “stringency” of the hybridization.

Hybridization requires that the two nucleic acids contain complementary sequences, although mismatches between bases are possible. The conditions appropriate for hybridization between two nucleic acids depend on the length of the nucleic acids and the degree of complementation, variables well known in the art. The greater the degree of complementation between two nucleotide sequences, the greater the value of the melting temperature (Tm) for hybrids of nucleic acids having those sequences. For hybridizations between nucleic acids with short stretches of complementarity (e.g. complementarity over 35 or less, 30 or less, 25 or less, 22 or less, 20 or less, or 18 or less nucleotides) the position of mismatches becomes important (see Sambrook et al., supra, 11.7-11.8). Typically, the length for a hybridizable nucleic acid is at least about 10 nucleotides. Illustrative minimum lengths for a hybridizable nucleic acid are: at least about 15 nucleotides; at least about 20 nucleotides; at least about 22 nucleotides; at least about 25 nucleotides; and at least about 30 nucleotides. Furthermore, the skilled artisan will recognize that the temperature and wash solution salt concentration may be adjusted as necessary according to factors such as length of the region of complementation and the degree of complementation.

The terms “peptide,” “polypeptide,” and “protein” are used interchangeably herein, and refer to a polymeric form of amino acids of any length, which can include coded and non-coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones.

“Binding” as used herein (e.g. with reference to an RNA-binding domain of a polypeptide) refers to a non-covalent interaction between macromolecules (e.g., between a protein and a nucleic acid). While in a state of non-covalent interaction, the macromolecules are said to be “associated” or “interacting” or “binding” (e.g., when a molecule X is said to interact with a molecule Y, it is meant the molecule X binds to molecule Y in a non-covalent manner). Not all components of a binding interaction need be sequence-specific (e.g., contacts with phosphate residues in a DNA backbone), but some portions of a binding interaction may be sequence-specific. Binding interactions are generally characterized by a dissociation constant (Kd) of less than 10^-6 M, less than 10^-7 M, less than 10^-8 M, less than 10^-9 M, less than 10^-10 M, less than 10^-11 M, less than 10^-12 M, less than 10^-13 M, less than 10^-14 M, or less than 10^-15 M. “Affinity” refers to the strength of binding, increased binding affinity being correlated with a lower Kd.

By “binding domain” it is meant a protein domain that is able to bind non-covalently to another molecule. A binding domain can bind to, for example, a DNA molecule (a DNA-binding protein), an RNA molecule (an RNA-binding protein) and/or a protein molecule (a protein-binding protein). In the case of a protein domain-binding protein, it can bind to itself (to form homodimers, homotrimers, etc.) and/or it can bind to one or more molecules of a different protein or proteins.

The term “conservative amino acid substitution” refers to the interchangeability in proteins of amino acid residues having similar side chains. For example, a group of amino acids having aliphatic side chains consists of glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains consists of serine and threonine; a group of amino acids having amide-containing side chains consists of asparagine and glutamine; a group of amino acids having aromatic side chains consists of phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains consists of lysine, arginine, and histidine; a group of amino acids having acidic side chains consists of glutamate and aspartate; and a group of amino acids having sulfur-containing side chains consists of cysteine and methionine. Exemplary conservative amino acid substitution groups are valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine.

A polynucleotide or polypeptide has a certain percent “sequence identity” to another polynucleotide or polypeptide, meaning that, when aligned, that percentage of bases or amino acids are the same, and in the same relative position, when comparing the two sequences. Sequence identity can be determined in a number of different manners. To determine sequence identity, sequences can be aligned using various methods and computer programs (e.g., BLAST, T-COFFEE, MUSCLE, MAFFT, etc.), available over the world wide web at sites including ncbi.nlm.nih.gov/BLAST, ebi.ac.uk/Tools/msa/tcoffee/, ebi.ac.uk/Tools/msa/muscle/, mafft.cbrc.jp/alignment//. See, e.g., Altschul et al. (1990), J. Mol. Biol. 215:403-10.

The term “naturally-occurring” or “unmodified” or

fused to a heterologous polypeptide (i.e. a polypeptide other than Cas9), which exhibits an activity that will also be exhibited by the fusion variant Cas9 site-directed polypeptide. A heterologous nucleic acid sequence may be linked to a variant Cas9 site-directed polypeptide (e.g., by genetic engineering) to generate a nucleotide sequence encoding a fusion variant Cas9 site-directed polypeptide.
“ native " as used herein as applied to a nucleic acid , a “ Recombinant, " as used herein , means that a particular polypeptide , a cell , or an organism , refers to a nucleic acid , nucleic acid (DNA or RNA ) is the product of various polypeptide, cell , or organism that is found in nature . For 10 combinations of cloning, restriction , polymerase chain reac example , a polypeptide or polynucleotide sequence that is tion (PCR ) and /or ligation steps resulting in a construct present in an organism (including viruses ) that can be having a structural coding or non - coding sequence distin isolated from a source in nature and which has not been guishable from endogenous nucleic acids found in natural intentionally modified by a human in the laboratory is systems. DNA sequences encoding polypeptides can be naturally occurring . 15 assembled from cDNA fragments or from a series of syn The term “ chimeric” as used herein as applied to a nucleic thetic oligonucleotides , to provide a synthetic nucleic acid acid or polypeptide refers to two components that are which is capable of being expressed from a recombinant defined by structures derived from different sources. For transcriptional unit contained in a cell or in a cell - free example , where “ chimeric ” is used in the context of a transcription and translation system . Genomic DNA com chimeric polypeptide (e . g ., a chimeric Cas9 protein ), the 20 prising the relevant sequences can also be used in the chimeric polypeptide includes amino acid sequences that are formation of a recombinant gene or transcriptional unit. derived from different polypeptides . 
A chimeric polypeptide Sequences of non - translated DNA may be present 5 ' or 3 ' may comprise either modified or naturally occurring poly from the open reading frame, where such sequences do not peptide sequences ( e .g ., a first amino acid sequence from a interfere with manipulation or expression of the coding modified or unmodified Cas9 protein ; and a second amino 25 regions, and may indeed act to modulate production of a acid sequence other than the Cas9 protein ) . In some cases , desired product by various mechanisms ( see “ DNA regula the two different sources of a chimeric polypeptide may also tory sequences” , below ) . Alternatively , DNA sequences refer to two different Cas9 proteins ( e. g ., a first amino acid encoding RNA (e . g. , DNA -targeting RNA ) that is not trans sequence from a Streptococcus pyogenes Cas9 protein and a lated may also be considered recombinant. Thus , e . g . , the second amino acid sequence from an Actinomyces naeslun - 30 term “ recombinant" nucleic acid refers to one which is not dii Cas9 ) , for example . Similarly , " chimeric ” in the context naturally occurring , e. g . , is made by the artificial combina of a polynucleotide encoding a chimeric polypeptide tion of two otherwise separated segments of sequence includes nucleotide sequences derived from different coding through human intervention . This artificial combination is regions ( e . g ., a first nucleotide sequence encoding a modi - often accomplished by either chemical synthesis means, or fied or unmodified Cas9 protein ; and a second nucleotide 35 by the artificialmanipulation of isolated segments of nucleic sequence encoding a polypeptide other than a Cas9 protein ). acids, e .g ., by genetic engineering techniques. 
Such is usu The term “ chimeric polypeptide” or “ chimeric protein " ally done to replace a codon with a codon encoding the same refers to a polypeptide which is made by the combination amino acid , a conservative amino acid , or a non - conserva ( . e . , “ fusion ” ) of two otherwise separated segments of tive amino acid . Alternatively , it is performed to join amino sequence , usually through human intervention . A 40 together nucleic acid segments of desired functions to gen polypeptide that comprises a chimeric amino acid sequence erate a desired combination of functions. This artificial is a chimeric polypeptide. Chimeric polypeptides may be combination is often accomplished by either chemical syn derived in may ways; e . g . , through the fusion of two ormore thesis means, or by the artificial manipulation of isolated amino acid sequences end -to - end , or e . g ., through the inser segments of nucleic acids, e . g ., by genetic engineering tion of one ormore amino acid sequences into another amino 45 techniques . When a recombinant polynucleotide encodes a acid sequence , or e . g . , through the mutation or removal of polypeptide , the sequence of the encoded polypeptide can be individual amino acid residues in a polypeptide such that naturally occurring (“ wild type ” ) or can be a variant ( e. g ., a motifs or domains within the polypeptide more similarly mutant) of the naturally occurring sequence . Thus, the term resemble motifs or domains within a different polypeptide . “ recombinant” polypeptide does not necessarily refer to a “ Heterologous, " as used herein , means a nucleotide or 50 polypeptide whose sequence does not naturally occur . polypeptide sequence that is not found in the native nucleic Instead , a " recombinant" polypeptide is encoded by a acid or protein , respectively . 
For example , in a chimeric recombinant DNA sequence, but the sequence of the poly Cas9 protein , the RNA -binding domain of a naturally - peptide can be naturally occurring wild type ” ) or non occurring bacterial Cas9 polypeptide (or a variant thereof) naturally occurring (e . g. , a variant, a mutant , etc . ). Thus, a may be fused to a heterologous polypeptide sequence ( i . e . a 55 “ recombinant ” polypeptide is the result of human interven polypeptide sequence from a protein other than Cas9 or a tion , but may be a naturally occurring amino acid sequence . polypeptide sequence from another organism ) . The heter - A “ vector” or “ expression vector ” is a replicon , such as ologous polypeptide sequence may exhibit an activity ( e . g ., plasmid , phage, virus, or cosmid , to which another DNA enzymatic activity ) that will also be exhibited by the chi- segment, i. e . an “ insert” , may be attached so as to bring meric Cas9 protein ( e . g ., methyltransferase activity , acetyl- 60 about the replication of the attached segment in a cell . transferase activity , kinase activity , ubiquitinating activity , A “ target DNA " as used herein is a DNA polynucleotide etc . ). A heterologous nucleic acid sequence may be linked to that comprises a " target site ” or “ target sequence . ” The terms a naturally - occurring nucleic acid sequence (or a variant " target site” or “ target sequence ” or “ target protospacer thereof) ( e . g ., by genetic engineering ) to generate a chimeric DNA ” are used interchangeably herein to refer to a nucleic nucleotide sequence encoding a chimeric polypeptide . As 65 acid sequence present in a target DNA to which a DNA another example , in a fusion variant Cas9 site -directed targeting segment of a subject DNA - targeting RNA will bind polypeptide, a variant Cas9 site -directed polypeptide may be (see FIG . 1) , provided sufficient conditions for binding exist . 
For example, the target site (or target sequence) 5'-GAGCATATC-3' within a target DNA is targeted by (or is bound by, or hybridizes with, or is complementary to) the RNA sequence 5'-GAUAUGCUC-3'. Suitable DNA/RNA binding conditions include physiological conditions normally present in a cell. Other suitable DNA/RNA binding conditions (e.g., conditions in a cell-free system) are known in the art; see, e.g., Sambrook, supra. The strand of the target DNA that is complementary to and hybridizes with the DNA-targeting RNA is referred to as the "complementary strand" and the strand of the target DNA that is complementary to the "complementary strand" (and is therefore not complementary to the DNA-targeting RNA) is referred to as the "noncomplementary strand" or "non-complementary strand."
By "cleavage" it is meant the breakage of the covalent backbone of a DNA molecule. Cleavage can be initiated by a variety of methods including, but not limited to, enzymatic or chemical hydrolysis of a phosphodiester bond. Both single-stranded cleavage and double-stranded cleavage are possible, and double-stranded cleavage can occur as a result of two distinct single-stranded cleavage events. DNA cleavage can result in the production of either blunt ends or staggered ends. In certain embodiments, a complex comprising a DNA-targeting RNA and a site-directed modifying polypeptide is used for targeted double-stranded DNA cleavage.
By "cleavage domain" or "nuclease domain" of a nuclease it is meant the polypeptide sequence or domain within the nuclease which possesses the catalytic activity for nucleic acid cleavage. A cleavage domain can be contained in a single polypeptide chain or cleavage activity can result from the association of two (or more) polypeptides. A single nuclease domain may consist of more than one isolated stretch of amino acids within a given polypeptide.
The RNA molecule that binds to the site-directed modifying polypeptide and targets the polypeptide to a specific location within the target DNA is referred to herein as the "DNA-targeting RNA" or "DNA-targeting RNA polynucleotide" (also referred to herein as a "guide RNA" or "gRNA"). A subject DNA-targeting RNA comprises two segments, a "DNA-targeting segment" and a "protein-binding segment." By "segment" it is meant a segment/section/region of a molecule, e.g., a contiguous stretch of nucleotides in an RNA. A segment can also mean a region/section of a complex such that a segment may comprise regions of more than one molecule. For example, in some cases the protein-binding segment (described below) of a DNA-targeting RNA is one RNA molecule and the protein-binding segment therefore comprises a region of that RNA molecule. In other cases, the protein-binding segment (described below) of a DNA-targeting RNA comprises two separate molecules that are hybridized along a region of complementarity. As an illustrative, non-limiting example, a protein-binding segment of a DNA-targeting RNA that comprises two separate molecules can comprise (i) base pairs 40-75 of a first RNA molecule that is 100 base pairs in length; and (ii) base pairs 10-25 of a second RNA molecule that is 50 base pairs in length. The definition of "segment," unless otherwise specifically defined in a particular context, is not limited to a specific number of total base pairs, is not limited to any particular number of base pairs from a given RNA molecule, is not limited to a particular number of separate molecules within a complex, and may include regions of RNA molecules that are of any total length and may or may not include regions with complementarity to other molecules.
The DNA-targeting segment (or "DNA-targeting sequence") comprises a nucleotide sequence that is complementary to a specific sequence within a target DNA (the complementary strand of the target DNA). The protein-binding segment (or "protein-binding sequence") interacts with a site-directed modifying polypeptide. When the site-directed modifying polypeptide is a Cas9 or Cas9 related polypeptide (described in more detail below), site-specific cleavage of the target DNA occurs at locations determined by both (i) base-pairing complementarity between the DNA-targeting RNA and the target DNA; and (ii) a short motif (referred to as the protospacer adjacent motif (PAM)) in the target DNA.
General methods in molecular and cellular biochemistry can be found in such standard textbooks as Molecular Cloning: A Laboratory Manual, 3rd Ed. (Sambrook et al., Harbor Laboratory Press 2001); Short Protocols in Molecular Biology, 4th Ed. (Ausubel et al. eds., John Wiley & Sons 1999); Protein Methods (Bollag et al., John Wiley & Sons 1996); Nonviral Vectors for Gene Therapy (Wagner et al. eds., Academic Press 1999); Viral Vectors (Kaplift & Loewy eds., Academic Press 1995); Immunology Methods Manual (I. Lefkovits ed., Academic Press 1997); and Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle & Griffiths, John Wiley & Sons 1998), the disclosures of which are incorporated herein by reference.
Structural similarity may be inferred from, e.g., sequence similarity, which can be determined by one of ordinary skill through visual inspection and comparison of the sequences, or through the use of well-known alignment software programs such as (Wilbur, W. J. and Lipman, D. J. Proc. Natl. Acad. Sci. USA, 80, 726-730 (1983)) or CLUSTALW (Thompson, J. D., Higgins, D. G. and Gibson, T. J., CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice, Nucleic Acids Research, 22: 4673-4680 (1994)) or BLAST (Altschul S F, Gish W, et al., J. Mol. Biol., October 5; 215(3):403-10 (1990)), a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA. CLUSTAL W is available on the internet at ebi.ac.uk/clustalw/; BLAST is available on the internet at ncbi.nlm.nih.gov/BLAST/. A residue within a first protein or nucleic acid sequence corresponds to a residue within a second protein or nucleic acid sequence if the two residues occupy the same position when the first and second sequences are aligned.
The term "atomic coordinates" refers to the Cartesian coordinates corresponding to an atom's spatial relationship to other atoms in a molecule or molecular complex. Atomic coordinates may be obtained using X-ray crystallography techniques or nuclear magnetic resonance techniques, or may be derived using molecular replacement analysis or homology modeling. Reconstructions of atomic coordinates may be informed from practical data, for example from electron microscopy. Various software programs allow for the graphical representation of a set of structural coordinates to obtain a three dimensional representation of a molecule or molecular complex. The atomic coordinates of the present disclosure may be modified from the original set by mathematical manipulation, such as by inversion or integer additions or subtractions. As such, it is recognized that the structural coordinates of the present disclosure are relative, and are in no way specifically limited by the actual x, y, z coordinates.
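The point that structural coordinates are relative can be illustrated with a minimal sketch. It assumes three hypothetical atom positions (not taken from Table 1 or any figure of this disclosure) and uses illustrative helper names (`pairwise_distances`, `translate`, `invert`, `same`) that are not part of any crystallography package. The sketch checks that interatomic distances are unchanged when integer offsets are added to every coordinate or when the coordinate set is inverted through the origin.

```python
import math
from itertools import combinations

def pairwise_distances(coords):
    """Distances between every pair of atoms, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]

def translate(coords, shift):
    """Add the same (x, y, z) offset to every atom."""
    return [tuple(c + s for c, s in zip(atom, shift)) for atom in coords]

def invert(coords):
    """Invert every atom through the origin: (x, y, z) -> (-x, -y, -z)."""
    return [tuple(-c for c in atom) for atom in coords]

def same(a, b):
    """Compare two distance lists with a small floating-point tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a, b)
    )

# Hypothetical coordinates in angstroms, chosen only for illustration.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 2.0, 0.0)]

original = pairwise_distances(atoms)
shifted = pairwise_distances(translate(atoms, (10, -3, 7)))  # integer additions/subtractions
inverted = pairwise_distances(invert(atoms))                 # inversion

print(same(original, shifted))   # True
print(same(original, inverted))  # True
```

Because only the relationships between atoms are compared, two coordinate sets that differ by such a manipulation describe the same set of interatomic distances even though their literal x, y, z values differ.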
The term "atomic structure" refers to a three dimensional representation of the atoms in a molecule or molecular complex. An atomic structure may be derived from atomic coordinates as described above. An atomic structure may also be derived from computational manipulation of a received or previously obtained set of atomic coordinates. Such computational manipulation may be performed to produce an alternative or new atomic structure of a previously derived atomic structure based on new information. An alternative or new atomic structure of an initially modeled molecule may represent an alternate conformation of the modeled molecule or the conformation of a second molecule that is closely related to the initially modeled molecule. The new information used to inform manipulation may be obtained from practical data, for example electron density maps derived from electron microscopy. New information that may inform computational manipulation may also be obtained from computational data, for example the results of computationally docking two atomic structures or molecular models. A computationally manipulated atomic structure may be utilized to produce a new set of atomic coordinates representing the newly derived three dimensional molecule or molecular complex.
"Root mean square deviation" is the square root of the arithmetic mean of the squares of the deviations from the mean, and is a way of expressing deviation or variation from the structural coordinates described herein. The present disclosure includes all embodiments comprising conservative substitutions of the noted amino acid residues resulting in same structural coordinates within the stated root mean square deviation. It will be apparent to the skilled practitioner that the numbering of the amino acid residues of Streptococcus pyogenes Cas9 endonuclease (SpyCas9) or Actinomyces naeslundii Cas9 endonuclease (AnaCas9) may be different than that set forth herein, and may contain certain conservative amino acid substitutions that yield the same three dimensional structures as those defined by Table 1. Corresponding amino acids and conservative substitutions in other isoforms or analogues are easily identified by visual inspection of the relevant amino acid sequences or by using commercially available homology software programs (e.g., MODELLER, Accelrys, San Diego, Calif.; Sali and Blundell (1993) J Mol Biol 234: 779-815; Sanchez and Sali (1997) Curr Opin Struct Biol 7: 206-214; and Sanchez and Sali (1998) Proc Natl Acad Sci USA 95: 13597-13602).
The terms "system" and "computer-based system" refer to the hardware means, software means, and data storage means used to analyze the information of the present disclosure. The minimum hardware of the computer-based systems of the present disclosure comprises a central processing unit (CPU), input means, output means, and data storage means. As such, any convenient computer-based system may be employed in the present disclosure. The data storage means may comprise any manufacture comprising a recording of the present information as described above, or a memory access means that can access such a manufacture.
A "processor" references any hardware and/or software combination which will perform the functions required of it. For example, any processor herein may be a programmable digital microprocessor such as available in the form of an electronic controller, mainframe, server or personal computer (desktop or portable). Where the processor is programmable, suitable programming can be communicated from a remote location to the processor, or previously saved in a computer program product (such as a portable or fixed computer readable storage medium, whether magnetic, optical or solid state device based). For example, a magnetic medium or optical disk may carry the programming, and can be read by a suitable reader communicating with each processor at its corresponding station.
"Computer readable medium" as used herein refers to any storage or transmission medium that participates in providing instructions and/or data to a computer for execution and/or processing. Examples of storage media include floppy disks, magnetic tape, USB, CD-ROM, a hard disk drive, a ROM or integrated circuit, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external to the computer. A file containing information may be "stored" on computer readable medium, where "storing" means recording information such that it is accessible and retrievable at a later date by a computer. A file may be stored in permanent memory.
With respect to computer readable media, "permanent memory" refers to memory that is permanently stored on a data storage medium. Permanent memory is not erased by termination of the electrical supply to a computer or processor. Computer hard-drive ROM (i.e. ROM not used as virtual memory), CD-ROM, floppy disk and DVD are all examples of permanent memory. Random Access Memory (RAM) is an example of non-permanent memory. A file in permanent memory may be editable and re-writable.
To "record" data, programming or other information on a computer readable medium refers to a process for storing information, using any convenient method. Any convenient data storage structure may be chosen, based on the means used to access the stored information. A variety of data processor programs and formats can be used for storage, e.g., word processing text file, database format, etc.
A "memory" or "memory unit" refers to any device which can store information for subsequent retrieval by a processor, and may include magnetic or optical devices (such as a hard disk, floppy disk, CD, or DVD), or solid state memory devices (such as volatile or non-volatile RAM). A memory or memory unit may have more than one physical memory device of the same or different types (for example, a memory may have multiple memory devices such as multiple hard drives or multiple solid state memory devices or some combination of hard drives and solid state memory devices).
A system can include hardware components which take the form of one or more platforms, e.g., in the form of servers, such that any functional elements of the system, i.e., those elements of the system that carry out specific tasks (such as managing input and output of information, processing information, etc.) of the system may be carried out by the execution of software applications on and across the one or more computer platforms represented of the system. The one or more platforms present in the subject systems may be any convenient type of computer platform, e.g., such as a server, main-frame computer, a work station, etc. Where more than one platform is present, the platforms may be connected via any convenient type of connection, e.g., cabling or other communication system including wireless systems, either networked or otherwise. Where more than one platform is present, the platforms may be co-located or they may be physically separated. Various operating systems may be employed on any of the computer platforms, where representative operating systems include Windows, MacOS, Sun Solaris, OS/400, Compaq Tru64, SGI IRIX, Siemens Reliant Unix, and others. The functional elements of system may also be implemented in accordance with a variety of software facilitators, platforms, or other convenient method.
Items of data are "linked" to one another in a memory when the same data input (for example, filename or directory name or search term) retrieves the linked items (in a same file or not) or an input of one or more of the linked items retrieves one or more of the others.
Subject computer readable media may be at a "remote location", where "remote location," means a location other than the location at which the x-ray crystallographic or other analysis is carried out. For example, a remote location could be another location (e.g., office, lab, etc.) in the same city, another location in a different city, another location in a different state, another location in a different country, etc. As such, when one item is indicated as being "remote" from another, what is meant is that the two items may be in the same room but separated, or at least in different rooms or different buildings, and may be at least one mile, ten miles, or at least one hundred miles apart.
"Communicating" information references transmitting the data representing that information as, e.g., electrical or optical signals over a suitable communication channel (e.g., a private or public network). "Forwarding" an item refers to any means of getting that item from one location to the next, whether by physically transporting that item or otherwise (where that is possible) and includes, at least in the case of data, physically transporting a medium carrying the data or communicating the data. Examples of communicating media include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including email transmissions and information recorded on websites and the like.
Before the present invention is further described, it is to be understood that this invention is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
Certain ranges are presented herein with numerical values being preceded by the term "about." The term "about" is used herein to provide literal support for the exact number that it precedes, as well as a number that is near to or approximately the number that the term precedes. In determining whether a number is near to or approximately a specifically recited number, the near or approximating unrecited number may be a number which, in the context in which it is presented, provides the substantial equivalent of the specifically recited number.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, the preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited.
It must be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a Cas9 polypeptide" includes a plurality of such polypeptides and reference to "the atomic structure" includes reference to one or more atomic structures and equivalents thereof known to those skilled in the art, and so forth. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as "solely," "only" and the like in connection with the recitation of claim elements, or use of a "negative" limitation.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination. All combinations of the embodiments pertaining to the invention are specifically embraced by the present invention and are disclosed herein just as if each and every combination was individually and explicitly disclosed. In addition, all sub-combinations of the various embodiments and elements thereof are also specifically embraced by the present invention and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein.
All publications and patents cited in this specification are herein incorporated by reference as if each individual publication or patent were specifically and individually indicated to be incorporated by reference and are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.
DETAILED DESCRIPTION
The present disclosure provides atomic structures of Cas9 with and without polynucleotides bound thereto. Also provided is a computer readable medium comprising atomic coordinates for Cas9 polypeptides in both an unbound configuration and a configuration wherein the Cas9 polypeptide is bound to one or more polynucleotides. The present disclosure provides crystals comprising Cas9 polypeptides; and compositions comprising the crystals. The present disclosure provides methods for the engineering of Cas9 polypeptides wherein Cas9 activity has been altered, ablated, or preserved and amended with additional activities.
The present disclosure provides a computer readable medium comprising: atomic coordinates for a Cas9 polypeptide, wherein said Cas9 polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to an amino acid sequence depicted in FIG. 5 or FIG. 11. In some cases, the computer readable medium further comprises programming for displaying a molecular model of said Cas9 polypeptide. In some cases, the Cas9 polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to an amino acid sequence of a Streptococcus pyogenes Cas9 polypeptide or an Actinomyces naeslundii Cas9 polypeptide. In some cases, the atomic coordinates for said Cas9 polypeptide further comprise a polynucleotide bound to a nucleic acid binding site in said Cas9 polypeptide. In some cases, the computer readable medium further comprises programming for identifying amino acid residues of said Cas9 polypeptide that bind the polynucleotide. In some cases, the computer-readable medium further comprises programming for identifying amino acid substitutions of said Cas9 polypeptide that alter the binding of the Cas9 polypeptide to the polynucleotide.
The present disclosure provides a computer comprising a computer-readable medium of the present disclosure, as described herein.
The present disclosure provides a crystal comprising a Cas9 polypeptide in crystalline form, wherein the crystal is characterized with space group P2₁2₁2, and has unit cell parameters of a=160 Å, b=209 Å, c=91 Å, α=β=γ=90°. The present disclosure provides a composition comprising the crystal. In some cases, the Cas9 polypeptide shares at least [...]
[...]tide. In some cases, the Cas9 polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to an amino acid sequence depicted in FIG. 5 or FIG. 11.
The present disclosure provides a method comprising: a) forwarding to a remote location a set of atomic coordinates for a Cas9 polypeptide, wherein said Cas9 polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to an amino acid sequence depicted in FIG. 5 or FIG. 11; and b) receiving the identity of a site within Cas9 for the insertion or substitution of heterologous sequence, wherein the insertion of heterologous sequence results in the preservation of Cas9 activities and the addition of chimeric activities to Cas9.
Atomic Structures
The present disclosure provides the atomic structures of the SpyCas9 and AnaCas9 endonucleases. The present disclosure provides the atomic structures of a complex comprising: i) a Cas9 polypeptide; and ii) a polynucleotide (a "target") bound to the Cas9 polypeptide.
75 % , at least 80 % , at least 85 % , at least 90 % , at least 95 % , The terms “ Cas9” and “ Cas9 polypeptide ” are used inter at least 98 % , at least 99 % , or 100 % , amino acid sequence 25 changeably herein to refer to an enzyme that exhibits at least identity with SEQ ID NO : 1 . endonuclease activity ( e . g . cleaving the phosphodiester The present disclosure provides a crystal comprising a bond within a polynucleotide ) guided by a CRISPR RNA Cas9 polypeptide in crystalline form , wherein the crystal is (crRNA ) bearing complementary sequence to a target poly characterized with space group P1 2 , 1 , and has unit cell nucleotide . Cas9 polypeptides are known in the art , and parameters of a = 75 Å , b = 133 Å , c = 80 Å , a = y = 90° and 30 include Cas9 polypeptides from any of a variety of biologi B = 95º . The present disclosure provides a composition com cal sources , including , e. g . , prokaryotic sources such as prising the crystal. In some cases, the Cas9 polypeptide bacteria and archaea . Bacterial Cas9 includes , Actinobacte shares at least 75 % , at least 80 % , at least 85 % , at least 90 % , ria (e . g ., Actinomyces naeslundii ) Cas9, Aquificae Cas9 , at least 95 % , at least 98 % , at least 99 % , or 100 % , amino acid Bacteroidetes Cas 9 , Chlamydiae Cas9 , Chloroflexi Cas9 , sequence identity with SEO ID NO : 7 . 35 Cyanobacteria Cas9 , Elusimicrobia Cas9 , Fibrobacteres The present disclosure provides a method comprising: a ) Cas9 , Firmicutes Cas9 ( e . g . , Streptococcus pyogenes Cas9 , receiving a set of atomic coordinates for a Cas9 polypeptide , Streptococcus thermophilus Cas9, Listeria innocua Cas9, wherein said Cas9 polypeptide comprises an amino acid Streptococcus agalactiae Cas9 , Streptococcus mutans Cas9 , sequence having at least 75 % , at least 80 % , at least 85 % , at and Enterococcus faecium Cas9 ), Fusobacteria Cas9 , Pro least 90 % , at least 95 % , at least 98 % , at least 99 % , or 100 % , 40 teobacteria ( e . g . 
, Neisseria meningitides, Campylobacter amino acid sequence identity to an amino acid sequence jejuni) Cas9 , Spirochaetes (e . g ., Treponema denticola ) Cas9 , depicted in FIG . 5 or FIG . 11 ; and b ) identifying a site within and the like . Archaea Cas 9 includes Euryarchaeota Cas9 said Cas9 polypeptide for the insertion of a heterologous ( e .g ., Methanococcus maripaludis Cas9 ) and the like. A amino acid sequence using said coordinates . In some cases, variety of Cas9 polypeptides are known , and are reviewed the insertion of the heterologous amino acid sequence results 45 in , e . g . , Makarova et al . (2011 ) Nature Reviews Microbiol in the preservation of at least one biological activity of said ogy 9 :467 - 477 , Makarova et al. ( 2011 ) Biology Direct 6 : 38 , Cas9 polypeptide . In some cases, the insertion of the heter - Haft et al. ( 2005 ) PLOS Computational Biology 1 :e60 and ologous amino acid sequence results in the addition of at Chylinski et al . (2013 ) RNA Biology 10 :726 -737 . The term least one non - native activity to said Cas9 polypeptide . “ Cas9 ” includes a Cas9 polypeptide of any Cas9 family , The present disclosure provides a method of engineering 50 including any isoform of Cas9 . a chimeric Cas9, the method comprising : a ) receiving a set Amino acid sequences of various Cas9 homologs are of atomic coordinates for a Cas9 polypeptide , wherein said known in the art and are publicly available . See, e . g . , Cas9 polypeptide comprises an amino acid sequence having GenBank Accession No . AGM26527 . 1, GenBank Accession at least 75 % , at least 80 % , at least 85 % , at least 90 % , at least No . AGZ01981. 1 , GenBank Accession No. ERJ56406 . 1 , 95 % , at least 98 % , at least 99 % , or 100 % , amino acid 55 GenBank Accession No. ERM89468 . 1 , GenBank Accession sequence identity to an amino acid sequence depicted in No . G3ECR1. 2 , GenBank Accession No . Q03J16 . 1 , Gen FIG . 5 or FIG . 
11 ; and b ) identifying a site within said Cas9 Bank Accession No . 0927P4 . 1 , GenBank Accession No . polypeptide for replacement of a Cas9 domain of a first Cas9 WP _ 002664048 . 1 , GenBank Accession No . species with a Cas9 domain of a second species . In some WP _ 002665199 . 1 , GenBank Accession No. cases, the replacement of a Cas9 domain of a first Cas9 60 WP _ 002678519 . 1 , GenBank Accession No. species with a Cas9 domain of a second species results in WP _ 002837826 . 1 , GenBank Accession No. altered activity of said first Cas9 . WP _ 002841804 . 1, GenBank Accession No. The present disclosure provides a method comprising : a ) WP _ 003004889 . 1 , GenBank Accession No. receiving a set of atomic coordinates for a Cas9 polypeptide ; WP _ 003710997 . 1 , GenBank Accession No. and b ) identifying a site within said Cas9 polypeptide for the 65 WP _ 004369789 . 1 , GenBank Accession No. substitution , insertion , or deletion of one or more amino acid WP _ 004918207 . 1, GenBank Accession No . residues resulting in altered activity of said Cas9 polypep - WP _ 005399084 . 1 , GenBankGenBank Accession No. US 9 , 963 ,689 B2 17 18 WP _ 005728738 . 1 , GenBank Accession No. 85 % , at least about 90 % , at least about 95 % , at least about WP _ 005728739 . 1, GenBank Accession No. 98 % , at least about 99 % , or 100 % , amino acid sequence WP_ 005729619 . 1 , GenBank Accession No . identity to a Cas9 polypeptide represented in FIG . 5 or FIG . WP_ 005760293 . 1 , GenBank Accession No. 11 ( SEQ ID NOs : 1 - 14 ) , and having at least about 75 % , at WP _ 005791619 . 1 , GenBank Accession No . 5 least about 80 % , at least about 85 % , at least about 90 % , at WP _ 005855543 . 1 , GenBank Accession No . least about 95 % , at least about 98 % , at least about 99 % , or WP _ 007093045 . 1 , GenBank Accession No . 100 % of the enzymatic activity of a polypeptide represented WP 007210085 . 1 , GenBank Accession No . in FIG . 5 or FIG . 11 ( SEQ ID NOs: 1 - 14 ) . 
WP _ 007407075. 1 , GenBank Accession No . A Cas9 polypeptide can exhibit one or more of the WP _ 007711412 . 1 , GenBank Accession NoNo . 10 following native enzymatic activities: hydrolase activity , WP _ 008146746 . 1 , GenBank Accession No. nuclease activity ; standard DNA endonuclease and / or exo WP 008582100 . 1 , GenBank Accession No . nuclease activity ( e . g . cleavage of linear double stranded WP_ 008610988 . 1 , GenBank Accession No. DNA (dsDNA ) and circular dsDNA ) ; non - standard DNA WP _ 008770229 . 1 , GenBank Accession No. endonuclease and / or exonuclease activity ( e . g . cleavage of WP_ 008780913 . 1, GenBank Accession No. 15 single stranded DNA ( ssDNA ), branched DNA ( e . g . Holli WP 008822925 . 1 , GenBank Accession No . day junctions, replication forks , 5 - flaps , and the like ) , WP 008991033 . 1 , GenBank Accession No . Triple - stranded DNA , G - quadruplex DNA , synthetic DNA , WP_ 008997907 . 1 , GenBank Accession No. artificial DNA , and double stranded hybrids of DNA and WP _ 009035786 . 1, GenBank Accession No . RNA ) , and ribonuclease activity . WP 009217841. 1 , GenBank Accession No. 20 A Cas9 polypeptide can exhibit one or more of the WP _ 009293010 . 1, GenBank Accession No. following binding activities : binding of a dsDNA target WP _ 009392516 . 1 , GenBank Accession No. sequence ; binding of a ssDNA target sequence ; binding of a WP _ 009417297 . 1 , GenBank Accession No. branched DNA target sequence ; binding of a CRISPR RNA WP _ 009434997 . 1 , GenBank Accession No . ( crRNA ) ( e . g . a pre -crRNA , a trans - encoded small RNA WP 010254321 . 1 , GenBank Accession No . 25 ( tracrRNA ) ; binding of non - crRNA target; binding of a WP _ 011963637 . 1 , GenBank Accession No . polypeptide ; binding of a small molecule , and the like. WP_ 012290141. 1 , GenBank Accession No . The atomic structures described herein are useful as WP _ 013073784 . 1, GenBank Accession No . 
models for rationally designing Cas9 derivatives, either de WP _ 013997568 . 1 , GenBank Accession No . novo or by modification of known Cas9 polypeptides. One WP _ 014411267 . 1 , GenBank AccessionAccession No . 30 or moren amino acid residues or sites within the polypeptide , WP 014708934 . 1 , GenBank Accession No . represented by atomic coordinates ,may be identified that are WP _ 014773653 . 1 , GenBank Accession useful for mutation by replacement , insertion , or deletion of WP _ 014938037 . 1 , GenBank Accession No . one or more amino acid resides . Such mutations may have WP _ 015781852. 1 , GenBank Accession No . utility for altering , enhancing , or ablating Cas9 activities. WP 016341167 . 1 , GenBank Accession No. 35 Replacement of amino acid residues may be performed by WP _ 018280040 . 1, GenBank Accession No. conservative or non -conservative amino acid substitution . WP _ 018626154 . 1, GenBank Accession Amino acid residues may be signally replaced within a WP_ 022599516 . 1 , GenBankA ccession No . protein , domain , or motif , or multiple individual replace WP _ 022832948 . 1 , and GenBank Accession No. ments may be performed within a protein , domain , or motif . YP 008027038 . 1 . 40 In some embodiments single or multiple sequences or The term " SpyCas9 " as used herein encompasses wild stretches of sequential amino acids may be replaced . In some type Streptococcus pyogenes Cas9 , e . g . a polypeptide com - cases , naturally occurring amino acids may be replaced with prising an amino acid sequence having at least about 75 % , naturally occurring amino acids . 
Such replacement of natu at least about 80 % , at least about 85 % , at least about 90 % , rally occurring amino acids may be performed to alter , either at least about 95 % , at least about 98 % , at least about 99 % , 45 increasing or decreasing, the charge or hydrophobicity of a or 100 % , amino acid sequence identity to amino acids three dimensional site , region , or domain , of a Cas9 poly 1 - 1368 of the amino acid sequence set forth in SEQ ID peptide , thus altering local or global Cas9 polypeptide NO : 1 , and having at least about 75 % , at least about 80 % , at chemical properties or activity . least about 85 % , at least about 90 % , at least about 95 % , at In other cases, naturally occurring amino acids may be least about 98 % , at least about 99 % , or 100 % of the 50 replaced with synthetic amino acids or modified amino enzymatic activity of a polypeptide comprising amino acids acids. Synthetic or unnatural amino acids may contain 1 - 1368 of SEQ ID NO : 1 . functional side chains Non - limiting examples of functional The term “ AnaCas9 " as used herein encompasses wild side chains include: fluorophores , post - translational modi type Actinomyces naeslundii Cas9, e. g . a polypeptide com fications, metal ion chelators , photocaged and photocross prising an amino acid sequence having at least about 75 % , 55 linking moieties , reactive functional groups , and NMR at least about 80 % , at least about 85 % , at least about 90 % , ( nuclear magnetic resonance ), IR ( infrared ) and X - ray crys at least about 95 % , at least about 98 % , at least about 99 % , tallographic probes , and the like . The replacement of natu or 100 % , amino acid sequence identity to amino acids rally occurring amino acids with synthetic amino acids may 1 - 1101 of the amino acid sequence set forth in SEQ ID have utility in rendering a Cas9 polypeptide traceable at the NO :7 , and having at least about 75 % , at least about 80 % , at 60 cellular, molecular , or atomic level. 
Synthetic amino acid least about 85 % , at least about 90 % , at least about 95 % , at substitution may also have utility for adding additional least about 98 % , at least about 99 % , or 100 % of the physical, chemical, or biological activities to a Cas9 poly enzymatic activity of a polypeptide comprising amino acids peptide. 1 - 1101 of SEQ ID NO :7 . Insertion or deletion of amino acid residues into or out of The term " Cas9 ” as used herein encompasses wild type 65 a Cas9 protein , domain , or motif may be performed singly , Cas9 , e . g . a polypeptide comprising an amino acid sequence such as one at a time, or multiply, such as more than one at having at least about 75 % , at least about 80 % , at least about a time. Naturally occurring or synthetic amino acids may be US 9 ,963 , 689 B2 19 20 inserted , as described above for amino acid residue replace solid phase / liquid phase syntheses; recombinant DNA meth ment or substitution . In some embodiments one or more ods, including cDNA cloning, optionally combined with site amino acid residues may be inserted or deleted at or from a directed mutagenesis ; and purification of the polypeptide single location . In other cases , one or more amino acid from a natural source . residues may be inserted or deleted at or from multiple 5 Computer Models , Computer- Readable Media , and Com locations. Insertion or deletion of amino acid residues may puter Systems have utility in altering the conformation or three dimen In certain embodiments , a computer readable medium sional structure of a Cas9 polypeptide . Such a change in may comprise programming for displaying a molecular structure may ablate or alter Cas9 activities, such as nucleic model of Cas9 , programming for identifying candidate sites acid binding , nuclease activity , DNA target site recognition 10 for mutagenesis or insertion of heterologous sequence as ( e .g . protospacer adjacent motif recognition ), and the like . described above , for example. 
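The sequence identity thresholds recited throughout this section (at least 75%, 80%, 85%, 90%, 95%, 98%, 99%, or 100%) presuppose some way of scoring a candidate sequence against a reference such as SEQ ID NO:1. The following is a minimal sketch in Python, not a method taken from the specification. It assumes the two sequences have already been aligned (gaps written as "-") and that identity is counted over every alignment column in which at least one sequence has a residue. The specification does not fix the alignment method or the denominator, so both are assumptions, and the function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumptions (not from the specification): gaps are written as '-',
    and the denominator is the number of alignment columns in which at
    least one of the two sequences has a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # column is empty in both sequences, so ignore it
        columns += 1
        if a == b:  # both are residues here, since double gaps were skipped
            matches += 1
    if columns == 0:
        raise ValueError("no aligned residues to compare")
    return 100.0 * matches / columns


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 75.0) -> bool:
    """True if the pair reaches the given percent identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    reference = "MDKKYSIGLD"  # hypothetical toy sequence
    candidate = "MDKKYSLGLD"  # one substitution relative to the reference
    print(percent_identity(reference, candidate))
    print(meets_threshold(reference, candidate, 95.0))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different percentages for the same pair, so the convention should be stated whenever a threshold is applied.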
In certain embodiments, the atomic coordinates of the computer readable medium may comprise the atomic coordinates provided in Table 1. A computer system comprising the computer-readable medium is also provided.

As noted above, the atomic coordinates may be employed in conjunction with a modeling program to provide a model of Cas9. As used herein, the term "model" refers to a representation in a tangible medium of the three dimensional structure of Cas9. For example, a model can be a representation of the three dimensional structure in an electronic file, on a display, e.g., a computer screen, on a piece of paper (i.e., on a two dimensional medium), and/or as a ball-and-stick figure. Physical three-dimensional models are tangible and include, but are not limited to, stick models and space-filling models. The phrase "imaging the model on a computer screen" refers to the ability to express (or represent) and manipulate the model on a computer screen using appropriate computer hardware and software technology known to those skilled in the art. Such technology is available from a variety of sources including, for example, Evans and Sutherland, Salt Lake City, Utah, and Biosym Technologies, San Diego, Calif. The phrase "providing a picture of the model" refers to the ability to generate a "hard copy" of the model. Hard copies include both motion and still pictures. Computer screen images and pictures of the model can be visualized in a number of formats including space-filling representations, backbone traces, ribbon diagrams, and electron density maps. Exemplary modeling programs include, but are not limited to PYMOL, GRASP, or O software, for example.

Atomic coordinates may be derived by a variety of means. Generally, atomic coordinates can be derived from electron density maps derived by experimental means. Such electron density maps may be produced through X-ray crystallographic methods or electron microscopic methods. Atomic structures may be generated from electron density maps. Two or more atomic structures of related molecules, e.g., molecules related between different species or the same molecule in different conformations, may be computationally compared and the atomic coordinates of one may be used to inform the other. In one embodiment of the present disclosure the atomic coordinates of a Cas9 polypeptide derived from a first species are used to determine the atomic coordinates of a Cas9 polypeptide of a second species. In yet another embodiment, atomic coordinates from a Cas9 polypeptide in an unbound conformation are used to determine the atomic coordinates of a Cas9 polypeptide in a bound conformation. Computational comparison of atomic structures and/or electron density maps can be performed as described herein.

Furthermore, sites within the polypeptide, represented by atomic coordinates, may be identified wherein insertion of heterologous amino acids or polypeptides does not alter or ablate Cas9 activities; such sites are useful for the addition of heterologous activities to Cas9. Non-limiting examples of heterologous activities include: fluorescence, phosphorylation, dephosphorylation, acetylation, deacetylation, methylation, demethylation, ubiquitination, deubiquitination, glycosylation, deglycosylation, membrane transduction, and the like. In some cases, sites within the polynucleotide may be identified wherein insertion of heterologous sequence does not alter Cas9 function.

In other embodiments, domains or motifs within the polynucleotide, represented by atomic coordinates, may be identified that may be substituted or exchanged with orthologous domains or motifs from a related CRISPR-Cas polypeptide, e.g., Cas9 polypeptides. Domains or motifs may be represented by linear sequence of 2 or more sequential amino acids or may comprise discontinuous or non-sequential amino acids related in three dimensional space. Such substitutions may have utility for altering the overall performance characteristics of a particular Cas9, such as increasing or decreasing binding affinity, processivity, and the like. Such substitutions may also have utility for exchanging the activity of one species of Cas9 with another by exchanging domains or motifs, e.g., exchanging protospacer adjacent motif (PAM) recognition domains, PAM binding loops, catalytic domains, nuclease domains, DNA binding domains, crRNA binding domains, tracrRNA binding domains, RuvC-I domains, RuvC-II domains, RuvC-III domains, Arginine rich domains, alpha-helical lobes, beta-hairpin domains, HNH domains, Topo (Topoisomerase) domains, C-terminal domains, N-terminal domains, and the like. Such exchange of domains between different Cas9 polypeptides may be performed by the engineering of chimeric Cas9 proteins.

In other embodiments, sites within the polypeptide may be identified wherein deletion of amino acids does not alter or ablate Cas9 activities locally or globally; such sites are useful for decreasing the size of a Cas9 polypeptide while retaining Cas9 activities. Such atomic locations may be determined according to any method known in the art, including the methods described herein.

Crystals and Crystal Compositions

The present disclosure provides crystals that include wild type and mutant Cas9 polypeptides. In some embodiments, the crystal has a unit cell dimension of a=160 Å, b=209 Å, c=90 Å, and α=β=γ=90°, and belongs to space group P2₁2₁2. In some embodiments, the crystal has a unit cell dimension of a=74 Å, b=133 Å, c=80 Å, and α=γ=90° and β=95°, and belongs to space group P12₁1.
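The unit cell parameters recited for the two crystal forms fully determine the cell volume, which is a quick sanity check when comparing a new crystal against them. The following is a minimal sketch in Python using the standard general (triclinic) unit cell volume formula, V = abc·sqrt(1 − cos²α − cos²β − cos²γ + 2·cosα·cosβ·cosγ). The function name is hypothetical and the computation is illustrative only; it is not part of the disclosed methods.

```python
import math


def unit_cell_volume(a: float, b: float, c: float,
                     alpha: float = 90.0, beta: float = 90.0,
                     gamma: float = 90.0) -> float:
    """Unit cell volume in cubic angstroms.

    Edge lengths a, b, c are in angstroms; angles alpha, beta, gamma are
    in degrees. Uses the general triclinic formula, which reduces to
    a*b*c when all angles are 90 degrees and to a*b*c*sin(beta) for a
    monoclinic cell with alpha = gamma = 90 degrees.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg
                                 + 2.0 * ca * cb * cg)


if __name__ == "__main__":
    # Parameters as recited above for the two crystal forms.
    orthorhombic = unit_cell_volume(160.0, 209.0, 90.0)
    monoclinic = unit_cell_volume(74.0, 133.0, 80.0, beta=95.0)
    print(f"P2(1)2(1)2 cell volume: {orthorhombic:.0f} cubic angstroms")
    print(f"P12(1)1 cell volume:    {monoclinic:.0f} cubic angstroms")
```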
The present disclosure also provides a composition comprising a subject crystal.

The Cas9 polypeptide can be produced using any of a variety of well known methods, including, e.g., synthetic methods, such as solid phase, liquid phase and combination

One embodiment of the present disclosure relates to a computer readable medium with Cas9 structural data and/or information stored thereon. As used herein, the phrase "computer readable medium" refers to storage media readable by a computer, which media may be used to store and retrieve data and software programs incorporating computer code. Exemplary computer readable media include floppy disk, CD-ROM, tape, memory (such as flash memory or system memory), hard drive, and the like.

In another embodiment, the disclosure provides a computer system having a memory comprising the above-described atomic coordinates; and a processor in communication with the memory, wherein the processor generates a molecular model having a three dimensional structure representative of Cas9. The processor can be adapted for identifying candidate sites for mutagenesis or insertion of heterologous sequence for example.

Methods of Designing Modified Cas9 Polypeptides or Cas9 Derivatives

The present disclosure provides methods for designing modified Cas9 polypeptides or Cas9 derivatives, as well as methods for studying the Cas9 mechanism. A subject method generally involves computationally identifying candidate sites for mutation and/or insertion of heterologous sequence within the atomic structure and/or amino acid sequence of a polypeptide of Cas9 using atomic coordinates for a Cas9 polypeptide or atomic structures of a Cas9 polypeptide with or without a bound polynucleotide. For example, in some embodiments, the atomic coordinates are as disclosed in Table 1.

The present disclosure provides methods for identifying candidate sites for the mutation of Cas9 such that substitution, insertion, or deletion of amino acids at the candidate site will affect an activity of Cas9. The method generally involves determining the atomic locations of amino acid residues that are essential to the enzymatic and/or binding activities of Cas9, involving determining the three dimensional location of a residue or residues relative to known Cas9 enzymatic or catalytic motifs and/or the binding groups of a bound polynucleotide. The method further involves computationally substituting, inserting or deleting one or more essential amino acid residues and modeling and/or testing the resulting altered Cas9 polypeptide.

In certain cases, a subject method will involve identifying amino acid residues critical to the activity or structure of Cas9 and computationally engineering mutations at those critical amino acid residues.

NO:7), the Arginine-rich region (residues 64Ana-80Ana), the non-conserved zinc site (residues Cys566Ana, Cys569Ana, Cys602Ana and Cys605Ana), the HNH active site (residues Asp581Ana and Asn606Ana), the RuvC domain (residues Asp17Ana, Glu505Ana, His736Ana and Asp739Ana), the catalytic residue His582Ana, as well as those atoms that are close thereto, e.g., within 5 Å, within 10 Å, within 20 Å or within 30 Å of those amino acids.

The present disclosure also provides a method of identifying candidate sites for the insertion of heterologous sequence within the amino acid sequence of a polynucleotide of Cas9 such that the inserted heterologous sequence does not affect a desired activity of Cas9. The method generally involves determining the atomic locations of amino acid residues that are non-essential to the enzymatic and/or binding activities of Cas9, involving determining the three dimensional location of a residue or residues relative to known Cas9 enzymatic or catalytic motifs and/or the binding groups of a bound polynucleotide. The method further involves computationally inserting a heterologous sequence near or next to a non-essential amino acid residue or between non-essential residues and modeling and/or testing the resulting altered Cas9 polypeptide.

In certain cases, a subject method will further comprise a test performed computationally or in silico with or without comparison to the atomic coordinates provided herein. In some embodiments, computer models are analyzed to determine whether an altered Cas9 retains its three dimensional structure or whether altered Cas9 performs a desired enzymatic activity or binding function. In other embodiments, the testing is performed by obtaining a physical polypeptide of the altered Cas9 (e.g., purchasing or synthesizing the polypeptide, or utilizing cloning and protein expression) and performing an in vitro or in vivo chemical, biochemical, or biological assay to determine if the altered Cas9 demonstrates altered activity (e.g., a loss of endonuclease activity, increased or decreased binding of a polynucleotide, or the presence of an activity novel to Cas9 (e.g., (de)acetylation, (de)phosphorylation, (de)methylation, (de)ubiquitination, peptide binding, transduction)).
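The computational substitution, insertion, and deletion steps described above can be illustrated at the level of the amino acid sequence alone. The following is a minimal sketch in Python, assuming 1-based residue numbering (as used for the residue positions in this specification) and a sequence held as a plain string of one-letter codes. The function names, the toy sequence, and the "GSG" insert are hypothetical. The sketch edits sequence only; predicting the structural or functional consequences of an edit would require separate modeling or testing.

```python
def substitute(seq: str, position: int, new_residue: str) -> str:
    """Replace the residue at a 1-based position with a single residue."""
    if not 1 <= position <= len(seq):
        raise IndexError("position is outside the sequence")
    if len(new_residue) != 1:
        raise ValueError("new_residue must be a single one-letter code")
    return seq[:position - 1] + new_residue + seq[position:]


def insert_after(seq: str, position: int, insert: str) -> str:
    """Insert a heterologous sequence immediately after a 1-based position.

    A position of 0 inserts at the N-terminus.
    """
    if not 0 <= position <= len(seq):
        raise IndexError("position is outside the sequence")
    return seq[:position] + insert + seq[position:]


def delete_range(seq: str, start: int, end: int) -> str:
    """Delete residues start..end, 1-based and inclusive."""
    if not 1 <= start <= end <= len(seq):
        raise IndexError("range is outside the sequence")
    return seq[:start - 1] + seq[end:]


if __name__ == "__main__":
    toy = "MDKKYSIGLD"  # hypothetical toy sequence
    print(substitute(toy, 10, "A"))      # point substitution at residue 10
    print(insert_after(toy, 4, "GSG"))   # insertion after residue 4
    print(delete_range(toy, 2, 3))       # deletion of residues 2 and 3
```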
In some embodiments the The present disclosure provides methods for identifying method will further comprise computationally determining candidate sites within a Cas9 polypeptide for the exchange an amino acid insertion , deletion , or substitution or multiple of orthologous protein domains or motifs between different amino acid insertions, deletions , or substitutions that will Cas9 polypeptides for the purpose of engineering a chimeric affect Cas9 activity using the atomic coordinates or atomic 45 Cas9 polypeptide that includes activities from 2 or more structures provided herein . In particular embodiments, a native Cas9 polypeptides . A subject method generally subject method involves engineering mutations at SpyCas9 involves computationally identifying a candidate site within critical sites including the Topo - homology domain ( residues a donor Cas9 polypeptide that includes a protein domain or 1136Spy - 1200Spy, wherein the superscript " Spy ” indicates motif that provides for a desirable activity , e . g ., binding of reference to the amino acid sequence of S . pyogenes previ- 50 a particular protospacer sequence or DNA- targeting ously disclosed , SEQ ID NO : 1 ) , the C - terminal domain sequence , using the atomic coordinates disclosed herein . ( residues 1201 Spy - 1363Spy ) , the Arginine - rich region ( resi The method further includes computationally identifying a dues 59Spy - 76Spy ) , the disordered linker (residues 714Spy . candidate site within the recipient Cas9 polypeptide for 717Spy ) , PAM loop 1 ( residues 448Spy- 501Spy ; including receiving the donor candidate site . The recipient candidate W476PY ) , PAM loop 2 (residues 1102 py - 1136py, 55 site may contain a domain or motif that is orthologous to the W1126Spy ), basic amino acid residues in the nucleic acid domain or motif within the candidate donor site. In other binding groove ( residues R69Spy , R70Spy , R71 Spy, R75. 
K76Spy, His160Spy, Lys163Spy, Lys288Spy, Arg400Spy, Lys401Spy, and Arg403Spy), the auto-inhibitory beta-hairpin of the RuvC domain (residues 1049Spy-1059Spy), as well as those atoms that are close thereto, e.g., within 5 Å, within 10 Å, within 20 Å or within 30 Å of those amino acids. In particular embodiments, a subject method involves engineering mutations at AnaCas9 critical sites including the beta-hairpin domain (residues 822Ana-924Ana, wherein the superscript "Ana" indicates reference to the amino acid sequence of A. naeslundii previously disclosed, SEQ ID

cases, the recipient site may not contain domains or motifs orthologous to the domains or motifs present in the donor site. An identified candidate site may consist of a continuous sequence of 2 or more sequential amino acid residues, discontinuous sequences of 2 or more amino acids, or 2 or more amino acids that are dispersed in primary protein sequence. The method further includes computationally replacing the recipient candidate site with the donor candidate site, thus producing a model of a chimeric Cas9 polypeptide consisting of the recipient Cas9 polypeptide containing the desirable domain or motif of the donor Cas9 polypeptide. In some embodiments, the resulting model chimeric Cas9 may be used to computationally test activity of the polypeptide and to further refine the donor and recipient candidate sites. The method may further include testing performed by obtaining a physical polypeptide of the chimeric Cas9 (e.g., purchasing or synthesizing the polypeptide, or utilizing cloning and protein expression) and performing an in vitro or in vivo chemical, biochemical, or biological assay to determine if the chimeric Cas9 demonstrates chimeric activity (e.g., binding of non-native DNA-target sequence).

In certain cases, a subject method will further comprise testing a mutated, altered, or chimeric Cas9 polypeptide to determine if it physically binds a polynucleotide as represented by a computational model derived from the atomic coordinates or atomic structures provided herein. In some embodiments, a subject method will further comprise obtaining the polynucleotide (e.g., purchasing, synthesizing, isolating, or generating the polynucleotide through cloning, PCR or in vitro transcription) and testing the polynucleotide to determine if it binds a mutated, altered, or chimeric Cas9. In certain cases, binding of the polynucleotide to the Cas9 may be tested using any method described herein or known to the art (e.g., immunoprecipitation (IP), chromatin immunoprecipitation (ChIP), DNA immunoprecipitation (DIP), electrophoretic mobility shift assay (EMSA), exonuclease footprinting assay, nuclease protection assay, polynucleotide labeling with negative-stain electron microscopy, and the like).

A method that comprises receiving a set of atomic coordinates for a Cas9 polypeptide; and identifying sites for mutation, insertion of heterologous sequences, or exchange of orthologous domains using the coordinates is also provided, as is a method comprising: forwarding to a remote location a set of atomic coordinates for a Cas9 polypeptide; and receiving the identity of sites for mutation, insertion of heterologous sequences, or exchange of orthologous domains.

A subject method can provide for one or more of: 1) reducing a native activity of a Cas9 polypeptide; 2) ablating a native activity of a Cas9 polypeptide; 3) increasing a native activity of a Cas9 polypeptide; 4) altering a native DNA binding activity of a Cas9 polypeptide; 5) altering a DNA-targeting activity of a Cas9 polypeptide; 6) altering a native RNA binding activity of a Cas9 polypeptide; 7) altering a native nuclease activity of a Cas9 polypeptide; 8) altering a native endonuclease activity of a Cas9 polypeptide; 9) conferring a non-native enzymatic activity to a Cas9 polypeptide; 10) improving or altering desirable activities of a Cas9 polypeptide by producing a chimeric Cas9 polypeptide; 11) improving or altering a desirable chemical property of a Cas9 polypeptide by producing a chimeric Cas9 polypeptide; 12) designing a compact Cas9 polypeptide that retains desirable activities by producing a chimeric Cas9 polypeptide.

Software programs also may be used to aid one skilled in the art in visualizing or designing Cas9 derivatives. These include, but are not limited to, Abalone (Agile Molecule), ACEMD (Acellera Ltd), ADUN (adun.imim.es), AMBER (ambermd.org), (Ascalaph Project), Automated Topology Builder (Automated Topology Builder), (Avogadro), Balloon (Åbo Akademi), BOSS (Yale University), CHARMM (.org), Chemitorium (weltweitimnetz.de), ChemSketch (Advanced Chemistry Development, Inc.), COSMOS (COSMOS Software), Culgi (Culgi BV), Deneb (AtelGraphics inc.), (D. E. Shaw Research Schrödinger), (Accelrys), fold.it (fold.it), FoldX (CRG), GOVASP (Windiks Consulting), GPIUTMD (GPIUTMD), GROMACS (gromacs.org), GROMOS (GROMOS.net), GULP (projects.ivec.org), HOOMD-blue (codeblue.umich.edu), ICM (Molsoft), LAMMPS (Sandia), MacroModel (Schrödinger), MAPS (Scienomics), (Accelrys), MedeA (Materials Design), MCCCS Towhee (Towhee Project), MDynaMix (Stockholm University), MOE (Chemical Computing Group), MOIL (cbsu.tc.cornell.edu), MOLDY (Moldy), ORAC (chim.unifi.it), NAB (Case group), Packmol (ime.unicamp.br), Prime (Schrödinger), Protein Local Optimization Program (PLOP wiki), p4vasp (p4vasp.at), PyMOL (PyMol.org), QMOL (DNASTAR, Inc.), RasMol (RasMol), Raster3D (University of Washington), RedMD (University of Warsaw, ICM), StruMM3D (STR3DI32) (Exorga, Inc.), Selvita Protein Modeling Platform (Selvita Ltd), (SCIGRESS.com), SimBioSys' MoDeST (SimBioSys Inc.), Spartan (Wavefunction, Inc.), SwissParam (SwissParam), TeraChem (PetaChem LLC), (Washington University), Tremolo-X (Tremolo-X), UCSF Chimera (University of California), VEGA ZZ (VEGA ZZ Web site), VLifeMDS (Vlife Sciences Technologies), VMD+NAMD (Beckman Institute), WHAT IF (swift.cmbi.ru.nl), xeo (xeo.sourceforge.net), YASARA (YASARA.org), and Zodiac (zeden.org). These programs may be implemented, for instance, using a computer workstation, as are well known in the art, for example, a Windows, Macintosh, LINUX, SGI or Sun workstation. Other hardware systems and software packages will be known to those skilled in the art.

The structure data provided herein can be used in conjunction with computer-modeling techniques to design Cas9 derivatives with or without altered Cas9 activity. The models characterize the three-dimensional surface topography of a Cas9 derivative, as well as factors including van der Waals contacts, electrostatic interactions, hydrogen-bonding opportunities, and electron density. Computer simulation techniques are then used to map intramolecular and intermolecular interaction positions for functional groups including but not limited to protons, hydroxyl groups, amine groups, divalent cations, aromatic and aliphatic functional groups, amide groups, alcohol groups, etc. that are modified to produce the Cas9 derivative.
In certain embodiments, a computer system comprising a memory comprising the atomic coordinates of a Cas9 polypeptide with or without a bound ligand polynucleotide is provided. The atomic coordinates are useful as models for rationally identifying derivatives of a Cas9 polypeptide. Such derivatives may be designed either de novo, or by modification of a disclosed Cas9 polypeptide, for example. In other cases, derivatives may be identified by testing known polynucleotide sequences to determine if they "bind" with a molecular model of a Cas9 polypeptide. Such computational binding methods are generally well known in the art.

The ability of an altered Cas9 to bind to a particular polynucleotide can be analyzed prior to actual synthesis using computer modeling techniques. Only those candidate Cas9 derivatives that are indicated by computer modeling to bind the target polynucleotide (e.g., a particular DNA sequence) with sufficient binding energy may be synthesized and tested for their ability to bind the target polynucleotide using binding assays known to those of skill in the art and/or described herein. The computational evaluation step thus avoids the unnecessary synthesis or cloning of Cas9 derivatives that are unlikely to bind a particular polynucleotide target with adequate affinity.

Specific computer software is available in the art to evaluate binding deformation energy and electrostatic interaction. Examples of programs designed for such uses include: 94, revision C (Frisch, Gaussian, Inc., Pittsburgh, Pa. (1995)); AMBER, version 7 (Kollman, University of California at San Francisco (2002)); QUANTA/CHARMM (Accelrys, Inc., San Diego, Calif. (1995)); Insight II/Discover (Accelrys, Inc., San Diego, Calif. (1995)); Delphi (Accelrys, Inc., San Diego, Calif. (1995)); and AMSOL (University of Minnesota) (Quantum Chemistry Program Exchange, Indiana University). These programs may be implemented, for instance, using a computer workstation, as are well known in the art, for example, a Windows, Macintosh, LINUX, SGI or Sun workstation. Other hardware systems and software packages will be known to those skilled in the art.

Once a candidate Cas9 derivative has been optimally selected or designed, as described above, substitutions may then be made in some of its amino acid residues, atoms, or side groups to improve or modify its binding properties. Generally, initial substitutions are conservative in that the replacement group will have approximately the same size, overall structure, hydrophobicity, or charge as the original group. Components known in the art to alter conformation should be avoided in making substitutions. Substituted candidates may be analyzed for efficiency of binding to a target polynucleotide using the same methods described above.

Once a candidate Cas9 derivative has been identified using any of the methods described above, it can be screened for biological activity. Any one of a number of assays for Cas9 polynucleotide binding or nuclease activity disclosed here or known to those of skill in the art may be used.

Utility

A method for identifying sites for the mutation, insertion of heterologous domains, or replacement with chimeric domains within a Cas9 polypeptide according to the present disclosure finds use in a variety of applications, which are also provided. Applications include research applications and industrial applications.

Research and industrial applications include, e.g., identifying a site within Cas9 that renders a Cas9 domain inactive and thus produces a Cas9 derivative useful in research or industrial processes, e.g., identifying a site for the insertion of heterologous sequence to alter the enzymatic activity of or add enzymatic activity to a Cas9 polypeptide and thus producing a Cas9 derivative useful in research or industrial processes, and the like.

In some cases, the Cas9 enzymatic activity or the enzymatic activity of the heterologous sequence modifies the target DNA. In some cases, the Cas9 enzymatic activity or the enzymatic activity of the heterologous sequence is nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity or glycosylase activity. In some cases, the Cas9 enzymatic activity or the enzymatic activity of the heterologous sequence is nuclease activity. In some cases, the nuclease activity introduces a double strand break in the target DNA. In some cases, the Cas9 enzymatic activity or the enzymatic activity of the heterologous sequence modifies a target polypeptide associated with the target DNA. In some cases, the Cas9 enzymatic activity or the enzymatic activity of the heterologous sequence is methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity (small ubiquitin-related modifier), deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity or demyristoylation activity.

Sites identified for the deletion of amino acid residues that do not alter Cas9 activity or sites identified for the engineering of chimeric Cas9 polypeptides using a method as described above are useful, for example, in producing a smaller or more compact Cas9 polypeptide. The above described methods have utility in research applications for determining how large and small Cas9 variants are related and how related Cas9 variants carry out similar catalytic functions, e.g., computational comparison of AnaCas9 domains with domains of Cas9 from a different species, e.g., computational comparison of SpyCas9 domains with domains of Cas9 from a different species, e.g., computational comparison between SpyCas9 and AnaCas9.

EXAMPLES

The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g. amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Celsius, and pressure is at or near atmospheric. Standard abbreviations may be used, e.g., bp, base pair(s); kb, kilobase(s); pl, picoliter(s); s or sec, second(s); min, minute(s); h or hr, hour(s); aa, amino acid(s); kb, kilobase(s); bp, base pair(s); nt, nucleotide(s); and the like.

Example 1

SpyCas9

SpyCas9 Expression and Purification

Streptococcus pyogenes Cas9 (SpyCas9) was cloned into a custom pET-based expression vector encoding an N-terminal His6-tag followed by Maltose-Binding Protein (MBP) and a tobacco etch virus (TEV) protease cleavage site (plasmid pMJ806, SEQ ID NO:33). Point mutations were introduced into SpyCas9 using site-directed mutagenesis and verified by DNA sequencing.

For crystallization, wild-type (WT) (SEQ ID NO:1) and K848C mutant SpyCas9 (SEQ ID NO:26) proteins were expressed and purified. The protein was purified by a combination of Ni-NTA (nitrilotriacetic acid) affinity, cation exchange (SP sepharose) and gel filtration (Superdex 200) chromatography steps. The final gel filtration step was carried out in elution buffer containing 20 mM HEPES (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid)-KOH pH 7.5, 250 mM KCl and 1 mM TCEP (tris(2-carboxyethyl)phosphine). The protein was concentrated to 4-6 mg ml-1 and flash frozen in liquid N2. Selenomethionine (SeMet) substituted SpyCas9 was expressed and purified as for native SpyCas9, except that all chromatographic solutions were supplemented with 5 mM TCEP.

SpyCas9 Crystallization and Structure Determination

SpyCas9 crystals were grown using the hanging drop vapor diffusion method at 20° C. by mixing equal volumes (1.5 µl + 1.5 µl) of protein solution and crystallization buffer (0.1 M Tris-Cl pH 8.5, 0.2-0.3 M Li2SO4 and 14-15% (w/v) PEG 3350). Crystal nucleation and growth was gradually improved using iterative microseeding. For diffraction experiments, the crystals were cryoprotected in situ by stepwise exchange into a solution containing 0.1 M Tris-Cl pH 8.5, 0.1 M Li2SO4, 35% (w/v) PEG (polyethylene glycol) 3350, and 10% ethylene glycol in five steps executed at 5 min intervals. In each step, 0.5 µl of mother liquor was removed from the crystal drop and replaced with 0.5 µl cryoprotectant. After the final cryoprotectant addition, the crystals were incubated for an additional 5 min, transferred to a drop containing 100% cryoprotectant for 30 s, and then flash cooled in liquid N2. Diffraction data were measured at beamlines 8.2.1 and 8.2.2 of the Advanced Light Source (Lawrence Berkeley National Laboratory), and beamlines PXI and PXIII of the Swiss Light Source (Paul Scherrer Institute) and processed using XDS. Data collection statistics are shown in Table 2. The crystals belonged to space group P2(1)2(1)2 and contained two molecules of SpyCas9 in the asymmetric unit related by pseudotranslational, non-crystallographic symmetry. High-resolution native data to 2.62 Å resolution were measured from an unusually large crystal cryoprotected in the presence of 1 mM MgCl2. A complete native data set was obtained by collecting four datasets (40° rotation per dataset) from different exposed parts of the crystal.

Phasing was performed as follows. A 4.2 Å resolution single-wavelength anomalous diffraction (SAD) dataset was measured at the selenium peak wavelength using a SeMet-substituted SpyCas9 crystal. However, due to small crystal size and low resolution, the anomalous signal in this dataset was too weak to locate the selenium sites. Additional phases were therefore obtained from SpyCas9 crystals soaked in sodium tungstate. The crystals were soaked by stepwise exchange of the lithium sulfate containing mother liquor with 0.1 M Tris-Cl pH 8.5, 0.1 M Na2WO4, 15% (w/v) PEG 3350, and then cryoprotected by stepwise exchange (as described above) of the soak solution with cryoprotectant solution supplemented with 10 mM Na2WO4. Using these crystals, a highly redundant 3.9 Å SAD dataset was measured at the tungsten L-III absorption edge (1.2149 Å), and 16 tungstate sites were located using SHELXD. Further phase information came from peak-wavelength SAD datasets obtained from a crystal of SpyCas9 K848C mutant soaked in 1 mM thimerosal for 6 hr prior to cryoprotection (thimerosal soak), a WT SpyCas9 crystal soaked with 10 mM CoCl2 during the cryoprotection procedure (Co soak), and a WT SpyCas9 crystal grown in the presence of 1 mM Er(III)-acetate. Refinement of the substructures and phase calculations were performed using the MIRAS procedure in AutoSHARP by combining initial tungstate SAD phases with the additional SAD data sets (SeMet, Co, Er and thimerosal) and the high-resolution native data. Phases were improved by density modification and two-fold non-crystallographic symmetry averaging using the Resolve module of the Phenix suite. The resulting electron density maps were of excellent quality and allowed manual model building in COOT. Selenium positions aided in assigning the sequence register. The atomic model of SpyCas9 was completed by iterative model building in COOT and refinement using Phenix.refine. Refinement and model statistics are provided in Table 2 (provided in FIG. 31).

The final atomic model has Rwork and Rfree values of 0.245 and 0.290, respectively, and good stereochemistry, as assessed with MolProbity, with 96.2% of the residues in the most favored regions of the Ramachandran plot and no outliers. The model contains two SpyCas9 molecules that superimpose with an overall rmsd of 1.1 Å over 1060 Cα atoms, the major difference being a ~5° hinge-like rotation of the HNH domain. In the atomic model, molecule A contains residues 4-102, 116-307, 314-447, 503-527, 539-570, 587-672, 677-714, 718-765, 774-791, 800-859, 862-902, 908-1027, 1036-1102, 1137-1147, 1159-1186, 1192-1242, and 1259-1363 of SEQ ID NO:1. Molecule B contains residues 4-103, 116-309, 310-447, 502-527, 539-570, 587-673, 678-713, 718-764, 773-791, 800-859, 862-902, 908-1025, 1036-1102, 1137-1148, 1160-1185, 1188-1241, and 1256-1363 of SEQ ID NO:1. The remaining residues do not appear ordered in electron density maps and could not be built. The description of the SpyCas9 structure herein is based on molecule B, which is better ordered.

An additional dataset (at 3.1 Å resolution) was measured using a SpyCas9 crystal soaked in 20 mM MnCl2 during the cryoprotection procedure. Fo-Fc difference maps calculated using the high-resolution model revealed two Mn2+ ions bound in the RuvC domain active site (FIG. 2) and 4 additional Mn2+ ions bound to each of the two SpyCas9 molecules. The HNH domain active site remained poorly ordered in this structure, and no Mn2+ binding was observed. The model was refined to an Rwork and Rfree of 0.255 and 0.280, respectively.

Endonuclease Cleavage Assays with SpyCas9

A synthetic 42-nt crRNA targeting a protospacer from the bacteriophage λ genome was purchased from Integrated DNA Technologies (IDT) and purified via 10% denaturing PAGE (polyacrylamide gel electrophoresis). tracrRNA was in vitro transcribed from a synthetic DNA template (IDT) using T7 RNA polymerase and corresponds to nucleotides 15-87 as described previously. crRNA:tracrRNA duplexes (10 µM) were prepared by mixing equimolar amounts of crRNA and tracrRNA in Hybridization Buffer (20 mM Tris-Cl pH 7.5, 100 mM KCl, 5 mM MgCl2), heating at 95° C. for 30 sec, and slow-cooling on the benchtop. SpyCas9:RNA complexes were reconstituted by mixing SpyCas9 with a 2x molar excess of the crRNA:tracrRNA duplex in Reconstitution Buffer (20 mM Tris-Cl pH 7.5, 100 mM KCl, 5 mM MgCl2, 1 mM DTT) and incubating at 37° C. for 10 minutes.

A 55 base-pair (bp) DNA target derived from the bacteriophage λ genome was prepared by mixing equimolar amounts of individual synthetic oligonucleotides (IDT) in Hybridization Buffer supplemented with 5% glycerol, heating for 1-2 minutes, and slow-cooling on the benchtop. Duplexes were separated from single-stranded DNA by 6% native PAGE conducted at 4° C., with 5 mM MgCl2 added to the gel and the running buffer. The DNA was excised, eluted into 10 mM Tris-Cl, pH 8 at 4° C. overnight, ethanol precipitated, and resuspended in Hybridization Buffer. Br-dU (bromodeoxyuridine) containing ssDNAs used in analytical crosslinking reactions were radiolabeled and hybridized with a 5x molar excess of the unlabeled complementary strand. Cleavage reactions were performed at room temperature in Reaction Buffer (20 mM Tris-Cl pH 7.5, 100 mM KCl, 5 mM MgCl2, 5% glycerol, 1 mM DTT (dithiothreitol)) using 1 nM radiolabeled dsDNA substrates and 1 nM or 10 nM Cas9:RNA. Aliquots (10 µl) were removed at various time points and quenched by mixing with an equal volume of formamide gel loading buffer supplemented with 50 mM EDTA (ethylenediaminetetraacetic acid). Cleavage products were resolved by 10% denaturing PAGE and visualized by phosphorimaging (GE Healthcare). The sequences of DNA and RNA oligonucleotides used in this study are listed in Table 3.

TABLE 3

List of nucleic acid reagents used in this study:

# | Description | SEQ ID NO. | Sequence (5'-3')
1 | tracrRNA (nts 15-87) | 15 | GGACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU
2 | Targeting crRNA | 16 | GUGAUAAGUGGAAUGCCAUGGUUUUAGAGCUAUGCUGUUUUG
3 | 55-bp DNA substrate, non-target strand(a) | 17 | GAGTGGAAGGATGCCAGTGATAAGTGGAATGCCATGTGGGCTGTCAAAATTGAGC
4 | 55-bp DNA substrate, target strand(a) | 18 | GCTCAATTTTGACAGCCCACATGGCATTCCACTTATCACTGGCATCCTTCCACTC
5 | Br-dU1 containing 55 nt DNA substrate, non-target strand(a) | 19 | GAGTGGAAGGATGCCAGTGATAAGTGGAATGCCATG(BrdU1)GGGCTGTCAAAATTGAGC
6 | Br-dU2 containing 55 nt DNA substrate, target strand(a) | 20 | GCTCAATTTTGACAGCCC(BrdU2)CATGGCATTCCACTTATCACTGGCATCCTTCCACTC
7 | reverse complement for #6(a) | 21 | GAGTGGAAGGATGCCAGTGATAAGTGGAATGCCATGAGGGCTGTCAAAATTGAGC
8 | Br-dU3 containing 55 nt DNA substrate, non-target strand(a) | 22 | GAGTGGAAGGATGCCAGTGATAAGTGGAATGCCATGTGG(BrdU3)CTGTCAAAATTGAGC
9 | reverse complement for #8(a) | 23 | GCTCAATTTTGACAGACCACATGGCATTCCACTTATCACTGGCATCCTTCCACTC

(a) The protospacer is in italics and the PAM is underlined.

S. pyogenes Cas9 Structure Reveals a Two-lobed Architecture with Adjacent Active Sites

Streptococcus pyogenes Cas9 (SpyCas9; SEQ ID NO:1) is a prototypical Type II-A Cas9 protein consisting of well-conserved RuvC motifs and an HNH domain, as well as flanking regions that lack apparent sequence similarity to known protein structures (FIG. 3). SpyCas9 was the first biochemically characterized Cas9 and has been employed in the majority of current CRISPR-based genetic engineering methodologies. To obtain structural insights into the multi-domain architecture of SpyCas9, the 2.6 Å resolution crystal structure of the enzyme was determined (see Table 2). The structure reveals that SpyCas9 is a crescent-shaped molecule with approximate dimensions of ~100 Å × ~100 Å × ~50 Å (FIG. 4 and FIG. 5). The enzyme adopts a distinct bi-lobed architecture comprising the nuclease domains and C-terminal domain in one lobe (the nuclease lobe) and a large alpha-helical domain in the other. The RuvC domain forms the structural core of the nuclease lobe, a six-stranded beta sheet surrounded by four alpha helices, with all three conserved motifs contributing catalytic residues to the active site (FIG. 5). In the Cas9 primary sequence (SEQ ID NO:1), the HNH domain is inserted between the second and third RuvC domain motifs. The HNH and RuvC domains are juxtaposed in the SpyCas9 structure, with their active sites located ~25 Å apart. The HNH domain active site is poorly ordered in apo-SpyCas9 crystals, showing that the active site undergoes conformational ordering upon nucleic acid binding. The C-terminal region of SpyCas9 contains a β-β-α-β Greek key domain that is structurally similar to a domain found in topoisomerase II (hereafter referred to as the Topo-homology domain, residues 1136Spy-1200Spy). A mixed α/β region (C-terminal domain, residues 1201Spy-1363Spy) forms a protrusion on the nuclease domain lobe. The structural halves of SpyCas9 are connected by two linking segments, one formed by the Arginine-rich region (residues 59Spy-76Spy) and the other by a disordered linker comprising residues 714Spy-717Spy (FIG. 4A).

SpyCas9 Contains Two Putative Nucleic Acid Binding Grooves

The SpyCas9 structure contains two prominent clefts on one face of the molecule: a deep and narrow groove located within the nuclease lobe and a wider groove within the alpha-helical lobe (FIG. 6). The nuclease lobe cleft is approximately 40 Å long, 15-20 Å wide and 15 Å deep, with the RuvC active site located at its bottom. The C-terminal domain forms one side of the cleft, while the HNH domain and a protrusion of the alpha-helical lobe form the other. The concave surface of the alpha-helical lobe creates a wider, shallower groove that extends over almost its entire length (FIG. 6). The groove is more than 25 Å across at its widest point, which is sufficient to accommodate an RNA-RNA or DNA-RNA duplex. Its surface is highly positively charged (FIG. 6), especially at the Arg-rich segment comprising R69Spy, R70Spy, R71Spy, R75Spy and K76Spy. Additional basic residues that are conserved in Type II-A proteins project their side chains into the groove: His160Spy, Lys163Spy, Lys288Spy, Arg400Spy, Lys401Spy, and Arg403Spy (FIG. 7). Multiple sulfate or tungstate ions are bound to the alpha-helical lobe in the SpyCas9 crystals (FIG. 7), showing a role for this lobe in nucleic acid recognition. Amino acid residues located in both the nuclease and alpha-helical lobe clefts are highly conserved within Type II-A Cas9 proteins (FIG. 6), while the opposite face of the SpyCas9 molecule lacks extensive surface conservation. The above observations demonstrate that both clefts play important functional roles. The RuvC domain mediates cleavage of the non-target DNA strand and the nuclease domain cleft is the binding site of the displaced non-target strand. Conversely, the alpha-helical lobe, which contains the Arg-rich segment, is involved in binding the crRNA:tracrRNA guide RNA and/or the crRNA-target DNA heteroduplex.

Identification of Cas9 PAM Binding Site

SpyCas9 recognizes a 5'-NGG-3' PAM sequence located three base pairs to the 3' side of the cleavage site on the non-complementary DNA strand, whereas other Cas9 orthologs have different PAM requirements. To gain insight into PAM binding by SpyCas9, the SpyCas9 RuvC nuclease domain structure was compared to that of the RuvC Holliday junction resolvase-substrate complex (FIG. 8). Superposition of these RuvC structures enabled us to model the trajectory of the non-target DNA strand in the SpyCas9 holoenzyme (FIG. 8, FIG. 2B-C). The DNA strand is located along the length of the nuclease lobe cleft in an orientation that would position the 3' end of the DNA, and hence the PAM, at the junction of the two lobes, in the vicinity of the Arg-rich segment and the Topo-homology domain (FIG. 8).

To directly identify regions of Cas9 involved in PAM binding, the catalytically inactive SpyCas9 (D10A/H840A; SEQ ID NO:27) was reconstituted with a crRNA:tracrRNA guide RNA and bound to DNA targets carrying a photoactivatable 5-bromodeoxyuridine (Br-dU) nucleotide adjacent to either end of the GG PAM motif on the non-target strand (FIG. 9). Following UV irradiation and trypsin digestion, covalent peptide-DNA crosslinks were detected (FIG. 9 and FIG. 10), whereas a DNA substrate containing Br-dU on the target strand opposite the PAM failed to produce a crosslink (FIG. 10). After treatment with nuclease and phosphatase to digest cross-linked DNA, nano-HPLC (high-performance liquid chromatography) MS/MS (tandem mass spectrometry) was performed to identify tryptic peptides containing an extra mass resulting from covalent dU or p-dU adducts (FIG. 9). The nucleotide immediately 5' to the GG motif was found to be cross-linked to residue W476Spy, whereas the residue immediately 3' to the motif was found to be cross-linked to residue W1126Spy. Both tryptophans are located in disordered regions of the SpyCas9 structure that are ~30 Å apart. W476Spy resides in a 53-aa loop at the edge of the alpha-helical lobe underneath the Arg-rich region, whereas W1126Spy is in a 33-aa loop that connects the RuvC domain and the Topo-homology domain (FIG. 8). These tryptophan residues are conserved among Type II-A Cas9 proteins that utilize the same NGG PAM to cleave target DNA in vitro, but are absent from the Neisseria meningitidis and Streptococcus thermophilus Type II-C Cas9 proteins, which are known to recognize different PAMs (FIG. 5 and FIG. 11).

To test the roles of both loops in DNA target recognition and cleavage, triple alanine substitutions of residues 475Spy-477Spy (P-W-N) and 1125Spy-1127Spy (D-W-D) were made and cleavage assays with double-stranded DNA targets were performed (FIG. 12). SpyCas9 mutated in residues 1125Spy-1127Spy showed wild-type cleavage activity, whereas mutations in residues 475Spy-477Spy caused a decrease of activity compared to wild-type. Mutating both loops simultaneously almost completely abolished SpyCas9 activity (FIG. 12). These data demonstrate that at least one tryptophan is necessary to promote the DNA cleavage reaction. The spatial constraints of crosslink formation and the distance of both tryptophan residues from either nuclease domain show that they are involved in PAM binding.

Example 2

AnaCas9

AnaCas9 Expression and Purification

Full-length Actinomyces naeslundii Cas9 (AnaCas9; SEQ ID NO:7, residues 1-1101) was subcloned into a custom pET-based expression vector with an N-terminal His10-tag followed by Maltose-Binding Protein (MBP) and a TEV protease cleavage site. The protein was overexpressed in Escherichia coli strain Rosetta (DE3) and was purified to homogeneity by immobilized metal ion affinity chromatography and heparin affinity chromatography. An additional gel filtration chromatography step (HiLoad 16/60 Superdex200, GE Healthcare) was added to further purify AnaCas9 and remove trace nucleic acid contaminants prior to crystallization. Purified AnaCas9 protein in gel filtration buffer (50 mM HEPES 7.5, 300 mM KCl, 2 mM TCEP, 5% glycerol) was snap frozen in liquid nitrogen and stored at -80° C. Selenomethionine-labeled AnaCas9 protein was expressed in Rosetta (DE3) cells grown in M9 minimal medium supplemented with 50 mg ml-1 L-SeMet (L-selenomethionine) (Sigma) and specific amino acids to inhibit endogenous methionine synthesis. The SeMet-substituted protein was then purified using the same procedure as for the native AnaCas9 protein.

AnaCas9 Crystallization and Structure Determination

Crystals of native and SeMet-substituted AnaCas9 were grown by the hanging drop vapor diffusion method at 20° C. Aliquots (2.5 µl) of 4.5 mg ml-1 native AnaCas9 protein in 50 mM HEPES 7.5, 300 mM KCl, 2 mM TCEP, 5% glycerol were mixed with 2.5 µl of reservoir solution containing 10% (w/v) PEG 8000, 0.25 M calcium acetate, 50 mM magnesium acetate and 5 mM spermidine. Crystals appeared after 1-2 days, and they grew to a maximum size of 0.15 × 0.20 × 0.35 mm over the course of 6 days. SeMet-substituted AnaCas9 crystals were grown and optimized under the same conditions. Crystals of AnaCas9 bound to manganese (II) ions were prepared by soaking AnaCas9 native crystals in mother liquor supplemented with 20 mM MnCl2 for 2 hr. For cryogenic data collection, crystals were transferred into crystallization solutions containing 30% (v/v) glycerol as the cryoprotectant and then flash-cooled at 100 K. Native and SeMet single-wavelength anomalous diffraction (SAD) datasets were collected at beamline 8.3.1 of the Advanced Light Source, Lawrence Berkeley National Laboratory. Data from manganese-soaked AnaCas9 crystals were collected at the 8.2.2 beamline of the Advanced Light Source, Lawrence Berkeley National Laboratory. All diffraction data were integrated using Mosflm and scaled in SCALA.

The AnaCas9 structure was solved using the single anomalous dispersion phasing method. Using SeMet data between 79.0 and 3.2 Å resolution, both SHELXD in HKL2MAP and HySS in Phenix detected a total of 13 out of 18 possible selenium sites in the asymmetric unit. Initial phases were calculated using SOLVE followed by solvent flattening with RESOLVE to produce an electron-density map into which most of the protein residues could be unambiguously built. The initial model automatically generated from the Phenix AutoBuild module was subjected to subsequent iterative rounds of manual building with COOT and refinement against the 2.2 Å native data in Refmac and Phenix. The final model contains one zinc ion, two magnesium ions, AnaCas9 residues 8-49, 65-98, 134-170, and 225-1101 of SEQ ID NO:7, and has Rwork and Rfree values of 0.19 and 0.23, respectively. The N terminus (residues 1-7), loop regions (residues 50-64), and a portion of the alpha-helical lobe (residues 99-133, 171-224) are completely disordered. Model validation showed 94% of the residues in the most favored and 5.8% in the allowed regions of the Ramachandran plot. The structure of Mn2+-bound AnaCas9 was obtained by molecular replacement using the program Phaser, which revealed two unambiguously refined Mn2+ ions present in the RuvC active site. All statistics of the data processing and structure refinement of AnaCas9 are summarized in Table 4 (provided in FIG. 32).

A. naeslundii Cas9 Structure Reveals the Architecture of a Smaller Cas9 Variant

The 2.2 Å resolution crystal structure of the Type II-C Cas9 enzyme from Actinomyces naeslundii (AnaCas9) was determined (Table 4). AnaCas9 folds into a bi-lobed structure with approximate dimensions 105 Å × 80 Å × 55 Å. The RuvC and HNH nuclease domains, a Topo-homology domain, and the C-terminal domain form an extended nuclease lobe with the RuvC domain located at its center (FIG. 13A-B). Similar to SpyCas9, the RuvC and HNH domains comprise a compact catalytic core, with the two active sites positioned ~30 Å apart. In contrast to SpyCas9, an additional domain (residues 822Ana-924Ana, hereafter referred to as the beta-hairpin domain) is found between the RuvC-III motif and the Topo-homology domain, and adopts a novel fold composed primarily of three anti-parallel beta-hairpins. As in SpyCas9, the polypeptide sequence found between the RuvC-I and RuvC-II motifs forms an alpha-helical lobe. However, the AnaCas9 alpha-helical lobe is much smaller in size, and its relative orientation to the nuclease lobe is different (FIG. 13C and FIG. 14A-C). The Arg-rich region (residues 64Ana-80Ana) connecting the nuclease lobe and the alpha-helical lobe is highly flexible, as indicated by elevated B-factors (FIG. 15). Comparison of the helical lobes in AnaCas9 and SpyCas9 reveals that regions 95Ana-251Ana and 77Spy-447Spy are highly divergent and do not align in sequence and structure (FIG. 11). Moreover, the 95Ana-251Ana region is poorly ordered, and only parts of it could be modeled. By contrast, residues 252Ana-468Ana and 502Spy-713Spy, which share ~32% sequence identity, superimpose with a root mean square deviation (rmsd) of ~3.6 Å over 149 Cα (FIG. 13C and FIG. 14).

domain, and the RNA-DNA heteroduplex would additionally clash sterically with the C-terminal domain (FIG. 18). In AnaCas9, the bound crRNA-target DNA heteroduplex would conversely make few contacts with the protein outside of the HNH domain in the absence of HNH domain reorientation (FIG. 18). The two Cas9 structures in similar auto-inhibited states show the general feature of auto-inhibition of Cas9 enzymes that is consistent with the observation that Cas9 enzymes are inactive as nucleases in the absence of bound guide RNAs. Cas9 enzymes clearly undergo a conformational rearrangement upon guide RNA and/or target DNA binding.

Example 4

A Common Cas9 Functional Core Shows Structural Plasticity that Supports RNA-Guided DNA Cleavage

Comparison of the SpyCas9 and AnaCas9 structures reveals a conserved functional core consisting of the RuvC
The position and orientation of and HNH domains, the Arg - rich region , and the Topo this portion of the alpha -helical domain with respect to the homology domain , with divergent C - terminal and alpha RuvC domain in the AnaCas9 and SpyCas9 structures are helical domains ( FIG . 19 ) . In both SpyCas9 and AnaCas9 substantially different, with a large displacement of ~ 70 Å 25 structures, the Arg - rich region connects the nuclease and towards the RuvC domain and an approximately 35° rotation helical lobes of the proteins . The central position of the about the junction between two domains in AnaCas9 (FIG . Arg - rich segment and its proximity to the PAM -binding 14C ) . loops in SpyCas9 shows that this is involved in guide RNA The higher resolution of the AnaCas9 structure provides and/ or target DNA binding and functions as a hinge to insights into active - site chemistries for both nuclease 30 enable conformational rearrangements in the enzyme. domains. The well -defined AnaCas9 HNH active site con - Differences between SpyCas9 and AnaCas9 illustrate the tains a two - stranded antiparallel B -sheet flanked by two structural divergence that allows Cas9 enzymes to associate a -helices on each side , as well as a non - conserved zinc site with different guide RNAs and have different PAM require coordinated by Cys566Ana , Cys569Ana , Cys602Ana and m ents . Although the helical lobes of SpyCas9 and AnaCas9 Cys6054na ( FIG . 16 and FIG . 17 ). The HNH active site 35 share a common region ( residues 2524na -4684na versus reveals Asp581 Ana and Asn606Ana coordinating a hydrated 502 - py - 713 -py ) , the orientation of this part of the protein magnesium ion ( FIG . 16 ) that is involved in binding the relative to the nuclease lobe varies widely between the two scissile phosphate in the target DNA strand . The imidazole structures ( FIG . 19 ) . 
The divergent conformations of the side chain of the catalytic residue His582 Ana ( corresponding alpha helical lobes controls recognition of diverse guide to His840Spy) acts as a general base in deprotonating the 40 RNAs present in Type II - A versus II - C CRISPR - Cas9 sys attacking water nucleophile , in agreement with a one -metal - tems. The PAM interacting regions identified in SpyCas9 are ion catalytic mechanism common to endonucleases contain located in loops that are highly variable within Cas9 ing the Bha -Metal motif . In the RuvC domain , two Mn + enzymes . In AnaCas9 , the beta - hairpin domain ( residues ions , spaced 3 . 8 Å apart and coordinated by the invariant 822 Ana - 924Ana ) is inserted at a position corresponding to one residues Asp17Ana , Glu505Ana , His736Ana and Asp739Ana 45 of the SpyCas9 PAM loops ( 1102 Spy - 1136Spy) , showing that are consistent with a two -metal ion mechanism . AnaCas9 employs a distinctmechanism of PAM recognition (FIG . 13C and FIG . 11) . The beta -hairpin domain is not Example 3 conserved in all Type II - C Cas9 proteins (FIG . 11 and FIG . 20 ) , further underscoring the notion that the sequence - and SpyCap9 and AnaCap9 Adopt Auto - Inhibited 50 structurally -divergent regions of Cas9 proteins may have Conformations in the Apo State co - evolved with specific guide RNA structures and PAM sequences . Target DNA cleavage by Cas9 enzymes requires the concerted nuclease activities of the RuvC and HNH domains Example 5 following base - pairing between the crRNA guide and the 55 target DNA to induce R - loop formation . Although SpyCas9 Determination of the Structures of Nucleotide and AnaCas9 adopt distinct conformations in their helical Bound Cas9 lobes, the relative orientations of the RuvC and HNH active sites within the nuclease lobes are very similar (FIG . 13C Preparation of crRNA and tracrRNA and FIG . 14 ). 
In both structures , the HNH active sites face 60 A42 -nucleotide ( nt) crRNA targeting a protospacer found outwards, away from the putative nucleic acid binding clefts in the bacteriophage y genome was ordered synthetically ( FIG . 4A - B ) . Structural superpositions with the DNA -bound ( Integrated DNA Technologies ; IDT; SEQ ID NO : 16 ) and complex of the HNH homing endonuclease I -Hmul (FIG . purified by 10 % denaturing PAGE . Biotinylated crRNA was 16 ) shows that this orientation is unlikely to be compatible prepared similarly and contained a 5 -adenosine linker fol with target DNA binding and cleavage (FIG . 18 ). In Spy - 65 lowed by biotin at its 3' end . tracrRNA was in vitro tran Cas9, the HNH domain active site is blocked by a beta - scribed from a synthetic DNA template using T7 polymerase hairpin formed by residues 1049Spy - 10599py of the RuvC (SEQ ID NO : 15 ) . crRNA : tracrRNA duplexes ( 5 uM ) were US 9 ,963 , 689 B2 35 36 prepared by mixing equimolar amounts of crRNA and the biotinylated species , according to the manufacturer ' s tracrRNA in Hybridization Buffer (20 mM Tris -C1 pH 7 . 5 , unit definition ( ~ 65 ng /uL in the final reaction volume) , 100 mM KCI, 5 mM MgCl2 ) , heating at 95° C . for 30 followed by an additional 10 minute incubation at 37° C . seconds , and slow - cooling on the benchtop . To attach a before storing on ice . Catalytically inactive Cas9 (D10A / biotin moiety to the tracrRNA , a modified tracrRNA was 5 H840A ) was used to generate the following samples : unla transcribed that carries the following additional sequence at beled Cas9 :RNA :DNA . Cas9 :RNA :DNA containing biotin its 3 ' end beyond residue U89 : 5 -GCUCGUGCGC - 3 ' (SEO modifications on one or both ends of the duplex , and ID NO :28 ). A complementary biotinylated DNA oligonucle Cas9: RNA :DNA containing an N - terminal MBP . 
Wild - type otide , 5 '- biotin - TTGCGCACGAGCAAA - 3 ' ( IDT; SEQ ID Cas9 was used to generate apo - Cas9 and all Cas9 :RNA NO : 29 ), was included during the crRNA : tracrRNA hybrid - 10 complexes. ization reaction at a 2x molar excess over tracrRNA . Preparation of Double - stranded DNA Substrates Negative - stain Electron Microscopy A 55 base - pair (bp ) DNA target (SEQ ID NO : 17 - 18 ) Cas9 complexes were diluted for negative - stain EM to a derived from the bacteriophage y genome was prepared by concentration of ~ 25 -60 nM in 20 mM Tris -HCl pH 7 . 5 , 200 mixing 5 nmol of individual synthetic oligonucleotides 15 mmmM KCl, 1 mM DTT , and 5 % glycerol immediately before ( IDT) in Hybridization Buffer supplemented with 5 % glyc - applying the sample to glow - discharged 400 mesh continu erol, heating for 1 - 2 minutes, and slow - cooling on the ous carbon grids. After adsorption for 1 min , the samples benchtop . Duplexes were separated from single - stranded were stained consecutively with six droplets of 2 % (w /v ) DNA by 5 % native PAGE conducted at 4° C . , with 5 mM uranyl acetate solution . Then the residual stain was gently MgCl2 added to the gel and the running buffer . The DNA 20 blotted off and the samples were air -dried in a fume hood . was excised , eluted into 10 mM Tris - C1 pH 7 . 5 overnight at Data were acquired using a Tecnai F20 Twin transmission 4° C . , ethanol precipitated , and resuspended in Hybridiza electron microscope operated at 120 keV (kiloelectron volt ) tion Buffer . Duplexes containing a 3 '- biotin on one or both at a nominal magnification of either 80 ,000x ( 1 .45 Å at the strands were prepared similarly . specimen level) or 100 , 000x ( 1 .08 Å at the specimen level ) Activity Assays 25 using low - dose exposures (~ 20 e - Å - 2 ) with a randomly set Cas9: RNA complexes were reconstituted by mixing defocus ranging from - 0 . 5 to - 1 . 3 um . 
A total of 300 - 400 equimolar amounts of Cas9 and the crRNA : tracrRNA images of each Cas9 sample were automatically recorded on duplex in Reaction Buffer ( 20 mM Tris - C1 pH 7 . 5 , 100 mm KC1, 5 mM MgCl2 , 5 % glycerol, 1 mM DTT ) and incubat a Gatan 4 kx4 k CCD camera using the MSI- Raster appli ing at 37° C . for 10 minutes . Cleavage reactions were 30 cation within the automated macromolecular microscopy conducted at room temperature in Reaction Buffer using tre1 - 2 30 software LEGINON . nM radiolabeled DNA substrates and 100 nM Cas9 :RNA . Single -particle Pre - processing Aliquots were removed at various time points and quenched All image pre- processing and two- dimensional classifi by mixing with an equal volume of formamide gel loading cation was performed in Appion . The contrast transfer buffer supplemented with 50 mM EDTA . Cleavage products 3535 function (CTF ) of each micrograph was estimated , and were resolved by 10 % denaturing PAGE and visualized by particles were selected concurrently with data collection phosphorimaging . The fraction of DNA cleaved at each time using ACE2 and a template -based particle picker, respec point was quantified using ImageQuant (GE Healthcare ), tively . Micrograph phases were corrected using ACE2 , and and kinetic time courses were fit with a single exponential the negatively - stained Cas9 particles were extracted using a decay using Kaleidagraph (Synergy Software ) to extract 40 288x288 - pixel box size . The particle stacks were binned by pseudo first- order rate constants . Each modified Cas9, RNA , a factor of 2 for processing , and particles were normalized and DNA construct was tested in three independent experi - to remove pixels whose values were above or below 4 .5 - 0 ments . of the mean pixel value using XMIPP . 
Binding reactions ( 15 ul) were conducted in Reaction Random Conical Tilt Reconstruction Buffer and contained - 0 .25 nM radiolabeled DNA and 45 Initial models for reconstructions of both apo - Cas9 and increasing concentrations of D10A / H840A Cas9 : RNA com - Cas9 :RNA :DNA samples were determined using random plex that was reconstituted with a 10x molar excess of conical tilt (RCT ) methodology . Tilt - pairs of micrographs crRNA : tracrRNA duplex over Cas9 . Reactions were incu were recorded manually at 0° and 55° , and ab initio models bated at 37° C . for one hour and resolved by 5 % native were generated using the RCT module in Appion . Particles PAGE conducted at 4° C ., with 5 mM MgCl2 added to the 50 gel and the running buffer . Bound and unbound DNA was were correlated between tilt -pairs using TiltPicker , binned visualized by phosphorimaging, and the fraction of DNA by 2 , and extracted from raw micrographs . Reference - free bound at each Cas9 concentration was quantified using class averages were produced from untilted particle images ImageQuant (GE Healthcare ). Binding curves were fit using by iterative 2D alignment and classification using MSA Kaleidagraph ( Synergy Software ) . 5555 MRA in IMAGIC . These class averages served as references Complex Reconstitution for Negative - stain EM for SPIDER reference- based alignment and classification , All samples for EM ( electron microscopy ) ( 10 ul vol- and RCT volumes were calculated for each class average umes ) were prepared in Reaction Buffer at a final Cas9 using back - projection in SPIDER based on these angles and concentration of 1 uM . Cas9 :RNA complexes contained 2 shifts . The RCT model from the most representative class UM crRNA : tracrRNA duplex and were incubated at 37° C . 60 ( largest number of particles ) was low -pass filtered to 60 - Å for 10 minutes before storing on ice until grid preparation . 
resolution and used to assign Euler angles to the entire data Cas9 :RNA :DNA complexes were prepared by first generat- set of reference - free class averages . The resulting low ing Cas9 :RNA as before and then adding the DNA duplex at resolution model was again low -pass filtered to 60 - Å reso 5 uM ( unlabeled ) or 2 uM (biotin labeled ) and incubating an lution and used as the initial model for refinement of the additional 10 minutes at 37° C . When present, streptavidin 65 three - dimensional structure by iterative projection matching (New England Biolabs ) was added after formation of Cas9 : using the untilted particle images as previously described , RNA or Cas9: RNA : DNA complexes at a 2x unit excess over with libraries from EMAN2 and SPARX software packages. US 9 , 963 ,689 B2 37 38 Domain Mapping and Localization of RNA - and DNA double -stranded DNA substrate . Reconstitutions were con ends ducted at substrate concentrations expected to saturate Cas9 , Particle stacks were binned by a factor of 2 and subjected to five rounds of iterative multivariate statistical analysis erence - free 2D class averages of the DNA -bound complex (MSA ) and multi - reference alignment (MRA ) using the 5 (Cas9 :RNA :DNA ) showed a large - scale conformational IMAGIC software package , to generate two - dimensional change , with both lobes separating from one another into class averages of each complex . The resulting set of class discrete structural units. Using the apo -Cas9 structure low averages for each species was normalized using 'proc2d ' in pass filtered to 60 Å as a starting model, we obtained a 3D reconstruction of Cas9: RNA : DNA at - 19 Å resolution (us EMAN . The EMAN classification program ' classesbymra ing the 0 .5 FSC criterion ) that reveals a substantial reorga was used to match the labeled class average to the best - 10 nization of the major lobes (FIG . 22 ). 
The shape of the larger matching unlabeled class average based on cross - correlation lobe remains relatively unchanged from that in apo - Cas9 coefficients . The difference maps were calculating by sub ( CCC of 0 . 78 ) , but the smaller lobe rotates by ~ 100 degrees tracting the unlabeled class average from the labeled class with respect to its position in the apo structure (FIG . 22 ) . A averages using ‘proc2d ' in EMAN . This same strategy was reconstruction of Cas9 :RNA :DNA using the N - terminal used to match the unlabeled class average to the best - 15 MBP fusion (FIG . 25 ) confirmed that the nuclease domain matching reprojection of the corresponding structure . The containing lobe is rearranged with respect to the alpha Euler angles used for creating the reprojection were applied helical lobe in this complex . This rearrangement forms a to the 3D electron density using ' proc3d ,' and the surface central channel with a width of - 25 - A that spans the length representation visualized in Chimera is shown along with its of both lobes . corresponding reprojection . 20 Determination of Cas9 Domain Reorientation Due to 3D Reconstruction and Analysis RNA Binding Three - dimensional reconstructions were all performed The architecture of Cas9 :RNA in the absence of a bound using an iterative projection -matching refinement with target DNA molecule was analysed and reference - free 2D libraries from the EMAN2 and SPARX software packages . class averages of the Cas9 :RNA showed a clear central Refinement of the RCT starting models began using an 25 channel similar to Cas9 : RNA :DNA . Using the 3D recon angular increment of 25° , progressing down to 4° for all struction of Cas9 :RNA :DNA low - pass filtered to 60 Å as a reconstructions. The resulting model was again low -pass stostarting model , a reconstruction of Cas9: RNA at ~ 21 Å filtered to 60 - Å resolution and subjected to iterative projec - resolution ( using the 0 . 
5 FSC criterion ) was obtained , which tion -matching refinement to obtain the final structure . In an revealed a conformation similar to that of the DNA - bound alternative approach for apo - Cas9 and Cas9 :RNA :DNA , we 30 complex (CCC of 0 .89 with DNA - bound vs. 0 .81 with apo ) , used a low -pass filtered model of the other structure after with a central channel extending between the two lobes initial refinement with untilted particles as an initial model (FIG . 22 ) . Limited proteolysis experiments show that both for the above -mentioned projection matching refinement . the Cas9 :RNA and Cas9 :RNA :DNA complexes are more This led to EM densities with similar structural features as resistant to trypsin than apo -Cas9 and display similar diges the RCTmodels ( FIG . 21A - B ) , and the structures converged 35 tion patterns , in agreement with these nucleic acid - bound to the finalmodels presented in FIG . 22 . The resolution was complexes occupying a similar structural state (FIG . 26 ) . estimated by splitting the particle stack into two equally While the smaller lobe undergoes an additional - 50 degree sized data sets and calculating the Fourier shell correlation rotation along an axis perpendicular to the channel in the ( FSC ) between each of the back -projected volumes . The DNA -bound complex compared to Cas9 : RNA , the same final reconstructions of Caso , Cas9 :RNA , and Cas9 :RNA : 40 ~ 100 degree rotation around the channel is present in both DNA showed structural features to - 19 - A , - 21- A , and structures . Thus , loading of crRNA and tracrRNA alone is - 19 - Å resolution , respectively , based on the 0 . 5 Fourier sufficient to convert the endonuclease into an active confor shell correlation criterion . Reprojections of the final three - mation for target surveillance . dimensional reconstruction showed excellent agreement Determination of Bound DNA and RNA Orientation with the reference - free class averages ( FIG . 
23A - C ) and 45 within Cas9 displayed a large distribution of Euler angles , despite some Cas9 :RNA :DNA complexes were formed using DNA preferential orientations of the particles on the carbon film . substrates containing 3 - biotin modifications ( Table 5 ) to The final reconstruction was segmented using Segger in visualize the duplex ends via streptavidin labeling . Nega Chimera based on inspection of the similarities between tive - stain EM analysis of samples labeled at either the lobes in the apo -Cas9 and Cas9 :RNA :DNA reconstructions. 50 PAM - distal (non -PAM ) end or both ends showed additional A modeled A - form duplex was manually docked into the circular density below , or both above and below the com map with Chimera , using information from the labeling plex , respectively , along the central channel positioned experiments and map segmentation , and by accommodating between the two structural lobes ( FIG . 27 ) . These data show the substrate within the channel in the EM reconstruction . that the major lobes of Cas9 enclose the target DNA , The EM - derived density map correlated closely with the 55 positioning the RNA :DNA heteroduplex along the central structural features present in the X - ray crystal structure . The channel with the PAM oriented near the top . Finally , the alpha - helical and nuclease domain lobes of the X -ray crystal orientation of RNA within Cas9 :RNA complexes was deter structure were computationally docked as rigid bodies into mined using streptavidin labeling of crRNA and tracrRNA the larger and smaller lobes of our EM structure using containing biotin at their 3 ' termini, after ensuring that Cas9 SITUS with cross - correlation coefficients (CCC ) of 0 .74 and 60 retains full activity with these modified RNAs. Using the 0 .66 , respectively ( FIG . 24 ) . same 2D and 3D difference mapping approach , the 3 ' end of Determination of Cas9 Domain Reorientation Due to the crRNA was localized to the top of the channel (FIG . 
27 ) Nucleic Acid Binding and the 3 ' end of the tracrRNA was shown to extend roughly Using a catalytically inactive D10A /H840A - Cas9 mutant perpendicular to the central channel from the side of the that retains DNA binding activity , ribonucleoprotein com - 65 nuclease domain lobe ( FIG . 27 ) . The similar positions above plexes containing full - length crRNA and tracrRNA (Cas9 : the channel of the 3 ' end of the crRNA and the PAM RNA ) were prepared and bound to a 55 base -pair (bp ) proximal side of the target shows that the crRNA : DNA US 9 , 963 ,689 B2 39 40 heteroduplex orients roughly in parallel with the crRNA : TABLE 5 - continued tracrRNA duplex . Sequences of nucleic acids used . TABLE 5 SEQ 5 ID Sequences of nucleic acids used . Description NO : Sequence (5 ' - 3 ' ) SEQ ID 3 ' - Biotinylated DNA , 17 GAGTGGAAGGATGCCAGTG Description NO : Sequence (5 ' - 3 ' ) non - target strand ATAAGTGGAATGCCATGtg 10 gGCTGTCAAAATTGAGC - Oligo for preparing 30 TAATACGACTCACTATA Biotin doublestranded T7 promoters for in 3 ' - Biotinylated DNA , 18 GCTCAATTTTGACAGCCCA vitro transcription target strand CATGGCATTCCACTTATCA ??GGCATcc???????? SSDNA template for 31 AAAAAGCACCGACTCGGTG Biotin transcribing tracrRNA CCACTTTTTCAAGTTGATA 15 AcGGACTAGccTTATTTTA Reverse complement of the T7 promoter is in bold uppercase ACTTGCTATGCTGTCCTAT Nucleoltides hybridizing between the tracrRNA ext and biotin AGTGAGTCGTATTA DNA are in italics Protospacer is underlined and the PAM is in bold lowercase tracrRNA ( nts 15 - 87 ) 15 GGACAGCAUAGCAAGUUAA AAAAGGCUAGUCCGUUAU 20 The channel between the lobes of Cas9 accommodates CAACUUGAAAAAGUGGCAC ~ 25 bp of a modeled A - form helix (FIG . 28 ) . Exonuclease CGAGUCGGUGCUUUUU III footprinting experiments show that Cas9 protects a ssDNA template for 32 GCGCACGAGCAAAAAAAGC ~ 26 - bp segment of the target DNA (FIG . 29A - B ) . 
Addition transcribing ACCGACTCGGTGCCACTTT ally , P1 nuclease mapping experiments reveal that the dis tracrRNA ext TTCAAGTTGATAACGGACT 25 placed non - target strand is susceptible to degradation AGCCTTATTTTAACTTGCT towards the 5 ' end of the protospacer, while the target strand ATGCTGTCCTATAGTGAGT that hybridizes to crRNA is protected along nearly its entire CGTATTA length . These results are consistent with the formation of an tracrRNA _ ext 28 GGACAGCAUAGCAAGUUAA R - loop structure ( FIG . 29A - B ) . AAUAAGGCUAGUCCGUUAU RNA loading drives critical rearrangements of the Cas9 CAACUUGAAAAAGUGGCAC 30 enzyme to enable productive encounters with target DNA CGAGUCGGUGCUUUUUUUG (FIG . 30 ) . Binding of crRNA : tracrRNA to Cas9 causes a CUCGUGCGC substantial rotation of the small nuclease lobe relative to the Biotinylated DNA 29 Biotin - TTGCGCACGAGC larger lobe to form a central channel. This RNA - induced oligo to hybridize AAA conformational change occurs either through allostery or via to tracrRNA _ ext 35 direct interactions between the RNA and both lobes, and this reorganization aids in positioning the two major catalytic Targeting crRNA 16 GUGAUAAGUGGAAUGCCAU GGUUUUAGAGCUAUGCUGU centers of the enzyme on opposite sides of the channel, UUUG where the two separated strands are threaded into either active site . Non - targeting 24 GACGCAUAAAGAUGAGACG 40 While the present invention has been described with crRNA ( control ) CGUUUUAGAGCUAUGCUGU reference to the specific embodiments thereof, it should be UUUG understood by those skilled in the art that various changes 55 - bp DNA substrate , 17 GAGTGGAAGGATGCCAGTG may be made and equivalents may be substituted without non - target strand ATAAGTGGAATGCCATGtg departing from the true spirit and scope of the invention . 
In gGCTGTCAAAATTGAGC 45 addition , many modifications may be made to adapt a 55 - bp DNA substrate , GCTCAATTTTGACAGCCCA particular situation , material, composition of matter , pro target strand CATGGCATTCCACTTATCA cess, process step or steps , to the objective , spirit and scope CTGGCATcc???????? of the present invention . All such modifications are intended to be within the scope of the claims appended hereto .

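The activity assays of Example 5 extract pseudo first-order rate constants from single-exponential fits performed in Kaleidagraph. As an illustration only, the Python sketch below estimates such a rate constant with a log-linearized least-squares fit through the origin. This is a simplification swapped in for the nonlinear fit, and it assumes the reaction proceeds to completion (amplitude of 1). The function name `fit_first_order_rate` and the synthetic time course are hypothetical and are not measurements from this document.

```python
import math


def fit_first_order_rate(times, fraction_cleaved, amplitude=1.0):
    """Estimate k for f(t) = amplitude * (1 - exp(-k * t)).

    Linearizes the model as -ln(1 - f/amplitude) = k * t and solves the
    least-squares slope through the origin. Points at t <= 0, or with no
    substrate remaining, carry no usable information and are skipped.
    """
    sum_ty = 0.0
    sum_tt = 0.0
    for t, f in zip(times, fraction_cleaved):
        remaining = 1.0 - f / amplitude
        if t <= 0 or remaining <= 0:
            continue
        sum_ty += t * -math.log(remaining)
        sum_tt += t * t
    if sum_tt == 0:
        raise ValueError("no usable time points")
    return sum_ty / sum_tt


# Synthetic, noise-free time course (arbitrary time units), not real data.
K_TRUE = 0.3
TIMES = [0.5, 1.0, 2.0, 5.0, 10.0]
SYNTHETIC = [1.0 - math.exp(-K_TRUE * t) for t in TIMES]

k_est = fit_first_order_rate(TIMES, SYNTHETIC)

if __name__ == "__main__":
    print(f"estimated k = {k_est:.4f} per time unit")
```

With noisy data, a log-linearized fit weights late time points heavily, so a nonlinear fit of the untransformed model, as described in the methods, is generally preferable.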
SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 33

<210> SEQ ID NO 1
<211> LENGTH: 1368
<212> TYPE: PRT
<213> ORGANISM: Streptococcus pyogenes

< 400 > SEQUENCE : 1 Met Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val 10 15 Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe 20 25 m US 9 ,963 , 689 B2 42 - continued | Lys Val Leu Gly Asn ?Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile 35 40 45 Gly Ala Leu Leu Phe Asp Ser Gly ?3?Glu Thr Ala Glu Ala Thr Arg Leu 55 3E Lys Arg Thr Ala RArg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys 65 2 Tyr Leu gGin Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser 85 90 95 Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys 100 105 His Glu Arq His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr 115 120 125

His Glu Lys Tyr &Pro ? Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp SSR 135 Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His 145 EN 160 smmMet Ile Lys Phe AArg ly His Phe Leu Ile Glu Gly Asp Leu Asn Pro 165 170 175 Asp Asn gSer Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gin Thr Tyr 180 185 Asn Gin Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala F195 200 205 Lys Ala Ile Leu Ser Ala Arq Leu Ser Lys Ser Ara Arq Leu Glu Asn 215 Leu Ile Ala Gin Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn 225 240 Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe 245 250 255 Asp Leu Ala Glu Asp Ala Lys Leu Gin Leu Ser Lys Asp Thr Tyr Asp 260 265 Asp Asp Leu Asp Asn Leu Leu Ala Gin Ile Gly Asp Gin Tyr Ala Asp 275 280 285 Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp 295 Ile Leu Arq Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser 305 320 Met Ile Lys Arg Tyr Asp Glu His His Gin Asp Leu Thr Leu Leu Lys 325 330 335mmm Ala Leu Val Arg Gin Gin Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe mmmmmm 340 345 Asp Gin Ser mmmLys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser 355 360 365 Gin Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp 375 mmGly Thr Glu Glu Leu |Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg | 400 Lys Gin Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gin Ile His Leu 405 410 mmmmmg? 415 Gly Glu Leu His Ala Ile Leu Arg Arg Gin Glu Asp Phe Tyr Pro Phe 420 425 Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile 435 440 445 | mmmmmmmm mmmmmm Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe mmmmmmmAla Trp US 9 ,963 , 689 B2 44

450 455 460 Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu 465 475 480 Val Val Asp Lys Gly Ala Ser Ala Gin Ser Phe Ile Glu Arg Met Thr 485 495

Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser 510 Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys 515 520 525 Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gin 535 540 Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arq Lys Val Thr 545 555 Val Lys Gin Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp 565 575 Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly 590 Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp 595 600 605 Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr 615 620 Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala 625 635 His Leu Phe Asp Asp Lys Val Met Lys Gin Leu Lys Arg Arg Arg Tyr 645 655 Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp 670 Lys Gin Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe 675 680 685 Ala Asn Ara Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe 695 700 Lys Glu Asp Ile Gin Lys Ala Gin Val Ser Gly Gin Gly Asp Ser Leu 705 715 His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly 725 | 735 Ile Leu Gin Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly 750 Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln mmmmm755 760 765 Thr Thr Gin Lys Gly Gin Lys Asn Ser Arg Glu Arg Met Lys Arg Ile 775 780 Glu Glu Gly Ile Lys Glu Leu Gly Ser Gin Ile Leu Lys Glu His Pro 785 795 Val Glu Asn Thr Gin Leu Gin Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu 805 815 Gin Asn Gly Arg Asp Met Tyr Val Asp Gin Glu Leu Asp Ile Asn Arg 3? 830 Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gin Ser Phe Leu Lys 835 840 845 Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg 855 860 Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys 865 875 880 US 9 , 963 ,689 B2 45 46 - continued | Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys 895

Phe2 Asp 5Asn Leu ?Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp 900 905 910

Lys Ala Gly Phe Ile Lys Arg Gin Leu Val Glu Thr Arg Gin Ile ?Thr 920 925 Lys His Val Ala Gin Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp 930 935 940 Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser 945 950 ?955 960 Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gin Phe Tyr Lys Val Arg 975 Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val 980 mmm 985 Aøn990 Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe 1000 1005 | Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala 1010 1015| IYF ASP 1 10201020 Lys Ser Glu Gln Glu ?Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe 1025 1035 | 1030 Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile ?Thr Leu Ala 1040 1045 1050 Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu 1055 1060 1065 Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val 1070 1075 1080 Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr 1085 1090 1095 Glu Val Gin Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu &Pro Lys 1100 1105 1110 Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp AAsp Pro 1115 1120 1 | 1125 Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val 1130 1135 1140 Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys 1145 1150 | 1155 Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser 1160 1165 1170 Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys 1175 1180 1185 | Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu 1190 1195 DE 1200 Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly 1205 1210 1215 Glu Leu Gin Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val 1220 1225 1230 Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser 1235 1240 1245

Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys 1250 1255 1260 His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys 1265 1270 1275 Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala 1280 1285 1290 Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn 1295 1300 1305 Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala 1310 1315 1320 Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser 1325 1330 1335 Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr 1340 1345 1350 Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp 1355 1360 1365

<210> SEQ ID NO 2
<211> LENGTH: 1388
<212> TYPE: PRT
<213> ORGANISM: Streptococcus thermophilus
<400> SEQUENCE: 2

Met Thr Lys Pro Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val 10 15 Gly Trp Ala Val Thr Thr Asp Asn Tyr Lys Val Pro Ser Lys Lys Met 25 30 Lys Val Leu Gly Asn Thr Ser Lys Lys Tyr Ile Lys Lys Asn Leu Leu 35 40 Gly Val Leu Leu Phe Asp Ser Gly Ile Thr Ala Glu Gly Arg Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Arg Asn Arg Ile Leu 80

Tyr Leu Gln Glu Ile Phe Ser Thr Glu Met Ala Thr Leu Asp Asp Ala 90 95 Phe Phe Gln Arg Leu Asp Asp Ser Phe Leu Val Pro Asp Asp Lys Arg 105 110 Asp Ser Lys Tyr Pro Ile Phe Gly Asn Leu Val Glu Glu Lys Ala Tyr 115 120 His Asp Glu Phe Pro Thr Ile Tyr His Leu Arg Lys Tyr Leu Ala Asp

Ser Thr Lys Lys Ala Asp Leu Arg Leu Val Tyr Leu Ala Leu Ala His 160 Met Ile Lys Tyr Arg Gly His Phe Leu Ile Glu Gly Glu Phe Asn Ser 170 175 Lys Asn Asn Asp Ile Gln Lys Asn Phe Gln Asp Phe Leu Asp Thr Tyr 185 190 Asn Ala Ile Phe Glu Ser Asp Leu Ser Leu Glu Asn Ser Lys Gln Leu 195 200 Glu Glu Ile Val Lys Asp Lys Ile Ser Lys Leu Glu Lys Lys Asp Arg

Ile Leu Lys Leu Phe Pro Gly Glu Lys Asn Ser Gly Ile Phe Ser Glu 240

Phe Leu Lys Leu Ile Val Gly Asn Gln Ala Asp Phe Arg Lys Cys Phe 250 255 Asn Leu Asp Glu Lys Ala Ser Leu His Phe Ser Lys Glu Ser Tyr Asp 265 270 Glu Asp Leu Glu Thr Leu Leu Gly Tyr Ile Gly Asp Asp Tyr Ser Asp 275 280 285

Val Phe Leu Lys Ala Lys Lys Leu Tyr Asp Ala Ile Leu Leu Ser Gly 295 300 Phe Leu Thr Val Thr Asp Asn Glu Thr Glu Ala Pro Leu Ser Ser Ala 305 310 315 320

Met Ile Lys Arg Tyr Asn Glu His Lys Glu Asp Leu Ala Leu Leu Lys 330 335 Glu Tyr Ile Arg Asn Ile Ser Leu Lys Thr Tyr Asn Glu Val Phe Lys 340 350 Asp Asp Thr Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Lys Thr Asn 355 360 Gln Glu Asp Phe Tyr Val Tyr Leu Lys Lys Leu Leu Ala Glu Phe Glu 375 380 Gly Ala Asp Tyr Phe Leu Glu Lys Ile Asp Arg Glu Asp Phe Leu Arg 385 390 395 400 Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro Tyr Gln Ile His Leu 410 415 Gln Glu Met Arg Ala Ile Leu Asp Lys Gln Ala Lys Phe Tyr Pro Phe 420 430 Leu Ala Lys Asn Lys Glu Arg Ile Glu Lys Ile Leu Thr Phe Arg Ile 435 440 Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Asp Phe Ala Trp 455 460

Ser Ile Arg Lys Arg Asn Glu Lys Ile Thr Pro Trp Asn Phe Glu Asp 465 470 475 480 Val Ile Asp Lys Glu Ser Ser Ala Glu Ala Phe Ile Asn Arg Met Thr 490 495 Ser Phe Asp Leu Tyr Leu Pro Glu Glu Lys Val Leu Pro Lys His Ser 500 510 Leu Leu Tyr Glu Thr Phe Asn Val Tyr Asn Glu Leu Thr Lys Val Arg 515 520 Phe Ile Ala Glu Ser Met Arg Asp Tyr Gln Phe Leu Asp Ser Lys Gln 535 540 Lys Lys Asp Ile Val Arg Leu Tyr Phe Lys Asp Lys Arg Lys Val Thr 545 550 555 560 Asp Lys Asp Ile Ile Glu Tyr Leu His Ala Ile Tyr Gly Tyr Asp Gly 570 575

Ile Glu Leu Lys Gly Ile Glu Lys Gln Phe Asn Ser Ser Leu Ser Thr 580 590 Tyr His Asp Leu Leu Asn Ile Ile Asn Asp Lys Glu Phe Leu Asp Asp 595 600

Ser Ser Asn Glu Ala Ile Ile Glu Glu Ile Ile His Thr Leu Thr Ile 615 620 Phe Glu Asp Arg Glu Met Ile Lys Gln Arg Leu Ser Lys Phe Glu Asn 625 630 635 640 Ile Phe Asp Lys Ser Val Leu Lys Lys Leu Ser Arg Arg His Tyr Thr 650 655 Gly Trp Gly Lys Leu Ser Ala Lys Leu Ile Asn Gly Ile Arg Asp Glu 660 670 Lys Ser Gly Asn Thr Ile Leu Asp Tyr Leu Ile Asp Asp Gly Ile Ser 675 680 Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ala Leu Ser Phe Lys 695 700 Lys Lys Ile Gln Lys Ala Gln Ile Ile Gly Asp Glu Asp Lys Gly Asn 705 710 715 720

Ile Lys Glu Val Val Lys Ser Leu Pro Gly Ser Pro Ala Ile Lys Lys

Gly Ile Leu Gln Ser Ile Lys Ile Val Asp Glu Leu Val Lys Val Met 750 Gly Gly Arg Lys Pro Glu Ser Ile Val Val Glu Met Ala Arg Glu Asn 760 765 Gln Tyr Thr Asn Gln Gly Lys Ser Asn Ser Gln Gln Arg Leu Lys Arg 770 775 780 Leu Glu Lys Ser Leu Lys Glu Leu Gly Ser Lys Ile Leu Lys Glu Asn 785 790 795 800 Ile Pro Ala Lys Leu Ser Lys Ile Asp Asn Asn Ala Leu Gln Asn Asp

Arg Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Lys Asp Met Tyr Thr Gly 830 Asp Asp Leu Asp Ile Asp Arg Leu Ser Asn Tyr Asp Ile Asp His Ile 840 845 Ile Pro Gln Ala Phe Leu Lys Asp Asn Ser Ile Asp Asn Lys Val Leu 850 855 860

Val Ser Ser Ala Ser Asn Arg Gly Lys Ser Asp Asp Val Pro Ser Leu 865 870 875 880 Glu Val Val Lys Lys Arg Lys Thr Phe Trp Tyr Gln Leu Leu Lys Ser Lys Leu Ile Ser Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg 910 Gly Gly Leu Ser Pro Glu Asp Lys Ala Gly Phe Ile Gln Arg Gln Leu 920 925 Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Arg Leu Leu Asp Glu 930 935 940 Lys Phe Asn Asn Lys Lys Asp Glu Asn Asn Arg Ala Val Arg Thr Val 945 950 955 960 Lys Ile Ile Thr Leu Lys Ser Thr Leu Val Ser Gln Phe Arg Lys Asp 975 Phe Glu Leu Tyr Lys Val Arg Glu Ile Asn Asp Phe His His Ala His 985 990 Asp Ala Tyr Leu Asn Ala Val Val Ala Ser Ala Leu Leu Lys Lys Tyr 1000 1005 Pro Lys Leu Glu Pro Glu Phe Val Tyr Gly Asp Tyr Pro Lys Tyr 1010 1015 1020 Asn Ser Phe Arg Glu Arg Lys Ser Ala Thr Glu Lys Val Tyr Phe 1025 1030 1035 Tyr Ser Asn Ile Met Asn Ile Phe Lys Lys Ser Ile Ser Leu Ala 1040 1045 1050 Asp Gly Arg Val Ile Glu Arg Pro Leu Ile Glu Val Asn Glu Glu 1055 1060 1065 Thr Gly Glu Ser Val Trp Asn Lys Glu Ser Asp Leu Ala Thr Val 1070 1075 1080

Arg Arg Val Leu Ser Tyr Pro Gln Val Asn Val Val Lys Lys Val 1085 1090 1095 Glu Glu Gln Asn His Gly Leu Asp Arg Gly Lys Pro Lys Gly Leu 1100 1105 1110

Phe Asn Ala Asn Leu Ser Ser Lys Pro Lys Pro Asn Ser Asn Glu 1115 1120 1125 Asn Leu Val Gly Ala Lys Glu Tyr Leu Asp Pro Lys Lys Tyr Gly 1130 1135 1140 Gly Tyr Ala Gly Ile Ser Asn Ser Phe Thr Val Leu Val Lys Gly 1145 1150 1155 Thr Ile Glu Lys Gly Ala Lys Lys Lys Ile Thr Asn Val Leu Glu 1160 1165 1170 Phe Gln Gly Ile Ser Ile Leu Asp Arg Ile Asn Tyr Arg Lys Asp 1175 1180 1185 Lys Leu Asn Phe Leu Leu Glu Lys Gly Tyr Lys Asp Ile Glu Leu 1190 1195 1200 Ile Ile Glu Leu Pro Lys Tyr Ser Leu Phe Glu Leu Ser Asp Gly 1205 1210 1215 Ser Arg Arg Met Leu Ala Ser Ile Leu Ser Thr Asn Asn Lys Arg 1220 1225 1230 Gly Glu Ile His Lys Gly Asn Gln Ile Phe Leu Ser Gln Lys Phe 1235 1240 1245 Val Lys Leu Leu Tyr His Ala Lys Arg Ile Ser Asn Thr Ile Asn 1250 1255 1260 Glu Asn His Arg Lys Tyr Val Glu Asn His Lys Lys Glu Phe Glu 1265 1270 1275 Glu Leu Phe Tyr Tyr Ile Leu Glu Phe Asn Glu Asn Tyr Val Gly 1280 1285 1290 Ala Lys Lys Asn Gly Lys Leu Leu Asn Ser Ala Phe Gln Ser Trp 1295 1300 1305 Gln Asn His Ser Ile Asp Glu Leu Cys Ser Ser Phe Ile Gly Pro 1310 1315 1320 Thr Gly Ser Glu Arg Lys Gly Leu Phe Glu Leu Thr Ser Arg Gly 1325 1330 1335 Ser Ala Ala Asp Phe Glu Phe Leu Gly Val Lys Ile Pro Arg Tyr 1340 1345 1350 Arg Asp Tyr Thr Pro Ser Ser Leu Leu Lys Asp Ala Thr Leu Ile 1355 1360 1365 His Gln Ser Val Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ala 1370 1375 1380 Lys Leu Gly Glu Gly 1385

<210> SEQ ID NO 3
<211> LENGTH: 1334
<212> TYPE: PRT
<213> ORGANISM: Listeria innocua
<400> SEQUENCE: 3

Met Lys Lys Pro Tyr Thr Ile Gly Leu Asp Ile Gly Thr Asn Ser Val 10 15

Gly Trp Ala Val Leu Thr Asp Gln Tyr Asp Leu Val Lys Arg Lys Met 25 30 Lys Ile Ala Gly Asp Ser Glu Lys Lys Gln Ile Lys Lys Asn Phe Trp 40 Gly Val Arg Leu Phe Asp Glu Gly Gln Thr Ala Ala Asp Arg Arg Met 60 Ala Arg Thr Ala Arg Arg Arg Ile Glu Arg Arg Arg Asn Arg Ile Ser 65 70 75 Tyr Leu Gln Ile Phe Ala Glu Glu Met Ser Lys Thr Asp Ala Asn 90 95 Phe Phe Cys Arg Leu Ser Asp Ser Phe Tyr Val Asp Asn Glu Lys Arg 100 105 110 Asn Ser Arg His Pro Phe Phe Ala Thr Ile Glu Glu Glu Val Glu Tyr 115 120

His Lys Asn Tyr Pro Thr Ile Tyr His Leu Arg Glu Glu Leu Val Asn Ser Ser Glu Lys Ala Asp Leu Arg Leu Val Tyr Leu Ala Leu Ala His 145 155 160 Ile Ile Lys Tyr Arg Gly Asn Phe Leu Ile Glu Gly Ala Leu Asp Thr 170 175 Gln Asn Thr Ser Val Asp Gly Ile Tyr Lys Gln Phe Ile Gln Thr Tyr 180 185 190 Asn Gln Val Phe Ala Ser Gly Ile Glu Asp Gly Ser Leu Lys Lys Leu 195 200 Glu Asp Asn Lys Asp Val Ala Lys Ile Leu Val Glu Lys Val Thr Arg

Lys Glu Lys Leu Glu Arg Ile Leu Lys Leu Tyr Pro Gly Glu Lys Ser 225 235 240 Ala Gly Met Phe Ala Gln Phe Ile Ser Leu Ile Val Gly Ser Lys Gly 250 255 Asn Phe Gln Lys Pro Phe Asp Leu Ile Glu Lys Ser Asp Ile Glu Cys 260 265 270 Ala Lys Asp Ser Tyr Glu Glu Asp Leu Glu Ser Leu Leu Ala Leu Ile 275 280 Gly Asp Glu Tyr Ala Glu Leu Phe Val Ala Ala Lys Asn Ala Tyr Ser

Ala Val Val Leu Ser Ser Ile Ile Thr Val Ala Glu Thr Glu Thr Asn 305 315 320 Ala Lys Leu Ser Ala Ser Met Ile Glu Arg Phe Asp Thr His Glu Glu 330 335 Asp Leu Gly Glu Leu Lys Ala Phe Ile Lys Leu His Leu Pro Lys His 340 345 350 Tyr Glu Glu Ile Phe Ser Asn Thr Glu Lys His Gly Tyr Ala Gly Tyr 355 360 Ile Asp Gly Lys Thr Lys Gln Ala Asp Phe Tyr Lys Tyr Met Lys Met Thr Leu Glu Asn Ile Glu Gly Ala Asp Tyr Phe Ile Ala Lys Ile Glu 385 395 400 Lys Glu Asn Phe Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ala Ile 410 415 Pro His Gln Leu His Leu Glu Glu Leu Glu Ala Ile Leu His Gln Gln 420 425 430 Ala Lys Tyr Tyr Pro Phe Leu Lys Glu Asn Tyr Asp Lys Ile Lys Ser 435 440 Leu Val Thr Phe Arg Ile Pro Tyr Phe Val Gly Pro Leu Ala Asn Gly

Gln Ser Glu Phe Ala Trp Leu Thr Arg Lys Ala Asp Gly Glu Ile Arg 465 475 480

Pro Trp Asn Ile Glu Glu Lys Val Asp Phe Gly Lys Ser Ala Val Asp 485 490 495 Phe Ile Glu Lys Met Thr Asn Lys Asp Thr Tyr Leu Pro Lys Glu Asn 505 510 Val Leu Pro Lys His Ser Leu Cys Tyr Gln Lys Tyr Leu Val Tyr Asn 515 525

Glu Leu Thr Lys Val Arg Tyr Ile Asn Asp Gln Gly Lys Thr Ser Tyr

Phe Ser Gly Gln Glu Lys Glu Gln Ile Phe Asn Asp Leu Phe Lys Gln 550 555 Lys Arg Lys Val Lys Lys Lys Asp Leu Glu Leu Phe Leu Arg Asn Met 575 Ser His Val Glu Ser Pro Thr Ile Glu Gly Leu Glu Asp Ser Phe Asn 585 590 Ser Ser Tyr Ser Thr Tyr His Asp Leu Leu Lys Val Gly Ile Lys Gln 595 605 Glu Ile Leu Asp Asn Pro Val Asn Thr Glu Met Leu Glu Asn Ile Val

Lys Ile Leu Thr Val Phe Glu Asp Lys Arg Met Ile Lys Glu Gln Leu 630 635 Gln Gln Phe Ser Asp Val Leu Asp Gly Val Val Leu Lys Lys Leu Glu 655 Arg Arg His Tyr Thr Gly Trp Gly Arg Leu Ser Ala Lys Leu Leu Met 665 670 Gly Ile Arg Asp Lys Gln Ser His Leu Thr Ile Leu Asp Tyr Leu Met 675 685 Asn Asp Asp Gly Leu Asn Arg Asn Leu Met Gln Leu Ile Asn Asp Ser Asn Leu Ser Phe Lys Ser Ile Ile Glu Lys Glu Gln Val Thr Thr Ala 710 715 Asp Lys Asp Ile Gln Ser Ile Val Ala Asp Leu Ala Gly Ser Pro Ala 735 Ile Lys Lys Gly Ile Leu Gln Ser Leu Lys Ile Val Asp Glu Leu Val 745 750 Ser Val Met Gly Tyr Pro Pro Gln Thr Ile Val Val Glu Met Ala Arg 755 765 Glu Asn Gln Thr Thr Gly Lys Gly Lys Asn Asn Ser Arg Pro Arg Tyr

Lys Ser Leu Glu Lys Ala Ile Lys Glu Phe Gly Ser Gln Ile Leu Lys 790 795 Glu His Pro Thr Asp Asn Gln Glu Leu Arg Asn Asn Arg Leu Tyr Leu 815 Tyr Tyr Leu Gln Asn Gly Lys Asp Met Tyr Thr Gly Gln Asp Leu Asp 825 830 Ile His Asn Leu Ser Asn Tyr Asp Ile Asp His Ile Val Pro Gln Ser 835 845

Phe Ile Thr Asp Asn Ser Ile Asp Asn Leu Val Leu Thr Ser Ser Ala

Gly Asn Arg Glu Lys Gly Asp Asp Val Pro Pro Leu Glu Ile Val Arg 870 875 880 Lys Arg Lys Val Phe Trp Glu Lys Leu Tyr Gln Gly Asn Leu Met Ser 895 Lys Arg Lys Phe Asp Tyr Leu Thr Lys Ala Glu Arg Gly Gly Leu Thr 905 910 Glu Ala Asp Lys Ala Arg Phe Ile His Arg Gln Leu Val Glu Thr Arg 915 925

Gln Ile Thr Lys Asn Val Ala Asn Ile Leu His Gln Arg Phe Asn Tyr 930 935 940

Glu Lys Asp Asp His Gly Asn Thr Met Lys Gln Val Arg Ile Val Thr 945 955 960

Leu Lys Ser Ala Leu Val Ser Gln Phe Arg Lys Gln Phe Gln Leu Tyr 965 970 975 Lys Val Arg Asp Val Asn Asp Tyr His His Ala His Asp Ala Tyr Leu 985 Asn Gly Val Val Ala Asn Thr Leu Leu Lys Val Tyr Pro Gln Leu Glu 1000 1005

Pro Glu Phe Val Tyr Gly Asp Tyr His Gln Phe Asp Trp Phe Lys 1010 1015 1020

Ala Asn Lys Ala Thr Ala Lys Lys Gln Phe Tyr Thr Asn Ile Met 1025 1030 1035 Leu Phe Phe Ala Gln Lys Asp Arg Ile Ile Asp Glu Asn Gly Glu 1040 1045 1050 Ile Leu Trp Asp Lys Lys Tyr Leu Asp Thr Val Lys Lys Val Met 1055 1060 1065 Ser Tyr Arg Gln Met Asn Ile Val Lys Lys Thr Glu Ile Gln Lys 1070 1075 1080

Gly Glu Phe Ser Lys Ala Thr Ile Lys Pro Lys Gly Asn Ser Ser 1085 1090 1095 Lys Leu Ile Pro Arg Lys Thr Asn Trp Asp Pro Met Lys Tyr Gly 1100 1105 1110 Gly Leu Asp Ser Pro Asn Met Ala Tyr Ala Val Val Ile Glu Tyr 1115 1120 1125 Ala Lys Gly Lys Asn Lys Leu Val Phe Glu Lys Lys Ile Ile Arg 1130 1135 1140 Val Thr Ile Met Glu Arg Lys Ala Phe Glu Lys Asp Glu Lys Ala 1145 1150 1155 Phe Leu Glu Glu Gln Gly Tyr Arg Gln Pro Lys Val Leu Ala Lys 1160 1165 1170 Leu Pro Lys Tyr Thr Leu Tyr Glu Cys Glu Glu Gly Arg Arg Arg 1175 1180 1185 Met Leu Ala Ser Ala Asn Glu Ala Gln Lys Gly Asn Gln Gln Val 1190 1195 1200 Leu Pro Asn His Leu Val Thr Leu Leu His His Ala Ala Asn Cys 1205 1210 1215 Glu Val Ser Asp Gly Lys Ser Leu Asp Tyr Ile Glu Ser Asn Arg 1220 1225 1230

Glu Met Phe Ala Glu Leu Leu Ala His Val Ser Glu Phe Ala Lys 1235 1240 1245 Arg Tyr Thr Leu Ala Glu Ala Asn Leu Asn Lys Ile Asn Gln Leu 1250 1255 1260 Phe Glu Gln Asn Lys Glu Gly Asp Ile Lys Ala Ile Ala Gln Ser 1265 1270 1275 Phe Val Asp Leu Met Ala Phe Asn Ala Met Gly Ala Pro Ala Ser 1280 1285 1290 Phe Lys Phe Phe Glu Thr Thr Ile Glu Arg Lys Arg Tyr Asn Asn 1295 1300 1305

Leu Lys Glu Leu Leu Asn Ser Thr Ile Ile Tyr Gln Ser Ile Thr 1310 1315 1320 Gly Leu Tyr Glu Ser Arg Lys Arg Leu Asp Asp 1325 1330

<210> SEQ ID NO 4
<211> LENGTH: 1370
<212> TYPE: PRT
<213> ORGANISM: Streptococcus agalactiae
<400> SEQUENCE: 4

Met Asn Lys Pro Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val Gly Trp Ser Ile Ile Thr Asp Asp Tyr Lys Val Pro Ala Lys Lys Met 25 30 Arg Val Leu Gly Asn Thr Asp Lys Glu Tyr Ile Lys Lys Asn Leu Ile 40 Gly Ala Leu Leu Phe Asp Gly Gly Asn Thr Ala Ala Asp Arg Arg Leu

Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Arg Asn Arg Ile Leu 70 75 Tyr Leu Gln Glu Ile Phe Ala Glu Glu Met Ser Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Asp Ser Phe Leu Val Glu Glu Asp Lys Arg 105 110

Gly Ser Lys Tyr Pro Ile Phe Ala Thr Leu Gln Glu Glu Lys Asp Tyr 120 His Glu Lys Phe Ser Thr Ile Tyr His Leu Arg Lys Glu Leu Ala Asp Lys Lys Glu Lys Ala Asp Leu Arg Leu Ile Tyr Ile Ala Leu Ala His 150 155 Ile Ile Lys Phe Arg Gly His Phe Leu Ile Glu Asp Asp Ser Phe Asp Val Arg Asn Thr Asp Ile Ser Lys Gln Tyr Gln Asp Phe Leu Glu Ile 185 190 Phe Asn Thr Thr Phe Glu Asn Asn Asp Leu Leu Ser Gln Asn Val Asp 200 Val Glu Ala Ile Leu Thr Asp Lys Ile Ser Lys Ser Ala Lys Lys Asp

Arg Ile Leu Ala Gln Tyr Pro Asn Gln Lys Ser Thr Gly Ile Phe Ala 230 235 Glu Phe Leu Lys Leu Ile Val Gly Asn Gln Ala Asp Phe Lys Lys Tyr

Phe Asn Leu Glu Asp Lys Thr Pro Leu Gln Phe Ala Lys Asp Ser Tyr 265 270 Asp Glu Asp Leu Glu Asn Leu Leu Gly Gln Ile Gly Asp Glu Phe Ala 280 Asp Leu Phe Ser Ala Ala Lys Lys Leu Tyr Asp Ser Val Leu Leu Ser

Gly Ile Leu Thr Val Ile Asp Leu Ser Thr Lys Ala Pro Leu Ser Ala 310 315

Ser Met Ile Gln Arg Tyr Asp Glu His Arg Glu Asp Leu Lys Gln Leu

Lys Gln Phe Val Lys Ala Ser Leu Pro Glu Lys Tyr Gln Glu Ile Phe 345 350 Ala Asp Ser Ser Lys Asp Gly Tyr Ala Gly Tyr Ile Glu Gly Lys Thr 360 365 Asn Gln Glu Ala Phe Tyr Lys Tyr Leu Ser Lys Leu Leu Thr Lys Gln 380 Glu Asp Ser Glu Asn Phe Leu Glu Lys Ile Lys Asn Glu Asp Phe Leu 395 400 Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Val His 405 410 Leu Thr Glu Leu Lys Ala Ile Ile Arg Arg Gln Ser Glu Tyr Tyr Pro 420 425 430 Phe Leu Lys Glu Asn Gln Asp Arg Ile Glu Lys Ile Leu Thr Phe Arg 435 440 Ile Pro Tyr Tyr Ile Gly Pro Leu Ala Arg Glu Lys Ser Asp Phe Ala 460

Trp Met Thr Arg Lys Thr Asp Asp Ser Ile Arg Pro Trp Asn Phe Glu 475 480 Asp Leu Val Asp Lys Glu Lys Ser Ala Glu Ala Phe Ile His Arg Met 485 490 Thr Asn Asn Asp Phe Tyr Leu Pro Glu Glu Lys Val Leu Pro Lys His 500 505 510

Ser Leu Ile Tyr Glu Lys Phe Thr Val Tyr Asn Glu Leu Thr Lys Val 515 520 Arg Tyr Lys Asn Glu Gln Gly Glu Thr Tyr Phe Phe Asp Ser Asn Ile 540 Lys Gln Glu Ile Phe Asp Gly Val Phe Lys Glu His Arg Lys Val Ser 555 560 Lys Lys Lys Leu Leu Asp Phe Leu Ala Lys Glu Tyr Glu Glu Phe Arg 565 570 575 Ile Val Asp Val Ile Gly Leu Asp Lys Glu Asn Lys Ala Phe Asn Ala 580 585 590 Ser Leu Gly Thr Tyr His Asp Leu Glu Lys Ile Leu Asp Lys Asp Phe 595 600 Leu Asp Asn Pro Asp Asn Glu Ser Ile Leu Glu Asp Ile Val Gln Thr 620 Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Lys Lys Arg Leu Glu Asn 635 640 Tyr Lys Asp Leu Phe Thr Glu Ser Gln Leu Lys Lys Leu Tyr Arg Arg 645 650 655 His Tyr Thr Gly Trp Gly Arg Leu Ser Ala Lys Leu Ile Asn Gly Ile 660 665 670 Arg Asp Lys Glu Ser Gln Lys Thr Ile Leu Asp Tyr Leu Ile Asp Asp 675 680 Gly Arg Ser Asn Arg Asn Phe Met Gln Leu Ile Asn Asp Asp Gly Leu 700

Ser Phe Lys Ser Ile Ile Ser Lys Ala Gln Ala Gly Ser His Ser Asp 715 720 Asn Leu Lys Glu Val Val Gly Glu Leu Ala Gly Ser Pro Ala Ile Lys 725 730 735 Lys Gly Ile Leu Gln Ser Leu Lys Ile Val Asp Glu Leu Val Lys Val 740 745 750 Met Gly Tyr Glu Pro Glu Gln Ile Val Val Glu Met Ala Arg Glu Asn 755 760 Gln Thr Thr Asn Gln Gly Arg Arg Asn Ser Arg Gln Arg Tyr Lys Leu 780 Leu Asp Asp Gly Val Lys Asn Leu Ala Ser Asp Leu Asn Gly Asn Ile 785 790 795 800 Leu Lys Glu Tyr Pro Thr Asp Asn Gln Ala Leu Gln Asn Glu Arg Leu 810 815

Phe Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Thr Gly Glu Ala 820 825 830 Leu Asp Ile Asp Asn Leu Ser Gln Tyr Asp Ile Asp His Ile Ile Pro 840 845 Gln Ala Phe Ile Lys Asp Asp Ser Ile Asp Asn Arg Val Leu Val Ser 850 860 Ser Ala Lys Asn Arg Gly Lys Ser Asp Asp Val Pro Ser Leu Glu Ile 865 870 875 880 Val Lys Asp Cys Lys Val Phe Trp Lys Lys Leu Leu Asp Ala Lys Leu 890 895 Met Ser Gln Arg Lys Tyr Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly 900 905 910 Leu Thr Ser Asp Asp Lys Ala Arg Phe Ile Gln Arg Gln Leu Val Glu 920 925 Thr Arg Gln Ile Thr Lys His Val Ala Arg Ile Leu Asp Glu Arg Phe 930 940 Asn Asn Glu Leu Asp Ser Lys Gly Arg Arg Ile Arg Lys Val Lys Ile 945 950 955 960 Val Thr Leu Lys Ser Asn Leu Val Ser Asn Phe Arg Lys Glu Phe Gly 970 975 Phe Tyr Lys Ile Arg Glu Val Asn Asn Tyr His His Ala His Asp Ala 980 985 990 Tyr Leu Asn Ala Val Val Ala Lys Ala Ile Leu Thr Lys Tyr Pro Gln 1000 1005

Leu Glu Pro Glu Phe Val Tyr Gly Asp Tyr Pro Lys Tyr Asn Ser 1010 1015 1020 Tyr Lys Thr Arg Lys Ser Ala Thr Glu Lys Leu Phe Phe Tyr Ser 1025 1030 1035 Asn Ile Met Asn Phe Phe Lys Thr Lys Val Thr Leu Ala Asp Gly 1040 1045 1050 Thr Val Val Val Lys Asp Asp Ile Glu Val Asn Asn Asp Thr Gly 1055 1060 1065 Glu Ile Val Trp Asp Lys Lys Lys His Phe Ala Thr Val Arg Lys 1070 1075 1080

Val Leu Ser Tyr Pro Gln Asn Asn Ile Val Lys Lys Thr Glu Ile 1085 1090 1095 Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Ala His Gly Asn 1100 1105 1110 Ser Asp Lys Leu Ile Pro Arg Lys Thr Lys Asp Ile Tyr Leu Asp 1115 1120 1125

Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro Ile Val Ala Tyr Ser 1130 1135 1140 Val Leu Val Val Ala Asp Ile Lys Lys Gly Lys Ala Gln Lys Leu 1145 1150 1155 Lys Thr Val Thr Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser 1160 1165 1170 Arg Phe Glu Lys Asn Pro Ser Ala Phe Leu Glu Ser Lys Gly Tyr 1175 1180 1185 Leu Asn Ile Arg Ala Asp Lys Leu Ile Ile Leu Pro Lys Tyr Ser 1190 1195 1200 Leu Phe Glu Leu Glu Asn Gly Arg Arg Arg Leu Leu Ala Ser Ala 1205 1210 1215 Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Thr Gln Phe 1220 1225 1230 Met Lys Phe Leu Tyr Leu Ala Ser Arg Tyr Asn Glu Ser Lys Gly 1235 1240 1245 Lys Pro Glu Glu Ile Glu Lys Lys Gln Glu Phe Val Asn Gln His 1250 1255 1260 Val Ser Tyr Phe Asp Asp Ile Leu Gln Leu Ile Asn Asp Phe Ser 1265 1270 1275 Lys Arg Val Ile Leu Ala Asp Ala Asn Leu Glu Lys Ile Asn Lys 1280 1285 1290 Leu Tyr Gln Asp Asn Lys Glu Asn Ile Ser Val Asp Glu Leu Ala 1295 1300 1305 Asn Asn Ile Ile Asn Leu Phe Thr Phe Thr Ser Leu Gly Ala Pro 1310 1315 1320 Ala Ala Phe Lys Phe Phe Asp Lys Ile Val Asp Arg Lys Arg Tyr 1325 1330 1335 Thr Ser Thr Lys Glu Val Leu Asn Ser Thr Leu Ile His Gln Ser 1340 1345 1350 Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu Gly Lys Leu Gly 1355 1360 1365 Glu Asp 1370

<210> SEQ ID NO 5
<211> LENGTH: 1345
<212> TYPE: PRT
<213> ORGANISM: Streptococcus mutans
<400> SEQUENCE: 5

Met Lys Lys Pro Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val 15 Gly Trp Ala Val Val Thr Asp Asp Tyr Lys Val Pro Ala Lys Lys Met

Lys Val Leu Gly Asn Thr Asp Lys Ser His Ile Glu Lys Asn Leu Leu Gly Ala Leu Leu Phe Asp Ser Gly Asn Thr Ala Glu Asp Arg Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Arg Asn Arg Ile Leu 70 75 80 Tyr Leu Gln Glu Ile Phe Ser Glu Glu Met Gly Lys Val Asp Asp Ser 90 95 Phe Phe His Arg Leu Glu Asp Ser Phe Leu Val Thr Glu Asp Lys Arg Gly Glu Arg His Pro Ile Phe Gly Asn Leu Glu Glu Glu Val Lys Tyr His Glu Asn Phe Pro Thr Ile Tyr His Leu Arg Gln Tyr Leu Ala Asp

Asn Pro Glu Lys Val Asp Leu Arg Leu Val Tyr Leu Ala Leu Ala His 145 150 155 160 Ile Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Lys Phe Asp Thr 175 Arg Asn Asn Asp Val Gln Arg Leu Phe Gln Glu Phe Leu Ala Val Tyr 185 190

Asp Asn Thr Phe Glu Asn Ser Ser Leu Gln Glu Gln Asn Val Gln Val 195 200 205 Glu Glu Ile Leu Thr Asp Lys Ile Ser Lys Ser Ala Lys Lys Asp Arg 215 220 Val Leu Lys Leu Phe Pro Asn Glu Lys Ser Asn Gly Arg Phe Ala Glu 225 230 235 Phe Leu Lys Leu Ile Val Gly Asn Gln Ala Asp Phe Lys Lys His Phe 245 255 Glu Leu Glu Glu Lys Ala Pro Leu Gln Phe Ser Lys Asp Thr Tyr Glu 265 270 Glu Glu Leu Glu Val Leu Leu Ala Gln Ile Gly Asp Asn Tyr Ala Glu 275 280 285 Leu Phe Leu Ser Ala Lys Lys Leu Tyr Asp Ser Ile Leu Leu Ser Gly 295 300 Ile Leu Thr Val Thr Asp Val Gly Thr Lys Ala Pro Leu Ser Ala Ser 305 310 315

Met Ile Gln Arg Tyr Asn Glu His Gln Met Asp Leu Ala Gln Leu Lys 325 335

Gln Phe Ile Arg Gln Lys Leu Ser Asp Lys Tyr Asn Glu Val Phe Ser 345 350 Asp Val Ser Lys Asp Gly Tyr Ala Gly Tyr Ile Asp Gly Lys Thr Asn 355 360 365 Gln Glu Ala Phe Tyr Lys Tyr Leu Lys Gly Leu Leu Asn Lys Ile Glu 375 380 Gly Ser Gly Tyr Phe Leu Asp Lys Ile Glu Arg Glu Asp Phe Leu Arg 385 390 395 Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu 405 415 Gln Glu Met Arg Ala Ile Ile Arg Arg Gln Ala Glu Phe Tyr Pro Phe 425 430 Leu Ala Asp Asn Gln Asp Arg Ile Glu Lys Leu Leu Thr Phe Arg Ile 435 440 445 Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Lys Ser Asp Phe Ala Trp 455 460 Leu Ser Arg Lys Ser Ala Asp Lys Ile Thr Pro Trp Asn Phe Asp Glu 465 470 475 Ile Val Asp Lys Glu Ser Ser Ala Glu Ala Phe Ile Asn Arg Met Thr 485 495 Asn Tyr Asp Leu Tyr Leu Pro Asn Gln Lys Val Leu Pro Lys His Ser 505 510 Leu Leu Tyr Glu Lys Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys 515 520 525 Tyr Lys Thr Glu Gln Gly Lys Thr Ala Phe Phe Asp Ala Asn Met Lys 535 540 Gln Glu Ile Phe Asp Gly Val Phe Lys Val Tyr Arg Lys Val Thr Lys 545 550 555 560 Asp Lys Leu Met Asp Phe Leu Glu Lys Glu Phe Asp Glu Phe Arg Ile 565 575 Val Asp Leu Thr Gly Leu Asp Lys Glu Asn Lys Val Phe Asn Ala Ser 585 590 Tyr Gly Thr Tyr His Asp Leu Cys Lys Ile Leu Asp Lys Asp Phe Leu 595 600 605 Asp Asn Ser Lys Asn Glu Lys Ile Leu Glu Asp Ile Val Leu Thr Leu 615 Thr Leu Phe Glu Asp Arg Glu Met Ile Arg Lys Arg Leu Glu Asn Tyr 630 635 640 Ser Asp Leu Leu Thr Lys Glu Gln Val Lys Lys Leu Glu Arg Arg His 645 655 Tyr Thr Gly Trp Gly Arg Leu Ser Ala Glu Leu Ile His Gly Ile Arg 660 665 670 Asn Lys Glu Ser Arg Lys Thr Ile Leu Asp Tyr Leu Ile Asp Asp Gly 675 680 Asn Ser Asn Arg Asn Phe Met Gln Leu Ile Asn Asp Asp Ala Leu Ser 695 Phe Lys Glu Glu Ile Ala Lys Ala Gln Val Ile Gly Glu Thr Asp Asn 710 715 720 Leu Asn Gln Val Val Ser Asp Ile Ala Gly Ser Pro Ala Ile Lys Lys 725 735

Gly Ile Leu Gln Ser Leu Lys Ile Val Asp Glu Leu Val Lys Ile Met 740 745 750 Gly His Gln Pro Glu Asn Ile Val Val Glu Met Ala Arg Glu Asn Gln 755 760 Phe Thr Asn Gln Gly Arg Arg Asn Ser Gln Gln Arg Leu Lys Gly Leu 775 Thr Asp Ser Ile Lys Glu Phe Gly Ser Gln Ile Leu Lys Glu His Pro 790 795 800 Val Glu Asn Ser Gln Leu Gln Asn Asp Arg Leu Phe Leu Tyr Tyr Leu 805 815 Gln Asn Gly Arg Asp Met Tyr Thr Gly Glu Glu Leu Asp Ile Asp Tyr 820 825 830 Leu Ser Gln Tyr Asp Ile Asp His Ile Ile Pro Gln Ala Phe Ile Lys 835 840 Asp Asn Ser Ile Asp Asn Arg Val Leu Thr Ser Ser Lys Glu Asn Arg 855 Gly Lys Ser Asp Asp Val Pro Ser Lys Asp Val Val Arg Lys Met Lys 870 875 880 Ser Tyr Trp Ser Lys Leu Leu Ser Ala Lys Leu Ile Thr Gln Arg Lys 885 895 Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Thr Asp Asp Asp 900 905 910 Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr 915 920 Lys His Val Ala Arg Ile Leu Asp Glu Arg Phe Asn Thr Glu Thr Asp 935 Glu Asn Asn Lys Lys Ile Arg Gln Val Lys Ile Val Thr Leu Lys Ser 950 955 960 Asn Leu Val Ser Asn Phe Arg Lys Glu Phe Glu Leu Tyr Lys Val Arg 965 975 Glu Ile Asn Asp Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val 980 985 990 Ile Gly Lys Ala Leu Leu Gly Val Tyr Pro Gln Leu Glu Pro Glu Phe 995 1000 1005 Val Tyr Gly Asp Tyr Pro His Phe His Gly His Lys Glu Asn Lys 1010 1015 1020

Ala Thr Ala Lys Lys Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe 1025 1030 1035

Lys Lys Asp Asp Val Arg Thr Asp Lys Asn Gly Glu Ile Ile Trp 1040 1045 1050 Lys Lys Asp Glu His Ile Ser Asn Ile Lys Lys Val Leu Ser Tyr 1055 1060 1065 Pro Gln Val Asn Ile Val Lys Lys Val Glu Glu Gln Thr Gly Gly 1070 1075 1080 Phe Ser Lys Glu Ser Ile Leu Pro Lys Gly Asn Ser Asp Lys Leu 1085 1090 1095 Ile Pro Arg Lys Thr Lys Lys Phe Tyr Trp Asp Thr Lys Lys Tyr 1100 1105 1110 Gly Gly Phe Asp Ser Pro Ile Val Ala Tyr Ser Ile Leu Val Ile 1115 1120 1125 Ala Asp Ile Glu Lys Gly Lys Ser Lys Lys Leu Lys Thr Val Lys 1130 1135 1140 Ala Leu Val Gly Val Thr Ile Met Glu Lys Met Thr Phe Glu Arg 1145 1150 1155 Asp Pro Val Ala Phe Leu Glu Arg Lys Gly Tyr Arg Asn Val Gln 1160 1165 1170 Glu Glu Asn Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Lys Leu 1175 1180 1185 Glu Asn Gly Arg Lys Arg Leu Leu Ala Ser Ala Arg Glu Leu Gln 1190 1195 1200 Lys Gly Asn Glu Ile Val Leu Pro Asn His Leu Gly Thr Leu Leu 1205 1210 1215 Tyr His Ala Lys Asn Ile His Lys Val Asp Glu Pro Lys His Leu 1220 1225 1230 Asp Tyr Val Asp Lys His Lys Asp Glu Phe Lys Glu Leu Leu Asp 1235 1240 1245 Val Val Ser Asn Phe Ser Lys Lys Tyr Thr Leu Ala Glu Gly Asn 1250 1255 1260 Leu Glu Lys Ile Lys Glu Leu Tyr Ala Gln Asn Asn Gly Glu Asp 1265 1270 1275 Leu Lys Glu Leu Ala Ser Ser Phe Ile Asn Leu Leu Thr Phe Thr 1280 1285 1290 Ala Ile Gly Ala Pro Ala Thr Phe Lys Phe Phe Asp Lys Asn Ile 1295 1300 1305 Asp Arg Lys Arg Tyr Thr Ser Thr Thr Glu Ile Leu Asn Ala Thr 1310 1315 1320 Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp 1325 1330 1335
Leu Asn Lys Leu Gly Gly Asp 1340 1345

<210> SEQ ID NO 6
<211> LENGTH: 1340
<212> TYPE: PRT
<213> ORGANISM: Enterococcus faecium
<400> SEQUENCE: 6

Met Lys Lys Glu Tyr Thr Ile Gly Leu Asp Ile Gly Thr Asn Ser Val 10 15 Gly Trp Ser Val Leu Thr Asp Asp Tyr Arg Leu Val Ser Lys Lys Met 25 Lys Val Ala Gly Asn Thr Glu Lys Ser Ser Thr Lys Lys Asn Phe Trp 40 Gly Val Arg Leu Phe Asp Glu Gly Gln Thr Ala Glu Ala Arg Arg Ser 55 Lys Arg Thr Ala Arg Arg Arg Leu Ala Arg Arg Arg Gln Arg Ile Leu 65 70 80 Glu Leu Gln Lys Ile Phe Ala Pro Glu Ile Leu Lys Ile Asp Glu His 90 Phe Phe Ala Arg Leu Asn Glu Ser Phe Leu Val Pro Asp Glu Lys Lys 105 110 Gln Ser Arg His Pro Val Phe Ala Thr Ile Lys Gln Glu Lys Ser Tyr 120 His Gln Thr Tyr Pro Thr Ile Tyr His Leu Arg Gln Ala Leu Ala Asp 135 Ser Ser Glu Lys Ala Asp Ile Arg Leu Val Tyr Leu Ala Met Ala His 145 150 160 Leu Leu Lys Tyr Arg Gly His Phe Leu Ile Glu Gly Glu Leu Asn Thr 170 Glu Asn Ser Ser Val Thr Glu Thr Phe Arg Gln Phe Leu Ser Thr Tyr 185 190 Asn Gln Gln Phe Ser Glu Ala Gly Asp Lys Gln Thr Glu Lys Leu Asp 200 Glu Ala Val Asp Cys Ser Phe Val Phe Thr Glu Lys Met Ser Lys Thr 215

Lys Lys Ala Glu Thr Leu Leu Lys Tyr Phe Pro His Glu Lys Ser Asn 225 230 240 Gly Tyr Leu Ser Gln Phe Ile Lys Leu Met Val Gly Asn Gln Gly Asn 250 Phe Lys Asn Val Phe Gly Leu Glu Glu Glu Ala Lys Leu Gln Phe Ser 265 270 Lys Glu Thr Tyr Glu Glu Asp Leu Glu Glu Leu Leu Glu Lys Ile Gly 280 Asp Asp Tyr Ile Asp Leu Phe Val Gln Ala Lys Asn Val Tyr Asp Ala 295 Val Leu Leu Ser Glu Ile Leu Ser Asp Ser Thr Lys Asn Thr Arg Ala 305 310 320 Lys Leu Ser Ala Gly Met Ile Arg Arg Tyr Asp Ala His Lys Glu Asp 330 Leu Val Leu Leu Lys Arg Phe Val Lys Glu Asn Leu Pro Lys Lys Tyr 345 350 Arg Ala Phe Phe Gly Asp Asn Ser Val Asn Gly Tyr Ala Gly Tyr Ile 360 Glu Gly His Ala Thr Gln Glu Ala Phe Tyr Lys Phe Val Lys Lys Glu 375 Leu Thr Gly Ile Arg Gly Ser Glu Val Phe Leu Thr Lys Ile Glu Gln 385 390 400 Glu Asn Phe Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Val Ile Pro 410 His Gln Ile His Leu Ser Glu Leu Arg Ala Ile Ile Ala Asn Gln Lys 425 430 Lys His Tyr Pro Phe Leu Lys Glu Glu Gln Glu Lys Leu Glu Ser Leu 440

Leu Thr Phe Lys Ile Pro Tyr Tyr Val Gly Pro Leu Ala Lys Lys Gln 455 Glu Asn Ser Pro Phe Ala Trp Leu Ile Arg Lys Ser Glu Glu Lys Ile 465 470 475 480

Lys Pro Trp Asn Leu Pro Glu Ile Val Asp Met Glu Gly Ser Ala Val 485 495 Arg Phe Ile Glu Arg Met Ile Asn Thr Asp Met Tyr Met Pro His Asn 505 510

Lys Val Leu Pro Lys Asn Ser Leu Leu Tyr Gln Lys Phe Ser Ile Tyr 515 520 Asn Glu Leu Thr Lys Val Arg Tyr Gln Asp Glu Arg Gly Gln Met Asn 535 Tyr Phe Ser Ser Ile Glu Lys Lys Glu Ile Phe His Glu Leu Phe Glu 545 550 555 560 Lys Asn Arg Lys Val Thr Lys Lys Asp Leu Gln Glu Phe Leu Tyr Leu 565 575 Lys Tyr Asp Ile Lys His Ala Glu Leu Ser Gly Ile Glu Lys Ala Phe 585 590

Asn Ala Ser Tyr Thr Thr Tyr His Asp Phe Leu Thr Met Ser Glu Asn 595 600 Lys Arg Glu Met Lys Gln Trp Leu Glu Asp Pro Glu Leu Ala Ser Met 615 Phe Glu Glu Ile Ile Lys Thr Leu Thr Val Phe Glu Asp Arg Glu Met 625 630 635 640 Ile Lys Thr Arg Leu Ser His His Glu Ala Thr Leu Gly Lys His Ile 645 655 Ile Lys Lys Leu Thr Lys Lys His Tyr Thr Gly Trp Gly Arg Leu Ser 665 670 Lys Glu Leu Ile Gln Gly Ile Arg Asp Lys Gln Ser Asn Lys Thr Ile 675 680 Leu Asp Tyr Leu Ile Asn Asp Asp Asp Phe Pro His His Arg Asn Arg 695 Asn Phe Met Gln Leu Ile Asn Asp Asp Ser Leu Ser Phe Lys Lys Glu 705 710 715 720 Ile Lys Lys Ala Gln Met Ile Thr Asp Thr Glu Asn Leu Glu Glu Ile 725 735 Val Lys Glu Leu Thr Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln 745 750 Ser Leu Lys Ile Val Asp Glu Ile Val Gly Ile Met Gly Tyr Glu Pro 755 760 Ala Asn Ile Val Val Glu Met Ala Arg Glu Asn Gln Thr Thr Gly Arg 775 Gly Leu Lys Ser Ser Arg Pro Arg Leu Lys Ala Leu Glu Glu Ser Leu 785 790 795 800 Lys Asp Phe Gly Ser Gln Leu Leu Lys Glu Tyr Pro Thr Asp Asn Ser 805 815 Ser Leu Gln Lys Asp Arg Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg 825 830

Asp Met Tyr Thr Gly Ala Pro Leu Asp Ile His Arg Leu Ser Asp Tyr 835 840 Asp Ile Asp His Ile Ile Pro Arg Ser Phe Thr Thr Asp Asn Ser Ile 855 Asp Asn Lys Val Leu Val Ser Ser Lys Glu Asn Arg Leu Lys Lys Asp 865 870 875 880 Asp Val Pro Ser Glu Lys Val Val Lys Lys Met Arg Ser Phe Trp Tyr 885 895 Asp Leu Tyr Ser Ser Lys Leu Ile Ser Lys Arg Lys Leu Asp Asn Leu 900 905 910 Thr Lys Ile Lys Leu Thr Glu Glu Asp Lys Ala Gly Phe Ile Lys Arg 915 920 925 Gln Leu Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gly Ile Leu 935 940 His His Arg Phe Asn Lys Ala Glu Asp Thr Asn Asp Pro Ile Arg Lys 945 950 955 960 Val Arg Ile Ile Thr Leu Lys Ser Ala Leu Val Ser Gln Phe Arg Asn 965 975 Arg Phe Gly Ile Tyr Lys Val Arg Glu Ile Asn Glu Tyr His His Ala 980 985 990 His Asp Ala Tyr Leu Asn Gly Val Val Ala Leu Ala Leu Leu Lys Lys 995 1000 1005 Tyr Pro Gln Leu Ala Pro Glu Phe Val Tyr Gly Glu Tyr Leu Lys 1010 1015 1020

Phe Asn Ala His Lys Ala Asn Lys Ala Thr Val Lys Lys Glu Phe 1025 1030 1035 Tyr Ser Asn Ile Met Lys Phe Phe Glu Ser Asp Thr Pro Val Cys 1040 1045 1050 Asp Glu Asn Gly Glu Ile Phe Trp Asp Lys Ser Lys Ser Ile Ala 1055 1060 1065

Gln Val Lys Lys Val Ile Asn His His His Met Asn Ile Val Lys 1070 1075 1080 Lys Thr Glu Ile Gln Lys Gly Gly Phe Ser Lys Glu Thr Val Glu 1085 1090 1095 Pro Lys Lys Asp Ser Ser Lys Leu Leu Pro Arg Lys Asn Asn Trp 1100 1105 1110 Asp Pro Ala Lys Tyr Gly Gly Leu Gly Ser Pro Asn Val Ala Tyr 1115 1120 1125

Thr Val Ala Phe Thr Tyr Glu Lys Gly Lys Ala Arg Lys Arg Thr 1130 1135 1140 Asn Ala Leu Glu Gly Ile Thr Ile Met Glu Arg Glu Ala Phe Glu 1145 1150 1155 Gln Ser Pro Val Leu Phe Leu Lys Asn Lys Gly Tyr Glu Gln Ala 1160 1165 1170

Glu Ile Glu Met Lys Leu Pro Lys Tyr Ala Leu Phe Glu Leu Glu 1175 1180 1185 Asn Gly Arg Lys Arg Met Val Ala Ser Asn Lys Glu Ala Gln Lys 1190 1195 1200 Ala Asn Ser Phe Leu Leu Pro Glu His Leu Val Thr Leu Leu Tyr 1205 1210 1215 His Ala Lys Gln Tyr Asp Glu Ile Ser His Lys Glu Ser Phe Asp 1220 1225 1230 Tyr Val Asn Glu His His Lys Glu Phe Ser Glu Val Phe Ala Arg 1235 1240 1245 Val Leu Glu Phe Ala Gly Lys Tyr Thr Leu Ala Glu Lys Asn Ile 1250 1255 1260 Glu Lys Leu Glu Lys Ile Tyr Lys Glu Asn Gln Thr Asp Asp Leu