US 20150010526A1

(19) United States
(12) Patent Application Publication
     Liu et al.
(10) Pub. No.: US 2015/0010526 A1
(43) Pub. Date: Jan. 8, 2015

(54) EVALUATION AND IMPROVEMENT OF NUCLEASE CLEAVAGE SPECIFICITY
(71) Applicant: President and Fellows of Harvard College, Cambridge, MA (US)
(72) Inventors: David R. Liu, Lexington, MA (US); John Paul Guilinger, Ridgway, CO (US); Vikram Pattanayak, Cambridge, MA (US)
(73) Assignee: President and Fellows of Harvard College, Cambridge, MA (US)
(21) Appl. No.: 14/320,271
(22) Filed: Jun. 30, 2014

Related U.S. Application Data
(63) Continuation of application No. 14/234,031, filed on Mar. 24, 2014, filed as application No. PCT/US2012/047778 on Jul. 22, 2012.
(60) Provisional application No. 61/510,841, filed on Jul. 22, 2011.

Publication Classification
(51) Int. Cl.
     C12N 9/22 (2006.01)
     C12Q 1/44 (2006.01)
     C12Q 1/68 (2006.01)
(52) U.S. Cl.
     CPC ...... C12N 9/22 (2013.01); C12Q 1/6874 (2013.01); C12Q 1/44 (2013.01)
     USPC ...... 424/94.61; 506/2

(57) ABSTRACT
Engineered nucleases (e.g., zinc finger nucleases (ZFNs), transcriptional activator-like effector nucleases (TALENs), and others) are promising tools for genome manipulation, and determining off-target cleavage sites of these enzymes is of great interest. We developed an in vitro selection method that interrogates 10^11 DNA sequences for their ability to be cleaved by active, dimeric nucleases, e.g., ZFNs and TALENs. The method revealed hundreds of thousands of DNA sequences, some present in the human genome, that can be cleaved in vitro by two ZFNs, CCR5-224 and VF2468, which target the endogenous human CCR5 and VEGF-A genes, respectively. Analysis of the identified sites in cultured human cells revealed CCR5-224-induced mutagenesis at nine off-target loci. Similarly, we observed 31 off-target sites cleaved by VF2468 in cultured human cells. Our findings establish an energy compensation model of ZFN specificity in which excess binding energy contributes to off-target ZFN cleavage and suggest strategies for the improvement of future nuclease design. It was also observed that TALENs can achieve cleavage specificity similar to or higher than that observed in ZFNs.
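
As a rough illustration of what comparing a cleaved sequence with the intended target site can involve, the following minimal sketch counts the positions at which a candidate site differs from a target site of equal length and keeps the candidates that fall within a mutation budget. It is not the disclosed method, which is an in vitro selection rather than a computation. The function names (`count_mutations`, `candidate_off_targets`), the example sequences, and the threshold are hypothetical and are not taken from this publication.

```python
def count_mutations(site: str, target: str) -> int:
    """Count positions at which `site` differs from `target` (Hamming distance)."""
    if len(site) != len(target):
        raise ValueError("site and target must have the same length")
    return sum(1 for a, b in zip(site.upper(), target.upper()) if a != b)


def candidate_off_targets(sites, target, max_mutations):
    """Return (site, mutations) pairs differing from the target by 1..max_mutations bases.

    Results are sorted by increasing mutation count. The exact target itself
    (0 mutations) is excluded because it is the on-target site.
    """
    hits = []
    for site in sites:
        n = count_mutations(site, target)
        if 0 < n <= max_mutations:
            hits.append((site, n))
    return sorted(hits, key=lambda pair: pair[1])


if __name__ == "__main__":
    target = "ACGTACGTACGT"  # hypothetical intended target site
    observed = ["ACGTACGTACGT", "ACGTACGAACGT", "TCGTACGTACGA", "GGGGGGGGGGGG"]
    for site, n in candidate_off_targets(observed, target, max_mutations=3):
        print(site, n)
```

A plain mismatch count treats every position alike; it does not capture position-dependent effects such as the energy compensation between half-sites that the abstract describes.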

[Drawing sheets 1 to 67 are not reproduced here; only fragments of the figure text survived extraction. Labels that remain legible:]

Sheet 1: 5' overhang fill-in; 3' dA addition; adapter ligation.
Figs. 2A, 3A, 3B: specificity scores for the CCR5-224 and VF2468 (+) and (-) half-sites and spacer.
Figs. 4A, 4B: CCR5-224 and VF2468; mutations versus enzyme concentration in selection.
Figs. 6A-6D: standard (ng); ng of protein.
Figs. 7A, 7B: CCR5-224 and VF2468.
Figs. 8A, 8B: number of mutations; pre-selection library and selections at 4 nM to 0.5 nM.
Figs. 10A-10D: specificity scores for the CCR5-224 and VF2468 (+) and (-) half-sites at 4 nM to 0.5 nM.
Fig. 13: VF2468 (+) and (-) half-site specificity at 4 nM, including profiles in which the indicated G positions are mutated.
Figs. 15A-15D: CCR5-224; spacer length.
Figs. 16A-16D: VF2468; spacer length.
Figs. 17A-1 to 17F-2: CCR5-224 and VF2468 half-sites; number of mutations for spacer lengths of 4 bp to 7 bp.
Fig. 19: TALEN architecture (linker, +28 or +63; cleavage domain; TAL DNA-binding domains; left half-site, spacer, right half-site); target site.
Fig. 20: single repeat; array of repeats; RVD; recognized base.
Figs. 21, 22: +28 and +63 linkers; number of target site mutations; left and right half-site mutations.
Fig. 23: linker length and spacer length (bp).
Fig. 24: specificity enrichment from pre-selection and post-selection base frequencies.
Figs. 25-27: enriched and de-enriched specificity profiles; specificity with boxed position mutated; subtraction of baseline profile; compensating difference in specificity; forced mutation.
Fig. 28: target site with left and right half-sites (10-16 bp) and spacer.
Figs. 29-31: TALEN pairs; number of target site mutations; total bases recognized; pre-selection library versus TALEN digestion; length of target site recognized by TALEN pair.
Fig. 34: left and right half-site mutations; hypothesis that there are two distinct off-target populations (sites similar to the on-target sequence, and sites highly mutant relative to the on-target sequence).
Fig. 35: DNA spacer profile; DNA spacer length (bp).
Fig. 36: cleavage point profile.
Fig. 42: compensating difference in specificity; enriched and de-enriched; forced mutation.
Figs. 43, 46: number of mutations per TALEN pair.
Figs. 44, 45, 47: expected versus observed sequences; relative calculated cleavage (RCC); enrichment; number of sequences; mutations in target site.
Fig. 48: number of mutations.
Fig. 49: TALEN and ZFN selections; left half-site mutations versus corresponding right half-site mutations (number).
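
The Fig. 24 fragment above appears to describe a specificity (enrichment) score that compares the post-selection and pre-selection frequency of a base at one position; the legible term "(100%-79%)" suggests that an increase is scaled by the largest increase possible. The sketch below is one reading of that fragment under stated assumptions, not a formula confirmed by the legible text. The function name `specificity_score`, the handling of depleted bases (scaling by the pre-selection frequency), and the example frequencies are assumptions made for illustration.

```python
def specificity_score(pre: float, post: float) -> float:
    """Assumed per-base score: positive if enriched by selection, negative if depleted.

    `pre` and `post` are the fractions (0..1) of sequences carrying a given base
    at a given position before and after selection.
    """
    if not 0.0 < pre < 1.0:
        raise ValueError("pre must be strictly between 0 and 1")
    if not 0.0 <= post <= 1.0:
        raise ValueError("post must be between 0 and 1")
    change = post - pre
    if change >= 0:
        # Enriched: scale by the largest possible increase (1 - pre).
        return change / (1.0 - pre)
    # Depleted (assumption): scale by the largest possible decrease (pre).
    return change / pre


if __name__ == "__main__":
    # Hypothetical frequencies: a base present in 79% of the pre-selection library.
    print(specificity_score(pre=0.79, post=0.79))  # frequency unchanged by selection
    print(specificity_score(pre=0.79, post=1.00))  # base fully enriched
    print(specificity_score(pre=0.07, post=0.00))  # base fully depleted
```

Under these assumptions the score is bounded between -1 and 1, which would make positions with different pre-selection frequencies comparable on one scale.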

EVALUATION AND IMPROVEMENT OF NUCLEASE CLEAVAGE SPECIFICITY

RELATED APPLICATION

[0001] This application is a continuation of and claims priority under 35 U.S.C. § 120 to U.S. application, U.S. Ser. No. 14/234,031, filed Mar. 24, 2014, which is a national stage filing under 35 U.S.C. § 371 of international PCT application, PCT/US2012/047778, filed Jul. 22, 2012, which claims priority under 35 U.S.C. § 119(e) to U.S. provisional patent application, U.S. Ser. No. 61/510,841, filed Jul. 22, 2011, the entire contents of each of which are incorporated herein by reference.

GOVERNMENT SUPPORT

[0002] This invention was made with U.S. Government support under grant numbers R01 GM065400 and R01 GM088040 awarded by the National Institutes of Health/National Institute of General Medical Sciences, under grant number HR0011-11-2-0003 awarded by the Defense Advanced Research Projects Agency, and under grant number DP1 OD006862 awarded by the National Institutes of Health. The U.S. Government has certain rights in the invention.

BACKGROUND OF THE INVENTION

[0003] Site-specific endonucleases theoretically allow for the targeted manipulation of a single site within a genome, and are useful in the context of gene targeting as well as for therapeutic applications. In a variety of organisms, including mammals, site-specific endonucleases, for example, zinc-finger nucleases (ZFNs), have been used for genome engineering by stimulating either non-homologous end joining or homologous recombination. In addition to providing powerful research tools, ZFNs also have potential as gene therapy agents, and two ZFNs have recently entered clinical trials: one, CCR5-224, targeting a human CCR-5 allele as part of an anti-HIV therapeutic approach (NCT00842634, NCT01044654, NCT01252641), and the other one, VF2468, targeting the human VEGF-A promoter as part of an anti-cancer therapeutic approach (NCT01082926).

[0004] Precise targeting of the intended target site is crucial for minimizing undesired off-target effects of site-specific nucleases, particularly in therapeutic applications, as imperfect specificity of some engineered site-specific binding domains has been linked to cellular toxicity. However, the site preferences for engineered site-specific nucleases, including current ZFNs, which cleave their target site after dimerization, have previously only been evaluated in vitro or in silico using methods that are limited to calculating binding and cleavage specificity for monomeric proteins.

[0005] Therefore, improved systems for evaluating the off-target sites of nucleases and other nucleic acid cleaving agents are needed and would be useful in the design of nucleases with better specificity, especially for therapeutic applications.

SUMMARY OF THE INVENTION

[0006] This invention is at least partly based on the recognition that the reported toxicity of some engineered site-specific endonucleases is based on off-target DNA cleavage, rather than on off-target binding alone. Information about the specificity of site-specific nucleases to date has been based on the assumptions that (i) dimeric nucleases cleave DNA with the same sequence specificity with which isolated monomeric domains bind DNA; and that (ii) the binding of one domain does not influence the binding of the other domain in a given dimeric nuclease. No study to date has reported a method for determining the broad DNA cleavage specificity of active, dimeric site-specific nucleases. Such a method would not only be useful in determining the DNA cleavage specificity of nucleases but would also find use in evaluating the cleavage specificity of other DNA cleaving agents, such as small molecules that cleave DNA.

[0007] This invention addresses the shortcomings of previous attempts to evaluate and characterize the sequence specificity of site-specific nucleases, and in particular of nucleases that dimerize or multimerize in order to cleave their target sequence. Some aspects of this invention provide an in vitro selection method to broadly examine the cleavage specificity of active nucleases. In some aspects, the invention provides methods of identifying suitable nuclease target sites that are sufficiently different from any other site within a genome to achieve specific cleavage by a given nuclease without any, or with at most minimal, off-target cleavage. The invention provides methods of evaluating, selecting, and/or designing site-specific nucleases with enhanced specificity as compared to current nucleases. Methods for minimizing off-target cleavage by a given nuclease, for example, by enhancing nuclease specificity by designing variant nucleases with binding domains having decreased binding affinity, by lowering the final concentration of the nuclease, and by choosing target sites that differ by at least three base pairs from their closest sequence relatives in the genome are provided. Compositions and kits useful in the practice of the inventive methods are also provided. The provided methods, compositions and kits are also useful in the evaluation, design, and selection of other nucleic acid (e.g., DNA) cleaving agents as would be appreciated by one of skill in the art.

[0008] In another aspect, the invention provides nucleases and other nucleic acid cleaving agents designed or selected using the provided system. Isolated ZFNs and TALENs designed, evaluated, or selected according to methods provided herein and pharmaceutical compositions comprising such nucleases are also provided.

[0009] Some aspects of this invention provide a method for identifying a target site of a nuclease. In some embodiments, the method comprises (a) providing a nuclease that cuts a double-stranded nucleic acid target site and creates a 5' overhang, wherein the target site comprises a [left-half site]-[spacer sequence]-[right-half site] (LSR) structure, and the nuclease cuts the target site within the spacer sequence. In some embodiments, the method comprises (b) contacting the nuclease with a library of candidate nucleic acid molecules, wherein each nucleic acid molecule comprises a concatemer of a sequence comprising a candidate nuclease target site and a constant insert sequence, under conditions suitable for the nuclease to cut a candidate nucleic acid molecule comprising a target site of the nuclease. In some embodiments, the method comprises (c) filling in the 5' overhangs of a nucleic acid molecule that has been cut twice by the nuclease and comprises a constant insert sequence flanked by a left half-site and cut spacer sequence on one side, and a right half-site and cut spacer sequence on the other side, thereby creating blunt ends. In some embodiments, the method comprises (d) identifying the nuclease target site cut by the nuclease by determining the sequence of the left-half site, the right-half site, and/or the spacer sequence of the nucleic acid molecule of step (c). In some embodiments, determining the sequence of step (d) comprises ligating sequencing adapters to the blunt ends of the nucleic acid molecule of step (c) and amplifying and/or sequencing the nucleic acid molecule. In some embodiments, the method comprises amplifying the nucleic acid molecule after ligation of the sequencing adapters via PCR. In some embodiments, the method further comprises a step of enriching the nucleic acid molecules of step (c) or step (d) for molecules comprising a single constant insert sequence. In some embodiments, the step of enriching comprises a size fractionation. In some embodiments, the size fractionation is done by gel purification. In some embodiments, the method further comprises discarding any sequences determined in step (d) if the nucleic acid molecule did not comprise a complementary pair of filled-in 5' overhangs. In some embodiments, the method further comprises compiling a plurality of nuclease target sites identified in step (d), thereby generating a nuclease target site profile. In some embodiments, the nuclease is a therapeutic nuclease which cuts a specific nuclease target site in a gene associated with a disease. In some embodiments, the method further comprises determining a maximum concentration of the therapeutic nuclease at which the therapeutic nuclease cuts the specific nuclease target site, and does not cut more than 10, more than 5, more than 4, more than 3, more than 2, more than 1, or no additional nuclease target sites. In some embodiments, the method further comprises administering the therapeutic nuclease to a subject in an amount effective to generate a final concentration equal to or lower than the maximum concentration. In some embodiments, the nuclease comprises an unspecific nucleic acid cleavage domain. In some embodiments, the nuclease comprises a FokI cleavage domain. In some embodiments, the nuclease comprises a nucleic acid cleavage domain that cleaves a target sequence upon cleavage domain dimerization. In some embodiments, the nuclease comprises a binding domain that specifically binds a nucleic acid sequence. In some embodiments, the binding domain comprises a zinc finger. In some embodiments, the binding domain comprises at least 2, at least 3, at least 4, or at least 5 zinc fingers. In some embodiments, the nuclease is a Zinc Finger Nuclease. In some embodiments, the binding domain comprises a Transcriptional Activator-Like Element. In some embodiments, the nuclease is a Transcriptional Activator-Like Element Nuclease (TALEN). In some embodiments, the nuclease comprises an organic compound. In some embodiments, the nuclease comprises an enediyne. In some embodiments, the nuclease is an antibiotic. In some embodiments, the compound is dynemicin, neocarzinostatin, calicheamicin, esperamicin, bleomycin, or a derivative thereof. In some embodiments, the nuclease is a homing endonuclease.

[0010] Some aspects of this invention provide libraries of nucleic acid molecules. In some embodiments, a library of nucleic acid molecules is provided that comprises a plurality of nucleic acid molecules, wherein each nucleic acid molecule comprises a concatemer of a candidate nuclease target site and a constant insert sequence spacer sequence. In some embodiments, the candidate nuclease target site comprises a [left-half site]-[spacer sequence]-[right-half site] (LSR) structure. In some embodiments, the left-half site and/or the right-half site is between 10 and 18 nucleotides long. In some embodiments, the library comprises candidate nuclease target sites that can be cleaved by a nuclease comprising a FokI cleavage domain. In some embodiments, the library comprises candidate nuclease target sites that can be cleaved by a Zinc Finger Nuclease (ZFN), a Transcription Activator-Like Effector Nuclease (TALEN), a homing endonuclease, an organic compound nuclease, an enediyne, an antibiotic nuclease, dynemicin, neocarzinostatin, calicheamicin, esperamicin, and/or bleomycin. In some embodiments, the library comprises at least 10^5, at least 10^6, at least 10^7, at least 10^8, at least 10^9, at least 10^10, at least 10^11, or at least 10^12 different candidate nuclease target sites. In some embodiments, the library comprises nucleic acid molecules of a molecular weight of at least 5 kDa, at least 6 kDa, at least 7 kDa, at least 8 kDa, at least 9 kDa, at least 10 kDa, at least 12 kDa, or at least 15 kDa. In some embodiments, the candidate nuclease target sites comprise a partially randomized left-half site, a partially randomized right-half site, and/or a partially randomized spacer sequence. In some embodiments, the library is templated on a known target site of a nuclease of interest. In some embodiments, the nuclease of interest is a ZFN, a TALEN, a homing endonuclease, an organic compound nuclease, an enediyne, an antibiotic nuclease, dynemicin, neocarzinostatin, calicheamicin, esperamicin, bleomycin, or a derivative thereof. In some embodiments, partially randomized sites differ from the consensus site by more than 5%, more than 10%, more than 15%, more than 20%, more than 25%, or more than 30% on average, distributed binomially. In some embodiments, partially randomized sites differ from the consensus site by no more than 10%, no more than 15%, no more than 20%, no more than 25%, no more than 30%, no more than 40%, or no more than 50% on average, distributed binomially. In some embodiments, the candidate nuclease target sites comprise a randomized spacer sequence.

[0011] Some aspects of this invention provide methods of selecting a nuclease based on an evaluation of cleavage specificity. In some embodiments, a method of selecting a nuclease that specifically cuts a consensus target site from a plurality of nucleases is provided. In some embodiments, the method comprises (a) providing a plurality of candidate nucleases that cut the same consensus sequence; (b) for each of the candidate nucleases of step (a), identifying a nuclease target site cleaved by the candidate nuclease that differs from the consensus target site; and (c) selecting a nuclease based on the nuclease target site(s) identified in step (b). In some embodiments, the nuclease selected in step (c) is the nuclease that cleaves the consensus target site with the highest specificity. In some embodiments, the nuclease that cleaves the consensus target site with the highest specificity is the candidate nuclease that cleaves the lowest number of target sites that differ from the consensus site. In some embodiments, the candidate nuclease that cleaves the consensus target site with the highest specificity is the candidate nuclease that cleaves the lowest number of target sites that are different from the consensus site in the context of a target genome. In some embodiments, the candidate nuclease selected in step (c) is a nuclease that does not cleave any target site other than the consensus target site. In some embodiments, the candidate nuclease selected in step (c) is a nuclease that does not cleave any target site other than the consensus target site within the genome of a subject at a therapeutically effective concentration of the nuclease. In some embodiments, the method further comprises contacting a genome with the nuclease selected in step (c). In some embodiments, the genome is a vertebrate, mammalian, human, non-human primate, rodent, mouse, rat, hamster, goat, sheep, cattle, dog, cat, reptile, amphibian, fish, nematode, insect, or fly genome. In some

embodiments, the genome is within a living cell. In some embodiments, the genome is within a subject. In some embodiments, the consensus target site is within an allele that is associated with a disease or disorder. In some embodiments, cleavage of the consensus target site results in treatment or prevention of the disease or disorder. In some embodiments, cleavage of the consensus target site results in the alleviation of a symptom of the disease or disorder. In some embodiments, the disease is HIV/AIDS, or a proliferative disease. In some embodiments, the allele is a CCR5 or VEGFA allele.

[0012] Some aspects of this invention provide a method for selecting a nuclease target site within a genome. In some embodiments, the method comprises (a) identifying a candidate nuclease target site; and (b) using a general purpose computer, comparing the candidate nuclease target site to other sequences within the genome, wherein if the candidate nuclease target site differs from any other sequence within the genome by at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 nucleotides, selecting the candidate nuclease site. In some embodiments, the candidate nuclease target site comprises a left-half site-spacer sequence-right-half site (LSR) structure. In some embodiments, the left-half site and/or the right-half site is 10-18 nucleotides long. In some embodiments, the spacer is 10-24 nucleotides long. In some embodiments, the method further comprises designing and/or generating a nuclease targeting the candidate nuclease site selected in step (b). In some embodiments, designing and/or generating is done by recombinant technology. In some embodiments, designing and/or generating comprises designing a binding domain that specifically binds the selected candidate target site, or a half-site thereof. In some embodiments, designing and/or generating comprises conjugating the binding domain with a nucleic acid cleavage domain. In some embodiments, the nucleic acid cleavage domain is a non-specific cleavage domain and/or wherein the nucleic acid cleavage domain must dimerize or multimerize in order to cut a nucleic acid. In some embodiments, the nucleic acid cleavage domain comprises a FokI cleavage domain. In some embodiments, the method further comprises isolating the nuclease. In some embodiments, the nuclease is a Zinc Finger Nuclease (ZFN) or a Transcription Activator-Like Effector Nuclease (TALEN), a homing endonuclease, or is or comprises an organic compound nuclease, an enediyne, an antibiotic nuclease, dynemicin, neocarzinostatin, calicheamicin, esperamicin, bleomycin, or a derivative thereof. In some embodiments, the candidate target site is within a genomic sequence the cleavage of which is known to be associated with an alleviation of a symptom of a disease or disorder. In some embodiments, the disease is HIV/AIDS, or a proliferative disease. In some embodiments, the genomic sequence is a CCR5 or VEGFA sequence.

[0013] Some aspects of this invention provide isolated nucleases with enhanced specificity and nucleic acids encoding such nucleases. In some embodiments, an isolated nuclease is provided that has been engineered to cleave a target site within a genome, wherein the nuclease has been selected according to any of the selection methods described herein. In some embodiments, an isolated nuclease is provided that cuts a target site selected according to any of the methods described herein. In some embodiments, an isolated nuclease is provided that is designed or engineered according to any of the concepts or parameters described herein. In some embodiments, the nuclease is a Zinc Finger Nuclease (ZFN) or a Transcription Activator-Like Effector Nuclease (TALEN), a homing endonuclease, or is or comprises an organic compound nuclease, an enediyne, an antibiotic nuclease, dynemicin, neocarzinostatin, calicheamicin, esperamicin, bleomycin, or a derivative thereof.

[0014] Some aspects of this invention provide kits comprising nucleases and nuclease compositions. In some embodiments, a kit is provided that comprises an isolated nuclease described herein. In some embodiments, the kit further comprises a nucleic acid comprising a target site of the isolated nuclease. In some embodiments, the kit comprises an excipient and instructions for contacting the nuclease with the excipient to generate a composition suitable for contacting a nucleic acid with the nuclease. In some embodiments, the nucleic acid is a genome or part of a genome. In some embodiments, the genome is within a cell. In some embodiments, the genome is within a subject and the excipient is a pharmaceutically acceptable excipient.

[0015] Some aspects of this invention provide pharmaceutical compositions comprising a nuclease or a nucleic acid encoding a nuclease as described herein. In some embodiments, a pharmaceutical composition for administration to a subject is provided. In some embodiments, the composition comprises an isolated nuclease described herein or a nucleic acid encoding such a nuclease and a pharmaceutically acceptable excipient.

[0016] Other advantages, features, and uses of the invention will be apparent from the detailed description of certain non-limiting embodiments; the drawings, which are schematic and not intended to be drawn to scale; and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[0017] FIG. 1. In vitro selection for ZFN-mediated cleavage. Pre-selection library members are concatemers (represented by arrows) of identical ZFN target sites lacking 5' phosphates. L=left half-site; R=right half-site; S=spacer; L', S', R'=complementary sequences to L, S, R. ZFN cleavage reveals a 5' phosphate, which is required for sequencing adapter ligation. The only sequences that can be amplified by PCR using primers complementary to the adapters are sequences that have been cleaved twice and have adapters on both ends. DNA cleaved at adjacent sites is purified by gel electrophoresis and sequenced. A computational screening step after sequencing ensures that the filled-in spacer sequences (S and S') are complementary and therefore from the same molecule.

[0018] FIGS. 2A-B. DNA cleavage sequence specificity profiles for CCR5-224 and VF2468 ZFNs. The heat maps show specificity scores compiled from all sequences identified in selections for cleavage of 14 nM of DNA library with (a) 2 nM CCR5-224 or (b) 1 nM VF2468. The target DNA sequence is shown below each half-site. Black boxes indicate target base pairs. Specificity scores were calculated by dividing the change in frequency of each base pair at each position in the post-selection DNA pool compared to the pre-selection pool by the maximal possible change in frequency from pre-selection library to post-selection library of each base pair at each position. Blue boxes indicate enrichment for a base pair at a given position, white boxes indicate no enrichment, and red boxes indicate enrichment against a base pair at a given position. The darkest blue shown in the legend corresponds to absolute preference for a given base pair (specificity score=1.0), while the darkest red corresponds to an absolute preference against a given base pair (specificity score=-1.0).
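The specificity score recited for FIGS. 2A-B can be illustrated with a short sketch. The sketch below is not taken from the disclosure; it assumes that the "maximal possible change in frequency" is 1 minus the pre-selection frequency when a base pair is enriched and the pre-selection frequency itself when a base pair is depleted, which bounds the score between -1.0 and 1.0 as stated. The function names `base_frequencies` and `specificity_scores` and the toy sequences are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def base_frequencies(seqs):
    """Per-position base frequencies for a list of equal-length sequences."""
    length = len(seqs[0])
    freqs = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        total = sum(counts[b] for b in BASES)
        freqs.append({b: counts[b] / total for b in BASES})
    return freqs


def specificity_scores(pre_selection, post_selection):
    """Score each base at each position on a -1.0 .. 1.0 scale.

    Assumption (not stated verbatim in the text): the maximal possible
    change is (1 - f_pre) for enrichment and f_pre for depletion.
    """
    pre = base_frequencies(pre_selection)
    post = base_frequencies(post_selection)
    scores = []
    for f_pre, f_post in zip(pre, post):
        row = {}
        for b in BASES:
            change = f_post[b] - f_pre[b]
            max_change = (1.0 - f_pre[b]) if change >= 0 else f_pre[b]
            row[b] = change / max_change if max_change else 0.0
        scores.append(row)
    return scores


# Toy data: every base is equally frequent (0.25) before selection.
pre_library = ["AC", "CA", "GT", "TG"]
post_library = ["AC", "AC", "CA", "GT"]
scores = specificity_scores(pre_library, post_library)
```

In this toy example the frequency of A at the first position rises from 0.25 to 0.5, so its score is 0.25/0.75, while T at the first position disappears entirely and receives a score of -1.0.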

Sequences correspond, from top to bottom, to SEQ ID NOs: 1 and 2 (FIG. 2A) and SEQ ID NOs: 3 and 4 (FIG. 2B).

[0019] FIGS. 3A-B. Evidence for a compensation model of ZFN target site recognition. The heat maps show the changes in specificity score upon mutation at the black-boxed positions in selections with (a) 2 nM CCR5-224 or (b) 1 nM VF2468. Each row corresponds to a different mutant position (explained graphically in FIG. 12). Sites are listed in their genomic orientation; the (+) half-site of CCR5-224 and the (+) half-site of VF2468 are therefore listed as reverse complements of the sequences found in FIG. 2. Shades of blue indicate increased specificity score (more stringency) when the black-boxed position is mutated and shades of red indicate decreased specificity score (less stringency). Sequences in FIG. 3A correspond, from top to bottom, to SEQ ID NOs: 5-6.

[0020] FIGS. 4A-B. ZFNs can cleave a large fraction of target sites with three or fewer mutations in vitro. The percentages of the sequences with one, two, or three mutations that are enriched for in vitro cleavage (enrichment factor>1) by the (a) CCR5-224 ZFN and (b) VF2468 ZFN are shown. Enrichment factors are calculated for each sequence identified in the selection by dividing the observed frequency of that sequence in the post-selection sequenced library by the frequency of that sequence in the pre-selection library.

[0021] FIG. 5. In vitro synthesis of target site library. Library members consist of a partially randomized left-half site (L), a fully randomized 4-7 nucleotide spacer sequence (S), and a partially randomized right-half site (R). Library members present on DNA primers were incorporated into a linear ~545 base pair double-stranded DNA by PCR. During PCR, a primer with a library member (LSR) can anneal to a DNA strand with a different library member (L*S*R*), resulting in a double-strand DNA with two different library members at one end. The 3'-5' exonuclease and 5'-3' polymerase activities of T4 DNA polymerase removed mismatched library members and replaced them with complementary, matched library members (L*S*R*). After 5' phosphorylation with T4 polynucleotide kinase, the library DNA was subjected to blunt-end ligation, resulting in a mixture of linear and circular monomeric and multimeric species. Circular monomers were purified by gel electrophoresis and concatenated through rolling-circle amplification with Φ29 DNA polymerase.

[0022] FIGS. 6A-D. Expression and quantification of ZFNs. Western blots for CCR5-224 and VF2468 are shown (a) for the ZFN samples used in the in vitro selection, and (b) for quantification. (c) Known quantities of N-terminal FLAG-tagged bacterial alkaline phosphatase (FLAG-BAP) were used to generate a standard curve for ZFN quantification. Diamonds represent the intensities of FLAG-BAP standards from the Western blot shown in (b), plus signs represent the intensities of bands of ZFNs, and the line shows the best-fit curve of FLAG-BAP standards that was used to quantify ZFNs. (d) Gels are shown of activity assays of CCR5-224 and VF2468 on an 8 nM linear substrate containing one target cleavage site. The ZFNs were each incubated with their respective substrate for 4 hours at 37° C. DNA in the "+lysate" lane was incubated with an amount of in vitro transcription/translation mixture equivalent to that used in the 2.5 nM ZFN reaction. ZFN-mediated cleavage results in two linear fragments approximately 700 bp and 300 bp in length. 2 nM CCR5-224 and 1 nM VF2468 were the amounts required for 50% cleavage of the linear substrate.

[0023] FIGS. 7A-B. Library cleavage with ZFNs. Cleavage of 1 μg of concatemeric libraries of CCR5-224 (a) or VF2468 (b) target sites is shown with varying amounts of CCR5-224 or VF2468, respectively. The lane labeled "+lysate" refers to pre-selection concatemeric library incubated with the volume of in vitro transcription/translation mixture contained in the samples containing 4 nM CCR5-224 or 4 nM of VF2468. Uncut DNA, which would be observed in the "+lysate" lane, is of length 12 kb and is lost upon purification due to its size and therefore is not present on the gel. The lane labeled "+PvuI" is a digest of the pre-selection library at PvuI sites introduced adjacent to library members. The laddering on the gels results from cleavage of pre-selection DNA concatemers at more than one site. There is a dose-dependent increase in the amount of the bottom band, which corresponds to cleavage at two adjacent library sites in the same pre-selection DNA molecule. This bottom band of DNA was enriched by PCR and gel purification before sequencing.

[0024] FIGS. 8A-B. ZFN off-target cleavage is dependent on enzyme concentration. For both (a) CCR5-224 and (b) VF2468 the distribution of cleavable sites revealed by in vitro selection shifts to include sites that are less similar to the target site as the concentration of ZFN increases. Both CCR5-224 and VF2468 selections enrich for sites that have fewer mutations than the pre-selection library. For comparisons between pre-selection and post-selection library means for all combinations of selection stringencies, P-values are 0 with the exception of the comparison between 0.5 nM and 1 nM VF2468 selections, which has a P-value of 1.7x10'.

[0025] FIGS. 9A-B. Cleavage efficiency of individual sequences is related to selection stringency. In vitro DNA digests were performed on sequences identified in selections of varying stringencies (marked with X's). 2 nM CCR5-224 (SEQ ID NOs: 7-14) (a) or 1 nM VF2468 (SEQ ID NOs: 15-24) (b) was incubated with 8 nM of linear substrate containing the sequence shown. The 1 kb linear substrate contained a single cleavage site with the spacer sequence found in the genomic target of CCR5-224 ("CTGAT") or VF2468 ("TCGAA") and the indicated (+) and (-) half-sites. Mutant base pairs are represented with lowercase letters. CCR5-224 sites and VF2468 sites that were identified in the highest stringency selections (0.5 nM ZFN) are cleaved most efficiently, while sites that were identified only in the lowest stringency selections (4 nM ZFN) are cleaved least efficiently. Sequences in FIG. 9A correspond, from top to bottom, to SEQ ID NOs: 7-14. Sequences in FIG. 9B correspond, from top to bottom, to SEQ ID NOs: 15-24.

[0026] FIGS. 10A-D. Concentration-dependent sequence profiles for CCR5-224 and VF2468 ZFNs. The heat maps show specificity scores for the cleavage of 14 nM of total DNA library with varying amounts of (a-b) CCR5-224 or (c-d) VF2468. The target DNA sequence is shown below each half-site. Black boxes indicate target base pairs. Specificity scores were calculated by dividing the change in frequency of each base pair at each position in the post-selection DNA pool compared to the pre-selection pool by the maximal possible change in frequency of each base pair at each position. Blue boxes indicate specificity for a base pair at a given position, white boxes indicate no specificity, and red boxes indicate specificity against a base pair at a given position. The darkest blue shown in the legend corresponds to absolute preference for a given base pair (specificity score=1.0), while the darkest red corresponds to an absolute preference against a given base pair (specificity score=-1.0). Sequences in FIGS. 10A-D correspond, from top to bottom, to SEQ ID NOs: 25-28.

[0027] FIG. 11. Stringency at the (+) half-site increases when CCR5-224 cleaves sites with mutations at highly specified base pairs in the (-) half-site. The heat maps show specificity scores for sequences identified in the in vitro selection with 2 nM CCR5-224. For (-)A3 and (-)G6, indicated by filled black boxes, both pre-selection library sequences and post-selection sequences were filtered to exclude any sequences that contained an A at position 3 in the (-) half-site or G at position 6 in the (-) half-site, respectively, before specificity scores were calculated. For sites with either (-) half-site mutation, there is an increase in specificity at the (+) half-site. Black boxes indicate target base pairs. Specificity scores were calculated by dividing the change in frequency of each base pair at each position in the post-selection DNA pool compared to the pre-selection pool by the maximal possible change in frequency of each base pair at each position. Blue boxes indicate specificity for a base pair at a given position, white boxes indicate no specificity, and red boxes indicate specificity against a base pair at a given position. The darkest blue shown in the legend corresponds to absolute preference for a given base pair (specificity score=1.0), while the darkest red corresponds to an absolute preference against a given base pair (specificity score=-1.0). Sequences on the left correspond to SEQ ID NO: 29. Sequences on the right correspond to SEQ ID NO: 30.

[0028] FIGS. 12A-B. Data processing steps used to create mutation compensation difference maps. The steps to create each line of the difference map in FIG. 3 are shown for the example of a mutation at position (-)A3. (a) Heat maps of the type described in FIG. 11 are condensed into one line to show only the specificity scores for intended target site nucleotides (in black outlined boxes in FIG. 11). (b) The condensed heat maps are then compared to a condensed heat map corresponding to the unfiltered baseline profile from FIG. 2, to create a condensed difference heat map that shows the relative effect of mutation at the position specified by the white box with black outline on the specificity score profile. Blue boxes indicate an increase in sequence stringency at positions in cleaved sites that contain mutations at the position indicated by the white box, while red boxes indicate a decrease in sequence stringency and white boxes, no change in sequence stringency. The (+) half-site difference map is reversed to match the orientation of the (+) half-site as it is found in the genome rather than as it is recognized by the zinc finger domain of the ZFN. Sequences in FIG. 12A correspond, from top left to bottom right, to SEQ ID NOs: 31-36. Sequences in FIG. 12B correspond to SEQ ID NOs: 37 and 38 (left) and SEQ ID NO: 39 (right).

[0029] FIG. 13. Stringency at both half-sites increases when VF2468 cleaves sites with mutations at the first base pair of both half-sites. The heat maps show specificity scores for sequences identified in the in vitro selection with 4 nM VF2468. For (+)G1, (-)G1, and (+)G1/(-)G1, indicated by filled black boxes, both pre-selection library sequences and post-selection sequences were filtered to exclude any sequences that contained a G at position 1 in the (+) half-site and/or G at position 1 in the (-) half-site, before specificity scores were calculated. For sites with either mutation, there is a decrease in mutational tolerance at the opposite half-site and a very slight decrease in mutational tolerance at the same half-site. Sites with both mutations show a strong increase in stringency at both half-sites. Black boxes indicate on-target base pairs. Specificity scores were calculated by dividing the change in frequency of each base pair at each position in the post-selection DNA pool compared to the pre-selection pool by the maximal possible change in frequency of each base pair at each position. Blue boxes indicate specificity for a base pair at a given position, white boxes indicate no specificity, and red boxes indicate specificity against a base pair at a given position. The darkest blue shown in the legend corresponds to absolute preference for a given base pair (specificity score=1.0), while the darkest red corresponds to an absolute preference against a given base pair (specificity score=-1.0). Sequences correspond to SEQ ID NO: 40 for VF2468(+) and SEQ ID NO: 41 for VF2468(-).

[0030] FIGS. 14A-B. ZFN cleavage occurs at characteristic locations in the DNA target site. The plots show the locations of cleavage sites identified in the in vitro selections with (a) 4 nM CCR5-224 or (b) 4 nM VF2468. The cleavage site locations show similar patterns for both ZFNs except in the case of five-base pair spacers with four-base overhangs. The titles refer to the spacer length/overhang length combination that is plotted (e.g., a site with a six base-pair spacer and a four base overhang is referred to as "6/4"). The black bars indicate the relative number of sequences cleaved for each combination of spacer length and overhang length. 'P' refers to nucleotides in the (+) target half-site, 'M' refers to nucleotides in the (-) target half-site, and 'N' refers to nucleotides in the spacer. There were no "7/7" sequences from the 4 nM VF2468 selection. Only sequences with overhangs of at least 4 bases were tabulated.

[0031] FIGS. 15A-D. CCR5-224 preferentially cleaves five- and six-base pair spacers and cleaves five-base pair spacers to leave five-nucleotide overhangs. The heat maps show the percentage of all sequences surviving each of the four CCR5-224 in vitro selections (a-d) that have the spacer and overhang lengths shown.

[0032] FIGS. 16A-D. VF2468 preferentially cleaves five- and six-base pair spacers, cleaves five-base pair spacers to leave five-nucleotide overhangs, and cleaves six-base pair spacers to leave four-nucleotide overhangs. The heat maps show the percentage of all sequences surviving each of the four VF2468 in vitro selections (a-d) that have the spacer and overhang lengths shown.

[0033] FIGS. 17A-F. ZFNs show spacer length-dependent sequence preferences. Both CCR5-224 (a-c) and VF2468 (d-f) show increased specificity for half-sites flanking four- and seven-base pair spacers than for half-sites flanking five- and six-base pair spacers. For both ZFNs, one half-site has a greater change in mutational tolerance than the other, and the change in mutational tolerance is concentration dependent.

[0034] FIG. 18. Model for ZFN tolerance of off-target sequences. Our results suggest that some ZFNs recognize their intended target sites (top, black DNA strands with no Xs) with more binding energy than is required for cleavage under a given set of conditions (dotted line). Sequences with one or two mutations (one or two Xs) are generally tolerated since they do not decrease the ZFN:DNA binding energy below the threshold necessary for cleavage. Some sequences with additional mutations can still be cleaved if the additional mutations occur in regions of the zinc-finger binding interface that have already been disrupted (three Xs above the dotted line), as long as optimal interactions present at other locations in the ZFN:DNA binding interface maintain binding energies above threshold values. Additional mutations that disrupt key

interactions at other locations in the ZFN:DNA interface, however, result in binding energies that fall short of the cleavage threshold.

[0035] FIG. 19. Profiling the Specificity of TAL Nucleases. Selection I: +28 vs. +63 aa Linker Between TAL DNA Binding Domain and FokI Cleavage Domain (SEQ ID NOs: 42-45).

[0036] FIG. 20. Structure of TAL DNA binding domain and RVDs (SEQ ID NOs: 46 and 47).

[0037] FIG. 21. Mutations in target sites from TALN selection. The +28 linker enriched for cleaved sequences with fewer mutations, suggesting the +28 linker is more specific. There are significantly fewer mutations in the post-selected sequences compared to the pre-selection library sequences, indicating a successful selection.

[0038] FIG. 22. Enrichment of Mutations in Total Target Site Between Left and Right Half Sites of Previous TALN Selection. The relatively regular (log relationship) trend between number of half-site mutations and enrichment is consistent with a single repeat binding a base pair independent of other repeat binding.

[0039] FIG. 23. TALN Cleavage Dependence on DNA Spacer Length. There is a similar preference for cut site spacer lengths in our in vitro selection compared to previous studies. In vitro, TALN cleavage. Dependence on Linker Length & Spacer Length from Mussolino (2011).

[0040] FIG. 24. Specificity score at individual bases.

[0041] FIG. 25. Specificity score at individual bases. There is variable specificity at each individual position, again with the +28 linker demonstrating significantly better specificity (SEQ ID NOs: 48 and 49).

[0042] FIG. 26. Compensating Difference in Specificity of TALNs Analysis (SEQ ID NOs: 50-51).

[0043] FIG. 27. Compensating Difference in Specificity of L16 R16 TALN. A single mutation in the cleavage site does not alter the distribution of other mutations, suggesting that the TAL repeat domains bind independently (SEQ ID NOs: 52-53).

[0044] FIG. 28. Profiling the Specificity of TALNs Selection II: Varying TALN Lengths (SEQ ID NOs: 54-61).

[0045] FIG. 29. Enrichment of Mutations in Common Target Site (SEQ ID NOs: 62-69).

[0046] FIG. 30. Distribution of Mutations in Total Targeted Site of TALN Digestion vs. Pre-Selection Library.

[0047] FIG. 31. Distribution of Mutations in Total Targeted Site of TALN Digestion vs. Pre-Selection Library.

[0048] FIG. 32. Enrichment of Mutations in Total Target Site Between Right and Left Half Sites of TALN Pairs.

[0049] FIG. 33. Enrichment of Mutations in Total Target Site Between Right and Left Half Sites of TALN Pairs.

[0050] FIG. 34. Enrichment of Mutations in Total Targeted Site of TALN Digestion vs. Pre-Selection Library for L10 R10 TALN Pair.

[0051] FIG. 35. DNA spacer profile. While the vast majority of sequences have a spacer preference, the highly mutant sequences have no significant spacer preference, as might be expected from alternate frames changing the spacer length.

[0052] FIG. 36. Cleavage point profile. While the vast majority of sequences are cut in the spacer as expected, the R16 L16 highly mutant sequences are not predominantly cut in the spacer, but the L10 R10 ones are cut in the spacer, possibly indicative of a frame-shifted binding site leading to productive spacer cutting.

[0053] FIG. 37. Highly Mutant Half Sites in L10 R10 TALN Pair. Many potential binding sites in frames outside of the intended frame have sites more similar to the intended target (SEQ ID NOs: 70-90).

[0054] FIG. 38. Enrichment of Mutations in Total Target Site Between Left and Right Half Sites of TALN Pairs Edited for Frame-shifted Binding Sites.

[0055] FIG. 39. Highly Mutant Half Sites in L16 R16 TALN Pair (SEQ ID NOs: 91-111).

[0056] FIG. 40. Highly Mutant Half Sites in L16 R16 TALN Pair. The highly mutant sequences from L16 R16 cannot be explained by a frame-shift (left figure), have no DNA spacer preference (see slide 11), and seem to be cutting more often outside of the DNA spacer (right figure), indicating perhaps homodimer cleavage (even with heterodimer) or heterodimer cleavage independent of a TAL domain binding target site DNA (i.e., dimerization through the FokI cleavage domain).

[0057] FIG. 41. Heat Maps of TALN Pair Specificity Score (SEQ ID NOs: 112 and 113).

[0058] FIG. 42. Compensating Difference in Specificity of L16 R16 TALN. A single mutation in the cleavage site does not alter the distribution of other mutations, suggesting that the TAL repeat domains bind independently (SEQ ID NOs: 114 and 115).

[0059] FIG. 43. Enrichment of Mutations in Full, Total Target Site of TALN Pairs. The enrichments seem to have similar log slopes in the low mutation range; the selections containing a TALN recognizing 16 bps seem to be the exceptions, indicating R16 binding may be saturating for some very low mutation sites (i.e., R16 & L16 were near or above the Kd for the wild type site).

[0060] FIG. 44. TALN Off-Target Sites in the Human Genome.

[0061] FIG. 45. TALN Off-Target Sites Predicted Cleavage.

[0062] FIG. 46. TALN Off-Target Sites Predicted Cleavage For Very Mutant Target Sites below Detection Limit.

[0063] FIG. 47. TALN Off-Target Sites Predicted Cleavage For Very Mutant Target Sites below Detection Limit.

[0064] FIG. 48. TALN Off-Target Sites Predicted Cleavage For Sequences (Not just Number of Mutations). Combining the regular log decrease of cleavage efficiency (enrichment) as total target site mutations increase and the enrichment at each position, we should be able to predict the off-target site cleavage of any sequence (SEQ ID NOs: 116-118).

[0065] FIG. 49. Comparing TALNs vs. ZFNs. For the most part, in the TALN selection the enrichment is dependent on the total mutations in both half sites and not on the distribution of mutations between half sites as for zinc finger nucleases (ZFNs). This observation, combined with the context-dependent binding of ZFNs, potentially makes ZFNs far less specific than their TAL equivalents.

DEFINITIONS

[0066] As used herein and in the claims, the singular forms "a," "an," and "the" include the singular and the plural reference unless the context clearly indicates otherwise. Thus, for example, a reference to "an agent" includes a single agent and a plurality of such agents.

[0067] The term "concatemer," as used herein in the context of nucleic acid molecules, refers to a nucleic acid molecule that contains multiple copies of the same DNA sequences linked in a series. For example, a concatemer comprising ten

copies of a specific sequence of nucleotides (e.g., [XYZ]10), would comprise ten copies of the same specific sequence linked to each other in series, e.g., 5'-XYZXYZXYZXYZXYZXYZXYZXYZXYZXYZ-3'. A concatemer may comprise any number of copies of the repeat unit or sequence, e.g., at least 2 copies, at least 3 copies, at least 4 copies, at least 5 copies, at least 10 copies, at least 100 copies, at least 1000 copies, etc. An example of a concatemer of a nucleic acid sequence comprising a nuclease target site and a constant insert sequence would be (target site)-(constant insert sequence). A concatemer may be a linear nucleic acid molecule, or may be circular.

[0068] The terms “conjugating,” “conjugated,” and “conjugation” refer to an association of two entities, for example, of two molecules such as two proteins, two domains (e.g., a binding domain and a cleavage domain), or a protein and an agent, e.g., a protein binding domain and a small molecule. The association can be, for example, via a direct or indirect (e.g., via a linker) covalent linkage or via non-covalent interactions. In some embodiments, the association is covalent. In some embodiments, two molecules are conjugated via a linker connecting both molecules. For example, in some embodiments where two proteins are conjugated to each other, e.g., a binding domain and a cleavage domain of an engineered nuclease, to form a protein fusion, the two proteins may be conjugated via a polypeptide linker, e.g., an amino acid sequence connecting the C-terminus of one protein to the N-terminus of the other protein.

[0069] The term “consensus sequence,” as used herein in the context of nucleic acid sequences, refers to a calculated sequence representing the most frequent nucleotide residues found at each position in a plurality of similar sequences. Typically, a consensus sequence is determined by sequence alignment in which similar sequences are compared to each other and similar sequence motifs are calculated. In the context of nuclease target site sequences, a consensus sequence of a nuclease target site may, in some embodiments, be the sequence most frequently bound, or bound with the highest affinity, by a given nuclease.

[0070] The term “effective amount,” as used herein, refers to an amount of a biologically active agent that is sufficient to elicit a desired biological response. For example, in some embodiments, an effective amount of a nuclease may refer to the amount of the nuclease that is sufficient to induce cleavage of a target site specifically bound and cleaved by the nuclease. As will be appreciated by the skilled artisan, the effective amount of an agent, e.g., a nuclease, a hybrid protein, or a polynucleotide, may vary depending on various factors as, for example, on the desired biological response, the specific allele, genome, target site, cell, or tissue being targeted, and the agent being used.

[0071] The term “enediyne,” as used herein, refers to a class of bacterial natural products characterized by either nine- or ten-membered rings containing two triple bonds separated by a double bond (see, e.g., K. C. Nicolaou; A. L. Smith; E. W. Yue (1993). “Chemistry and biology of natural and designed enediynes”. PNAS 90 (13): 5881–5888; the entire contents of which are incorporated herein by reference). Some enediynes are capable of undergoing Bergman cyclization, and the resulting diradical, a 1,4-dehydrobenzene derivative, is capable of abstracting hydrogen atoms from the sugar backbone of DNA, which results in DNA strand cleavage (see, e.g., S. Walker; R. Landovitz; W. D. Ding; G. A. Ellestad; D. Kahne (1992). “Cleavage behavior of calicheamicin gamma 1 and calicheamicin T”. Proc Natl Acad Sci U.S.A. 89 (10): 4608-12; the entire contents of which are incorporated herein by reference). Their reactivity with DNA confers an antibiotic character to many enediynes, and some enediynes are clinically investigated as anticancer antibiotics. Nonlimiting examples of enediynes are dynemicin, neocarzinostatin, calicheamicin, and esperamicin (see, e.g., Adrian L. Smith and K. C. Nicolaou, “The Enediyne Antibiotics” J. Med. Chem., 1996, 39 (11), pp. 2103-2117; and Donald Borders, “Enediyne antibiotics as antitumor agents,” Informa Healthcare; 1 edition (Nov. 23, 1994), ISBN-10: 0824789385; the entire contents of which are incorporated herein by reference).

[0072] The term “homing endonuclease,” as used herein, refers to a type of restriction enzyme typically encoded by introns or inteins (Edgell DR (February 2009). “Selfish DNA: homing endonucleases find a home”. Curr Biol 19 (3): R115-R117; Jasin M (June 1996). “Genetic manipulation of genomes with rare-cutting endonucleases”. Trends Genet 12 (6): 224-8; Burt A, Koufopanou V (December 2004). “Homing endonuclease genes: the rise and fall and rise again of a selfish element”. Curr Opin Genet Dev 14 (6): 609-15; the entire contents of which are incorporated herein by reference). Homing endonuclease recognition sequences are long enough to occur randomly only with a very low probability (approximately once every 7x10" bp), and are normally found in only one instance per genome.

[0073] The term “library,” as used herein in the context of nucleic acids or proteins, refers to a population of two or more different nucleic acids or proteins, respectively. For example, a library of nuclease target sites comprises at least two nucleic acid molecules comprising different nuclease target sites. In some embodiments, a library comprises at least 10^1, at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8, at least 10^9, at least 10^10, at least 10^11, at least 10^12, at least 10^13, at least 10^14, or at least 10^15 different nucleic acids or proteins. In some embodiments, the members of the library may comprise randomized sequences, for example, fully or partially randomized sequences. In some embodiments, the library comprises nucleic acid molecules that are unrelated to each other, e.g., nucleic acids comprising fully randomized sequences. In other embodiments, at least some members of the library may be related, for example, they may be variants or derivatives of a particular sequence, such as a consensus target site sequence.

[0074] The term “linker,” as used herein, refers to a chemical group or a molecule linking two adjacent molecules or moieties, e.g., a binding domain and a cleavage domain of a nuclease. Typically, the linker is positioned between, or flanked by, two groups, molecules, or other moieties and connected to each one via a covalent bond, thus connecting the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety.

[0075] The term “nuclease,” as used herein, refers to an agent, for example a protein or a small molecule, capable of cleaving a phosphodiester bond connecting nucleotide residues in a nucleic acid molecule. In some embodiments, a nuclease is a protein, e.g., an enzyme that can bind a nucleic acid molecule and cleave a phosphodiester bond connecting nucleotide residues within the nucleic acid molecule. A nuclease may be an endonuclease, cleaving a phosphodiester bond within a polynucleotide chain, or an exonuclease, cleaving a phosphodiester bond at the end of the polynucleotide chain. In some embodiments, a nuclease is a site-specific nuclease, binding and/or cleaving a specific phosphodiester bond within a specific nucleotide sequence, which is also referred to herein as the “recognition sequence,” the “nuclease target site,” or the “target site.” In some embodiments, a nuclease recognizes a single-stranded target site, while in other embodiments, a nuclease recognizes a double-stranded target site, for example a double-stranded DNA target site. The target sites of many naturally occurring nucleases, for example, many naturally occurring DNA restriction nucleases, are well known to those of skill in the art. In many cases, a DNA nuclease, such as EcoRI, HindIII, or BamHI, recognizes a palindromic, double-stranded DNA target site of 4 to 10 base pairs in length, and cuts each of the two DNA strands at a specific position within the target site. Some endonucleases cut a double-stranded nucleic acid target site symmetrically, i.e., cutting both strands at the same position so that the ends comprise base-paired nucleotides, also referred to herein as blunt ends. Other endonucleases cut a double-stranded nucleic acid target site asymmetrically, i.e., cutting each strand at a different position so that the ends comprise unpaired nucleotides. Unpaired nucleotides at the end of a double-stranded DNA molecule are also referred to as “overhangs,” e.g., as “5'-overhang” or as “3'-overhang,” depending on whether the unpaired nucleotide(s) form(s) the 5' or the 3' end of the respective DNA strand. Double-stranded DNA molecule ends ending with unpaired nucleotide(s) are also referred to as sticky ends, as they can “stick to” other double-stranded DNA molecule ends comprising complementary unpaired nucleotide(s). A nuclease protein typically comprises a “binding domain” that mediates the interaction of the protein with the nucleic acid substrate, and also, in some cases, specifically binds to a target site, and a “cleavage domain” that catalyzes the cleavage of the phosphodiester bond within the nucleic acid backbone. In some embodiments a nuclease protein can bind and cleave a nucleic acid molecule in a monomeric form, while, in other embodiments, a nuclease protein has to dimerize or multimerize in order to cleave a target nucleic acid molecule. Binding domains and cleavage domains of naturally occurring nucleases, as well as modular binding domains and cleavage domains that can be fused to create nucleases binding specific target sites, are well known to those of skill in the art. For example, zinc fingers or transcriptional activator like elements can be used as binding domains to specifically bind a desired target site, and fused or conjugated to a cleavage domain, for example, the cleavage domain of FokI, to create an engineered nuclease cleaving the target site.

[0076] The terms “nucleic acid” and “nucleic acid molecule,” as used herein, refer to a compound comprising a nucleobase and an acidic moiety, e.g., a nucleoside, a nucleotide, or a polymer of nucleotides. Typically, polymeric nucleic acids, e.g., nucleic acid molecules comprising three or more nucleotides, are linear molecules, in which adjacent nucleotides are linked to each other via a phosphodiester linkage. In some embodiments, “nucleic acid” refers to individual nucleic acid residues (e.g. nucleotides and/or nucleosides). In some embodiments, “nucleic acid” refers to an oligonucleotide chain comprising three or more individual nucleotide residues. As used herein, the terms “oligonucleotide” and “polynucleotide” can be used interchangeably to refer to a polymer of nucleotides (e.g., a string of at least three nucleotides). In some embodiments, “nucleic acid” encompasses RNA as well as single and/or double-stranded DNA. Nucleic acids may be naturally occurring, for example, in the context of a genome, a transcript, an mRNA, tRNA, rRNA, siRNA, snRNA, a plasmid, cosmid, chromosome, chromatid, or other naturally occurring nucleic acid molecule. On the other hand, a nucleic acid molecule may be a non-naturally occurring molecule, e.g., a recombinant DNA or RNA, an artificial chromosome, an engineered genome, or fragment thereof, or a synthetic DNA, RNA, DNA/RNA hybrid, or including non-naturally occurring nucleotides or nucleosides. Furthermore, the terms “nucleic acid,” “DNA,” “RNA,” and/or similar terms include nucleic acid analogs, i.e. analogs having other than a phosphodiester backbone. Nucleic acids can be purified from natural sources, produced using recombinant expression systems and optionally purified, chemically synthesized, etc. Where appropriate, e.g., in the case of chemically synthesized molecules, nucleic acids can comprise nucleoside analogs such as analogs having chemically modified bases or sugars, and backbone modifications. A nucleic acid sequence is presented in the 5' to 3' direction unless otherwise indicated. In some embodiments, a nucleic acid is or comprises natural nucleosides (e.g. adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine); nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, 2-aminoadenosine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 2-aminoadenosine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine); chemically modified bases; biologically modified bases (e.g., methylated bases); intercalated bases; modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose); and/or modified phosphate groups (e.g., phosphorothioates and 5'-N-phosphoramidite linkages).

[0077] The term “pharmaceutical composition,” as used herein, refers to a composition that can be administered to a subject in the context of treatment of a disease or disorder. In some embodiments, a pharmaceutical composition comprises an active ingredient, e.g. a nuclease or a nucleic acid encoding a nuclease, and a pharmaceutically acceptable excipient.

[0078] The term “proliferative disease,” as used herein, refers to any disease in which cell or tissue homeostasis is disturbed in that a cell or cell population exhibits an abnormally elevated proliferation rate. Proliferative diseases include hyperproliferative diseases, such as pre-neoplastic hyperplastic conditions and neoplastic diseases. Neoplastic diseases are characterized by an abnormal proliferation of cells and include both benign and malignant neoplasias. Malignant neoplasia is also referred to as cancer.

[0079] The terms “protein,” “peptide,” and “polypeptide” are used interchangeably herein, and refer to a polymer of amino acid residues linked together by peptide (amide) bonds. The terms refer to a protein, peptide, or polypeptide of any size, structure, or function. Typically, a protein, peptide, or polypeptide will be at least three amino acids long. A protein, peptide, or polypeptide may refer to an individual protein or a collection of proteins. One or more of the amino acids in a protein, peptide, or polypeptide may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or other modification, etc. A protein, peptide, or polypeptide may also be a single molecule or may be a multi-molecular complex. A protein, peptide, or polypeptide may be just a fragment of a naturally occurring protein or peptide. A protein, peptide, or polypeptide may be naturally occurring, recombinant, or synthetic, or any combination thereof. A protein may comprise different domains, for example, a nucleic acid binding domain and a nucleic acid cleavage domain. In some embodiments, a protein comprises a proteinaceous part, e.g., an amino acid sequence constituting a nucleic acid binding domain, and an organic compound, e.g., a compound that can act as a nucleic acid cleavage agent.

[0080] The term “randomized,” as used herein in the context of nucleic acid sequences, refers to a sequence or residue within a sequence that has been synthesized to incorporate a mixture of free nucleotides, for example, a mixture of all four nucleotides A, T, G, and C. Randomized residues are typically represented by the letter N within a nucleotide sequence. In some embodiments, a randomized sequence or residue is fully randomized, in which case the randomized residues are synthesized by adding equal amounts of the nucleotides to be incorporated (e.g., 25% T, 25% A, 25% G, and 25% C) during the synthesis step of the respective sequence residue. In some embodiments, a randomized sequence or residue is partially randomized, in which case the randomized residues are synthesized by adding non-equal amounts of the nucleotides to be incorporated (e.g., 79% T, 7% A, 7% G, and 7% C) during the synthesis step of the respective sequence residue. Partial randomization allows for the generation of sequences that are templated on a given sequence, but have incorporated mutations at a desired frequency. E.g., if a known nuclease target site is used as a synthesis template, partial randomization in which at each step the nucleotide represented at the respective residue is added to the synthesis at 79%, and the other three nucleotides are added at 7% each, will result in a mixture of partially randomized target sites being synthesized, which still represent the consensus sequence of the original target site, but which differ from the original target site at each residue with a statistical frequency of 21% for each residue so synthesized (distributed binomially). In some embodiments, a partially randomized sequence differs from the consensus sequence by more than 5%, more than 10%, more than 15%, more than 20%, more than 25%, or more than 30% on average, distributed binomially. In some embodiments, a partially randomized sequence differs from the consensus site by no more than 10%, no more than 15%, no more than 20%, no more than 25%, no more than 30%, no more than 40%, or no more than 50% on average, distributed binomially.

[0081] The terms “small molecule” and “organic compound” are used interchangeably herein and refer to molecules, whether naturally-occurring or artificially created (e.g., via chemical synthesis) that have a relatively low molecular weight. Typically, an organic compound contains carbon. An organic compound may contain multiple carbon-carbon bonds, stereocenters, and other functional groups (e.g., amines, hydroxyl, carbonyls, or heterocyclic rings). In some embodiments, organic compounds are monomeric and have a molecular weight of less than about 1500 g/mol. In certain embodiments, the molecular weight of the small molecule is less than about 1000 g/mol or less than about 500 g/mol. In certain embodiments, the small molecule is a drug, for example, a drug that has already been deemed safe and effective for use in humans or animals by the appropriate governmental agency or regulatory body. In certain embodiments, the organic molecule is known to bind and/or cleave a nucleic acid. In some embodiments, the organic compound is an enediyne. In some embodiments, the organic compound is an antibiotic drug, for example, an anticancer antibiotic such as dynemicin, neocarzinostatin, calicheamicin, esperamicin, bleomycin, or a derivative thereof.

[0082] The term “subject,” as used herein, refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, a goat, a cattle, a cat, or a dog. In some embodiments, the subject is a vertebrate, an amphibian, a reptile, a fish, an insect, a fly, or a nematode.

[0083] The terms “target nucleic acid,” and “target genome,” as used herein in the context of nucleases, refer to a nucleic acid molecule or a genome, respectively, that comprises at least one target site of a given nuclease.

[0084] The term “target site,” used herein interchangeably with the term “nuclease target site,” refers to a sequence within a nucleic acid molecule that is bound and cleaved by a nuclease. A target site may be single-stranded or double-stranded. In the context of nucleases that dimerize, for example, nucleases comprising a FokI DNA cleavage domain, a target site typically comprises a left-half site (bound by one monomer of the nuclease), a right-half site (bound by the second monomer of the nuclease), and a spacer sequence between the half sites in which the cut is made. This structure ([left-half site]-[spacer sequence]-[right-half site]) is referred to herein as an LSR structure. In some embodiments, the left-half site and/or the right-half site is 10-18 nucleotides long. In some embodiments, either or both half-sites are shorter or longer. In some embodiments, the left and right half sites comprise different nucleic acid sequences.

[0085] The term “Transcriptional Activator-Like Effector,” (TALE) as used herein, refers to bacterial proteins comprising a DNA binding domain, which contains a highly conserved 33-34 amino acid sequence comprising a highly variable two-amino acid motif (Repeat Variable Diresidue, RVD). The RVD motif determines binding specificity to a nucleic acid sequence, and can be engineered according to methods well known to those of skill in the art to specifically bind a desired DNA sequence (see, e.g., Miller, Jeffrey; et. al. (February 2011). “A TALE nuclease architecture for efficient genome editing”. Nature Biotechnology 29 (2): 143-8; Zhang, Feng; et. al. (February 2011). “Efficient construction of sequence-specific TAL effectors for modulating mammalian transcription”. Nature Biotechnology 29 (2): 149-53; Geißler, R.; Scholze, H.; Hahn, S.; Streubel, J.; Bonas, U.; Behrens, S. E.; Boch, J. (2011), Shiu, Shin-Han. ed. “Transcriptional Activators of Human Genes with Programmable DNA-Specificity”. PLoS ONE 6 (5): e19509; Boch, Jens (February 2011). “TALEs of genome targeting”. Nature Biotechnology 29 (2): 135-6; Boch, Jens; et al. (December 2009). “Breaking the Code of DNA Binding Specificity of TAL-Type III Effectors”. Science 326 (5959): 1509-12; and Moscou, Matthew J.; Adam J. Bogdanove (December 2009). “A Simple Cipher Governs DNA Recognition by TAL Effectors”. Science 326 (5959): 1501; the entire contents of each of which are incorporated herein by reference). The simple relationship between amino acid sequence and DNA recognition has allowed for the engineering of specific DNA binding domains by selecting a combination of repeat segments containing the appropriate RVDs.

[0086] The term “Transcriptional Activator-Like Element Nuclease,” (TALEN) as used herein, refers to an artificial nuclease comprising a transcriptional activator like effector DNA binding domain fused to a DNA cleavage domain, for example, a FokI domain. A number of modular assembly schemes for generating engineered TALE constructs have been reported (Zhang, Feng; et. al. (February 2011). “Efficient construction of sequence-specific TAL effectors for modulating mammalian transcription”. Nature Biotechnology 29 (2): 149-53; Geißler, R.; Scholze, H.; Hahn, S.; Streubel, J.; Bonas, U.; Behrens, S. E.; Boch, J. (2011), Shiu, Shin-Han. ed. “Transcriptional Activators of Human Genes with Programmable DNA-Specificity”. PLoS ONE 6 (5): e19509; Cermak, T.; Doyle, E. L.; Christian, M.; Wang, L.; Zhang, Y.; Schmidt, C.; Baller, J. A.; Somia, N. V. et al. (2011). “Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting”. Nucleic Acids Research; Morbitzer, R.; Elsaesser, J.; Hausner, J.; Lahaye, T. (2011). “Assembly of custom TALE-type DNA binding domains by modular cloning”. Nucleic Acids Research; Li, T.; Huang, S.; Zhao, X.; Wright, D. A.; Carpenter, S.; Spalding, M. H.; Weeks, D. P.; Yang, B. (2011). “Modularly assembled designer TAL effector nucleases for targeted gene knockout and gene replacement in eukaryotes”. Nucleic Acids Research; Weber, E.; Gruetzner, R.; Werner, S.; Engler, C.; Marillonnet, S. (2011). Bendahmane, Mohammed. ed. “Assembly of Designer TAL Effectors by Golden Gate Cloning”. PLoS ONE 6 (5): e19722; the entire contents of each of which are incorporated herein by reference).

[0087] The terms “treatment,” “treat,” and “treating,” as used herein, refer to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease or disorder, or one or more symptoms thereof, as described herein. In some embodiments, treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or progression of a disease. For example, treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example to prevent or delay their recurrence.

[0088] The term “zinc finger,” as used herein, refers to a small nucleic acid-binding protein structural motif characterized by a fold and the coordination of one or more zinc ions that stabilize the fold. Zinc fingers encompass a wide variety of differing protein structures (see, e.g., Klug A, Rhodes D (1987). “Zinc fingers: a novel protein fold for nucleic acid recognition”. Cold Spring Harb. Symp. Quant. Biol. 52: 473-82, the entire contents of which are incorporated herein by reference). Zinc fingers can be designed to bind a specific sequence of nucleotides, and zinc finger arrays comprising fusions of a series of zinc fingers can be designed to bind virtually any desired target sequence. Such zinc finger arrays can form a binding domain of a protein, for example, of a nuclease, e.g., if conjugated to a nucleic acid cleavage domain. Different types of zinc finger motifs are known to those of skill in the art, including, but not limited to, Cys2His2, Gag knuckle, Treble clef, Zinc ribbon, Zn2/Cys6, and TAZ2 domain-like motifs (see, e.g., Krishna SS, Majumdar I, Grishin NV (January 2003). “Structural classification of zinc fingers: survey and summary”. Nucleic Acids Res. 31 (2): 532–50). Typically, a single zinc finger motif binds 3 or 4 nucleotides of a nucleic acid molecule. Accordingly, a zinc finger domain comprising 2 zinc finger motifs may bind 6-8 nucleotides, a zinc finger domain comprising 3 zinc finger motifs may bind 9-12 nucleotides, a zinc finger domain comprising 4 zinc finger motifs may bind 12-16 nucleotides, and so forth. Any suitable protein engineering technique can be employed to alter the DNA-binding specificity of zinc fingers and/or design novel zinc finger fusions to bind virtually any desired target sequence from 3-30 nucleotides in length (see, e.g., Pabo CO, Peisach E, Grant RA (2001). “Design and selection of novel cys2His2 zinc finger proteins”. Annual Review of Biochemistry 70:313-340; Jamieson AC, Miller JC, Pabo CO (2003). “Drug discovery with engineered zinc finger proteins”. Nature Reviews Drug Discovery 2 (5): 361-368; and Liu Q, Segal DJ, Ghiara JB, Barbas CF (May 1997). “Design of polydactyl zinc-finger proteins for unique addressing within complex genomes”. Proc. Natl. Acad. Sci. U.S.A. 94 (11); the entire contents of each of which are incorporated herein by reference). Fusions between engineered zinc finger arrays and protein domains that cleave a nucleic acid can be used to generate a “zinc finger nuclease.” A zinc finger nuclease typically comprises a zinc finger domain that binds a specific target site within a nucleic acid molecule, and a nucleic acid cleavage domain that cuts the nucleic acid molecule within or in proximity to the target site bound by the binding domain. Typical engineered zinc finger nucleases comprise a binding domain having between 3 and 6 individual zinc finger motifs and binding target sites ranging from 9 base pairs to 18 base pairs in length. Longer target sites are particularly attractive in situations where it is desired to bind and cleave a target site that is unique in a given genome.

[0089] The term “zinc finger nuclease,” as used herein, refers to a nuclease comprising a nucleic acid cleavage domain conjugated to a binding domain that comprises a zinc finger array. In some embodiments, the cleavage domain is the cleavage domain of the type II restriction endonuclease FokI. Zinc finger nucleases can be designed to target virtually any desired sequence in a given nucleic acid molecule for cleavage, and the possibility to design zinc finger binding domains to bind unique sites in the context of complex genomes allows for targeted cleavage of a single genomic site in living cells, for example, to achieve a targeted genomic alteration of therapeutic value. Targeting a double-strand break to a desired genomic locus can be used to introduce frame-shift mutations into the coding sequence of a gene due to the error-prone nature of the non-homologous DNA repair pathway. Zinc finger nucleases can be generated to target a site of interest by methods well known to those of skill in the art. For example, zinc finger binding domains with a desired specificity can be designed by combining individual zinc finger motifs of known specificity. The structure of the zinc finger protein Zif268 bound to DNA has informed much of the work in this field and the concept of obtaining zinc fingers for each of the 64 possible base pair triplets and then mixing and matching these modular zinc fingers to design proteins with any desired sequence specificity has been described

(Pavletich NP, Pabo CO (May 1991). “Zinc finger-DNA recognition: crystal structure of a Zif268-DNA complex at 2.1 Å”. Science 252 (5007): 809-17, the entire contents of which are incorporated herein). In some embodiments, separate zinc fingers that each recognize a 3 base pair DNA sequence are combined to generate 3-, 4-, 5-, or 6-finger arrays that recognize target sites ranging from 9 base pairs to 18 base pairs in length. In some embodiments, longer arrays are contemplated. In other embodiments, 2-finger modules recognizing 6-8 nucleotides are combined to generate 4-, 6-, or 8-zinc finger arrays. In some embodiments, bacterial or phage display is employed to develop a zinc finger domain that recognizes a desired nucleic acid sequence, for example, a desired nuclease target site of 3-30 bp in length. Zinc finger nucleases, in some embodiments, comprise a zinc finger binding domain and a cleavage domain fused or otherwise conjugated to each other via a linker, for example, a polypeptide linker. The length of the linker determines the distance of the cut from the nucleic acid sequence bound by the zinc finger domain. If a shorter linker is used, the cleavage domain will cut the nucleic acid closer to the bound nucleic acid sequence, while a longer linker will result in a greater distance between the cut and the bound nucleic acid sequence. In some embodiments, the cleavage domain of a zinc finger nuclease has to dimerize in order to cut a bound nucleic acid. In some such embodiments, the dimer is a heterodimer of two monomers, each of which comprises a different zinc finger binding domain. For example, in some embodiments, the dimer may comprise one monomer comprising zinc finger domain A conjugated to a FokI cleavage domain, and one monomer comprising zinc finger domain B conjugated to a FokI cleavage domain. In this nonlimiting example, zinc finger domain A binds a nucleic acid sequence on one side of the target site, zinc finger domain B binds a nucleic acid sequence on the other side of the target site, and the dimerized FokI domain cuts the nucleic acid in between the zinc finger domain binding sites.

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS OF THE INVENTION

Introduction

[0090] Site-specific nucleases are powerful tools for the targeted modification of a genome. Some site-specific nucleases can theoretically achieve a level of specificity for a target cleavage site that would allow targeting of a single unique site in a genome for cleavage without affecting any other genomic site. It has been reported that nuclease cleavage in living cells triggers a DNA repair mechanism that frequently results in a modification of the cleaved, repaired genomic sequence, for example, via homologous recombination. Accordingly, the targeted cleavage of a specific unique sequence within a genome opens up new avenues for gene targeting and gene modification in living cells, including cells that are hard to manipulate with conventional gene targeting methods, such as many human somatic or embryonic stem cells. Nuclease-mediated modification of disease-related sequences, e.g., the CCR-5 allele in HIV/AIDS patients, or of genes necessary for tumor neovascularization, can be used in the clinical context, and two site-specific nucleases are currently in clinical trials.

[0091] One important aspect in the field of site-specific nuclease-mediated modification is off-target nuclease effects, e.g., the cleavage of genomic sequences that differ from the intended target sequence by one or more nucleotides. Undesired side effects of off-target cleavage range from insertion into unwanted loci during a gene targeting event to severe complications in a clinical scenario. Off-target cleavage of sequences encoding essential gene functions or tumor suppressor genes by an endonuclease administered to a subject may result in disease or even death of the subject. Accordingly, it is desirable to characterize the cleavage preferences of a nuclease before using it in the laboratory or the clinic in order to determine its efficacy and safety. Further, the characterization of nuclease cleavage properties allows for the selection of the nuclease best suited for a specific task from a group of candidate nucleases, or for the selection of evolution products obtained from existing nucleases. Such a characterization of nuclease cleavage properties may also inform the de-novo design of nucleases with enhanced properties, such as enhanced specificity or efficiency.

[0092] In many scenarios where a nuclease is employed for the targeted manipulation of a nucleic acid, cleavage specificity is a crucial feature. The imperfect specificity of some engineered nuclease binding domains can lead to off-target cleavage and undesired effects both in vitro and in vivo. Current methods of evaluating site-specific nuclease specificity, including ELISA assays, microarrays, one-hybrid systems, SELEX and its variants, and Rosetta-based computational predictions, are all premised on the assumption that the binding specificity of nuclease molecules is equivalent or proportionate to their cleavage specificity.

[0093] However, the work presented here is based on the discovery that prediction of nuclease off-target binding effects constitutes an imperfect approximation of a nuclease's off-target cleavage effects that may result in undesired biological effects. This finding is consistent with the notion that the reported toxicity of some site-specific DNA nucleases results from off-target DNA cleavage, rather than off-target binding alone.

[0094] The methods and reagents provided herein allow for an accurate evaluation of a given nuclease's target site specificity and provide strategies for the selection of suitable unique target sites and the design of highly specific nucleases for the targeted cleavage of a single site in the context of a complex genome. Further, methods, reagents, and strategies provided herein allow those of skill to enhance the specificity and minimize the off-target effects of any given site-specific nuclease. While of particular relevance to DNA and DNA-cleaving nucleases, the inventive concepts, methods, strategies, and reagents provided herein are not limited in this respect, but can be applied to any nucleic acid:nuclease pair.

Identifying Nuclease Target Sites Cleaved by a Site-Specific Nuclease

[0095] Some aspects of this invention provide methods and reagents to determine the nucleic acid target sites cleaved by any site-specific nuclease. In general, such methods comprise contacting a given nuclease with a library of target sites under conditions suitable for the nuclease to bind and cut a target site, and determining which target sites the nuclease actually cuts. A determination of a nuclease's target site profile based on actual cutting has the advantage over methods that rely on binding that it measures a parameter more relevant for mediating undesired off-target effects of site-specific nucleases.

[0096] In some embodiments, a method for identifying a target site of a nuclease is provided. In some embodiments, the method comprises (a) providing a nuclease that cuts a

double-stranded nucleic acid target site and creates a 5' overhang, wherein the target site comprises a [left-half site]-[spacer sequence]-[right-half site] (LSR) structure, and the nuclease cuts the target site within the spacer sequence. In some embodiments, the method comprises (b) contacting the nuclease with a library of candidate nucleic acid molecules, wherein each nucleic acid molecule comprises a concatemer of a sequence comprising a candidate nuclease target site and a constant insert sequence, under conditions suitable for the nuclease to cut a candidate nucleic acid molecule comprising a target site of the nuclease. In some embodiments, the method comprises (c) filling in the 5' overhangs of a nucleic acid molecule that has been cut twice by the nuclease and comprises a constant insert sequence flanked by a left half-site and cut spacer sequence on one side, and a right half-site and cut spacer sequence on the other side, thereby creating blunt ends. In some embodiments, the method comprises (d) identifying the nuclease target site cut by the nuclease by determining the sequence of the left-half site, the right-half site, and/or the spacer sequence of the nucleic acid molecule of step (c). In some embodiments, the method comprises providing a nuclease and contacting the nuclease with a library of candidate nucleic acid molecules comprising candidate target sites. In some embodiments, the candidate nucleic acid molecules are double-stranded nucleic acid molecules. In some embodiments, the candidate nucleic acid molecules are DNA molecules. In some embodiments, the nuclease dimerizes at the target site, and the target site comprises an LSR structure ([left-half site]-[spacer sequence]-[right-half site]). In some embodiments, the nuclease cuts the target site within the spacer sequence. In some embodiments, the nuclease is a nuclease that cuts a double-stranded nucleic acid target site and creates a 5' overhang. In some embodiments, each nucleic acid molecule in the library comprises a concatemer of a sequence comprising a candidate nuclease target site and a constant insert sequence.

[0097] For example, in some embodiments, the candidate nucleic acid molecules of the library comprise the structure R1-[(LSR)-(constant region)]x-R2, wherein R1 and R2 are, independently, nucleic acid sequences that may comprise a fragment of the [(LSR)-(constant region)] repeat unit, and x is an integer between 2 and y. In some embodiments, y is at least 10^1, at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8, at least 10^9, at least 10^10, at least 10^11, at least 10^12, at least 10^13, at least 10^14, or at least 10^15. In some embodiments, y is less than 10^2, less than 10^3, less than 10^4, less than 10^5, less than 10^6, less than 10^7, less than 10^8, less than 10^9, less than 10^10, less than 10^11, less than 10^12, less than 10^13, less than 10^14, or less than 10^15. The constant region, in some embodiments, is of a length that allows for efficient self-ligation of a single repeat unit. Suitable lengths will be apparent to those of skill in the art. For example, in some embodiments, the constant region is between 100 and 1000 base pairs long, for example, about 100 base pairs, about 200 base pairs, about 300 base pairs, about 400 base pairs, about 450 base pairs, about 500 base pairs, about 600 base pairs, about 700 base pairs, about 800 base pairs, about 900 base pairs, or about 1000 base pairs long. In some embodiments, the constant region is shorter than about 100 base pairs or longer than about 1000 base pairs.

[0098] Incubation of the nuclease with the library nucleic acids will result in cleavage of those concatemers in the library that comprise target sites that can be bound and cleaved by the nuclease. If a given nuclease cleaves a specific target site with high efficiency, a concatemer comprising target sites will be cut multiple times, resulting in the generation of fragments comprising a single repeat unit. The repeat unit released from the concatemer by nuclease cleavage will be of the structure S2R-(constant region)-LS1, wherein S1 and S2 represent complementary spacer region fragments after being cut by the nuclease. Any repeat units released from library candidate molecules can then be isolated and/or the sequence of the LSR cleaved by the nuclease identified by sequencing the S2R and LS1 regions of released repeat units.

[0099] Any method suitable for isolation and sequencing of the repeat units can be employed to elucidate the LSR sequence cleaved by the nuclease. For example, since the length of the constant region is known, individual released repeat units can be separated based on their size from the larger uncut library nucleic acid molecules as well as from fragments of library nucleic acid molecules that comprise multiple repeat units (indicating non-efficient targeted cleavage by the nuclease). Suitable methods for separating and/or isolating nucleic acid molecules based on their size are well known to those of skill in the art and include, for example, size fractionation methods, such as gel electrophoresis, density gradient centrifugation, and dialysis over a semi-permeable membrane with a suitable molecular cutoff value. The separated/isolated nucleic acid molecules can then be further characterized, for example, by ligating PCR and/or sequencing adapters to the cut ends and amplifying and/or sequencing the respective nucleic acids. Further, if the length of the constant region is selected to favor self-ligation of individual released repeat units, such individual released repeat units may be enriched by contacting the nuclease-treated library molecules with a ligase and subsequent amplification and/or sequencing based on the circularized nature of the self-ligated individual repeat units.

[0100] In some embodiments, where a nuclease is used that generates 5' overhangs as a result of cutting a target nucleic acid, the 5' overhangs of the cut nucleic acid molecules are filled in. Methods for filling in 5' overhangs are well known to those of skill in the art and include, for example, methods using DNA polymerase I Klenow fragment lacking exonuclease activity (Klenow (3'->5' exo-)). Filling in 5' overhangs results in the overhang-templated extension of the recessed strand, which, in turn, results in blunt ends. In the case of single repeat units released from library concatemers, the resulting structure is a blunt-ended S2'R-(constant region)-LS1', with S1' and S2' comprising blunt ends. PCR and/or sequencing adapters can then be added to the ends by blunt end ligation and the respective repeat units (including S2'R and LS1' regions) can be sequenced. From the sequence data, the original LSR region can be deduced. Blunting of the overhangs created during the nuclease cleavage process also allows for distinguishing between target sites that were properly cut by the respective nuclease and target sites that were non-specifically cut, e.g., based on non-nuclease effects such as physical shearing. Correctly cleaved nuclease target sites can be recognized by the existence of complementary S2'R and LS1' regions, which comprise a duplication of the overhang nucleotides as a result of the overhang fill-in, while target sites that were not cleaved by the respective nuclease are unlikely to comprise overhang nucleotide duplications. In some embodiments, the method comprises identifying the nuclease target site cut by the nuclease by determining the sequence of the left-half site, the right-half site, and/or the spacer sequence of a released individual repeat unit. Any
suitable method for amplifying and/or sequencing can be used to identify the LSR sequence of the target site cleaved by the respective nuclease. Methods for amplifying and/or sequencing nucleic acid molecules are well known to those of skill in the art and the invention is not limited in this respect.

[0101] Some of the methods and strategies provided herein allow for the simultaneous assessment of a plurality of candidate target sites as possible cleavage targets for any given nuclease. Accordingly, the data obtained from such methods can be used to compile a list of target sites cleaved by a given nuclease, which is also referred to herein as a target site profile. If a sequencing method is used that allows for the generation of quantitative sequencing data, it is also possible to record the relative abundance of any nuclease target site detected to be cleaved by the respective nuclease. Target sites that are cleaved more efficiently by the nuclease will be detected more frequently in the sequencing step, while target sites that are not cleaved efficiently will only rarely release an individual repeat unit from a candidate concatemer, and thus will generate only few, if any, sequencing reads. Such quantitative sequencing data can be integrated into a target site profile to generate a ranked list of highly preferred and less preferred nuclease target sites.

[0102] The methods and strategies of nuclease target site profiling provided herein can be applied to any site-specific nuclease, including, for example, ZFNs, TALENs, and homing endonucleases. As described in more detail herein, nuclease specificity typically decreases with increasing nuclease concentration, and the methods described herein can be used to determine a concentration at which a given nuclease efficiently cuts its intended target site, but does not efficiently cut any off-target sequences. In some embodiments, a maximum concentration of a therapeutic nuclease is determined at which the therapeutic nuclease cuts its intended nuclease target site, but does not cut more than 10, more than 5, more than 4, more than 3, more than 2, more than 1, or any additional nuclease target sites. In some embodiments, a therapeutic nuclease is administered to a subject in an amount effective to generate a final concentration equal to or lower than the maximum concentration determined as described above.

Nuclease Target Site Libraries

[0103] Some embodiments of this invention provide libraries of nucleic acid molecules for nuclease target site profiling. In some embodiments such a library comprises a plurality of nucleic acid molecules, each comprising a concatemer of a candidate nuclease target site and a constant insert sequence spacer sequence. For example, in some embodiments, the candidate nucleic acid molecules of the library comprise the structure R1-[(LSR)-(constant region)]x-R2, wherein R1 and R2 are, independently, nucleic acid sequences that may comprise a fragment of the [(LSR)-(constant region)] repeat unit, and x is an integer between 2 and y. In some embodiments, y is at least 10^1, at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8, at least 10^9, at least 10^10, at least 10^11, at least 10^12, at least 10^13, at least 10^14, or at least 10^15. In some embodiments, y is less than 10^2, less than 10^3, less than 10^4, less than 10^5, less than 10^6, less than 10^7, less than 10^8, less than 10^9, less than 10^10, less than 10^11, less than 10^12, less than 10^13, less than 10^14, or less than 10^15. The constant region, in some embodiments, is of a length that allows for efficient self-ligation of a single repeat unit. In some embodiments, the constant region is of a length that allows for efficient separation of single repeat units from fragments comprising two or more repeat units. In some embodiments, the constant region is of a length that allows for efficient sequencing of a complete repeat unit in one sequencing read. Suitable lengths will be apparent to those of skill in the art. For example, in some embodiments, the constant region is between 100 and 1000 base pairs long, for example, about 100 base pairs, about 200 base pairs, about 300 base pairs, about 400 base pairs, about 450 base pairs, about 500 base pairs, about 600 base pairs, about 700 base pairs, about 800 base pairs, about 900 base pairs, or about 1000 base pairs long. In some embodiments, the constant region is shorter than about 100 base pairs or longer than about 1000 base pairs.

[0104] An LSR site typically comprises a [left-half site]-[spacer sequence]-[right-half site] structure. The lengths of the half-sites and the spacer sequence will depend on the specific nuclease to be evaluated. In general, the half-sites will be 6-30 nucleotides long, and preferably 10-18 nucleotides long. For example, each half-site individually may be 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides long. In some embodiments, an LSR site may be longer than 30 nucleotides. In some embodiments, the left half-site and the right half-site of an LSR are of the same length. In some embodiments, the left half-site and the right half-site of an LSR are of different lengths. In some embodiments, the left half-site and the right half-site of an LSR are of different sequences. In some embodiments, a library is provided that comprises candidate nucleic acids which comprise LSRs that can be cleaved by a FokI cleavage domain, a Zinc Finger Nuclease (ZFN), a Transcription Activator-Like Effector Nuclease (TALEN), a homing endonuclease, an organic compound nuclease, an enediyne, an antibiotic nuclease, dynemicin, neocarzinostatin, calicheamicin, esperamicin, and/or bleomycin.

[0105] In some embodiments, a library of candidate nucleic acid molecules is provided that comprises at least 10^5, at least 10^6, at least 10^7, at least 10^8, at least 10^9, at least 10^10, at least 10^11, or at least 10^12 different candidate nuclease target sites. In some embodiments, the candidate nucleic acid molecules of the library are concatemers produced from circularized templates by rolling circle amplification. In some embodiments, the library comprises nucleic acid molecules, e.g., concatemers, of a molecular weight of at least 5 kDa, at least 6 kDa, at least 7 kDa, at least 8 kDa, at least 9 kDa, at least 10 kDa, at least 12 kDa, or at least 15 kDa. In some embodiments, the molecular weight of the nucleic acid molecules within the library may be larger than 15 kDa. In some embodiments, the library comprises nucleic acid molecules within a specific size range, for example, within a range of 5-7 kDa, 5-10 kDa, 8-12 kDa, 10-15 kDa, or 12-15 kDa, or 5-10 kDa or any possible subrange. While some methods suitable for generating nucleic acid concatemers according to some aspects of this invention result in the generation of nucleic acid molecules of greatly different molecular weights, such mixtures of nucleic acid molecules may be size fractionated to obtain a desired size distribution. Suitable methods for enriching nucleic acid molecules of a desired size or excluding nucleic acid molecules of a desired size are well known to those of skill in the art and the invention is not limited in this respect.

[0106] In some embodiments, a library is provided comprising candidate nucleic acid molecules that comprise target sites with a partially randomized left-half site, a partially randomized right-half site, and/or a partially randomized spacer sequence. In some embodiments, the library is provided comprising candidate nucleic acid molecules that comprise target sites with a partially randomized left half-site, a fully randomized spacer sequence, and a partially randomized right half-site. In some embodiments, partially randomized sites differ from the consensus site by more than 5%, more than 10%, more than 15%, more than 20%, more than 25%, or more than 30% on average, distributed binomially. In some embodiments, partially randomized sites differ from the consensus site by no more than 10%, no more than 15%, no more than 20%, no more than 25%, no more than 30%, no more than 40%, or no more than 50% on average, distributed binomially. For example, in some embodiments partially randomized sites differ from the consensus site by more than 5%, but by no more than 10%; by more than 10%, but by no more than 20%; by more than 20%, but by no more than 25%; by more than 5%, but by no more than 20%, and so on. Using partially randomized nuclease target sites in the library is useful to increase the concentration of library members comprising target sites that are closely related to the consensus site, for example, that differ from the consensus site in only one, only two, only three, only four, or only five residues. The rationale behind this is that a given nuclease, for example a given ZFN, is likely to cut its intended target site and any closely related target sites, but unlikely to cut a target site that is vastly different from or completely unrelated to the intended target site. Accordingly, using a library comprising partially randomized target sites can be more efficient than using libraries comprising fully randomized target sites without compromising the sensitivity in detecting any off-target cleavage events for any given nuclease. Thus, the use of partially randomized libraries significantly reduces the cost and effort required to produce a library having a high likelihood of covering virtually all off-target sites of a given nuclease. In some embodiments, however, it may be desirable to use a fully randomized library of target sites, for example, in embodiments where the specificity of a given nuclease is to be evaluated in the context of any possible site in a given genome.

Selection and Design of Site-Specific Nucleases

[0107] Some aspects of this invention provide methods and strategies for selecting and designing site-specific nucleases that allow the targeted cleavage of a single, unique site in the context of a complex genome. In some embodiments, a method is provided that comprises providing a plurality of candidate nucleases that are designed or known to cut the same consensus sequence; profiling the target sites actually cleaved by each candidate nuclease, thus detecting any cleaved off-target sites (target sites that differ from the consensus target site); and selecting a candidate nuclease based on the off-target site(s) so identified. In some embodiments, this method is used to select the most specific nuclease from a group of candidate nucleases, for example, the nuclease that cleaves the consensus target site with the highest specificity, the nuclease that cleaves the lowest number of off-target sites, the nuclease that cleaves the lowest number of off-target sites in the context of a target genome, or a nuclease that does not cleave any target site other than the consensus target site. In some embodiments, this method is used to select a nuclease that does not cleave any off-target site in the context of the genome of a subject at a concentration that is equal to or higher than a therapeutically effective concentration of the nuclease.

[0108] The methods and reagents provided herein can be used, for example, to evaluate a plurality of different nucleases targeting the same intended target site, for example, a plurality of variations of a given site-specific nuclease, for example a given zinc finger nuclease. Accordingly, such methods may be used as the selection step in evolving or designing a novel site-specific nuclease with improved specificity.

Identifying Unique Nuclease Target Sites within a Genome

[0109] Some embodiments of this invention provide a method for selecting a nuclease target site within a genome. As described in more detail elsewhere herein, it was surprisingly discovered that off-target sites cleaved by a given nuclease are typically highly similar to the consensus target site, e.g., differing from the consensus target site in only one, only two, only three, only four, or only five nucleotide residues. Based on this discovery, a nuclease target site within the genome can be selected to increase the likelihood of a nuclease targeting this site not cleaving any off-target sites within the genome. For example, in some embodiments, a method is provided that comprises identifying a candidate nuclease target site; and comparing the candidate nuclease target site to other sequences within the genome. Methods for comparing candidate nuclease target sites to other sequences within the genome are well known to those of skill in the art and include, for example, sequence alignment methods, for example, using a sequence alignment software or algorithm such as BLAST on a general purpose computer. A suitable unique nuclease target site can then be selected based on the results of the sequence comparison. In some embodiments, if the candidate nuclease target site differs from any other sequence within the genome by at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 nucleotides, the nuclease target site is selected as a unique site within the genome, whereas if the site does not fulfill this criterion, the site may be discarded. In some embodiments, once a site is selected based on the sequence comparison, as outlined above, a site-specific nuclease targeting the selected site is designed. For example, a zinc finger nuclease may be designed to target any selected nuclease target site by constructing a zinc finger array binding the target site, and conjugating the zinc finger array to a DNA cleavage domain. In embodiments where the DNA cleavage domain needs to dimerize in order to cleave DNA, two zinc finger arrays will be designed, each binding a half-site of the nuclease target site, and each conjugated to a cleavage domain. In some embodiments, nuclease designing and/or generating is done by recombinant technology. Suitable recombinant technologies are well known to those of skill in the art, and the invention is not limited in this respect.

[0110] In some embodiments, a site-specific nuclease designed or generated according to aspects of this invention is isolated and/or purified. The methods and strategies for designing site-specific nucleases according to aspects of this invention can be applied to design or generate any site-specific nuclease, including, but not limited to, Zinc Finger Nucleases, Transcription Activator-Like Effector Nucleases (TALENs), homing endonucleases, organic compound nucleases, enediyne nucleases, antibiotic nucleases, and dynemicin, neocarzinostatin, calicheamicin, esperamicin, bleomycin, or variants or derivatives thereof.

Site-Specific Nucleases

[0111] Some aspects of this invention provide isolated site-specific nucleases with enhanced specificity that are designed using the methods and strategies described herein. Some embodiments of this invention provide nucleic acids encoding such nucleases. Some embodiments of this invention provide expression constructs comprising such encoding nucleic acids. For example, in some embodiments an isolated nuclease is provided that has been engineered to cleave a desired target site within a genome, and has been evaluated according to a method provided herein to cut less than 1, less than 2, less than 3, less than 4, less than 5, less than 6, less than 7, less than 8, less than 9, or less than 10 off-target sites at a concentration effective for the nuclease to cut its intended target site. In some embodiments an isolated nuclease is provided that has been engineered to cleave a desired unique target site that has been selected to differ from any other site within a genome by at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 nucleotide residues. In some embodiments, the isolated nuclease is a Zinc Finger Nuclease (ZFN) or a Transcription Activator-Like Effector Nuclease (TALEN), a homing endonuclease, or is or comprises an organic compound nuclease, an enediyne, an antibiotic nuclease, dynemicin, neocarzinostatin, calicheamicin, esperamicin, bleomycin, or a derivative thereof. In some embodiments, the isolated nuclease cleaves a consensus target site within an allele that is associated with a disease or disorder. In some embodiments, the isolated nuclease cleaves a consensus target site the cleavage of which results in treatment or prevention of a disease or disorder. In some embodiments, the disease is HIV/AIDS, or a proliferative disease. In some embodiments, the allele is a CCR5 (for treating HIV/AIDS) or a VEGFA allele (for treating a proliferative disease).

[0112] In some embodiments, the isolated nuclease is provided as part of a pharmaceutical composition. For example, some embodiments provide pharmaceutical compositions comprising a nuclease as provided herein, or a nucleic acid encoding such a nuclease, and a pharmaceutically acceptable excipient. Pharmaceutical compositions may optionally comprise one or more additional therapeutically active substances.

[0113] In some embodiments, compositions provided herein are administered to a subject, for example, to a human subject, in order to effect a targeted genomic modification within the subject. In some embodiments, cells are obtained from the subject and contacted with a nuclease or a nuclease-encoding nucleic acid ex vivo, and re-administered to the subject after the desired genomic modification has been effected or detected in the cells. Although the descriptions of pharmaceutical compositions provided herein are principally directed to pharmaceutical compositions which are suitable for administration to humans, it will be understood by the skilled artisan that such compositions are generally suitable for administration to animals of all sorts. Modification of pharmaceutical compositions suitable for administration to humans in order to render the compositions suitable for administration to various animals is well understood, and the ordinarily skilled veterinary pharmacologist can design and/or perform such modification with merely ordinary, if any, experimentation. Subjects to which administration of the pharmaceutical compositions is contemplated include, but are not limited to, humans and/or other primates; mammals, including commercially relevant mammals such as cattle, pigs, horses, sheep, cats, dogs, mice, and/or rats; and/or birds, including commercially relevant birds such as chickens, ducks, geese, and/or turkeys.

[0114] Formulations of the pharmaceutical compositions described herein may be prepared by any method known or hereafter developed in the art of pharmacology. In general, such preparatory methods include the step of bringing the active ingredient into association with an excipient and/or one or more other accessory ingredients, and then, if necessary and/or desirable, shaping and/or packaging the product into a desired single- or multi-dose unit.

[0115] Pharmaceutical formulations may additionally comprise a pharmaceutically acceptable excipient, which, as used herein, includes any and all solvents, dispersion media, diluents, or other liquid vehicles, dispersion or suspension aids, surface active agents, isotonic agents, thickening or emulsifying agents, preservatives, solid binders, lubricants and the like, as suited to the particular dosage form desired. Remington's The Science and Practice of Pharmacy, 21st Edition, A. R. Gennaro (Lippincott, Williams & Wilkins, Baltimore, Md., 2006; incorporated herein by reference) discloses various excipients used in formulating pharmaceutical compositions and known techniques for the preparation thereof. Except insofar as any conventional excipient medium is incompatible with a substance or its derivatives, such as by producing any undesirable biological effect or otherwise interacting in a deleterious manner with any other component(s) of the pharmaceutical composition, its use is contemplated to be within the scope of this invention.

[0116] The function and advantage of these and other embodiments of the present invention will be more fully understood from the Examples below. The following Examples are intended to illustrate the benefits of the present invention and to describe particular embodiments, but are not intended to exemplify the full scope of the invention. Accordingly, it will be understood that the Examples are not meant to limit the scope of the invention.

EXAMPLES

Example 1

Zinc Finger Nucleases

Introduction

[0117] Zinc finger nucleases (ZFNs) are enzymes engineered to recognize and cleave desired target DNA sequences. A ZFN monomer consists of a zinc finger DNA-binding domain fused with a non-specific FokI restriction endonuclease cleavage domain. Since the FokI nuclease domain must dimerize and bridge two DNA half-sites to cleave DNA, ZFNs are designed to recognize two unique sequences flanking a spacer sequence of variable length and to cleave only when bound as a dimer to DNA. ZFNs have been used for genome engineering in a variety of organisms including mammals by stimulating either non-homologous end joining or homologous recombination. In addition to providing powerful research tools, ZFNs also have potential as gene therapy agents. Indeed, two ZFNs have recently entered clinical trials: one as part of an anti-HIV therapeutic approach (NCT00842634, NCT01044654, NCT01252641) and the other to modify cells used as anti-cancer therapeutics (NCT01082926).

[0118] DNA cleavage specificity is a crucial feature of ZFNs. The imperfect specificity of some engineered zinc finger domains has been linked to cellular toxicity, and therefore determining the specificities of ZFNs is of significant interest. ELISA assays, microarrays, a bacterial one-hybrid system, SELEX and its variants, and Rosetta-based computational predictions have all been used to characterize the DNA-binding specificity of monomeric zinc finger domains in isolation. However, the toxicity of ZFNs is believed to result from DNA cleavage, rather than binding alone. As a result, information about the specificity of zinc finger nucleases to date has been based on the unproven assumptions that (i) dimeric zinc finger nucleases cleave DNA with the same sequence specificity with which isolated monomeric zinc finger domains bind DNA; and (ii) the binding of one zinc finger domain does not influence the binding of the other zinc finger domain in a given ZFN. The DNA-binding specificities of monomeric zinc finger domains have been used to predict potential off-target cleavage sites of dimeric ZFNs in genomes, but to our knowledge no study to date has reported a method for determining the broad DNA cleavage specificity of active, dimeric zinc finger nucleases.

[0119] In this work we present an in vitro selection method to broadly examine the DNA cleavage specificity of active ZFNs. Our selection was coupled with high-throughput DNA sequencing technology to evaluate two obligate heterodimeric ZFNs, CCR5-224, currently in clinical trials (NCT00842634, NCT01044654, NCT01252641), and VF2468, which targets the human VEGF-A promoter, for their abilities to cleave each of 10^11 potential target sites. We identified 37 sites present in the human genome that can be cleaved in vitro by CCR5-224, 2,652 sites in the human genome that can be cleaved in vitro by VF2468, and hundreds of thousands of in vitro cleavable sites for both ZFNs that are not present in the human genome. To demonstrate that sites identified by our in vitro selection can also be cleaved by ZFNs in cells, we examined 34 or 90 sites for evidence of ZFN-induced mutagenesis in cultured human K562 cells expressing the CCR5-224 or VF2468 ZFNs, respectively. Ten of the CCR5-224 sites and 32 of the VF2468 sites we tested

with at least 10-fold excess all DNA sequences that are seven or fewer mutations from the wild-type target sequences.

[0122] We incubated the CCR5-224 or VF2468 DNA cleavage site library at a total cleavage site concentration of 14 nM with two-fold dilutions, ranging from 0.5 nM to 4 nM, of crude in vitro-translated CCR5-224 or VF2468, respectively (FIG. 6). Following digestion, we subjected the resulting DNA molecules (FIG. 7) to in vitro selection for DNA cleavage and subsequent paired-end high-throughput DNA sequencing. Briefly, three selection steps (FIG. 1) enabled the separation of sequences that were cleaved from those that were not. First, only sites that had been cleaved contained 5' phosphates, which are necessary for the ligation of adapters required for sequencing. Second, after PCR, a gel purification step enriched the smaller, cleaved library members. Finally, a computational filter applied after sequencing only counted sequences that have filled-in, complementary 5' overhangs on both ends, the hallmark for cleavage of a target site concatemer (Table 2 and Protocols 1-9). We prepared pre-selection library sequences for sequencing by cleaving the library at a PvuI restriction endonuclease recognition site adjacent to the library sequence and subjecting the digestion products to the same protocol as the ZFN-digested library sequences. High-throughput sequencing confirmed that the rolling-circle-amplified, pre-selection library contained the expected distribution of mutations (FIG. 8).

Design of an In Vitro Selection for ZFN-Mediated DNA Cleavage.

[0123] To characterize comprehensively the DNA cleavage specificity of active ZFNs, we first generated a large library of potential DNA substrates that can be selected for DNA cleavage in one step without requiring iterative enrichment steps that could amplify noise and introduce bias.
We designed the show DNA sequence changes consistent with ZFN-mediated substrate library such that each molecule in the library is a cleavage in human cells, although we anticipate that cleavage concatemer of one of >10' potential substrate sequences is likely to be dependent on cell type and ZFN concentration. (FIG. 5). Incubation with ZFN results in some molecules that One CCR5-224 off-target site lies in a promoter of the malig are uncut, some that have been cut once, and some that have nancy-associated BTBD10 gene. been cut at least twice. Those molecules that have been 0120) Our results, which could not have been obtained by cleaved at least twice have ends consisting of each half of the determining binding specificities of monomeric Zinc finger cleaved DNA sequence (FIG. 1). Cut library members are domains alone, indicate that excess DNA-binding energy enriched relative to uncut library members in three ways results in increased off-target ZFN cleavage activity and Sug (FIG. 1). First, sequences that have been cleaved twice have gest that ZFN specificity can be enhanced by designing ZFNs two complementary 5' overhangs, which can be identified with decreased binding affinity, by lowering ZFN expression computationally following DNA sequencing as hallmarks of levels, and by choosing target sites that differ by at least three bona fide cleavage products. Second, since ZFN-mediated base pairs from their closest sequence relatives in the genome. cleavage reveals 5' phosphates that are not present in the pre-selection library, only DNA that has undergone cleavage Results is amenable to sequencing adapter ligation. Third, after PCR using primers complementary to the sequencing adapters, a In Vitro Selection for ZFN-Mediated DNA Cleavage gel purification step ensures that all sequenced material is of a length consistent with library members that have been 0121 Libraries of potential cleavage sites were prepared cleaved at two adjacent sites. 
This gel-purified material is as double-stranded DNA using synthetic primers and PCR Subjected to high-throughput DNA sequencing using the Illu (FIG. 5). Each partially randomized position in the primer mina method (Bentley, D. R. et al. Accurate whole human was synthesized by incorporating a mixture containing 79% genome sequencing using reversible terminator chemistry. wild-type phosphoramidite and 21% of an equimolar mixture Nature 456, 53-9 (2008)). Ideally, the library used in a ZFN of all three other phosphoramidites. Library sequences there cleavage selection would consist of every possible DNA fore differed from canonical ZFN cleavage sites by 21% on sequence of the length recognized by the ZFN. Only one out average, distributed binomially. We used a blunt ligation of every 105 members of such a library, however, would strategy to create a 10'-member minicircle library. Using contain a sequence that was within seven mutations of a rolling-circle amplification, >10' members of this library 24-base pair recognition sequence. Since off-target recogni were both amplified and concatenated into high molecular tion sequences most likely resemble target recognition sites, weight (>12 kb) DNA molecules. In theory, this library covers we used instead a biased library that ensures >10-fold cover US 2015/0010526 A1 Jan. 8, 2015 age of all half-site sequences that differ from the wild-type emerging from the more stringent (low ZFN concentration) recognition sequences by up to seven mutations. Library selections were cleaved more efficiently than those from the members consist of a fully randomized base pair adjacent to less stringent selections. Notably, all of the tested sequences the 5' end of the recognition site, two partially randomized contain several mutations, yet some were cleaved in vitro half sites flanking a 4-, 5-, 6-, or 7-bp fully randomized spacer, more efficiently than the designed target. 
and another fully randomized base pair adjacent to the 3' end 0.126 The DNA-cleavage specificity profile of the dimeric of the recognition site. A fully randomized five-base pair tag CCR5-224 ZFN (FIG. 2a and FIG. 10a,b) was notably dif follows each library member. This tag, along with the ran ferent than the DNA-binding specificity profiles of the CCR5 domized flanking base pairs and the randomized spacer 224 monomers previously determined by SELEX. For sequence, was used as a unique identifier 'key' for each example, Some positions, such as (+)A5 and (+)T9, exhibited library member. If this unique key was associated with more tolerance for off-target base pairs in our cleavage selection than one sequence read containing identical library members, that were not predicted by the SELEX study. VF2468, which these duplicate sequencing reads likely arose during PCR had not been previously characterized with respect to either amplification and were therefore treated as one data point. DNA-binding or DNA-cleavage specificity, revealed two positions, (-)C5 and (+)A9, that exhibited limited sequence Analysis of CCR5-224 and VF2468 ZFNs Using the DNA preference, Suggesting that they were poorly recognized by Cleavage Selection. the ZFNs (FIG.2b and FIG. 10c,d). 0.124. Each member of a sequence pair consisted of a Compensation Between Half-Sites Affects DNA Recognition fragment of the spacer, an entire half-site, an adjacent nucle otide, and constant sequence. One end of the spacer was 0127 Our results reveal that ZFN substrates with muta generally found in one sequence and the other end in its tions in one half-site are more likely to have additional muta corresponding paired sequence, with the overhang sequence tions in nearby positions in the same half-site compared to the present in both paired sequence reads because overhangs pre-selection library and less likely to have additional muta were blunted by extension prior to ligation of adapters. 
The tions in the other half-site. While this effect was found to be spacer sequences were reconstructed by first identifying the largest when the most strongly recognized base pairs were shared overhang sequence and then any nucleotides present mutated (FIG. 11), we observed this compensatory phenom between the overhang sequence and the half-site sequence. enon for all specified half-site positions for both the CCR5 Only sequences containing no ambiguous nucleotides and and VEGF-targeting ZFNs (FIG.3 and FIG. 12). For a minor overhangs of at least 4 nucleotides were analyzed. Overall, ity of nucleotides in cleaved sites, such as VF2468 target site this computational screen for unique sequences that origi positions (+)G1, (-)G1, (-)A2, and (-)C3, mutation led to nated from two cleavage events on identical library members decreased tolerance of mutations in base pairs in the other yielded 2.0 million total reads of cleaved library members half-site and also a slight decrease, rather than an increase, in (Table 2). There are far fewer analyzed sequences for the 0.5 mutational tolerance in the same half-site. When two of these nM, 1 nM, and 2 nM CCR5-224 and VF2468 selections mutations, (+)G1 and (-)G1, were enforced at the same time, compared to the 4 nM selections due to the presence of a large mutational tolerance at all other positions decreased (FIG. number of sequence repeats, identified through the use of the 13). Collectively, these results show that tolerance of muta unique identifier key described above. The high abundance of tions at one half-site is influenced by DNA recognition at the repeated sequences in the 0.5 nM, 1 nM, and 2 nM selections other half-site. indicate that the number of sequencing reads obtained in I0128. 
This compensation model for ZFN site recognition those selections, before repeat sequences were removed, was applies not only to non-ideal half-sites, but also to spacers larger than the number of individual DNA sequences that with non-ideal lengths. In general, the ZFNs cleaved at char survived all experimental selection steps. We estimated the acteristic locations within the spacers (FIG. 14), and five- and error rate of sequencing to be 0.086% per nucleotide by six-base pair spacers were preferred over four- and seven analysis of a constant nucleotide in all paired reads. Using this base pair spacers (FIGS. 15 and 16). However, cleaved sites error rate, we estimate that 98% of the post-selection ZFN with five- or six-base pair spacers showed greater sequence target site sequences contain no errors. tolerance at the flanking half-sites than sites with four- or seven-base pair spacers (FIG. 17). Therefore, spacer imper Off-Target Cleavage is Dependent on ZFN Concentration fections, similar to half-site mutations, lead to more stringent in vitro recognition of other regions of the DNA substrate. 0.125. As expected, only a subset of library members was ZFNs can Cleave Many Sequences with Up to Three Muta cleaved by each enzyme. The pre-selection libraries for tions CCR5-224 and VF2468 contained means of 4.56 and 3.45 I0129. We calculated enrichment factors for all sequences mutations per complete target site (two half-sites), respec containing three or fewer mutations by dividing each tively, while post-selection libraries exposed to the highest sequences frequency of occurrence in the post-selection concentrations of ZFN used (4 nM CCR5-224 and 4 nM libraries by its frequency of occurrence in the pre-selection VF2468) had means of 2.79 and 1.53 mutations per target libraries. Among sequences enriched by cleavage (enrich site, respectively (FIG. 8). 
As ZFN concentration decreased, ment factor-1), CCR5-224 was capable of cleaving all both ZFNs exhibited less tolerance for off-target sequences. unique single-mutant sequences, 93% of all unique double At the lowest concentrations (0.5 nM CCR5-224 and 0.5 nM mutant sequences, and half of all possible triple-mutant VF2468), cleaved sites contained an average of 1.84 and 1.10 sequences (FIG. 4a and Table 3a) at the highest enzyme mutations, respectively. We placed a small subset of the iden concentration used. VF2468 was capable of cleaving 98% of tified sites in a new DNA context and incubated in vitro with all unique single-mutant sequences, half of all unique double 2nMCCR5-224 or 1 nMVF2468 for 4 hours at 37° C. (FIG. mutant sequences, and 17% of all triple-mutant sequences 9). We observed cleavage for all tested sites and those sites (FIG. 4b and Table 3b). US 2015/0010526 A1 Jan. 8, 2015

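As a quick arithmetic check of the sequencing-error estimate reported in the selection analysis (0.086% per nucleotide, with about 98% of target-site sequences error-free), the Python sketch below assumes that errors at different positions are independent and that the relevant site lengths are the 24-bp (CCR5-224) and 18-bp (VF2468) recognition sites mentioned in the Discussion. Both are assumptions of this sketch; the text does not state how the 98% figure was derived.

```python
def error_free_fraction(per_base_error, n_bases):
    """Probability that n_bases independent base calls are all correct."""
    return (1.0 - per_base_error) ** n_bases


PER_BASE_ERROR = 0.00086  # 0.086% per nucleotide, as reported in the text

ccr5_fraction = error_free_fraction(PER_BASE_ERROR, 24)  # 24-bp site (assumed length)
vf_fraction = error_free_fraction(PER_BASE_ERROR, 18)    # 18-bp site (assumed length)
```

Under these assumptions both fractions round to 0.98, which is consistent with the reported estimate.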
[0130] Since our approach assays active ZFN dimers, it reveals the complete sequences of ZFN sites that can be cleaved. Ignoring the sequence of the spacer, the selection revealed 37 sites in the human genome with five- or six-base pair spacers that can be cleaved in vitro by CCR5-224 (Table 1 and Table 4), and 2,652 sites in the human genome that can be cleaved by VF2468 (VF2468 Data). Among the genomic sites that were cleaved in vitro by VF2468, 1,428 sites had three or fewer mutations relative to the canonical target site (excluding the spacer sequence). Despite greater discrimination against single-, double-, and triple-mutant sequences by VF2468 compared to CCR5-224 (FIG. 4 and Table 3), the larger number of in vitro-cleavable VF2468 sites reflects the difference in the number of sites in the human genome that are three or fewer mutations away from the VF2468 target site (3,450 sites) versus those that are three or fewer mutations away from the CCR5-224 target site (eight sites) (Table 5).

Identified Sites are Cleaved by ZFNs in Human Cells

[0131] We tested whether CCR5-224 could cleave at sites identified by our selections in human cells by expressing CCR5-224 in K562 cells and examining 34 potential target sites within the human genome for evidence of ZFN-induced mutations using PCR and high-throughput DNA sequencing. We defined sites with evidence of ZFN-mediated cleavage as those with insertion or deletion mutations (indels) characteristic of non-homologous end joining (NHEJ) repair (Table 6) that were significantly enriched (P<0.05) in cells expressing active CCR5-224 compared to control cells containing an empty vector. We obtained approximately 100,000 sequences or more for each site analyzed, which enabled the detection of sites that were significantly modified at frequencies of approximately 1 in 10,000. Our analysis identified ten such sites: the intended target sequence in CCR5, a previously identified sequence in CCR2, and eight other off-target sequences (Tables 1, 4, and 6), one of which lies within the promoter of the BTBD10 gene. The eight newly identified off-target sites are modified at frequencies ranging from 1 in 300 to 1 in 5,300. We also expressed VF2468 in cultured K562 cells and performed the above analysis for 90 of the most highly cleaved sites identified by in vitro selection. Out of the 90 VF2468 sites analyzed, 32 showed indels consistent with ZFN-mediated targeting in K562 cells (Table 7). We were unable to obtain site-specific PCR amplification products for three CCR5-224 sites and seven VF2468 sites and therefore could not analyze the occurrence of NHEJ at those loci. Taken together, these observations indicate that off-target sequences identified through the in vitro selection method include many DNA sequences that can be cleaved by ZFNs in human cells.

Discussion

[0132] The method presented here identified hundreds of thousands of sequences that can be cleaved by two active, dimeric ZFNs, including many that are present and can be cut in the genome of human cells. One newly identified cleavage site for the CCR5-224 ZFN is within the promoter of the BTBD10 gene. When downregulated, BTBD10 has been associated with malignancy and with pancreatic beta cell apoptosis. When upregulated, BTBD10 has been shown to enhance neuronal cell growth and pancreatic beta cell proliferation through phosphorylation of Akt family proteins. This potentially important off-target cleavage site as well as seven others we observed in cells were not identified in a recent study that used in vitro monomer-binding data to predict potential CCR5-224 substrates.

[0133] We have previously shown that ZFNs that can cleave at sites in one cell line may not necessarily function in a different cell line, most likely due to local differences in chromatin structure. Therefore, it is likely that a different subset of the in vitro-cleavable off-target sites would be modified by CCR5-224 or VF2468 when expressed in different cell lines. Purely cellular studies of endonuclease specificity, such as a recent study of homing endonuclease off-target cleavage, may likewise be influenced by cell line choice. While our in vitro method does not account for some features of cellular DNA, it provides general, cell type-independent information about endonuclease specificity and off-target sites that can inform subsequent studies performed in cell types of interest. In addition, while our pre-selection library oversamples with at least 10-fold coverage all sequences within seven mutations of the intended ZFN target sites, the number of sequence reads obtained per selection (approximately one million) is likely insufficient to cover all cleaved sequences present in the post-selection libraries. It is therefore possible that additional off-target cleavage sites for CCR5-224 and VF2468 could be identified in the human genome as sequencing capabilities continue to improve.

[0134] Although both ZFNs we analyzed were engineered to a unique sequence in the human genome, both cleave a significant number of off-target sites in cells. This finding is particularly surprising for the four-finger CCR5-224 pair given that its theoretical specificity is 4,096-fold better than that of the three-finger VF2468 pair (CCR5-224 should recognize a 24-base pair site that is six base pairs longer than the 18-base pair VF2468 site). Examination of the CCR5-224 and VF2468 cleavage profiles (FIG. 2) and mutational tolerances of sequences with three or fewer mutations (FIG. 4) suggests different strategies may be required to engineer variants of these ZFNs with reduced off-target cleavage activities. The four-finger CCR5-224 ZFN showed a more diffuse range of positions with relaxed specificity and a higher tolerance of mutant sequences with three or fewer mutations than the three-finger VF2468 ZFN. For VF2468, re-optimization of only a subset of fingers may enable a substantial reduction in undesired cleavage events. For CCR5-224, in contrast, a more extensive re-optimization of many or all fingers may be required to eliminate off-target cleavage events.

[0135] We note that not all four- and three-finger ZFNs will necessarily be as specific as the two ZFNs tested in this study. Both CCR5-224 and VF2468 were engineered using methods designed to optimize the binding activity of the ZFNs. Previous work has shown that for both three-finger and four-finger ZFNs, the specific methodology used to engineer the ZFN pair can have a tremendous impact on the quality and specificity of nucleases.

[0136] Our findings have significant implications for the design and application of ZFNs with increased specificity. Half or more of all potential substrates with one or two site mutations could be cleaved by ZFNs, suggesting that binding affinity between ZFN and DNA substrate is sufficiently high for cleavage to occur even with suboptimal molecular interactions at mutant positions. We also observed that ZFNs presented with sites that have mutations in one half-site exhibited higher mutational tolerance at other positions within the mutated half-site and lower tolerance at positions in the other half-site. These results collectively suggest that in order to meet a minimum affinity threshold for cleavage, a shortage of binding energy from a half-site harboring an off-target base pair must be energetically compensated by excess zinc finger:DNA binding energy in the other half-site, which demands increased sequence recognition stringency at the non-mutated half-site (Fig. S18). Conversely, the relaxed stringency at other positions in mutated half-sites can be explained by the decreased contribution of that mutant half-site to overall ZFN binding energy. This hypothesis is supported by a recent study showing that reducing the number of zinc fingers in a ZFN can actually increase, rather than decrease, activity.

[0137] This model also explains our observation that sites with suboptimal spacer lengths, which presumably were bound less favorably by ZFNs, were recognized with higher stringency than sites with optimal spacer lengths. In vitro spacer preferences do not necessarily reflect spacer preferences in cells; however, our results suggest that the dimeric FokI cleavage domain can influence ZFN target-site recognition. Consistent with this model, Wolfe and co-workers recently observed differences in the frequency of off-target events in zebrafish of two ZFNs with identical zinc finger domains but different FokI domain variants.

[0138] Collectively, our findings suggest that (i) ZFN specificity can be increased by avoiding the design of ZFNs with excess DNA binding energy; (ii) off-target cleavage can be minimized by designing ZFNs to target sites that do not have relatives in the genome within three mutations; and (iii) ZFNs should be used at the lowest concentrations necessary to cleave the target sequence to the desired extent. While this study focused on ZFNs, our method should be applicable to all sequence-specific endonucleases that cleave DNA in vitro, including engineered homing endonucleases and engineered transcription activator-like effector (TALE) nucleases. This approach can provide important information when choosing target sites in genomes for sequence-specific endonucleases, and when engineering these enzymes, especially for therapeutic applications.

Methods

[0139] Oligonucleotides and Sequences.

[0140] All oligonucleotides were purchased from Integrated DNA Technologies or Invitrogen and are listed in Table 8. Primers with degenerate positions were synthesized by Integrated DNA Technologies using hand-mixed phosphoramidites containing 79% of the indicated base and 7% of each of the other standard DNA bases.

[0141] Sequences of ZFNs Used in this Study.

[0142] DNA and protein sequences are shown for the ZFNs used in this study. The T7 promoter is underlined, and the start codon is in bold.

CCR5-224 (+) DNA sequence (SEQ ID NO: 119): TAATACGACT CACTATAGGGAGACCCAAGCTGGCTAGCCACCAGGACTACAAAGACCATGACGGTGATTATAAA

GATCATGACATCGATTACAAGGATGACGATGACAAGATGGCCCCCAAGAAGAAGAGGAAGGTGGGCATTCACGGG

GTACCCGCCGCTATGGCTGAGAGGCCCTTCCAGTGTCGAATCTGCATGCGTAACTTCAGTGATCGCTCTAACCTG

AGTCGGCACATCCGCACCCACACAGGCGAGAAGCCTTTTGCCTGTGACATTTGTGGGAGGAAGTTTGCCATCTCC

TCCAACCTGAACTCCCATACCAAGATACACACGGGATCTCAGAAGCCCTTCCAGTGTCGAATCTGCATGCGTAAC

TTCACGTCGCTCCGACAACCTGGCCCGCCACATCCGCACCCACACAGGCGAGAAGCCTTTTGCCTGTGACATTTGT

GGGAGGAAATTTGCCACCTCCGGCAACCTGACCCGCCATACCAAGATACACCTGCGGGGATCCCAACTAGTCAAA

AGTGAACTGGAGGAGAAGAAATCTGAACTTCGTCATAAATTGAAATATGTGCCTCATGAATATATTGAATTAATT

GAAATTGCCAGAAATTCCACTCAGGATAGAATTCTTGAAATGAAGGTAATGGAATTTTTTATGAAAGTTTATGGA

TATAGAGGTAAACATTTGGGTGGATCAAGGAAACCGGACGGAGCAATTTATACTGTCGGATCTCCTATTGATTAC

GGTGTGATCGTGGATACTAAAGCTTATAGCGGAGGTTATAATCTGCCAATTGGCCAAGCAGATGAAATGCAACGA

TATGTCAAAGAAAATCAAACACGAAACAAACATATCAACCCTAATGAATGGTGGAAAGTCTATCCATCTTCTGTA

ACGGAATTTAAGTTTTTATTTGTGAGTGGTCACTTTAAAGGAAACTACAAAGCTCAGCTTACACGATTAAATCAT

AAGACTAATTGTAATGGAGCTGTTCTTAGTGTAGAAGAGCTTTTAATTGGTGGAGAAATGATTAAAGCCGGCACA

TTAACCTTAGAGGAAGTGAGACGGAAATTTAATAACGGCGAGATAAACTTTTAA

CCR5-224 (+) protein sequence (SEQ ID NO: 120): MDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGIHGVPAAMAERPFQCRICMRNFSDRSNLSRHIRTHTGEKPFA

CDICGRKFAISSNLNSHTKIHTGSQKPFQCRICMRNFSRSDNLARHIRTHTGEKPFACDICGRKFATSGNLTRHT

KIHLRGSQLVKSELEEKKSELRHKLKYVPHEYIELIEIARNSTQDRILEMKVMEFFMKVYGYRGKHLGGSRKPDG

AIYTVGSPIDYGVIVDTKAYSGGYNLPIGQADEMQRYVKENQTRNKHINPNEWWKVYPSSVTEFKFLFVSGHFKG

NYKAQLTRLNHKTNCNGAVLSVEELLIGGEMIKAGTLTLEEVRRKFNNGEINF


VF2468 (+) protein sequence (SEQ ID NO: 124): MDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGIHGVPSRPGERPFQCRICMRNFSRQDRLDRHTR

THTGEKPFQCRICMRNFSQKEHLAGHLRTHTGEKPFQCRICMRNFSRRDNLNRHLKTHLRGSQLVKSE

LEEKKSELRHKLKYWPHEYIELIEIARNSTQDRILEMKVMEFFMKVYGYRGKHLGGSRKPDGAIYTVGS

PIDYGVIVDTKAYSGGYNLPIGQADEMQRYWKENQTRNKHINPNEWWKVYPSSVTEFKFLPVSGHFKG

NYKAQLTRLNHKTNCNGAVLSWEELLIGGEMIKAGTLTLEEWRRKFNNGEINF

VF2468 (-) DNA sequence (SEQ ID NO: 125): TAATACGACT CACTATAGGGAGACCCAAGCTGGCTAGCCACCATGGACTACAAAGACCATGACGG

TGATTATAAAGATCATGACATCGATTACAAGGATGACGATGACAAGATGGCCCCCAAGAAGAAGA

GGAAGGTGGGCATTCACGGGGTGCCGTCTAGACCCGGGGAGCGCCCCTTCCAGTGTCGCATTTGC

ATGCGGAACTTTTCGACCGGCCAGATCCTTGACCGCCATACCCGTACTCATACCGGTGAAAAACCG

TTTCAGTGTCGGATCTGTATGCGAAATTTCTCCGTGGCGCACAGCTTGAAGAGGCATCTACGTACG

CACACCGGCGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTGACCCCAGCAACCT

GCGGCGCCACCTAAAAACCCACCTGAGGGGATCCCAACTAGTCAAAAGTGAACTGGAGGAGAAG

AAATCTGAACTTCGTCATAAATTGAAATATGTGCCTCATGAATATATTGAATTAATTGAAATTGCC

AGAAATTCCACTCAGGATAGAATTCTTGAAATGAAGGTAATGGAATTTTTTATGAAAGTTTATGGA

TATAGAGGTAAACATTTGGGTGGATCAAGGAAACCGGACGGAGCAATTTATACTGTCGGATCTCC

TATTGATTACGGTGTGATCGTGGATACTAAAGCTTATAGCGGAGGTTATAATCTGCCAATTGGCCA

AGCAGATGAAATGGAGCGATATGTCGAAGAAAATCAAACACGAAACAAACATCTCAACCCTAATG

AATGGTGGAAAGTCTATCCATCTTCTGTAACGGAATTTAAGTTTTTATTTGTGAGTGGTCACTTTAA

AGGAAACTACAAAGCTCAGCTTACACGATTAAATCATATCACTAATTGTAATGGAGCTGTTCTTAG

TGTAGAAGAGCTTTTAATTGGTGGAGAAATGATTAAAGCCGGCACATTAACCTTAGAGGAAGTGA

GACGGAAATTTAATAACGGCGAGATAAACTTTTAA

VF2468 (-) protein sequence (SEQ ID NO: 126): MDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGIHGVPSRPGERPFQCRICMRNFSTGQILDRHTRT

HTGEKPFQCRICMRNFSVAHSLKRHLRTHTGEKPFQCRICMRNFSDPSNLRRHLKTHLRGSQLVKSELE

EKKSELRHKLKYVPHEYIELIEIARNSTQDRILEMKVMEFFMKVYGYRGKHLGGSRKPDGAIYTVGSPI

DYGVIVDTKAYSGGYNLPIGQADEMERYVEENQTRNKHLNPNEWWKVYPSSVTEFKFLFVSGHFKGN

YKAQLTRLNHITNCNGAVLSVEELLIGGEMIKAGTLTLEEVRRKFNNGEINF

[0143] Library Construction.

[0144] Libraries of target sites were incorporated into double-stranded DNA by PCR with Taq DNA Polymerase (NEB) on a pUC19 starting template with primers "N5-PvuI" and "CCR5-224-N4," "CCR5-224-N5," "CCR5-224-N6," "CCR5-224-N7," "VF2468-N4," "VF2468-N5," "VF2468-N6," or "VF2468-N7," yielding an approximately 545-bp product with a PvuI restriction site adjacent to the library sequence, and purified with the Qiagen PCR Purification Kit.

[0145] Library-encoding oligonucleotides were of the form 5' backbone-PvuI site-NNNNNN-partially randomized half site-N(4-7)-partially randomized half site-N-backbone 3'. The purified oligonucleotide mixture (approximately 10 µg) was blunted and phosphorylated with a mixture of 50 units of T4 Polynucleotide Kinase and 15 units of T4 DNA polymerase (NEBNext End Repair Enzyme Mix, NEB) in 1x NEBNext End Repair Reaction Buffer (50 mM Tris-HCl, 10 mM MgCl2, 10 mM dithiothreitol, 1 mM ATP, 0.4 mM dATP, 0.4 mM dCTP, 0.4 mM dGTP, 0.4 mM dTTP, pH 7.5) for 1.5 hours at room temperature. The blunt-ended and phosphorylated DNA was purified with the Qiagen PCR Purification Kit according to the manufacturer's protocol, diluted to 10 ng/µL in NEB T4 DNA Ligase Buffer (50 mM Tris-HCl, 10 mM MgCl2, 10 mM dithiothreitol, 1 mM ATP, pH 7.5) and circularized by ligation with 200 units of T4 DNA ligase (NEB) for 15.5 hours at room temperature. Circular monomers were gel purified on 1% TAE-agarose gels. 70 ng of circular monomer was used as a substrate for rolling-circle amplification at 30° C. for 20 hours in a 100 µL reaction using the Illustra TempliPhi 100 Amplification Kit (GE Healthcare). Reactions were stopped by incubation at 65° C. for 10 minutes. Target site libraries were quantified with the Quant-iT PicoGreen dsDNA Reagent (Invitrogen). Libraries with N4, N5, N6, and N7 spacer sequences between partially randomized half-sites were pooled in equimolar concentrations for both CCR5-224 and VF2468.
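The mutation load expected in the partially randomized half-sites of these library oligonucleotides can be estimated with a short Python sketch. It assumes that every target-site position is doped independently at the stated ratio (79% indicated base, 7% each of the other three bases, so a 21% per-position mutation rate) and uses the 24-bp and 18-bp site lengths given in the Discussion. The function names are illustrative and are not taken from the source.

```python
from math import comb


def mutation_count_pmf(site_length, mutation_rate=0.21):
    """Binomial P(k mutations) for k = 0..site_length, assuming independent doping."""
    keep = 1.0 - mutation_rate
    return [
        comb(site_length, k) * mutation_rate**k * keep ** (site_length - k)
        for k in range(site_length + 1)
    ]


def mean_mutations(site_length, mutation_rate=0.21):
    """Expected number of mutated positions per site."""
    return site_length * mutation_rate


def fraction_within(site_length, max_mutations, mutation_rate=0.21):
    """Fraction of library members carrying at most max_mutations mutations."""
    return sum(mutation_count_pmf(site_length, mutation_rate)[: max_mutations + 1])


ccr5_mean = mean_mutations(24)          # nominal mean for a 24-bp site
vf_mean = mean_mutations(18)            # nominal mean for an 18-bp site
ccr5_within_7 = fraction_within(24, 7)  # share of members within seven mutations
```

Under these assumptions the nominal means are 5.04 and 3.78 mutations per site. The sequenced pre-selection libraries reported in the Results have means of 4.56 and 3.45, so the sketch should be read as an idealized expectation rather than a description of the libraries as made.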

[0146] Zinc Finger Nuclease Expression and Characterization.
[0147] 3xFLAG-tagged zinc finger proteins for CCR5-224 and VF2468 were expressed as fusions to FokI obligate heterodimers in mammalian expression vectors derived from pMLM290 and pMLM292. DNA and protein sequences are provided elsewhere herein. Complete vector sequences are available upon request. 2 µg of ZFN-encoding vector was transcribed and translated in vitro using the TNT Quick Coupled rabbit reticulocyte system (Promega). Zinc chloride (Sigma-Aldrich) was added at 500 µM and the transcription/translation reaction was performed for 2 hours at 30 °C. Glycerol was added to a 50% final concentration. Western blots were used to visualize protein using the anti-FLAG M2 monoclonal antibody (Sigma-Aldrich). ZFN concentrations were determined by Western blot and comparison with a standard curve of N-terminal FLAG-tagged bacterial alkaline phosphatase (Sigma-Aldrich).
[0148] Test substrates for CCR5-224 and VF2468 were constructed by cloning into the HindIII/XbaI sites of pUC19. PCR with primers "test fwd" and "test rev" and Taq DNA polymerase yielded a linear 1 kb DNA that could be cleaved by the appropriate ZFN into two fragments of sizes ~300 bp and ~700 bp. Activity profiles for the zinc finger nucleases were obtained by modifying the in vitro cleavage protocols used by Miller et al. and Cradick et al. 1 µg of linear 1 kb DNA was digested with varying amounts of ZFN in 1x NEBuffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 1 mM dithiothreitol, pH 7.9) for 4 hours at 37 °C. 100 µg of RNase A (Qiagen) was added to the reaction for 10 minutes at room temperature to remove RNA from the in vitro transcription/translation mixture that could interfere with purification and gel analysis. Reactions were purified with the Qiagen PCR Purification Kit and analyzed on 1% TAE-agarose gels.
[0149] In Vitro Selection.
[0150] ZFNs of varying concentrations, an amount of TNT reaction mixture without any protein-encoding DNA template equivalent to the greatest amount of ZFN used ("lysate"), or 50 units PvuI (NEB) were incubated with 1 µg of rolling-circle amplified library for 4 hours at 37 °C in 1x NEBuffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 1 mM dithiothreitol, pH 7.9). 100 µg of RNase A (Qiagen) was added to the reaction for 10 minutes at room temperature to remove RNA from the in vitro transcription/translation mixture that could interfere with purification and gel analysis. Reactions were purified with the Qiagen PCR Purification Kit. 1/10 of the reaction mixture was visualized by gel electrophoresis on a 1% TAE-agarose gel and staining with SYBR Gold Nucleic Acid Gel Stain (Invitrogen).
[0151] The purified DNA was blunted with 5 units DNA Polymerase I, Large (Klenow) Fragment (NEB) in 1x NEBuffer 2 (50 mM NaCl, 10 mM Tris-HCl, 10 mM MgCl2, 1 mM dithiothreitol, pH 7.9) with 500 µM dNTP mix (Bio-Rad) for 30 minutes at room temperature. The reaction mixture was purified with the Qiagen PCR Purification Kit and incubated with 5 units of Klenow Fragment (3'→5' exo-) (NEB) for 30 minutes at 37 °C in 1x NEBuffer 2 (50 mM NaCl, 10 mM Tris-HCl, 10 mM MgCl2, 1 mM dithiothreitol, pH 7.9) with 240 µM dATP (Promega) in a 50 µL final volume. 10 mM Tris-HCl, pH 8.5 was added to a volume of 90 µL and the reaction was incubated for 20 minutes at 75 °C to inactivate the enzyme before cooling to 12 °C. 300 fmol of "adapter1/2", barcoded according to enzyme concentration, or 6 pmol of "adapter1/2" for the PvuI digest, were added to the reaction mixture, along with 10 µL 10x NEB T4 DNA Ligase Reaction Buffer (500 mM Tris-HCl, 100 mM MgCl2, 100 mM dithiothreitol, 10 mM ATP). Adapters were ligated onto the blunt DNA ends with 400 units of T4 DNA ligase at room temperature for 17.5 hours and ligated DNA was purified away from unligated adapters with Illustra Microspin S-400 HR sephacryl columns (GE Healthcare). DNA with ligated adapters was amplified by PCR with 2 units of Phusion Hot Start II DNA Polymerase (NEB) and 10 pmol each of primers "PE1" and "PE2" in 1x Phusion GC Buffer supplemented with 3% DMSO and 1.7 mM MgCl2. PCR conditions were 98 °C for 3 min, followed by cycles of 98 °C for 15 s, 60 °C for 15 s, and 72 °C for 15 s, and a final 5 min extension at 72 °C. The PCR was run for enough cycles (typically 20-30) to see a visible product on gel. The reactions were pooled in equimolar amounts and purified with the Qiagen PCR Purification Kit. The purified DNA was gel purified on a 1% TAE-agarose gel, and submitted to the Harvard Medical School Biopolymers Facility for Illumina 36-base paired-end sequencing.
[0152] Data Analysis.
[0153] Illumina sequencing reads were analyzed using programs written in C++. Algorithms are described elsewhere herein (e.g., Protocols 1-9), and the source code is available on request. Sequences containing the same barcode on both paired sequences and no positions with a quality score of "B" were binned by barcode. Half-site sequence, overhang and spacer sequences, and adjacent randomized positions were determined by positional relationship to constant sequences and searching for sequences similar to the designed CCR5-224 and VF2468 recognition sequences. These sequences were subjected to a computational selection step for complementary, filled-in overhang ends of at least 4 base pairs, corresponding to rolling-circle concatemers that had been cleaved at two adjacent and identical sites. Specificity scores were calculated with the formulae:
positive specificity score = (frequency of base pair at position[post-selection] - frequency of base pair at position[pre-selection]) / (1 - frequency of base pair at position[pre-selection])
and
negative specificity score = (frequency of base pair at position[post-selection] - frequency of base pair at position[pre-selection]) / (frequency of base pair at position[pre-selection]).
[0154] Positive specificity scores reflect base pairs that appear with greater frequency in the post-selection library than in the starting library at a given position; negative specificity scores reflect base pairs that are less frequent in the post-selection library than in the starting library at a given position. A score of +1 indicates an absolute preference, a score of -1 indicates an absolute intolerance, and a score of 0 indicates no preference.
[0155] Assay of Genome Modification at Cleavage Sites in Human Cells.
[0156] CCR5-224 ZFNs were cloned into a CMV-driven mammalian expression vector in which both ZFN monomers were translated from the same mRNA transcript in stoichiometric quantities using a self-cleaving T2A peptide sequence similar to a previously described vector. This vector also expresses enhanced green fluorescent protein (eGFP) from a PGK promoter downstream of the ZFN expression cassette. An empty vector expressing only eGFP was used as a negative control.
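The specificity-score formulae given under Data Analysis can be illustrated with a short sketch. This is not the C++ source referred to in the text; it is a minimal Python illustration in which the function names (`base_frequencies`, `specificity_score`, `score_position`) are hypothetical, and in which the positive formula is assumed to apply when a base pair is enriched after selection and the negative formula when it is depleted, as the description of the scores implies.

```python
from collections import Counter

BASES = "ACGT"


def base_frequencies(sequences, position):
    """Fraction of each base at one 0-based position across the sequences.

    Characters other than A, C, G, T are ignored (assumption for this sketch).
    """
    counts = Counter(seq[position] for seq in sequences)
    total = sum(counts[b] for b in BASES)
    return {b: counts[b] / total for b in BASES}


def specificity_score(pre, post):
    """Specificity score of one base pair at one position.

    pre  -- frequency in the pre-selection library
    post -- frequency in the post-selection library
    """
    if not (0.0 <= pre <= 1.0 and 0.0 <= post <= 1.0):
        raise ValueError("frequencies must lie in [0, 1]")
    if post >= pre:
        # Positive score: enrichment divided by the largest possible enrichment.
        return 0.0 if pre == 1.0 else (post - pre) / (1.0 - pre)
    # Negative score: depletion divided by the largest possible depletion.
    return (post - pre) / pre


def score_position(pre_seqs, post_seqs, position):
    """Specificity score of every base at one position."""
    pre = base_frequencies(pre_seqs, position)
    post = base_frequencies(post_seqs, position)
    return {b: specificity_score(pre[b], post[b]) for b in BASES}
```

With an unbiased pre-selection position (each base at 0.25) and a post-selection library that contains only A at that position, A scores +1 (absolute preference) and the other three bases score -1 (absolute intolerance), matching the interpretation given in the text.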

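The read filtering described under Data Analysis (same barcode on both paired reads, no position with quality score "B") can be sketched as follows. This is a hedged illustration only: the input layout (pairs of sequence and quality strings) and the function name `bin_read_pairs` are assumptions, the barcode list is the one given in Protocol 1, and the three-base barcode length follows from Protocol 2, which searches from position 4 onward.

```python
# Barcodes as listed in Protocol 1; each occupies the first three bases of a read.
BARCODES = ("AAT", "ATA", "TAA", "CAC", "TCG")


def bin_read_pairs(pairs):
    """Bin paired-end reads by barcode.

    pairs -- iterable of ((seq1, qual1), (seq2, qual2)), where qual1 and qual2
             are per-base quality strings (hypothetical input layout).
    Returns a dict mapping each barcode to the list of (seq1, seq2) kept.
    """
    bins = {barcode: [] for barcode in BARCODES}
    for (seq1, qual1), (seq2, qual2) in pairs:
        # Reject the pair if any position in either read has quality score 'B'.
        if "B" in qual1 or "B" in qual2:
            continue
        barcode = seq1[:3]
        # Keep the pair only if both reads start with the same known barcode.
        if barcode in bins and seq2[:3] == barcode:
            bins[barcode].append((seq1, seq2))
    return bins
```

The number of pairs per barcode, which Protocol 1 also records, is then `len(bins[barcode])` for each barcode.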
[0157] To deliver ZFN expression plasmids into cells, 15 µg of either active CCR5-224 ZFN DNA or empty vector DNA were used to Nucleofect 2x10 K562 cells in duplicate reactions following the manufacturer's instructions for Cell Line Nucleofector Kit V (Lonza). GFP-positive cells were isolated by FACS 24 hours post-transfection, expanded, and harvested five days post-transfection with the QIAamp DNA Blood Mini Kit (Qiagen).
[0158] PCR for 37 potential CCR5-224 substrates and 97 potential VF2468 substrates was performed with Phusion DNA Polymerase (NEB) and primers "[ZFN] [#] fwd" and "[ZFN] [#] rev" (Table 9) in 1x Phusion HF Buffer supplemented with 3% DMSO. Primers were designed using Primer3. The amplified DNA was purified with the Qiagen PCR Purification Kit, eluted with 10 mM Tris-HCl, pH 8.5, and quantified by 1K Chip on a LabChip GX instrument (Caliper Life Sciences) and combined into separate equimolar pools for the catalytically active and empty vector control samples. PCR products were not obtained for 3 CCR5 sites and 7 VF2468 sites, which excluded these samples from further analysis. Multiplexed Illumina library preparation was performed according to the manufacturer's specifications, except that AMPure XP beads (Agencourt) were used for purification following adapter ligation and PCR enrichment steps. Illumina indices 11 ("GGCTAC") and 12 ("CTTGTA") were used for ZFN-treated libraries while indices 4 ("TGACCA") and 6 ("GCCAAT") were used for the empty vector controls. Library concentrations were quantified by KAPA Library Quantification Kit for Illumina Genome Analyzer Platform (Kapa Biosystems). Equal amounts of the barcoded libraries derived from active- and empty vector-treated cells were diluted to 10 nM and subjected to single read sequencing on an Illumina HiSeq 2000 at the Harvard University FAS Center for Systems Biology Core facility. Sequences were analyzed using Protocol 9 for active ZFN samples and empty vector controls.
[0159] Statistical Analysis.
[0160] In FIG. 8, P-values were calculated for a one-sided test of the difference in the means of the number of target site mutations in all possible pairwise comparisons among pre-selection, 0.5 nM post-selection, 1 nM post-selection, 2 nM post-selection, and 4 nM post-selection libraries for CCR5-224 or VF2468. The t-statistic was calculated as $t = (\bar{x}_1 - \bar{x}_2)/\sqrt{l\,\hat{p}_1(1-\hat{p}_1)/n_1 + l\,\hat{p}_2(1-\hat{p}_2)/n_2}$, where $\bar{x}_1$ and $\bar{x}_2$ are the means of the distributions being compared, $l$ is the target site length (24 for CCR5-224; 18 for VF2468), $\hat{p}_1$ and $\hat{p}_2$ are the calculated probabilities of mutation ($\bar{x}/l$) for each library, and $n_1$ and $n_2$ are the total number of sequences analyzed for each selection (Table 2). All pre- and post-selection libraries were assumed to be binomially distributed.
[0161] In Tables 4 and 7, P-values were calculated for a one-sided test of the difference in the proportions of sequences with insertions or deletions from the active ZFN sample and the empty vector control samples. The t-statistic was calculated as $t = (\hat{p}_1 - \hat{p}_2)/\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$, where $\hat{p}_1$ and $n_1$ are the proportion and total number, respectively, of sequences from the active sample and $\hat{p}_2$ and $n_2$ are the proportion and total number, respectively, of sequences from the empty vector control sample.
[0162] Plots.
[0163] All heat maps were generated in the R software package with the following command: image(variable, zlim=c(-1,1), col=colorRampPalette(c("red","white","blue"), space="Lab")(2500))
[0164] Protocol 1: Quality Score Filtering and Sequence Binning.
1) search each position of both pairs of sequencing read for quality score, reject if any position has quality score = "B"
2) output to separate files all sequence reads where the first sequence in the pair starts with barcodes ("AAT", "ATA", "TAA", "CAC", "TCG") and count the number of sequences corresponding to each barcode
[0165] Protocol 2: Filtering by ZFN ("AAT", "ATA", "TAA", "CAC")
For each binned file,
1) accept only sequence pairs where both sequences in the pair start with the same barcode
2) identify orientation of sequence read by searching for constant regions
[0166] orientation 1 is identified by the constant region "CGATCGTTGG" (SEQ ID NO:127)
[0167] orientation 2 is identified by the constant region "CAGTGGAACG" (SEQ ID NO:128)
3) search sequences from position 4 (after the barcode) up to the first position in the constant region for the subsequence that has the fewest mutations compared to the CCR5-224 and VF2468 half site that corresponds to the identified constant region
[0168] search sequences with orientation 1 for "GATGAGGATGAC" (SEQ ID NO:129) (CCR5-224(+)) and "GACGCTGCT" (SEQ ID NO:130) (VF2468(-))
[0169] search sequences with orientation 2 for "AAACTGCAAAAG" (SEQ ID NO:131) (CCR5-224(-)) and "GAGTGAGGA" (SEQ ID NO:132) (VF2468(+))
4) bin sequences as CCR5-224 or VF2468 by testing for the fewest mutations across both half-sites
5) the positions of the half-sites and constant sequences are used to determine the overhang/spacer sequences, the flanking nucleotide sequences, and the tag sequences
[0170] the subsequence between the half-site of orientation 1 and the constant region is the tag sequence
[0171] if there is no tag sequence, the tag sequence is denoted by "X"
[0172] the overhang sequence is determined by searching for the longest reverse-complementary subsequences between the subsequences of orientation 1 and orientation 2 that start after the barcodes
[0173] the spacer sequence is determined by concatenating the reverse complement of the subsequence in orientation 1 that is between the overhang and the half-site (if any), the overhang, and the subsequence in orientation 2 that is between the overhang and the half-site
[0174] if there is overlap between the overhang and half-site, only the non-overlapping subsequence present in the overhang is counted as part of the spacer
6) to remove duplicate sequences, sort each sequence pair into a tree
[0175] each level of the tree corresponds to a position in the sequence
[0176] each node at each level corresponds to a particular base (A, C, G, T, or X = not (A, C, G, or T)) and points to the base of the next position (A, C, G, T, X)
[0177] the sequence pairs are encoded in the nodes and a subsequence consisting of the concatenation of the

spacer sequence, flanking nucleotide sequence, and tag sequence is sorted in the tree
[0178] at the terminal nodes of the tree, each newly entered sequence is compared to all other sequences in the node to avoid duplication
7) the contents of the tree are recursively outputted into separate files based on barcode and ZFN
[0179] Protocol 3: Library Filtering ("TCG")
1) accept only sequence pairs where both sequences in the pair start with the same barcode
2) analyze the sequence pair that does not contain the sequence "TCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGAC" (SEQ ID NO:133) (the other pair contains the library sequence)
3) search sequences for ZFN half-sites and bin by the ZFN site that has fewer mutations
[0180] search for "GTCATCCTCATC" (SEQ ID NO:134) and "AAACTGCAAAAG" (SEQ ID NO:135) (CCR5-224) and "AGCAGCGTC" (SEQ ID NO:136) and "GAGTGAGGA" (SEQ ID NO:137) (VF2468)
4) identify the spacer, flanking nucleotide, and nucleotide tag sequences based on the locations of the half-sites
5) use the tree algorithm in step 6 under Filtering by ZFN to eliminate duplicate sequences
[0181] Protocol 4: Sequence Profiles
1) analyze only sequences that contain no "N" positions and have spacer lengths between 4 and 7
2) tabulate the total number of mutations, the spacer length, the overhang length, the nucleotide frequencies for the (+) and (-) half-sites, the nucleotide frequencies for spacers that are 4-bp, 5-bp, 6-bp, and 7-bp long, and the nucleotide frequencies for the flanking nucleotide and the tag sequence
3) repeat steps 1 and 2 for library sequences
4) calculate specificity scores at each position using
positive specificity score = (frequency of base pair at position[post-selection] - frequency of base pair at position[pre-selection]) / (1 - frequency of base pair at position[pre-selection])
negative specificity score = (frequency of base pair at position[post-selection] - frequency of base pair at position[pre-selection]) / (frequency of base pair at position[pre-selection])
[0182] Protocol 5: Genomic Matches
1) the human genome sequence was searched with 24 and 25 base windows (CCR5-224) and 18 and 19 base windows (VF2468) for all sites within nine mutations (CCR5-224) or six mutations (VF2468) of the canonical target site with all spacer sequences of five or six bases being accepted
2) each post-selection sequence was compared to the set of genomic sequences within nine and six mutations of CCR5-224 and VF2468, respectively
[0183] Protocol 6: Enrichment Factors for Sequences with 0, 1, 2, or 3 Mutations
1) for each sequence, divide the frequency of occurrence in the post-selection library by the frequency of occurrence in the pre-selection library
[0184] Protocol 7: Filtered Sequence Profiles
1) use the algorithm described above in Sequence Profiles, except in addition, only analyze sequences with off-target bases at given positions for both pre- and post-selection data
[0185] Protocol 8: Compensation Difference Map
1) use Filtered Sequence Profiles algorithm for mutation at every position in both half-sites
2) calculate Δ(specificity score) = filtered specificity score - non-filtered specificity score
[0186] Protocol 9: NHEJ Search
1) identify the site by searching for exact flanking sequences
2) count the number of inserted or deleted bases by comparing the length of the calculated site to the length of the expected site and by searching for similarity to the unmodified target site (sequences with 5 or fewer mutations compared to the intended site were counted as unmodified)
3) inspect all sites other than CCR5, CCR2, and VEGF-A promoter by hand to identify true insertions or deletions

REFERENCES
[0187] 1. Kim, Y. G., Cha, J. & Chandrasegaran, S. Hybrid restriction enzymes: zinc finger fusions to FokI cleavage domain. Proc Natl Acad Sci USA 93, 1156-60 (1996).
[0188] 2. Vanamee, E. S., Santagata, S. & Aggarwal, A. K. FokI requires two specific DNA sites for cleavage. J Mol Biol 309, 69-78 (2001).
[0189] 3. Hockemeyer, D. et al. Efficient targeting of expressed and silent genes in human ESCs and iPSCs using zinc-finger nucleases. Nat Biotechnol 27, 851-7 (2009).
[0190] 4. Maeder, M. L. et al. Rapid "open-source" engineering of customized zinc-finger nucleases for highly efficient gene modification. Mol Cell 31, 294-301 (2008).
[0191] 5. Zou, J. et al. Gene targeting of a disease-related gene in human induced pluripotent stem and embryonic stem cells. Cell Stem Cell 5, 97-110 (2009).
[0192] 6. Perez, E. E. et al. Establishment of HIV-1 resistance in CD4+ T cells by genome editing using zinc-finger nucleases. Nat Biotechnol 26, 808-16 (2008).
[0193] 7. Urnov, F. D. et al. Highly efficient endogenous human gene correction using designed zinc-finger nucleases. Nature 435, 646-51 (2005).
[0194] 8. Santiago, Y. et al. Targeted gene knockout in mammalian cells by using engineered zinc-finger nucleases. Proc Natl Acad Sci USA 105, 5809-14 (2008).
[0195] 9. Cui, X. et al. Targeted integration in rat and mouse embryos with zinc-finger nucleases. Nat Biotechnol 29, 64-7 (2011).
[0196] 10. Cornu, T. I. et al. DNA-binding specificity is a major determinant of the activity and toxicity of zinc-finger nucleases. Mol Ther 16, 352-8 (2008).
[0197] 11. Segal, D. J., Dreier, B., Beerli, R. R. & Barbas, C. F., 3rd. Toward controlling gene expression at will: selection and design of zinc finger domains recognizing each of the 5'-GNN-3' DNA target sequences. Proc Natl Acad Sci USA 96, 2758-63 (1999).
[0198] 12. Bulyk, M. L., Huang, X., Choo, Y. & Church, G. M. Exploring the DNA-binding specificities of zinc fingers with DNA microarrays. Proc Natl Acad Sci USA 98, 7158-63 (2001).
[0199] 13. Meng, X., Thibodeau-Beganny, S., Jiang, T., Joung, J. K. & Wolfe, S. A. Profiling the DNA-binding specificities of engineered Cys2His2 zinc finger domains using a rapid cell-based method. Nucleic Acids Res 35, e81 (2007).
[0200] 14. Wolfe, S. A., Greisman, H. A., Ramm, E. I. & Pabo, C. O. Analysis of zinc fingers optimized via phage display: evaluating the utility of a recognition code. J Mol Biol 285, 1917-34 (1999).
[0201] 15. Segal, D. J. et al. Evaluation of a modular strategy for the construction of novel polydactyl zinc finger DNA-binding proteins. Biochemistry 42, 2137-48 (2003).

[0202] 16. Zykovich, A., Korf, I. & Segal, D. J. Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res 37, e151 (2009).
[0203] 17. Yanover, C. & Bradley, P. Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers. Nucleic Acids Res (2011).
[0204] 18. Beumer, K., Bhattacharyya, G., Bibikova, M., Trautman, J. K. & Carroll, D. Efficient gene targeting in Drosophila with zinc-finger nucleases. Genetics 172, 2391-403 (2006).
[0205] 19. Bibikova, M., Golic, M., Golic, K. G. & Carroll, D. Targeted chromosomal cleavage and mutagenesis in Drosophila using zinc-finger nucleases. Genetics 161, 1169-75 (2002).
[0206] 20. Gupta, A., Meng, X., Zhu, L. J., Lawson, N. D. & Wolfe, S. A. Zinc finger protein-dependent and -independent contributions to the in vivo off-target activity of zinc finger nucleases. Nucleic Acids Res 39, 381-92 (2011).
[0207] 21. Chen, J. et al. Molecular cloning and characterization of a novel human BTB domain-containing gene, BTBD10, which is down-regulated in glioma. Gene 340, 61-9 (2004).
[0208] 22. Wang, X. et al. Glucose metabolism-related protein 1 (GMRP1) regulates pancreatic beta cell proliferation and apoptosis via activation of Akt signalling pathway in rats and mice. Diabetologia 54, 852-63 (2011).
[0209] 23. Nawa, M., Kanekura, K., Hashimoto, Y., Aiso, S. & Matsuoka, M. A novel Akt/PKB-interacting protein promotes cell adhesion and inhibits familial amyotrophic lateral sclerosis-linked mutant SOD1-induced neuronal death via inhibition of PP2A-mediated dephosphorylation of Akt/PKB. Cell Signal 20, 493-505 (2008).
[0210] 24. Petek, L. M., Russell, D. W. & Miller, D. G. Frequent endonuclease cleavage at off-target locations in vivo. Mol Ther 18, 983-6 (2010).
[0211] 25. Hurt, J. A., Thibodeau, S. A., Hirsh, A. S., Pabo, C. O. & Joung, J. K. Highly specific zinc finger proteins obtained by directed domain shuffling and cell-based selection. Proc Natl Acad Sci USA 100, 12271-6 (2003).
[0212] 26. Ramirez, C. L. et al. Unexpected failure rates for modular assembly of engineered zinc fingers. Nat Methods 5, 374-5 (2008).
[0213] 27. Shimizu, Y. et al. Adding Fingers to an Engineered Zinc Finger Nuclease Can Reduce Activity. Biochemistry 50, 5033-41 (2011).
[0214] 28. Bibikova, M. et al. Stimulation of homologous recombination through targeted cleavage by chimeric nucleases. Mol Cell Biol 21, 289-97 (2001).
[0215] 29. Handel, E. M., Alwin, S. & Cathomen, T. Expanding or restricting the target site repertoire of zinc-finger nucleases: the inter-domain linker as a major determinant of target site selectivity. Mol Ther 17, 104-11 (2009).
[0216] 30. Miller, J. C. et al. An improved zinc-finger nuclease architecture for highly specific genome editing. Nat Biotechnol 25, 778-85 (2007).
[0217] 31. Cradick, T. J., Keck, K., Bradshaw, S., Jamieson, A. C. & McCaffrey, A. P. Zinc-finger nucleases as a novel therapeutic strategy for targeting hepatitis B virus DNAs. Mol Ther 18, 947-54 (2010).
[0218] 32. Doyon, Y. et al. Heritable targeted gene disruption in zebrafish using designed zinc-finger nucleases. Nat Biotechnol 26, 702-8 (2008).
[0219] 33. Rozen, S. & Skaletsky, H. Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 132, 365-86 (2000).
[0220] All publications, patents and sequence database entries mentioned herein, including those items listed above, are hereby incorporated by reference in their entirety as if each individual publication or patent was specifically and individually indicated to be incorporated by reference. In case of conflict, the present application, including any definitions herein, will control.

Example 2

TALENs
[0221] The site preferences of different TALENs were profiled in analogy to the work done for ZFN profiling described above. The experiments and results are described in FIGS. 19-49. Selection 1 included a comparison between TALENs having a +28 vs. a +63 linker. Selection 2 included a comparison of TALENs of different TAL domain length.
[0222] TAL DNA binding domains are the basis of a transformative technology to specifically modulate target DNA both in vitro and in cells. The designable TAL DNA binding domains have advantages in targetable sequence space and ease of construction compared to other DNA binding domains, for example, zinc fingers. These TAL DNA binding domains are comprised of repeats of a 34 amino acid domain with a highly variable di-amino acid (RVD) coding for recognition of a single base pair in the target DNA sequence (FIG. 20). Based on the robustness of this RVD code and the crystal structure of a TAL bound to its DNA target, it is likely that binding of a single repeat to a base pair is relatively independent of adjacent repeat binding. The TAL DNA binding domain (an array of repeats) can be linked to the monomer of a heterodimeric nuclease domain to form a TAL nuclease. Thus, two distinct TAL nucleases can bind adjacent target half sites to cleave a specific sequence resulting in genome modifications in vivo (FIGS. 19 and 20). While a number of studies have investigated the specificity of TAL DNA binding, to our knowledge no studies have profiled the specificity of TAL nucleases on a large scale. We applied the concept of high-throughput, in vitro selection for nuclease specificity outlined for ZFNs in Example 1 to TAL nucleases to both confirm the modular, independent binding of TAL repeats expected from their easy design-ability and also identify genomic off-target sequences cut by therapeutically relevant TAL nucleases.
[0223] The selection scheme for profiling the specificity of TAL nucleases via in vitro library screening was in analogy to the selection scheme described for ZFNs in Example 1. Detailed protocols are provided below:
[0224] Preparation of Library of Partly Randomized Target Sites
[0225] 2 µl of 10 pmol TALNCCR5 Library Oligo (separate reactions for each oligo)
[0226] 2 µl 10x CircLigase II 10x Reaction Buffer
[0227] 1 µl 50 mM MnCl2
[0228] 1 µl CircLigase II ssDNA Ligase (100 U) [Epicentre]
[0229] X µl water to 20 µL total volume

Incubate 16 hrs at 60 °C. Incubate 10 min at 85 °C to inactivate.
Add 2.5 µl of each CircLigase II reaction (without purification)
Add 25 µl TempliPhi [GE Healthcare] 100 sample buffer.
Incubate 3 min at 95 °C. Slow cool to 4 °C.
Add 25 µl TempliPhi reaction buffer/1 µl enzyme mix.
Incubate 16 hrs at 30 °C. Heat inactivate 10 min at 55 °C.
Quantify amount of dsDNA using Quant-iT PicoGreen dsDNA [Invitrogen]
Combine equal moles of TempliPhi reactions to final 2 µM with respect to number of cut sites.
[0230] TALN Expression
[0231] 16 µl TnT Quick Coupled [Promega]
[0232] 0.4 µl 1 mM methionine
[0233] 2 µl of 0.8 µg TALN vector expression plasmid or water for empty lysate
[0234] 1.6 µL of water
Incubate at 30 °C for 1.5 hours and then store at 4 °C overnight.
Quantify amount of TALN in lysate via Western Blot.
[0235] TALN Digestion
[0236] 25 µL of 10x NEB Buffer 3 [New England Biolabs]
[0237] 10 µL of 2 µM TempliPhi Library DNA
[0238] 165 µL water
Add left TALN lysate to 20 nM total left TALN
Add right TALN lysate to 20 nM total right TALN
Add empty lysate to total of 50 µL lysate
Incubate 2 hrs at 37 °C. Add 5 µl (50 µg) RNase A (Qiagen). Incubate 10 min at RT. Purify with Qiagen PCR Purification Kit. Elute in 50 µL of 1 mM Tris, pH 8.0.
[0239] Adapter Ligation, PCR and Gel Purification of TALN Digestion
[0240] 50 µl digested DNA
[0241] 3 µl dNTP mix
[0242] 6 µl NEB2
[0243] 1 µl Klenow [New England Biolabs]
Incubate 30 min at RT. Purify with Qiagen PCR Purification Kit.
[0244] 50 µl eluted DNA
[0245] 5.9 µl T4 DNA Ligase Buffer (NEB)
[0246] 2 µl (20 pmol) heat/cooled adapter (different adapter for each selection)
[0247] 1 µl T4 DNA ligase (NEB, 400 units)
Incubate at RT for 20 hrs. Purify with Qiagen PCR Purification Kit.
[0248] 6 µL of TALN digested DNA
[0249] 30 µL of 5x Buffer HF
[0250] 1.5 µL 100 µM Illumina fwd Primer
[0251] 1.5 µL 100 µM PE TALN rev1 Primer
[0252] 3 µL 10 mM dNTP
[0253] 1.5 µL Phusion Hot Start II
[0254] 106.5 µL of water
98 °C for 3 min, do 15 cycles of 98 °C for 15 s, 60 °C for 15 s, 72 °C for 1 min. Purify with Qiagen PCR Purification Kit.
Gel purify on 2% agarose gel loading 1 µg of eluted DNA in 40 µL of 10% glycerol. Run on gel at 135 V for 35 min. Gel purify bands of the length corresponding to a cut half site + full half site + adapter with filter paper. Remove filter paper and collect supernatant. Purify with Qiagen PCR Purification Kit.
[0255] 6 µL of TALN digested DNA (5-26-12)
[0256] 30 µL of 5x Buffer HF
[0257] 1.5 µL 100 µM Illumina fwd Primer
[0258] 1.5 µL 100 µM PE TALN rev2 Primer
[0259] 3 µL 10 mM dNTP
[0260] 1.5 µL Phusion Hot Start II
[0261] 106.5 µL of water
98 °C for 3 min, do 6 cycles of 98 °C for 15 s, 60 °C for 15 s, 72 °C for 1 min. Purify with Qiagen PCR Purification Kit.
[0262] Preparation of Pre-Selection Library
[0263] 25 µL of 10x NEB Buffer 4
[0264] 10 µL of 2 µM TempliPhi Library DNA
[0265] 165 µL water
[0266] 5 µL of Appropriate Restriction Enzyme [New England Biolabs]
[0267] 210 µL of water
Incubate 1 hr at 37 °C. Purify with Qiagen PCR Purification Kit.
[0268] 50 µl eluted DNA
[0269] 5.9 µl T4 DNA Ligase Buffer (NEB)
[0270] 2 µl (20 pmol) heat/cooled adapter (pool of 4 adapter sequences)
[0271] 1 µl T4 DNA ligase (NEB, 400 units)
Incubate at RT for 20 hrs. Purify with Qiagen PCR Purification Kit.
[0272] 6 µl of Restriction Enzyme Digested DNA (5-26-12)
[0273] 30 µL of 5x Buffer HF
[0274] 1.5 µL 100 µM Illumina rev Primer
[0275] 1.5 µL 100 µM TALNLibPCR Primer
[0276] 3 µL 10 mM dNTP
[0277] 1.5 µL Phusion Hot Start II
[0278] 106.5 µL of water
98 °C for 3 min, 12 cycles of 98 °C for 15 s, 60 °C for 15 s, 72 °C for 1 min. Purify with Qiagen PCR Purification Kit.
[0279] High-Throughput Sequencing
Quantify via RT-qPCR
[0280] 12.5 µL of IQ SYBR Green Supermix
[0281] 1 µL of 10 µM Illumina rev
[0282] 1 µL of 10 µM Illumina fwd
[0283] 9.5 µL of water
[0284] 1 µL of DNA template (both Pre-Selection Library and TALN Digestion)
95 °C for 5 min, do 30 cycles of 95 °C for 30 s, 65 °C for 30 s, 72 °C for 40 s.
Dilute DNA to 2 nM (compared to sequencing standard)
[0285] 5 µL of TALN Digestion 2 nM DNA
[0286] 2.5 µL of Pre-Selection Library 2 nM DNA
[0287] 10 µL of 0.1 N NaOH
Incubate at room temp for 5 min
Sequence via Illumina Mi-Seq
[0288] Computational Filtering
For TALN Digested sequences, find two appropriately spaced constant oligo sequences
For Pre-selection Library sequences, find appropriately spaced constant oligo sequence and library adapter sequence
Parse sequence into cut overhang, left half site, spacer, right half site
Remove sequences with poor Illumina base scores in half sites (


Primer sequences (continued)

Primer                        Sequence
PE TALCCR5Blib adapterrev4    GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCCGTAT (SEQ ID NO: 186)
TALCCR5BioPCR                 AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNCCTCGGGACTCCACGCT (SEQ ID NO: 187)
IlluminaFwd                   AATGATACGGCGACCAC (SEQ ID NO: 188)
IlluminaRev                   CAAGCAGAAGACGGCATACGA (SEQ ID NO: 189)

Conclusions
[0289] The relatively regular (log relationship) trend between number of half-site mutations and enrichment is consistent with a single TAL repeat binding a base pair independent of other repeat binding. A single mutation in the cleavage site does not significantly alter the distribution of other mutations in the compensation difference analysis, suggesting that the TAL repeat domains bind independently. The +28 linker is more specific than the +63 linker TALN constructs. While TALNs recognizing larger target sites are less specific in that they can tolerate more mutations, the abundance of the mutant larger sequences is less than the increase in enrichment, thus the in vitro selection data and abundance of off-target sites indicates off-target cleavage to be significantly less likely in longer TALN pairs. Combining the regular decrease of cleavage efficiency (enrichment) as total target site mutations increase and the enrichment at each position, it is possible to predict the off-target site cleavage of any sequence. For the most part, in the TALN selection the enrichment was dependent on the total mutations in both half sites and not on the distribution of mutations between half sites as was observed for zinc finger nucleases (ZFN). This observation combined with the context dependent binding of ZFNs indicated that TALENs may readily be engineered to a specificity as high or higher than their ZFN equivalents.
[0290] All publications, patents and sequence database entries mentioned herein, including those items listed above, are hereby incorporated by reference in their entirety as if each individual publication or patent was specifically and individually indicated to be incorporated by reference. In case of conflict, the present application, including any definitions herein, will control.

EQUIVALENTS AND SCOPE
[0291] Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. The scope of the present invention is not intended to be limited to the above description, but rather is as set forth in the appended claims.
[0292] In the claims articles such as "a," "an," and "the" may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include "or" between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention also includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process.
[0293] Furthermore, it is to be understood that the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, descriptive terms, etc., from one or more of the claims or from relevant portions of the description is introduced into another claim. For example, any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim. Furthermore, where the claims recite a composition, it is to be understood that methods of using the composition for any of the purposes disclosed herein are included, and methods of making the composition according to any of the methods of making disclosed herein or other methods known in the art are included, unless otherwise indicated or unless it would be evident to one of ordinary skill in the art that a contradiction or inconsistency would arise.
[0294] Where elements are presented as lists, e.g., in Markush group format, it is to be understood that each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It is also noted that the term "comprising" is intended to be open and permits the inclusion of additional elements or steps. It should be understood that, in general, where the invention, or aspects of the invention, is/are referred to as comprising particular elements, features, steps, etc., certain embodiments of the invention or aspects of the invention consist, or consist essentially of, such elements, features, steps, etc. For purposes of simplicity those embodiments have not been specifically set forth in haec verba herein. Thus for each embodiment of the invention that comprises one or more elements, features, steps, etc., the invention also provides embodiments that consist or consist essentially of those elements, features, steps, etc.
[0295] Where ranges are given, endpoints are included. Furthermore, it is to be understood that unless otherwise indicated or otherwise evident from the context and/or the understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise. It is also to be understood that unless otherwise indicated or otherwise evident from the context and/or the understanding of one of ordinary skill in the art, values expressed as ranges can assume any subrange within the given range, wherein the endpoints of the subrange are expressed to the same degree of accuracy as the tenth of the unit of the lower limit of the range.
[0296] In addition, it is to be understood that any particular embodiment of the present invention may be explicitly excluded from any one or more of the claims. Where ranges are given, any value within the range may explicitly be excluded from any one or more of the claims. Any embodiment, element, feature, application, or aspect of the compositions and/or methods of the invention, can be excluded from any one or more claims. For purposes of brevity, all of the embodiments in which one or more elements, features, purposes, or aspects is excluded are not set forth explicitly herein.

Tables
[0297]

                                 (+) half-site            (-) half-site   in vitro selection    K562
mutations                        (SEQ ID NOS:             (SEQ ID NOS:    stringency (nM)       modification
T  (+) (-)  gene                 190-226)        spacer   227-263)        4 / 2 / 1 / 0.5       frequency

0  0   0    CCR5 (coding)        GTCATCCTCATC    CTGAT    AAACTGCAAAAG    X X                   1:23
2  1   1    CCR2 (coding)        GTCgTCCTCATC    TTAAT    AAACTGCAAAAa    X X                   1:10
3  2   1    BTBD10 (promoter)    GTttTCCTCATC    AAAGC    AAACTGCAAAAt    X X                   1:1,400
4  0   4                         GTCATCCTCATC    AGAGA    AAACTGgctAAt    X X
4  3   1    SLC4A8               taaATCCTCATC    TCTATA   AAAaTGCAAAAG    X X
3  2   1    Z83955 RNA           GTCATCCcaATC    GAAGAA   AAACTGaAAAAG    X
3  1   2    DGKK                 cTCATCCTCATC    CATGC    AcAaTGCAAAAG    X
3  1   2    GALNT13              GTCATCCTCAgC    ATGGG    AAACaGCAgAAG    X
3  1   2                         GTCATCtTCATC    AAAAG    gAACTGCAAAAc    X                     1:2,800
4  0   4                         GTCATCCTCATC    CAATA    AAAgaaCAAAgG    X
4  1   3    TACR3                GTCATCtTCATC    AGCAT    AAACTGtAAAgt    X                     1:300
4  1   3    PIWIL2               GTCATCCTCATa    CATAA    AAACTGCcttAG    X
4  1   3                         aTCATCCTCATC    CATCC    AAtgTtCAAAAG    X
4  3   1                         GTCcTgCTCAgC    AAAAG    AAACTGaAAAAG    X                     1:4,000
4  3   1    KCNB2                aTgtTCCTCATC    TCCCG    AAACTGCAAAtG    X                     1:1,400
4  3   1                         GTCtTCCTgATg    CTACC    AAACTGgAAAAG    X                     1:5,300
4  3   1                         aaCATCCaCATC    ATGAA    AAACTGCAAAAa    X
6  3   3                         aTCtTCCTCATt    ACAGG    AAAaTGtAAtAG    X
6  4   2    CUBN                 GgCtTCCTgAcC    CACGG    AAACTGtAAAtG    X
6  5   1    NID1                 GTttTgCaCATt    TCAAT    tAACTGCAAAAG    X
3  2   1                         GTCAaCCTCAaC    ACCTAC   AgACTGCAAAAG    X                     1:1,700
4  1   3    WWOX                 GTCATCCTCcTC    CAACTC   cAAtTGCtAAAG    X
4  2   2    AMBRA1               GTctTCCTCTC     TGCACA   totACTGCAAAAG   X
4  2   2                         GTgATaCTCATC    ATCAGC   AAtCTGCAtAAG    X
4  2   2    WBSCR17              GTtATCCTCAgC    AAACTA   AAACTGgAAcAG    X                     1:860
4  2   2    ITSN                 cTCATgCTCATC    ATTTGT   tAACTGCAAAAt    X
4  4   0                         GcCAgtCTCAgC    ATGGTG   AAACTGCAAAAG    X
4  4   0                         cTCATtCTgtTC    ATGAAA   AAACTGCAAAAG    X
5  3   2                         GaagTCCTCATC    CCGAAG   AAACTGaAAgAG    X
5  3   2    ZNF462               GTCtTCCTCtTt    CACATA   AAACcGCAAAtG    X
5  4   1                         aTaATCCTttTC    TGTTTA   AAACaGCAAAAG    X                     n.d.
5  4   1                         GaCATCCaaATt    ACATGG   AAACTGaAAAAG    X                     n.d.
5  5   0    SDK1                 GTCtTgCTgtTg    CACCTC   AAACTGCAAAAG    X                     n.d.
4  1   3    SPTB (coding)        GTCATCCdCATC    GCCCTG   gAACTGgAAAAa    X                     n.d.
4  2   2                         aTCATCCTCAaC    AAACTA   AAACaGgAAAAG    X
4  4   0    KIAA1680             GgaATgCdCATC    ACCACA   AAACTGCAAAAG    X                     n.d.
5  5   0                         GTttTgCTCcTg    TACTTC   AAACTGCAAAAG    X                     n.d.
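By way of non-limiting illustration only, the T, (+), and (-) mutation counts listed for each site in the table above can be recomputed by comparing each half-site with the corresponding half-site of the target site (the CCR5 (coding) row). The following Python sketch is not part of the described methods: the names `count_mutations` and `site_mutations` are hypothetical, and the comparison is made case-insensitively so that it does not rely on the lower-case marking of mutated positions.

```python
from typing import Tuple

# Half-sites of the intended target, taken from the CCR5 (coding) row of the table.
TARGET_PLUS = "GTCATCCTCATC"
TARGET_MINUS = "AAACTGCAAAAG"


def count_mutations(half_site: str, target: str) -> int:
    """Count positions at which a half-site differs from the target half-site."""
    if len(half_site) != len(target):
        raise ValueError("half-site and target must have the same length")
    return sum(1 for s, t in zip(half_site.upper(), target.upper()) if s != t)


def site_mutations(plus: str, minus: str) -> Tuple[int, int, int]:
    """Return (T, (+), (-)) for one site, where T is the total mutation count."""
    n_plus = count_mutations(plus, TARGET_PLUS)
    n_minus = count_mutations(minus, TARGET_MINUS)
    return n_plus + n_minus, n_plus, n_minus


if __name__ == "__main__":
    # CCR2 (coding) row: one mutation in each half-site.
    print(site_mutations("GTCgTCCTCATC", "AAACTGCAAAAa"))  # (2, 1, 1)
```

Applied to the CCR2 (coding) row, the sketch returns the same T, (+), and (-) values as those listed in the table.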

Table 1. CCR5-224 Off-Target Sites in the Genome of Human K562 Cells.

[0298] Lower case letters indicate mutations compared to the target site. Sites marked with an X were found in the corresponding in vitro selection dataset. T refers to the total number of mutations in the site, and (+) and (-) to the number of mutations in the (+) and (-) half-sites, respectively. The sequences of the sites are listed as 5' (+) half-site/spacer/(-) half-site 3', therefore the (+) half-site is listed in the reverse sense as it is in the sequence profiles. K562 modification frequency is the frequency of observed sequences showing significant evidence of non-homologous end joining repair (see Methods) in cells expressing active ZFN compared to cells expressing empty vector. Sites that did not show statistically significant evidence of modifications are listed as not detected (n.d.), and K562 modification frequency is left blank for the three sites that were not analyzed due to non-specific PCR amplification from the genome. Table 4 shows the sequence counts and P-values for the tested sites used to determine K562 modification frequency, and Table 6 shows the modified sequences obtained for each site.

Table 2:

[0299] Sequencing statistics. The total number of interpretable sequences (“total sequences”) and the number of analyzed sequences for each in vitro selection condition are shown. Analyzed sequences are non-repeated sequences containing no ambiguous nucleotides that, for post-selection sequences, contained reverse complementary overhang sequences of at least four bases, a signature used in this study as a hallmark of ZFN-mediated cleavage. “Incompatible overhangs” refer to sequences that did not contain reverse complementary overhang sequences of at least four bases. The high abundance of repeated sequences in the 0.5 nM, 1 nM, and 2 nM selections indicates that the number of sequencing reads obtained in those selections, before repeat sequences were removed, was larger than the number of individual DNA sequences that survived all experimental selection steps.

                                                 Rejected Sequences
                         Total       Analyzed    Incompatible  Repeated    Uncalled Bases
                         Sequences   Sequences   Overhangs     Sequences   in Half-Sites

CCR5-224 Pre-Selection   1,426,442   1,392,576   0             33,660      206
CCR5-224 0.5 nM          649,348     52,552      209,442       387,299     55
CCR5-224 1 nM            488,798     55,618      89,672        343,442     66
CCR5-224 2 nM            1,184,523   303,462     170,700       710,212     149
CCR5-224 4 nM            1,339,631   815,634     352,888       170,700     159
Total                    5,088,742   2,619,842   822,702       1,645,563   635

VF2468 Pre-Selection     1,431,372   1,393,153   0             38,128      91
VF2468 0.5 nM            297,650     25,851      79,113        192,671     15
VF2468 1 nM              148,556     24,735      19,276        104,541     4
VF2468 2 nM              1,362,058   339,076     217,475       805,433     74
VF2468 4 nM              1,055,972   397,573     376,364       281,991     44
Total                    4,295,608   2,180,388   692,228       1,422,764   228
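By way of non-limiting illustration only, the sequencing statistics can be checked for internal consistency. For the rows used below, the total number of sequences equals the number of analyzed sequences plus the three categories of rejected sequences, which suggests, as an assumption not stated in the table itself, that each sequence is counted in exactly one category. The class name `SelectionStats` and the function name `unaccounted` are hypothetical; the values are those read from the table above.

```python
from typing import NamedTuple


class SelectionStats(NamedTuple):
    """One row of the sequencing statistics table."""
    total: int
    analyzed: int
    incompatible_overhangs: int
    repeated: int
    uncalled_bases: int


def unaccounted(row: SelectionStats) -> int:
    """Total sequences minus analyzed and rejected sequences (0 if the row balances)."""
    rejected = row.incompatible_overhangs + row.repeated + row.uncalled_bases
    return row.total - (row.analyzed + rejected)


ROWS = {
    "CCR5-224 Pre-Selection": SelectionStats(1_426_442, 1_392_576, 0, 33_660, 206),
    "CCR5-224 0.5 nM": SelectionStats(649_348, 52_552, 209_442, 387_299, 55),
    "VF2468 Pre-Selection": SelectionStats(1_431_372, 1_393_153, 0, 38_128, 91),
    "VF2468 2 nM": SelectionStats(1_362_058, 339_076, 217_475, 805_433, 74),
}

if __name__ == "__main__":
    for name, row in ROWS.items():
        print(name, unaccounted(row))
```

A result of zero for a row indicates that the analyzed and rejected categories together account for every interpretable sequence in that selection.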

a                    4 nM (wt EF = 5.48)      2 nM (wt EF = 8.11)      1 nM (wt EF = 16.6)      0.5 nM (wt EF = 24.9)
CCR5-224             1 mut   2 muts  3 muts   1 mut   2 muts  3 muts   1 mut   2 muts  3 muts   1 mut   2 muts  3 muts

EF > 0               100%    99.98%  76%      100%    99%     49%      100%    83%     14%      100%    75%     11%
EF > 1               100%    93%     55%      100%    84%     42%      100%    68%     14%      100%    58%     11%
EF > 2               100%    78%     37%      100%    70%     31%      99%     55%     14%      96%     46%     11%
EF > (.5 x wt EF)    100%    63%     28%      93%     40%     17%      51%     15%     8%       31%     8%      4%
EF ≥ wt EF           14%     9%      10%      8%      6%      6%       3%      2%      3%       6%      1%      2%

b                    4 nM (wt EF = 16.7)      2 nM (wt EF = 22.5)      1 nM (wt EF = 30.2)      0.5 nM (wt EF = 33.1)
VF2468               1 mut   2 muts  3 muts   1 mut   2 muts  3 muts   1 mut   2 muts  3 muts   1 mut   2 muts  3 muts

EF > 0               100%    95%     38%      100%    92%     26%      100%    47%     5%       100%    44%     4%
EF > 1               98%     49%     17%      93%     34%     11%      83%     24%     5%       80%     21%     4%
EF > 2               89%     31%     10%      83%     23%     7%       74%     17%     5%       61%     14%     4%
EF > (.5 x wt EF)    57%     15%     4%       30%     10%     2%       11%     6%      1%       9%      5%      1%
EF ≥ wt EF           7%      1%      1%       7%      1%      0.4%     7%      1%      0.4%     7%      1%      0.3%

Table 3:

[0300] Both ZFNs tested have the ability to cleave a large fraction of target sites with three or fewer mutations. The percentage of the set of sequences with 1, 2, or 3 mutations (muts) that can be cleaved by (a) the CCR5-224 ZFN and (b) the VF2468 ZFN is shown. Enrichment factors (EFs) were calculated for each sequence identified in the selection by dividing the observed frequency of that sequence in the post-selection sequenced library by the observed frequency of that sequence in the preselection library. The enrichment factors for the wild-type sequence (wt EFs) calculated for each in vitro selection stringency are shown in the first row of the table.
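By way of non-limiting illustration only, the enrichment factor calculation described above (post-selection frequency of a sequence divided by its preselection frequency) can be sketched in Python as follows. The function names `frequencies` and `enrichment_factors` and the toy libraries are hypothetical, and the handling of sequences that are absent from the preselection library (they are skipped here) is an assumption, since the description does not address that case.

```python
from collections import Counter
from typing import Dict, Iterable


def frequencies(sequences: Iterable[str]) -> Dict[str, float]:
    """Observed frequency of each distinct sequence in a sequenced library."""
    counts = Counter(sequences)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


def enrichment_factors(pre: Iterable[str], post: Iterable[str]) -> Dict[str, float]:
    """EF = post-selection frequency / preselection frequency, per sequence."""
    pre_freq = frequencies(pre)
    post_freq = frequencies(post)
    # Assumption: sequences never observed before selection are skipped,
    # because their preselection frequency would be zero.
    return {seq: f / pre_freq[seq] for seq, f in post_freq.items() if seq in pre_freq}


if __name__ == "__main__":
    # Toy libraries (not data from the selections described herein).
    pre_library = ["AAA", "AAA", "AAC", "AAC"]
    post_library = ["AAA", "AAA", "AAA", "AAC"]
    print(enrichment_factors(pre_library, post_library))
```

In this toy example the sequence that becomes more abundant after selection has an EF above 1, and the sequence that becomes less abundant has an EF below 1.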

              mutations
              T  (+) (-)  gene               build 36 coordinates      (+) half-site   spacer   (-) half-site

CCR5-224 1    0  0   0    CCR5 (coding)      chr3:48399548-46389578    GTCATCCTCATC    CTGAT    AAACTGCAAAAG
CCR5-224 2    2  1   1    CCR2 (coding)      chr3:46374209-46374237    GTCgTCCTCATC    TTAAT    AAACTGCAAAAa
CCR5-224 3    3  2   1    BTBD10 (promoter)  chr11:13441738-13441766   GTttTCCTCATC    AAAGC    AAACTGCAAAAt
CCR5-224 4    4  0   4                       chr10:29604352-29604380   GTCATCCTCATC    AGAGA    AAACTGgctAAt
CCR5-224 5    4  3   1    SLC4A8             chr12:50186653-50186682   taaATCCTCATC    TCTATA   AAAaTGCAAAAG
CCR5-224 6    3  2   1    Z83955 RNA         chr12:33484433-33484462   GTCATCCcaATC    GAAGAA   AAACTGaAAAAG
CCR5-224 7    3  1   2    DGKK               chrX:50149961-50149989    cTCATCCTCATC    CATGC    AcAaTGCAAAAG
CCR5-224 8    3  1   2    GALNT13            chr2:154567664-154567692  GTCATCCTCAgC    ATGGG    AAACaGCAgAAG
CCR5-224 9    3  1   2                       chr17:61624429-51624457   GTCATCtTCATC    AAAAG    gAACTGCAAAAc
CCR5-224 10   4  0   4                       chrX:145275453-145275481  GTCATCCTCATC    CAATA    AAAgaaCAAAgG
CCR5-224 11   4  1   3    TACR3              chr4:104775175-104775203  GTCATCtTCATC    AGCAT    AAACTGtAAAgt
CCR5-224 12   4  1   3    PIWIL2             chr8:22191670-22191698    GTCATCCTCATa    CATAA    AAACTGCcttAG
CCR5-224 13   4  1   3                       chr9:76194351-76194379    aTCATCCTCATC    CATCC    AAtgTtCAAAAG
CCR5-224 14   4  3   1                       chr8:52114315-52114343    GTCcTgCTCAgC    AAAAC    AAACTGaAAAAG
CCR5-224 15   4  3   1    KCNB2              chr8:73399370-73899398    aTgtTCCTCATC    TCCCG    AAACTGCAAAtG
CCR5-224 16   4  3   1                       chr8:4865886-4865914      GTCtTCCTgATg    CTACC    AAACTGgAAAAG
CCR5-224 17   4  3   1                       chr9:14931072-14931100    aaCATCCaCATC    ATGAA    AAACTGCAAAAa
CCR5-224 18   6  3   3                       chr13:65537258-65537286   aTCtTCCTCATt    ACAGG    AAAaTGtAAtAG