USOO9044492B2

(12) United States Patent (10) Patent No.: US 9,044,492 B2 Delacote et al. (45) Date of Patent: Jun. 2, 2015

(54) METHOD FORMODULATING THE 8,206,965 B2 6/2012 Arnould et al. EFFICIENCY OF DOUBLE-STRAND 8,211,685 B2 7/2012 Epinatet al. 8.426,177 B2 4/2013 Gouble BREAK-INDUCED MUTAGENESIS 8,476,072 B2 7/2013 Cabaniols et al. O O 8,530,214 B2 9/2013 Arnould et al. (75) Inventors: Fabien Delacote, Paris (FR); Philippe 2006, OO78552 A1 4/2006 Arnould et al. Duchateau, Livry Gargan (FR): 2006/0153826 A1 7/2006 Arnould et al. Christophe Perez-Michaut, Paris (FR) 2006/0206949 A1 9, 2006 Arnould et al. s 2009/0220476 A1 9/2009 Paques 2009/0222937 A1 9, 2009 Arnould et al. (73) Assignee. CELLECTISSA, Romainville (FR) 2009/0271881 A1 10, 2009 Arnould et al. - 2010/0086533 A1 4/2010 Montoya et al. (*) Notice: Subject to any disclaimer, the term of this 2010, 0146651 A1 6, 2010 Smith et al. patent is extended or adjusted under 35 2010, 0151556 Al 6, 2010 Arnould et al. U.S.C. 154(b) by 407 days. 2010.0167357 A1 7/2010 Fajardo Sanchez et al. 2010/02O3031 A1 8, 2010 Grizot et al. (21) Appl. No.: 13/367,098 2010/0229252 A1 9/2010 Perez-Michaut 9 (Continued) (22) Filed: Feb. 6, 2012 Primary Examiner — Terra C Gibbs (65) Prior Publication Data (74) Attorney, Agent, or Firm — Oblon, McClelland, US 2012/0244131 A1 Sep. 27, 2012 Maier & Neustadt, L.L.P. Related U.S. Application Data (57) ABSTRACT (60) Provisional application No. 61/439,739, filed on Feb. A method for modulating double-strand break-induced 4, 2011. mutagenesis at a genomic locus of interest in a cell, thereby giving new tools for genome engineering, including thera 51) Int. C peuticeut1C applicatiapol1Cat1OnS andd cellCell li1ne eng1neering.gi ing. A method fOr CI2O I/68 (2006.01) modulating double-strand break-induced mutagenesis, con A63/788 (2006.01) cerns the identification of effectors that modulate double CI2N IS/10 (2006.01) Strand break-induced mutagenesis by use of interfering (52) U.S. Cl. agents; these agents are capable of modulating double-strand CPC ...... A61K3I/7088 (2013.01); C12N 15/102 break-induced mutagenesis through their respective director (2013.01); C12N 15/1086 (2013.01) indirect actions on said effectors. Methods of using these (58) Field of Classification Search effectors, interfering agents and derivatives, respectively, by USPC ...... 435/6.1,911,325,375,536/24.5 introducing them into a cell in order to modulate and more See application file for complete search history. particularly to increase double-strand break-induced mutagenesis. Specific derivatives of identified effectors and (56)56 References Cited 1nterferinginterfering agagents, Vectors encodingding ththem, compositionspositi and kits comprising Such derivatives for modulating or increasing U.S. PATENT DOCUMENTS double-strand break-induced mutagenesis. 7,842,489 B2 11/2010 Arnould et al. 7,897,372 B2 3/2011 Duchateau et al. 9 Claims, 12 Drawing Sheets

------A :8: :x:g acroix:

a sets ;32&ssi:

US 9,044,492 B2 Page 2

(56) References Cited 2012/0272.348 A1 10, 2012 Danos et al. 2012/0288941 A1 11/2012 Arnould et al. U.S. PATENT DOCUMENTS 2012/0288942 A1 11/2012 Arnould et al. 2012/0288943 A1 11/2012 Arnould et al. 2011/0072527 A1 3f2011 Duchateau et al. 2012/030 1456 Al 1 1/2012 Tremblay et al. 2011/009 1441 A1 4/2011 Gouble et al. 2012/0304321 A1 11/2012 Arnould et al. 2011/O151539 A1 6/2011 Arnould et al. 2012/0317664 A1 12/2012 Arnould et al. 2011/0158974 A1 6, 2011 Duchateau et al. 2012/0322689 A1 12/2012 Epinatet al. 2011/0173710 A1 7, 2011 Grizot et al. 2012/0331574 A1 12/2012 Arnould et al. 2011/01795.06 A1 7, 2011 Grizot 2013,0059387 A1 3/2013 Smith et al. 2011/0179507 A1 7/2011 Paques 2013,0061341 A1 3/2013 Arnould et al. 2011 019 1870 A1 8/2011 Paques 2013, OO676O7 A1 3/2013 Arnould et al. 2011/02O7199 A1 8/2011 Paques et al. 2013,0203840 A1 8/2013 Arnould et al. 2011/0225664 A1 9, 2011 Smith 2013/0209437 A1 8/2013 Paques 2011/0239319 A1 9/2011 Danos et al. 2013/0227715 A1 8/2013 Danos et al. 2012. O159659 A1 6/2012 Arnould et al. 2013,0236946 A1 9/2013 Gouble 2012/0171191 A1 7, 2012 Choulika et al. 2013/0326644 Al 12/2013 Paques 2012fO258537 A1 10, 2012 Duchateau et al. 2014,0004608 A1 1/2014 Cabaniols et al. 2012fO26035.6 A1 10, 2012 Choulika et al. 2014, OO17731 A1 1/2014 Gouble et al. U.S. Patent Jun. 2, 2015 Sheet 1 of 12 US 9,044,492 B2

Figure

•or . r DNA ends & binding and protection

DNA exis

processing

------

Nik &is igation

K:::::::::::: 38: Kx: 88: 88k Ks xxxx-xx-x.: U.S. Patent Jun. 2, 2015 Sheet 2 of 12 US 9,044,492 B2

Figure 2

8 x 88883 : x -xxx x. $88: 8xxx xxxix. {y: $88.8.x:

: x 8.8 xxixe: ki: cocions Ars .xxxx x88 - {{

88: xxx xxxi-xxx xxx xx xxxx U.S. Patent Jun. 2, 2015 Sheet 3 of 12 US 9,044,492 B2

rigue 3

U.S. Patent US 9,044,492 B2

.

× U.S. Patent Jun. 2, 2015 Sheet 5 of 12 US 9,044,492 B2

s As {K

assssssssss...... : *::::: ::::: ::::: 888 U.S. Patent Jun. 2, 2015 Sheet 6 of 12 US 9,044,492 B2

Figure 6

{& Ex.888 x -8x: ix.88883: xxxx xik cycine serine stretch

x 8.8 xxix: e is exiers :

... to: : xix. text is 88. is Kreipes svopa pic or Amp 88.8.x is:x: xix {x :

U.S. Patent Jun. 2, 2015 Sheet 8 of 12 US 9,044,492 B2

Figure 8: 8888 kept {x :

ex orwar: pixer // x:

8:{{888 33 se 8. : i: 8:8 ::::::::::

a six K preverse primer : Kp. it proseter origits sve 8: x8x:

Figure 9 city forware prise civ promoter sco Races

tax

(8.3. : :::::::::: : xis:

U.S. Patent Jun. 2, 2015 Sheet 9 of 12 US 9,044,492 B2

Figure 10 Ampr

ORF EGFP pCLSOO99 6119 bp BGH polyA

N f1 OR

SV40 polyA o is 0. SV40 prom Neo R

Figure 11

e APA P(LAC)

APr pCLS0002G1. 2686 bp

OR U.S. Patent Jun. 2, 2015 Sheet 10 of 12 US 9,044,492 B2

U.S. Patent Jun. 2, 2015 Sheet 11 of 12 US 9,044,492 B2

Figure 13 8 pronotor 888 {xx xxxx {8 &8:::::::::::

::::::::::::

Exx 8x8x swept : Kg8. 8x8. is origs . 8x888: 888 88: 888

its grx 88:8: 8.8 x888

::::::::::::: 8:8: 388: 888 $8xx:8:

s: : N ... x: 8:8 88 grx 3 steps 88: 88: 8888 8 p. it origs U.S. Patent Jun. 2, 2015 Sheet 12 of 12 US 9,044,492 B2

Figure 15

& & & 3: 3: : & 3& &

Figure 16

888 (8 x x:

::::::::::

scies re-sex 8x8 $88.88: US 9,044,492 B2 1. 2 METHOD FOR MODULATING THE Osakabe et al. 2007). Typically, GT events occur in a fairly EFFICIENCY OF DOUBLE-STRAND small proportion of treated mammalian while GT efficiency is BREAK-INDUCED MUTAGENESIS extremely low in higher plant cells and range between 0.01 0.1% of the total number of random integration events CROSS-REFERENCE TO RELATED (Terada, Johzuka-Hisatomi et al. 2007). The low GT frequen APPLICATION cies reported in various organisms are thought to result from competition between HR and non homologous end joining This application claims priority under 35 U.S.C. S 119(e) to (NHEJ) for repair of dsDNA breaks (DSBs). As a conse U.S. Provisional Application No. U.S. 61/439,739, filed Feb. quence, the ends of a donor molecule are likely to be joined by 4, 2011, which is hereby incorporated by reference in its 10 NHEJ rather than participating in HR, thus reducing GT entirety. frequency. There is extensive data indicating that DSBs repair by NHEJ is error-prone. Often, DSBs are repaired by end FIELD OF THE INVENTION joining processes that generate insertions and/or deletions (Britt 1999). Thus, these NHEJ-based strategies might be The present invention relates to a method for modulating 15 more effective than HR-based strategies for targeted double-strand break-induced mutagenesis at a genomic locus mutagenesis into cells. Indeed, expression of I-Sce I, a rare of interest in a cell, thereby giving new tools for genome cutting restriction , has been shown to introduce engineering, including therapeutic applications and cell line mutations at I-Sce I cleavage sites in Arabidopsis and tobacco engineering. More specifically, the method of the present (Kirik, Salomon et al. 2000). Nevertheless, the use of restric invention for modulating double-strand break-induced tion is limited to rarely occurring natural recogni mutagenesis (DSB-induced mutagenesis), concerns the iden tion sites or to artificial target sites. To overcome this prob tification of effectors that modulate said DSB-induced lem, meganucleases with engineered specificity towards a mutagenesis by uses of interfering agents; these agents are chosen sequence have been developed. Meganucleases show capable of modulating DSB-induced mutagenesis through high specificity to their DNA target, these proteins being able their respective direct or indirect actions on said effectors. 25 to cleave a unique chromosomal sequence and therefore do The present invention also concerns the uses of these effec not affect global genome integrity. Natural meganucleases are tors, interfering agents and derivatives, respectively, by intro essentially represented by homing endonucleases, a wide ducing them into a cell in order to modulate and more par spread class of proteins found in , bacteria and ticularly to increase DSB-induced mutagenesis. The present archae (Chevalier and Stoddard 2001). Early studies of the invention also relates to specific derivatives of identified 30 I-Sce I and HO homing endonucleases have illustrated how effectors and interfering agents, vectors encoding them, com the cleavage activity of these proteins can be used to initiate positions and kits comprising such derivatives in order to HR events in living cells and have demonstrated the recombi modulate and more particularly to increase DSB-induced nogenie properties of chromosomal DSBs (Dujon, Colleaux mutagenesis. et al. 1986; Haber 1995). Since then, meganuclease-induced 35 HR has been Successfully used for genome engineering pur BACKGROUND OF THE INVENTION poses in bacteria (Posfai, Kolisnychenko et al. 1.999), mam malian cells (Sargent, Brenneman et al. 1997: Cohen-Tan Mutagenesis is induced by physical and chemical means noudji, Robine et al. 1998; Donoho, Jasin et al. 1998), mice provoking DNA damages when incorrectly repaired leading (Cbuble, Smith et al. 2006) and plants (Puchta, Dujon et al. to mutations. Several chemicals are known to cause DNA 40 1996; Siebert and Puchta 2002). Meganucleases have lesions and are routinely used. Radiomimetic agents work emerged as Scaffolds of choice for deriving genome engineer through free radical attack on the sugar moieties of DNA ing tools cutting a desired target sequence (Paques and (Povirk 1996). A second group of drugs inducing DNA dam Duchateau 2007). age includes inhibitors of topoisomerase I (Topol) and II Combinatorial assembly processes allowing to engineer (Topol I) (Teicher 2008) (Burden and N. 1998). Other classes 45 meganucleases with modified specificities has been described of chemicals bind covalently to the DNA and form bulky by Arnould etal. (Arnould, Chames et al. 2006; Smith, Grizot adducts that are repaired by the nucleotide excision repair et al. 2006; Arnould, Perez et al. 2007: Grizot, Smith et al. (NER) system (Nouspikel 2009). Chemicals inducing DNA 2009). Briefly, these processes rely on the identifications of damage have a diverse range of applications and are widely locally engineered variants with a Substrate specificity that used. However, although certain agents are more commonly 50 differs from the substrate specificity of the wild-type mega applied in studying a particular repair pathway (e.g. cross nuclease by only a few nucleotides. Another type of specific linking agents are favored for NER Studies), most drugs endonucleases is based on Zinc finger nuclease. ZFNs are simultaneously provoke a variety of lesions (Nagy and Sou chimeric proteins composed of a synthetic Zinc finger-based toglou 2009). The physical means to generate mutagenesis is DNA binding domain and a DNA cleavage domain. By modi through the exposure of cells to ionizing radiation of one of 55 fication of the Zinc finger DNA binding domain, ZFNs can be three classes—X-rays, gamma rays, or neutrons (Green and specifically designed to cleave virtually any long stretch of Roderick 1966). However, using these classical, strategies, dsDNA sequence (Kim, Chaetal. 1996; Cathomen and Joung the overall yield of induced mutations is quite low, and the 2008). An NHEJ-based targeted mutagenesis strategy was DNA damage leading to mutagenesis cannot be targeted to developed recently in several organisms by using synthetic precise genomic DNA sequence. 60 ZFNs to generate DSBs at specific genomic sites (Lloyd, The most widely used in vivo site-directed mutagenesis Plaisier et al. 2005; Beumer, Trautman et al. 2008; Doyon, strategy is targeting (GT) via homologous recombina McCammon et al. 2008; Meng, Noyes et al. 2008). Subse tion (HR). Efficient GT procedures have been available for quent repair of the DSBs by NHEJ frequently produces dele more than 20 years in yeast (Rothstein 1991) and mouse tions and/or insertions at the joining site. For examples, in (Capecchi 1989). Successful GT has also been achieved in 65 Zebrafish embryos, the injection of mRNA coding for engi Arabidopsis and rice plants (Hanin, Volrath et al. 2001; neered ZFN led to animals carrying the desired heritable Terada, Urawa et al. 2002; Endo, Osakabe et al. 2006; Endo, mutations (Doyon, McCammon et al. 2008). In plant, same US 9,044,492 B2 3 4 NHEJ-based targeted-mutagenesis has also been Successful related (such as DNA-dependent protein , ATR (Lloyd, Plaisier et al. 2005). Although these powerful tools and ATM) by the radiosensitizing agent, wortmannin. are available, there is still a need to further improved double In an attempt to define in molecular detail the mechanism Strand break-induced mutagenesis. of NHEJ, an in vitro system for end-joining was recently As mentioned above, two mechanisms for the repair of 5 developed (Baumarm and West 1998). The reactions exhib DSBs have been described, involving either homologous ited an apparent requirement for DNA-PKS, Ku70/80, recombination or non-homologous end-joining (NHEJ). XRCC4 and DNA ligase IV, consistent with the in vivo NHEJ consists of at least two genetically and biochemically requirements. Preliminary fractionation and complementa distinct process (Feldmann, Schmiemann et al. 2000). The tion assays, however, revealed that these factors were not major and best characterized "classic' end-joining pathway 10 Sufficient for efficient end-joining, and that other components of the reaction remained to be identified. (C-NHEJ) involves rejoining of what remains of the two DNA RNA interference is an endogenous gene silencing path ends through direct, relegation (Critchlow and Jackson 1998). way that responds to dsRNAS by Silencing homologous A scheme for this pathway is shown in FIG.1. NHEJ can be (Meister and Tuschl 2004). First described in Caenorhabditis divided in three major steps: detection and protection of DNA 15 elegans by Fire et al. the RNAi pathway functions in a broad ends, DNA end-processing and finally DNA ligation, Detec range of eukaryotic organisms (Hannon 2002). Silencing in tion and protection of DNA ends are mediated by DNA-PK these initial experiments was triggered by introduction of which is composed of Ku70 and Ku80 proteins that form an long dsRNA. The enzyme Dicer cleaves these long dsRNAs heterodimer () binding DNA ends and recruiting DNA-PK into short-interfering RNAs (siRNAs) of approximately catalytic subunit (DNA-PKcs). This interaction DNA-PKcs 21-23 nucleotides. One of the two siRNA strands is then Ku-DSB stimulates DNA-PKcs kinase activity, maintains the incorporated into an RNA-induced silencing complex broken ends in close proximity and prevents from extended (RISC). RISC compares these “guide RNAs” to RNAs in the degradation. Ku also recruits other components of C-NHEJ cell and efficiently cleaves target RNAS containing sequences repair process. Candidates for DNA end processing are Arte that are perfectly, or nearly perfectly complementary to the mis DNA mu () and lamda (W), polynucleotide 25 guide RNA. kinase (PNK) and Werner's syndrome (WRN) (for For many years it was unclear whether the RNAi pathway review (Mahaney, Meek et al., 2009)). The ligation process is was functional in cultured mammalian cells and in whole mediated by DNA ligase IV and its cofactors XRCC4 and mammals. However, Elbashir S. M. et al., 2001 (Elbashir, XLF/Cernmunos. Finally, other proteins or complex modulat Harborth et al. 2001), triggered RNAi in cultured mammalian ing NHEJ activity have been described such as BRCA1, 30 cells by transfecting them with 21 nucleotide synthetic RNA Rad50-Mre 11-Nbs (Williams, Williams et al. 2007: Shrivas duplexes that mimicked endogenous siRNAs. McCaffrey et tav, De Haro et al. 2008) complex, CtlP or FANCD2 (Bau, al. (McCaffrey, Meuse et al., 2002), also demonstrated that Man et al. 2006: Pace, Mosedale et al. 2010)). NHEJ is siRNAs and shRNAs could efficiently silence genes in adult thought to be effective at all times in the cell cycle ((Essers, mice. van Steeg et al. 2000): (Takata, Sasaki et al. 1998)). NHEJ 35 Introduction of chemically synthetized siRNAs can effec also plays an important role in DSB repair during V(D)J tively mediate post-transcriptional gene silencing in mamma recombination (Blunt, Finnie et al. 1995) (Taccioli, Rathbun lian cells without inducing interferon responses. Synthetic et al. 1993). siRNAS, targeted against a variety of genes have been Suc The second mechanism, referred as microhomology medi cessfully used in mammalian cells to prevent expression of ated end joining (MMEJ) or alternative NHEJ (A-NHEJ) or 40 target mRNA (Harborth, Elbashir et al. 2001). These discov back up NHEJ (B-NHEJ) is associated with significant 5'-3' eries of RNAi and siRNA-mediated gene silencing has led to resection of the end and uses microhomologies to anneal a spectrum of opportunities for functional genomics, target DNA allowing repair. Little is known about the components validation, and the development of siRNA-based therapeu of this machinery. DNA ligase3 with XRCC1 proteins are tics, making it a potentially powerful tool for therapeutics and candidate for the ligase activity (Audebert, Salles et al. 2004: 45 in vivo studies. Wang, Rosidi et al. 2005). PARP seems also to be an impor The authors of the present invention have developed a new tant factor of this mechanism (Audebert, Salles et al. 2004) approach to increase the efficiency of DSB-induced mutagen (Wang, Wu et al. 2006). esis. This new approach relates through, the identification of Theoretically, both classical and alternative NHEJ could new effectors that modulate said DSB-induced mutagenesis lead to mutagenesis, although A-NHEJ mechanism would 50 by uses of interfering agents in an in vivo assay. These agents represent the main pathway to favour when one wants to being capable of modulate DSB-induced mutagenesis increase DSB-induced mutagenesis. Several methods have through their respective direct or indirect actions on respec been described in order to modulate NHE.J. For example, US tive effectors, introduction of these interfering agents and/or 2004/029130A1 concerns a method of stimulating NHEJ of derivatives into a cell, respectively, will lead to a cell wherein DNA the method comprising performing NHEJ of DNA in 55 said DSB-induced mutagenesis is modulated. the presence of inositol hexakisphosphate (IP6) or other stimulatory inositol phosphate. The invention also provides BRIEF SUMMARY OF THE INVENTION screening assays for compounds which may modulate NHEJ and DNA-PK and related protein kinases and which may be The present invention relates to a method for modulating therapeutically useful. WO98/30902 relates to modulation of 60 DSB-induced mutagenesis at a genomic locus of interest in a the NHEJ system via regulation (using protein and/or natural cell, thereby giving new tools for genome engineering, or synthetic compounds) of the interactions of XRCC4 and including therapeutic applications and cell line engineering. DNA ligase IV, and XRCC4 and DNA-PK to effect cellular More specifically, in a first aspect, the present invention DNA repair activity. It also relates to screens for individuals concerns a method for identifying effectors that modulate predisposed to conditions in which XRCC4 and/or DNA 65 DSB-induced mutagenesis, thereby allowing the increase or ligase IV are deficient, Sarkaria et al. (Sarkaria, Tibbetts et al. decrease of DSB-induced mutagenesis in a cell. As described 1998) describes the inhibition of phosphoinositide 3-kinase elsewhere, this method allows screening of interfering agents US 9,044,492 B2 5 6 libraries covering an unlimited number of molecules. As a (pCLS6810, SEQID NO: 5) or I-Sce I (pCLS6663, SEQID non-limiting example, the method of the present invention NO: 6) meganucleases. The vectors can be targeted at RAG1 allows screening for interfering RNAs, which in turn allow endogenous locus to obtain an established cell line identifying the genes which they silence, through their FIG. 7: Extrachromosomal transfection assay in 293H cell capacities to stimulate or to inhibit DSB-induced mutagen line to validate induction of NHEJ repair events of the EGFP esis, based on at least one reporter system. reporter gene of the pCLS6810 (SEQID NO. 5) plasmid with In a second aspect, the present invention concerns a method the expression vector pCLS2690 (SEQID NO: 3) for the for modulating DSB-induced mutagenesis in a cell by using SC GS meganuclease in comparison with a control vector interfering agents. pCLS002 (SEQID NO: 41). In a third aspect, the present invention concerns specific 10 FIG. 8: Vector map of pCLS2690. interfering agents, their derivatives such as polynucleotide FIG.9: Vector map of pCLS2222. derivatives or other molecules as non-limiting examples. FIG. 10: Vector map of pCLS0099. In a fourth aspect, the present invention further encom FIG. 11: Vector map of pCLS0002. passes cells in which. DSB-induced mutagenesis is modu FIG. 12: Normalized Luciferase activity of the High lated. It refers, as non-limiting example, to an isolated cell, 15 throughput screening of the sRNA library. Hits stimulating or obtained and/or obtainable by the method according to the inhibiting the SC GS-induced Non Homologous End Joining present invention. repair activity are indicated by plain or hatched squares In a fifth aspect, the present invention also relates to com respectively. positions and kits comprising the interfering agents, poly FIG. 13: Vector map of pCLS1853 nucleotides derivatives, vectors and cells according to the FIG. 14: Vector map of pCLS8054 present invention. FIG. 15: Graph correlation between the percentage of In a sixth aspect, the present invention concerns the uses of GFP+ cells induced by the meganucleases SC GS and specific interfering agents, their derivatives such as poly Trex2 SC GS and the frequency of NHEJ mutagenesis ana nucleotide derivatives or other molecules as non-limiting lyzed by deep sequencing. Striated triangle: negative control examples, for modulating DSB-induced mutagenesis. 25 of transfection with pCLS0002 (SEQ ID NO: 41). Striated The above objects highlight certain aspects of the inven circle: cotransfection of SC GS (SEQID NO: 4) with siRNA tion. Additional objects, aspects and embodiments of the control AS. Dark circles: cotransfections of SC GS with invention are found in the following detailed description of siRNAs CAP1 (SEQID NO: 367), TALDO1 (SEQ ID NO: the invention. 111) and DUSP1 (SEQID NO: 106). Striated square: cotrans 30 fection of Trex2/SC GS (SEQ ID NO: 1049) with siRNA BRIEF DESCRIPTION OF THE FIGURES control AS. Dark squares: cotransfections of SC GS with siRNAs TALDO1 (SEQID NO: 111), DUSP1 (SEQID NO: In addition to the preceding features, the invention further 106) and PTPN22 (SEQID NO: 283). comprises other features which will emerge from the descrip FIG. 16 Vector map of pCLS9573 tion which follows, as well as to the appended drawings. A 35 more complete appreciation of the invention and many of the DETAILED DESCRIPTION OF THE INVENTION attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the follow Unless specifically defined herein below, all technical and ing Figures in conjunction with the detailed description Scientific terms used herein have the same meaning as com below. 40 monly understood by a skilled artisan in the fields of gene FIG. 1: Scheme of the “classic' end-joining pathway therapy, biochemistry, genetics, and molecular biology. (C-NHEJ). All methods and materials similar or equivalent to those FIG.2: Plasmid construction maps to quantify NHEJ repair described herein can be used in the practice or testing of the events by SC GS (pCLS6883: SEQ ID NO: 1) or I-SceI present invention, with Suitable methods and materials being (pCLS6884 SEQID NO: 2); these constructions can be tar 45 described herein. All publications, patent applications, pat geted to RAG1 locus. ents, and other references mentioned herein are incorporated FIG. 3: Z-score values of an extrachromosomal assay by reference in their entirety. In case of conflict, the present screening of siRNA targeting 696 genes coding for kinases. specification, including definitions, will control. Further, the FIG. 4: Validation, of a stable cellular model to quantify materials, methods, and examples are illustrative only and are NHEJ repair events induced by SC GS via a luciferase 50 not intended to be limiting, unless otherwise specified. reporter gene, after integration of pCLS6883 (SEQID NO: 1) The practice of the present invention will employ, unless at RAG1 locus of 293H cells; Panel 4A: Examples of four otherwise indicated, conventional techniques of cell, biology, siRNAs increasing NHEJ repair events induced by SC GS at cell culture, molecular biology, transgenic biology, microbi RAG1 locus of 293H cells after targeting WRN, MAPK3, ology, recombinant DNA, and immunology, which are within FANCD2 and LIG4 genes. Panel 4B: Examples of 10 siRNAs 55 the skill of the art. Such techniques are explained fully in the increasing NHEJ repair events induced by SC GS at RAG1 literature. See, for example, Current Protocols in Molecular locus of 293H cells (8 siRNAs identified with an extrachro Biology (Frederick M. AUSUBEL, 2000, Wiley and son Inc, mosomal assay targeting CAMK2G, SMG1, PRKCE, Library of Congress, USA); Molecular Cloning: A Labora CSNK1D, AK2, AKT2, MAPK12 and ElF2AK2 genes and tory Manual. Third Edition, (Sambrook et al., 2001, Cold two siRNAs targeting PRKDC gene). 60 Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press); FIG. 5: DeepSequencing experiment for monitoring of Oligonucleotide Synthesis (M. J. Gaited., 1984); Mullis et al. NHEJ repair events induced by SC-RAG meganuclease at U.S. Pat. No. 4,683, 195; Nucleic Acid Hybridization (B. D. endogenous RAG1 locus of 293H cells in the presence or not Harries & S.J. Higgins eds. 1984): Transcription And Trans of siRNAs targeting WRN, MAPK3, FANCD2, ATR, lation (B. D. Hames & S. J. Higgins eds. 1984); Culture Of BRCA1 and XRCC6 genes. 65 Animal Cells (R, I. Freshney, Alan R. Liss, Inc., 1987); FIG. 6: EGFP plasmid construction maps to monitor a Immobilized Cells And Enzymes (IRL Press, 1986); B. Per frequency of NHEJ repair events induced by SC GS bal, A Practical Guide To Molecular Cloning (1984); the US 9,044,492 B2 7 8 series, Methods. In ENZYMOLOGY (J, Abeison and M. (a) providing a cell expressing a reporter gene rendered Simon, eds.-in-chief, Academic Press, Inc., New York), spe inactive by a frameshift in its coding sequence, due to the cifically, Vols. 154 and 155 (Wu et al. eds.) and Vol. 185, introduction in said sequence of a DSB-creating agent “ Technology' (D. Goeddel, ed.); Gene target site; Transfer Vectors For Mammalian Cells (J. H. Miller and M. P. (b) providing an interfering agent; Calos eds., 1987, Cold Spring Harbor Laboratory): Immu (c) contacting said cell with: nochemical Methods In Cell And Molecular Biology (Mayer i. an interfering agent; and Walker, eds. Academic Press, London, 1987); Handbook ii. a delivery vector comprising a double-strand break OfExperimental Immunology, Volumes I-IV (D. M. Weir and creating agent, wherein said double-strand break cre 10 ating agent provokes a mutagenic double-strand C. C. Blackwell, eds., 1986); and Manipulating the Mouse break that can be repaired by NHEJ leading to a func Embryo, (Cold Spring Harbor Laboratory Press, Cold Spring tional restoration of said reporter gene; Harbor, N.Y., 1986). (d) detecting expression of the functional reporter gene in In a first aspect, the present invention concerns a method the cell obtained at the end of step (c); for identifying effectors that modulate double-strand break 15 (e) repeating steps (c) and (d) at least one time for each induced mutagenesis, thereby allowing the increase or interfering agent; decrease of double-strand break-induced mutagenesis in a (f) identifying effectors whose interfering agent increases cell. As described elsewhere, this method allows screening of or decreases the expression of the reporter gene detected interfering agents libraries covering an unlimited number of at Step (d) as compared to a negative control; and molecules. As a non-limiting example, the method of the (g) for the effectors identified at Step (f), repeating steps (a), present invention allows screening for interfering RNAS, (c), (d) and (f) with a cell line expressing a different which in turn allow identifying the genes which they silence, inactive reporter gene than the inactive reporter gene through their capacities to stimulate or to inhibit double previously used; Strand break-induced mutagenesis, based on at least one whereby the effectors identified at the end of step (f) are reporter system. 25 effectors that modulate double-stranded break-induced This first aspect of the method of the invention is based on mutagenesis in a cell. two successive screening steps. In a preferred embodiment, the present invention concerns The first screening step is a highly sensitive high-through a method for identifying effector genes that modulates endo put assay measuring double-strand break-induced mutagen nuclease-induced mutagenesis, thereby allowing the increase esis based on a compatible reporter gene, for example the 30 or decrease of double-strand break-induced mutagenesis in a cell. As elsewhere described, this method allows screening of luciferase gene. This method allows, in a few runs, to screen an interfering agents library, wherein in a non limitative several thousands of interfering agents for their capacities to example, this library is an interfering RNA library covering modulate double-strand break-induced mutagenesis (com an unlimited number of genes. The method of the present pared to negative, neutral or positive interfering agents taken 35 invention allows screening for interfering RNAs, which in as controls) by measuring the restoration of a functional turn allow identifying the genes which they silence, through reporter gene originally rendered inactive by a frameshift their capacities to stimulate or to inhibit endonuclease-in introduced via a double-strand break creating agent target duced mutagenesis, based on at least one reporter system. site. It is easily understandable that the target sequence for In this preferred embodiment, the method of the invention double-strand break-induced mutagenesis can be as a non 40 is based on two Successive screening steps. limiting example, any double-strand break-induced mutagen The first screening step is a highly sensitive high-through esis site. For this identification step, said interfering agents put assay measuring endonuclease-induced mutagenesis are co-transfected with a delivery vector containing said based on a compatible reporter gene, for example the reporter gene rendered inactive by a frameshift mutation luciferase gene. This method allows, in a few runs, to Screen inserted via a double-strand break-induced target site and a 45 several thousands of interfering RNAs for their capacities to delivery vector containing a double-strand break creating modulate the reparation of an endonuclease-induced agent, said double-strand break creating agent provokes a mutagenesis Substrate coupled to said reporter system, com mutagenic double-strand break that can be repaired by NHEJ pared to negative, neutral or positive interfering RNAS taken leading to the restoration of said reporter gene and to the as controls. Said endonuclease-induced mutagenesis Sub increase in said reporter signal. 50 strate is rendered inactive by a frameshift in its coding Interfering agents that modulate double-strand break-in sequence due to the introduction in said sequence of an endo duced mutagenesis can be divided in candidates that stimulate nuclease-specific target site, like an I-Sce or an engineered or inhibit said double-strand break-induced mutagenesis. meganuclease target site. It is easily understandable that the Effectors whose interfering agents increase or decrease the endonuclease-specific target site can be any endonuclease expression of reporter gene detected and thus double-strand 55 specific target site. For this identification step, said interfering break-induced mutagenesis can also be classified as effectors RNAs are co-transfected with a delivery vector containing stimulating or inhibiting double-strand break-induced said reporter gene rendered inactive by a frameshift mutation mutagenesis. due to the insertion of a double-strand break-induced target In the second screening step of this aspect of the invention, site and a delivery vector containing an endonuclease expres a similar system as in the first screening step is used, except 60 sion cassette; said endonuclease provokes a mutagenic for the reporter gene employed. In this second step, the double-strand break, that can be repaired by NHEJ leading to reporter gene is preferably selected to allow a qualitative the functional restoration of said reporter gene and to the and/or quantitative measurement of the modulation seen dur increase in said reporter gene-associated signal. ing the first screening step. Interfering RNAs that modulate endonuclease-induced The invention therefore relates to a method for identifying 65 mutagenesis can be divided in candidates that stimulate or effectors that modulate double-strand break-induced inhibit said endonuclease-induced mutagenesis. Genes from mutagenesis in a cell comprising the steps of which these interfering RNAs are derived can also be classi US 9,044,492 B2 10 fied as genes stimulating or inhibiting endonuclease-induced two different interfering RNAs for each gene of the eukary mutagenesis. Therefore, genes related to interfering RNAS otic transcriptome. Most preferably, it is constituted by that stimulate endonuclease-induced mutagenesis can be iRNAS capable of targeting human genes, although it may classified as genes whose products inhibit double-strand also be constituted by iRNAS capable of targeting genes form break-induced mutagenesis. Conversely, genes related to common animal models such as mice, rats or monkeys. In a interfering RNAs that, inhibit endonuclease-induced preferred embodiment, the interfering RNA library used in mutagenesis can be classified as genes whose products are the frame of the present invention, can be restricted to a part necessary or stimulate double-strand break-induced of an eukaryotic transcriptome. Said restricted interfering mutagenesis. RNA library can be focused and representative of certain In the second screening step of this aspect of the invention, 10 classes of genes, such as genes encoding for protein kinases a similar system as in the first screening step is used, except as a non-limiting example. for the reporter gene used. In this second step, the reporter At step (c), in addition to being transfected with the iRNA, gene is preferably selected to allow a qualitative and/or quan the eukaryotic cell is transfected with a second vector. titative measurement of the modulation seen during the first The second, Vector comprises an endonuclease expression screening step. Such as the gene encoding the Green Fluores 15 cassette (i.e. an endonuclease under the control of expression cent Protein (GFP) as non-limiting example. signals allowing its expression upon transfection into the The invention therefore relates to a method for identifying cell). Therefore, a functional copy of the reporter gene (and genes that modulate endonuclease-induced mutagenesis in a thus a detectable signal) can only be obtained upon endonu cell comprising the steps of clease-induced mutagenesis in the transfected eukaryotic (a) providing a cell expressing a reporter gene rendered cell. inactive by a frameshift in its coding sequence, due to the The second vector can for example consist of, or be derived introduction in said sequence of a target sequence for an from, the pCLS2690 vector of SEQ ID NO: 3. The second endonuclease; vector can also for example encode for I-Sce meganuclease (b) providing an interfering RNA comprised in an interfer (SEQID NO: 40). ing RNA library; 25 The endonuclease present in the second vector can for (c) transiently co-transfecting said cell with: example correspond to a a homing endonuclease such as i. said interfering RNA; I-SceI, I-CreI, I-Ceul, I-MsoI, and I-DmoI. It may be a wild ii. a delivery vector comprising an endonuclease expres type or a variant endonuclease. In a preferred embodiment, sion cassette wherein said endonuclease provokes a the endonuclease is an engineered meganuclease Such as, in a mutagenic double-strand break that can be repaired 30 non-limiting example, an engineered SC GS meganuclease by NHEJ leading to a functional restoration of said (SEQID NO: 4). reporter gene; The first and second vectors may further comprise selec (d) detecting the signal emitted by the reporter gene in the tion markers such as genes conferring resistance to an anti co-transfected cell obtained at the end of step (c); biotic in order to select cells co-transfected with both vectors. (e) repeating step (c) and (d) at least, one time for each 35 In a preferred embodiment, the reporter gene used at step interfering RNA of said interfering RNA library; (c) is a high throughput screening-compatible reporter gene (f) identifying genes whose silencing through RNA inter Such as e.g. the gene encoding luciferase (including variants ference increases or decreases the signal detected at Step of this gene Such as firefly or renilla luciferase genes) or other (d) as compared to a negative control; and reporter genes that allow measuring a defined parameter in a (g) optionally, for the genes identified at step (f), providing 40 large number of Samples (relying on the use of multiwell an interfering RNA capable of silencing said gene, and plates, typically with 96, 384 or 1536 wells) as quickly as repeating steps (a), (c), (d) and (f) with a cell line possible. Other reporter genes include in a nonlimitative way, expressing a different inactive reporter gene than the the beta-galactosidase and the phosphatase alkaline genes, inactive reporter gene previously used; which are well-known in the art. whereby the genes identified at the end of step (i) and/or (g) 45 In step (d), the signal emitted by the reporter gene in the are genes that modulate endonuclease-induced mutagenesis co-transfected cell is detected using assays well-known in the in a cell. art. The eukaryotic cell line used at Step (a) can be constructed Step (e) comprises repeating steps (c) and (d) at least one by stably transfecting a cell line with a vector (hereafter time for each interfering RNA of the interfering RNA library. referred to as the first vector) comprising an inactive reporter 50 For example, if the iRNA library comprises two different gene, i.e. a reporter gene rendered inactive by a frameshift interfering RNAs for each gene of the eukaryotic transcrip mutation in its coding sequence, said frameshift mutation tome, each gene of the transcriptome will be tested twice. being due to the introduction in said sequence of a target At step (f), genes whose silencing through RNA interfer sequence for an endonuclease. In other terms, such inactive ence increases or decreases, preferably significantly reporter gene is not capable of emitting any relevant detect 55 increases or decreases, the signal detected at step (d) as com able signal upon transfection into a cell. On the vector, the pared to a negative control are identified. In particular, the inactive reporter gene is placed under the control of expres signal detected at step (d) is compared with the signal sion signals allowing its expression. Thus, upon stable trans detected in the same conditions with at least one interfering fection of the cell line with the first vector, the cell line RNA taken as a negative control. The interfering RNA taken expresses the inactive reporter gene which is integrated in its 60 as a negative control corresponds to a iRNA known not to genome. hybridize and thus not to be involved in endonuclease-in This first vector can for example consist of, or be derived duced mutagenesis such as e.g. the “All Star (AS) iRNA from, the pCLS6883 vector of SEQ ID NO: 1, or of the (Qiagen #1027280). For example, if a two-fold increase of the pCLS6884 vector of SEQID NO: 2. signal detected upon transfection with an iRNA targeting a The interfering RNA library used in the frame of this 65 given gene, compared to the signal detected with a negative method is preferably representative of an entire eukaryotic control, said given gene is identified as a gene that modulates transcriptome. In addition, it preferably comprises at least endonuclease-induced mutagenesis in said cell. US 9,044,492 B2 11 12 In a preferred embodiment, the method of the present Indeed, interfering agents that modulate double-strand invention further comprises Supplementary steps of selection. break-induced mutagenesis through their respective effectors In other terms, the interfering RNAs identified at step (f) are can be used directly. For a given interfering agent, it is easily further selected through another Succession of steps (a), (c), understood that other interfering agents derived from said (d) and(t), wherein inactive reporter gene is different from the given interfering agent (equivalent interfering RNAS) can be one previously used. synthetized and used with the same objectives and results. In a most preferred embodiment, steps (a) to (f) the above Interfering agents or derivatives can be used to modulate methods are first carried out using a cell line expressing an double-strand break-induced mutagenesis in a cell by intro inactive luciferase reporter gene. This cell line can for ducing them with at least, one delivery vector containing at example correspond to a cell line obtained through stable 10 least one double-strand break creating agent expression cas transfection of a cell line with pCLS6883 vector of SEQ ID sette. It is easily understood that these interfering agents or NO: 1, or of the pCLS6884 vector of SEQ ID NO: 2 or derivatives can be introduced by all methods known in the art, plasmids derived from those. This cell line is then co-trans as part or not of a vector, unique or not, under the control of an fected with iRNAs and pCLS2690 vector of SEQID NO: 3, inducible promoter or not. Therefore, the effects of these Once genes whose silencing through. RNA interference 15 interfering agents or derivatives in the cell can be permanent increases or decreases the signal detected at step (d) as com or transitory. pared to a negative control are identified, steps (a), (c), (d) and Therefore, the second aspect of the invention pertains to a (f) may then be repeated with iRNAs silencing these genes. method for modulating double-strand break-induced The cell line used at the second selection round may for mutagenesis in a cell, comprising the steps of: example express an inactive GFP reporter gene (due to a (a) identifying an effector that is capable of modulating frameshift mutation after insertion of an endonuclease target double-strand break-induced mutagenesis in a cell by a site), and may e.g. be obtained through stable transfection of method according to the first aspect of the invention; and a cell line with the pCLS inactive GFP-encoding vector (b) introducing into a cell: (pCLS6810 of SEQID NO: 5 or pCLS6663 of SEQID NO: i. at least one interfering agent capable of modulating 6. The pCLS2690 vector of SEQ ID NO: 3 and the pCLS 25 said effector; inactive GFP-encoding vector of SEQID NO: 5 can then be ii. at least one delivery vector comprising at least one used for co-transfection with iERNAs. This second screening double-strand break creating agent; allows confirming that the genes identified at Step (f) are thereby obtaining a cell in which double-strand break-in genes that modulate endonuclease-induced mutagenesis in a duced mutagenesis is modulated. cell. 30 Therefore, in the second aspect of the invention is com In the second screening, the reporter gene used can be a prised a method for increasing double-strand break-induced gene that when active, confers resistance to an antibiotic such mutagenesis in a cell, comprising the steps of: as the neomycin phosphotransferase resistant gene mptl, the (a)identifying a gene that is capable of stimulating double hygromycin phosphotransferase resistant gene hph, the puro strand break-induced mutagenesis in a cell by a method mycin N-acetyltransferase gene pac, the blasticidin S deami 35 according to the first aspect of the invention or providing nase resistant gene bSr and the bleomycin resistant, gene sh a gene selected from the group of genes listed in table I ble, as non-limiting examples. or II; and In this second screening, the reporter gene is preferably a (b) Introducing into a eukaryotic cell: geneallowing an accurate detection of the signal and a precise i.at least one interfering agent, wherein said interfering qualitative and/or quantitative measurement of the endonu 40 agent is a polynucleotide silencing or encoding said clease-induced mutagenesis modulation, such as e.g. the gene, wherein said polynucleotide is an interfering genes encoding the Green Fluorescent Protein (GFP), the Red RNA capable of silencing said gene if the signal Fluorescent Protein (RFP), the Yellow Fluorescent Protein detected at Step (d) of the method according to claim (YFP) and the Cyano Fluorescent Protein (CFP), respec 1 is increased as compared to the negative control, and tively. The reporter gene of the second screening can also be 45 is a cDNA transcribed from said gene if the signal any protein antigen that can be detected using a specific detected at Step (d) of the method according to claim antibody conjugated to a fluorescence-emitting probe or 1 is decreased as compared to the negative control; tagged by Such a fluorescent probe usable in Fluorescent ii. at least one delivery vector comprising at least one Activated Cell Sorting (FACS). For example cell surface double-strand break creating agent; expressing molecule like CD4 can be used as an expression 50 thereby obtaining a eukaryotic cell in which double-strand reporter molecule detectable with a specific anti-CD4 anti break-induced mutagenesis is increased. body conjugated to a fluorescent protein. FACS technology In another embodiment, is a method for increasing double and derivated applications to measure expression of reporter Strand break-induced mutagenesis in a cell comprising the genes are well known in the art. steps of introducing into said cell: As shown in Examples 1 to 4, the above method according 55 i. at least one interfering agent, wherein said interfering to the invention was successfully applied to identify several agent is a polynucleotide silencing at least one gene genes that modulate endonuclease-induced mutagenesis in a selected from the group of genes listed in tables I, II, IV cell. and VII; In a second aspect, the present invention concerns a method ii. at least one delivery vector comprising at least one for modulating double-strand break-induced mutagenesis in a 60 double-strand break creating agent; cell by using interfering agents. The information obtained thereby obtaining a eukaryotic cell in which double-strand when carrying out the above method for identifying effectors break-induced mutagenesis is increased. that modulate double-strand break-induced mutagenesis in a More preferably, the interfering RNA used according to the cell can be used to increase or decrease mutagenesis in cells. present invention for increasing double-strand break-induced Depending on the envisioned application, interfering agents 65 mutagenesis in a cell targets a sequence selected from the that increase or interfering agents that decrease double-strand group consisting of SEQID NO: 13-35, SEQID NO:37-39, break-induced mutagenesis in a cell can be used. SEQID NO: 44-76 and SEQ ID NO: 80-555. More prefer US 9,044,492 B2 13 14 ably, the interfering RNA targets used according to the the method for identifying genes that modulate endonu present invention for increasing double-strand break-induced clease-induced homologous recombination, or a different mutagenesis in a cell targets a sequence selected from the endonuclease. This endonuclease can correspond to any of group consisting of SEQID NO: 106, 15, 16, 20, 33, 45,80, the endonucleases described in the “Definitions' paragraph 83, 85, 89, 96, 97,98, 102, 103,104,108, 109, 110, 111, 113, below. It may for example be a homing endonuclease Such as 114, 115, 118, 121, 122, 126, 127, 128, 135, 137, 138, 139, I-SceI, I-CreI, I-Ceul, I-MsoI, and I-DmoI. It may be a wild 140, 141, 143, 146, 149, 151,153, 162, 163, 167, 168, 174, type or a variant endonuclease. In a preferred embodiment, 175, 177, 178, 180, 181, 184, 185, 186, 187, 188, 189, 193, the endonuclease is a variant I-CreI endonuclease. 195, 196, 198, 201, 203, 204, 215, 221, 222, 223, 225, 226, By increasing double-strand break-induced mutagenesis is 227, 228, 229, 232, 233, 235, 236, 237, 238,239, 243, 244, 10 understood the increase of its efficiency, ie any statistically 247, 249, 250, 251, 252,254, 256, 257, 258, 265, 267, 268, significant increase of double-strand break-induced 269,271, 277,278, 282,283, 285, 299, 308,309, 315, 328, mutagenesis in a cell when compared to an appropriate con 331, 335, 338, 340,353, 367, 368, 385, 399, 416. trol, including for example, at least 5%, 10%, 20%, 30%, In the above methods, “at least one interfering agent” 40%, 50%, 60%, 70%, 80%, 90%, 100%, 200%, 300%, 500% means that only one interfering agent but also more than one 15 or greater increase in the efficiency of a double-strand break interfering agent, can be used. In a preferred embodiment, 2 induced mutagenesis event for a polynucleotide of interest interfering RNAs can be used at the same time in the above (i.e. a transgene). methods; in a most preferred embodiment, 3, 4, 5, 6, 7, 8, 9 or In a preferred embodiment according to the invention, the 10 interfering RNAs can be used at the same time; in another gene that modulates endonuclease-induced mutagenesis is a most preferred embodiment, more than 10 interfering RNAs gene that decreases endonuclease-induced mutagenesis effi can be used. When several interfering RNAs are used in the ciency. In Such a case, an interfering RNA capable of silenc above methods, they can be mixed or not, i.e. introduced into ing said gene, introduced into said considered cell, is able to the cell at the same moment or not. In another embodiment, increase endonuclease-induced mutagenesis. The interfering more than one interfering agent means 2 different interfering RNA may for example be a siRNA, a miRNA or a shRNA. agents as described in the “Definitions' paragraph below; as 25 In an extrachromosomal assay transiently expressing the non-limiting example, one interfering RNA targeting one vectors of the above method of the invention in a eukaryotic gene can be used at the same time than one cDNA overex cell, the inventors have found that the genes listed in table I pressing another gene. As another non-limiting example of herebelow are capable of decreasing endonuclease-induced using different kinds of interfering agents (as described in the mutagenesis, particularly GS engineered meganuclease-in “Definitions' paragraph below), at least one interfering RNA 30 duced mutagenesis (see Example 2). Indeed, siRNAS respec can be used at the same time than at least one Small com tively targeting those genes (sequences listed in Table I) are pound. able to stimulate GS engineered meganuclease-induced In the above methods, the endonuclease encoded by the mutagenesis. Therefore, a gene that is capable of modulating vector comprising at least one endonuclease expression cas endonuclease-induced mutagenesis in a eukaryotic cell can sette may either be the same endonuclease as the one used in be selected from the group of genes listed in Table I below. TABL E I siRNA hits stimulatind GS SC-induced luciferase Sicinal. SEO Mean Gene Gene siRNA D Z. Stimulation targeted ID target sequence NO: Score Sto factor

CSNK1D 1453 CCGGTCTAGGATCGAAATGTT 13 3.14 O. 75 3.4 O

AK2 2O4 CGGCAGAACCCGAGTATCCTA 14 5.51 O2O 5. O8

AKT2 2O8CAAGCGTGGTGAATACATCAA 15 3.. 65 0.23 2.94

CAMK2G 81.8GAGGAAGAGATCTATACCCTA 16 5 O1. O. 23 3. 66

GK2 2712TACGTTAGAAGAGCACTGTAA. 17 3.33 1. 49 2.75

PFKFB4 521. OCAGAAAGTGTCTGGACTTGTA 18 3.92 114 2.18

MAPK12 63 OOCTGGACGTATTCACTCCTGAT 19 384 O. O6 3.22

PRKCE 5581 CCCGACCATGGTAGTGTTCAA 20 4 OO O. 43 2.91

EIF2AK2 561, OCGGAAAGACTTACGTTATTAA 21 4.5O O. 22 3.15

WEE1 74 65CAGGGTAGATTAC CTCGGATA 22 3.2O O. O.8 5. O1

CDK5R1 8851 CCGGAAGGLCACGCTGTTTGA, 23 4. O1 O.26 6. O3

LIG4 3981 CACCGTTTATTTGGACTCGTA. 24. 4.11 O. 41 6.15

AKAP1 8165AGCGCTGAACTTGATTGGGAA 25 4.97 O.32 7.24

MAP3K6 9064TCAGAGGAGCTGAGTAATGAA. 26 5.99 O.22 5. 41

DYRK3 84.44TCGACAGTACGTGGCCCTAAA 27 3.54. O.22 3 : 61 US 9,044,492 B2 15 16 TABLE I- Continued siRNA hits stimulatind GS SC-induced luciferase signal. SEO Mean Gene Gene siRNA ID Z Stimulation targeted ID target sequence NO: Score Sto factor

RPS6KA4 8986 CGCCACCTTCATGGCATTCAA 28 3.56. Of3 ... 61

STK17A 92.63 CACACTCGTGATGTAGTTCAT 29 3.26 O. 43 Of

GNE 10O2 OCCCGATCATGTTTGGCATTAA 3 O 331 O.25 .2O

ERN2 10595 CTGGTTCGGCGGGAAGTTCAA 31, 3.47 147 3O

HUNK 3O811 CACGGGCAAAGTGCCCTGTAA 32 363 13 O 97

SMG1 23 O49 CACCATGGTATTACAGGTTCA 33 3.22 O. 46 OS

WNK4 65266CAGCTTGTTGGGCGTTTCCAA 3 4 5 - 58 Of O 15

MAGI2 98.63 CAGGCCCAACTTGGGATATCA 35 3. Of O. 63 Of

More preferably, the interfering RNA targets used in the SiRNAs targeting XRCC6 (SEQID NO:44), BRCA1 (SEQ frame of the method according to the present invention target ID NO:45), FANCD2 (SEQID NO:39), WRN (SEQID NO: a sequence selected from the group consisting of SEQID NO: 25 37) and MAPK3 (SEQID NO:38) were able to enhance the 13-35, SEQID NO:37-39, SEQID NO: 44-76 and SEQID percentage of mutagenic NHEJ repair as measured by Deep NO: 80-1041. More preferably, the interfering RNA targets Sequencing analysis at the endogenous RAG1 locus (see used in the frame of the method according to the present Table II below) invention target a sequence selected from the group consist ing of SEQID NO: 13-35, SEQID NO:37-39, SEQID NO: 30 TABL E II 44-76 and SEQ ID NO: 80-555. More preferably, the inter siRNA stimulating endonuclease-induced fering RNA targets used in the frame of the method according mutagenesis at RAG1 locus. to the present invention target a sequence selected from the group consisting of SEQID NO: 106, 15, 16, 20, 33, 45,80, SEQ NHEJ 35 Gene Gene siRNA ID Stimulation 83, 85, 89, 96, 97,98, 102, 103,104,108, 109, 110, 111, 113, targeted ID target sequence NO : factor 114, 115, 118, 121, 122, 126, 127, 128, 135, 137, 138, 139, 140, 141, 143, 146, 149, 151,153, 162, 163, 167, 168, 174, XRCC6 2547 ACCGAGGGCGATGAAGAAGCA 44 1.6 175, 177, 178, 180, 181, 184, 185, 186, 187, 188, 189, 193, BRCA1 672 ACCATACAGCTTCATAAATAA 45 2.1 195, 196, 198, 201, 203, 204, 215, 221, 222, 223, 225, 226, 40 227, 228, 229, 232, 233, 235, 236, 237, 238,239, 243, 244, FANCD2 2177 AAGCAGCTCTCTAGCACCGTA 39 2.5 247, 249, 250, 251, 252,254, 256, 257, 258, 265, 267, 268, WRN 7486 CGGATTGTATACGTAACTCCA 37 2.4 269,271, 277,278, 282,283, 285, 299, 308,309, 315, 328, 331, 335, 338, 340,353, 367, 368, 385, 399, 416. MAPK3 5595 CCCGTCTAATATATAAATATA 38 1.9 As shown in example 3, the above method according to the 45 invention was successfully applied to stimulate endonu As also shown in example 3, the screen of a siRNA collec clease-induced mutagenesis in a cellular model stably tion from Qiagen led to the identification of 481 siRNA hits expressing at an endogenous locus (RAG1) the construction that stimulate SC-GS-induced mutagenesis as listed in table that allows to measure GS engineered meganuclease-induced IV (SEQ ID NO: 80-555) and to the identification of 486 mutagenesis. Indeed, siRNAS targeting genes involved in 50 siRNA hits that inhibit SC-GS-induced mutagenesis as listed NHEJ (LIG4: SEQID NO: 24) or in NHEJ and other DNA in tableV (SEQID NO: 556-1041). Interfering RNA capable repair pathway (WRN; SEQ ID NO: 37) or in DNA repair of silencing a given gene can easily be obtained by the skilled (FANCD2, SEQ ID NO: 39) or in DNA repair regulation in the art. Such iERNAs may for example be purchased from a (MAPK3, SEQID NO: 38) were able to increase GS engi provider. Alternatively, commercially available tools allow neered meganuclease-induced luciferase signal. Moreover, 8 55 siRNAs identified with the extrachromosomal assay of designing iRNAS targeting a given gene. example 2, targeting CAMK2G (SEQ ID NO: 16). SMG1 Useful interfering RNAs can be designed with a number of (SEQID NO:33), PRKCE (SEQIDNO:20), CSNK1D (SEQ Software program, e.g., the OligoEngine siRNA design tool ID NO: 13), AK2 (SEQID NO: 14), AKT2 (SEQID NO:15), available at the oligoengine.com world wide website. Data MAPK12 (SEQID NO: 19) and EIF2AK2 (SEQID NO: 21) 60 base RNAi Codex (available at the codex.cshl.edu website) genes and also two siRNAs targeting PRKDC gene publishes available RNAi resources, and provides the most (PRKDC 5, SEQID NO: 75 and PRKDC 8, SEQID NO: complete access to this growing resource. 76) involved in DNA repair regulation were able to increase The iRNAs used in the frame of the present invention can GS engineered meganuclease-induced luciferase signal. As for example be a shRNA. shRNAs can be produced using a shown in example 4, the above method according to the 65 wide variety of well-known RNAi techniques. ShRNAs that invention was successfully applied to stimulate endonu are synthetically produced as well as miRNA that are found, clease-induced mutagenesis at an endogenous locus (RAG1). in nature can for example be redesigned to function as Syn US 9,044,492 B2 17 18 thetic silencing shRNAs. DNA vectors that express perfect RNA of the duplex comprising for example between 17 and complementary shRNAS are commonly used to generate 29 nucleotides, e.g. 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27. functional siRNAs. 28 or 29 nucleotides. iRNAs can be produced by chemical synthesis (e.g. in the Such siRNAs can beformed from two RNA molecules that case of siRNAs) or can be produced by recombinant tech hybridize together or can alternatively be generated from a nologies through an expression vector (e.g. in the case of single RNA molecule that includes a self-hybridizing portion, shRNAs). referred to as shRNAs. The duplex portion of a siRNA can The iRNAs according to the invention may optionally be include one or more impaired and/or mismatched nucleotides chemically modified. in one or both Strand of the duplex (bulges) or can contain one In another preferred embodiment according to the inven 10 or more noncomplementary nucleotides pairs. Duplex of a tion, the gene that modulates endonuclease-induced siRNA is composed of a sense Strand and of an antisense mutagenesis is a gene that increases endonuclease-induced strand. Given a target transcript, only one strand of the siRNA mutagenesis (i.e. the presence of which increases double duplex is Supposed to hybridize with one strand of said target Strand break-induced mutagenesis in a cell). In Such a case, a 15 transcript, in certain embodiments, one strand (either sense, cDNA leading to increased expression of said gene is intro either antisense) is perfectly complementary with a region of duced into said cell. the target transcript, either on the entire length of the consid cDNA usually refers to a double-stranded DNA that is ered siRNA strand (comprised between 17 and 29 nucle derived from mRNA which can be obtained from prokaryotes otides, including 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28. or eukaryotes by reverse transcription. cDNA is a more con and 29 nucleotides), either on only a part of the considered venient way to work with the coding sequence than mRNA siRNA strand, 17 to 29 or 19 to 29 nucleotides matching for because RNA is very easily degraded by omnipresent example, or 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 from 29 RNases. Methods and advantages to work with cDNA are nucleotides. In one embodiment it is intended that the con well known in the art (1989, Molecular cloning: a laboratory sidered strand of the siRNA duplex (either sense, either anti manual. 2" edition and further ones, Cold Spring Harbor 25 sense) hybridizes the target transcript without a single mis Laboratory Press, Cold Spring Harbor, N.Y.). Particularly in match over that length. In another embodiment, one or more the context of the present invention the availability of a cDNA mismatches between the considered strand of the siRNA clone allows the corresponding protein to be expressed in a duplex (either sense, either antisense) can exist. variety of contexts. The cDNA can be inserted into a variety Therefore, an aspect of the invention is drawn to an inter of expression vectors for different purposes. Perhaps the most 30 fering RNA for increasing endonuclease-induced mutagen esis in a cell, wherein said interfering RNA comprises a sense obvious use of such an approach in the present invention is to RNA nucleic acid and an antisense RNA nucleic acid, and drive the expression of a defined protein involved, in a protein wherein said interfering RNA down-regulates the expression transduction cascade to levels that allow higher frequency of (most preferably silences the expression) of gene transcripts endonuclease-induced mutagenesis and So, mutagenesis 35 part of library representative of an entire transcriptome. In a events. As well-known in the art, one can express not only the preferred embodiment, the interfering RNA library used in wild type protein but also mutant proteins, said particular the frame of the present invention can be representative of mutations having consequences in structure-function rela only a part of an eukaryotic transcriptome. Said restricted tionships within a protein itself (improved catalytic activity) interfering RNA library can be representative of certain or for association with another endogenous protein. 40 classes of transcripts, such as those encoding for kinases as a As used herein, the term “cDNA encompasses both full non-limiting example. In a preferred embodiment, said inter length cDNAs naturally transcribed from the gene and bio fering RNA library can be obtained from a provider; as a non logically active fragments thereof. Such as e.g. cDNAS encod limiting-example, said interfering RNA library can be a ing the mature protein encoded by the gene or biologically library purchased from Qiagen and covering 19121 genes active fragments thereof. The biologically active fragments 45 with two different siRNAs per gene. In a preferred embodi thereof can for example code for maturation products of the ment of this aspect of the invention, interfering RNA targets a protein encoded by the gene. gene selected from this library. In a most preferred embodi In a third aspect, the present invention concerns specific ment of this aspect of the invention, interfering RNA targets a interfering agents, their derivatives such as polynucleotides gene selected from the group of genes listed in Table I, II, IV derivatives or other molecules as non-limiting examples. In 50 and Table VII. More preferably, the interfering RNA accord this aspect, the present invention concerns specific interfering ing to the invention targets a sequence selected from the group agents for modulating double-(strand break-induced consisting of SEQID NO: 13-35, SEQID NO:37-39, SEQID mutagenesis in a cell, wherein said interfering agents modu NO: 44-76 and SEQ ID NO: 80-1041. More preferably, the late effectors representative of an entire eukaryotic transcrip interfering RNA targets used in the frame of the method tome. In a preferred embodiment, said interfering agents 55 according to the present invention target a sequence selected modulate effectors which are part of a restricted library rep from the group consisting of SEQIDNO: 13-35, SEQID NO: resentative of certain classes of effectors. In a most preferred 37-39, SEQ ID NO: 44-76 and SEQ ID NO: 80-555. More embodiment, said interfering agents modulate effectors from preferably, the interfering RNA targets used in the frame of the group listed in Table I and Table II. In a preferred embodi the method according to the present invention target a ment of this third aspect, the present invention concerns spe 60 sequence selected from the group consisting of SEQID NO: cific polynucleotide derivatives identified for effector genes, 106, 15, 16, 20, 33, 45,80, 83, 85, 89, 96, 97,98, 102, 103, which increase endonuclease-induced mutagenesis. 104, 108, 109, 110, 111, 113, 114, 115, 118, 121, 122, 126, In a preferred embodiment of this aspect of the invention, 127, 128, 135, 137, 138, 139, 140, 141, 143, 146, 149, 151, these polynucleotide derivatives are interfering RNAs, more 153, 162, 163, 167, 168, 174, 175, 177, 178, 180, 181, 184, preferably siRNAs or shRNAs. 65 185, 186, 187, 188, 189, 193, 195, 196, 198, 201, 203, 204, As indicated in the definitions hereabove, the siRNAs 215, 221, 222, 223, 225, 226, 227, 228, 229, 232, 233, 235, according to the invention are double-stranded RNAs, each 236, 237,238,239, 243, 244, 247, 249, 250, 251, 252, 254, US 9,044,492 B2 19 20 256, 257, 258, 265, 267, 268,269,271, 277,278, 282,283, fragment of at least 17 consecutive nucleotides complemen 285, 299, 308,309, 315, 328,331, 335, 338, 340, 353, 367, tary of the respective mRNA sequences of the genes listed in 368, 385, 399, 416. Tables I, II, IV, V and VII may for example include 17, 18, 19. In other terms, one strand of this iRNA (either sense, either 20, 21, 22, 23, 24, 25, 26, 27, 28, and 29 consecutive nucle antisense) comprises a sequence hybridizing to a sequence 5 otides complementary of this sequence. selected from the group consisting of SEQ ID NO: 13-35, The iRNAs down-regulating the expression of a given gene SEQ ID NO: 37-39, SEQ ID NO: 44-76 and SEQ ID NO: listed in Tables I and II may correspond to a different 80-1041, more preferably from the group consisting of SEQ sequence targeting the same given genes listed in Table III ID NO: 13-35, SEQ ID NO:37-39, SEQID NO: 44-76 and below. SEQID NO: 80-555, again more preferably from the group 10 consisting of SEQID NO: 106, 15, 16, 20,33, 45,80, 83, 85, TABLE III 89, 96, 97,98, 102, 103, 104, 108, 109, 110, 111, 113, 114, 115, 118, 121, 122, 126, 127, 128, 135, 137, 138, 139, 140, Other siRNAs target sequences for 141, 143, 146, 149, 151, 153, 162, 163, 167, 168, 174, 175, targeted denies of Tables I and II 177, 178, 180, 181, 184, 185, 186, 187, 188, 189, 193, 195, 15 SEQ 196, 198, 201, 203, 204, 215, 221, 222, 223, 225, 226, 227, Gene Gene siRNA ID 228, 229, 232, 233, 235, 236, 237,238, 239, 243, 244, 247, targeted ID target sequence NO: 249, 250, 251, 252, 254, 256, 257, 258, 265,267, 268, 269, CSNK1D 1453 CTCCCTGACGATTCCACTGTA 47 271, 277,278, 282, 283, 285,299, 308, 309, 315, 328,331, 335, 338, 340,353, 367, 368, 385, 399, 416 with or without 20 AK2 2O4. CTGCAAGCCTACCACACTCAA 48 mismatch. Preferably, there is no mismatch, meaning that one strand of this iRNA (either sense, either antisense) comprises AKT2 2O8 ACGGGCTAAAGTGACCATGAA 49 or consists of the RNA sequence corresponding to a DNA CAMK2G 818 CCGATGAGAAACCTCGTGTTA SO sequence selected from the group consisting of SEQID NO: 13-35, SEQID NO:37-39, SEQID NO: 44-76 and SEQID 25 GK2 2712 CTCGGGTGTGCCATAATAATA 51 NO: 80-1041. In the iRNAs according to the invention, the sense RNA PFKFB4 5210 ACGGAGAGCGACCATCTTTAA 52 nucleic acid may for example have a length comprised MAPK12 63 OO TGGAAGCGTGTTACTTACAAA 53 between 19 and 29. In the frame of the present invention, the interfering RNA 30 PRKCE 5581 CACGGAAACACCCGTACCTTA 54 according to the invention may further comprising a hairpin EIF2AK2 561O TACATAGGCCTTATCAATAGA 55 sequence, wherein the sense RNA nucleic acid and the anti sense RNA nucleic acid are covalently linked by the hairpin WEE1 7.465 ACAATTACGAATAGAATTGAA 56 sequence to produce a shRNA molecule. In a preferred embodiment according to the invention, the 35 CDK5R1 885.1 TGAGCTGGTTTGACTCATTAA st interfering RNA according to the invention as defined here LIG4 3981 ATCTGGTAAGCTCGCATCTAA 58 above down-regulates the expression (most preferably silences the expression) of the genes listed in Table I, Table II, AKAP1 81.65 CACGCAGAGATGACAGTACAA 59 Table IV. Table V and Table VII. Indeed, as respectively MAP3K6 9064 CACCATCCAAATGCTGTTGAA 60 shown in examples 2 and 4, introducing Such an iRNA 40 selected from the group consisting of SEQ ID NO: 13-35, DYRK3 84.44 AGCCAATAAGCTTAAAGCTAA 61 SEQ ID NO: 37-39, SEQ ID NO: 44-46 and SEQ ID NO: 75-76 in a cell leads to approximately a 2 to 7 fold increase of RPS6KA4 8986 CAGGCTGTGCCTTTGACTTTA 62 the endonuclease-induced mutagenesis signal of an extrach STK17A 92.63 TCCATTGTAACCGAAGAGTTA 63 romosomal reporter assay in this cell and to a 1.6 to 2.5 45 increase of the endonuclease-induced mutagenesis events at GNE 1 OO2O ATGGAAATACATATCGAATGA 64 an endogenous locus of this cell. Other results and fold ERN2 10595 AAGGATGAAACTGGCTTCTAT 65 increase are shown in example 3 and 5 for iRNA listed in Tables IV, V and VII. HUNK 3O811 TCGGACCAAGATCAAACCAAA. 66 In a preferred embodiment, these iRNA down-regulating 50 the expression of their respective targeted genes comprise a SMG1 23O49 ATCGATGTTGCCAGACTACTA 67 sense RNA nucleic acid consisting of a sequence at least 70%, WNK4 65266 CAGGAGGAGCCAGCACCATTA. 68 75%, 80%, 85%, 90%, 95%, 99% or 100% identical to a fragment of at least 17 consecutive nucleotides of the respec MAGI2 98.63 ATGGACCGATGGGAGAATCAA 69 tive mRNA sequences of the genes listed in Tables I, II, IV, V 55 XRCC6 2547 TTTGTACTATATACTGTTAAA 70 and VII. These fragments of at least 17 consecutive nucle otides of the respective mRNA sequences of the genes listed BRCA1 672 AACCTATCGGAAGAAGGCAAG 71. in Tables I and II may for example include 17, 18, 19, 20, 21, FANCD2 2177 CAGAGTTTGCTTCACTCTCTA 72 22, 23, 24, 25, 26, 27, 28, and 29 consecutive nucleotides of the respective mRNA sequences of the genes listed in Tables 60 WRN 7486 TCCGCTGTAGCAATTGGAGTA 73 I, II, IV, V and VII. The antisense RNA nucleic acid of such an iRNA above MAPK3 5595 TGGACCGGATGTTAACCTTTA 74 from the mRNA sequence of a given gene listed in Tables I, II, IV and V may as a non-limiting example consist of a sequence The iRNA clown-regulating the expression of the MAPK3 at least 70%, 75%, 80%, 85%, 90%, 95%, 99% or 100% 65 gene (Gene ID 5595) may for example target a sequence identical to a fragment complementary to at least 17 consecu consisting of SEQID NO:38 and/or SEQID NO: 74. In other tive nucleotides of the considered mRNA sequence. This terms, one strand of this iRNA (either sense, either antisense) US 9,044,492 B2 21 22 comprises a sequence hybridizing to a sequence selected sisting of an inducible promoter, a tissue specific promoter from the group consisting of SEQID NO:38 and/or SEQID and a RNA polymerase III promoter. NO: 74, with or without mismatch. Preferably, there is no In a fourth main aspect of the present invention, is encom mismatch, meaning that one strand of this iRNA (either sense, passed cells in which double-strand break-induced mutagen either antisense) comprises or consists of the RNA sequence esis is modulated. It refers, as non-limiting example, to an corresponding to a DNA sequence selected from the group isolated cell, obtained and/or obtainable by the method consisting of SEQ ID NO:38 and/or SEQ ID NO: 74. In a according to the present invention. Cells in which double preferred embodiment, two interfering RNAs can be used at Strand break-induced mutagenesis is increased are useful for the same time in the methods of the present invention; in a genome engineering, including therapeutic applications and preferred embodiment, these iRNAs are siRNAs; in a most 10 cell line engineering. preferred embodiment, combinations of two siRNAs used at The invention therefore relates to an isolated cell obtained the same time in the methods of the present invention encom and/or obtainable by the methods according to the invention pass siRNAs targeting CAMK2G (SEQID NO: 16), SMG1 as defined in the above paragraphs. As shown in example 3, a (SEQID NO:33), PRKCE (SEQID NO:20), FANCD2 (SEQ cellular model has been established which stably expresses at ID NO:39) and LIG4 (SEQ ID NO: 24) genes. In another 15 an endogenous locus (RAG1) the construction that allows to most preferred embodiment, the combinations of two siRNAs measure GS engineered meganuclease-induced mutagenesis. that are used in the present invention are selected from the Moreover, in this cell line, different siRNAs were shown to group consisting of CAMK2G+SMG1, CAMK2G+PRKCE, increase GS engineered meganuclease-induced mutagenesis CAMK2G+FANCD2, CAMK2G+LIG4, SMG1+PRKCE, via a reporter signal. According to this fourth aspect of the SMG1+FANCD2, SMG1+LIG4, PRKCE+FANCD2, invention, a cell in which endonuclease-induced mutagenesis PRKCE+LIG4, FANCD2+LIG4. In another preferred is increased can be directly or indirectly be derived from this embodiment, several combinations of two siRNAs, i.e. 2, 3, 4, cellular model. 5, 6, 7, 8, 9 or 10 combinations of two interfering RNAs can The invention further relates to a cell, wherein said cell is be used at the same time; in another most preferred embodi stably transformed with at least one interfering RNA, viral ment, more than 10 combinations of two interfering RNAs 25 vector, isolated DNA polynucleotide or plasmidic vector as can be used. When several combinations of two interfering described in the previous paragraphs. RNAs are used in the above methods, they can be mixed or The eukaryotic cell can be any type of cell Such as e.g. a not, i.e. introduced into the cell at the same moment or not. In CHO cell (for example a CHO-K1 or a CHO-S cell), a another embodiment, one combination of two interfering HEK293 cell, a Caco2 cell, an U2-OS cell, a NIH 3T3 cell, a RNA can be used at the same time than one cDNA overex 30 NSO cell, a SP2 cell, and a DG44 cell. pressing another gene. In another embodiment, one combi In a preferred embodiment, the cell is a cell suitable for nation of two interfering RNA can be used at the same time production of recombinant proteins. than at least one Small compound. Said cell is preferably an immortalized and/or a trans The invention further pertains to viral vector for producing formed cell, although primary cells are contemplated by the the interfering RNA according to the invention, wherein said 35 present invention, in particular in the frame of gene therapy. viral vector comprises a polynucleotide sequence encoding In a fifth main aspect, the present invention also relates to the sense RNA nucleic acid of said interfering RNA and a compositions and kits comprising the interfering agents, polynucleotide sequence encoding the antisense RNA polynucleotides derivatives, Vectors and cells according to nucleic acid of said interfering RNA. the present invention. In Such vectors, the polynucleotide sequence encoding the 40 The invention further pertains to compositions and kits sense RNA nucleic acid may under the control of a first comprising the iRNAs, DNA polynucleotides, cDNAs, vec promoter, and the polynucleotide sequence encoding the anti tors and cells according to the invention described hereabove. sense RNA nucleic acid may be under the control of a second In this aspect of the invention, the present invention con promoter. These promoters may for example be selected from cerns a composition for modulating double-strand break-in the group consisting of an inducible promoter, a tissue spe 45 duced mutagenesis in a cell, wherein said composition com cific promoter and a RNA polymerase III promoter. prises at least an interfering agent that modulate an effector Alternatively, when the sense and the antisense nucleic from a group of effectors representative of an entire eukary acids are covalently linked by a hairpin sequence to produce otic transcriptome. In a preferred embodiment, said interfer a shRNA molecule, they are under the control of a single ing agent modulates an effector which is part of a restricted promoter. 50 library representative of certain classes of effectors. In a most Another aspect of the invention is drawn to an isolated preferred embodiment, said interfering agent modulates an DNA polynucleotide coding for the interfering RNA accord effector from the group listed in Table I, Table II, Table IV. ing to the invention, wherein said DNA polynucleotide com Table V and Table VII. prises a polynucleotide sequence encoding the sense RNA In a preferred embodiment of this aspect of the invention, nucleic acid of said interfering RNA and a polynucleotide 55 the invention pertains to a composition for increasing sequence encoding the antisense RNA nucleic acid of said mutagenesis and/or endonuclease-induced mutagenesis in a interfering RNA. In such a DNA polynucleotide, the sense cell comprising at least one interfering RNA, viral vector, and the antisense nucleic acids may be covalently linked by a isolated DNA polynucleotide or plasmidic vector as defined hairpin sequence to produce a shRNA molecule upon tran in the above paragraphs, and/or an isolated cell as defined in Scription. 60 the above paragraphs. Still another aspect of the invention relates to a plasmidic The composition preferably further comprises a carrier. vector comprising the DNA polynucleotide according to the The carrier can for example be a buffer, such as e.g. a buffer invention. allowing storage of the iRNAs, DNA polynucleotides, vec Such a plasmidic vector preferably comprises a promoter, tors and cells according to the invention, or a pharmaceuti wherein the polynucleotide sequence encoding the sense 65 cally acceptable carrier. RNA nucleic acid is under control of said promoter. Said In another aspect of the invention, the present invention promoter may for example be selected from the group con concerns a kit for modulating double-strand break-induced US 9,044,492 B2 23 24 mutagenesis in a cell, wherein said composition comprises at A preferred embodiment of the invention is drawn to an least an interfering agent that modulate an effector from a interfering agent or an interfering RNA according to the group of effectors representative of an entire eukaryotic tran invention for use as an adjuvant in the treatment of a genetic Scriptome. In a preferred embodiment, said interfering agent disease by gene therapy. For purposes of therapy, an interfer modulate an effector which are part of a restricted library 5 ing agent or an interfering RNA according to the invention representative of certain classes of effectors. In a most pre can be administered with a DSB-creating agent with a phar ferred embodiment, said interfering agent modulate an effec maceutically acceptable excipient in a therapeutically effec tor from the group listed in Table I, II, IV, V and VII. tive amount. Such a combination is said to be administered in In a preferred embodiment of this aspect of the invention, a “therapeutically effective amount' if the amount adminis the invention also pertains to a kit for increasing mutagenesis 10 tered, is physiologically significant. An agent is physiologi and/or endonuclease-induced mutagenesis in a cell, wherein cally significant if its presence results in a detectable change said kit comprises at least one interfering RNA, viral vector, in the physiology of the recipient. In the present context, an isolated DNA polynucleotide or plasmidic vector as defined agent, is physiologically significant if its presence results in a in the above paragraphs, and/or an isolated eukaryotic cell as 15 decrease in the severity of one or more symptoms of the defined in the above paragraphs. targeted disease and in a genome correction of the lesion or The kit may further comprise instructions for use in abnormality. (See Current Protocols in Human Genetics: increasing mutagenesis efficiency and/or for use in increasing Chapter 12 “Vectors For Gene Therapy & Chapter 13 endonuclease-induced mutagenesis. “Delivery Systems for Gene Therapy'). In other words, the In a sixth main aspect, the present invention concerns the term adjuvant refers to a compound administered in addition uses of specific interfering agents for modulating double to the active principle aiming at treating the patient, said Strand break-induced mutagenesis in a cell, wherein said adjuvant increasing the efficiency of the treatment. In a pre interfering agent modulates an effector from a group of effec ferred embodiment, said interfering agent according to the tors representative of an entire eukaryotic transcriptome. In a invention can be administered at the same time than a DSB preferred embodiment, said interfering agent modulates an 25 creating agent. In another preferred embodiment, said inter effector which is part of a restricted library representative of fering agent according to the invention can be administered certain classes of effectors. In a most preferred embodiment, before a DSB-creating agent in another preferred embodi said interfering agent modulates an effector from the group ment, said, interfering agent, according to the invention can listed in Tables I, II, IV, V and VII. In a preferred embodiment be administered after a DSB-creating agent. of this sixth aspect, the present invention concerns the uses of 30 Gene therapy is a technique for the treatment of genetic specific polynucleotide derivatives identified for effector disorders in man whereby the absent or faulty gene is replaced genes, which increase double-strandbreak-induced mutagen by a working gene, so that the body can make the correct esis efficiency. enzyme or protein and consequently eliminate the root cause Indeed, the polynucleotides derivatives according to the of the disease. invention, which include the iRNAs, DNA polynucleotides, 35 In the present case, the interfering agent such as but non cDNAs and vectors described hereabove, can be used to limited to an interfering RNA modulates the endonuclease increase mutagenesis in a cell and/or to increase double induced mutagenesis to increase the efficiency of the treat Strand break-induced mutagenesis in a cell. ment by gene therapy. Therefore, an aspect, of the invention is directed to an in Examples of genetic disorders that can be treated by gene vitro or ex vivo use of at least one interfering agent, Such as 40 therapy include but are not limited to the Lesch-Nyhan syn but non-limited to interfering RNA, DNA polynucleotide, drome, retinoblastoma, thalassaemia, the sickle cell disease, viral vector or plasmidic vector as defined in the above para adenosine deaminase-deficiency, severe combined immune graphs for increasing mutagenesis in a cell and/or endonu deficiency (SCID), Huntington's disease, adrenoleukodys clease-induced mutagenesis in a cell, tissue or organ. trophy, the Angelman syndrome, the Canavan disease, the Modulating double-strand break-induced mutagenesis is 45 Celiac disease, the Charcot-Marie-Tooth disease, colorblind also useful in animal models, for which it is often desired to ness, Cystic fibrosis, the Down syndrome, Duchenne muscu construct knock-in or knock-out animals, as a non limiting lar dystrophy, Haemophilia, the Klinefelter's syndrome, example. Neurofibromatosis, Phenylketonuria, the Prader-Willi syn Therefore, the invention relates to the use of specific inter drome, the Sickle-cell disease, the Tay-Sachs disease and the fering agents for modulating double-strand break-induced 50 Turner syndrome. mutagenesis in a non-human model, wherein said interfering As non-limiting example, an interfering agent according to agent modulates an effector from a group of effectors repre the invention can be used to modulate the endonuclease sentative of an entire eukaryotic transcriptome. In a preferred induced mutagenesis in the treatment of a genetic disorder embodiment, said interfering agent modulates an effector where a dominant non functional allele is targeted by at least which is part of a restricted library representative of certain 55 one DSB-creating agent to knock Such dominant non func classes of effectors. In a most preferred embodiment, said tional allele; in this case, an interfering agent according to the interfering agent modulate an effector from the group listed in invention is used to increase the endonuclease-induced Tables I, II, IV, V and VII. The invention also relates to the use mutagenesis in the treatment of a genetic disorder. of an interfering RNA according to the invention for increas As another non-limiting example, an interfering agent ing mutagenesis efficiency and/or endonuclease-induced 60 according to the invention can be used to modulate the endo mutagenesis in a non-human animal model. The animal mod nuclease-induced mutagenesis in the treatment of a genetic els thus obtained are also part of the invention. disorder where an absent or faulty gene is targeted by at least It is further desirable to modulate double-strand break one DSB-creating agent and replaced by a working gene via induced mutagenesis or endonuclease-induced mutagenesis gene targeting for example; in this case, an interfering agent in the frame of treatments of subjects by therapy. 65 according to the invention is used to decrease the endonu Therefore, the invention further pertains to an interfering clease-induced mutagenesis in the treatment of a genetic dis agent according to the invention for use as a medicament. order. US 9,044,492 B2 25 26 An interfering agent according to the present invention ognizing a specific target sequence, thereby targeting the may also be used in cancer therapy. A way to improve cancer cleavage activity to a specific sequence. Chemical endonu cells killing can be to increase their mutagenesis rate using an cleases also encompass synthetic nucleases like conjugates of interfering agent according to the invention either in associa orthophenanthroline, a DNA cleaving molecule, and triplex tion with radiotherapy, as a non-limiting example, either by forming oligonucleotides (TFOs), known to bind specific increasing endonuclease-induced mutagenesis according to DNA sequences (Kalish and Glazer 2005). Such chemical the invention. As known in the art, radiotherapy is also called endonucleases are comprised in the term “endonuclease' radiation therapy. This approach allows the treatment of can according to the present invention. Rare-cutting endonu cers and other diseases with ionizing radiation that injures or cleases can also be for example TALENs, a new class of destroy cancer cells in the area being treated by damaging 10 chimeric nucleases using a FokI catalytic domain and a DNA their genetic material. The approach according to the present binding domain derived from Transcription Activator Like inventionallows to improve Such radio therapeutic treatments Effector (TALE), a family of proteins used in the infection by increasing the mutagenesis rate in the cells of the treated process by plant pathogens of the Xanthomonas genus (Boch, area, either by adding in the treated cells an interfering agent Scholze et al. 2009; Moscou and Bogdanove 2009; Christian, according to the invention and/or targeting a gene with a 15 Cermak et al., 2010; Li, Huang et al. 2010). specific endonuclease, thereby obtaining cancer cells with Rare-cutting endonuclease can be a homing endonuclease, increased rate of mutagenesis and increased rate of mortality. also known under the name of meganuclease. Such homing In a parallel approach, an interfering agent according to the endonucleases are well-known to the art (Stoddard 2005). present invention may also be used to improve cancer treat Homing endonucleases recognize a DNA target sequence and ment by chemiotherapy. generate a single- or double-strand break. Homing endonu Definitions cleases are highly specific, recognizing DNA target sites The terms “effector” and “effectors' refer to any cellular ranging from 12 to 45 base pairs (bp) in length, usually target, from nucleic or protein origin that can be targeted to ranging from 14 to 40 bp in length. The homing endonuclease directly or indirectly modulate double-strand break-induced according to the invention may for example correspond to a mutagenesis; it encompasses any molecule that binds to 25 LAGLIDADG endonuclease, to a HNHendonuclease, or to a nucleic acid to modulate gene transcription or protein trans GIY-YIG endonuclease. lation, any molecule that bind to another protein to alter or In the wild, meganucleases are essentially represented by modify at least one property of that protein, such as its activ homing endonucleases. Homing Endonucleases (HES) are a ity, or any gene or gene products that could play a role directly widespread family of natural meganucleases including hun or indirectly in the process of double-strand break-induced 30 dreds of proteins families (Chevalier and Stoddard 2001). mutagenesis. These proteins are encoded by mobile genetic elements The term “interfering agent” or “interfering agents” refer which propagate by a process called "homing’: the endonu to any molecule and compound, likely to interact with effec clease cleaves a cognate allele from which the mobile element tors. It encompasses Small chemicals, Small molecules, or is absent, thereby stimulating a homologous recombination Small compounds, composite chemicals or molecules, from 35 event that duplicates the mobile DNA into the recipient locus. synthetic or natural origin, encompassing amino acids or Given their exceptional cleavage properties in terms of effi nucleic acid derivatives, synthons, Active Pharmaceutical cacy and specificity, they could represent ideal scaffolds to Ingredients, any chemical of industrial interest, used in the derive novel, highly specific endonucleases. HES belong to manufacturing of drugs, industrial chemicals or agricultural four major families. The LAGLIDADG family, named after a products. These interfering agents are part or not of molecular 40 conserved peptidic motif involved in the catalytic center, is libraries dedicated to particular screening, commercially the most widespread and the best characterized group. Seven available or not. These interfering agents encompass poly structures are now available. Whereas most proteins from this nucleotides derivatives as a non limiting example. family are monomeric and display two LAGLIDADG motifs, The term “endonuclease' refers to any wild-type or variant a few have only one motif, and thus dimerize to cleave pal enzyme capable of catalyzing the hydrolysis (cleavage) of 45 indromic or pseudo-palindromic target sequences. bonds between nucleic acids within of a DNA or RNA mol Although the LAGLIDADG peptide is the only conserved ecule, preferably a DNA molecule. Endonucleases do not region among members of the family, these proteins share a cleave the DNA or RNA molecule irrespective of its very similar architecture. The catalytic core is flanked by two sequence, but recognize and cleave the DNA or RNA mol DNA-binding domains with a perfect two-fold symmetry for ecule at specific polynucleotide sequences, further referred to 50 homodimers such as I-CreI (Chevalier, Monnat et al. 2001), as “target sequences' or “target sites’. Endonucleases can be I-MsoI (Chevalier, Turmel et al. 2003) and I-CreI (Spiegel, classified as rare-cutting endonucleases when having typi Chevalier et al., 2006) and with a pseudo symmetry for mono cally a polynucleotide recognition site of about 12-45 base mers such as I-Sce (Moure, Gimble et al. 2003), I-DmoI pairs (bp) in length, more preferably of 14-45 bp. Rare-cut (Silva, Dalgaard et al. 1999) or I-Anil (Bolduc, Spiegel et al. ting endonucleases significantly increase HR by inducing 55 2003). Both monomers and both domains (for monomeric DNA double-strand breaks (DSBs) at a defined locus (Rouet, proteins) contribute to the catalytic core, organized around Srnih et al. 1994; Rouet, Smih et al. 1994; Choulika, Perrinet divalent cations. Just above the catalytic core, the two LAGL al. 1995; Pingoud and Silva 2007). Rare-cutting endonu IDADG peptides also play an essential role in the dimeriza cleases can for example be a homing endonuclease (Paques tion interface. DNA binding depends on two typical saddle and Duchateau 2007), a chimeric Zinc-Finger nuclease 60 shaped CBBC.Bf3C, folds, sitting on the DNA major groove. (ZFN) resulting from the fusion of engineered zinc-finger Other domains can be found, for example in inteins such as domains with the catalytic domain of a restriction enzyme PI-PfuI (Ichiyanagi, Ishino et al. 2000) and PI-Scel (Moure, such as FokI (Porteus and Carroll 2005) or a chemical endo Gimble et al. 2002), whose protein splicing domain is also nuclease (Eisenschmidt, Lanio et al. 2005; Arimondo, Tho involved in DNA binding. mas et al. 2006: Simon, Cannata et al. 2008). In chemical 65 The making of functional chimeric meganucleases, by fus endonucleases, a chemical or peptidic cleaver is conjugated ing the N-terminal I-DmoI domain with an I-CreI monomer either to a polymer of nucleic acids or to another DNA rec (Chevalier, Kortemme et al. 2002; Epinat, Arnould et al. US 9,044,492 B2 27 28 2003); International PCT Application WO 03/078619 (Cel cleave. Then, the combination of such “half-meganucleases” lectis) and WO 2004/031346 (Fred Hutchinson Cancer can result in a heterodimeric species cleaving the target of Research Center, Stoddard etal)) have demonstrated the plas interest. The assembly of four sets of mutations into het ticity of LAGLIDADG proteins. erodimeric endonucleases cleaving a model target sequence Different groups have also used a semi-rational approach 5 or a sequence from different genes has been described in the to locally alter the specificity of the I-CreI (Seligman, following Cellectis International patent applications: XPC Stephens et al. 1997: Sussman, Chadsey et al. 2004); Inter gene (WO2007/093918), RAG gene (WO2008/010093), national PCT Applications WO 2006/097784, WO 2006/ HPRT gene (WO200S/0593.82), beta-2 microglobulin gene 097853, WO 2007/060495 and WO 2007/049156 (Cellectis); (WO2008/102274), Rosa26 gene (WO2008/152523), (Arnould, Chames et al. 2006; Rosen, Morrison et al. 2006; 10 Smith, Grizot et al. 2006), I-Sce (Doyon, Pattanayak et al. Human hemoglobin beta gene (WO2009/13622) and Human 2006), PI-Scel (Gimble, Moureet al. 2003) and I-MsoI (Ash interleukin-2 receptor gamma chain gene (WO20090196.14). worth, Havranek et al. 2006). These variants can be used to cleave genuine chromosomal In addition, hundreds of I-CreI derivatives with locally sequences and have paved the way for novel perspectives in altered specificity were engineered by combining the semi 15 several fields, including gene therapy. rational approach and High Throughput Screening: Examples of such endonuclease include I-Sce I, I-Chu I, Residues Q44, R68 and R70 or Q44, R68, D75 and I77 of I-Cre I, I-CSmI, PI-Sce I, PI-Tli I, PI-Mtu I, I-Ceu I, I-Sce II, I-CreI were mutagenized and a collection of variants I-Sce III, HO, PI-Civ I, PI-Ctr I, PI-Aae I, PI-Bsu I, PI-Dha I, with altered specificity at positions +3 to 5 of the DNA PI-Dra I, PI-Mav I, PI-Mch I, PI-Mfu I, PI-Mfl I, PI-Mga I, target (5NNN DNA target) were identified by screening PI-Mgo I, PI-Min I, PI-Mka I, PI-Mle I, PI-Mma I, PI-Msh I, (international PCT Applications WO 2006/097784 and PI-Msm I, PI-Mth I, PI-Mtu I, PI-Mxe I, PI-Npu I, PI-Pu I, WO 2006/097853 (Cellectis); (Arnould, Chames et al. PI-Rma I, PI-Spb I, PI-Ssp I, PI-Fac I, PI-Mja I, PI-Pho I, 2006; Smith, Grizot et al. 2006). PI-Tag I, PI-Thy I, PI-Tko I, PI-Tsp I, I-MsoI. Residues K28, N30 and Q38 or N30,Y33 and Q38 or K28, A homing endonuclease can be a LAGLIDADG endonu Y33, Q38 and S40 of I-CreI were mutagenized and a 25 clease such as I-Sce, I-Cre, I-Ceul, I-MsoI, and I-DmoI. collection of variants with altered specificity at positions Said LAGLIDADG endonuclease can be I-Sce I, a member +8 to 10 of the DNA target (10NNN DNA target) were of the family that contains two LAGLIDADG motifs and identified by screening (Arnould, Chames et al. 2006; functions as a monomer, its molecular mass being approxi Smith, Grizot et al., 2006); International PCT Applica mately twice the mass of other family members like I-CreI tions WO 2007/060495 and WO 2007/049156 (Cellec 30 which contains only one LAGLIDADG motif and functions tis)). as homodimers. Two different variants were combined and assembled in a Endonucleases mentioned in the present application functional heterodimeric endonuclease able to cleave a chi encompass both wild-type (naturally-occurring) and variant meric target resulting from the fusion of two different halves endonucleases. Endonucleases according to the invention can of each variant DNA target sequence ((Arnould, Chames etal. 35 be a “variant endonuclease, i.e. an endonuclease that does 2006; Smith, Grizot et al. 2006); International PCT Applica not naturally exist in nature and that is obtained by genetic tions WO 2006/097854 and WO 2007/034262). engineering or by random mutagenesis, i.e. an engineered Furthermore, residues 28 to 40 and 44 to 77 of I-Cre were endonuclease. This variant endonuclease can for example be shown to form two partially separable functional subdo obtained by substitution of at least one residue in the amino mains, able to bind distinct parts of a homing endonuclease 40 acid sequence of a wild-type, naturally-occurring, endonu target half-site (Smith, Grizot et al. 2006); International PCT clease with a different amino acid. Said substitutions) can for Applications WO 2007/049095 and WO 2007/057781 (Cel example be introduced by site-directed mutagenesis and/or lectis). by random mutagenesis. In the frame of the present invention, The combination of mutations from the two subdomains of Such variant endonucleases remain functional, i.e. they retain I-CreI within, the same monomer allowed the design of novel 45 the capacity of recognizing and specifically cleaving a target chimeric molecules (homodimers) able to cleave a palindro sequence to initiate gene targeting process. mic combined DNA target sequence comprising the nucle The variant endonuclease according to the invention otides at positions +3 to 5 and +8 to 10 which are bound by cleaves a target sequence that is different from the target each subdomain (Smith, Grizot et al. 2006); International sequence of the corresponding wild-type endonuclease. PCT Applications WO 2007/049095 and WO 2007/057781 50 Methods for obtaining such variant endonucleases with novel (Cellectis). specificities are well-known in the art. The method for producing meganuclease variants and the Endonucleases variants may be homodimers (meganu assays based on cleavage-induced recombination in mammal clease comprising two identical monomers) or heterodimers or yeast cells, which are used for screening variants with (meganuclease comprising two non-identical monomers). It altered specificity are described in the International PCT 55 is understood that the scope of the present invention also Application WO 2004/067736; (Epinat, Arnould et al. 2003: encompasses endonuclease variants per se, including het Chames, Epinat et al. 2005; Arnould, Chames et al. 2006). erodimers (WO2006097854), obligate heterodimers These assays result in a functional Lac Z reporter gene which (WO2008093249) and single chain meganucleases can be monitored by standard methods. (WO0307861.9 and WO2009095793) as non limiting The combination of the two former steps allows a larger 60 examples, able to cleave one target of interest in a polynucle combinatorial approach, involving four different Subdo otide sequence or in a genome. The invention also encom mains. The different subdomains can be modified separately passes hybrid variant perse composed of two monomers from and combined to obtain an entirely redesigned meganuclease different origins (WO03078619). variant (heterodimer or single-chain molecule) with chosen Endonucleases with novel specificities can be used in the specificity. In a first step, couples of novel meganucleases are 65 method according to the present invention for gene targeting combined in new molecules (“half-meganucleases') cleaving and thereby integrating a transgene of interest into a genome palindromic targets derived from the target one wants to at a predetermined location. US 9,044,492 B2 29 30 Endonucleases according to the invention or rare-cutting capable of emitting a relevant detectable signal upon trans endonucleases according to the invention can be mentioned fection in a cell. “RNA interference” refers to a sequence or defined as one double-strand break creating agent amongst specific post transcriptional gene silencing mechanism trig other double-strand break creating agents well-known in the gered by dsRNA, during which process the target RNA is art. Double-strand break creating agent means any agent or 5 degraded. RNA degradation occurs in a sequence-specific chemical or molecule able to create DNA (or double-stranded manner rather than by a sequence-independent dsRNA nucleic acids) double-strand breaks (DSBs). As previously response, like PKR response. mentioned, endonucleases can be considered as double The terms “interfering RNA” and “iRNA” refer to double Strand break creating agent targeting specific DNA stranded RNAs capable of triggering RNA interference of a sequences, in other terms, a double-strand break creating 10 agent targeting a double-strand break creating agent target gene. The gene thus silenced is defined as the gene targeted by site. Under “double-strand break creating agent' is also the iRNA. Interfering RNAs include, e.g., siRNAs and shR encompassed variants orderivatives of endonucleases such as NAS; an interfering RNA is also an interfering agent as engineered variants or engineered derivatives of meganu described above. cleases, zinc-finger nucleases or TALENs; these variants or 15 “iRNA-expressing construct” and “iRNA construct” are derivatives can be chimeric rare-cutting endonucleases, i.e. generic terms which include small interfering RNAs (siR fusion proteins comprising additional protein catalytic NAs), shRNAs and other RNA species, and which can be domains, displaying one or several enzymatic activities cleaved in vivo to form siRNAs. As mentioned before, it has amongst nuclease, endonuclease or exonuclease, or a fusion been shown that the enzyme Dicer cleaves long dsRNAs into protein with a polymerase activity, a kinase activity, a phos short-interfering RNAs (siRNAs) of approximately 21-23 phatase activity, a methylase activity, a topoisomerase activ nucleotides. One of the two siRNA strands is then incorpo ity, an activity, a activity, a ligase activ rated into an RNA-induced silencing complex (RISC). RISC ity, a helicase activity, or a recombinase activity, as non compares these “guide RNAs to RNAs in the cell and effi limiting examples or fusion proteins with other proteins ciently cleaves target RNAS containing sequences that are implicated in DNA processing. In a more precise non-limit 25 perfectly, or nearly perfectly complementary to the guide ing example, said "double-strand break creating agent' RNA. “iRNA construct” also includes nucleic acid prepara according to the present invention can be a fusion protein tion designed to achieve an RNA interference effect, such as between a single-chain meganuclease obtained according to expression vectors able of giving rise to transcripts which previously published methods (Grizot et al. 2009) and an form dsRNAs or hairpin RNA in cells, and or transcripts exonuclease Trex2 as shown in example 4. 30 which can produce siRNAs in vivo. Other agents or chemicals or molecules are double-strand A “short interfering RNA” or “siRNA comprises a RNA break creating agents whom DNA sequence targets are non duplex (double-stranded region) and can further comprises specific or non-predictable Such as, in a non limiting list, one or two single-stranded overhangs, 3' or 5' overhangs. alkylating agents (Methyl Methane Sulfonate or dimethane Each molecule of the duplex can comprise between 17 and 29 Sulfonates family and analogs). Zeocyn, enzyme inhibitors 35 nucleotides, including 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, Such as toposiomerase inhibitors (types I and II Such as non 27, 28, and 29 nucleotides. siRNAs can additionally be limiting examples quinolones, fluoroquinolones, ciprofloxa chemically modified. cin, irinotecan, lamellarin D, doxorubicin, etoposide) and “MicroRNAs” or “miRNAs are endogenously encoded ionizing radiations (X-rays, Ultraviolet, gamma-rays). RNAS that are about 22-nucleotide-long, that post-transcrip The term “reporter gene’’, as used herein, refers to a nucleic 40 tionally regulate target genes and are generally expressed in a acid sequence whose product can be easily assayed, for highly tissue-specific or developmental-stage-specific fash example, colorimetrically as an enzymatic reaction product, ion. At least more than 200 distinct miRNAs have been iden Such as the lacZ gene which encodes for B-galactosidase. tified in plants and animals. These small regulatory RNAs are Examples of widely-used reporter molecules include believed to serve important biological functions by two pre enzymes such, as 3-galactosidase, B-glucoronidase, B-glu 45 dominant modes of action: (1) by repressing the translation of cosidase; luminescent molecules Such as green fluorescent target mRNAs, and (2) through RNA interference, that means protein and firefly luciferase; and auxotrophic markers such cleavage and degradation of mRNAS. In this latter case, miR as His3p and Ura3p. (See, e.g., Chapter 9 in Ausubel, F.M., et NAs function analogously to siRNAS. miRNAs are first tran al. Current Protocols in Molecular Biology, John Wiley & scribed as part as a long, largely single-stranded primary Sons, Inc. (1998)). The expressions “inactive reporter gene’ 50 transcript (pri-miRNA) (Lee, Jeon et al. 2002). This pri or “reporter gene rendered inactive' refers to a reporter gene miRNA transcript is generally and possibly invariably, Syn wherein one part of said reporter gene has been replaced for thetized by RNA polymerase II and therefore is polyadeny the purpose of the present invention, in this inactive state, said lated and may be spliced. It contains an about 80-nucleotides inactive reporter gene is not capable of emitting any relevant long hairpin structure that encodes the mature about detectable signal upon, transfection in a cell. In the present 55 22-nucleotides miRNA part of one arm of the stem. In animal invention, reporter genes Such as Luciferase and Green Fluo cells, this primary transcript is cleaved by a nuclear RNaseIII rescent Protein genes have been rendered inactive by, respec type enzyme called Drosha (Lee et al., 2003, Nature 425:415 tively, the introduction of a frameshift mutation in their 419) to liberate a hairpin mRNA precursor, or pre-miRNA of respective coding sequence. Said frameshift mutation can be about-65 nucleotides long. This pre-miRNA is then exported due, as a non-limiting example, to the introduction in said 60 to the cytoplasm by exportin-5 and the GTP-bound form, of coding sequence of a target, sequence for an endonuclease. the Ran cofactor (Yi, Qin et al. 2003). Once in the cytoplasm, Upon cellular co-transfection of said inactive reporter gene the pre-miRNA is further processed by Dicer, another and endonuclease, said endonuclease provokes a double RNaseIII enzyme to produce a duplex of about-22 nucle strand break in its target that is repaired by NHEJ, leading to otides base pairs long that is structurally identical to a siRNA a functional restoration of said reporter gene. The expressions 65 duplex (Hutvagner, McLachlan et al. 2001). The binding of “functional restoration of a reporter gene or “functional protein components of the RISC, or RISC cofactors, to the reporter gene' refer to the recovering of a reporter gene duplex results in incorporation of the mature, single-stranded US 9,044,492 B2 31 32 miRNA into a RISC or RISC-like protein complex, while the “Altered/enhanced/increased/improved cleavage activity”. other strand of the duplex is degraded (Bartel et al., 2004, Cell refers to an increase in the detected level of meganuclease 116: 281-297). cleavage activity, see below, against a target DNA sequence Thus, one can design and express artificial miRNAS based by a second meganuclease in comparison to the activity of a on the features of existing miRNA genes. The miR-30 (mi 5 first meganuclease against the target DNA sequence. Nor croRNA30) architecture can be used to express miRNAs (or mally the second meganuclease is a variant of the first and siRNAs) from RNA polymerase II promoter-based expres comprise one or more Substituted amino acid residues in sion plasmids (Zeng, Cai et al. 2005). In some instances the comparison to the first meganuclease. precursor miRNA molecules may include more than one “Nucleotides’ are designated as follows: one-letter code is 10 used for designating the base of a nucleoside: a is adenine, t is stem-loop structure. The multiple stem-loop structures may thymine, c is cytosine, and g is guanine. For the degenerated be linked to one another through a linker, Such as, for nucleotides, r represents g or a (purine nucleotides), k repre example, a nucleic acid, linker, a miRNA flanking sequence, sents gort, S represents goric, w represents a ort, m represents other molecules, or some combination thereof. a or c, y represents t or c (pyrimidine nucleotides), d repre A “short hairpin RNA (shRNA) refers to a segment of 15 sents g, a or t, V represents g, a or c, b represents g. t or c, h RNA that is complementary to a portion of a target gene represents a, t or c, and n represents g, a, t or c. (complementary to one or more transcripts of a target gene), By "meganuclease', is intended an endonuclease having a and has a stem-loop (hairpin) structure, and which can be double-stranded DNA target sequence of 12 to 45 bp, more used to silence gene expression. preferably 22 to 24bp when said meganuclease is an I-CreI A “stem-loop structure' refers to a nucleic acid having a variant. Said meganuclease is either a dimeric enzyme, secondary structure that includes a region of nucleotides wherein each domain is on a monomer or a monomeric which are blown or predicted to form a double strand (stem enzyme comprising the two domains on a single polypeptide. portion) that is linked on one side by a region of predomi By "meganuclease domain is intended the region which nantly single-stranded nucleotides (loop portion). The terms interacts with one half of the DNA target of a meganuclease “hairpin' is also used herein to refer to stem-loop structures. 25 and is able to associate with the other domain of the sane By “double-strand break-induced target sequence' or meganuclease which interacts with the other half of the DNA “double-strand break creating agent target site', or “DSB target to form a functional meganuclease able to cleave said creating agent target site' is intended a sequence that is rec DNA target. ognized by any double strand break creating agent. By “meganuclease variant' or “variant it is intended a The expression “polynucleotide derivatives' refers to 30 meganuclease variant or a "DSB-creating agent variant, a polynucleotide sequences that can be deduced and con rare-cutting endonuclease' variant, a "chimeric rare-cutting structed from the respective sequence or a part of the respec endonuclease” variant obtained, by replacement of at least tive sequence of identified-effector genes according to the one residue in the amino acid sequence of the parent mega present invention. These derivatives can refer to mRNAs, nuclease or parent “DSB-creating agent, parent rare-cutting siRNAs, dsRNAs, miRNAs, cDNAs. These derivatives can 35 endonuclease, parent "chimeric rare-cutting endonuclease' be used directly or as part of a delivery vector or vector/ with a different amino acid. plasmid/construct, by introducing them in an eukaryotic cell By "peptide linker it is intended to mean a peptide to increase gene targeting efficiency and/or endonuclease sequence of at least 10 and preferably at least 17 amino acids induced homologous recombination. which links the C-terminal amino acid residue of the first “Transfection' means “introduction' into a live cell, either 40 monomer to the N-terminal residue of the second monomer in vitro or in vivo, of certain nucleic acid construct, preferably and which allows the two variant monomers to adopt the into a desired cellular location of a cell, said nucleic acid correct conformation for activity and which does not alter the construct being functional once in the transfected cell. Such specificity of either of the monomers for their targets. presence of the introduced nucleic acid may be stable or By “subdomain it is intended the region of a LAGL transient. Successful transfection will have an intended effect 45 IDADG homing endonuclease core domain which interacts or a combination of effects on the transfected cell, such as with a distinct part of a homing endonuclease DNA target silencing and/or enhancing a gene target and/or triggering half-site. target physiological event, like enhancing the frequency of By “selection or selecting it is intended to mean the iso mutagenesis following an endonuclease-induced DSB as a lation of one or more meganuclease variants based upon an non-limiting example. 50 observed specified phenotype, for instance altered cleavage “Modulate' or “modulation' is used to qualify the up- or activity. This selection can be of the variant in a peptide form down-regulation of a pathway like NHEJ consecutive to an upon which the observation is made or alternatively the selec endonuclease-induced DSB in particular conditions or not, tion can be of a nucleotide coding for selected meganuclease compared to a control condition, the level of this modulation variant. being measured by an appropriate method. More broadly, it 55 By "screening it is intended to mean the sequential or can refer to any phenomenon “modulation' is associated simultaneous selection of one or more meganuclease with, like the expression level of a gene, a polynucleotide or variant(s) which exhibits a specified phenotype such as derivative thereof (DNA, cDNA, plasmids, RNA, mRNA, altered cleavage activity. interfering RNA), polypeptides, etc. By “derived from it is intended to mean a “DSB-creating Amino acid residues' in a polypeptide sequence are des 60 agent variant, a rare-cutting endonuclease' variant, a “chi ignated herein according to the one-letter code, in which, for meric rare-cutting endonuclease' variant or a meganuclease example, Q means Gln or Glutamine residue, R means Arg or variant which is created from a parent “DSB-creating agent'. Arginine residue and D means Asp or Aspartic acid residue. rare-cutting endonuclease', 'chimeric rare-cutting endonu Amino acid substitution” means the replacement of one clease' or meganuclease and hence the peptide sequence of amino acid residue with another, for instance the replacement 65 the resulting “DSB-creating agent variant, rare-cutting of an Arginine residue with a Glutamine residue in a peptide endonuclease” variant, "chimeric rare-cutting endonuclease' sequence is an amino acid Substitution. variant or meganuclease variant is related to (primary US 9,044,492 B2 33 34 sequence level) but derived from (mutations) the sequence meganuclease as described in WO03078619 and peptide sequence of the parent meganuclease. By “I-Cre' is WO2009095793 capable of cleaving a target sequence intended the wild-type I-CreI having the sequence of pdb according to SEQ ID NO: 10, and having a polypeptidic accession code 1g.9y, corresponding to the sequence SEQID sequence corresponding as a non-limiting example to SEQID NO: 7 in the sequence listing. 5 NO: 11. By “I-CreI variant with novel specificity' is intended a By “DNA target”, “DNA target sequence', “target variant having a pattern of cleaved targets different from that sequence', “target-site', “target”, “site”, “site of interest'. of the parent meganuclease. The terms "novel specificity'. “recognition site', 'polynucleotide recognition site'. “recog “modified specificity”, “novel cleavage specificity”, “novel nition sequence”, “homing recognition site'. “homing site'. substrate specificity’ which are equivalent and used indiffer 10 "cleavage site', 'endonuclease-specific target site' is ently, refer to the specificity of the variant towards the nucle intended a 20 to 24bp double-stranded palindromic, partially otides of the DNA target sequence. In the present patent palindromic (pseudo-palindromic) or non-palindromic poly applicationall the I-Crel variants described comprise an addi nucleotide sequence that is recognized and cleaved by a tional Alanine after the first Methionine of the wild type LAGLIDADG homing endonuclease such as I-Cre, or a I-CreI sequence (SEQ ID NO: 7). These variants also com 15 variant, or a single-chain chimeric meganuclease derived prise two additional Alanine residues and an Aspartic Acid from I-CreI. Said DNA target sequence is qualified of “cleav residue after the final Proline of the wild type I-CreI able' by an endonuclease, when recognized withina genomic sequence. These additional residues do not affect the proper sequence and known to correspond to the DNA target ties of the enzyme and to avoid confusion these additional sequence of a given endonuclease or a variant of Such endo residues do not affect the numeration of the residues in I-Cre nuclease. These terms refer to a distinct DNA location, pref or a variant referred in the present patent application, as these erably a genomic location, at which a double stranded break references exclusively refer to residues of the wildtype I-CreI (cleavage) is to be induced by the meganuclease. The DNA enzyme (SEQ ID NO: 7) as present in the variant, so for target is defined by the 5' to 3' sequence of one strand of the instance residue 2 of I-Crei is in fact residue 3 of a variant double-stranded polynucleotide, as indicate above for C1221. which comprises an additional Alanine after the first 25 Cleavage of the DNA target occurs at the nucleotides at posi Methionine. tions +2 and -2, respectively for the sense and the antisense By “I-CreI site' is intended a 22 to 24bp double-stranded strand. Unless otherwise indicated, the position at which DNA sequence which is cleaved by I-CreI. I-CreI sites cleavage of the DNA target by an I-Cre I meganuclease vari include the wild-type non-palindromic I-CreI homing site ant occurs, corresponds to the cleavage site on the sense and the derived palindromic sequences such as the sequence 30 strand of the DNA target. By “an I-SceI target site' is meant 5'-t-12c-11a–10a–9a-sa-7c-62-st4C-3g2t-1a1e. 22.3a. 4Cs a target sequence for the endonuclease I-Scel; by “an engi got 7tstotoga, 2 (SEQID NO:8), also called C1221. neered meganuclease target site' is meant a target sequence By “domain or “core domain is intended the “LAGL for a variant endonuclease that has been engineered as previ IDADG homing endonuclease core domain” which is the ously mentioned and as described in WO2006097854, characteristic C.Bf3C.Bf3C, fold of the homing endonucleases of 35 WO2008093249, WO03078619, WO2009095793, the LAGLIDADG family, corresponding to a sequence of WOO3O78619 and WO 2004/067736. about one hundred amino acid residues. Said domain com By “DNA target half-site”, “half cleavage site' or half prises four beta-strands (BBBB) folded in an anti-parallel site' is intended the portion of the DNA target which is bound beta-sheet which interacts with one half of the DNA target. by each LAGLIDADG homing endonuclease core domain. This domain is able to associate with another LAGLIDADG 40 By "chimeric DNA target” or “hybrid DNA target” is homing endonuclease core domain which interacts with the intended the fusion of different halves of two parent mega other half of the DNA target to form a functional endonu nuclease target sequences. In addition at least one half of said clease able to cleave said DNA target. For example, in the target may comprise the combination of nucleotides which case of the dimeric homing endonuclease I-CreI (163 amino are bound by at least two separate Subdomains (combined acids), the LAGLIDADG homing endonuclease core domain 45 DNA target). corresponds to the residues 6 to 94. By "parent meganuclease it is intended to mean a wild By “beta-hairpin' is intended two consecutive beta-strands type meganuclease or a variant of Such a wild type meganu of the antiparallel beta-sheet of a LAGLIDADG homing clease with identical properties or alternatively a meganu endonuclease core domain (Bf3 or f3(3) which are con clease with Some altered characteristic in comparison to a nected by a loop or a turn, 50 wild type version of the same meganuclease. By "single-chain meganuclease', 'single-chain chimeric By “delivery vector or “delivery vectors' is intended any meganuclease', 'single-chain meganuclease derivative'. delivery vector which can be used in the present invention to “single-chain chimeric meganuclease derivative' or 'single put into cell contact or deliver inside cells or subcellular chain derivative' is intended a meganuclease comprising two compartments agents/chemicals and molecules (proteins or LAGLIDADG homing endonuclease domains or core 55 nucleic acids) needed in the present invention. It includes, but domains linked by a peptidic spacer as described in is not limited to liposomal delivery vectors, viral delivery WO03078619 and WO2009095793. The single-chain mega vectors, drug delivery vectors, chemical carriers, polymeric nuclease is able to cleave a chimeric DNA target sequence carriers, lipoplexes, polyplexes, dendrimers, microbubbles comprising one different half of each parent meganuclease (ultrasound, contrast agents), nanoparticles, emulsions or target sequence. —By “SC-GS meganuclease' or “engi 60 other appropriate transfer vectors. These delivery vectors neered SC-GS meganuclease' is meant an engineered single allow delivery of molecules, chemicals, macromolecules chain meganuclease as described in WO03078619 and (genes, proteins), or other vectors such as plasmids, peptides WO2009095793 capable of cleaving a target sequence developed by Diatos. In these cases, delivery vectors are according to SEQ ID NO: 9, and having a polypeptidic molecule carriers. By “delivery vector” or “delivery vectors’ sequence corresponding as a non-limiting example to SEQID 65 is also intended delivery methods to perform transfection. NO: 4. —By “SC-RAG meganuclease' or “meganuclease The terms “vector” or “vectors' refer to a nucleic acid, SC-RAG” or “SC-RAG” is meant an engineered single chain molecule capable of transporting another nucleic acid to US 9,044,492 B2 35 36 which it has been linked. A “vector” in the present invention photransferase, herpes simplex virus , includes, but is not limited to, a viral vector, a plasmid, a RNA adenosine deaminase, glutamine synthetase, and hypoxan vector or a linear or circular DNA or RNA molecule which thine-guanine phosphoribosyltransferase for eukaryotic cell may consists of a chromosomal, non chromosomal, semi culture: TRP1 for S. cerevisiae; tetracycline rifampicin or synthetic or synthetic nucleic acids. Preferred vectors are ampicillin resistance in E. coli. Preferably said vectors are those capable of autonomous replication (episomal vector) expression vectors, wherein a sequence encoding a polypep and/or expression of nucleic acids to which they are linked tide of interest is placed under control of appropriate tran (expression vectors). Large numbers of Suitable vectors are Scriptional and translational control elements to permit pro known to those of skill in the art and commercially available. duction or synthesis of said polypeptide. Therefore, said Viral vectors include retrovirus, adenovirus, parvovirus 10 polynucleotide is comprised in an expression cassette. More (e.g. adenoassociated viruses), coronavirus, negative strand particularly, the vector comprises a replication origin, a pro RNA viruses such as orthomyxovirus (e.g., influenza virus), moter operatively linked to said encoding polynucleotide, a rhabdovirus (e.g., rabies and vesicular stomatitis virus), ribosome binding site, a RNA-splicing site (when genomic paramyxovirus (e.g. measles and Sendai), positive strand DNA is used), a polyadenylation site and a transcription RNA viruses such as picornavirus and alphavirus, and 15 termination site. It also can comprise an enhancer or silencer double-stranded DNA viruses including adenovirus, herpes elements. Selection of the promoter will depend upon the cell virus (e.g. Herpes Simplex virus types 1 and 2, Epstein-Barr in which the polypeptide is expressed. Suitable promoters virus, cytomegalovirus), and poxvirus (e.g., vaccinia, fowl include tissue specific and/or inducible promoters. Examples pox and canarypox). Other viruses include Norwalk virus, of inducible promoters are: eukaryotic metallothionine pro togavirus, flavivirus, reoviruses, papovavirus, hepadnavirus, moter which is induced by increased levels of heavy metals, and hepatitis virus, for example. Examples of retroviruses prokaryotic lac7 promoter which is induced in response to include: avian leukosis-sarcoma, mammalian C-type, B-type isopropyl-B-D-thiogalacto-pyranoside (IPTG) and eukary viruses, D type viruses, HTLV-BLV group, lentivirus, spuma otic heat shock promoter which is induced by increased tem virus (Coffin, J. M., Retroviridae: The viruses and their rep perature. Examples of tissue specific promoters are skeletal lication. In Fundamental Virology. Third Edition, B. N. 25 muscle , prostate-specific antigen (PSA), Fields, et al., Eds. Lippincott-Raven Publishers, Philadel C-antitrypsin protease, human Surfactant (SP) A and B pro phia, 1996). teins, B-casein and acidic whey protein genes. By “lentiviral vector' is meant HIV-Based lentiviral vec Inducible promoters may be induced by pathogens or tors that are very promising for gene delivery because of their stress, more preferably by stress like cold, heat, UV light, or relatively large packaging capacity, reduced immunogenicity 30 high ionic concentrations (reviewed in (Potenza, Aleman et and their ability to stably transduce with high efficiency a al, 2004)). Inducible promoter may be induced by chemicals large range of different cell types. Lentiviral vectors are usu (reviewed in (Moore, Samalova et al. 2006): (Padidam 2003); ally generated following transient transfection of three (pack (Wang, Zhou et al. 2003); (Zuo and Chua 2000). aging, envelope and transfer) or more plasmids into producer Delivery vectors and vectors can be associated or com cells. Like HIV, lentiviral vectors enter the target cell through 35 bined with any cellular permeabilization techniques such as the interaction of viral Surface glycoproteins with receptors Sonoporation or electroporation or derivatives of these tech on the cell surface. On entry, the viral RNA undergoes reverse niques. transcription, which is mediated by the viral reverse tran By “cell' or “cells” is intended any prokaryotic or eukary Scriptase complex. The product of reverse transcription is a otic living cells, cell lines derived from these organisms for in double-stranded linear viral DNA, which is the substrate for 40 vitro cultures, primary cells from animal or plant origin. viral integration in the DNA of infected cells. By "primary cell' or “primary cells are intended cells By “integrative lentiviral vectors (or LV), is meant such taken directly from living tissue (i.e. biopsy material) and vectors as non limiting example, that are able to integrate the established for growth in vitro, that have undergone very few genome of a target cell. At the opposite by “non integrative population doublings and are therefore more representative of lentiviral vectors (or NILV) is meant efficient gene delivery 45 the main functional components and characteristics of tissues vectors that do not integrate the genome of a target cell from which they are derived from, in comparison to continu through the action of the virus integrase. ous tumorigenic or artificially immortalized cell lines. These One type of preferred vector is an episome, i.e., a nucleic cells thus represent a more valuable model to the in vivo state acid capable of extra-chromosomal replication. Preferred they refer to. vectors are those capable of autonomous replication and/or 50 In the frame of the present invention, “eukaryotic cells' expression of nucleic acids to which they are linked. Vectors refer to a fungal, plant or animal cellor a cell line derived from capable of directing the expression of genes to which they are the organisms listed below and established for in vitro culture. operatively linked are referred to herein as “expression vec More preferably, the fungus is of the genus Aspergillus, Peni tors. A vector according to the present invention comprises, cillium, Acremonium, Trichoderma, Chrysoporium, Mortier but is not limited to, a YAC (yeast artificial ), a 55 ella, Kluyveromyces or Pichia; More preferably, the fungus is BAG (bacterial artificial), a baculovirus vector, a phage, a of the species Aspergillus niger, Aspergillus nidulans, phagemid, a cosmid, a viral vector, a plasmid, a RNA vector Aspergillus Oryzae, Aspergillus terreus, Penicillium or a linear or circular DNA or RNA molecule which may chrysogenium, Penicillium citrinum, Acremonium Chrysoge consist of chromosomal, non chromosomal, semi-synthetic num, Trichoderma reesei, Mortierella alpine, Chrysosporium or synthetic DNA. In general, expression vectors of utility in 60 lucknowense, Kiuyverornyces lactis, Pichia pastoris or recombinant DNA techniques are often in the form of “plas Pichia ciferrii. mids” which refer generally to circular double stranded DNA More preferably the plant is of the genus Arabidospis, loops which, in their vector form are not bound to the chro Nicotiana, Solanum, Iactuca, Brassica, Oryza, Asparagus, mosome. Large numbers of Suitable vectors are known to Pisum, Medicago, Zea, Hordeum, Secale, Triticum, Capsi those of skill in the art. Vectors can comprise selectable mark 65 cum, Cucumis, Cucurbita, Citrullis, Citrus, Sorghum.; More ers, for example: neomycin phosphotransferase, histidinol preferably, the plant is of the species Arabidospis thaliana, dehydrogenase, dihydrofolate reductase, hygromycin phos Nicotiana tabaccum, Solanum lycopersicum, Solanum US 9,044,492 B2 37 38 tuberosum, Solanum melongena, Solanum esculentum, Lac have been inserted in the coding sequence of a reporter gene, tuca saliva, Brassica napus, Brassica oleracea, Brassica leading to a inactive reporter gene that can only be restored rapa, Oryza glaberrima, Oryza saliva, Asparagus officinalis, after an NHEJ event. In the frame of the present invention, the Pisum sativum, Medieago Saliva, zea mays, Hordeum vul expression “double-strand break-induced mutagenesis' gare, Secale cereal, Triticum aestivum, Triticum durum, Cap (DSB-induced mutagenesis) refers to a mutagenesis event sicum sativus, Cucurbita pepo, Citrullus lanatus, Cucumis consecutive to an NHEJ event following an endonuclease melo, Citrus aurantifolia, Citrus maxima, Citrus medica, Cit induced DSB, leading to insertion/deletion at the cleavage rus reticulata. site of an endonuclease. More preferably the animal cell is of the genus Homo, By “gene' is meant the basic unit of heredity, consisting of Rattus, Mus, Sus, Bos, Danio, Canis, Fells, Equus, Salmo, 10 Oncorhynchus, Gallus, Meleagris, Drosophila, Caenorhab a segment of DNA arranged in a linear manner along a chro ditis; more preferably, the animal cell is of the species Homo mosome, which codes for a specific protein or segment of sapiens, Rattus norvegicus, Mus musculus, Sus scrofa, Bos protein. A gene typically includes a promoter, a 5' untrans taurus, Danio rerio, Canis lupus, Felis catus, Equus cabalus, lated region, one or more coding sequences (exons), option Salmo salar, Oncorhynchus mykiss, Gallus gallus, Meleagris 15 ally introns, a 3' untranslated region. The gene may further gallopavo, Drosophila melanogaster; Caenorhabditis comprise a terminator, enhancers and/or silencers. elegans. As used herein, the term “transgene' refers to a sequence By “homologous is intended a sequence with enough encoding a polypeptide. Preferably, the polypeptide encoded identity to another one to lead to homologous recombination by the transgene is either not expressed, or expressed but not between sequences, more particularly having at least 95% biologically active, in the cell, tissue or individual in which identity, preferably 97% identity and more preferably 99%. the transgene is inserted. Most preferably, the transgene “Identity” refers to sequence identity between two nucleic encodes a therapeutic polypeptide useful for the treatment of acid molecules or polypeptides. Identity can be determined an individual. by comparing a position in each sequence which may be The term, “gene of interest' or “GOI refers to any nucle aligned for purposes of comparison. When a position in the 25 otide sequence encoding a known or putative gene product. compared sequence is occupied by the same base, then the As used herein, the term “locus is the specific physical molecules are identical at that position. A degree of similarity location of a DNA sequence (e.g. of a gene) on a chromo or identity between nucleic acid or amino acid sequences is a some. The term, “locus' usually refers to the specific physical function of the number of identical or matching nucleotides at location of an endonuclease's target sequence on a chromo positions shared by the nucleic acid sequences. By a poly 30 nucleotide having a sequence at least, for example, 95% some. Such a locus, which comprises a target sequence that is “identical to a query sequence of the present invention, it is recognized and cleaved by an endonuclease according to the intended that the sequence of the polynucleotide is identical invention, is referred to as "locus according to the invention'. to the query sequence except that the sequence may include Also, the expression "genomic locus of interest' is used to up to five nucleotide alterations per each 100 nucleotides of 35 qualify a nucleic acid sequence in a genome that can be a the query sequence. In otherwords, to obtain a polynucleotide putative target for a double-strand break according to the having a sequence at least 95% identical to a query sequence, invention. By "endogenous genomic locus of interest” is up to 5% (5 of 100) of the nucleotides of the sequence may be intended a native nucleic acid sequence in a genome, i.e., a inserted, deleted, or substituted with another nucleotide. Vari sequence orallelic variations of this sequence that is naturally ous alignment algorithms and/or programs may be used to 40 present at this genomic locus. It is understood that the con calculate the identity between two sequences, including sidered genomic locus of interest of die present invention can FASTA, or BLAST which are available as a part of the GCG be between two overlapping genes the considered endonu sequence analysis package (University of Wisconsin, Madi clease's target sequences are located in two different genes. son, Wis.), and can be used with, e.g., default setting. The "Genomic locus of interest” in the present application, program, which uses the Needleman-Wunsch 45 encompasses nuclear genetic material but also a portion of global alignment algorithm (Needleman and Wunseh 1970) genetic material that can existindependently to the main body to find the optimum alignment (including gaps) of two of genetic material. Such as plasmids, episomes, virus, trans sequences when considering their entire length, may for posons or in organelles such as mitochondria or chloroplasts example be used. The needle program is for example avail as non-limiting examples, at which a double stranded break able on the ebi.ac.uk worldwide web site. The percentage of 50 (cleavage) can be induced by the DSB-creating agent, i.e identity in accordance with the invention is preferably calcu endonuclease, rare-cutting endonuclease and/or chimeric lated using the EMBOSS::needle (global) program with a rare-cutting endonuclease of the invention. “Gap Open parameter equal to 10.0, a "Gap Extend param By "RAG1 locus is intended the RAG1 gene position in a eter equal to 0.5, and a Blosumó2 matrix. mammalian genome. For example, the human RAG1 gene is By "mutation' is intended the substitution, deletion, inser 55 tion of one or more nucleotides/amino acids in a polynucle available in the NCBI database, under the accession number otide (cDNA, gene) or a polypeptide sequence. Said mutation NC000011.8 (GeneID:5896) and its locus is positioned from can affect the coding sequence of a gene or its regulatory position 36546139 to 36557877. sequence. It may also affect the structure of the genomic The above written description of the invention provides a sequence or the structure/stability of the encoded mRNA. By 60 manner and process of making and using it such that any "frameshift mutation' is intended a genetic mutation caused person skilled in this art is enabled to make and use the same, by insertions or deletions in a DNA sequence of a number of this enablement being provided in particular for the subject nucleotides that is not evenly divisible by three. Due to the matter of the appended claims, which make up a part of the triplet nature of the genetic code, Such insertions or deletions original description. can change the reading frame of the considered gene, result 65 As used above, the phrases 'selected from the group con ing in a completely different translation of this gene. For the sisting of “chosen from.” and the like include mixtures of purpose of the present invention, Such frameshift mutations the specified materials. US 9,044,492 B2 39 40 Where a numerical limit or range is stated herein, the recognition site. Thus induction of a DNA double strand endpoints are included. Also, all values and Subranges within break (DSB) by SC GS (SEQID NO: 4) encoded by vector a numerical limit or range are specifically included as if pCLS2690 (SEQID NO:3) or I-SecI meganuclease (SEQID explicitly written out. NO: 40) followed by a mutagenic DSB repair by NHEJ can The above description is presented to enable a person 5 lead to the restoration of Luciferase gene in frame with ATG skilled in the art to make and use the invention, and is pro start codon. vided in the context of a particular application and its require These sequences were placed in a plasmid used to target the ments. Various modifications to the preferred embodiments final construct at RAG1 locus using hisRAG1 Integration will be readily apparent to those skilled in the art, and the Matrix CMV Neo from coPS(R) Custom Human Full Kit DD generic principles defined herein may be applied to other 10 (Cellectis Bioresearch). embodiments and applications without departing from the The plasmid pCLS6883 (SEQID NO: 1) was tested in an spirit and scope of the invention. Thus, this invention is not extrachromosomal assay by co transfection with either intended to be limited to the embodiments shown, but is to be pCLS2690 (SEQID NO: 3) encoding SC GS meganuclease accorded the widest scope consistent with the principles and or pCLS0099 (SEQID NO: 12) encoding GFP. Luciferase features disclosed herein. 15 signal was analysed 72 hours post transfection. Co transfec Having generally described this invention, a further under tion with pCLS6883 (SEQID NO: 1) and pCLS0099 (SEQID standing can be obtained by reference to certain specific NO: 12) led to a 1,800 R.L.U. signal whereas co transfection examples, which are provided herein for purposes of illustra with pCLS6883 (SEQID NO: 1) and pCLS2690 (SEQID NO: tion only, and are not intended to be limiting unless otherwise 3) led to a 20,000 R.L.U signal. This result demonstrates that specified. the presence of meganuclease SC GS induced luciferase by a factor of 11 and that pCLS6883 (SEQID NO: 1) construe is EXAMPLES thus measuring mutagenic NHEJ DSB-repair induced by SC GS. Example 1 25 Example 2 Constructions Monitoring Meganuclease-Induced Mutagenesis Screening with Extrachromosomal Assay of siRNAs Targeting Genes Coding for Kinases Plasmids measuring meganuclease-induced mutagenesis at their target site were constructed. They are based on acti 30 The construct measuring SC GS induced mutagenesis at Vation of a reporter gene after cleavage of a target by a GS locus with Luciferase reporter gene (pCLS6883, SEQID meganuclease followed by a mutagenic repair of this double NO: 1) was used to screen in an extrachromosomal assay strand break (DSB). The present invention uses Luciferase siRNAS targeting genes coding for kinases. This screen iden reporter gene and I-Sce and GS meganucleases but other tified 23 siRNAs that led to stimulation of Luciferase signal reporter gene Such as GFP and other meganucleases can be 35 induced by meganuclease SC GS. used. a) Materials and Methods a) Materials and Methods Extrachromosomal Screening Cell Culture Twenty thousand of 293H cells per well were seeded in Cell line 293H was cultured at 37° C. with 5% CO, in white 96 well plates one day before transfection. Per well, Dulbecco's modified Eagle's medium (DMEM) Glutamax 40 cells were transfected with 200 ng of total DNA with 100 ng supplemented with 10% fetal calf serum, 2 mM L-glutamme, pCLS6883 (SEQID NO: 1) and either 100 ng of pCLS2690 100 UI/ml penicilline, 100 g/ml streptomycine, 0.25 ug/m (SEQID NO:3) or 100ng of pCLS0099 (SEQID NO: 12) and amphotericine B (Fongizone). with siRNA at 33 nM final concentration using 1.35 ul of Transient Transfection in 96 Well Plate Format Polyfect transfection reagent (QIAGEN). The siRNAs kinase Twenty thousand cells per well were seeded in white 96 45 set (QIAGEN) is targeting 696 genes with 2 siRNAs per gene. well plates one day before transfection. Per well, cells were Ninety-six hours post transfection 50 ul per well of ONEGlo transfected with 200ng of total DNA with 100 ng pCLS6883 (Promega) were added, cells were incubated in dark for 3 (SEQID NO: 1) and either 100ng of pCLS2690 (SEQID NO: minutes before luciferase activity analysis (1 second?well) 3) or 100 ng of pCLS0099 (SEQID NO: 12) using 1.35ul of using PHER AStar luminometer (BMG Labtech). Polyfect transfection reagent (QIAGEN). Ninety-six hours 50 b) Results post transfection 50 ul per well of ONEGlo (Promega) were Each plate from siRNA kinase set was co-transfected in added, cells were incubated in dark for 3 minutes before duplicate in presence of SC GS (pCLS2690, SEQID NO: 1), luciferase activity analysis (1 second?well) using PHERAStar induced condition) or in simplicate in presence of GFP luminometer (BMG Labtech). (pCLS0099, SEQID NO: 12, not induced condition). Induc b) Results; 55 tion by SC GS was monitored by comparison of the The plasmids pCLS6883 (SEQID NO: 1) and pCLS6884 Luciferase signal obtained with pCLS2690 (SEQID NO: 1) (SEQ ID NO:2) were constructed to quantify NHEJ repair over the one obtained with pCLS0099 (SEQID NO: 12). The events induced by SC GS or I-Sce respectively. These con induction varied from 2 to 11 depending on the transfection. structions can be targeted at RAG1 site locus and are pre To normalize the different transfections, Z score was cal sented in FIG. 2. 60 culated for each plate with, the following equation Z=(x-u)/o The sequence used to measure meganuclease-induced were x is the R.L.U. value of a given siRNA, L, is the median mutagenesis is made of an ATG start codon followed by i) 2 R.L.U. value of the plate and O its standard deviation. A codons for alanine ii) the tag HA sequence iii) GS or I-Scel siRNA was considered as a stimulating hit when its Z score recognition site iv) a glycine serine stretch V) the same 2 value was higher than 3 (FIG. 3). codons for alanine as in i) and finally vi) luciferase reporter 65 This screen led to the identification of 23 positive hits that gene lacking its ATG start codon. Luciferase reporter gene is stimulate luciferase signal by a factor ranging from 2 to 7 (cf. inactive due to a frame-shift introduced by GS or I-Scel Table I below). US 9,044,492 B2 41 42 TABLE I siRNA hits stimulatind GS SC-induced luciferase Sicinal. SEO Mean Gene Gene siRNA D Z. Stimulation targeted ID target sequence NO: Score Sto factor

CSNK1D 1453 CCGGTCTAGGATCGAAATGTT 13 3.14 O.75 3 - 4 O

AK2 2O4 CGGCAGAACCCGAGTATCCTA 14 5.51 O2O 5. O8

AKT2 2O8CAAGCGTGGTGAATACATCAA 15 3.. 65 0.23 2.94

CAMK2G 81.8GAGGAAGAGATCTATACCCTA 16 5 O1. O. 23 3 66

GK2 2712TACGTTAGAAGAGCACTGTAA. 17 3.33 1. 49 2.75

PFKFB4 521. OCAGAAAGTGTCTGGACTTGTA 18 3.92 114 2.18

MAPK12 63 OOCTGGACGTATTCACTCCTGAT 19 384 OO6 3.22

PRKCE 5581 CCCGACCATGGTAGTGTTCAA 20 4 OO O. 43 2.91.

EIF2AK2 is 61OCGGAAAGACTTACGTTATTAA 21 4.5 O O. 22 3.15

WEE1 7.465 CAGGGTAGATTAC CTCGGATA. 22 32 O O. O.8 5. O1

CDK5R1, 8851 CCGGAAGGCCACGCTGTTTGA 23 4. O1 O.26 6. O3

LIG4 3981 CACCGTTTATTTGGACTCGTA. 24. 4.11 O. 41 6.15

AKAP1 8165AGCGCTGAACTTGATTGGGAA 25 4.97 O.32 F24

MAP3K6 9 O64TCAGAGGAGCTGAGTAATGAA. 26 5.99 O.22 s: 41

DYRK3 84.44 TCGACAGTACGTGGCCCTAAA 27 3.54. O.22 3 : 61

RPS6KA4 8986 CGCCACCTTCATGGCATTCAA 28 3.56. Of3 3 : 61

STK17A. 9263 CACACTCGTGATGTAGTTCAT 29 3.26 O. 43 2. Of

GNE 10O2 OCCCGATCATGTTTGGCATTAA 3 O 331 O.25 2.2O

ERN2 10595 CTGGTTCGGCGGGAAGTTCAA 31, 3.47 147 2.30

HUNK 3O811 CACGGGCAAAGTGCCCTGTAA 32 363 13 O 1.97

SMG1 23 O49 CACCATGGTATTACAGGTTCA 33 3.22 O. 46 2. O5

WNK4 65266CAGCTTGTTGGGCGTTTCCAA 3 4 5 - 58 Of O 4.15

MAGI2 98.63 CAGGCCCAACTTGGGATATCA 35 3. Of O. 63 2. Of

45 Example 3 Stable Transfection to Generate Cell Line Measuring I-Mega nuclease-Induced Mutagenic NHEJ Repair Establishment of Cellular Model Measuring One million of 293H cells were seeded one day prior to Meganuclease-Induced Mutagenesis transfection, 3 ug of SC RAG encoding vector (pCLS2222, 50 SEQID NO: 36) and 2 ug of plasmid measuring SC GS Stable cell lines measuring meganuclease-induced induced mutagenic NHEJ repair (pCLS6883, SEQID NO; 1) mutagenesis at targeted locus were established. The different were co-transfected on cells using 25 ul of lipo?ectamine constructions were introduced at RAG1 locus in a single copy A.E. E.t tO f yout S TE, using c(GPS kit. The cell line measuring SC GS-induced CaWSOTOW19ys 9. aSC1O.: CIS Wee SCCC 1 mutagenesis can be used to screen an siRNA collection cov 55 10 cm petri. One week after seeding 400 ug/ml of G418 ering 19,121 genes to identify new genes regulating (Invitrogen) were added on cells. Neomycin resistant clones mutagenic DSB repair. were transferred in 96 well plate using ClonePix (Genetix) Materials and Method and cultured in presence of 400 ug/ml of G418 (Invitrogen) C R C C18S a VOCS and 50 uM of Gancyclovir (Sigma). Genomic DNA of Neo U. ture O +1, a0 60 mycin and Gancyclovir resistant clones were extracted in Cell line 293H was cultured at 37° C. with 5% CO, in order to perform a PGR specific of RAG1 targeted integration Dulbecco'ss modified Eagle'ss medium (DMEM) Glutamax (cGPS(R) Custom Human Full Kit DD, Cellectis Rioresearch). supplemented with 10%O fetal calf serum, 2 mM L-glutamine, Transient Transfection in 96 Well Plate Format for siRNA 100 UI/ml penicilline, 100 g/ml streptomycine, 0.25 ug/ml Screening amphotericine B (Fongizone). The clones measuring mega- 65 Twenty thousand cells per well were seeded in white 96 nuclease-induced mutagenesis were maintained with 200 well plates one day before transfection. Per well, cells were ug/ml of G418 (Invitrogen). transfected with 200 ng of DNA (SC GS encoding vector US 9,044,492 B2 43 44 pCLS2690, SEQID NO: 3 or GFP encoding vector a) Materials and Methods pCLS0099, SEQID NO: 12) and with or without 33 nM final siRNA Dilation concentration of siRNA using 1.35ul of Polyfect transfection The siRNA collection from QIAGEN was received in 96 reagent (QIAGEN). Seventy two to ninety six hours post well plate format in solution at 10M concentration. On each transfection 50 ul per well of ONEGlo (Promega) were plate columns 1 and 12 were empty allowing controls addi added, cells were incubated in dark for 3 minutes before tion. During dilution process of siRNA, siRNAAS (Qiagen luciferase activity analysis (1 second?well) using PHERAStar #1027280), a negative control, siRNA RAD51 (SEQID NO: luminometer (BMG Labtech). siRNAs targeting the gene 77) and siRNALIG4 (SEQIDNO: 78), two siRNAs targeting WEN (SEQID NO:37), MAPK3 (SEQIDNO:38), FANCD2 proteins involved in recombination process and siRNA Luc2 (SEQID NO: 39), PRKDC (PRKDC 5, 10 (SEQID NO: 79) targeting the expression of the reporter gene CTCGTGTATTACAGAAGGAAA=SEQ ID NO: 75 and used were added at 333 nM final concentration. PRKDC 8, GACCCTGTTGACAGTACTTTA=SEQ ID Fourteen thousand cells per well were seeded in white 96 NO: 76) and LIG4 (SEQID NO: 24) were used to extinct well plates one day before transfection. Per well cells were genes involved in DNA repair or regulation in order to ana 15 co-transfected with 200 ng of DNA poLS2690 (SEQID NO: lyze their potential for mutagenic NHEJ stimulation. More 3) and with 33 nM final concentration of siRNA using 1.35ul over, 8 siRNAs identified with an extrachromosomal assay of Polyfect transfection reagent (QIAGEN) per well. Seventy and targeting CAMK2G (SEQID NO: 16), SMG1 (SEQ ID two hours post transfection 50 ul per well of ONEGlo NO:33), PRKCE (SEQID NO:20), CSNK1D (SEQID NO: (Promega) were added, cells were incubated in dark for 3 13), AK2 (SEQ ID NO: 14), AKT2 (SEQ ID NO: 15), minutes before analysis of luciferase activity (1 second?well) MAPK12 (SEQID NO: 19) and E1F2AK2 (SEQID NO: 21) using PHER AStar luminometer (BMG Labtech). genes were used. All experiments carried out in 96-well plates b) Results: (cell seeding, cell transfection, incubation and luciferase Seventeen runs were performed to screen the entire collec detection) were performed with a Velocity 11 robot (Velocity, tion. For each run the mean luciferase intensity of the all run Palo Alto, Calif.). 25 and of siRNA Luc2 (SEQ ID NO: 79) and their standard b) Results: deviations were calculated. A siRNA hit stimulating A cell line measuring mutagenic NHEJ repair induced by luciferase signal was defined for each run when its luciferase SC GS was created. This cell line contains a single copy of intensity was above the run mean intensity plus 2.5 times the the reporter system integrated at RAG1 locus and was vali run standard deviation. A siRNA hit inhibiting luciferase dated by comparison of Luciferase signal obtained after 30 signal was defined as follows: its luciferase signal is less than transfection with GFP encoding vector pCLS0099, SEQID the mean luciferase activity obtained with siRNALuc2 (SEQ NO: 12 to SC GS encoding vectorpCLS2690, SEQID NO:3 ID NO: 79) plus 0.5 times its standard deviation. On each run (see FIG. 4A). Indeed, transfection with GFP (SEQID NO: SC GS-induced mutagenic NHEJ repair was checked by 12) encoding vector gave similar 60 R.L.U. luciferase signal comparison of induced luciferase signal between co-transfec 35 tion of pCLS2690 (SEQ ID No. 3) with either the siRNA than untreated cells whereas transfection with SC GS encod control AS (Qiagen #1027280) or with the siRNA screened. ing vector (SEQ ID NO: 3) with no sRNA or with siRNA Effect of siRNA was also verified by analyzing the decrease control AS induced a 600 R.L.U. luciferase signal. Moreover of luciferase signal with co-transfection of pCLS2690 (SEQ siRNAs targeting genes involved in classical NHEJ (LIG4) or ID No. 3) with siRNA Luc2 (SEQID NO: 79) in classical NHEJ and other DNA repair pathway (WRN and 40 To compare the screen form run to run, normalization was FANCD2) or in DNA repair regulation (MAPK3) increased applied on each run to get the run mean luciferase signal equal SC GS-induced luciferase signal from 725 up to 1,200 R.L.U to 100 R.L.U. FIG. 12 represents data of all runs after nor (see FIG. 4A). Moreover, 8 siRNAs identified with an extra malization and shows the hits stimulating (with at least a chromosomal assay, targeting CAMK2G (SEQID NO: 16), normalized luciferase activity superior to 183) or hits inhib SMG1 (SEQ ID NO: 33), PRKCE (SEQ ID NO: 20), 45 iting (with at least a normalized luciferase activity inferior to CSNK1 D (SEQ ID NO: 13), AK2 (SEQID NO: 14), AKT2 37.5) SC GS-induced mutagenic NHEJ repair luciferase sig (SEQ ID NO: 15), MARK. 12 (SEQ ID NO: 19) and nal. E1F2AK2 (SEQ ID NO: 21) genes and also two siRNAs As indicated in Table IV below, this screen led to the targeting PRKDC gene (siRNA target sequence PRKDC 5, identification of 481 siRNAs hits that stimulate SC GS-in CTCGTGTATTACAGAAGGAAA=SEQ ID NO: 75 and 50 duced mutagenic NHEJ repair luciferase signal with at least a siRNA target Sequence PRKDC 8, stimulation factor of 1.83. GACCCTGTTGACAGTACTTTA=SEQ ID NO: 76) involved in DNA repair regulation increased SC GS-induced TABL E IV luciferase signal from 6,000 up to 14,000 R.L.U. (see FIG. 4B). This result demonstrates that inhibition of these genes 55 siRNA hits stimulating GS SC-induced stimulate SC GS-induced mutagenic NHEJ repair signal. luciferase signal with at least a fold HTS Screening Measuring SC GS-Induced Non Homolo increase of increase of 1.83 gous End Joining Repair Activity: SEQ Screening of a siRNA collection covering 19,121 genes Gene Gene siRNA ID (Qiagen with two siRNAS targeting each gene) using this cell 60 Targeted ID target sequence NO line will lead, to identification of other siRNAs that could, LCMT2 9836 CAGGCGCGGTACAGAACACCA. 8O modulate SC GS-induced mutagenic NHEJ repair. For that purpose, a high-throughput screening was set up to cotrans SNORD115-10 1OOO33447 GAGAACCTTATATTGTCTGAA 81 fect each siRNA of the collection with pCLS2690 (SEQID WNK4 652 66 CAGCTTGTTGGGCGTTTCCAA 82 NO: 3) in duplicate. This screen led to identification of 481 65 and 486 hits stimulating and inhibiting the luciferase signal HMX2 3.167 CGGGCGCGTACTGTACTGTAA 83 respectively.

US 9,044,492 B2 49 50 TABLE IV - continued TABLE IV - continued siRNA hits stimulating GS SC-induced siRNA hits stimulating GS SC-induced luciferase signal with at least a fold luciferase signal with at least a fold increase of increase of 1.83 5 increase of increase of 1.83

SEQ SEQ Gene Gene siRNA ID Gene Gene siRNA ID Targeted ID target sequence NO Targeted ID target sequence NO

KCNIP2 3O819 CAGCTGCAGGAGCAAACCAAA. 226 10 WPS4A 2.7183. CTCAAAGACCGAGTGACATAA. 262

PPP3CA 553 O TCGGCCTGTATGGGACTGTAA, 227 FAM38B 63.895 CACCTAGTGATTCTAACT CAA. 263

PRPF4 9128 TCCGGTCGTGAAGAAACCACA 228 LRRC24 44.1381 CCACGAGATGTTCGTCATCAA. 264

SYNGAP1 88.31 CAGAGCAGTGGTACCCTGTAA 229 15 PRKCE 5581 CCCGACCATGGTAGTGTTCAA 2O

PTBP1 sf2. GCGCGTGAAGATCCTGTTCAA 23 O SECISBP2 79 0.48 TCCCAGTATCTTTATAACCAA 26

IGFL4 44 4882 CCAGACAGTTGTGAGGTTCAA 231 C6orf1 221491 CAGATGTATAGTATTCAGTAT 266

HES1 328O CACGACACCGGATAAACCAAA. 232 2O SEMA3G 5692O CCCTGCCCTATTGAAACT CAA 267

UNO83O 389084 CACAGACGATGTTCCACAGGA, 233 C16orf3 75O CTGGGACAACGCAGTGTTCAA. 268

CTCGGGAAACGTGGACGACAA 234 ANKRD12 23253 CCGGAGCGGATTAAACCACCA. 269

PLG 534 O AAGTGCGGTGGGAGTACTGTA. 235 TEX28 1527 CAGCGAAGAGAGAATGGCCTA. 27 O 25 IK 35.50 CAGGCGCTTCAAGGAAACCAA. 236 MAPK8IP3 23162 CAGCCGCAACATGGAAGTACA. 271

UBE2D4 51619 CCGAATGACAGTCCTTACCAA 237 FAM9A 171482 AAAGCTCAGTTGGAAGCTCAA. 272

NEK3 4752 CAGAGATATCAAGTCCAAGAA. 238 ACTR8 93.973 TACTACCAACTTAGTCATCAA 273

ATF7 IP2 8 OO 63 TAGGACGACTGAAATAACCAA. 239 30 SNHG1 236.42 CAGCGTTACAGTAATGTTCCA 274

SRRM2 23524 CGCCACCTAAACAGAAATCTA. 24 O SNORD114-10 767588 ATGATGAATACATGTCTGAAA. 275

SRRM2 23524 CTCGATCATCTCCGGAGCTAA. 241 LYPD1 116372 CACGGTGAACGTTCAAGACAT 276

FYCO1 79443 AAGCCACGTCATATAACT CAA. 242 35 SSBP1 6742 AGCCTAAAGATTAGACTGTAA. 277

ABHD9 79852 CAGCTCAGTGCTACTGTGAAT 243 TRIM32 22954 CAGCACTCCAGGAATGTTCAA 278

CGB 1082 ACCAAGGATGGAGATGTTCCA. 244 POLR3H 171568 AACAAACGGCACAGACACCAA. 279

HES5 388585 CACGCAGATGAAGCTGCTGTA. 245 40 SNORA71B 26776 TGCCTTTGCCCTGGTCATTGA 28O

RAB10 10890 AAGGGACAAACTAGTAGGTTT 246 KRT6C 286887 CAAGTCAACGTCTCTGTAGTA. 281

CCCGTTAGTGCTACACTCATT 247 CYP2A7 1549 CCCAAGCTAGGTGGCATTCAT 282

BZW2 2.8969 CAGGAGCGTCTTTCTCAGGAA. 248 45 PTPN22 26.191 TGGGATGTACGTTGTTACCAA 283

FAM1 OOB 283991 CACGTTCTTCCAAGAAACCAA. 249 P2RX5 SO26 CTGATAAAGAAGGGTTACCAA. 284

MGC23985 38.9336 ACCAGACAAGCCAGACGACAA 250 FANCE 21.78 TCGAATCTGGATGATGCTAAA 285

CSAG2 7284 61 CCAGCCGAACGAGGAACT CAA 251 50 PLS3 5358 AACGGATTCATTTGTGACTAT 286

CTCCTTTATCTTCCAAACCAA 252 HNRNPA1 31.78 CAGGGTGATGCCAGGTTCTAT 287

GGA1 26088 CCCGCCATGTGACGACACCAA 253 CD72 971. ATCACCTACGAGAATGTTCAA. 288

C10orf53 2829 66 CTGGAATGTGGTGGAACTCAT 254 55 PRC1 9 O55 TCGAGTGGAGCTGGTTCAGTA. 289

TBL3 106. Of CTGCGTCACGTGGAACACCAA 255 COL5A3 5 OSO9 CCCGGGCATCCAGGTCTGAAA 29 O

EEF1A1 1915 CAGAATAGGAACAAGGTTCTA. 256 ZNF649 65251 AACGCTATGAACACGGAAGAA 291

OTOP3 34.7741 TTGCCAGTACTTCACCCTCTA. 257 60 HCP5 10866 TAGGAGGGAGTCAGTACTGTT 292

BANP 54971 CAGCGACATCCAGGTTCAGTA. 25.8 PXDNL 1379 O2 TACCGACTGAATGCCACCTTA. 293

FOXP2 93986 AAGGCGACATTCAGACAAATA. 259 CDYL2 124359 AATGATCATGTTGGAGAGCAA 294

BAHD1 22893 CAAGAATTACCCACTTCGTAA. 260 C1 forf28 283987 CCCGTGGAAGCCACCGATGAT 295 65 ZNF416 55659 GAGGCCTTTGCCAGAGTTAAA. 261 PPP1R13L 10848 AAGGAGTAAAGTCTAGCAGGA, 296

US 9,044,492 B2 57 58 TABLE IV- continued TABLE IV- continued siRNA hits stimulating GS SC-induced siRNA hits stimulating GS SC-induced luciferase signal with at least a fold luciferase signal with at least a fold increase of increase of 1.83 increase of increase of 1.83

SEQ SEQ Gene Gene siRNA ID Gene Gene siRNA ID Targeted ID target sequence NO Targeted ID target sequence NO

OR1N1 138883 CTGCGTTGTTTGTGTGTTCTA 511. 10 TMEM167 153339 TTCAGAGTCTATTGACTGTAA sis

NPLOC4 556.66 CAGTCGAAATAAGGACACCTA 512 COPZ2 51226 CGGTGTGATTCTGGAGAGTGA 536

CDON SO937 AAGCATGTTATTACAGCAGAA 513 CNOT2 4848 CAGCAGCGTTTCATAGGAAGA 537

FAM4 6C 54.855 CAGACTGATCGCCACCAAGAA, 514 15 NUP21OL 91181 CTGGCTGTCCGGCGTCATCAA 538 OR271 284383 ACCACAGTCCACAGCAGGATA 515 KCTD17 79734 CCCGGGCCTGAGAAGGAAGAA. 539

BCL2T1 598 CTGCTTGGGATAAAGATGCAA 516 WDR54 84 O58 CACGCTAAGGAGGGTGCTGGA. 540

ZFYVE9 93.72 AACTATAGTTGGGATGATCAA 517 TTC31 64427 TGCGATGGCGCCGATTCCAAA. 541

ST6GAL1. 648O AACCCTCAGCTTATGTAGCTA 518 GLTSCR1 29998 CCGCATCGGGCTCAAGCTCAA 542

FABP 6 2172 CACCATCGGAGGCGTGACCTA 519 PDLIM3 27295 CAGGACGGGAACTACTTTGAA. 543

USF2 7392 CCGGGAGTTGCGCCAGACCAA 52O KRTF8 1963f4. CAGCCTGTTCTGCTCGCTCAA 544 25 TSPYL1 729 TAGAACCGGTTGCAAGTTCAA 521 C6 or f31 221481 CAGGTTCACTCCAACTTCCTA As

SLC35E2 99 O6 CCAGCGTCCCTTTGTTGTGAA 522 PTPN23 2593 O CCGCCAGATCCTTACGCTCAA 546

RFXANK 86.25 CTCAGTCTTTGCGGACAAGAA 523 EPC2 26.122 CAGCAGTTAGTTCAGATGCAA 547 30 PLEKHG2 64 857 CAGGTTCAGCCAGACCCTCAA 524 RBP1 5947 TAGGAACTACATCATGGACTT 548

FLJ362O8 283948 CGCAATGTAGTTAGGTGCTCA 525 3. 8-1 352961 TTGGATGTCTTTGGGAACCAT 49

MEGF11 84.465 AAGAATCCGTGTGCAGTTCTA 526 C17orf79 5.5352 TTCCTTATTGACAGTGTTCAA issO 35 MLXIP 22877 CAGGACGATGACATGCTGTAT 527 SYT3 84258 TAGGGCGTAGTTGGTGCTGGA 551

TTLL12 231.70 CACGGTGAGCTGCCCAGTACA 528 SAR1B 51128 CACATTGGTTCCAGGTCTCAA iss2

ACOT4 122970 CAAACAGTCTCTGAACGGTTA 529 FLJ4 O243 13355.8 CACTGCGAAAGTGCTGACAAA 553

WPS33B 26.276 CAGCGTTGGATCAACACTGTA 53 O 40 SNORA27 619.499 CAAACTGGGTGTTTGTCTGTA 554

LRFNS 1455.81 CTCGGTTAGATGTGACATCAA 531 LOC559 O8 559 O8 CTGGGTCTCTATGGCCGCACA iss

CLDN17 26.285 TAGTAAGACCTCCACCAGTTA 532

45 This screen led also to the identification of 486 siRNAs hits SCN1A 63.23 ACGCATCAATCTGGTGTTCAT 33 that inhibit SC GS-induced mutagenic NHEJ repair ZNF521 25925 CAGCGCTTAAATCCAAGACTA 534 luciferase signal to at least a normalized luciferase activity inferior to 37.5 (see Table V below). TABL E V siRNA hits inhibiting GS SC-induced lusiferase signal to at least a normalized luciferase activity interior to 37, 5 Gene Targeted Gene ID siRNA target sequence SEQ ID NO

SNX3 8724 ATCGATGTGAGCAACCCGCAA 556

MYO1E 46.43 CACAGACGAACTCAGCTTTAA sist

MEGF9 1955 CAGGATGCCATCAGTCCTTTA 558

NOC3L 64318 AAGCATGAACGCATTATAGAT 559

HHIPL2 798 O2 CCCGTTCAGACCACTCGCCAA 560

ITIH4 37 OO ATGGATCGAAGTGACCTTCAA 561

2.83417 CTCCGTAATCAATGGAGCATA 562

US 9,044,492 B2 79 TABLE V- continued siRNA hits inhibiting GS SC-induced lusiferase signal to at least a normalized luciferase activity inferior to 37, B Gene Targeted Gene ID siRNA target sequence SEQ ID NO

TCP11L2 255.394 CAAGCTAATCTTATAGGTCAA 938

APOBEC4 4 O3314 TACCATATTCGAACAGGTGAA 939

CPN1 1369 CCGGTGGATGCACTCCTTCAA 94 O

FRAP1 24.75 CCGGAGTGTTAGAATATGCCA 941

PTBP1 sf2 CACGCACATTCCGTTGCCTTA 942

LGR5 8549 CAGCAGTATGGACGACCTTCA 943

ZNF567 163O81 TACCACTTCCGTAGCCTATAA 944

CHMP4C 9242.1 TGGCAGCTTGGGCTACCTAAA 945

NOL9 797 07 ATCCGGGTTCATCC TACATTT 946

KIAAO831 22863 CTCGGTGACCTCCTGGTTTAA 947

STRN 68 O1. CTGGAATACCACTAATCCCAA 948

ZNF576 791.77 CGGGCTGGTGCGACTATACTA 949

RPLPO 6175. CAAGAACACCATGATGCGCAA 950

CMTM3 123920 CTCCATCACGGCCATCGCCAA 951

ARHGEF1 91.38 CACCGATCACAAAGCCTTCTA 952

FOXD4 2298 CAGCGGCATCTGCGCCTTCAT 95.3

Pf6 1964 63 GTGGATGATCGTGGACTACAA 954

FTH1 2495 CGCCATCAACCGCCAGATCAA 955

C12orf53 1965 OO CACAATTACCATCTCCATCAT 956

RPS11 62O5 CCGAGACTATCTGCACTACAT 95.7

RHBDF2 796.51 CACGGCTATTTCCATGAGGAA 958

ALOX15B 247 TTGGACCTTATGGTCACCCAA 959

UNC13D 201294 CTGGTGTACTGCAGCCTTATA 96.O

PLEKHB1 58473 CAGACCGTGGTGGGCCTTCAA 961

PCNXL2 8 OOO3 CCGAAGGATCCTCATCCGCTA 962

DGCR5 2622O TACGTTCTAGCATCCATTCAA 963

FARSA 2193 CCGCTTCAAGCCAGCCTACAA 964

AGPAT1 105.54 ACGCAACGTCGAGAACATGAA 965

C19 of 63 284.361 CAAGACGGTCCTGATGTACAA 966

C18orf51 125704 AGCGCAGCGCGTAAACAACAA 967

TMEM31 2O3562 CACGTAGGACACCTACAACAT 968

TMEM54 113452 CCACTAGGACCCTGCAAGCAA 969

PML 5371 CAGGAGCAGGATAGTGCCTTT 97O

GABRD 2563 CACCTTCATCGTGAACGCCAA 971.

UNO9391 2O3 O74 CACCTCGTTGGTGAACTACAA 972

ITGA9 3 680 ACAGGTCACTGTCTACATCAA 973

PDZD8 11.8987 ACCGATCTCGTAGAACCTTCA 974.

GPX4 2879 GTGGATGAAGATCCAACCCAA 97.

US 9,044,492 B2 83 84 TABLE V- continued siRNA hits inhibiting GS SC-induced lusiferase signal to at least a normalized luciferase activity inferior to 37, B Gene Targeted Gene ID siRNA target sequence SEQ ID NO

HBSIL 1. Of 67 TACGTTACGGTGGTTCTACAA O13

TTC23 649.27 CTCCGGAACTGCCCTACTTTA O14

C3orf 44 131831 CAGCGAAGAGTACCTCTGGAA O15

ZNF271 10778 ACCCATGTAATCAGTGCAATA O16

CDGAP 57514 CTGATCTGGCCTGAGATTCAA O17

SBNO1 552O6 AAGGAGCTAGAATGTGGATAA O18

HIST1H2AE 3O12 CCGCAACGACGAGGAGCTAAA O19

C1orf 41 1668 CCGCTACTTACTTGAGATTCA O2O

TTC16 18248 CTGGTGGACTTCTATGCCTTA O21

LCE1D 3.31.34 TTCCTTCTGATTCTGCCTGAA O22

BPIL3 128859 CCCGGACTTTCTGGCCATGAA O23

SIWA1 1 Osf2 CACGCCGTGCATGGCAGCCTT O24

ARHGEF5 7984 TAGCCGTATGTTAAACAGAAT O25

PRSS8 652 ACCCATCACCTTCTCCCGCTA O26

COL9A1 1297 CACCGACAGATCAGCACATTA O27

PKD1L2 114780 CCGTGTTTGCTGAATGCACAA O28

PHF10 552.74 CGGACAGTTCCAGGAATATTA O29

MKS1 549 O3 ACCGACGAATCTTTACCTACA O3O

ARHGAP27 2O1176 CCGCAGGGTGTTCTTCTACAA O31

CXCL17 28434. O AGCGCCCACTCTTCCAATTAA O32

SRGAP2 23380 CTCGCTAATGTCAGTGCCAGA O33

ACTR6 644.31 GACGACCTTAGTGCTGGATAA O34

MIA 819 O CAGCGTTCAGGGAGATTACTA O35

OR8J1 219477 AGCTATTGTGGTTTCATCTTA O36

FLJ44 635 392490 AAGGCCCTGAGGGCAAAGGTA O37

SLC26A1 10861. CAGCCTCTATACGTCCTTCTT O38

CNN1 1264 AAGATCAATGAGTCAACCCAA O39

C19 orf23 148046 CACGACGTGGCAGACGAGGAA O4 O

TRPM2 7226 CAGGCCTATGTCTGTGAGGAA O41

Example 4 ss cellular model with a single copy of the substrate for NHEJ recombination at the RAG1 locus to measure at a chromo NHEJ GFP Reporter Gene Based Model in HEK293 Somal location the frequency of SC GS induced mutagen Cell Line esis and validate novel effectors increasing NHEJ efficiency. a) Material and Methods in order to validate the siRNAs hits issued from the primary 60 Design and Construction of Vector Monitoring GFP Mega high-throughput screening using the detection of a luciferase nuclease Induced NHEJ Mutagenesis signal, it was also useful to derive a new construct based on a The plasmids pCLS6810 (SEQID NO: 5) and pCLS6663 different reporter gene allowing the establishment of a corre- (SEQ ID NO: 6) were designed to quantify NHEJ repair lation between the efficiency of the NHEJactivity induced by frequency induced by SC GS or I-SceI meganucleases a meganuclease and the effect of the siRNAs hits. After its 65 respectively. These plasmids depicted in FIG. 6 are derived functional validation in a transient transfection assay in 293H from the hsRAG1 Integration Matrix CMV Neo used in cell line, such plasmid may be further used to establish a cGPS(R) Custom Human Full Kit DD of Cellectis Bioresearch. US 9,044,492 B2 85 86 pCLS6810 (SEQID NO: 5) and pCLS6663 (SEQID NO: 6) b) Results contain all the characteristics to obtain by homologous A) Extrachromosomal Validation of the NHEJ GFP Reporter recombination a highly efficient insertion eventofa transgene Vector DNA sequence of interest at the RAG1 natural endogenous In order to test the ability of the vector pCLS6810 (SeqID locus. They are composed of two homology arms of 1.8 kb NO: 5) to achieve efficiently NHEJ mutagenesis of GFP and 1.2 kb separated by i) an expression cassette of neomycin reporter gene induced by SC GS expression plasmid tran resistance gene driven by mammalian CMV promoter and ii) sient transfections in 96 well plate format were set up. FIGS. an expression, cassette for the Substrate of recombination 7A and B present the functional assays corresponding to monitoring NHEJ of GFP reporter gene driven also by CMV cotransfections of 100 ng of pCLS6810 (SEQID NO: 5) with promoter. As for the vectors pCLS6883 (SEQID NO: 1) and 10 150 ng of the SC GS expression vector pCLS2690 (SEQID pCLS6884 (SEQID NO: 2) described in FIG. 2 the sequence NO: 3) or the pCLS0002 (SEQID NO: 41) control plasmid. used to measure meganuclease-induced mutagenesis is made As presented in FIGS. 7A and B, we get a measurable of an ATG start codon followed by i) 2 codons for alanine ii) increase of the percentage of EGFP positive cells with the the tag HA sequence iii) GS or I-Sce recognition sites iv) a 15 pCLS2690 (SEQID NO: 3) expression plasmid in compari glycine serine stretch, V) the same 2 codons for alanine as in son with the transfection performed with the vector control i) and finally vi) a GFP reporter gene lacking its ATG start pCLS0002 (SEQID NO: 41). In fact, we get a percentage of codon. Since by itself GFP reporter gene is inactive due to a EGFP positive cells of 13.3% vs 6.2% with a fold increase frame-shift introduced by GS or I-SceI recognition sites, cre ratio of 2.1 obtained. These data imply that pCLS6810 can be ation of a DNA double strand break (DSB) by SC GS or used to further establish a cellular model allowing testing the I-SceI meganuclease (SEQ ID NO. 4 and SEQ ID NO: 40 potential effect of different siRNAs hits issued from the high respectively) followed by a mutagenic DSB repair event of throughput Lueiferase primary Screening on the modulation NHEJ can lead to restoration of GFP gene expression inframe of the efficiency of the NHEJ repair mechanism induced by a with the ATG start codon. custom meganuclease. Cell Culture 25 B) Functional Validation of the siRNAs Hits on the NHEJ Cell line 293H was cultured at 37° C. with 5% CO, in GFP Reporter Gene Based HEK293 Cell Line Dulbecco's modified Eagle's medium (DMEM) Glutamax The high-throughput screening of the siRNA human supplemented with 10% fetal calf serum, 2 mM L-glutamine, genome wide library has allowed the identification of several 100 UI/ml penicilline, 100 g/ml streptomycine, 0.25 ug/ml hundreds of potential hits (cf Table IV) able to increase amphotericine B (Fongizone). 30 SC GS-induced mutagenic NHEJ repair of a luciferase Cellular Transient Transfection for Functional Validation of reporter gene. To correlate such effect to an improvement of NHEJ GFP Reporter Plasmid the frequency of the NHEJ activity, siRNAs were tested in a One day prior transfection the 293H cell line was seeded in new cellular model described in this example with the read 96 well plate at the density of 15000 cells per well in 100 ul. 35 out of a different reporter gene EGFP. The next day, cells were transfected with Polyfect transfec Material and Methods: tion reagent (Qiagen), Briefly a quantity of total DNA of 200 a) Culture Conditions of the NHEJGFP Reporter Gene Based ng or 250 ng was diluted in 30 ul of water RNAse free. On the HEK293 Cell Model other hand 1.33 ul of Polyfect was resuspended in 20 ul of Same protocol as for the culture of the 293H cell line DMEM without serum. Then the DNA was added to the 40 except that the complete culture medium DMEM Glutamax Polyfect mix and incubated for 20 min. at room temperature. medium with penicilline (100 UI/ml), streptomycine (100 After the incubation period the total transfection mix (50 ul) ug/ml), amphotericine B (Fongizone) (0.25ug/ml), 10% FBS was added over plated, cells. After 96 h of incubation at 37° is supplemented with 0.25 mg/ml of G418 sulfate (Invitro C., cells were trypsinized and the percentage of EGFP posi gen-Life Science). tive cells was monitored by flow cytometry analysis (Guava 45 b) Making of Trex2/SC GS Fusion Protein Instrument) and corrected by the transfection efficiency. The Trex2 protein was fused to the SC GS meganuclease Stable Transfection to Generate 293H Based Cellular Model to its N-terminus using a ten amino acids glycin stretch Measuring Efficiency of Chromosomal Meganuclease-In (GGGGS) (SEQ ID NO: 1042) as linker. Both SC GS and duced Mutagenic NHEJ Repair Trex2 were initially cloned into the AscI/XhoI restriction One day prior to transfection, 293H cells are seeded in 10 50 sites of the pCLS1853 (FIG. 13, SEQID NO: 1043), a deriva cm tissue culture dishes (10° cells per dish) in complete tive of the pcDNA3.1 (Invitrogen), which drives the expres medium. The next day 3 ug of SC RAG encoding vector sion of a gene of interest under the control of the CMV pCLS2222 (SEQID NO:36) and 2 ug of plasmid measuring promoter. The fusion protein construct was obtained by SC GS induced GFP mutagenic NHEJ repair (pCLS6810 amplifying separately the two ORFs using a specific primer 55 and the primer CMVfor (5'-CGCAAATGGGCGGTAG SEQID NO: 5) are co transfected using 25 ul of Lipo GCGT-3'; SEQ ID NO: 1044) or V5reverse (5'-CGTA fectamine 2000 reagent (Invitrogen) during 6 hours according GAATCGAGACCGAGGAGAGG-3'; SEQ ID NO: 1045), to the instructions of the manufacturer. Three days following which are located on the plasmid backbone. Then, after a gel transfection, 2000 cells are seeded and G418 selection was purification of the two PCR fragments, a PGR assembly was added at 400 ug/ml one week after seeding. Neomycin resis 60 performed using the CMVfor/V5reverse oligonucleotides. tant clones were transferred in 96 well plate using Clone Fix The final PCR product was then digested by AscI and XhoI (Genetix) and cultured in presence of 400 ug/ml of G418 and and ligated into the pCLS18S3 digested with these same 50 uMof Gancyclovir (Sigma). Genomic DNA of Neomycin enzymes to generate the pCLS8054 (FIG. 14, SEQ ID NO: and Ganclovir resistant clones is extracted and targeted inte 1046) expression vector encoding the fused protein gration of a single copy of the transgene at the RAG1 locus 65 Trex2 SC GS (SEQID NO: 1049). The following table VI identified by specific PGR amplification. (cGPSR) Custom gives the oligonucleotides that were used to create the con Human Full Kit DD, Cellectis Bioresearch). Struct. US 9,044,492 B2 87 88 TABLE VI assessed using the statistical Student test analysis to eliminate such siRNAs that do not have a robust effect. The ratio of Oligonucleotides used to create the Trex2/SC GS construct EGFP positive cells percentage calculated between a siRNA SEQ SEQ hit and siRNA control AS leads to determine the stimulation Amplified Forward ID Reverse ID 5 factor of each siRNA. Construct ORF primer NO: primer NO: In parallel, using the same functional assay and the statis Trex2FSC GS Trex2 CMVfor 1044 Link1OTrexRev 1048 tical analysis method as described previously, functional vali SC GS Link10GSFor 1047 VSrewerse 1045 dation of the 223 siRNAs was also performed in the context of a cotransfection with an expression vector pCLS8054 (FIG. 10 b) Cellular Transfection in 96 Well Format for Functional 14, SEQID NO: 1046) encoding for the Trex2/SC GS (Seq Validation of the siRNAS Hits ID NO: 1049) protein consisting to N-terminus fusion Same protocol of cotransfection with polyfect as described between the meganuclease SC GS (Seq ID NO: 4) and a 236 in example 3 with 200ng of pCLS2690 DNA (SEQID NO:3) amino acid functional version (SEQ ID NO: 1050) of the or pCLS8054 (SEQID NO: 1046) plasmids and siRNA at a 15 exonuclease Trex2 (SEQID NO: 1051). In fact, human Trex2 final concentration of 33 nM. After, 96 h of incubation at 37° protein (SEQID NO: 1051) was choosen since it’s known to C., cells were trypsinized and the percentage of EGFP posi exhibit a 3' to 5' non processive exonuclease activity (Mazur tive cells was monitored by flow cytometry analysis (Guava and Perrino, 2001) that might be compatible with the degra Instrument) and corrected by the transfection efficiency. dation of the 3' DNA overhangs generated by the meganu Results: clease GS and with an improvement of its NHEJ mutagen The new cell line containing a single copy of the GFP esis in presence or not of siRNAs. In comparison with the reporter system integrated at RAG1 locus was first validated transfection of the NHEJ GFP reporter cell line with SC GS by comparing the frequency of the EGFP positive cells expression vector pCLS2690 (SEQID NO:3) quantification obtained after transfection with the empty vector pCLS0002 of the percentage of EGFP+ cells induced by the fused mega (SEQID NO: 41) to the one obtained with the SC GS encod 25 nuclease Trex2/SC GS encoded by pCLS8054 (SEQID NO: ing vectorpCLS2690 (SEQID NO:3). Typically, transfection 1046) was typically enhanced from 0.5%+A 0.1 to 1.8%+/- with pCLS0002 (SEQ ID NO: 41) gave no EGFP positive 0.7 (data not shown) demonstrating the increased efficiency cells as for untreated cells whereas transfections with SC GS (3.6 fold induction) of the fusion protein Trex2/SC GS to encoding vector (SEQ ID NO: 3) with no siRNA or with obtain mutagenic repair of the reporter gene. siRNA control AS led to detection of 0.5%+/-0.1 of EGFP 30 As indicated in Table VII below, among the 223 hits tested, positive cells (data not shown). This result, implies that, in 115 siRNAs are able to increase the percentage of EGFP comparison with the high-throughput cellular model moni positive cells induced by SC GS (SEQID NO: 4) or Trex2/ toring the effect of the siRNAs hits using the detection of a SC GS (SEQID NO: 1046) expression vectors with at least luciferase signal, this NHEJ GFP new cell line is useful to a stimulation factor of 2. Moreover, a group of 15 siRNAs establish a correlation between a percentage of GFP+ cells 35 corresponding to the ClassI have specifically an effect and a frequency of the NHEJ mutagenesis induced by SC GS detected in the context of a transfection with SC GS mega in presence of different siRNAs. nuclease, whereas another group of 63 siRNAs correspond In this example, the effect of 223 different siRNAs (220 ing to ClassII have an activity detected only in presence of the siRNAs identified with the high-throughput screening (cf Trex2/SC GS fused meganuclease. Finally, the ClassIII con Example 3) and three siRNAs issued from the results of the 40 cerns a group of 37 siRNAs that increase the percentage of extrachromosomal screening (cf. Example 3) and targeting GFP+ cells in the presence of either SC GS or Trex2/SC GS the genes FANCD2 (SEQID NO:39), AKT2 (SEQID NO: meganucleases. 15) and LIG4 (SEQID NO: 24) were monitored using the Altogether, Such data confirm the pertinence of the poten same siRNAS as those used, during the primary Screening. tial hits identified with the cellular model based on detection They were chosen based on the high luciferase signal stimu 45 of luciferase signal confirming the robustness of the method lation obtained. Co-transfections with SC GS encoding vec ology applied to determine the cellular genes able to increase tor (SEQID NO:3) were performed in 96w format at least in the efficiency of double-strand break-induced mutagenesis triplicates and the potential effect of siRNAs hits was by a meganuclease. TABLE VII Validation of siRNAs hits stimulating SC GS or Trex2/SC GS-induced EGFP activity with at least a 2 fold increase Gene SEO ID Effect with Effect with Class of Targeted Gene ID siRNA target sequence NO SC GS Trex2/SC GS Hits

AFG3L1P 172 CGGCTGGAAGTCGTGAACAAA. 14 O (-) (+) I

AKT2 208 CAAGCGTGGTGAATACATCAA 15 (-) (+) I

BRCA1 672 CTGCAGATAGTTCTACCAGTA 45 (+) (+) III

C16orf3 750 CTGGGACAACGCAGTGTTCAA 268 (+) (+) III

CAMK2G 818 GAGGAAGAGATCTATACCCTA 16 (+) (+) III

CAW3 859 TTGCGTTCACTTGTACTGTAA 126 (-) (+) I

CSH1 1442 ACGGGCTGCTCTACTGCTTCA 177 (+) (+) III US 9,044,492 B2 89 90 TABLE VII - continued Validation of siRNAs hits stimulating SC GS or Trex2/SC GS-induced EGFP activity with at least a 2 fold increase Gene SEO ID Effect with Effect with Class of Targeted Gene ID siRNA target sequence NO SC GS Trex2/SC GS Hits

CYP2A7 1549 CCCAAGCTAGGTGGCATTCAT 282 (+)

DDX3X 1654 AACGAGAGAGTTGGCAGTACA 184 (+)

DPP4 1803 ATCGGGAAGTGGCGTGTTCAA 163 (+) I

DUSP1 1843 CACGAACAGTGCGCTGAGCTA 106 (+) I

EEF1A1 1915 CAGAATAGGAACAAGGTTCTA 256 (+)

FANCE 21.78 TCGAATCTGGATGATGCTAAA 285 (+)

GNG4 2786 CCGAAGTCAACTTGACTGTAA 185 (+)

SFN 281 O CCGGGAGAAGGTGGAGACTGA 162 (-)

GRIN2C 2.905 CCCAGCTTTCACTATCGGCAA 167 (-)

GTF2I 2969 TAGGTGGTCGTGTGATGGTAA 141 (+)

HIST1H2AE 3 O12 ATCCCGAGTCCCAGAAACCAA (+) I

HMX2 31.67 CGGGCGCGTACTGTACTGTAA 83 (+)

HES1 328O CACGACACCGGATAAACCAAA. 232 (+)

IK 355 O CAGGCGCTTCAAGGAAACCAA 236 (+)

KCNC3 37.48 CAGCGGCAAGATCGTGATCAA 18O (+)

LRP5 CTGGACGGACTCAGAGACCAA 399 (+)

LTBR TACATCTACAATGGACCAGTA 328 (+)

NEK3 4752 CAGAGATATCAAGTCCAAGAA 238 (+) I

NMBR 4829 CCCGCGGACAGTAAACTTGCA 338 (+) I

PLG 534 O AAGTGCGGTGGGAGTACTGTA 235 (+)

PPP3CA 553 O TCGGCCTGTATGGGACTGTAA 227 (+)

PRKCE 5581 CCCGACCATGGTAGTGTTCAA (+)

PTPRA 5786 CCGGAGAATGGCAGACGACAA 315 (+) I

SSBP1 6742 AGCCTAAAGATTAGACTGTAA 277 (+)

TAF6 6878 CTGGGAGTGTCCAGAAGTACA 108 (+)

TALDO1 6888 CCGGGCCGAGTATCCACAGAA 111 (+) I

TXNRD1 7296 CCGACTCAGAGTAGTAGCTCA 153 (+)

WLDLR 7436 CAAGATCGTAGGATAGTACTA 368 (+)

SYNGAP1 88.31 CAGAGCAGTGGTACCCTGTAA 229 (+)

SART1 CAGCATCGAGGAGACTAACAA 222 (+) I

PRPF4 9128 TCCGGTCGTGAAGAAACCACA 228 (+) I

ZRANB2 CACGATCTTCATCACGCTCAT 335 (-)

H2AFY 9555 CAAGTTTGTGATCCACTGTAA 113 (+)

EIF4A3 9775 CCGCATCTTGGTGAAACGTGA 109 (+)

EIF4A3 9775 AAAGAGCAGATTTACGATGTA 353 (+)

LCMT2 9836 CAGGCGCGGTACAGAACACCA. (-)

EDIL3 CCCAAGTTTGTCGAAGACATT 178 (+) US 9,044,492 B2 91 92 TABLE VII - continued Validation of siRNAs hits stimulating SC GS or Trex2/SC GS-induced EGFP activity with at least a 2 fold increase Gene SEO ID Effect with Effect with Class of Targeted Gene ID siRNA target sequence NO SC GS Trex2/SC GS Hits

WAW3 10451 CACGACTTTCTCGAACACCTA 85 (-)

CAP1 104.87 AAGCCTGGCCCTTATGTGAAA 385 (-)

CAP1 104.87 CAACACGACATTGCAAATCAA 367 (+) I

CPLX2 10814 CAGATAGGTAGCAGAGACCAA 17s (+)

SLC6A14 11254 ACCAATAGTAACT CACTGTAA 137 (+)

RBPJL 11317 CTCAAAGGTCTCCCTCTTCAA 309 (+)

TRIM32 22954 CAGCACTCCAGGAATGTTCAA 278 (+)

SMG1 23 O49 CACCATGGTATTACAGGTTCA 33 (+)

MAPK8IP3 23162 CAGCCGCAACATGGAAGTCAC 271 (+)

FAM168A. 232O1 CAGCATTCCCTCTGCTATCTA 127 (+)

ANKRD12 23253 CCGGAGCGGATTAAACCACCA. 269 (+)

ISCU 234.79 CTCCAGCATGTGGTGACGTAA 128 (+)

PTPN22 26.191 TGGGATGTACGTTGTTACCAA 283 (+)

CYFIP2 26.999 CGCCCACGTCATGGAGGTGTA 122 (+)

KCNIP2 CAGCTGCAGGAGCAAACCAAA. 226 (+)

UBE2D4 51619 CCGAATGACAGTCCTTACCAA 237 (+) I

CLIC6 54102 CCGAATCTAATTCCGCAGGAA (+)

PAF1 54 623 CTCCACTGAGTTCAACCGTTA 98 (+)

BANP 54971 CAGCGACATCCAGGTTCAGTA 258 (+)

SETD5 55.209 AACGCGCTTGAACAACACCTA 221 (-)

OGFOD1 55239 TCGGACGCTGTTACGGAAGAA 196 (+) I

LARP6 5.5323 ATGGTGTCTTGTAGGACCAAA 151 (+)

IL17RB 5554. O CCGCTTGTTGAAGGCCACCAA 223 (+)

TMEM13 O 55769 TCCGTCAACAGTAGTTCCTTA 103 (+)

ST6GALNAC1 CCCACGACGCAGAGAAACCAA 225 (+) I

C12orf 62 56.245 CAACCTGATGTGCAACTGTAA 195 (+)

SEMA3G 5692 O CCCTGCCCTATTGAAACT CAA 267 (+)

INTS12 st 117 CAGGACCTAGTGGAAGTACTA. 97 (+)

ZFYWE28 sff.2 CAAGCCTGAAACAGACGACAA (+) I

SLC25A19 60386 CTCCCTGTGATCAGTTACCAA 299 (-)

PROK2 60 675 TCGCTCTGGAGTAGAAACCAA 115 (+) I

CHP2 63928 CAGGGCGACAATAAACTGTAT 138 (+)

PRDM14 63978 ACCGGCCTCACAAGTGTTCTA 34 O (-)

CARD9 64 17 O CAGCGACAACACCGACACTGA 149 (+) I

FAM59A 64762 AAGGGCAGATTTAGCACCCGA 189 (-)

BCL11B 64 919 CAGAGGTGGGTTAAACTGTAA 193 (+)

KCTD15 AACCTTGGAGATTCACGGCAA 187 (+) US 9,044,492 B2 93 94 TABLE VII - continued Validation of siRNAs hits stimulating SC GS or Trex2/SC GS-induced EGFP activity with at least a 2 fold increase Gene SEO ID Effect with Effect with Class of Targeted Gene ID siRNA target sequence NO SC GS Trex2/SC GS Hits

SECISBP2 79 0.48 TCCCAGTATCTTTATAACCAA 265 II

SAP3 OL 79685 CAAGAGCGTAAGGCAC CTATA 186

EPHX3. 79852 CAGCTCAGTGCTACTCTGAAT 243

PANK2 80025 CTGTGTGTGAACTTACTGTAA 331

ATF7 IP2 TAGGACGACTGAAATAACCAA 239

LRRC8C 84230 TACCTTATACTGGCTGTTCTA 81

CGB 94 O27 ACCAAGGATGGAGATGTTCCA 244

NUP35 1294 O1 CAGGACTTGGATCAACACCTT 35

LIPJ 14291. O AGGGTTGTTGTATACTTGCAA 98

CSAG2 1526.67 CTCCTTTATCTTCCAAACCAA 252

159296 CAGGTACAAGTGCAAGAGACA 46

FAM179A 1651.86 CAACGTCTCTATAGAGACCAA 88

221 656 ATCGATGCGGTATGAAACCAA 74

TMEM13 O 22.2865 CCCGCTGGTGCTTACTGGCAA

GKS 256356 TACCATCTTGTACGAGCAATA 416

KLHL34 25724 O CTCGGCAGTCGTGGAAACCAA 21

C10orf53 2829 66 CTGGAATGTGGTGGAACTCAT 254

FAM1 OOB 283991 CACGTTCTTCCAAGAAACCAA 249

CXorf59 2864 64 CTGTGAGTTCCTGTACACCTA 11 O

KRTAP13-3 33796 O CAGGACT CACATGCTCTGCAA 143

ADAMTSL5 33.9366 ATGCCTAACCAGGCACTGTAA 118

OTOP3 34.7741 TTGCCAGTACTTCACCCTCTA 257

PEAR1 375 O33 CTGCACGCTGCTCATGTGAAA 215

TMEM179 388 O21 CGGGCCGGCCATGGCGCTCAA 168

389 O84 CACAGACGATGTTCCACAGGA 233

38.933 6. ACCAGACAAGCCAGACGACAA 25 O

SAMD 5 389432 CTGCTCATAGGAGTTCAGTAA 114

TRIM61 391712 TAGGGTATGTATATGTTCCTA 139

RAM10 4 O1123 CCCGTTAGTGCTACACTCATT 247

4 O2415 CACCCATAATGTAGTAGACTA 2O4.

4 41024 CAGCGGTATATTAGTTCAGTT 89

SPRN 503542 CAGGAACATTCCCAAGCAGGA 96

CSAG2 728461 CCAGCCGAACGAGGAACT CAA 251

SNORD114-17 7675.95 ATGAATGATATGTGTCTGAAA 104 US 9,044,492 B2 95 96 (+) indicates detection of at least a 2 fold increase of the or Trex2/SC GS are also able to increase the frequency of the percentage of GFP+ cells NHEJ mutagenic repair of the reporter gene. For that purpose, (-) indicates absence of detection of at least a 2 fold increase the cell line described in this example was co-transfected of the percentage of GFP+ cells either with the plasmid pCLS2690 expressing SC GS (SEQ siRNAs ClassI: effect detected with meganuclease SC GS ID NO:3) and the siRNAs control AS and those targeting the siRNAs ClassII: effect detected with meganuclease Trex2/ genes CAP1 (SEQID NO:367), TALDO1 (SEQID NO: 111) SC GS and DUSP1 (SEQID NO: 106) or with the expressing vector siRNAs ClassIII: effect detected with meganuclease SC GS pCLS8054 encoding Trex2/SC GS (SEQID NO: 1046) and and Trex2/SC GS the siRNAS control AS and those targeting the genes C) Effect of the siRNAs on the NHEJ Repair Mutagenesis 10 Induced by the SC GS and Trex2/SC GS Meganucleases TALDO1 (SEQID NO: 111), DUSP1 (SEQID NO: 106) and In order to correlate the increase of the EGFP+ cells PTPN22 (SEQID NO: 283). Quantification of the percentage induced by SC GS or Trex2/SC GS in presence of siRNAs of GFP+ cells was determined by flow cytometry 4 days post hits identified precedently (cf Table VII) with an increase of transfection and frequency of mutagenesis determined by the frequency of the NHEJ repair activity of the reporter gene, 15 deep sequencing analysis. deep sequencing analysis was performed to quantify the fre As shown in table VIII the percentages of 0.96% and 8.86% quency of mutagenesis occurring at the site of the meganu of GFP+ cells induced by SC GS or Trex2/SC GS respec clease after its cleavage. tively in presence of the siRNA control AS were increased Material and Methods: with the different siRNAs tested. In the case of the transfec Transfection in the Cellular Model NHEJ EGFP Monitoring tion with SC GS, percentage of GFP+ cells was stimulated to Meganuclease-Induced Mutagenesis 1.47%, 1.85% and 1.45% with the siRNAs targeting respec One million of cells of the NHEJ GFP model were seeded tively the genes CAP1 (SEQID NO:367), TALDO1 (SEQID one day prior transfection. Cells were cotransfected with NO: 111) and DUSP1 (SEQ ID NO: 106) corresponding to either 3 ug of plasmid encoding SC GS (pCLS2690, SEQID stimulations factors of the GFP+ cells of 1.53, 1.93 and 1.51. NO:3) or Trex2/SC GS (pCLS8054, SEQID NO: 1046) in 5 Besides, comparatively to the co-transfection with Trex2/ ug of total DNA by complementation with an empty vector 25 SC GS and the siRNA control AS, we also observed an pCLS0003 (SEQID NO: 1052) in presence or not of siRNAs increase of the percentage of GFP+ cells to 18.06%, 15.07% at final concentrations of 5 nM, 10 nM or 20 nM depending on and 16.04% with the siRNAs targeting respectively the genes the siRNA used and 25 ul of lipofectamine (Invitrogen) TALDO1 (SEQID NO: 111), DUSP1 (SEQID NO: 106) and according to the manufacturers instructions. PTPN22 (SEQID NO: 283) leading to stimulations factors of 30 the GFP+ cells of 2.04, 1.70 and 1.82. Three to four days following transfection, cells were har This phenotypic stimulation of GFP+ cells was also con vested for flow cytometry analysis using Guava instrumenta firmed at a molecular level (cf Table VIII). In fact, SC GS tion and for genomic DNA extraction. Locus specific PCR (SEQID NO: 4) led to 4.7% of targeted mutagenesis whereas around the GS target site was performed using the following co-transfection of SC GS expressing plasmid with the siR primers: 5'-CCATCTCATCCCTGCGTGTCTCCGACT NAs CAP1 (SEQID NO:367), TALDO1 (SEQID NO: 111) CAGNNNNNNNNNNGCTCTCTCTG 35 and DUSP1 (SEQID NO: 106) stimulate this mutagenic DSB GCTAACTAGAGAACCC-3' (SEQ ID NO: 1053) (contain repair to 5.9%, 8.8% and 6.2% respectively. A similar result ing a forward adaptor sequence and a transgenic locus was obtained with the transfection with Trex2/SC GS (SEQ specific forward sequence, separated by a sequence of ten ID NO: 1049). In this case, the frequency of mutagenesis of variable nucleotides needed for PCR product identification) 19.3% with the siRNA control AS was increased repectively and 5'-CCTATCCCCTGTGTGCCTTGGCAGTCT 40 to 32.6%, 30%, and 37% with the siRNAs TALDO1 (SEQID CAGTCGATCAGCACGGGCACGATGCC-3' (SEQ ID NO:111), DUSP1 (SEQID NO: 106) and PTPN22 (SEQID NO: 1054) (containing a reverse adaptor sequence and a NO: 283). transgenic locus specific reverse sequence). PCR products Altogether these data and the result presented in FIG. 15 were sequenced by a 454 sequencing system (454 Life Sci demonstrate that after transfection of this NHEJGFP cell line 45 by SC GS or Trex2/SC GS meganuclease expressing plas ences). Approximately 10,000 sequences were obtained per mids with siRNAs hits, the percentage of GFP positive cells is PCR product and then analyzed for the presence of site increased and directly correlated to the mutagenic NHEJ specific insertion or deletion events. repair frequency at the meganuclease targeted site implying Results that siRNAS hits may be useful to improve targeted mutagen This example is focused on testing if siRNAs hits known to esis at different chromosomal locus cleaved by distinct cus stimulate the percentage of EGFP+ cells induced by SC GS tom meganucleases. TABLE VIII Deep sequencing analysis of the effect of siRNAs hits on NHEJ repair mutagenesis induced by the SC GS and Trex2/SC GS neganucleases. Meganuclease siRNA Seq ID % age GFP+ Stimulation factor of % age of NHEJ Stimulation factor of NHEJ used tested NO cells GFP+ cells mutagenesis mutagenesis Ctrl (pCLSO002) O.O1 O.O1 O.OO O.OO SC GS Ctrl AS 0.96 1.OO 4.70 1.00 (pCLS2690) CAP1 367 1.47 1.53 5.86 1.25 TALDO1 111 1.85 1.93 8.77 1.87 DUSP1 106 1.45 1.51 6.19 1.32 Trex2FSC GS Ctrl AS 8.86 1.OO 1926 1.00 (pCLS8054) TALDO1 111 18.06 2.04 32.60 1.69 DUSP1 106 15.07 1.70 30.00 1.56 PTPN22 283 16.14 1.82 37.01 1.92 US 9,044,492 B2 97 98 Example 5 2 ug of empty vector (pCLS0002, SEQ ID: 41) and 1 nM of siRNA targeting XRCC6 (SEQID NO:44), BRCA1 (SEQID Stimulation of Meganuclease-Induced Mutagenesis NO: 45), ATR (SEQID NO:46), FANCD2 (SEQID NO:39), at an Endogenous Locus Using siRNAS Targeting WRN (SEQID NO:37), MAPK3 (SEQID NO:38) or AS (a Specific Genes siRNA control with no known human targets). Those genes were chosen because of their implication in classical NHEJ In order to verify that a define siRNA could stimulate (XRCC6) or in NHEJ and other DNA repair pathways mutagenic DSB repair at an endogenous locus, siRNAS tar (BRCA1, FANCD2, WRN) or in DNA repair pathway (ATR) geting genes involved in DSB repair or siRNAs identified or in DNA repair regulation (MAPK3). Genomic DNA was during the screenings with the two cellular models were co 10 transfected in 293H cells with meganuclease SC RAG (SEQ extracted 2-3 days after transfection and was used to perform ID NO: 11 encoded by pCLS2222, SEQ ID NO: 36), or a PCR with primer allowing 454 sequencing technology. SCTrex2/SC RAG (SEQ ID NO: 1056 encoded by Sequences obtained per PCR were analyzed to determine the pCLS9573, FIG. 16 and SEQID NO: 1055) plasmid encod frequency and the nature of mutagenic DSB repair (insertion ing for the meganuclease SC RAG fused at its N terminus to 15 and or deletion) at RAG1 locus. a single chain version of Trex2 exonuclease. Mutagenic DSB Mutagenic DSB repair at RAG1 locus in presence of repair was monitored at molecular level by Deep Sequencing. siRNAAS appeared in 0.66%+/-0.13 of events analyzed. Materials and Methods When siRNA (SEQID NO:46) targeting ATR gene (Gene ID Cellular Transfection of 293HCell Line and PCRAnalysis of N545) was added, percentage of NHEJ was in the same Mutagenic DSB Repair range as with siRNA AS: 0.81%. The presence of siRNAs 293H cell line was plated at a density of 1x10 cells per 10 XRCC6 (SEQ ID NO. 44), BRCA1 (SEQ ID NO: 45), cm dish in complete medium (DMEM supplemented with 2 FANCD2 (SEQ ID NO:39), WRN (SEQ ID NO: 37) or mM L-glutamine, penicillin (100 IU/ml), streptomycin (100 MAPK3 (SEQ ID NO: 38) enhanced the percentage of mg/ml), amphotericin B (Fongizone: 0.25 mg/ml. Invitrogen mutagenic NHEJ repair up to 1.13%, 1.88%, 2.06% 2.15% Life Science) and 10% FBS). The next day, cells were trans 25 and 1.6%, respectively corresponding to stimulations factors fected in the presence of 25ul of lipofectamine reagent (Invit of 1.7.2.8, 3.1, 3.2, 2.4 (Table IX). Moreover the nature of the rogen) according to the manufacturer's protocol. Typically deletions was also modified for all those stimulating siRNAs cells were co-transfected with 2 ug of empty vector since they all presented larger deletion events (superior to 100 pCLS0002 (SEQ ID NO: 41), and 3 g of meganuclease bp) than the deletion observed with the other siRNAs (the expression vectors pCLS2222 (SEQ ID NO: 36) or 30 pCLS9573 (SEQ ID NO: 1055) in presence of siRNAs at a control AS and the siRNAATR cf. FIG. 5). Altogether these final concentration of 1 nM, 7.5 nM, 10 nM or 20 nM depend results demonstrate that siRNAS targeting genes involved in ing on the siRNA used. After 48 h to 72 h of incubation at 37° DNA repair mechanism or regulation can be used to increase C., cells were harvested for genomic DNA extraction with the and modulate the efficiency and the nature of mutagenic Blood and Cell culture DNA midikit (QIAGEN) according to 35 NHEJ repair induced by I-CreI meganuclease with a modified the manufacturer's protocol. PCR amplification reactions specificity and at a natural locus (cf. Table IX below). were performed using primers to obtain a fragment of RAG1 locus flanked by specific adaptor sequences. The forward TABL E IX primer contains the following sequence: 5'-CCATCTCATC siRNA stimulating endonuclease-induced mutagenesis CCTGCGTGTCTCCGACT 40 at RAG1 locus. CAGNNNNNNNNNNGGCAAAGATGAAT CAAAGATTCTGTCCT-3' (SEQID NO: 1057) (containing SEQ NHEJ a forward adaptor sequence and a RAG1 locus specific Gene Gene ID Stimulation sequence, separated by a sequence of four to ten variable targeted ID siRNA target sequence NO: factor nucleotides needed for PCR product identification) and the 45 XRCC6 2547 ACCGAGGGCGATGAAGAAGCA 44 1, 7 reverse primer contains the following sequence, 5'-CCTATC BRCA1 672 ACCATACAGCTTCATAAATAA 45 2, 8 CCCTGTGTGCCTTGGCAGTCTCAG GATCTCACCCGGAACAGCTTAAATTTC-3' (SEQ ID FANCD2 2177 AAGCAGCTCTCTAGCACCGAT 39 3, 1 NO: 1058) (containing a reverse adaptor sequence and a WRN 7486 CGGATTGTATACGTAACTCCA 37 3, 2 RAG1 locus specific sequence). The sequence of four to ten 50 variable nucleotides is a fixed sequence of different lengths MAPK3 5595 CCCGTCTAATATATAAATATA 38 2, 4 ranging from four to ten nucleotides (depending on the pro tocol of the manufacturer) included in the primer used to perform the PCR amplification and that allows to identify Moreover, the siRNAs CAP1 (SEQ ID NO: 367), VAV3 each sequence issued from a deep sequencing reaction with a 55 (SEQ ID NO: 85), PTPN22 (SEQID NO: 283), MTHFD2L mix of different PCR fragments corresponding to different (SEQID NO: 89), TALDO1 (SEQID NO: 111) and DUSP1 experimental conditions. PCR products were sequenced by a (SEQ ID NO: 106) identified with the screenings of the two 454 sequencing system (454 Life Sciences). Approximately cellular-models and belonging to three different classes of 10,000 exploitable sequences were obtained per PCR pool siRNAs defined in Table VII were also tested for their capac and then analyzed for the presence of site-specific insertion or 60 ity to increase the frequency of mutagenic repair at the endog deletion events. enous locus RAG. As previously described in this example, a) Results: 293H cell line was cotransfected with the expression plas siRNAs targeting genes known to be involved in DSB mids pCLS 2222 (Seq ID NO:36) or pCLS9573 (SEQ ID repair mechanism or regulation were tested to estimate their NO: 1055) encoding for the meganucleases SC RAG (SEQ potential for increasing mutagenic DSB repair by NHE.J. 65 ID NO: 11) and SCTrex2/SC RAG (SEQ ID NO: 1056) in 293H cell lines were co-transfected with 3 lug of SC RAG presence of the siRNA control AS or the different siRNAs meganuclease encoding vector (pCLS2222, SEQID NO:36), tested. Frequency of mutagenesis at RAG locus was analyzed US 9,044,492 B2 99 100 by deep sequencing to monitor the efficiency of each siRNA Blunt, T., N.J. Finnie, et al. (1995). “Defective DMA-depen to increase the mutagenic repair induced by each type of dent activity is linked to V(D)J recombina meganuclease. tion and DNA repair defects associated with the murine As shown in Table X below, in agreement with their scid mutation.” Cell 80(5): 813–23. belonging to different classes defined in Table VII die two 5 siRNAs CAP1 (SEQID NO: 367) and VAV3 (SEQ ID NO: Boch, J., H. Scholze, et al. (2009). "Breaking the code of 85) are able to increase the frequency of mutagenesis of DNA binding specificity of TAL-type III effectors. Sci SC RAG meganuclease with respectively stimulation factors ence 328(5959): 1509-12. of 1.43, 1.25 while the siRNA PTPN22 (SEQ ID NO: 283) Bolduc, J. M., P. C. Spiegel, et al. (2003). “Structural and enhances the NHEJ mutagenic repair of the SCTrex2/ biochemical analyses of DNA and RNA binding by a SC RAG meganuclease with a 1.39 fold increase. Moreover, 10 bifunctional homing endonuclease and group I intron the three siRNAs MTHFD2L (SEQ ID NO: 89), TALDO1 splicing factor.” Genes Dev 17(23): 2875-88. (SEQID NO: 111) and DUSP1 (SEQID NO: 106), known to Britt, A. B. (1999). “Molecular genetics of DNA repair in have an effect with SC GS or Trex2/SC/GS are also able to higher plants.” Trends Plant Sci 4(1): 20-25. increase the targeted mutagenesis induced by the meganu Burden and O. N. (1998). “Mechanism of action of eukary cleases SC RAG (SEQ ID NO: 11 encoded by pCLS2222, 15 otic topoisomerase II and drugs targeted to the enzyme.” SEQID NO:36) or SCTrex2/SC RAG (SEQID NO: 1056 encoded by pCLS9573, SEQID NO: 1055) with stimulations Biochim Biophy's Acta. 1400(1-3): 139-154. factors of respectively 1.22, 1.23 and 1.43 Capecchi, M. R. (1989). “The new mouse genetics: altering Altogether, these data imply that siRNAS targeting genes the genome by gene targeting.” Trends Genet 5(3): 70-6. involved in double strand break repair or other cellular pro Cathomen, T. and J. K. Joung (2008). “Zinc-finger nucleases: cess can be useful effectors to enhance the efficiency of NHEJ the next generation emerges.” Mol Ther 16(7): 1200-7. mutagenesis at natural endogenous locus targeted with dis Chames, P. J. C. Epinat, et al. (2005), “In vivo selection of tinct custom meganucleases fused or not to the Trex2 exonu engineered homing endonucleases using double-strand clease. break induced homologous recombination. Nucleic Acids 25 Res 33(20): e178. TABLE X Chevalier, B., M. Turmel, et al. (2003). “Flexible DNA target site recognition by divergent homing endonuclease isos Sec. Stimulation factor of NHE mutagenesis chizomers I-CreI and I-MsoIJ Mol Biol 329(2): 253-69. ID SC RAG SCTrex SC RAG Chevalier, B.S., T. Kortemme, et al. (2002). “Design, activity, siRNA Class siRNA tested NO (Seq ID NO: 11) (Seq ID NO: 1056) 30 and structure of a highly specific artificial endonuclease.” Ctrl AS 1.OO 1.00 Mol Cell 10(4): 895-905. I CAP1 367 1.43 ND Chevalier, B.S., R.J. Monnat, Jr., et al. (2001). “The homing I WAV3 85 1.25 ND endonuclease I-Creuses three metals, one of which is II PTPN22 283 ND 1.39 shared between the two active sites.” Nat Struct Biol 8(4): III MTHFD2L. 89 122 ND 35 31 2-6. III TALDO1 111 ND 1.23 Chevalier, B. S. and B. L. Stoddard (2001). “Homing endo III DUSP1 106 ND 1.43 nucleases: structural and functional insight into the cata Effect of siRNAs hits on NHEJ repair mutagenesis induced by the SC RAG and SCTrex2. lysts of intron/intein mobility.” Nucleic Acids Res 29(18): SC RAG meganucleases; ND: non determined. 3757-74. List of Cited References 40 Choulika, A., A. Perrin, et al. (1995). “Induction of homolo Arimondo, P. B., C.J. Thomas, et al. (2008), “Exploring the gous recombination in mammalian by using cellular activity of camptothecin-triple-helix-forming oli the I-SceI system of Saccharomyces cerevisiae.' Mol Cell gonucleotide conjugates. Mol Cell Biol 28(1): 324-33. Biol 15(4): 1968-73. Arnould, S. P. Chames, et al. (2006). “Engineering of large Christian, M., T. Cermak, et al. (2010). “Targeting DNA numbers of highly specific horning endonucleases that 45 double-strand breaks with TAL effector nucleases.” induce recombination on novel DNA targets.”J Mol Biol Genetic 186(2): 757-61. 355(3): 443-58, Cohen-Tannoudji, M. S. Robine, et al. (1998), “I-Scel-in Arnould, S., C. Perez, et al. (2007. “Engineered I-CreI duced gene replacement at a natural locus in embryonic derivatives cleaving sequences from the human XPC gene stem cells.” Mol Cell Biol 18(3): 1444-8. can induce highly efficient gene correction in mammalian 50 Critchlow, S. E. and S. P. Jackson (1998). “DNA end-joining: cells.” J Mol Biol 371(1): 49-65. from yeast to man. Trends Biochem Sci 23(10): 394-8. Ashworth, J. J. J. Havranek, et al. (2006). “Computational Donoho, G. M. Jasin, et al. (1998). “Analysis of gene target redesign of endonuclease DNA binding and cleavage ing and intrachromosomal homologous recombination specificity.” Nature 441 (7093): 656-9. stimulated by genomic double-strand breaks in mouse Audebert, M., B. Salles, et al. (2004). “involvement of poly 55 embryonic stem cells. Mol Cell Biol 18(7): 4070-8. (ADP-ribose) polymerase-1 and XRCC1/DNA ligase III in Doyon, J. B. V. Pattanayak, et al. (2006), “Directed evolution an alternative route for DNA double-strand breaks rejoin and Substrate specificity profile of homing endonuclease ing.” J Biol Chem 279(53): 551.17-26. I-SceI.”JAm ChemSoc. 128(7): 2477-84. Bau, D.T.Y. C. Mau, et al. (2006). “The role of BRCA1 in Doyon, Y., J. M. McCammon, et al. (2008). “Heritable tar non-homologous end-joining. Cancer Lett 240(1): 1-8. 60 geted gene disruption in Zebrafish using designed zinc Baumann, P. and S. C. West (1998). “DNA end-joining cata finger nucleases.” Nat Biotechnol 28(6): 702-8. lyzed by human cell-free extracts.” Proc Natl AcadSci USA Dujon, B., L. Colleaux, et al. (1988). “Mitochondrial introns 95(24): 14066-70. as mobile genetic elements: the role of intron-encoded Beumer, K. J. J. K. Trautman, et al. (2008). “Efficient gene proteins.” Basic Life Sci 40: 5-27. targeting in Drosophila by direct embryo injection with 65 Eisenschmidt, K., T. Lanio, et al. (2005). “Developing a pro zinc-finger nucleases.” Proc Natl AcadSci USA 105(50): grammed restriction endonuclease for highly specific 19821-6. DMA cleavage.” Nucleic Acids Res 33(22): 7039-47. US 9,044,492 B2 101 102 Elbashir, S. M., J. Harborth, et al. (2001). “Duplexes of Lloyd, A. C. L. Plaisier, et al. (2005), “Targeted mutagenesis 21-nucleotide RNAs mediate RNA interference in cultured using zinc-finger nucleases in Arabidopsis.' Proc Natl mammalian cells.” Nature 411 (6836): 494-8. AcadSci USA 102(6): 2232-7. Endo, M. K. Osakabe, et al. (2005). “Molecular characteri Mahaney, B. L., K. Meek, et al. (2009). “Repair of ionizing sation of true and ectopic gene targeting events at the radiation-induced DNA double-strand breaks by non-ho acetolactate synthase gene in Arabidopsis.' Plant Cell mologous end-joining.” Biochem J 417(3): 639-50. Physiol 47(3): 372-9. Mazur, D.J., and Perrino, F. W. (2001). “Excision of 3' ter Endo, M., K. Osakabe, et al. (2007). “Molecular breeding of mini by the Trex 1 and TREX2 3'->5' exonucleases. Char a novel herbicide-tolerant rice by gene targeting. Plant J acterization of the recombinant proteins' J Biol Chem. 52(1): 157-66, 10 276(20): 17022-9 Epinat, J. C. S. Arnould, et al. (2003), “A novel engineered McCaffrey, A. P. L. Meuse, et al. (2002). “RNA interference meganuclease induces homologous recombination in yeast in adult mice.” Nature 418(6893): 38-9. and mammaliancells. Nucleic Acids Res 31(11): 2952-62. Meister, G. and T. Tuschl (2004). “Mechanisms of gene Essers, J., H. van Steeg, et al. (2000). “Homologous and 15 silencing by double-stranded RNA.” Nature 431 (7006): non-homologous recombination differentially affect DMA 343-9. damage repair in mice.” Embo J 19(7): 1703-10, Meng. X., M. B. Noyes, et al. (2008). “Targeted gene inacti Feldmann, E. V. Schmiemann, et al. (2000), “DMA double Vation in Zebrafish using engineered zinc-finger strand break repair in cell-free extracts from Ku80-defi nucleases.” Nat Biotechnol 26(6): 695-701. cient cells: implications for Ku Serving as an alignment Moore, I., M. Samalova, et al. (2006). “Transactivated and factor in non-homologous DNA end joining. Nucleic chemically inducible gene expression in plants. Plant J Acids Res 28(13): 2585-96. 45(4): 651-83. Gimble, F. S., C. M. Moure, et al. (2003). “Assessing the Moscou, M.J. and A. J. Bogdanove (2009). A simple cipher plasticity of DNA target site recognition of the PI-SceI governs DNA recognition by TAL effectors.” Science 326 horning endonuclease using a bacterial two-hybrid selec 25 (5959): 1501. tion system.”J Mol Biol 334(5): 993-1008, Moure, C.M., F. S. Gimble, etal. (2002). “Crystal structure of Gouble, A. J. Smith, et al. (2006). “Efficient in toto targeted the intein homing endonuclease PI-SceI bound to its rec recombination in mouse liver by meganuclease-induced ognition sequence.” Nat Struct Biol 9(10): 764-70. double-strand break.' J Gene Med 8(5): 616-22. Moure, C. M., F. S. Gimble, et al. (2003). “The crystal struc Green, E. L. and T. H. Roderick (1966). “Radiation Genetics.” 30 ture of the gene targeting homing endonuclease I-Sce In Biology of the Green, E. L., eds. reveals the origins of its target site specificity.”J Mol Biol ((McGraw-Hill, New York)): 165-185. 334(4): 685-95. Grizot, S., J. Smith, et al. (2009). “Efficient targeting of a Nagy, Z. and E. Soutoglou (2009). "DNA repair: easy to SCID gene by an engineered single-chain homing endonu visualize, difficult to elucidate.” Trends Cell Biol 19(11): clease.” Nucleic Acid Res 37(18): 5405-19, 35 617-29. Haber, J. E. (1995). “In vivo biochemistry: physical monitor Needleman, S. B. and C. D. Wunsch (1970). “A general ing of recombination induced by site-specific endonu method applicable to the search for similarities in the cleases.” Bioessays 17(7): 809-20. amino acid sequence of two proteins. J Mol Biol 48(3): Hanin, M., S. Volrath, et al. (2001), “Gene targeting in Ara 443-53. bidopsis," Plant J28(8): 671-7, 40 Nouspikel, T. (2009). “DNA repair in mammalian cells: Hannon, G. J. (2002). “RNA interference.” Nature Nucleotide excision repair: variations on versatility. Cell 418(6894): 244-51. Mol Life Sci 66(6): 994-1009. Harborth, J., S. M. Elbashir, et al. (2001). “Identification of Pace, P. G. Mosedale, et al. (2010). “Ku70 corrupts DNA essential genes in cultured mammalian cells using Small repair in the absence of the Fanconi anemia pathway.” interfering RNAs.”J Cell Sci 114(Pt 24): 4557-65. 45 Science 329(5988): 219-23, Hutvagner, G., J. McLachlan, et al. (2001). A cellular func Padidam, M. (2003). “Chemically regulated gene expression tion for the RNA-interference enzyme Dicer in the matu in plants.” Curr Opin Plant Biol. 6(2): 169-77. ration of the let-7 Small temporal RNA. Science Paques, F. and P. Duchateau (2007). “Meganucleases and 293(5531): 834-8, DNA double-strand break-induced recombination: per Ichiyanagi, K.Y. Ishino, et al. (2000). “Crystal structure of an 50 spectives for gene therapy. Curr Gene Ther 7(1): 49-66. archaeal intein-encoded homing endonuclease PI-Pful.”J Pingoud, A. and G. H. Silva (2007). “Precision genome sur Mol Biol 300(4): 889-901. gery.” Nat Biotechnol 25(7):743-4. Kalish, J. M. and P. M. Glazer (2005), “Targeted genome Porteus, M. H. and D. Carroll (2005). "Gene targeting using modification via triple helix formation.” Ann NYAcad Sci Zinc finger nucleases.” Nat Biotechnol 23(8): 967-73. 1058: 151-61. 55 Postal, G. V. Kolisnychenko, et al. (1999). “Markerless gene Kim.Y. G., J. Cha, et al. (1996). “Hybrid restriction enzymes: replacement in Escherichia coli stimulated by a double Zinc finger fusions to FokI cleavage domain.” Proc Natl strand break in the chromosome. Nucleic Acids Res AcadSci USA 3(3): 1156-60. 27(22): 4409-15. Kirik, A., S. Salomon, et al. (2000). “Species-specific double Potenza, C. L. Aleman, et al. (2004). “Targeting transgene strand break repair and genome evolution in plants. Embo 60 expression in research, agricultural, and environmental J 19(20): 5562-6. applications: Promoters used in plant transformation. In Lee. Y., K. Jeon, et al. (2002). “MicroRNA maturation: step vitro cellular & developmental biology plant 40(1): 1-22. wise processing and Subcellular localization. Embo J Povirk, L. F. (1996). “DNA damage and mutagenesis by 21 (17): 4663-70. radiomimetic DNA-cleaving agents: bleomycin, neocarzi Li, T. S. Huang, et al. (2010). “TAL nucleases (TALNs): 65 nostatin and other enediynes. Mutat Res 355(1-2): 71-89. hybrid proteins composed of TAL effectors and FokI Puchta, H., B. Dujon, et al. (1996). “Two different but related DNA-cleavage domain.” Nucleic Acids Res 39(1): 359-72, mechanisms are used in plants for the repair of genomic US 9,044,492 B2 103 104 double-strand breaks by homologous recombination.” Stoddard, B. L. (2005). “Homing endonuclease structure and Proc Natl AcadSci USA 93(10): 5055-60. function.” O Rev Biophy's 38(1): 49-95. Rosen, L. E. H. A. Morrison, et al. (2006). “Homing endo Sussman, D., M. Chadsey, et al. (2004). “Isolation and char nuclease I-CreI derivatives with novel DNA target speci acterization of new homing endonuclease specificities at ficities.” Nucleic Acids Res. Rothstein, R. (1991). “Targeting, disruption, replacement, individual target site positions. J Mol Biol 342(1): 31-41. and allele rescue: integrative DNA transformation in Taccioii, G. E. G. Rathbun, et al. (1993), “Impairment of yeast.” Methods Enzymol 194: 281-301. V(D)J recombination in double-strand break repair Rouet, P. F. Smith, et al. (1994). “Expression of a site-specific mutants.” Science 280(5105): 207-10, endonuclease stimulates homologous recombination in Takata, M., M. S. Sasaki, et al. (1998), “Homologous recom mammalian cells.” Proc Natl AcadSci USA 91 (13): 6064 10 bination and non-homologous end-joining pathways of 8 DNA double-strand break repair have overlapping roles in Rouet, P. F. Smih, et al. (1994). “Introduction of double the maintenance of chromosomal integrity in Vertebrate strand breaks into the genome of mouse cells by expression cells.” Embo J 17(18): 5497-508. of a rare-cutting endonuclease.” Mol Cell Biol 14(12): 8096-106. 15 Teicher, B. A. (2008). “Next generation topoisomerase I Sargent, R. G., M. A. Brenneman, et al. (1997). “Repair of inhibitors: Rationale and biomarker strategies.” Biochem site-specific double-strand breaks in a mammalian chro Pharmacol 75(6): 1262-71. mosome by homologous and illegitimate recombination.” Terada, R.Y. Johzuka-Hisatomi, et al. (2007). “Gene target Mol Cell Biol 17(1): 267-77. ing by homologous recombination as a biotechnological Sarkaria, J. N. R. S. Tibbetts, et al. (1998). “inhibition of tool for rice functional genomics.” Plant Physiol 144(2): phosphoinositide 3-kinase related kinases by the radiosen 846-56. sitizing agent wortmannin.” Cancer Res 58(19): 4375-82. Terada, R., H. Urawa, et al. (2002). “Efficient gene targeting Seligman, L. M., K. M. Stephens, et al. (1997). “Genetic by homologous recombination in rice.” Nat Biotechnol analysis of the Chlamydomonas reinhardtti I-CreI mobile 20(10): 1030-4. intron homing system in Escherichia coli.' Genetics 147 25 Wang, H. B. Rosidi, et al. (2005). “DNA ligase III as a (4): 1653-64. candidate component of backup pathways of nonhomolo Shrivastav, M., L. P. De Haro, et ah (2008). “Regulation of gous end joining.” Cancer Res 65(10): 4020-30, DNA double-strand break repair pathway choice. Cell Res Wang, M., W. Wu, etal (2006), “PARP-1 and Ku compete for 18(1): 134-47. repair of DNA double strand breaks by distinct NHEJ Siebert, R. and H. Puchta (2002). “Efficient Repair of 30 pathways.” Nucleic Acids Res 34(21): 6170-82. Genomic Double-Strand Breaks by Homologous Recom Wang, R. X. Zhou, et al. (2003). “Chemically regulated bination between Directly Repeated Sequences in the Plant expression systems and their applications in transgenic Genome.” Plant Cell 14(5): 1121-31. plants.” Transgenic Res 12(5): 529-40. Silva, G. H., J. Z. Dalgaard, et al. (1999). “Crystal structure of Williams, R. S., J. S. Williams, et al. (2007). “Mre11-Rad50 the thermostable archaeal intron-encoded endonuclease 35 Nbs1 is a keystone complex connecting DNA repair I-DmoIJ Mol Biol 286(4): 1123-36. machinery, double-strand break signaling, and the chroma Simon, P. F. Cannata, et al. (2008). “Sequence-specific DNA tin template.” Biochem Cell Biol 85(4): 509-20. cleavage mediated by bipyridine polyamide conjugates.” Yi, R.Y. Qin, et al. (2003), “Exportin-5 mediates the nuclear Nucleic Acids Res 38(11): 3531-8. export of pre-microRNAs and short hairpin RNAs. Genes Smith, J., S. Grizot, et al. (2006). “A combinatorial approach 40 Dev 17(24): 301 1-6. to create artificial homing endonucleases cleaving chosen Zeng, Y., X. Cai, et al. (2005). “Use of RNA polymerase II to sequences.” Nucleic Acids Res 34(22): e149, transcribe artificial microRNAs.” Methods Enzymol 392: Spiegel, P. C., B. Chevalier, et al. (2006). “The structure of 371-80. I-CeuI homing endonuclease: Evolving asymmetric DNA Zuo, J. and N. H. Chua (2000). “Chemical-inducible systems recognition from a symmetric protein scaffold. Structure 45 for regulated expression of plant genes. Curr Opin Bio 14(5): 869-80. technol 11(2): 146-51.

SEQUENCE LISTING

<16 Os NUMBER OF SEO ID NOS: 1058

<21 Os SEQ ID NO 1 &211s LENGTH: 124 67 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: pCLS6883

<4 OOs SEQUENCE: 1 atggcagcct at CCttatga cqttic ctgat tacgctggat ttacctgcCC Cagggtgaga 60 aagtic caagg aggct Cogga t c cqgcggitt Ctggat.ccgg cqgttctggit toccagc.cg 12O aagatgccaa aaacattaag aagggcc.ca.g. c9cc attct a cc cact cqaa gacgggaccg 18O

ccggcgagca gctgcacaaa gC catgaagc gctacgc.cct ggtgc.ccggc accatcgc.ct 24 O

ttaccgacgc acat atcgag gtgga catta cctacgc.cga gtactt.cgag atgagcgttc 3 OO

US 9,044,492 B2 131 132 - Continued aagcggittag ctic ctitcggit cotcc.gatcg ttgtcagaag taagttggcc gcagtgttat 558 O cact catggit tatggcagca citgcataatt citct tactgt catgc catcc gtaagatgct 564 O tittctgtgac tggtgagtac tdaac Caagt cattctgaga at agtgt atg cggcgaccga st OO gttgct cittg cc.cggcgt.ca atacgggata at acco.cgcc acatagcaga actittaaaag 576. O tgct catcat tggaaaacgt tott.cggggc gaaaact citc aaggat citta cc.gctgttga 582O gatccagttc gatgta accc act cqtgcac c caactgatc ttcagcatct tt tact t t ca. 588 O c cagcgtttic tgggtgagca aaaac aggaa ggcaaaatgc cgcaaaaaag ggaataaggg 594 O cgacacggaa atgttgaata ct catact ct t cottt t t ca. at attattga agcatttatc 6 OOO agggittattg t ct catgagc ggata catat ttgaatgitat ttagaaaaat aaacaaatag gggttcc.gcg cacattt CCC caaaagtgc cacctgacgt. 6101

SEQ ID NO 4 LENGTH: 354 TYPE : PRT ORGANISM: artificial sequence FEATURE: OTHER INFORMATION: SC-GS

SEQUENCE: 4

Met Ala Asn. Thir Lys Tyr Asn Glu Glu Phe Lieu. Lell Tyr Luell Ala Gly 1. 5 1O 15

Phe Wall Asp Ala Asp Gly Ser Ile Ile Ala Glin Ile Pro Arg Glin 2O 25

Ser Arg Lys Phe Llys His Glu Lieu Ser Lieu. Thir Phe Asp Wall Thr Gin 35 4 O 45

Thir Glin Arg Arg Trp Phe Leu Asp Llys Lieu. Wall Asp Glu Ile Gly SO 55 6 O

Wall Gly Tyr Val Tyr Asp Ser Gly Ser Wal Ser Glin Luell Ser 65 70 7s 8O

Glu Ile Pro Leu. His ASn Phe Lieu. Thir Glin Lell Glin Pro Phe Lieu. 85 90 95

Glu Luell Gln Lys Glin Ala Asn Lieu Wall Lieu Ile Ile Glu Glin 1OO 105 11 O

Lell Pro Ser Ala Lys Glu Ser Pro Ala Lys Phe Lell Glu Wall Cys Thr 115 12 O 125

Trp Wall Asp Glin Ile Ala Ala Lieu. Asn Asp Ser Lys Thir Arg Lys Thr 13 O 135 14 O

Thir Ser Glu Thir Wall Arg Ala Val Lieu. Asp Ser Lell Ser Glu 145 150 155 160

Ser Ser Pro Ala Ala Gly Gly Ser Asp Llys Asn Glin Ala Lieu 1.65 17O 17s

Ser ASn Glin Ala Lieu. Ser Llys Tyr Asn Glin Ala Luell Ser Gly 18O 185 19 O

Gly Gly Gly Ser Asn Llys Llys Phe Lieu. Lieu. Tyr Lell Ala Gly Phe Wall 195 2OO

Asp Gly Asp Gly Ser Ile Ile Ala Glin Ile Llys Pro Arg Glin Gly Tyr 21 O 215 22O

Lys Phe His Glin Luell Ser Luell Thir Phe Glin Wall Thir Glin Lys Thr 225 23 O 235 24 O

Glin Arg Arg Trp Phe Lieu. Asp Llys Lieu Val Asp Arg Ile Gly Val Gly 245 250 255

Wall Ala Asp Arg Gly Ser Val Ser Asp Tyr Arg Lell Ser Glu Ile

US 9,044,492 B2 153 154 - Continued ccttctat cq c ctitcttgac gagttcttct gattaattaa Caggactgac cgtgctacga 1116 O gattitcgatt ccaccgcc.gc cittctatogaa aggttgggct tcggaatcgt. titt Cogggac 1122 O gcc.ggctgga tgatcc to ca gcgcggggat Ct catgctgg agttctt cqc CCaCCCCaC 1128O ttgtttattg cagcttataa tggittacaaa taaagcaata gcatcacaaa titt Cacalaat 1134 O aaag cattitt titt cactgca ttctagttgt ggtttgtc.ca aacticat Caa tgitat ctitat 114 OO catgtctgta taccgt.cgac ctictagotag agcttggcgt. a.a. 11442

<210s, SEQ ID NO 7 &211s LENGTH: 163 212. TYPE : PRT &213s ORGANISM: Chlamydomonas reinhardtii

<4 OO > SEQUENCE: 7

Met Asn. Thir Lys Tyr Asn Lys Glu Phe Lieu. Lieu Tyr Lell Ala Gly Phe 1. 1O 15

Wall Asp Gly Asp Gly Ser Ile Ile Ala Glin Ile Pro Asn Glin Ser 25

Phe Llys His Glin Lieu. Ser Lieu. Thir Phe Glin Wall Thir Gln Lys 35 4 O 45

Thir Glin Arg Arg Trp Phe Lieu. Asp Llys Lieu Val Asp Glu Ile Gly Val SO 55 6 O

Gly Tyr Wall Arg Asp Arg Gly Ser Val Ser Asp Ile Luell Ser Glu 65 70 7s 8O

Ile Pro Lieu. His ASn Phe Lieu Thr Glin Lieu. Gln Pro Phe Lieu Lys 85 90 95

Lell Glin Lys Glin Ala Asn Lieu Val Lieu Lys Ile Ile Glu Gln Lieu. 105 11 O

Pro Ser Ala Lys Glu Ser Pro Asp Llys Phe Lieu. Glu Wall Thir Trp 115 12 O 125

Wall Asp Glin Ile Ala Ala Lieu. Asn Asp Ser Lys Thir Arg Th Thr 13 O 135 14 O

Ser Glu Thir Val Arg Ala Val Lieu. Asp Ser Lieu. Ser Glu 145 150 155 160

Ser Ser Pro

SEQ ID NO 8 LENGTH: 22 TYPE: DNA ORGANISM: Artificial FEATURE: OTHER INFORMATION: C1221 target <4 OOs, SEQUENCE: 8 caaaacgt.cg tacgacgttt to 22

SEO ID NO 9 LENGTH: 24 TYPE: DNA ORGANISM: Artificial FEATURE: OTHER INFORMATION: SC-GS t arget

SEQUENCE: 9

Ctgc.cccagg gtgagaaagt c caa 24

<210s, SEQ ID NO 10 US 9,044,492 B2 155 156 - Continued

&211s LENGTH: 22 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: SC-RAG target <4 OOs, SEQUENCE: 10 tgttct cagg tacct cagcc ag 22

<210s, SEQ ID NO 11 &211s LENGTH: 354 212. TYPE: PRT <213> ORGANISM: artificial sequence 22 Os. FEATURE: 223 OTHER INFORMATION: SC-RAG

<4 OOs, SEQUENCE: 11 Met Ala Asn. Thir Lys Tyr Asn. Glu Glu Phe Lieu. Lieu. Tyr Lieu Ala Gly 1. 5 1O 15 Phe Val Asp Gly Asp Gly Ser Ile Ile Ala Glin Ile ASn Pro Asn Glin 2O 25 3O Ser Ser Llys Phe Lys His Arg Lieu. Arg Lieu. Thr Phe Tyr Val Thr Glin 35 4 O 45 Llys Thr Glin Arg Arg Trp Phe Lieu. Asp Llys Lieu Val Asp Glu Ile Gly SO 55 6 O Val Gly Tyr Val Arg Asp Ser Gly Ser Val Ser Glin Tyr Val Lieu. Ser 65 70 7s 8O Glu Ile Llys Pro Leu. His Asn Phe Lieu. Thr Gln Leu Gln Pro Phe Leu 85 90 95 Glu Lieu Lys Glin Lys Glin Ala Asn Lieu Val Lieu Lys Ile Ile Glu Glin 1OO 105 11 O Lieu Pro Ser Ala Lys Glu Ser Pro Asp Llys Phe Lieu. Glu Val Cys Thr 115 12 O 125 Trp Val Asp Glin Ile Ala Ala Lieu. Asn Asp Ser Llys Thr Arg Llys Thr 13 O 135 14 O Thir Ser Glu Thr Val Arg Ala Val Lieu. Asp Ser Lieu. Ser Gly Lys Llys 145 150 155 160 Llys Ser Ser Pro Ala Ala Gly Gly Ser Asp Llys Tyr Asn Glin Ala Lieu. 1.65 17O 17s Ser Lys Tyr Asn. Glin Ala Lieu. Ser Lys Tyr Asn. Glin Ala Lieu. Ser Gly 18O 185 19 O Gly Gly Gly Ser Asn Llys Llys Phe Lieu. Lieu. Tyr Lieu Ala Gly Phe Val 195 2OO 2O5 Asp Ser Asp Gly Ser Ile Ile Ala Glin Ile Llys Pro Arg Glin Ser Asn 21 O 215 22O Llys Phe Llys His Gln Leu Ser Lieu. Thr Phe Ala Val Thr Glin Llys Thr 225 23 O 235 24 O Glin Arg Arg Trp Phe Lieu. Asp Llys Lieu Val Asp Arg Ile Gly Val Gly 245 250 255

Tyr Val Tyr Asp Ser Gly Ser Val Ser Asp Tyr Arg Lieu. Ser Glu Ile 26 O 265 27 O

Llys Pro Lieu. His Asn. Phe Lieu. Thr Glin Lieu Gln Pro Phe Lieu Lys Lieu. 27s 28O 285

Lys Glin Lys Glin Ala Asn Lieu Val Lieu Lys Ile Ile Glu Gln Lieu Pro 29 O 295 3 OO

Ser Ala Lys Glu Ser Pro Asp Llys Phe Lieu. Glu Val Cys Thir Trp Val 3. OS 310 315 32O

US 9,044,492 B2 163 164 - Continued

&211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene AK2 <4 OOs, SEQUENCE: 14 cggcagaacc cagtatic ct a 21

<210s, SEQ ID NO 15 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene AKT2 <4 OOs, SEQUENCE: 15

Caag.cgtggit gaatacatca a 21

<210s, SEQ ID NO 16 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene CAMK2G

<4 OOs, SEQUENCE: 16 gaggaagaga tictataccct a 21

<210s, SEQ ID NO 17 <211 LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene GK2

<4 OOs, SEQUENCE: 17 tacgittagaa gag cactgta a 21

<210s, SEQ ID NO 18 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene PFKFB4

<4 OOs, SEQUENCE: 18

Cagaaagtgt Ctggacttgt a 21

<210s, SEQ ID NO 19 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene MAPK12

<4 OOs, SEQUENCE: 19 ctggacgt at t cactic ct ga t 21

<210s, SEQ ID NO 2 O &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene PRKCE

<4 OOs, SEQUENCE: 2O US 9,044,492 B2 165 166 - Continued cc.cgac catg gtagtgttca a 21

<210s, SEQ ID NO 21 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene EIF2AK2

<4 OOs, SEQUENCE: 21 cggaaagact tacgittatta a 21

<210s, SEQ ID NO 22 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene WEE1

<4 OOs, SEQUENCE: 22 Cagggtag at tacct cqqat a 21

<210s, SEQ ID NO 23 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene CDK5R1

<4 OOs, SEQUENCE: 23 ccggaaggcc acgctgtttg a 21

<210s, SEQ ID NO 24 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene LIG4

<4 OOs, SEQUENCE: 24 caccgttt at ttggacticgt a 21

<210s, SEQ ID NO 25 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene AKAP1 <4 OOs, SEQUENCE: 25 agcgctgaac ttgattggga a 21

<210s, SEQ ID NO 26 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene MAP3K6

<4 OOs, SEQUENCE: 26 t cagaggagc tigagtaatga a 21

<210s, SEQ ID NO 27 &211s LENGTH: 21 &212s. TYPE: DNA US 9,044,492 B2 167 168 - Continued

<213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene DYRK3

<4 OOs, SEQUENCE: 27 tcgacagtac gtggcc ctaa a 21

<210s, SEQ ID NO 28 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene RPS6KA4

<4 OOs, SEQUENCE: 28 cgccacct tc atggcattca a 21

<210s, SEQ ID NO 29 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene STK17A <4 OOs, SEQUENCE: 29 cacact cqtg atgtagttca t 21

<210s, SEQ ID NO 3 O &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM; Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene GNE <4 OOs, SEQUENCE: 30 cc.cgat catgtttggcatta a 21

<210s, SEQ ID NO 31 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene ERN2 <4 OOs, SEQUENCE: 31

Ctggttcggc gggaagttca a 21

<210s, SEQ ID NO 32 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene HUNK <4 OOs, SEQUENCE: 32 cacgggcaaa gtgcc.ctgta a 21

<210s, SEQ ID NO 33 &211s LENGTH: 21 &212s. TYPE: DNA <213> ORGANISM: Artificial 22 Os. FEATURE: <223> OTHER INFORMATION: Target sequence in the gene SMG1

<4 OOs, SEQUENCE: 33 caccatggta ttacaggttc a 21