US008252917B2

(12) United States Patent (10) Patent No.: US 8,252,917 B2 Mermod et al. (45) Date of Patent: Aug. 28, 2012

(54) HIGH EFFICIENCY GENE TRANSFER AND EXPRESSION IN MAMMALIAN CELLS BY A MULTIPLE TRANSFECTION PROCEDURE OF MAR SEQUENCES

(75) Inventors: Nicolas Mermod, Buchillon (CH); Pierre Alain Girod, Lausanne (CH); Philipp Bucher, Lausanne (CH); Duc-Quang Nguyen, Saint Prex (CH); David Calabrese, Lausanne (CH); Damien Saugy, Lausanne (CH); Stefania Puttini, Lausanne (CH)

(73) Assignee: Selexis S.A., Plan-les-Ouates (CH)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 569 days.

(21) Appl. No.: 10/595,495

(22) PCT Filed: Oct. 22, 2004

(86) PCT No.: PCT/EP2004/011974
§ 371 (c)(1), (2), (4) Date: Apr. 24, 2006

(87) PCT Pub. No.: WO2005/040377
PCT Pub. Date: May 6, 2005

(65) Prior Publication Data
US 2007/0178469 A1 Aug. 2, 2007

Related U.S. Application Data

(60) Provisional application No. 60/513,574, filed on Oct. 24, 2003.

(30) Foreign Application Priority Data
Feb. 6, 2004 (EP) ...... 04002722

(51) Int. Cl.
C07H 21/04 (2006.01)
C12N 15/63 (2006.01)
C12N 15/85 (2006.01)
C12N 15/09 (2006.01)

(52) U.S. Cl. ...... 536/24.1; 435/320.1; 435/325; 435/455

(58) Field of Classification Search ...... None
See application file for complete search history.

Primary Examiner — Celine Qian

(74) Attorney, Agent, or Firm — Joyce Von Natzmer, Agris & von Natzmer LLP

(57) ABSTRACT

The present invention relates to purified and isolated DNA sequences having protein production increasing activity and more specifically to the use of matrix attachment regions (MARs) for increasing protein production activity in a eukaryotic cell. Also disclosed is a method for the identification of said active regions, in particular MAR nucleotide sequences, and the use of these characterized active MAR sequences in a new multiple transfection method.

46 Claims, 15 Drawing Sheets


FIG. 1

[Density histograms, four panels: highest bend (degrees); highest major groove depth (Angstrom); highest minor groove width (Angstrom); lowest melting temperature.]

[Scatterplot panels. Axes: bend (degrees); major groove depth (Angstrom); minor groove width (Angstrom); melting temperature (Celsius).]

FIG. 3

[Density plots, four panels: highest bend (degrees); highest major groove depth (Angstrom); highest minor groove width (Angstrom); lowest melting temperature (Celsius).]

FIG. 4

[Bar chart. Vertical axis: number of hits (0 to 4,000,000). Series: chromosome 22 permutation in 10 bp windows; native chromosome 22; scrambled chromosome 22; rubbled chromosome 22; chromosome 22 generated by Markov chains.]

FIG. 5

[Chart. Axis label, rotated in the source: high expressor number, fold increase.]

FIG. 6

FIG. 7

[Diagram of constructs with site labels NFAT, NMP4 and MEF2. Construct labels as extracted: pF (4x), pFI (8x), pFIA (16x), pFIB (16x). Horizontal axis: high expressor number, fold increase (0 to 5).]

FIG. 8

[Figure, panel (A): sequence profiles of melting temperature (°C), bend (degrees), major groove depth (Å) and minor groove width (Å). Panel (B): map with site labels NMP4 and MEF2.]

[Figure with labels SATB1 and NMP4.]

FIG. 12

[Plasmid maps: pGEGFP control and pPAG01 SV40 EGFP (7285 bp). Annotated features: upstream polyA signal, SV40 promoter, SV40 late polyA signal. Restriction sites (EcoRI, HindIII, ClaI and others) are marked; most positions are not legible in the extraction.]

FIG.14


FIG. 15

[Panels A and B.]

FIG. 16

[Time course. Horizontal axis: time (days). Series as extracted: 168 MAR, + dox at day 0; 168 MAR, + dox at day 21; no MAR, + dox at day 0.]

FIG. 17

[Plot: TA dinucleotide vs bent DNA.]

FIG. 18

[Plot: dinucleotide vs bent DNA; the dinucleotide label is only partly legible.]

FIG. 19

FIG. 20

[Flow diagram starting from input DNA sequences and passing through SMAR Scan.]

[FIG. 20, output box: predicted S/MARs.]

HIGH EFFICIENCY GENE TRANSFER AND EXPRESSION IN MAMMALIAN CELLS BY A MULTIPLE TRANSFECTION PROCEDURE OF MAR SEQUENCES

This is the U.S. national stage of International application PCT/EP2004/011974, filed Oct. 22, 2004, designating the United States and claiming the benefit of U.S. provisional application 60/513,574, filed Oct. 24, 2003, and priority to European application EP04002722.9, filed Feb. 6, 2004.

FIELD OF THE INVENTION

The present invention relates to purified and isolated DNA sequences having protein production increasing activity and more specifically to the use of matrix attachment regions (MARs) for increasing protein production activity in a eukaryotic cell. Also disclosed is a method for the identification of said active regions, in particular MAR nucleotide sequences, and the use of these characterized active MAR sequences in a new multiple transfection method.

BACKGROUND OF THE INVENTION

Nowadays, the model of loop domain organization of eukaryotic chromosomes is well accepted (Boulikas T, "Nature of DNA sequences at the attachment regions of genes to the nuclear matrix", J. Cell Biochem., 52:14-22, 1993). According to this model chromatin is organized in loops that span 50-100 kb attached to the nuclear matrix, a proteinaceous network made up of RNPs and other nonhistone proteins (Bode J, Stengert-Iber M, Kay V, Schlake T and Dietz-Pfeilstetter A, Crit. Rev. Euk. Gene Exp., 6:115-138, 1996).

The DNA regions attached to the nuclear matrix are termed SAR or MAR, for scaffold (during metaphase) or matrix (interphase) attachment regions respectively (Hart C and Laemmli U (1998), "Facilitation of chromatin dynamics by SARs", Curr Opin Genet Dev 8, 519-525).

As such, these regions may define boundaries of independent chromatin domains, such that only the encompassing cis-regulatory elements control the expression of the genes within the domain.

However, their ability to fully shield a chromosomal locus from nearby chromatin elements, and thus confer position-independent gene expression, has not been seen in stably transfected cells (Poljak L, Seum C, Mattioni T and Laemmli U (1994), "SARs stimulate but do not confer position independent gene expression", Nucleic Acids Res 22, 4386-4394). On the other hand, MAR (or S/MAR) sequences have been shown to interact with enhancers to increase local chromatin accessibility (Jenuwein T, Forrester W, Fernandez-Herrero L, Laible G, Dull M, and Grosschedl R (1997), "Extension of chromatin accessibility by nuclear matrix attachment regions", Nature 385, 269-272). Specifically, MAR elements can enhance expression of heterologous genes in cell culture lines (Kalos M and Fournier R (1995), "Position-independent transgene expression mediated by boundary elements from the apolipoprotein B chromatin domain", Mol Cell Biol 15, 198-207), transgenic mice (Castilla J, Pintado B, Sola I, Sanchez-Morgado J, and Enjuanes L (1998), "Engineering passive immunity in transgenic mice secreting virus-neutralizing antibodies in milk", Nat Biotechnol 16, 349-354) and plants (Allen G, Hall GJ, Michalowski S, Newman W, Spiker S, Weissinger A, and Thompson W (1996), "High-level transgene expression in plant cells: effects of a strong scaffold attachment region from tobacco", Plant Cell 8, 899-913). The utility of MAR sequences for developing improved vectors for gene therapy is also recognized (Agarwal M, Austin T, Morel F, Chen J, Bohnlein E, and Plavec I (1998), "Scaffold attachment region-mediated enhancement of retroviral vector expression in primary T cells", J Virol 72, 3720-3728).

Recently, it has been shown that chromatin-structure modifying sequences including MARs, as exemplified by the chicken lysozyme 5' MAR, are able to significantly enhance reporter expression in pools of stable Chinese Hamster Ovary (CHO) cells (Zahn-Zabal M, et al., "Development of stable cell lines for production or regulated expression using matrix attachment regions", J Biotechnol, 2001, 87(1): p. 29-42). This property was used to increase the proportion of high-producing clones, thus reducing the number of clones that need to be screened. These benefits have been observed both for constructs with MARs flanking the transgene expression cassette and when constructs are co-transfected with the MAR on a separate plasmid. However, expression levels upon co-transfection with MARs were not as high as those observed for a construct in which two MARs delimit the transgene expression unit. A third and preferable process was shown to be the transfection of transgenes with MARs both linked to the transgene and on a separate plasmid (Girod et al., submitted for publication). However, one persisting limitation of this technique is the quantity of DNA that can be transfected per cell. Many multiple transfection protocols have been developed in order to achieve a high transfection efficiency to characterize the function of genes of interest. The protocol applied by Yamamoto et al., 1999 ("High efficiency gene transfer by multiple transfection protocol", Histochem. J. 31(4), 241-243) leads to a transfection efficiency of about 80% after 5 transfection events, whereas the conventional transfection protocol only achieved a rate of <40%. While this technique may be useful when one wishes to increase the proportion of expressing cells, it does not lead to cells with a higher intrinsic productivity. Therefore, it cannot be used to generate high producer monoclonal cell lines. Hence, the previously described technique has two major drawbacks:

i) this technique does not generate a homogenous population of transfected cells, since it cannot favour the integration of further gene copies, nor does it direct the transgenes to favorable chromosomal loci,

ii) the use of the same selectable marker in multiple transfection events does not permit the selection of doubly or triply transfected cells.

In patent application WO02/074969, the utility of MARs for the development of stable eukaryotic cell lines has also been demonstrated. However, this application discloses neither any conserved homology for MAR DNA elements nor any technique for predicting the ability of a DNA sequence to be a MAR sequence.

In fact no clear-cut MAR consensus sequence has been found (Boulikas T, "Nature of DNA sequences at the attachment regions of genes to the nuclear matrix", J. Cell Biochem., 52:14-22, 1993), but evolutionarily the structure of these sequences seems to be functionally conserved in eukaryotic genomes, since animal MARs can bind to plant nuclear scaffolds and vice versa (Mielke C, Kohwi Y, Kohwi-Shigematsu T and Bode J, "Hierarchical binding of DNA fragments derived from scaffold-attached regions: correlation of properties in vitro and function in vivo", Biochemistry, 29:7475-7485, 1990).

The identification of MARs by biochemical studies is a long and unpredictable process; various results can be obtained depending on the assay (Razin S V, "Functional architecture of chromosomal DNA domains", Crit Rev Eukaryot Gene Expr., 6:247-269, 1996). Considering the huge number of expected MARs in a eukaryotic genome and the amount of sequences issued from genome projects, a tool able to filter potential MARs in order to perform targeted experiments would be greatly useful.

Currently two different predictive tools for MARs are available via the Internet. The first one, MAR-Finder (Singh GB, Kramer J A and Krawetz S A, "Mathematical model to predict regions of chromatin attachment to the nuclear matrix", Nucleic Acids Research, 25:1419-1425, 1997), is based on a set of patterns identified within several MARs and a statistical analysis of the co-occurrence of these patterns. MAR-Finder predictions are dependent on the sequence context, meaning that predicted MARs depend on the context of the submitted sequence. The other predictive software, SMARTest (Frisch M, Frech K, Klingenhoff A, Cartharius K, Liebich I and Werner T, "In silico prediction of scaffold/matrix attachment regions in large genomic sequences", Genome Research, 12:349-354, 2001), uses weight matrices derived from experimentally identified MARs. SMARTest is said to be suitable to perform large-scale analyses. But actually, aside from its relatively poor specificity, the amount of hypothetical MARs rapidly gets huge when doing large scale analyses with it, and in having no way to increase its speci [...]

[...] having protein production increasing activity, in particular MAR nucleotide sequences, and the use of these characterized active MAR sequences in a new multiple transfection method to increase the production of recombinant proteins in eukaryotic cells.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 shows the distribution plots of MAR and non-MAR sequences. Histograms are density plots (relative frequency divided by the bin width) relative to the score of the observed parameter. The density histogram for human MARs in the S/MARt DB database is shown in black, while the density histogram for the human chromosome 22 is in grey.

FIG. 2 shows scatterplots of the four different criteria used by SMAR Scan and the AT-content with human MARs from S/MARt DB.

FIG. 3 shows the distribution plots of MAR sequences by organism. MAR sequences from S/MARt DB of other organisms were retrieved and analyzed. The MAR sequence density distributions for the mouse, the chicken, Sorghum bicolor and the human are plotted jointly.
ficity to restrain the number of hypothetical MARS, SMART FIG. 4 shows SMARSCAN predictions on human chro est becomes almost useless to screen forpotent MARS form mosome 22 and on shuffled chromosome 22. Top plot: Aver large DNA sequences. 25 age number of hits obtained by SMAR SCAN with five: Some other softwares, not available via the Internet, also rubbled, scrambled, shuffled within nonoverlapping windows exists; they are based as well on the frequency of MAR motifs (MRS criterion; Van Drunen C Metal., “A bipartite sequence of 10 bp, order 1 Markov chains model and with the native element associated with matrix/scaffold attachment regions'. chromosome 22. Bottom plot: Average number of MARs Nucleic Acids Res, 27:2924-2930, 1999), (ChrClass; Glazko predicted by SMAR SCAN in five: rubbled, scrambled, G V et al., “Comparative study and prediction of DNA frag 30 shuffled within nonoverlapping windows of 10 bp, order 1 ments associated with various elements of the nuclear Markov chains model and with the native chromosome 22. matrix”, Biochim. Biophys. Acta, 1517:351-356, 2001) or FIG. 5 shows the dissection of the ability of the chicken based on the identification of sites of stress-induced DNA lysozyme gene 5'-MAR to stimulate transgene expression in duplex (SIDD; Benham C and al., “Stress-induced duplex CHO-DG44 cells. Fragments B, K and F show the highest DNA destabilization in scaffold/matrix attachment regions'. 35 ability to stimulate transgene expression. The indicated rela J. Mol. Biol., 274:181-196, 1997). However, their suitability tive strength of the elements was based on the number of to analyze complete genome sequences remains unknown, high-expressor cells. and whether these tools may allow the identification of pro FIG. 6 shows the effect of serial-deletions of the 5'-end tein production-increasing sequences has not been reported. 
(upper part) and the 3'-end (lower part) of the 5'-MAR on the Furthermore, due to the relatively poor specificity of these 40 loss of ability to stimulate transgene expression. The transi softwares (Frisch M. Frech K, Klingenhoff A, Cartharius K. tion from increased to decreased activity coincide with B-. K Liebich I and Werner T. “In silico prediction of scaffold/ and F-fragments. matrix attachment regions in large genomic sequences'. FIG. 7 shows that portions of the F fragment significantly Genome Research, 12:349-354, 2001), the amount of hypo stimulate transgene expression. The F fragment regions indi thetical MARS identified in genomes rapidly gets unmanage able when doing large scale analyses, especially if most of 45 cated by the light grey arrow were multimerized, inserted in these have no or poor activity in practice. Thus, having no way pGEGFP Control and transfected in CHO cells. The element to increase prediction specificity to restrain the number of that displays the highest activity is located in the central part hypothetical MARs, many of the available programs become of the element and corresponds to fragment FIII (black bar almost useless to identify potent genetic elements in view of labelled minimal MAR). In addition, an enhancer activity is efficiently increasing recombinant protein production. 50 located in the 3'-flanking part of the FIII fragment (dark grey Since all the above available predictive methods have some bar labelled MAR enhancer). drawbacks that prevent large-scale analyses of genomes to FIG.8 shows a map of locations for various DNA sequence identify reliably novel and potent MARs, the object of this motifs within the cIysMAR. FIG. 8 (B) represents a Map of invention is to 1) understand the functional features of MARs locations for various DNA sequence motifs within the cIys that allow improved recombinant protein expression; 2) get a 55 MAR. 
Vertical lines represent the position of the computer new Bioinformatic tool compiling MAR structural features as predicted sites or sequence motifs along the 3034 base pairs a prediction of function, in order to 3) perform large scale of the cysMAR and its active regions, as presented in FIG.5. analyses of genomes to identify novel and more potent The putative transcription factor sites, (MEF2 05, Oct-1, MARs, and, finally 4) to demonstrate improved efficiency to USF-02, GATA, NFAT) for activators and (CDP, SATB1, increase the production of recombinant proteins from eukary 60 CTCF, ARBP/MeCP2) for repressors of transcription, were otic cells or organisms when using the newly identified MAR identified using Matinspector (Genomatix), and CpG islands Sequences. were identified with CPGPLOT. Motifs previously associated with MAR elements are labelled in black and include CpG SUMMARY OF THE INVENTION dinucleotides and CpG islands, unwinding motifs 65 (AATATATT and MTATT), poly As and Ts, poly Gs and Cs. This object has been achieved by providing an improved Drosophila topoisomerase II binding sites (GTNWAYATT and reliable method for the identification of DNA sequences NATTNATNNR (SEQIDNO: 242)) which had identity to the US 8,252,917 B2 5 6 6 bp core and High mobility group I (HMG-I/Y) protein network, induced from the beginning of the experiment, dis binding sites. Other structural motifs include nucleosome play a better induction of the hematocrit in comparison of binding and nucleosome disfavouring sites and a motif mice injected by original network without MAR. After 2 thought to relieve the superhelical strand of DNA. FIG. 8(A) months, hematocrits in "MAR-containing group' is still at represents the comparison of the ability of portions of the 5 values higher (65%) than normal hematocrit levels (45-55%). cLysMAR to activate transcription with MAR prediction FIG. 17 represents the scatterplot for the 1757 S/MAR score profiles with MarFinder. 
The top diagram shows the sequences of the AT (top) and TA (bottom) dinucleotide per MAR fragment activity as in FIG. 5, while the middle and centages versus the predicted DNA bending as computed by bottom curves show MARFinder-predicted potential for SMARSCAN. MAR activity and for bent DNA structures respectively. 10 FIG. 9 shows the correlation of DNA physico-chemical FIG. 18 represents the dinucleotide percentage distribution properties with MAR activity. FIG.9(A), represents the DNA plots over the 1757 non-S/MARs sequences. melting temperature, double helix bending, major groove FIG. 19 shows the effect of various S/MAR elements on the depth and minor groove width profiles of the 5'-MAR and production of recombinant green fluorescent protein (GFP). were determined using the algorithms of Levitsky et al (Lev 15 Populations of CHO cells transfected with a GFP expression itsky V. G. Ponomarenko MP, Ponomarenko JV. Frolov AS, vector containing or a MAR element, as indicated, were ana Kolchanov NA“Nucleosomal DNA property database'. Bio lyzed by a fluorescence-activated cell sorter (FACS(R), and informatics, 15:582592, 1999). The most active B, K and F typical profiles are shown. The profiles display the cell num fragments depicted at the top are as shown as in FIG. 1. FIG. ber counts as a function of the GFP fluorescence levels. 9(B), represents the enlargement of the data presented in FIG. 20 depicts the effect of the induction of hematocritin panel A to display the F fragment map aligned with the mice injected by MAR-network. tracings corresponding to the melting temperature (top curve) and DNA bending (bottom curve). The position of the most DETAILED DESCRIPTION OF THE INVENTION active FIB fragment and protein binding site for specific transcription factors are as indicated. 25 The present invention relates to a purified and isolated FIG. 
10 shows the distribution of putative transcription DNA sequence having protein production increasing activity factor binding sites within the 5'-cLysMAR. Large arrows characterized in that said DNA sequence comprises at least indicate the position of the CUE elements as identified with one bent DNA element, and at least one binding site for a SMARSCAN. DNA binding protein. FIG. 11 shows the scheme of assembly of various portions 30 Certain sequences of DNA are known to form a relatively of the MAR. The indicated portions of the cIysMAR were “static curve', where the DNA follows a particular 3-dimen amplified by PCR, introducing BglII-BamHI linker elements sional path. Thus, instead of just being in the normal B-DNA at each extremity, and assembled to generate the depicted conformation (“straight'), the piece of DNA can form a flat, composite elements. For instance, the top construct consists planar curve also defined as bent DNA (Marini, et al., 1982 of the assembly of all CUE and flanking sequences at their 35 “Bent helical structure in kinetoplast DNA. Proc. Natl. original location except that BglI-BamHII linker sequences Acad. Sci. USA, 79: 7664-7664). separate each element. Surprisingly, Applicants have shown that the bent DNA FIG. 12 represents the plasmid maps. element of a purified and isolated DNA sequence having FIG. 13 shows the effect of re-transfecting primary trans protein production increasing activity of the present invention fectants on GFP expression. Cells (CHO-DG44) were co 40 usually contains at least 10% of dinucleotide TA, and/or at transfected with pSV40EGFP (left tube) or pMAR least 12% of dinucleotide AT on a stretch of 100 contiguous SV40EGFP (central tube) and pSVneo as resistance plasmid. base pairs. 
Preferably, the bent DNA element contains at least Cells transfected with pMAR-SV40EGFP were re-trans 33% of dinucleotide TA, and/or at least 33% of dinucleotide fected 24 hours later with the same plasmid and a different AT on a stretch of 100 contiguous base pairs. These data have selection plasmid, pSVpuro (right tube). After two weeks 45 been obtained by the method described further. selection, the phenotype of the stably transfected cell popu According to the present invention, the purified and iso lation was analysed by FACS. lated DNA sequence usually comprises a MAR nucleotide FIG. 14 shows the effect of multiple load of MAR-contain sequence selected from the group comprising the sequences ing plasmid. The pMAR-SV40EGFP/pMAR-SV40EGFP SEQ ID Nos 1 to 27 or a cIysMAR element or a fragment secondary transfectants were used in a third cycle of trans 50 thereof. Preferably, the purified and isolated DNA sequence is fection at the end of the selection process. The tertiary trans a MAR nucleotide sequence selected from the group com fection was accomplished with pMAR orpMAR-SV40EGFP prising the sequences SEQID Nos 1 to 27, more preferably to give tertiary transfectants. After 24 hours, cells were trans the sequences SEQID Nos 24 to 27. fected again with either plasmid, resulting in the quaternary Encompassed by the present invention are as well comple transfectants (see Table 4). 55 mentary sequences of the above-mentioned sequences SEQ FIG. 15 shows comparative performance of SMAR predic ID Nos 1 to 27 and the cIysMAR element or fragment, which tion algorithms exemplified by region WP18A10A7. (A) can be produced by using PCR or other means. SMAR SCAN analysis was performed with default settings. An "element' is a conserved nucleotide sequences that (B) SIDD analysis (top curve and left-hand side scale), and bears common functional properties (i.e. 
binding sites for the attachment of several DNA fragments to the nuclear 60 transcription factors) or structural (i.e. bent DNA sequence) matrix in vitro (bar-graph, right-hand side scale) was taken features. from Goetze et al (Goetze S. Gluch A, Benham C. Bode J, A part of sequences SEQID Nos 1 to 27 and the cIysMAR “Computational and in vitro analysis of destabilized DNA element or fragment refers to sequences sharing at least 70% regions in the interferon gene cluster: potential of predicting nucleotides in length with the respective sequence of the SEQ functional gene domains. Biochemistry, 42:154-166, 2003). 65 ID Nos 1 to 27. These sequences can be used as long as they FIG. 16 represents the results of a a gene therapy-like exhibit the same properties as the native sequence from which protocol using MARs. The group of mice injected by MAR they derive. Preferably these sequences share more than 80%, US 8,252,917 B2 7 8 in particular more than 90% nucleotides in length with the tions into a eukaryotic host cell, the sequence is capable of respective sequence of the SEQID Nos 1 to 27. increasing protein production levels in cell culture as com The present invention also includes variants of the afore pared to a culture of cell transfected without said DNA mentioned sequences SEQID Nos 1 to 27 and the cIysMAR sequence. Usually the increase is 1.5 to 10 fold, preferably 4 element or fragment, that is nucleotide sequences that vary to 10 fold. This corresponds to a production rate or a specific from the reference sequence by conservative nucleotide sub cellular productivity of at least 10pg per cell per day (see stitutions, whereby one or more nucleotides are substituted by Example 11 and FIG. 13). another with same characteristics. As used herein, the following definitions are Supplied in The sequences SEQID Nos 1 to 23 have been identified by order to facilitate the understanding of this invention. 
scanning human chromosome 1 and 2 using SMARSCAN, 10 “Chromatin' is the protein and nucleic acid material con showing that the identification of novel MAR sequences is stituting the chromosomes of a eukaryotic cell, and refers to feasible using the tools reported thereafter whereas SEQID DNA, RNA and associated proteins. No 24 to 27 have been identified by scanning the complete A “chromatin element’ means a nucleic acid sequence on using the combined SMARSCAN method. a chromosome having the property to modify the chromatine In a first step, the complete chromosome 1 and 2 were 15 structure when integrated into that chromosome. screened to identify bent DNA element as region correspond “Cis' refers to the placement of two or more elements ing to the highest bent, major groove depth, minor groove (such as chromatin elements) on the same nucleic acid mol width and lowest melting temperature as shown in FIG. 3. In ecule (Such as the same vector, plasmid or chromosome). a second step, this collection of sequence was scanned for “Trans’ refers to the placement of two or more elements binding sites of regulatory proteins such as SATB1, GATA, (such as chromatin elements) on two or more different nucleic etc. as shown in the FIG. 8B) yielding sequences SEQ ID acid molecules (such as on two vectors or two chromosomes). 1-23. Furthermore, sequences 21-23 were further shown to be Chromatin modifying elements that are potentially capable located next to known gene from the Human Genome Data of overcoming position effects, and hence are of interest for Base. the development of stable cell lines, include boundary ele With regard to SEQID No 24 to 27 these sequences have 25 ments (BEs), matrix attachment regions (MARS), locus con been yielded by Scanning the human genome according to the trol regions (LCRs), and universal chromatin opening ele combined method and were selected as examples among ments (UCOEs). 1757 MAR elements so detected. 
Boundary elements (“BEs), or insulator elements, define Molecular chimera of MAR sequences are also considered boundaries in chromatin in many cases (Bell A and Felsenfeld in the present invention. By molecular chimera is intended a 30 G. 1999; "Stopped at the border: boundaries and insulators, nucleotide sequence that may include a functional portion of Curr Opin Genet Dev 9, 191-198) and may play a role in a MAR element and that will be obtained by molecular biol defining a transcriptional domain in vivo. BEs lack intrinsic ogy methods known by those skilled in the art. promoter/enhancer activity, but rather are thought to protect Particular combinations of MAR elements or fragments or genes from the transcriptional influence of regulatory ele Sub-portions thereofare also considered in the present inven 35 ments in the Surrounding chromatin. The enhancer-block tion. These fragments can be prepared by a variety of methods assay is commonly used to identify insulator elements. In this known in the art. These methods include, but are not limited assay, the chromatin element is placed between an enhancer to, digestion with restriction enzymes and recovery of the and a promoter, and enhancer-activated transcription is mea fragments, chemical synthesis or polymerase chain reactions sured. Boundary elements have been shown to be able to (PCR). 40 protect stably transfected reporter genes against position Therefore, particular combinations of elements or frag effects in Drosophila, yeast and in mammalian cells. They ments of the sequences SEQ ID Nos 1 to 27 and cLysMAR have also been shown to increase the proportion of transgenic elements or fragments are also envisioned in the present mice with inducible transgene expression. invention, depending on the functional results to be obtained. Locus control regions (LCRs) are cis-regulatory ele Elements of the cIysMAR are e.g. the B. 
K and F regions as 45 ments required for the initial chromatin activation of a locus described in WO 02/074969, the disclosure of which is and Subsequent gene transcription in their native locations hereby incorporated herein by reference, in its entirety. The (Grosveld, F. 1999, Activation by locus control regions?” preferred elements of the cIysMAR used in the present Curr Opin Genet Dev 9, 152-157). The activating function of invention are the B. Kand Fregions. Only one element might LCRS also allows the expression of a coupled transgene in the be used or multiple copies of the same or distinct elements 50 appropriate tissue in transgenic mice, irrespective of the site (multimerized elements) might be used (see FIG. 8A)). of integration in the host genome. While LCRs generally By fragment is intended a portion of the respective nucle confer tissue-specific levels of expression on linked genes, otide sequence. Fragments of a MAR nucleotide sequence efficient expression in nearly all tissues intransgenic mice has may retain biological activity and hence bind to purified been reported for a truncated human T-cell receptor LCR and nuclear matrices and/or alter the expression patterns of cod 55 a rat LAP LCR. The most extensively characterized LCR is ing sequences operably linked to a promoter. Fragments of a that of the globin locus. Its use in vectors for the gene therapy MAR nucleotide sequence may range from at least about 100 of sickle cell disease and (3-thalassemias is currently being to 1000 bp, preferably from about 200 to 700 bp, more pref evaluated. erably from about 300 to 500 bp nucleotides. Also envisioned "MARs', according to a well-accepted model, may medi are any combinations of fragments, which have the same 60 ate the anchorage of specific DNA sequence to the nuclear number of nucleotides present in a synthetic MAR sequence matrix, generating chromatin loop domains that extend out consisting of natural MAR element and/or fragments. 
The wards from the heterochromatin cores. While MARs do not fragments are preferably assembled by linker sequences. Pre contain any obvious consensus or recognizable sequence, ferred linkers are BgIII-BamHI linker. their most consistent feature appears to be an overall high AT “Protein production increasing activity” refers to an activ 65 content, and C bases predominating on one Strand (Bode J. ity of the purified and isolated DNA sequence defined as Schlake T. RiosRamirez M, Mielke C, Stengart M. Kay Vand follows: after having been introduced under suitable condi KlehrWirth D, “Scaffold/matrix-attached regions: structural US 8,252,917 B2 10 properties creating transcriptionally active loci', Structural ated or undifferentiated cells. Other suitable host cells are and Functional Organization of the Nuclear Matrix: Interna known to those skilled in the art. tional Review of Citology, 162A:389453, 1995). These The terms "host cell' and “recombinant host cell are used regions have a propensity to form bent secondary structures interchangeably herein to indicate a eukaryotic cell into that may be prone to strand separation. They are often referred which one or more vectors of the invention have been intro to as base-unpairing regions (BURS), and they contain a core duced. It is understood that such terms refer not only to the unwinding element (CUE) that might represent the nucle particular Subject cell but also to the progeny or potential ation point of Strand separation (Benham C and al., Stress progeny of Such a cell. Because certain modifications may induced duplex DNA destabilization in scaffold/matrix occur in Succeeding generations due to either mutation or attachment regions, J. MoL BioL, 274:181-196, 1997). 
Sev 10 environmental influences, such progeny may not, in fact, be eral simple AT-rich sequence motifs have often been found identical to the parent cell, but are still included within the within MAR sequences, but for the most part, their functional Scope of the term as used herein. importance and potential mode of action remain unclear. The terms “introducing a purified DNA into a eukaryotic These include the A-box (AATAAAYAAA (SEQ ID NO: host cell' or “transfection denote any process wherein an 243)), the T-box (TTWTWTTWTT (SEQ ID NO: 244)), 15 extracellular DNA, with or without accompanying material, DNA unwinding motifs (AATATATT. AATATT), SATB1 enters a host cell. The term “cell transfected' or “transfected binding sites (H-box, A/T/C25) and consensus Topoi cell' means the cell into which the extracellular DNA has somerase II sites for vertebrates (RNYNNCNNGYNGKT been introduced and thus harbours the extracellular DNA. NYNY (SEQ ID NO: 245)) or Drosophila (GTNWAYATT The DNA might be introduced into the cell so that the nucleic NATNNR (SEQID NO: 246)). acid is replicable either as a chromosomal integrant or as an Ubiquitous chromatin opening elements (“UCOEs, also extra chromosomal element. known as "ubiquitously-acting chromatin opening ele “Promoter as used herein refers to a nucleic acid sequence ments') have been reported in WO 00/05393. that regulates expression of a gene. An "enhancer is a nucleotide sequence that acts to poten *Co-transfection” means the process of transfecting a tiate the transcription of genes independent of the identity of 25 eukaryotic cell with more than one exogenous gene, or vector, the gene, the position of the sequence in relation to the gene, or plasmid, foreign to the cell, one of which may confer a or the orientation of the sequence. The vectors of the present selectable phenotype on the cell. invention optionally include enhancers. 
The purified and isolated DNA sequence having protein A “gene is a deoxyribonucleotide (DNA) sequence cod production increasing activity also comprises, besides one or ing for a given mature protein. As used herein, the term 30 more bent DNA element, at least one binding site for a DNA 'gene' shall not include untranslated flanking regions such as binding protein. RNA transcription initiation signals, polyadenylation addi Usually the DNA binding protein is a transcription factor. tion sites, promoters or enhancers. Examples of transcription factors are the group comprising A "product gene' is a gene that encodes a protein product the polyOpolyP domain proteins. having desirable characteristics Such as diagnostic or thera 35 Another example of a transcription factor is a transcription peutic utility. A product gene includes, e.g., structural genes factor selected from the group comprising SATB1, NMP4, and regulatory genes. MEF2, S8, DLX1, FREAC7, BRN2, GATA 1/3, TATA, A "structural gene' refers to a gene that encodes a struc Bright, MSX, AP1, C/EBP, CREBP1, FOX, Freac7, HFH1, tural protein. Examples of structural genes include but are not HNF3alpha, Nkx25, POU3F2, Pit 1, TTF1, XFD1, AR, limited to, cytoskeletal proteins, extracellular matrix pro 40 C/EBPgamma, Cdc5, FOXD3. HFH3, HNF3 beta, MRF2, teins, enzymes, nuclear pore proteins and nuclear Scaffold Oct1, POU6F1, SRF, VSMTATA B, XFD2, Bach2, CDP proteins, ion channels and transporters, contractile proteins, CR3, Cdx2, FOXJ2, HFL, HP1, Myc, PBX, Pax3, TEF, VBP. and chaperones. Preferred structural genes encode for anti XFD3, Brn2, COMP1, Evil, FOXP3, GATA4, HFN1, Lhk3, bodies or antibody fragments. NKX3A, POU1F1, Pax6, TFIIA or a combination of two or A "regulatory gene' refers to a gene that encodes a regu 45 more of these transcription factors are preferred. Most pre latory protein. 
Examples of regulatory proteins include, but ferred are SATB1, NMP4, MEF2 and polyOpolyP domain are not limited to, transcription factors, hormones, growth proteins. factors, cytokines, signal transduction molecules, oncogenes, SATB1, NMP4 and MEF2, for example, are known to proto-oncogenes, transmembrane receptors, and protein regulate the development and/or tissue-specific gene expres kinases. 50 sion in mammals. These transcription factors have the capac “Orientation” refers to the order of nucleotides in a given ity to alter DNA geometry, and reciprocally, binding to DNA DNA sequence. For example, an inverted orientation of a as an allosteric ligand modifies their structure. Recently, DNA sequence is one in which the 5' to 3' order of the SATB1 was found to form a cage-like structure circumscrib sequence in relation to another sequence is reversed when ing heterochromatin (Cai S. Han HJ, and Kohwi-Shigematsu compared to a point of reference in the DNA from which the 55 T. "Tissue-specific nuclear architecture and gene expression sequence was obtained. Such reference points can include the regulated by SATB1 ' Nat Genet, 2003.34(1): p. 42–51). direction of transcription of other specified DNA sequences Yet another object of the present invention is to provide a in the source DNA and/or the origin of replication of repli purified and isolated cIysMAR element and/or fragment, a cable vectors containing the sequence. sequence complementary thereof, a part thereof sharing at “Eukaryotic cell” refers to any mammalian or non-mam 60 least 70% nucleotides in length, a molecular chimera thereof, malian cell from a eukaryotic organism. By way of non a combination thereof and variants. limiting example, any eukaryotic cell that is capable of being More preferably, the cIysMAR element and/or fragment maintained under cell culture conditions and Subsequently are consisting of at least one nucleotide sequence selected transfected would be included in this invention. 
Especially from the B. K and F regions. preferable cell types include, e.g., stem cells, embryonic stem 65 A further object of the present invention is to provide a cells, Chinese hamster ovary cells (CHO), COS, BHK21, synthetic MAR sequence comprising natural MAR element NIH3T3, HeLa, C2C12, cancer cells, and primary differenti and/or fragments assembled between linker sequences. US 8,252,917 B2 11 12 Preferably, the synthetic MAR sequence comprises a cIys Preferably, DNA bending values are comprised between 3 MAR element and/or fragment a sequence complementary to 5° (radial degree). Most preferably they are situated thereof, a part thereof sharing at least 70% nucleotides in between 3.8 to 4.4, corresponding to the smallest peak of length, a molecular chimera thereof, a combination thereof FIG 1. and variants. Also preferably, linker sequences are BglII 5 Preferably the major groove depth values are comprised BamHI linker. between 8.9 to 9.3 A (Angström) and minor groove width Another aspect of the invention is to provide a method for values between 5.2 to 5.8 A. Most preferably the major identifying a MAR sequence using a Bioinformatic tool com groove depth values are comprised between 9.0 to 9.2 A and prising the computing of values of one or more DNA minor groove width values between 5.4 to 5.7 A. 10 Preferably the melting temperature is comprised between sequence features corresponding to DNA bending, major 55 to 75° C. (Celsius degree). Most preferably, the melting groove depth and minor groove width potentials and melting temperature is comprised between 55 to 62° C. temperature. 
Preferably, the identification of one or more DNA sequence features further comprises a further DNA sequence feature corresponding to binding sites for DNA binding proteins, which is also computed with this method.

Preferably, profiles or weight-matrices of said bioinformatic tool are based on dinucleotide recognition.

The bioinformatic tool used for the present method is preferably SMAR SCAN, which contains algorithms developed by Gene Express and based on Levitsky et al., 1999. These algorithms recognise profiles, based on dinucleotide weight-matrices, to compute the theoretical values for conformational and physicochemical properties of DNA.

Preferably, SMAR SCAN uses the four theoretical criteria, also designated as DNA sequence features, corresponding to DNA bending, major groove depth and minor groove width potentials, and melting temperature, in all possible combinations, using scanning windows of variable size (see FIG. 3). For each function used, a cut-off value has to be set. The program returns a hit every time the computed score of a given region is above the set cut-off value for all of the chosen criteria. Two data output modes are available to handle the hits. The first (called "profile-like") simply returns all hit positions on the query sequence and their corresponding values for the different criteria chosen. The second mode (called "contiguous hits") returns only the positions of several contiguous hits and their corresponding sequence. For this mode, the minimum number of contiguous hits is another cut-off value that can be set, again with a tunable window size. This second mode is the default mode of SMAR SCAN. Indeed, from a semantic point of view, a hit is considered as a core-unwinding element (CUE), and a cluster of CUEs accompanied by clusters of binding sites for relevant proteins is considered as a MAR. Thus, SMAR SCAN considers only several contiguous hits as a potential MAR.

To tune the default cut-off values for the four theoretical

The DNA binding protein of which values can be computed by the method is usually a transcription factor, preferably a polyQpolyP domain or a transcription factor selected from the group comprising SATB1, NMP4, MEF2, S8, DLX1, FREAC7, BRN2, GATA 1/3, TATA, Bright, MSX, AP1, C/EBP, CREBP1, FOX, Freac7, HFH1, HNF3alpha, Nkx25, POU3F2, Pit1, TTF1, XFD1, AR, C/EBPgamma, Cdc5, FOXD3, HFH3, HNF3 beta, MRF2, Oct1, POU6F1, SRF, VSMTATA B, XFD2, Bach2, CDPCR3, Cdx2, FOXJ2, HFL, HP1, Myc, PBX, Pax3, TEF, VBP, XFD3, Brn2, COMP1, Evi1, FOXP3, GATA4, HFN1, Lhk3, NKX3A, POU1F1, Pax6, TFIIA or a combination of two or more of these transcription factors.

However, one skilled in the art would be able to determine other kinds of transcription factors in order to carry out the method according to the present invention.

In case SMAR SCAN is envisaged to perform, for example, large scale analysis, then, preferably, the above-mentioned method further comprises at least one filter predicting DNA binding sites for DNA transcription factors in order to reduce the computation.

The principle of this method combines SMAR SCAN to compute the structural features as described above and a filter, such as, for example, pfsearch (from the pftools package as described in Bucher P, Karplus K, Moeri N, and Hofmann K, "A flexible search technique based on generalized profiles", Computers and Chemistry, 20:3-24, 1996), to predict the binding of some transcription factors.

Examples of filters comprise, but are not limited to, pfsearch, MatInspector, RMatch Professional and TRANSFAC Professional.

This combined method uses the structural features of SMAR SCAN and the predicted binding of specific transcription factors of the filter, which can be applied sequentially in any order to select MARs, therefore, depending on whether the filter is applied at the beginning or at the end of the method.
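By way of non-limiting illustration, the hit rule and the "contiguous hits" mode described above for SMAR SCAN may be sketched as follows. This is a minimal sketch and not the SMAR SCAN implementation: the function names `find_hits` and `contiguous_hits`, the per-position score lists and the sample cut-off values are assumptions, and each criterion is assumed to have been scored so that a higher value is more MAR-like.

```python
from typing import Dict, List, Sequence, Tuple


def find_hits(scores: Dict[str, Sequence[float]],
              cutoffs: Dict[str, float]) -> List[int]:
    """Positions where the score exceeds the cut-off for ALL chosen criteria.

    This corresponds to the "profile-like" view of hits (positions only).
    """
    n = min(len(values) for values in scores.values())
    return [i for i in range(n)
            if all(scores[name][i] > cutoffs[name] for name in cutoffs)]


def contiguous_hits(hits: Sequence[int], min_run: int) -> List[Tuple[int, int]]:
    """Inclusive (start, end) spans of at least `min_run` consecutive hits.

    `hits` must be sorted in increasing order, as returned by find_hits.
    """
    runs: List[Tuple[int, int]] = []
    start = prev = None
    for pos in hits:
        if prev is not None and pos == prev + 1:
            prev = pos                      # still inside the current run
            continue
        if start is not None and prev - start + 1 >= min_run:
            runs.append((start, prev))      # close a run that is long enough
        start = prev = pos                  # open a new run
    if start is not None and prev - start + 1 >= min_run:
        runs.append((start, prev))
    return runs


# Hypothetical per-position scores for two criteria.
example_scores = {
    "bend":    [1, 1, 5, 5, 5, 1, 1, 1, 5, 5],
    "melting": [0, 0, 9, 9, 9, 0, 0, 0, 9, 9],
}
example_cutoffs = {"bend": 3, "melting": 5}
example_hits = find_hits(example_scores, example_cutoffs)
example_runs = contiguous_hits(example_hits, min_run=3)
```

With these sample values, positions 2 to 4 and 8 to 9 are hits, but only the first run reaches the minimum of three contiguous hits and would therefore be reported as a potential MAR in this sketch.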
structural criteria, experimentally validated MARs from SMARt DB were used. All the human MAR sequences from the database were retrieved and analyzed with SMAR SCAN using the "profile-like" mode with the four criteria and with no set cut-off value. This allowed the setting of each function for every position of the sequences. The distribution for each criterion was then computed according to these data (see FIGS. 1 and 3).

The default cut-off values of SMAR SCAN for the bend, the major groove depth and the minor groove width were set at the average of the 75th quantile and the median. For the melting temperature, the default cut-off value should be set at the 75th quantile. The minimum length for the "contiguous hits" mode should be set to 300 because it is assumed to be the minimum length of a MAR (see FIGS. 8 and 9). However, one skilled in the art would be able to determine the cut-off values for the above-mentioned criteria for a given organism with minimal experimentation.

The first level selects sequences out of the primary input sequence and the second level, consisting in the filter, may be used to restrain among the selected sequences those which satisfy the criteria used by the filter.

In this combined method the filter detects clusters of DNA binding sites using profiles or weight-matrices from, for example, MatInspector (Quandt K, Frech K, Karas H, Wingender E, Werner T, "MatInd and MatInspector: New fast and versatile tools for detection of consensus matches in nucleotide sequence data", Nucleic Acids Research, 23, 4878-4884, 1995). The filter can also detect densities of clusters of DNA binding sites.

The combined method is actually a "wrapper" written in Perl for SMAR SCAN and, in case the pfsearch is used as a filter, from the pftools. The combined method performs a two level processing using at each level one of these tools (SMAR SCAN or filter) as a potential "filter", each filter being optional and possible to be used to compute the predicted features without doing any filtering.

If SMAR SCAN is used in the first level to filter subsequences, it has to be used with the "all the contiguous hits" mode in order to return sequences. If the pfsearch is used in the first level as first filter, it has to be used with only one profile and a distance in nucleotide needs to be provided. This distance is used to group together pfsearch hits that are located at a distance inferior to the distance provided in order to return sequences; the combined method launches pfsearch, parses its output and returns sequences corresponding to pfsearch hits that are grouped together according to the distance provided. Then whatever the tool used in the first level, the length of the subsequences thus selected can be systematically extended at both ends according to a parameter called "hits extension".

The second and optional level can be used to filter out sequences (already filtered sequences or unfiltered input sequences) or to get the results of SMAR SCAN and/or pfsearch without doing any filtering on these sequences. If the second level of combined method is used to filter, for each criteria considered cutoff values (hit per nucleotide) need to be provided to filter out those sequences (see FIG. 20).

Another concern of the present invention is also to provide a method for identifying a MAR sequence comprising at least one filter detecting clusters of DNA binding sites using profiles or weight-matrices. Preferably, this method comprises two levels of filters and in this case, SMAR SCAN is totally absent from said method. Usually, the two levels consist in pfsearch.

Also embraced by the present invention is a purified and isolated MAR DNA sequence identifiable according to the method for identifying a MAR sequence using the described bioinformatic tool, the combined method or the method comprising at least one filter.

Analysis by the combined method of the whole human genome yielded a total of 1757 putative MARs representing a total of 1 065 305 base pairs. In order to reduce the number of results, a dinucleotide analysis was performed on these 1757 MARs, computing each of the 16 possible dinucleotide percentage for each sequence considering both strands in the 5' to 3' direction.

Surprisingly, Applicants have shown that all of the "super" MARs detected with the combined method contain at least 10% of dinucleotide TA on a stretch of 100 contiguous base pairs. Preferably, these sequences contain at least 33% of dinucleotide TA on a stretch of 100 contiguous base pairs.

Applicants have also shown that these same sequences further contain at least 12% of dinucleotide AT on a stretch of 100 contiguous base pairs. Preferably, they contain at least 33% of dinucleotide AT on a stretch of 100 contiguous base pairs.

Another aspect of the invention is to provide a purified and isolated MAR DNA sequence of any of the preceding

matrix attachment region (MAR) nucleotide sequence which is a MAR nucleotide sequence selected from the group comprising
a purified and isolated DNA sequence having protein production increasing activity,
a purified and isolated MAR DNA sequence identifiable according to the method for identifying a MAR sequence using the described bioinformatic tool, the combined method or the method comprising at least one filter,
the sequences SEQ ID Nos 1 to 27,
a purified and isolated cLysMAR element and/or fragment,
a synthetic MAR sequence comprising natural MAR element and/or fragments assembled between linker sequences,
a sequence complementary thereof, a part thereof sharing at least 70% nucleotides in length, a molecular chimera thereof, a combination thereof and variants
or a MAR nucleotide sequence of a cLysMAR element and/or fragment, a sequence complementary thereof, a part thereof sharing at least 70% nucleotides in length, a molecular chimera thereof, a combination thereof and variants for increasing protein production activity in a eukaryotic host cell.

Said purified and isolated DNA sequence usually further comprises one or more regulatory sequences, as known in the art e.g. a promoter and/or an enhancer, polyadenylation sites and splice junctions usually employed for the expression of the protein or may optionally encode a selectable marker. Preferably said purified and isolated DNA sequence comprises a promoter which is operably linked to a gene of interest.

The DNA sequences of this invention can be isolated according to standard PCR protocols and methods well known in the art.

Promoters which can be used provided that such promoters are compatible with the host cell are, for example, promoters obtained from the genomes of viruses such as polyomavirus, adenovirus (such as Adenovirus 2), papilloma virus (such as bovine papilloma virus), avian sarcoma virus, cytomegalovirus (such as murine or human cytomegalovirus immediate early promoter), a retrovirus, hepatitis-B virus, and Simian Virus 40 (such as SV40 early and late promoters) or promoters obtained from heterologous mammalian promoters, such as the actin promoter or an immunoglobulin promoter or heat shock promoters. Such regulatory sequences direct constitutive expression.

Furthermore, the purified and isolated DNA sequence might further comprise regulatory sequences which are capable of directing expression of the nucleic acid preferentially in a particular cell type (e.g., tissue-specific regulatory elements are used to express the nucleic acid). Tissue-specific regulatory elements are known in the art.
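By way of non-limiting illustration, the dinucleotide criterion stated above (for example at least 10% of dinucleotide TA on a stretch of 100 contiguous base pairs) may be evaluated with a sliding window. The sketch below rests on stated assumptions: the percentage is taken over the 99 overlapping dinucleotide positions of a 100 bp window, which the description does not specify, and the function name `max_dinucleotide_percent` is hypothetical. Because TA and AT are each their own reverse complement, counting them on one strand gives the same result as counting on both strands read 5' to 3'.

```python
def max_dinucleotide_percent(seq: str, dinuc: str = "TA", window: int = 100) -> float:
    """Highest percentage of `dinuc` found in any stretch of `window` contiguous bases.

    Assumption: a window of N bases contains N - 1 overlapping dinucleotide
    positions, and the percentage is computed over those positions.
    Sequences shorter than the window yield 0.0.
    """
    seq = seq.upper()
    if len(seq) < window:
        return 0.0
    positions = window - 1
    # flags[i] is 1 when the dinucleotide starting at base i equals `dinuc`.
    flags = [1 if seq[i:i + 2] == dinuc else 0 for i in range(len(seq) - 1)]
    count = sum(flags[:positions])
    best = count
    for start in range(1, len(seq) - window + 1):
        # Slide by one base: add the new last position, drop the old first one.
        count += flags[start + positions - 1] - flags[start - 1]
        best = max(best, count)
    return 100.0 * best / positions


# Hypothetical test sequences.
ta_rich = "G" * 50 + "TA" * 50 + "G" * 50
ta_poor = "G" * 150
passes_ta_threshold = max_dinucleotide_percent(ta_rich, "TA") >= 10.0
```

A sequence would be retained under the criteria quoted above if the value returned for TA is at least 10 and the value returned for AT is at least 12; the preferred criterion of 33% can be applied in the same way.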
described MARs, comprising a sequence selected from the sequences SEQ ID Nos 1 to 27, a sequence complementary thereof, a part thereof sharing at least 70% nucleotides in length, a molecular chimera thereof, a combination thereof and variants.

Preferably, said purified and isolated MAR DNA sequence comprises a sequence selected from the sequences SEQ ID Nos 24 to 27, a sequence complementary thereof, a part thereof sharing at least 70% nucleotides in length, a molecular chimera thereof, a combination thereof and variants. These sequences 24 to 27 correspond to those detected by the combined method and show a higher protein production increasing activity over sequences 1 to 23.

The present invention also encompasses the use of a purified and isolated DNA sequence comprising a first isolated

Non-limiting examples of suitable tissue-specific promoters include the albumin promoter (liver-specific: Pinkert, et al., 1987. Genes Dev. 1: 268-277), lymphoid-specific promoters (Calame and Eaton, 1988. Adv. Immunol. 43: 235-275), in particular promoters of T cell receptors (Winoto and Baltimore, 1989. EMBO J. 8: 729-733) and immunoglobulins (Banerji, et al., 1983. Cell 33: 729-740; Queen and Baltimore, 1983. Cell 33: 741-748), neuron-specific promoters (e.g., the neurofilament promoter; Byrne and Ruddle, 1989. Proc. Natl. Acad. Sci. USA 86: 5473-5477), pancreas-specific promoters (Edlund, et al., 1985. Science 230: 912-916), and mammary gland-specific promoters (e.g., milk whey promoter, U.S. Pat. No. 4,873,316 and European Application No. 264,166). Developmentally-regulated promoters are also encompassed. Examples of such promoters include, e.g., the murine hox promoters (Kessel and Gruss, 1990. Science 249: 374-379) and the α-fetoprotein promoter (Camper and Tilghman, 1989. Genes Dev. 3: 537-546).

Regulatable gene expression promoters are well known in the art, and include, by way of non-limiting example, any promoter that modulates expression of a gene encoding a desired protein by binding an exogenous molecule, such as the CRE/LOX system, the TET system, the doxycycline system, the NFkappaB/UV light system, the Leu3p/isopropylmalate system, and the GLVPc/GAL4 system (See e.g., Sauer, 1998, Methods 14 (4): 381-92; Lewandoski, 2001, Nat. Rev. Genet 2 (10): 743-55; Legrand-Poels et al., 1998, J. Photochem. Photobiol. B. 45: 1-8; Guo et al., 1996, FEBS Lett. 390 (2): 191-5; Wang et al., PNAS USA, 1999, 96 (15): 8483-8).

However, one skilled in the art would be able to determine other kinds of promoters that are suitable in carrying out the present invention.

Enhancers can be optionally included in the purified DNA sequence of the invention then belonging to the regulatory sequence, e.g. the promoter.

The "gene of interest" or "transgene" preferably encodes a protein (structural or regulatory protein). As used herein "protein" refers generally to peptides and polypeptides having more than about ten amino acids. The proteins may be "homologous" to the host (i.e., endogenous to the host cell being utilized), or "heterologous" (i.e., foreign to the host cell being utilized), such as a human protein produced by yeast. The protein may be produced as an insoluble aggregate or as a soluble protein in the periplasmic space or cytoplasm of the cell, or in the extracellular medium. Examples of proteins include hormones such as growth hormone or erythropoietin (EPO), growth factors such as epidermal growth factor, analgesic substances like enkephalin, enzymes like chymotrypsin, receptors to hormones or growth factors, antibodies and include as well proteins usually used as a visualizing marker e.g. green fluorescent protein.

Preferably the purified DNA sequence further comprises at least a second isolated matrix attachment region (MAR) nucleotide sequence selected from the group comprising
a purified and isolated DNA sequence having protein production increasing activity,
a purified and isolated MAR DNA sequence identifiable according to the method for identifying a MAR sequence using the described bioinformatic tool, the combined method or the method comprising at least one filter,
the sequences SEQ ID Nos 1 to 27,
a purified and isolated cLysMAR element and/or fragment,
a synthetic MAR sequence comprising natural MAR element and/or fragments assembled between linker sequences,
a sequence complementary thereof, a part thereof sharing at least 70% nucleotides in length, a molecular chimera thereof, a combination thereof and variants.
The isolated matrix attachment region (MAR) nucleotide sequence might be identical or different. Alternatively, a first and a second identical MAR nucleotide sequence are used.

Preferably, the MAR nucleotide sequences are located at both the 5' and the 3' ends of the sequence containing the promoter and the gene of interest. But the invention also envisions the fact that said first and or at least second MAR nucleotide sequences are located on a sequence distinct from the one containing the promoter and the gene of interest.

Embraced by the scope of the present invention is also the purified and isolated DNA sequence comprising a first isolated matrix attachment region (MAR) nucleotide sequence which is a MAR nucleotide sequence selected from the group comprising
a purified and isolated DNA sequence having protein production increasing activity,
a purified and isolated MAR DNA sequence identifiable according to the method for identifying a MAR sequence using the described bioinformatic tool, the combined method or the method comprising at least one filter,
the sequences SEQ ID Nos 1 to 27,
a purified and isolated cLysMAR element and/or fragment,
a synthetic MAR sequence comprising natural MAR element and/or fragments assembled between linker sequences,
a sequence complementary thereof, a part thereof sharing at least 70% nucleotides in length, a molecular chimera thereof, a combination thereof and variants
that can be used for increasing protein production activity in a eukaryotic host cell by introducing the purified and isolated DNA sequence into a eukaryotic host cell according to well known protocols. Usually applied methods for introducing DNA into eukaryotic host cells are e.g. direct introduction of cloned DNA by microinjection or microparticle bombardment; electrotransfer; use of viral vectors; encapsulation within a carrier system; and use of transfecting reagents such as calcium phosphate, diethylaminoethyl (DEAE)-dextran or commercial transfection systems like the LipofectAMINE 2000 (Invitrogen). Preferably, the transfection method used to introduce the purified DNA sequence into a eukaryotic host cell is the method for transfecting a eukaryotic cell as described below.

The purified and isolated DNA sequence can be used in the form of a circular vector. Preferably, the purified and isolated DNA sequence is used in the form of a linear DNA sequence as vector.

As used herein, "plasmid" and "vector" are used interchangeably, as the plasmid is the most commonly used vector form. However, the invention is intended to include such other forms of expression vectors, including, but not limited to, viral vectors (e.g., replication defective retroviruses, adenoviruses and adeno-associated viruses), which serve equivalent functions.

The present invention further encompasses a method for transfecting a eukaryotic host cell, said method comprising
a) introducing into said eukaryotic host cell at least one purified DNA sequence comprising at least one DNA sequence of interest and/or at least one purified and isolated DNA sequence comprising a MAR nucleotide sequence or other chromatin modifying elements,
b) subjecting within a defined time said transfected eukaryotic host cell to at least one additional transfection step with at least one purified DNA sequence comprising at least one DNA sequence of interest and/or with at least one purified and isolated DNA sequence comprising a MAR nucleotide sequence or other chromatin modifying elements,
c) selecting said transfected eukaryotic host cell.

Preferably at least two up to four transfecting steps are applied in step b).

In order to select the successful transfected cells, a gene that encodes a selectable marker (e.g., resistance to antibiotics) is generally introduced into the host cells along with the gene of interest. The gene that encodes a selectable marker might be located on the purified DNA sequence comprising at least one DNA sequence of interest and/or at least one purified and isolated DNA sequence consisting of a MAR nucleotide sequence or other chromatin modifying elements or might optionally be co-introduced in separate form e.g. on a plasmid. Various selectable markers include those that confer resistance to drugs, such as G418, hygromycin and methotrexate. The amount of the drug can be adapted as desired in order to increase productivity.

Usually, one or more selectable markers are used. Preferably, the selectable markers used in each distinct transfection steps are different.

b) subjecting within a defined time said transfected eukaryotic host cell to at least one additional transfection step with the same purified DNA sequence comprising one DNA sequence of interest and additionally a MAR nucleotide sequence of step a).

Also preferably, the MAR nucleotide sequence of the purified and isolated DNA sequence is selected from the group comprising
This allows selecting the transformed cells that are "multi-transformed" by using for example two different antibiotic selections.

Any eukaryotic host cell capable of protein production and lacking a cell wall can be used in the methods of the invention. Examples of useful mammalian host cell lines include human cells such as human embryonic kidney line (293 or 293 cells subcloned for growth in suspension culture, Graham et al., J. Gen Virol 36, 59 (1977)), human cervical carcinoma cells (HELA, ATCC CCL 2), human lung cells (W138, ATCC CCL 75), human liver cells (Hep G2, HB 8065); rodent cells such as baby hamster kidney cells (BHK, ATCC CCL 10), Chinese hamster ovary cells/-DHFR (CHO, Urlaub and Chasin, Proc. Natl. Acad. Sci. USA, 77, 4216 (1980)), mouse sertoli cells (TM4, Mather, Biol. Reprod 23, 243-251 (1980)), mouse mammary tumor (MMT 060562, ATCC CCL51); and cells from other mammals such as monkey kidney CV1 line transformed by SV40 (COS-7, ATCC CRL 1651); monkey kidney cells (CV1 ATCC CCL 70); African green monkey kidney cells (VERO-76, ATCC CRL-1587); canine kidney cells (MDCK, ATCC CCL 34); buffalo rat liver cells (BRL 3A, ATCC CRL 1442); myeloma (e.g. NSO)/hybridoma cells.

Preferably, the selected transfected eukaryotic host cells are high protein producer cells with a production rate of at least 10 pg per cell per day.

Most preferred for uses herein are mammalian cells, more preferred are CHO cells.

The DNA sequence of interest of the purified and isolated DNA sequence is usually a gene of interest preferably encoding a protein operably linked to a promoter as described above. The purified and isolated DNA sequence comprising at least one DNA sequence of interest might comprise additionally to the DNA sequence of interest MAR nucleotide sequence or other chromatin modifying elements.

Purified and isolated DNA sequence comprising a MAR nucleotide sequence are for example selected from the group comprising the sequences SEQ ID Nos 1 to 27 and/or particular elements of the cLysMAR e.g. the B, K and F regions as well as fragment and elements and combinations thereof as described above. Other chromatin modifying elements are for example boundary elements (BEs), locus control regions (LCRs), and universal chromatin opening elements (UCOEs) (see Zahn-Zabal et al. already cited). An example of multiple transfections of host cells is shown in Example 12 (Table 3). The first transfecting step (primary transfection) is carried out with the gene of interest (SV40EGFP) alone, with a MAR nucleotide sequence (MAR) alone or with the gene of interest and a MAR nucleotide sequence (MAR-SV40EGFP). The second transfecting step (secondary transfection) is carried out with the gene of interest (SV40EGFP) alone, with a MAR nucleotide sequence (MAR) alone or with the gene of interest and a MAR nucleotide sequence (MAR-SV40EGFP), in all possible combinations resulting from the first transfecting step.

Preferably the eukaryotic host cell is transfected by:
a) introducing a purified DNA sequence comprising one DNA sequence of interest and additionally a MAR nucleotide sequence,

a purified and isolated DNA sequence having protein production increasing activity,
a purified and isolated MAR DNA sequence identifiable according to the method for identifying a MAR sequence using the described bioinformatic tool, the combined method or the method comprising at least one filter,
the sequences SEQ ID Nos 1 to 27,
a purified and isolated cLysMAR element and/or fragment,
a synthetic MAR sequence comprising natural MAR element and/or fragments assembled between linker sequences,
a sequence complementary thereof, a part thereof sharing at least 70% nucleotides in length, a molecular chimera thereof, a combination thereof and variants.

Surprisingly, a synergy between the first and second transfection has been observed. A particular synergy has been observed when MAR elements are present at one or both of the transfection steps. Multiple transfections of the cells with pMAR alone or in combination with various expression plasmids, using the method described above have been carried out. For example, Table 3 shows that transfecting the cells twice with the pMAR-SV40EGFP plasmid gave the highest expression of GFP and the highest degree of enhancement of all conditions (4.3 fold). In contrast, transfecting twice the vector without MAR gave little or no enhancement, 2.8-fold, instead of the expected two-fold increase. This proves that the presence of MAR elements at each transfection step is of particular interest to achieve the maximal protein synthesis.

As a particular example of the transfection method, said purified DNA sequence comprising at least one DNA sequence of interest can be introduced in form of multiple unlinked plasmids, comprising a gene of interest operably linked to a promoter, a selectable marker gene, and/or protein production increasing elements such as MAR sequences.

The ratio of the first and subsequent DNA sequences may be adapted as required for the use of specific cell types, and is routine experimentation to one ordinary skilled in the art.

The defined time for additional transformations of the primary transformed cells is tightly dependent on the cell cycle and on its duration. Usually the defined time corresponds to intervals related to the cell division cycle.

Therefore this precise timing may be adapted as required for the use of specific cell types, and is routine experimentation to one ordinary skilled in the art.

Preferably the defined time is the moment the host cell just has entered into the same phase of a second or a further cell division cycle, preferably the second cycle.

This time is usually situated between 6 h and 48 h, preferably between 20 h and 24 h after the previous transfecting event.

Also encompassed by the present invention is a method for transfecting a eukaryotic host cell, said method comprising co-transfecting into said eukaryotic host cell at least one first purified and isolated DNA sequence comprising at least one DNA sequence of interest, and a second purified DNA comprising at least one MAR nucleotide selected from the group comprising:
a purified and isolated DNA sequence having protein production increasing activity,
a purified and isolated MAR DNA sequence identifiable according to the method for identifying a MAR sequence using the described bioinformatic tool, the combined method or the method comprising at least one filter,
the sequences SEQ ID Nos 1 to 27,
a purified and isolated cLysMAR element and/or fragment,
a synthetic MAR sequence comprising natural MAR element and/or fragments assembled between linker sequences,
a sequence complementary thereof, a part thereof sharing at least 70% nucleotides in length, a molecular chimera thereof, a combination thereof and variants.

Said first purified and isolated DNA sequence can also comprise at least one MAR nucleotide as described above.

Also envisioned is a process for the production of a protein wherein a eukaryotic host cell is transfected according to the transfection methods as defined in the present invention and is cultured in a culture medium under conditions suitable for expression of the protein. Said protein is finally recovered according to any recovering process known to the skilled in the art.

Given as an example, the following process for protein production might be used.

The eukaryotic host cell transfected with the transfection method of the present invention is used in a process for the production of a protein by culturing said cell under conditions suitable for expression of said protein and recovering said protein. Suitable culture conditions are those conventionally used for in vitro cultivation of eukaryotic cells as described e.g. in WO96/39488. The protein can be isolated from the cell culture by conventional separation techniques such as e.g. fractionation on immunoaffinity or ion-exchange columns; precipitation; reverse phase HPLC; chromatography; chromatofocusing; SDS-PAGE; gel filtration. One skilled in the art will appreciate that purification methods suitable for the polypeptide of interest may require modification to account for changes in the character of the polypeptide upon expression in recombinant cell culture.

The proteins that are produced according to this invention can be tested for functionality by a variety of methods. For example, the presence of antigenic epitopes and ability of the proteins to bind ligands can be determined by Western blot assays, fluorescence cell sorting assays, immunoprecipitation, immunochemical assays and/or competitive binding assays, as well as any other assay which measures specific binding activity.

The proteins of this invention can be used in a number of practical applications including, but not limited to:
1. Immunization with recombinant host protein antigen as a viral/pathogen antagonist.
2. Production of membrane proteins for diagnostic or screening assays.
3. Production of membrane proteins for biochemical studies.
4. Production of membrane protein for structural studies.
5. Antigen production for generation of antibodies for immuno-histochemical mapping, including mapping of orphan receptors and ion channels.

Also provided by the present invention is a eukaryotic host cell transfected according to any of the preceding transfection methods. Preferably, the eukaryotic host cell is a mammalian host cell line.

As already described, example of useful mammalian host cell lines include human cells such as human embryonic kidney line (293 or 293 cells subcloned for growth in suspension culture, Graham et al., J. Gen Virol 36, 59 (1977)), human cervical carcinoma cells (HELA, ATCC CCL 2), human lung cells (W138, ATCC CCL 75), human liver cells (Hep G2, HB 8065); rodent cells such as baby hamster kidney cells (BHK, ATCC CCL 10), Chinese hamster ovary cells/-DHFR (CHO, Urlaub and Chasin, Proc. Natl. Acad. Sci. USA, 77, 4216 (1980)), mouse sertoli cells (TM4, Mather, Biol. Reprod 23, 243-251 (1980)), mouse mammary tumor (MMT 060562, ATCC CCL51); and cells from other mammals such as monkey kidney CV1 line transformed by SV40 (COS-7, ATCC CRL 1651); monkey kidney cells (CV1 ATCC CCL 70); African green monkey kidney cells (VERO-76, ATCC CRL-1587); canine kidney cells (MDCK, ATCC CCL 34); buffalo rat liver cells (BRL 3A, ATCC CRL 1442); myeloma (e.g. NSO)/hybridoma cells.

Most preferred for uses herein are CHO cells.

The present invention also provides for a cell transfection mixture or kit comprising at least one purified and isolated DNA sequence according to the invention.

The invention further comprises a transgenic organism wherein at least some of its cells have stably incorporated at least one DNA sequence of
a purified and isolated DNA sequence having protein production increasing activity,
a purified and isolated MAR DNA sequence identifiable according to the method for identifying a MAR sequence using the described bioinformatic tool, the combined method or the method comprising at least one filter,
the sequences SEQ ID Nos 1 to 27,
a purified and isolated cLysMAR element and/or fragment,
a synthetic MAR sequence comprising natural MAR element and/or fragments assembled between linker sequences,
a sequence complementary thereof, a part thereof sharing at least 70% nucleotides in length, a molecular chimera thereof, a combination thereof and variants.
Preferably, some of the cells of the transgenic organisms have been transfected according to the methods described herein.

Also envisioned in the present invention is a transgenic organism wherein its genome has stably incorporated at least one DNA sequence of
a purified and isolated DNA sequence having protein production increasing activity,
a purified and isolated MAR DNA sequence identifiable according to the method for identifying a MAR sequence using the described bioinformatic tool, the combined method or the method comprising at least one filter,
the sequences SEQ ID Nos 1 to 27,
a purified and isolated cLysMAR element and/or fragment,
a synthetic MAR sequence comprising natural MAR element and/or fragments assembled between linker sequences,
a sequence complementary thereof, a part thereof sharing at least 70% nucleotides in length, a molecular chimera thereof, a combination thereof and variants.

Transgenic eukaryotic organisms which can be useful for the present invention are for example selected from the group comprising mammals (mouse, human, monkey etc.) and in particular laboratory animals such as rodents in general, insects (drosophila, etc.), fishes (zebra fish, etc.), amphibians (frogs, newt, etc.) and other simpler organisms such as C. elegans, yeast, etc.

Yet another object of the present invention is to provide a computer readable medium comprising computer-executable instructions for performing the method for identifying a MAR sequence as described in the present invention.

The foregoing description will be more fully understood with reference to the following Examples. Such Examples, are, however, exemplary of methods of practising the present invention and are not intended to limit the scope of the invention.

EXAMPLES

Example 1

SMAR SCAN and MAR Sequences

A first rough evaluation of SMAR SCAN was done by analyzing experimentally defined human MARs and non-MAR sequences. As MAR sequences, the previous results from the analysis of human MARs from SMARt DB were used to plot a density histogram for each criterion as shown in FIG. 1. Similarly, non-MAR sequences were also analyzed and plotted. As non-MAR sequences, all RefSeq contigs from the chromosome 22 were used, considering that this latter was big enough to contain a negligible part of MAR sequences regarding the part of non-MAR sequences.

The density distributions shown in FIG. 1 are all skewed with a long tail. For the highest bend, the highest major groove depth and the highest minor groove width, the distributions are right skewed. For the lowest melting temperature, the distributions are left-skewed which is natural given the inverse correspondence of this criterion regarding the three others. For the MAR sequences, biphasic distributions with a second weak peak, are actually apparent. And between MAR and non-MAR sequences distributions, a clear shift is also visible in each plot.

Example 3

MAR Prediction of the Whole Chromosome 22

All RefSeq contigs from the chromosome 22 were analyzed by SMAR SCAN using the default settings this time. The result is that SMAR SCAN predicted a total of 803 MARs, their average length being 446 bp, which means an average of one MAR predicted per 42 777 bp. The total length of the predicted MARs corresponds to 1% of the chromosome 22 length. The AT-content of the predicted regions ranged from 65.1% to 93.3%; the average AT-content of all these regions being 73.5%. Thus, predicted MARs were AT-rich, whereas chromosome 22 is not AT-rich (52.1% AT).

SMARTest was also used to analyze the whole chromosome 22 and obtained 1387 MAR candidates, their average length being 494 bp representing an average of one MAR predicted per 24 765 bp. The total length of the predicted MARs corresponds to 2% of the chromosome 22. Between all MARs predicted by the two softwares, 154 predicted MARs are found by both programs, which represents respectively 19% and 11% of SMAR SCAN and SMARTest predicted MARs. Given predicted MARs mean length for SMAR SCAN and SMARTest, the probability to have by chance an overlapping between SMAR SCAN and SMARTest predictions is 0.0027% per prediction.

To evaluate the specificity of SMAR SCAN predictions, SMAR SCAN analyses were performed on randomly shuffled sequences of the chromosome 22 (FIG. 4). Shuffled sequences were generated using 4 different methods: by a segmentation of the chromosome 22 into nonoverlapping
windows of 10 bp and by separately shuffling the nucleotides Among all human MAR sequences used, in average only 35 in each window; by “scrambling' which means a permutation about 70% of them have a value greater than the 75th quantile of all nucleotides of the chromosome; by “rubbling” which of human MARS distribution, this for the four different crite means a segmentation of the chromosome in fragments of 10 ria. Similarly concerning the second weak peak of each bp and a random assembling of these fragments and finally by human MARs distribution, only 15% of the human MAR order 1 Markov chains, the different states being the all the sequences are responsible of these outlying values. Among 40 these 15% of human MAR sequences, most are very well different DNA dinucleotides and the transition probabilities documented MARS, used to insulate transgene from position between these states being based on the chromosome 22 scan. effects, such as the interferon locus MAR, the beta-globin For each shuffling method, five shuffled chromosome 22 were locus MAR (Ramezani A, Hawley TS, Hawley R.G., “Per generated and analyzed by SMAR SCAN using the default formance- and safety-enhanced lentiviral vectors containing 45 settings. Concerning the number hits, an average of 3519 170 the human interferon-beta scaffold attachment region and the hits (sd: 18353) was found for the permutated chromosome chicken beta-globin insulator, Blood, 101:4717-4724, 22 within nonoverlapping windows of 10 bp, 171936.4 hits 2003), or the apolipoprotein MAR (Namciu, S. Blochinger K (sd: 2859.04) for the scrambled sequences and 24708.2 hits B. Fournier R. E. K. “Human matrix attachment regions in (sd: 1191.59) for the rubbled chromosome 22 and 2282 hits Sulate transgene expression from chromosomal position 50 effects in Drosophila melanogaster, Mol. Cell. Biol., in average (sd: 334.7) for the chromosomes generated accord 18:2382-2391, 1998). 
ing to order 1 Markov chains models of the chromosome 22, Always with the same data, human MAR sequences were which respectively represents 185% (sd: 0.5% of the mean), also used to determine the association between the four theo 9% (sd: 1.5%), 1% (sd: 5%) and 0.1% (sd: 15%) of the retical structural properties computed and the AT-content. 55 number of hits found with the native chromosome 22. For the FIG. 2 represents the scatterplot and the corresponding cor number of MARS predicted, which thus means contiguous relation coefficient r for every pair of criteria. hits of length greater than 300, 1997 MARs were predicted with the shuffled chromosome 22 within windows of 10 bp Example 2 (sd: 31.2), only 2.4 MARs candidates were found in 60 scrambled sequences (sd: 0.96) and none for the rubbled and Distribution Plots of MAR Sequences by Organism for the sequences generated according to Markov chains MAR sequences from SMARt DB of other organisms were model, which respectively represents 249% and less than also retrieved and analyzed similarly as explained previously. 0.3% of the number of predicted MARs found with the native The MAR sequences density distributions for the mouse, the 65 chromosome 22. These data provide indications that SMAR chicken, the Sorghum bicolor and the human are plotted SCAN detects specific DNA elements which organization is jointly in FIG. 3. lost when the DNA sequences are shuffled. US 8,252,917 B2 23 24 Example 4 Example 5 Analysis of Known Matrix Attachment Regions in Accuracy of SMAR SCAN Prediction and the Interferon Locus with SMAR SCAN Comparison with Other Predictive Tools The relevance of MAR prediction by SMAR SCAN was The accuracy of SMAR Scan R) was evaluated using six investigated by analyzing the recently published MAR genomic sequences for which experimentally determined regions of the human interferon gene cluster on the short arm MARs have been mapped. In order to perform a comparison of chromosome 9 (9p22). Goetze et al. 
(already cited) with other predictive tools, the sequences analyzed are the reported an exhaustive analysis of the WP18A10A7 locus to 10 analyze the suspected correlation between BURs (termed in same with the sequences previously used to compare MAR this case stress-induced duplex destabilization or SIDD) and Finder and SMARTest. These genomic sequences are three in vitro binding to the nuclear matrix (FIG. 9, lower part). plant and three human sequences (Table 1) totalizing 310 151 Three of the SIDD peaks were in agreement with the in vitro bp and 37 experimentally defined MARS. The results for binding assay, while others did not match matrix attachment 15 SMARTest and MAR-Finder in Table 1 come from a previous sites. Inspection of the interferon locus with SMAR SCAN comparison (Frisch M. Frech K, Klingenhoff A, Cartharius (FIG.9, top part) indicated that three majors peaks accompa K. Liebich I and Werner T. In silico pre-diction of scaffold/ nied by clusters of SATB1, NMP4 and MEF2 regulators matrix attachment regions in large genomic sequences, binding sites correlated well with the active MARs. There Genome Research, 12:349-354, 2001). fore, we conclude that the occurrence of predicted CUEs and MAR-Finder has been used with the default parameters binding sites for these transcription factors is not restricted to excepted for the threshold that has been set to 0.4 and for the the cIysMAR but may be a general property of all MARs. analysis of the protamine locus, the AT-richness rule has been These results also imply that the SMAR SCAN program excluded (to detect the non AT-rich MARs as was done for the efficiently detects MAR elements from genomic sequences. protamine locus). 
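As a rough illustration of how predicted regions can be compared with experimentally mapped MARs in an evaluation of this kind, the following Python sketch scores predictions by simple interval overlap. The overlap rule, the function names and the small coordinate lists are assumptions made for illustration only; the text does not state the exact matching rule that was applied.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start_kb, end_kb)


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True when two closed intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def score_predictions(predicted: List[Interval], known: List[Interval]) -> Dict[str, int]:
    """Count true/false positive predictions and mapped MARs found/missed."""
    # A prediction is a true positive if it overlaps any mapped MAR.
    hit_flags = [any(overlaps(p, k) for k in known) for p in predicted]
    # A mapped MAR is "found" if at least one prediction overlaps it.
    found_flags = [any(overlaps(p, k) for p in predicted) for k in known]
    true_pos = sum(hit_flags)
    found = sum(found_flags)
    return {
        "true_positives": true_pos,
        "false_positives": len(predicted) - true_pos,
        "known_found": found,
        "known_missed": len(known) - found,
    }


# Hypothetical coordinates in kb, chosen only to exercise the function.
known = [(0.0, 1.2), (5.4, 7.4), (17.3, 18.5)]
predicted = [(6.5, 7.0), (17.6, 18.2), (17.9, 18.4), (25.0, 25.4)]
result = score_predictions(predicted, known)
print(result)
```

With this counting, two predictions that fall inside the same long mapped MAR both count as true positive predictions while the mapped MAR itself is counted once, which is why the number of true positive predictions can exceed the number of mapped MARs found.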
TABLE 1
Evaluation of SMAR SCAN accuracy

Columns: Sequence, description and reference | Length (kb) | Experimentally defined MARs positions (kb) | SMARTest prediction positions (kb) | MAR-Finder prediction positions (kb) | SMAR Scan prediction positions (kb)

Oryza sativa putative ADP-glucose pyrophosphorylase subunit SH2 and putative NADPH dependant reductase A1 genes (U70541), (4) | 30.034
Positions (kb), read across the four position columns row by row: 0.0-1.2, 5.4-7.4, 6.5-7.0, 15.2-15.7, 15.7-15.9, 15.6-16, 16.2-16.6, 17.3-18.5, 17.6-18.3, 17.5-18.4, 17.6-18.2, 20.0-23.1, 19.6-20.1, 19.8-20.4, 21.6-22, 20.7-21.3, 21.3-21.5, 23.6-23.9, 23.9-24.2, 23.4-23.8, 25.0-25.4, 24.7-25.1, 27.5-27.9

Sorghum bicolor ADP-glucose pyrophosphorylase subunit SH2, NADPH-dependant reductase A1-b genes (AF010283), (4) | 42.446
Positions (kb), read across the four position columns row by row: 0.0-1.5, 7.1-9.7, 7.4-7.7, 21.3-21.9, 21.5-21.8, 22.4-24.7, 22.9-24.0, 23.2-24.2, 22.9-23.2, 23.6-24.0, 27.3-27.6, 26.9-27.5, 27.3-27.6, 32.5-33.7, 33.4-33.9, 41.6-42.3

Sorghum bicolor BAC clone 110KS (AF124045), (37) | 78.195
Positions (kb), read across the four position columns row by row: ~0.9, ~5.8, ~6.3, ~9.3, ~15.0, 15.1-15.8, ~18.5, ~21.9, 21.7-22.0, 21.4-21.9, ~23.3, ~25.6, ~29.1, 29.2-29.5, ~34.6, 39.0-40.0, 44.1, 44.1-44.5, 48.5, 47.9-49.5, 47.9-49.4, 48.1-48.6, 48.8-49.3, 57.9, ~62.9, 63.1-63.7, 67.1, ~69.3, ~73.7, 74.3-74.7, 74.3-74.6

Human alpha-1-antitrypsin and corticosteroid binding globulin intergenic region (AF156545), (35) | 30.461
Positions (kb), read across the four position columns row by row: 2.6-6.3, 5.5-6.0, 3.0-3.2, 5.4-5.8, 5.1-6.0, 22.0-30.4, 25.7-26.2, 24.9-25.3, 25.8-26.4, 27.5-27.8, 25.5-25.8, 26.2-26.4, 27.5-28.2

Human protamine locus (U15422), (24) | 53.080
Positions (kb), read across the four position columns row by row: 8.8-9.7, 8.0-8.9, 32.6-33.6, 33.9-34.8, 37.2-39.4, 33.9-34.8, 51.8-53.0, *

Human beta-globin locus (U01317), (21) | 75.955
Positions (kb), read across the four position columns row by row: 1.5-3.0, 2.3-2.6, 15.6-19.0, 18.0-18.4, 15.5-16.0, 15.3-15.6, 18.0-18.4, 34.4-34.9, 44.7-52.7, 50.6-50.8, 56.6-57.1, 56.5-57.2, 60.0-70.0, 59.8-60.3, 58.1-58.5, 62.8-63.1, 65.6-66.0, 63.0-63.6, 67.6-67.9, 68.7-69.3, 66.3-66.7, 68.8-69.1

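Sum (kb): sequence length 310.151; experimentally defined MARs at least 56.1; SMARTest 14.5; MAR-Finder 13.8; SMAR Scan 9.5
Total numbers: experimentally defined MARs 37; SMARTest 28; MAR-Finder 25; SMAR Scan 22
Average kb/predicted MAR: SMARTest 11.076; MAR-Finder 12.406; SMAR Scan 14.097
True positives [number of experimentally defined MARs found]: SMARTest 19 [14]; MAR-Finder 20 [12]; SMAR Scan 17 [14]
False positives: SMARTest 9; MAR-Finder 5; SMAR Scan 5
False negatives: SMARTest 23; MAR-Finder 25; SMAR Scan 23
Specificity: SMARTest 19/28 = 68%; MAR-Finder 20/25 = 80%; SMAR Scan 17/22 = 77%
Sensitivity: SMARTest 14/37 = 38%; MAR-Finder 12/37 = 32%; SMAR Scan 14/37 = 38%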
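The ratios in the summary rows can be recomputed from the counts with a few lines of Python. This is only a sketch: the variable and function names are illustrative, and "specificity" is used in the sense of the table (true positive predictions divided by all predictions of a tool, a quantity often called precision elsewhere), while sensitivity is the number of distinct experimentally defined MARs found divided by the 37 mapped MARs.

```python
def ratio_percent(numerator: int, denominator: int) -> int:
    """Return the ratio as a percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)


TOTAL_KNOWN_MARS = 37  # experimentally defined MARs in the six sequences

# tool: (predicted regions, true positive predictions, distinct known MARs found)
summary = {
    "SMARTest": (28, 19, 14),
    "MAR-Finder": (25, 20, 12),
    "SMAR Scan": (22, 17, 14),
}

for tool, (predicted, true_pos, found) in summary.items():
    specificity = ratio_percent(true_pos, predicted)
    sensitivity = ratio_percent(found, TOTAL_KNOWN_MARS)
    false_pos = predicted - true_pos          # predictions outside mapped MARs
    false_neg = TOTAL_KNOWN_MARS - found      # mapped MARs with no prediction
    print(f"{tool}: specificity {specificity}%, sensitivity {sensitivity}%, "
          f"false positives {false_pos}, false negatives {false_neg}")
```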

Table 1: Evaluation of SMAR SCAN Accuracy

Six different genomic sequences, three plant and three human sequences, for which experimentally defined MARs are known, were analyzed with MAR-Finder, SMARTest and SMAR SCAN. True positive matches are printed in bold, minus (-) indicates false negative matches. Some of the longer experimentally defined MARs contained more than one in silico prediction; each of them was counted as a true positive match. Therefore, the number of true in silico predictions is higher than the number of experimentally defined MARs found. Specificity is defined as the ratio of true positive predictions, whereas sensitivity is defined as the ratio of experimentally defined MARs found. * AT-rich rule excluded using MAR-Finder.

SMARTest predicted 28 regions as MARs. 19 (true positives) of these correlate with experimentally defined MARs (specificity: 68%) whereas 9 (32%) are located in non-MARs (false positives). As some of the longest experimentally determined MARs contain more than one in silico prediction, the 19 true positives correspond actually to 14 different experimentally defined MARs (sensitivity: 38%). MAR-Finder predicted 25 regions as MARs, 20 (specificity: 80%) of these correlate with experimentally defined MARs corresponding to 12 different experimentally defined MARs (sensitivity: 32%). SMAR SCAN predicted 22 regions, 17 being true positives (specificity: 77%) matching 14 different experimentally defined MARs (sensitivity: 38%).

As another example, the same analysis has been applied to human chromosomes 1 and 2 and led to the determination of 23 MARs sequences (SEQ ID NO 1 to 23). These sequences are listed in Annex 1 in ST25 format.

Example 6

Analyses of the Whole Genome Using the Combined Method (SMAR SCAN-pfsearch)

In order to test the potential correlation between the structural features computed by SMAR SCAN and the S/MAR functional activity, the whole human genome has been analyzed with the combined method with very stringent parameters, in order to get sequences with the highest values for the theoretical structural features computed, which are called "super" S/MARs below. This was done with the hope to obtain predicted MAR elements with a very high potential to increase transgene expression and recombinant protein production. The putative S/MARs hence harvested were first analyzed from the bioinformatics perspective in an attempt to characterize and classify them.

6.1 S/MARs Predicted From the Analysis of the Whole Human Genome

As whole human genome sequence, all human RefSeq (National Center for Biotechnology Information, The NCBI handbook [Internet]. Bethesda (MD): National Library of Medicine (US), October. Chapter 17, The Reference Sequence (RefSeq) Project, 2002) contigs (release 5) were used and analyzed with the combined method, using SMAR SCAN as filter in the first level processing, employing default settings except for the highest bend cutoff value, whereas a stringent threshold of 4.0 degrees (instead of 3.202 degrees) has been used for the DNA bending criterion.

In the second level processing, predicted transcription factors binding have been sought in the sequences selected from the previous step without doing any filtering on these sequences.

The analysis by the combined method of the whole human genome came up with a total of 1757 putative "super" S/MARs representing a total of 1 065 305 bp (0.35% of the whole human genome). Table 2 shows for each chromosome: its size, its number of genes, its number of S/MARs predicted, its S/MARs density per gene and its kb per S/MAR. This table shows that there are very various gene densities per S/MAR predicted for the different chromosomes (standard deviation represents more than 50% of the mean of the density of genes per S/MAR predicted and the fold difference between the higher and the lower density of genes per S/MAR is 6.5). Table 2 also shows that the kb per S/MAR varies less than the density of genes per S/MAR (standard deviation represents 25% of the mean of kb per S/MAR and the fold difference between the higher and the lower kb per S/MAR is 3.2).

TABLE 2
Number of S/MARs predicted per chromosome.

Chromosome | Number of genes per chromosome | Size of the chromosome (millions bp) | Number of S/MARs predicted | Density of genes per S/MAR
1 | 2544 | 230 | 85 | 29.9
2 | 1772 | 241 | 143 | 12.3
3 | 1406 | 198 | 101 | 13.9
4 | 1036 | 190 | 118 | 8.7
5 | 1233 | 180 | 116 | 10.6
6 | 1247 | 170 | 94 | 13.2
7 | 1383 | 160 | 179 | 7.7
8 | 942 | 145 | 77 | 12.2
9 | 1100 | 119 | 48 | 22.9
10 | 1003 | 133 | 71 | 14.1
11 | 1692 | 132 | 67 | 25.2
12 | 1278 | 131 | 78 | 16.3
13 | 506 | 97 | 70 | 7.2
14 | 1168 | 88 | 36 | 32.4
15 | 895 | 83 | 35 | 25.5
16 | 1107 | 81 | 41 | 27
17 | 1421 | 80 | 37 | 38.4
18 | 396 | 75 | 51 | 7.7
19 | 1621 | 56 | 36 | 45.02
20 | 724 | 60 | 28 | 25.8
21 | 355 | 34 | 18 | 19.7
22 | 707 | 34 | 28 | 25.2
X | 1168 | 154 | 170 | 6.8
Y | 251 | 25 | 30 | 8.3

Sum | 26955 | 3050 | 1757 | 457 | kb per S/MAR: 43312
Mean | 1123 | 127 | 73 | 19 | kb per S/MAR: 1804
Sd | 510 | 72.8 | 45 | 10 | kb per S/MAR: 462

The number of genes per chromosome corresponds to the NCBI human genome statistics (Build 34 Version 3) (National Center for Biotechnology Information, The NCBI handbook [Internet]. Bethesda (MD): National Library of Medicine (US), Oct. Chapter 17, The Reference Sequence (RefSeq) Project, 2002 (Available from http://www.ncbi.nih.gov/entrez/query.fcgi?db=Books)) based on GenBank annotations. Chromosome sizes are the sum of the corresponding human RefSeq (National Center for Biotechnology Information, The NCBI handbook [Internet]. Bethesda (MD): National Library of Medicine (US), Oct. Chapter 17, The Reference Sequence (RefSeq) Project, 2002 (Available from http://www.ncbi.nih.gov/entrez/query.fcgi?db=Books)) (release 5) contig lengths.

6.2 Bioinformatics Analysis of "Super" MARs for Transcription Factor Binding Sites

The 1757 predicted "super" S/MARs sequences obtained previously by SMAR SCAN were then analyzed for potential transcription factors binding sites. This has been achieved using Match™ Professional (Kel AE, Gossling E, Reuter I, Cheremushkin E, Kel-Margoulis OV, Wingender E, MATCH: A tool for searching transcription factor binding sites in DNA sequences, Nucleic Acids Res. 31(13):3576-9, 2003), a weight matrix based tool based on TRANSFAC (Wingender E, Chen X, Fricke E, Geffers R, Hehl R, Liebich I, Krull M, Matys V, Michael H, Ohnhauser R, Pruss M, Schacherer F, Thiele S, Urbach S, The TRANSFAC system on gene expression regulation, Nucleic Acids Research, 29(1):281-3, 2001). Match™ 2.0 Professional has been used with most of the default settings. Match™ analysis was based on TRANSFAC Professional, release 8.2 (20040630). The sums of all transcription factors binding prediction on the 1757 sequences analyzed according to Match™ are in Table 3. Based on this table, only the transcription factors totalizing at least 20 hits over the 1757 sequences analyzed were considered for further analyses.

Hereafter are some of the human transcription factors that are the most often predicted to bind on the 1757 putative S/MAR sequences and their Match description: Cdc5 (cell division control protein 5), a transcriptional regulator/repressor; Nkx3A, a homeodomain protein regulated by androgen; POU1F1 (pituitary specific positive transcription factor 1), which is specific to the pituitary and stimulates cells proliferation. Thus, in addition to SATB1, NMP4 and MEF2, other transcription factors can participate in the activity of MARs.

TABLE 3
Summary of all transcription factors binding prediction (totalizing 20 hits or more) on the 1757 sequences analyzed.

AP1 1; AR 2; Bach2 1; Brn2 CEBP 20; C/EBPgamma 5; CDP CR3 1; COMP1 2; CREBP1 34; Cdc5 858; Cox2 35; Evi1 472; FOX 78; FOXD3 79; FOXJ2 244; FOXP3 29; Freacf 272; GATA1 2; GATA3 142; GATA4 125; HFEH1 12; HFH3 1; HLF 275; HNF1 337; HNF3alpha 23; HNF3beta 71; HP1 2; LhK3 22; MEF2 114; MRF2 57; Myc 18; NKX3A 849; Nkx25 2; Oct1 191; PBX 5; POU1F1 483; POU3F2 11; POU6F1 29; Pax3 3; Pax6 20; Pit 505; SRF 8; TEF 2852; TFILA 14; TTF1 1; VSMTATA B 4; VBP S3 WinwóS 1; XFD1 65; XFD2 418; XFD3 2
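The selection rule stated in section 6.2, keeping only the transcription factors that total at least 20 predicted hits over all analyzed sequences, can be sketched in Python as follows. The input layout (one list of factor names per sequence, one entry per predicted site), the function names and the example data are assumptions for illustration; they do not reproduce the output format of the Match tool or the counts of Table 3.

```python
from collections import Counter
from typing import Dict, Iterable, List

MIN_TOTAL_HITS = 20  # threshold stated in the text


def total_hits(hits_per_sequence: Iterable[List[str]]) -> Counter:
    """Sum the predicted binding-site hits per factor over all sequences."""
    totals: Counter = Counter()
    for factors in hits_per_sequence:
        totals.update(factors)
    return totals


def frequent_factors(totals: Counter, minimum: int = MIN_TOTAL_HITS) -> Dict[str, int]:
    """Keep factors whose summed hits reach the minimum, most frequent first."""
    return {name: n for name, n in totals.most_common() if n >= minimum}


# Hypothetical predictions for three sequences (not data from Table 3).
example = [
    ["Cdc5"] * 12 + ["MEF2"] * 3,
    ["Cdc5"] * 9 + ["Myc"] * 2,
    ["MEF2"],
]
totals = total_hits(example)
kept = frequent_factors(totals)
print(kept)
```

In this toy input only the first factor reaches the threshold, so it is the only one retained for further analysis.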

6.3 Bioinformatics Analysis of Predicted "Super" MARs for Dinucleotide Frequencies

Various computer analyses were performed in order to easily identify "super" S/MAR sequences using an explicit criterion that could be identified without computing. Among those, a di-nucleotide analysis was performed on the 1757 super MARs, computing each of the 16 possible dinucleotide percentages for each sequence considering both strands in the 5'→3' direction.

A summary (min., max., median, mean, 25th percentile and 75th percentile) as well as the histograms of each dinucleotide percentage over the 1757 S/MAR sequences are respectively presented in Table 4. A similar analysis was performed on randomly selected sequences from the human genome, representing randomly selected non-S/MAR sequences (which might however contain some MARs). Table 5 represents respectively a summary of the dinucleotide content analysis for these sequences.

TABLE 4
Dinucleotide percentages over the 1757 S/MAR sequences

Statistic | AA % | AC % | AG % | AT %
Minimum | 0.000 | 0.0000 | 0.0000 | 18.50
25th percentile | 4.234 | 0.9372 | 0.1408 | 32.11
Median | 7.843 | 2.2408 | 0.4777 | 34.68
Mean | 7.184 | 3.2117 | 1.0865 | 34.32
75th percentile | 10.110 | 4.7718 | 1.5096 | 36.94
Maximum | 17.290 | 12.9479 | 8.1230 | 50.00

Statistic | CA % | CC % | CG % | CT %
Minimum | 0.0000 | 0.00000 | 0.0000 | 0.0000
25th percentile | 0.9695 | 0.00000 | 0.0000 | 0.1408
Median | 1.9776 | 0.00000 | 0.0000 | 0.4777
Mean | 2.6977 | 0.14123 | 0.2709 | 1.0865
75th percentile | 3.7543 | 0.09422 | 0.1256 | 1.5096
Maximum | 10.4061 | 4.24837 | 7.4410 | 8.1230

Statistic | GA % | GC % | GG % | GT %
Minimum | 0.00000 | 0.0000 | 0.00000 | 0.0000
25th percentile | 0.08696 | 0.0000 | 0.00000 | 0.9372
Median | 0.32616 | 0.0000 | 0.00000 | 2.2408
Mean | 0.63347 | 0.2104 | 0.14123 | 3.2117
75th percentile | 0.83333 | 0.1914 | 0.09422 | 4.7718
Maximum | 5.77889 | 9.8795 | 4.24837 | 12.9479

Statistic | TA % | TC % | TG % | TT %
Minimum | 28.63 | 0.00000 | 0.0000 | 0.000
25th percentile | 33.48 | 0.08696 | 0.9695 | 4.234
Median | 35.22 | 0.32616 | 1.9776 | 7.843
Mean | 35.29 | 0.63347 | 2.6977 | 7.184
75th percentile | 37.14 | 0.83333 | 3.7543 | 10.110
Maximum | 50.00 | 5.77889 | 10.4061 | 17.290

Considering the results of the predicted S/MAR elements and of the non-S/MAR sequences in the summary tables, noticeable differences can be seen in the AT and TA dinucleotide contents between these two groups of sequences. AT and TA represent respectively at least 18.5% and 28.6% of the dinucleotide content of the predicted S/MAR sequences, whereas the minimum percentages for the same dinucleotides in non-S/MAR sequences are respectively 0.3% and 0%. Similarly, the maximum CC and GG content in S/MAR sequences is 4.2%, whereas in non-S/MAR sequences the percentages for these two dinucleotides can amount up to 20.8%.

The correlation between AT and TA dinucleotide percentages and the DNA highest bend as computed by SMAR SCAN is depicted in FIG. 17 for the predicted S/MAR sequences and in FIG. 18 for the non-S/MAR sequences. The different scatterplots of these figures show that the TA percentage correlates well with the predicted DNA bend as predicted by SMAR SCAN.

TABLE 5
Dinucleotide percentages over the 1757 non-S/MAR sequences: summary

Statistic | AA % | AC % | AG % | AT %
Minimum | 0.000 | 1.735 | 1.512 | 0.3257
25th percentile | 7.096 | 4.586 | 6.466 | 5.1033
Median | 9.106 | 5.016 | 7.279 | 6.8695
Mean | 8.976 | 5.054 | 7.184 | 7.0108
75th percentile | 10.939 | 5.494 | 7.969 | 8.7913
Maximum | 17.922 | 13.816 | 12.232 | 23.1788

Statistic | CA % | CC % | CG % | CT %
Minimum | 3.571 | 0.8278 | 0.0000 | 1.512
25th percentile | 6.765 | 4.1077 | 0.4727 | 6.466
Median | 7.410 | 5.5556 | 0.8439 | 7.279
Mean | 7.411 | 5.9088 | 1.2707 | 7.184
75th percentile | 8.010 | 7.2460 | 1.5760 | 7.969
Maximum | 15.714 | 20.8415 | 12.6074 | 12.232

Statistic | GA % | GC % | GG % | GT %
Minimum | 1.319 | 0.4967 | 0.8278 | 1.735
25th percentile | 5.495 | 3.2615 | 4.1077 | 4.586
Median | 6.032 | 4.4092 | 5.5556 | 5.016
Mean | 6.065 | 4.7468 | 5.9088 | 5.054
75th percentile | 6.602 | 5.8824 | 7.2460 | 5.494
Maximum | 10.423 | 16.0000 | 20.8415 | 13.816

Statistic | TA % | TC % | TG % | TT %
Minimum | 0.000 | 1.319 | 3.571 | 0.000
25th percentile | 3.876 | 5.495 | 6.765 | 7.096
Median | 5.625 | 6.032 | 7.410 | 9.106
Mean | 5.774 | 6.065 | 7.411 | 8.976
75th percentile | 7.464 | 6.602 | 8.010 | 10.939
Maximum | 24.338 | 10.423 | 15.714 | 17.922

Four of the novel super MARs were randomly picked and analyzed for AT and TA dinucleotide content, and compared with the previously known chicken lysMAR, considering windows of 100 base pairs (Table 6).

Surprisingly, Applicants have shown that all of the super MARs have AT dinucleotide frequencies greater than 12%, and TA dinucleotides greater than 10% of the total dinucleotides analysed in a window of 100 base pairs of DNA. The most efficient MARs display values around 34% of the two dinucleotide pairs.

TABLE 6
Summary of % AT and TA dinucleotide frequencies of experimentally verified MARs

cLysMAR (average of CUEs) | AT %: 12.03 | TA %: 10.29
P1 68 | AT %: 33.78 | TA %: 33.93 | SEQ ID No. 25
P1 6 | AT %: 34.67 | TA %: 34.38 | SEQ ID No. 24
P1 42 | AT %: 35.65 | TA %: 35.52 | SEQ ID No. 26
Mean value for all human "super" MARs | AT %: 34.32 | TA %: 35.29
Mean value for all human non-MARs | AT %: 7.01 | TA %: 5.77

6.4. Analysis of Orthologous Intergenic Regions of Human and Mouse Genomes

In order to get an insight on S/MAR evolution, orthologous intergenic regions of human and mouse genomes have been analysed with SMAR SCAN. The data set used is composed of 87 pairs of complete orthologous intergenic regions from the human and mouse genomes (Shabalina SA, Ogurtsov AY, Kondrashov VA, Kondrashov AS, Selective constraint in intergenic regions of human and mouse genomes, Trends Genet, 17(7):373-6, 2001) (average length ~12 000 bp) located on 12 human and on 12 mouse chromosomes. The synteny of these sequences was confirmed by pairwise sequence alignment and consideration of the annotations of the flanking genes (experimental or predicted).

The 87 human and mouse orthologous intergenic sequences have been analysed with SMAR SCAN using its default settings. Analysis of the human sequences yielded a total of 12 S/MARs predicted (representing a total length of 4 750 bp), located on 5 different intergenic sequences.

Among the three human intergenic sequences predicted to contain a "super" S/MAR using SMAR SCAN stringent settings, one of the corresponding mouse orthologous intergenic sequences is also predicted to contain a S/MAR (human EMBL ID: Z96050, position 28 010 to 76 951, orthologous to mouse EMBL ID: AC015932, positions 59 884 to 89 963). When a local alignment of these two orthologous intergenic sequences is performed, the best local alignment of these two big regions corresponds to the regions predicted by SMAR SCAN to be S/MAR element. A manual search for the mouse orthologs of the two other human intergenic sequences predicted to contain a "super" S/MAR was performed using the Ensembl Genome Browser. The mouse orthologous intergenic sequences of these two human sequences were retrieved using Ensembl orthologue predictions (based on gene names), searching the orthologous mouse genes for the pairs of human genes flanking these intergenic regions.

Because SMAR SCAN has been tuned for human sequences and consequently yields few "super" MARs with mouse genomic sequences, its default cutoff values were slightly relaxed for the minimum size of contiguous hits to be considered as S/MAR (using 200 bp instead of 300 bp). Analysis by SMAR SCAN of these mouse sequences predicted several S/MARs having high values for the different computed structural features. This finding suggests that the human MAR elements are conserved across species.

Example 7

Dissection of the Chicken Lysozyme Gene 5'-MAR

The 3000 bp 5'-MAR was dissected into smaller fragments that were monitored for effect on transgene expression in Chinese hamster ovary (CHO) cells. To do so, seven fragments of ~400 bp were generated by polymerase chain reaction (PCR). These PCR-amplified fragments were contiguous and cover the entire MAR sequence when placed end-to-end. Four copies of each of these fragments were ligated in a head-to-tail orientation, to obtain a length corresponding to approximately half of that of the natural MAR. The tetramers were inserted upstream of the SV40 promoter in pGEGFPControl, a modified version of the pGL3Control vector (Promega). The plasmid pGEGFPControl was created by exchanging the luciferase gene of pGL3Control for the EGFP gene from pEGFP-N1 (Clontech). The 5'-MAR-fragment-containing plasmids thus created were co-transfected with the resistance plasmid pSVneo in CHO-DG44 cells using LipofectAmine 2000 (Invitrogen) as transfection reagent, as performed previously (Zahn-Zabal, M., et al., "Development of stable cell lines for production or regulated expression using matrix attachment regions" J Biotechnol, 2001. 87(1): p. 29-42.). After selection of the antibiotic (G-418) resistant cells, polyclonal cell populations were analyzed by FACS for EGFP fluorescence.

Transgene expression was expressed as the percentile of high expressor cells, defined as the cells whose fluorescence levels are at least 4 orders of magnitude higher than the average fluorescence of cells transfected with the pGEGFPControl vector without MAR. FIG. 5 shows that multimerized fragments B, K and F enhance transgene expression, despite their shorter size as compared to the original MAR sequence. In contrast, other fragments are poorly active or fully inactive.

Example 8

Specificity of B, K and F Regions in the MAR Context

The 5'-MAR was serially deleted from the 5'-end (FIG. 6, upper part) or the 3'-end (FIG. 6, lower part), respectively. The effect of the truncated elements was monitored in an assay similar to that described in the previous section. FIG. 6 shows that the loss of ability to stimulate transgene expression in CHO cells was not evenly distributed.

In this deletion study, the loss of MAR activity coincided with discrete regions of transition which overlap with the 5'-MAR B-, K- and F-fragment, respectively. In 5' deletions, activity was mostly lost when fragment K and F were removed. 3' deletions that removed the F and B elements had the most pronounced effects. In contrast, flanking regions A, D, E and G that have little or no ability to stimulate transgene expression on their own (FIG. 5), correspondingly did not contribute to the MAR activity in the 5'- and 3'-end deletion studies (FIG. 6).

Example 9

Structure of the F Element

The 465 bp F fragment was further dissected into smaller sub-fragments of 234, 243, 213 bp and 122, 125 and 121 bp, respectively. Fragments of the former group were octamerized (8 copies) in a head-to-tail orientation, while those of the latter group were similarly hexa-decamerized (16 copies), to maintain a constant length of MAR sequence. These elements were cloned in pGEGFPControl vector and their effects were assayed in CHO cells as described previously. Interestingly, fragment FIII retained most of the activity of the full-length F fragment whereas fragment FIII, which contains the right hand side part of fragment FIII, lost all the ability to stimulate transgene expression (FIG. 7). This points to an active region comprised between nt 132 and nt 221 in the FIB fragment. Consistently, multiple copies of fragments FI and FIB, which encompass this region, displayed similar activity. FIA on its own has no activity. However, when added to FIB, resulting in FIII, it enhances the activity of the former. Therefore FIIA appears to contain an auxiliary sequence that has little activity on its own, but that strengthens the activity of the minimal domain located in FIB.

Analysis of the distribution of individual motifs within the lysozyme gene 5'-MAR is shown in FIG. 8A, along with some additional motifs that we added to the analysis. Most of these motifs were found to be dispersed throughout the MAR element, and not specifically associated with the active portions. For instance, the binding sites of transcription factors and other motifs that have been associated with MARs were not preferentially localized in the active regions. It has also been proposed that active MAR sequences may consist of combination of distinct motifs. Several computer programs (MAR Finder, SMARTest, SIDD duplex stability) have been reported to identify MARs as regions of DNA that associate with the DNA matrix. They are usually based on algorithms that utilize a predefined series of sequence-specific patterns that have previously been suggested as containing MAR activity, as exemplified by MAR Finder, now known as MAR Wiz. The output of these programs did not correlate well with the transcriptionally active portions of the cLysMAR. For instance, peaks of activity obtained with MAR Finder did not clearly match active MAR sub-portion, as for instance the B fragment is quite active in vivo but scores negative with MAR Finder (FIG. 8B, compare the top and middle panels). Bent DNA structures, as predicted by this program, did not correlate well either with activity (FIG. 8B, compare the top and bottom panels). Similar results were obtained with the other available programs (data not shown).

The motifs identified by available MAR prediction computer methods are therefore unlikely to be the main determinants of the ability of the cLysMAR to increase gene expression. Therefore, a number of other computer tools were tested. Surprisingly, predicted nucleosome binding sequences and nucleosome disfavouring sequences were found to be arranged in repetitively interspersed clusters over the MAR, with the nucleosome favouring sites overlapping the active B, K and F regions. Nucleosome positioning sequences were proposed to consist of DNA stretches that can easily wrap around the nucleosomal histones, and they had not been previously associated with MAR sequences.

Nucleosome-favouring sequences may be modelled by a collection of DNA features that include moderately repeated sequences and other physico-chemical parameters that may allow the correct phasing and orientation of the DNA over the curved histone surface. Identification of many of these DNA properties may be computerized, and up to 38 different such properties have been used to predict potential nucleosome positions. Therefore, we set up to determine if specific components of nucleosome prediction programs might correlate with MAR activity, with the objective to construct a tool allowing the identification of novel and possibly more potent MARs from genomic sequences.

To determine whether any aspects of DNA primary sequence might distinguish the active B, K and F regions from the surrounding MAR sequence, we analyzed the 5'-MAR with SMAR Scan®. Of the 38 nucleosomal array prediction tools, three were found to correlate with the location of the active MAR sub-domains (FIG. 9A). Location of the MAR B, K and F regions coincides with maxima for DNA bending, major groove depth and minor groove width. A weaker correlation was also noted with minima of the DNA melting temperature, as determined by the GC content. Refined mapping over the MAR F fragment indicated that the melting temperature valley and DNA bending summit indeed correspond to the FIB sub-fragment that contains the MAR minimal domain (FIG. 9B). Thus active MAR portions may correspond to regions predicted as curved DNA regions by this program, and we will refer to these regions as CUE-B, CUE-K and CUE-F in the text below. Nevertheless, whether these regions correspond to actual bent DNA and base-pair unwinding regions is unknown, as they do not correspond to bent DNA as predicted by MAR Wiz (FIG. 9B).

Example 10

Imprints of Other Regulatory Elements in the F Fragment

Nucleosome positioning features may be considered as one of the many specific chromatin codes contained in genomic DNA. Although this particular code may contribute to the activity of the F region, it is unlikely to determine MAR activity alone, as the 3' part of the F region enhanced activity of the minimal MAR domain contained in the FIB portion. Using the MatInspector program (Genomatix), we searched for transcription factor binding sites with scores higher than 0.92 and found DNA binding sequences for the NMP4 and MEF2 proteins in the 3' part of the F fragment (FIG. 8B). To determine whether any of these transcription factor binding sites might localize close to the B and K active regions, the entire 5'-MAR sequence was analyzed for binding by NMP4 and MEF2 and proteins reported to bind to single-stranded or double-stranded form of BURs. Among those, SATB1 (special AT-rich binding protein 1) belongs to a class of DNA-binding transcription factor that can either activate or repress the expression of nearby genes. This study indicated that specific proteins such as SATB1, NMP4

NMP4, MEF-2, SATB1, and/or polyPpolyQ proteins constitute potent artificial MAR sequences.

Example 12

Expression Vectors

Three expression vectors according to the present invention are represented on FIG. 12.

Plasmid pPAG01 is a 5640 bp pUC19 derivative. It contains a 2960 bp chicken DNA fragment cloned in BamH1 and XbaI restriction sites.
The insert comes from the border of the (nuclear matrix protein 4) and MEF2 (myogenic enhancer 5'-end of the chicken lyZozyme locus and has a high A/T- factor 2), have a specific distribution and form a framework COntent. around the minimal MAR domains of cIysMAR (FIG. 10). 15 Plasmid pGEGFP (also named pSV40EGFP) control is a The occurrence of several of these NMP4 and SATB1 binding derivative of the pGL3-control vector (Promega) in which the sites has been confirmed experimentally by the EMSA analy luciferase gene sequence has been replaced by the EGFP gene sis of purified recombinant proteins (data not shown). sequence form the pEGFP-N1 vector (Clontech). The size of pGEGFP plasmid is 4334 bp. Example 11 Plasmid pUbCEGFP control is a derivative of the pGL3 with an Ubiquitin promoter. Construction of Artificial MARs by Combining Plasmid pPAG01GFP (also named pMAR-SV40EGFP) is Defined Genetic Elements a derivative of pGEGFP with the 5'-Lys MAR element cloned in the MCS located just upstream of the SV40 promoter. The To further assess the relative roles of the various MAR 25 size of the pPAG01 EGF plasmid is 7285 bp. components, the cIysMAR was deleted of all three CUE regions (FIG. 11, middle part), which resulted in the loss of Example 13 part of its activity when compared to the complete MAR sequence similarly assembled from all of its components as a Effect of the Additional Transfection of Primary control (FIG. 11, top part). Consistently, one copy of each 30 Transfectant Cells on Transgene Expression CUE alone, or one copy of each of the three CUEs assembled head-to-tail, had little activity in the absence of the flanking One day before transfection, cells were plated in a 24-well sequences. These results strengthen the conclusion that opti plate, in growth medium at a density of 1.35x10 cells/well mal transcriptional activity requires the combination of CUES for CHO-DG44 cells. 
16 hours post-inoculum, cells were 35 transfected when they reached 30-40% confluence, using with of flanking sequences. Interestingly, the complete MAR Lipofect-AMINE 2000 (hereinafter LF2000), according to sequence generated from each of its components, but contain the manufacturers instructions (Invitrogen). Twenty-seven ing also BgIII-BamHI linker sequences (AGATCC) used to microliters of serum free medium (Opti-MEM; Invitrogen) assemble each DNA fragment, displayed high transcriptional containing 1.4 ul of LF2000 were mixed with 27 Jul of Opti activity (6 fold activation) as compared to the 4.8 fold noted 40 MEM containing 830 ng of linear plasmid DNA. The antibi for the original MAR element in this series of assays (see FIG. otic selection plasmid (pSVneo) amounted to one tenth of the 5). reporter plasmid bearing the GFP transgene. The mix was We next investigated whether the potentially curved DNA incubated at room temperature for 20 min, to allow the DNA regions may also be active in an environment different from LF2000 complexes to form. The mixture was diluted with 300 that found in their natural MAR context. Therefore, we set up 45 ul of Opti-MEM and poured into previously emptied cell to swap the CUE-F, CUE-B and CUE-K elements, keeping containing wells. Following 3 hours incubation of the cells the flanking sequences unchanged. The sequences flanking with the DNA mix at 37°C. in a CO, incubator, one ml of the CUE-F element were amplified by PCR and assembled to DMEM-based medium was added to each well. The cells bracket the various CUEs, keeping their original orientation were further incubated for 24 hours in a CO, incubator at 37° and distance, or without a CUE. These engineered ~1.8 kb 50 C. The cells were then transfected a second time according to MARs were then assayed for their ability to enhance trans the method described above, except that the resistance plas gene expression as above. 
All three CUE were active in this mid carried another resistance gene (pSVpuro). Twenty-four context, and therefore there action is not restricted to one hours after the second transfection, cells were passaged and given set of flanking sequences. Interestingly, the CUE-K expanded into a T-75 flask containing selection medium 55 supplemented with 500 ug/ml G-418 and 5 g/ml puromycin. element was even more active than CUE-F when inserted After a two week selection period, stably transfected cells between the CUE-F flanking sequences, and the former com were cultured in 6-well plates. Alternatively, the cell popula posite construct exhibited an activity as high as that observed tion was transfected again using the same method, but for the complete natural MAR (4.8 fold activation). What pTKhygro (Clontech) and pSVdhfr as resistance plasmids. distinguishes the CUE-Kelement from CUE-F and CUE-B is 60 The expression of GFP was analysed with Fluorescence the presence of overlapping binding sites for the MEF-2 and activated cell sorter (FACS) and with a Fluoroscan. Sat31 proteins, in addition to its CUE feature. Therefore, FIG. 13 shows that the phenotype of the twice-transfected fusing CUE-B with CUE-F-flanking domain results in a cells (hereafter called secondary transfectants) not only was higher density of all three binding sites, which is likely expla strongly coloured, such that special bulb and filter were not nation to the increased activity. 65 required to visualize the green color from the GFP protein, but These results indicate that assemblies of CUEs with also contained a majority of producing cells (bottom right sequences containing binding sites for proteins such as hand side FACS histogram) as compared to the parental popu US 8,252,917 B2 37 38 lation (central histogram). This level of fluorescence corre Capecchi, High frequency targeting of genes to specific sites sponds to specific cellular productivities of at least 10pg per in the mammalian genome. 
Cell, 1986. 44(3): p. 419-28.. cell per day. Indeed, cells transfected only one time (primary Thus, the results might indicate that the MAR element sur transfectants) that did not express the marker protein were prisingly acts to promote such recombination events. MARS almost totally absent from the cell population after re-trans would not only modify the organization of genes in Vivo, and fection. Bars below 10' units of GFP fluorescence amounted possibly also allow DNA replication in conjunction with viral 30% in the central histogram and less than 5% in the right DNA sequences, but they may also act as DNA recombination histogram. This suggested that additional cells had been signals. transfected and successfully expressed GFP. Strikingly, the amount of fluorescence exhibited by re 10 Example 14 transfected cells Suggested that the Subpopulation of cells having incorporated DNA twice expressed much more GFP MARS Mediate the Unexpectedly High Levels of than the expected two-fold increase. Indeed, the results Expression in Multiply Transfected Cells shown in Table 2 indicate that the secondary transfectants 15 exhibited, on average, more than the two-fold increase of If MAR-driven recombination events were to occur in the GFP expected if two sets of sequences, one at each Successive multiple transfections process, we expect that the synergy transfection, would have been integrated independently and between the primary and secondary plasmid DNA would be with similar efficiencies. Interestingly, this was not dependent affected by the presence of MAR elements at one or both of on the promoter sequence driving the reporter gene as both the transfection steps. We examined this possibility by mul tiply transfections of the cells with pMAR alone or in com viral and cellular promoter-containing vectors gave a similar bination with various expression plasmids, using the method GFP enhancement (compare lane 1 and 2). However, the described previously. 
Table 3 shows that transfecting the cells effect was particularly marked for the MAR-containing vec twice with the pMAR-SV40EGFP plasmid gave the highest toras compared to plasmids without MAR-(lane 3), where the expression of GFP and the highest degree of enhancement of two consecutive transfections resulted in a 5.3 and 4.6 fold 25 increase in expression, in two distinct experiments. all conditions (4.3 fold). In contrast, transfecting twice the vector without MAR gave little or no enhancement, 2.8-fold, instead of the expected two-fold increase. We conclude that TABLE 7 the presence of MAR elements at each transfection step is Effect of re-transfecting primary transfectants at 24 hours necessary to achieve the maximal protein synthesis. interval on GFP expression. 30 Type of Primary Secondary EGFP fluorescence TABLE 8 plasmids transfection transfection Fold increase Primary transfection Secondary transfection pUbCEGFP 4,992 14,334 2.8 pSV40EGFP 4,324 12,237 2.8 35 EGFP- EGFP pMAR-SV40EGFP 6,996 36,748 5.3 fluores- fluores- Fold pUbCEGFP 6,452 15,794 2.5 Type of plasmid cence Type of plasmid cence increase pSV40EGFP 4,433 11,735 2.6 pMAR O pMAR O O pMAR-SV40EGFP 8,116 37.475 4.6 pSV40EGFP 15,437 2.3-2.5 pMAR-SV40EGFP 30.488 2.6-2.7 Two independent experiments are shown, 40 The resistance plasmid pSVneo was co-transfected with various GFP expression vectors. 
pMAR-SV40EGFP 11,278 pMAR-SV40EGFP 47,027 4.3-5.3 One day post-transfection, cells were re-transfected with the same plasmids with the differ pMAR 12,319 1.0-1.1 ence that the resistance plasmid was changed for pSVpuro, pSV40EGFP 6,114 pSV40EGFP 17,200 2.8 Cells carrying both resistance genes were selected on 500 ugml G-418 and 5 ugml puromycin and the expression of the reporter gene marker was quantified by Fluoroscan, pMAR 11,169 1.8-2.3 The fold increases correspond to the ratio of fluorescence obtained from two consecutive transfections as compared to the sum of fluorescence obtained from the corresponding independent transfections, The fold increases that were judged significantly higher are shown in bold, and correspond 45 Interestingly, when cells were first transfected with pMAR to fluorescence values that are consistently over 2-fold higher than the addition of those alone, and then re-transfected with pSV40EGFP or pMAR obtained from the independent transfections, SV40EGFP, the GFP levels were more than doubled as com The increase in the level of GFP expression in multiply pared to those resulting from the single transfection of the tranfected cells was not expected from current knowledge, later plasmids (2.5 and 2.7 fold respectively, instead of the and this effect had not been observed previously. 50 expected 1-fold). This indicates that the prior transfection of Taken together, the data presented here Support the idea the MAR can increase the expression of the plasmid used in that the plasmid sequences that primarily integrated into the the second transfection procedure. Because MARS act only host genome would facilitate integration of other plasmids by locally on chromatin structure and gene expression, this homologous recombination with the second incoming set of implies that the two types of DNA may have integrated at a plasmid molecules. Plasmid recombination events occur 55 similar chromosomal locus. 
In contrast, transfecting the GFP within a 1-h interval after the plasmid DNA has reached the expression vectors alone, followed by the MAR element in nucleus and the frequency of homologous recombination the second step, yielded little or no improvement of the GFP between co-injected plasmid molecules in cultured mamma levels. This indicates that the order of plasmid transfection is lian cells has been shown to be extremely high, approaching important, and that the first transfection event should contain unity (Folger, K. R. K. Thomas, and M. R. Capecchi, Non 60 a MAR element to allow significantly higher levels of trans reciprocal exchanges of information between DNA duplexes gene expression. coinjected into mammalian cell nuclei. Mol Cell Biol, 1985. If MAR elements favoured the homologous recombination 5(1): p. 59-69, explaining the integration of multiple plasmid of the plasmids remaining in episomal forms from the first copies. However, homologous recombination between newly and second transfection procedures, followed by their co introduced DNA and its chromosomal homolog normally 65 integration at one chromosomal locus, one would expect that occurs very rarely, at a frequency of 1 in 10 cells receiving the order of plasmid transfection would not affect GFP levels. DNA to the most Thomas, K. R. K. R. Folger, and M. R. However, the above findings indicate that it is more favour US 8,252,917 B2 39 40 able to transfect the MAR element in the first rather than in the mosomal transgene locus by homologous recombination and second transfection event. This Suggests the following thereby further increase transgene expression. molecular mechanism: during the first transfection proce When the cells were transfected a third and fourth time dure, the MAR elements may concatemerize and integrate, at with the pMAR-SV40EGFP plasmid, GFP activity further least in part, in the cellular chromosome. 
This integrated increased, once again to levels not expected from the addition MAR DNA may in turn favour the further integration of more of the fluorescence levels obtained from independent trans plasmids, during the second transfection procedure, at the fections. GFP expression reached levels that resulted in cells same or at a nearby chromosomal locus. visibly glowing green in day light (FIG. 14). These results further indicate that the efficiency of the quaternary transfec Example 15 10 tion was much higher than that expected from the efficacy of the third DNA transfer, indicating that propertiming between MARS as Long Term DNA Transfer Facilitators transfections is crucial to obtain the optimal gene expression increase, one day being preferred over a three weeks period. If integrated MARS mediated a persistent recombination We believe that MAR elements favour secondary integration permissive chromosomal structure, one would expect high 15 events in increasing recombination frequency at their site of chromosomal integration by relaxing closed chromatin struc levels of expression even if the second transfection was per ture, as they mediate a local increase of histone acetylation formed long after the first one, at a time when most of the (Yasui. D., et al., SATB1 targets chromatin remodelling to transiently introduced episomal DNA has been eliminated. To regulate genes over long distances. Nature, 2002. 419(6907): address this possibility, the cells from Table 3, selected for p. 641-5.). Alternatively, or concomitantly, MARS potentially antibiotic resistance for three weeks, were transfected again relocate nearby genes to subnuclear locations thought to be once or twice and selected for the incorporation of additional enriched in trans-acting factors, including proteins that can DNA resistance markers. The tertiary, or the tertiary and participate in recombination events such as topoisomerases. 
quaternary transfection cycles, were performed with combi This can result in a locus in which the MAR sequences can nations of pMAR or pMAR-SV40EGFP, and analyzed for 25 bracket the pSV40EGFP repeats, efficiently shielding the GFP expression as before. transgenes from chromatin-mediated silencing effects. TABLE 9 Example 16

Table 9. MARS act as facilitator of DNA integration. 30 Use of MARS Identified with SMAR SCAN II to EGFP- Fold Increase the Expression of a Recombinant Protein Type of plasmid fluorescence increase Four MAR elements were randomly selected from the Tertiary transfection sequences obtained from the analysis of the complete human pMAR 18368 2.2 35 genome sequence with SMAR SCAN or the combined pMAR-SV40EGFP 16544 2.O method. These are termed 1 6, 1 42, 1 68, (where the first Quaternary transfection number represents the chromosome from which the sequence pMAR 43,186 2.4 originates, and the second number is specific to the predicted pMAR-SV40EGFP 140,000 7.6 MAR along this chromosome) and X S29, a “super MAR pMAR-SV40EGFP 91,000 S.S 40 identified on chromosome X. These predicted MARs were pMAR 33,814 2.0 inserted into the pGEGFPControl vector upstream of the SV40 promoter and enhancer driving the expression of the The pMAR-SV40EGFP/pMAR-SV40EGFP secondary green fluorescent protein and these plasmids were transfected transfectants were used in a third cycle of transfection at the into cultured CHO cells, as described previously (Zahn-Za end of the selection process. The tertiary transfection was 45 bal, M., et al., Development of stable cell lines for production accomplished with pMAR or pMAR-SV40EGFP and or regulated expression using matrix attachment regions. J pTKhygro as selection plasmid, to give tertiary transfectants. Biotechnol, 2001. 87(1): p. 29-42). Expression of the trans After 24 hours, cells were transfected again with either plas gene was then analyzed in the total population of stably mid and pSVdhfr, resulting in the quaternary transfectants transfected cells using a fluorescent cell sorter (FACS) 50 machine. As can be seen from FIG. 
19, all of these newly which were selected in growth medium containing 500 ug/ml identified MARS increased the expression of the transgene G-418 and 5ug/ml puromycin, 300 g/ml hygromycin Band significantly above the expression driven by the chicken lyso 5 uM methotrexate. The secondary transfectants initially Syme MAR, the “super MARX S29 being the most potent exhibited a GFP fluorescence of 8300. The fold increases of all of the newly identified MARs. correspond to the ratio of fluorescence obtained from two 55 consecutive transfections as compared to the Sum of fluores Example 17 cence obtained from the corresponding independent transfec tions. The fold increases that were judged significantly higher Effect on Hematocrit of in vivo Expression of mEpo are shown in bold, and correspond to fluorescence values that by Electrotransfer of Network System with and are 2-fold higher than the addition of those obtained from the 60 without Human MAR (1-68) independent transfections. These results show that loading more copies of pMAR or The therapeutic gene encodes EPO (erythropoietin), an pMAR-SV40EGFP resulted in similar 2-fold enhancements hormone used for the treatment of anemia. The EPO gene is of total cell fluorescence. Loading even more of the MAR in placed under the control of a doxycycline inducible promoter, the quaternary transfection further enhanced this activity by 65 in a gene switch system described previously called below the another 2.4-fold. This is consistent with our hypothesis that Network system (Imhof. M. O., Chatellard, P., and Mermod, newly introduced MAR sequences may integrate at the chro N. (2000). A regulatory network for efficient control of trans US 8,252,917 B2 41 42 gene expression. J. Gene. Med. 2, 107-116.). The EPO and activator and EPO genes. 
In each group, half of the mice were regulatory genes are then injected in the muscle of mice using Submitted to doxycycline in drinking water from the begin an in vivo electroporation procedure termed the electrotrans ning of the experiment (day 0 the day of electrotransfer) and fer, so that the genes are transferred to the nuclei of the muscle in the other half, doxycycline was put in drinking water start fibers. When the doxycycline antibiotic is added to the drink- 5 ing at day 21. ing water of the mice, this compound is expected to induce the Blood samples were collected using heparinated capillar expression of EPO, which will lead to the elevation of the ies by retro-orbital punction at different times after the injec hematocrit level, due to the increase in red blood cell counts tion of plasmids. Capillaries were centrifugated 10 minutes at mediated by the high levels of circulating EPO. Thus, if the 5000 rpm at room temperature and the volumetric fraction of MAR improved expression of EPO, higher levels of hemat 10 blood cells is assessed in comparison to the total blood vol ocrit would be expected. ume and expressed as a percentile, determining the hemat In vivo experiments were carried out on 5 week-old ocrit level. C57BL6 female mice (Iffa Credo-Charles River, France). 30 As can be deduced from FIG. 16 The group of mice ug of plasmid DNA in normal saline solution was delivered injected by MAR-network, induced from the beginning of the by trans-cutaneous injections in the tibialis anterior muscle. 15 experiment, display a better induction of the hematocrit in All injections were carried out under Ketaminol (75 mg/kg) comparison of mice injected by original network without and Narcoxyl (10 mg/kg) anesthesia. Following the intramus MAR. After 2 months, haematocrits in “MAR-containing cular injection of DNA, an electrical field was applied to the group' is still at values higher (65%) than normal hematocrit muscle. 
A voltage of 200V/cm was applied in 8 ms pulses at levels (45-55%). 1 Hz (Bettan M. Darteil R. Caillaud JM, Soubrier F, Delaere 20 More importantly, late induction (day 21) is possible only P. Branelec D, Mahfoudi A, Duverger N, Scherman D. 2000. in presence of MAR but not from mice where the Network “High-level protein secretion into blood circulation after was injected without the MAR. Thus the MAR likely protects electric pulse-mediated gene transfer into skeletal muscle'. the transgenes from silencing and allows induction of its Mol Ther. 2: 204-10). expression even after prolong period in non-inducing condi 16 mice were injected by the Network system expressing 25 tions. EPO without the 1 68 MAR and 16 other mice were injected Overall, the MAR element is able to increase the expres with the Network system incorporating the MAR in 5' of the sion of the therapeutic gene as detected from its increased promoter/enhancer sequences driving the expression of the physiological effect on the hematocrit.
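As an illustration of the hematocrit readout used in Example 17, the following minimal Python sketch computes the volumetric fraction of blood cells as a percentage and compares it with the 45-55% normal range quoted above. The function names, the use of column heights read from a centrifuged capillary, and the sample values are assumptions made for illustration only; they are not taken from the experiments.

```python
def hematocrit_percent(packed_cell_height_mm: float, total_height_mm: float) -> float:
    """Return the hematocrit as a percentage of the total blood column.

    Both heights are hypothetical capillary readings (same unit) taken
    after centrifugation; the ratio approximates the volumetric fraction
    of blood cells for a capillary of constant cross-section.
    """
    if total_height_mm <= 0:
        raise ValueError("total column height must be positive")
    if not 0 <= packed_cell_height_mm <= total_height_mm:
        raise ValueError("packed cell height must lie between 0 and the total height")
    return 100.0 * packed_cell_height_mm / total_height_mm


def is_above_normal(hematocrit: float, normal_range=(45.0, 55.0)) -> bool:
    """Return True when the value exceeds the upper bound of the normal range."""
    return hematocrit > normal_range[1]


if __name__ == "__main__":
    # Illustrative reading: 32.5 mm of packed cells in a 50 mm blood column.
    value = hematocrit_percent(32.5, 50.0)
    print(f"hematocrit = {value:.1f}%, above normal: {is_above_normal(value)}")
```

With the illustrative reading above, the sketch yields a hematocrit of 65%, which corresponds to the value reported for the MAR-containing group after 2 months.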

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 246

<210> SEQ ID NO 1
<211> LENGTH: 320
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(320)
<223> OTHER INFORMATION: MAR of human chromosome 1, nt from 36686 to 37008
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(320)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 36686 to 37008

<400> SEQUENCE: 1

ttatattatgttgttatata tattatatta tott attaga titat attatg ttgttatatt      60
attataataat attatatt at at attatata titat attata taatatataa taat attata     120
taattatata ttacattata taatatataa taat attata taattatata ttacattata     180
taatatataa taat attata taataatata taattatata attatataata at attatata     240
at attatata at attatata atatataaat atataataat at at attata ttatataata     300
gtatataata ttatataata     320

<210> SEQ ID NO 2
<211> LENGTH: 709
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(709)
<223> OTHER INFORMATION: MAR of human chromosome 1, nt from 142276 to 142984
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(709)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 142276 to 142984

<400> SEQUENCE: 2

tacaat at at tittctatt at atatattittg tattatatat aatatacaat at attittcta      60
ttatatataa tatattttgt attatatata ttacaatata ttttgtatta tataatatat     120
aatacaat at ataatatatt g tattatata ttatataata caatatatta tatattgtat     180
tatatatt at atataatact atataatata ttg tattata tattatatat aatactatat     240
aatatattitt attatatatt atatataata caatatataa tatattgtat tataatacaa     300
tg tattataa totatt at at tdt attatat attatatata atacaatata taataatata     360
ttataatata taataataat ataatataat aataatatat attgt attat at attatata     420
atacaatata taatatattg tattatatat attitt attac atataatata taatacatta     480
tataatatat tttgtatt at atataatata ttittatt atg tattatagat aatatattitt     540
attatatatt atatataata caatatataa tatattttgt attgtatata atatataata     600
caatatataa tatattgt at tatatataat attaatatat tttgt attat at atttatat     660
tittatatt at aattatgttt togcattatat attt catatt atatatacc     709

<210> SEQ ID NO 3
<211> LENGTH: 409
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(409)
<223> OTHER INFORMATION: MAR of human chromosome 1, nt from 1368659 to 1369067
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(409)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 1368659 to 1369067

<400> SEQUENCE: 3

tacacataaa tacatatgca tatatatt at gtatatatac ataaatacat atgcatatac      60
attatgtata tatacataaa tacatatgca tatacattat gtatatatac ataaatacat     120
atgcatatac attatgtata tatacatalaa tacatatgca tatacattat gtatatatac     180
ataaatacat atgcatatac attatgtata tatacataaa tacatatgca tatacattat     240
gtatatatac ataaatacat atgcatatat tatatacata aattatatta tatacataat     300
acatatacat at attatgtg tatatataca taaatacata tacatatatt atgtgtatat     360
atacatgata catata cata tattatgt at atatata cat aaatacata     409

<210> SEQ ID NO 4
<211> LENGTH: 394
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(394)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 2839089 to 2839482

<400> SEQUENCE: 4

tatgtatata tacacacata totatatata cacacatatg tatatacgta tatatgtata      60
tatacacaca tatgtatata cqtatatatg tatatataca cacatatgta tatacgtata     120
tatgtatata tacacacata totatatacg tatatatgta tatatacaca catatgtata     180
tatgtatata tacacacata totatatacg tatatatgta tatatacaca catgtgtata     240
tatatataca catatgtata tatgtatata tacacacata totatatatg td tatgtata     300
tatacacaca tatgtatata tacacatata tatgtatata tacacacata cittatatata     360
cacatatata totatatata cacatatgta taca     394

<210> SEQ ID NO 5
<211> LENGTH: 832
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(832)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 1452269 to 1453100

<400> SEQUENCE: 5

tat attacta tatatacaat atacat atta ctatatatac catgt attac tatatatatic      60
tactatatat attactatat atacaaaata tatat tact a tatatacaat atacat atta     120
Ctatatatac Catatattac tatatatato tactatatat attactatat atacaaaata     180
tat attacta tatatact at at attactgt atatacaata tat attact a tatatatact     240
at at attact atatatacac tatat attac tatatataca Calatatatat attactatat     300
atacacaatig tatataacta tatatacaat at at attact atatatact a tatat attac     360
tata Catact at at attact Ctatatatac alatatatata ttacalatata tactaCatat     420
tact acatat actittatata t tactatata tactatatat tactgtatat acaatatata     480
ttactaaata tacacalatat at attact at atatacacaa tatatatatt actatatata     540
cacattatat atgactatat atacacacta tatat attac tatatataca caatatataa     600
ctatatatac acagtataca tattactata tatacacaat atatatatta ctatatatac     660
actatatatt actatatata cacaatatat attactictat gtata cact a tatat attac     720
tatatataca gaatatatat aactatatat acactatatt actatatata ctatatatta     780
ctatatgtac tatatatatt actatatata ctatatatta citatatatac ac     832

<210> SEQ ID NO 6
<211> LENGTH: 350
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(350)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 831495 to 831844

<400> SEQUENCE: 6

aatatataat atataaat at taatatgt at tatataatat at attaatat attat attat      60
attactatat aaataatatt aatatatt at attaaaatat taataaatat atcat attaa     120
at attatatt aattaaat at taataaatat attat attaa tatatttata tattaalacct     180
ataa catatg catatactta tittatatata acatgcatgt act tattitat atatacaata     240
tatatttata tattatataa tat attatat g tatttatat attatatat catatattata     300
tg tatttata tattatatat catataatat atatattitat attatatata     350

<210> SEQ ID NO 7
<211> LENGTH: 386
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(386)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 1447225 to 1447610

<400> SEQUENCE: 7

acatttaatt taattatata citgctatata taattaaatc tatatat ct a tataactitat      60
aatttattitt aatttaatta tatatact at at agittatat atacatatat gtaattatat     120
at agtata at tatagtatat atgtatatat aatgtaagta aatatatagt atatattitat     180
atatactata tatttataca tatgtcttta tatatactaa tatatataca catatgtaat     240
atgtacatat ggcatatatt ttatagtgta tatatacata tatgtaatat atatagtaat     300
atgtaaatat at agtacata tittaattata tdgtaatata tacacatata totaatatgt     360
gtattatagt acatattitta tagtat     386

<210> SEQ ID NO 8
<211> LENGTH: 585
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(585)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 4955365 to 4955949

<4 OOs, SEQUENCE: 8 atacacacat atacacatat gtacgtatat atactatata tacacacata tacacatatg 6 O tacgtatata tactatatat acaca catat acacatatgt acgtatatat actatatata 12 O cacacatata cacatatgta cqtatatata ctatatatac acacatatac acatatgtac 18O gtatatatac tatatataca cacatataca catatgtacg tatat attat atatacacac 24 O atatacacat atgtacgitat atatactata tatacacaca tatacacata totacgtata 3OO tatact at at atacacacat atacacatat gtacgtatat atactatata tacacacata 360 tacacatatg tacgtatata tactatatat acacacatat acacatatgt acgtatatat 42O actatatata cacacatata cacatatgta cqtatatata citatatatac acacatatac 48O acatatgtac gtatatatac tatatataca cacatataca catatgtacg tatatatact 54 O atatat accc atacacatac gtatatacgt acatatatat acgta 585

<210> SEQ ID NO 9
<211> LENGTH: 772
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(772)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 5971862 to 5972633

<4 OOs, SEQUENCE: 9 agtaaacata tatatagitaa atatatatag tdtatatata gtaaatatat at agtgcata 6 O tatatagtgc atatatatag togtatatata gtaaatatat agtgtatata tatagtaaat 12 O atatatagtg tatatatagt aaatatatat agtaaatata tatatactat atatagtaaa 18O tatatatata ctatatatag taaatatata tatagtatat atatagtaaa tatatatata 24 O gtatatatat agtaaatata tatatagitat atatatagta aatatatata tagtatatat 3OO US 8,252,917 B2 49 50 - Continued agtaaatata tatagtatat atatagtaaa tatatatata gtatatatat agtaaatata 360 tatatagitat atatatagta aatatatata tagtatatat at agtaaata tatatagitat 42O atatatagta aatatatata gtatatatat agtaaatata tatagtatat atatagtaaa 48O tatatataca citgtatatat at agtaaata tatatacact gtatatatat agtaaatata 54 O tata cactgt atatatatag taaatatata tacactgtat atatatagta aatatatata 6OO cactgtatat a catagtaaa tatatataca citgtatatac at agtaaata tatatacact 660 gtatatacat agtaaatata tatacactgt atata catag taaatatata tacagtgitat 72 O atacat agta aatatatata cagtgtatat a catagtaaa tatatataca git 772

<210> SEQ ID NO 10
<211> LENGTH: 304
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(304)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 6221897 to 6222200

<4 OOs, SEQUENCE: 10 atatataata tatata atta tattatatat aatatataat atatataatt at attatata 6 O ttatatataa tat attatat attatatata taatatatat tatat attaa atatat atta 12 O tatatataat at at attata tattaalatat at attatata tataatatat attatatata 18O atatatataa tatatatt at atatatatta tat attatat atatatatta tatatatata 24 O atatatataa tatatatt at atataatata tattatatat atataatata tataatatat 3OO atta 3O4.

<210> SEQ ID NO 11
<211> LENGTH: 311
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(311)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 9418531 to 9418841

<4 OOs, SEQUENCE: 11 tatatataat atttatatat aatatt catg tatttatata taaat attta tatatttata 6 O tataaatatt tatatattta tatataaata tittatatatt tatatataat atttataCat 12 O tatatataat atttatatat tatatataat atttatatat aatatttata tattatatat 18O aatatttata tatttatatg tataatatat attittatata totatgtata atatatattt 24 O tatatatgta totataatat attittatata totatgtata atatattatt atatataata 3OO tataattitat a 311

<210> SEQ ID NO 12
<211> LENGTH: 302
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(302)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 15088789 to 15089090

<400> SEQUENCE: 12
atataatata tatattatat atataaatat atataaatat atalacatata tattatatat 60
aaatatatat aaatatataa catatatatt atatatataa atatatataa atatataaca 120
tatatattat atatataaat atatataaat atatalacata tatattatat attataaatat 180
atattatata tittatatata taatatatat aaatatataa tatatattta tatatataat 240
atatataaat atataatata tatatttata tataatatat ataaatatat aatatataat 300
at 302

<210> SEQ ID NO 13
<211> LENGTH: 461
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(461)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 6791827 to 6792287

<4 OOs, SEQUENCE: 13 tatataatat at attatata tacacatata taatatatat tatatataca Catatataat 6 O at at attata tatacacata tataatatat attatatata Cacatatata atatat atta 12 O tatatacaca tatataatat at attatata tacacatata taatatatat tatatataca 18O

Catatataat at at attata tatacacata tataatatat attatatata Cacatatata 24 O atatat atta tatatacaca tatgtaatat at attataca cacacatata atatatatta 3OO tatacacata tataatatat attatatata catatataat at at attata tatacaCata 360 tataatatat attatatata Cacatatata atatatatta tatatacaca tataatatat 42O aatatataca catatataat atatatatta tatatgcaca t 461

<210> SEQ ID NO 14
<211> LENGTH: 572
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(572)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 163530 to 164101

<4 OOs, SEQUENCE: 14 at attata at tatatatatt atatataatt atataaaata tat attataa ttatatatat 6 O tittatataat atatat atta taattaatat attatatata atatatatat tatatataat 12 O atatat atta tatatatt at atataatata tataatatat attaatatata atataatata 18O tat attatat attaatatata atatatataa tat attataa tataatatat attaatatata 24 O atataatata tataatatat aatataatat attaatatata atatatataa tatataatat 3OO aatatataat atatataata tataatataa tatataatat atataatata ttataatata 360 atatatataa tatataatat aatatatata atatataata taatatataa tatataatat 42O at atttaata tatttattaa ttatttgtta tatatttatt aatatataat atataatata 48O tittaatatat tatalactata tattatatta taattatata tattatatat atacaattat 54 O aattatatat tatatatact tataatatat at sf2

<210> SEQ ID NO 15
<211> LENGTH: 357
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(357)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 1842332 to 1842688

<4 OOs, SEQUENCE: 15 tatat citata tatat citata tatatataat atagataata t ctatatata taatatagat 6 O aat attat ct atatataata tagataat at tat ctatata taatatagat aat attatct 12 O atatataaaa titat attata totatatata ttatatatat aaaattatat tatat Ctata 18O tataatatag ataatat cita tatataaata gataatat ct atatatataa tatagatatt 24 O atctat atta tagatataga taat attatc tat attatag at attatcta tatataatat 3OO agataatatt at citat atta tatatataat at atctatat tat citataat attat ct 357

<210> SEQ ID NO 16
<211> LENGTH: 399
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(399)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 2309560 to 2309958

<4 OOs, SEQUENCE: 16 attatatata atatat atta tat attatat at atcaa.gca gcagatataa tatataatat 6 O atataatata tataatatat attgtatatt atataatata taatatatat aatatatatt 12 O gtatattata taatatataa tatatataat atatattgta tattatataa tatataatat 18O atataatata tattgtatat tatataatat ataatatatg taatatatta totaatatat 24 O tatataatat at attatata ttatatataa tatat attat atataatata tattacatala 3OO tat attacat at attacgta atatatgtta tat attacat ataatatata acatatatta 360 cgtaatatat gtaatatatt acatataata tatacatta 399

<210> SEQ ID NO 17
<211> LENGTH: 394
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(394)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 2231759 to 2232152

<4 OOs, SEQUENCE: 17 atatataCtt atalaattata tactitatata tactitataaa ttatataCtt attatataCtt 6 O atalaattata tactitatata tactitatalaa ttatataCtt atatataCtt attalaattata 12 O tact tatata tactitatalaa ttatataCtt atatataCtt atalaattata tacttatata 18O tact tatalaa ttatataCtt atatataatt atalaattata tactitatata taattataaa. 24 O ttatataCtt atatataatt atalaattata tactitatata taattataala ttatataCtt 3OO atatataatt atalaattata taCatatata taattataaa ttatataCat attata attat 360 aaattatata catatata at tataa attat attac 394

<210> SEQ ID NO 18
<211> LENGTH: 387
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(387)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 7406524 to 7406910

SEQUENCE: 18 tat attatat attaatatata ttatatataa tataaataat at at attata tataatatat 6 O aaataatata taatatataa attaatatata atatataata tataaataat attataatata 12 O taaCatataa attaatatata taatatataa attaatatata taatatataa attaatatata 18O taatatataa aaatatataa tatataatac atatataaat aatat attat attatatatg 24 O atacataata tattatatat aatatatt at atgatacata atatattata tagaatatat 3OO tatatgatac ataatatatt atatagaata tattatatga tacataatat attatatgat 360 acatalatata ttatatataa tat atta 387

<210> SEQ ID NO 19
<211> LENGTH: 370
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(370)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 9399572 to 9399941

SEQUENCE: 19 catatataca tatatacaca tatatacaca tatatataca catacatatg tacacatata 6 O tatacacata totatacaca tatatacaca tatatacaca catatataca catatataca 12 O

Cacatatata Cacatatata Cacatatata Cacatataca Catatataca Catatataca 18O tatatacaca tatatataat atacacacat atatatacac atatatacac acatatatac 24 O acatatatac acatatatat acacatatat acacatatat acatatatac acatatatat 3OO acatatatac acatatatac atatatacac atatataCat atatacacac attatatacac 360 atacatataC 37 O

<210> SEQ ID NO 20
<211> LENGTH: 377
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(377)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 12417411 to 12417787

SEQUENCE: 2O attatatata atacatataa ttatatattt atatataaat tataataaat acatata att 6 O at at attitat atatalaatta tatataataa atacatataa ttacatatat ttatalaatta 12 O taataaatac atataattac atatatttat atatgaatta tatataataa atacatataa 18O ttatatatat ttatatgtag attatatata aatatatata atttatatat ataataatat 24 O atataattta tatatata at tatatatata ataaatatat attaatttata tatataatta 3OO tatatataat aaatatataa taatatatat aatttatata tataattata tatataataa. 360 atatatataa tittatat 377

<210> SEQ ID NO 21
<211> LENGTH: 1524
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(1524)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 1643307 to 1644830

<4 OOs, SEQUENCE: 21 tataaatata tataaatata taalatatata taalatatata aatatatata aatatatata 6 O aatatataala aatatatalaa tatatatalaa tatatataaa tatataaaaa Cataaaaata 12 O tatataaata tatataaata tataaaaata tataaatata taalatatata aaaatataca 18O aatatatalaa tatata cata aatatatata aatatatata aatatataala aatatatata 24 O aatatatalaa tatatatalaa tatatatalaa tatatataaa tatataaaaa tatatataaa. 3OO tatataaata tataaaaata tatataaata tataaatata taalatatata taalatatata 360 aatatataaa taaatataag tatt tatgaa tatatatgaa tatataaata tataaaaaat 42O atatataaat atataaatat atataaatat ataaatatat acatatatac attatatalaat 48O aaataaatat aagtattitat gaatatatat gaatatataa atatataaaa aatatatata 54 O aatatatalaa tatatatalaa tataaatata taaaaatata taaaaatata tataaatata 6OO taalatatata taalatatata aatatatata aatatatata aatatataaa tatatataaa. 660 tatatatalaa tatataaata tataaatata tataaatata tataaatata taalatatata 72 O aatataaata tataaatata tataaatata tataaatata taalatatata taalatatata 78O taalatatata taalatatata taalatatata aatatatata aatatatata taalatatata 84 O taalatatata aatatatalaa tatataaaaa tatataacaa tatataaata tatataaaaa. 9 OO tatataacaa tatataaata taalatatata taaaaatata taacaatata taalatataaa. 96.O tatatatalaa tatataaata taalatataala aaatatatat aaatatataa atatatataa. O2O atatataaat gtataaatat atataaaaat atataacaat atataaatat ataaatatat O8O aacaatatat aaatatataa aaatatataa Calatatataa atataaatat attataaaaat 14 O atataacaat atataaatat aaatatatat ataaatatat aaatataaat attaaaaaata 2OO tatataaata tataaatata tatataaata tatataaata tataaatgta taaatatata 26 O taalatatata aatatataala aatatatalaa tatatataaa tatatataaa tatataaata 32O taalatatata aatatatata aatatatalaa tataaatata taalacatata taalatatata 38O taaataaa.ca tatataaaga tatataaaga tataaagata tataaatata taaatatata 44 O aagatatata aatatataaa gatatataaa tatataaaga tatataaata tataaagata SOO tataaatata atatataaat at at 524

<210> SEQ ID NO 22
<211> LENGTH: 664
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(664)
<223> OTHER INFORMATION: MAR of human chromosome 1, genomic contig; 1398763 to 1399426

<4 OOs, SEQUENCE: 22 acacatatat atataaaata tatatatata Cacacatata tataaaatat attatatatac 6 O acacatatat ataaaatata tatatacaca Catatatata aaatatatat atacacacat 12 O atatataaaa tatatatata Cacacatata tataaaatat atatatacac acatatatat 18O aaaatatata tatacacaca tatatatalaa atatatatat acacacatat attataaaata 24 O tatatataca Cacatatata taaaatatat atatacacac atatatataa aatatatata 3OO US 8,252,917 B2 59 60 - Continued tacacacata tatataaaat atatatatac acacatatat aaaatatata tatacacaca 360 tatataaaat atatatatac acatatatat aaaatatata tatacacata tatataaaat 42O atatatacac acatatatat aaaatatata tatacacaca tatatataala atatatatat 48O acacatatat ataaaatata tatatacaca tatatataala atatatatat ataca Catat 54 O atataaaata tatatacaca catatatata aagtatatat atacacacat atatataaaa 6OO tatatatata Cacatatata taaaatatat atatacacat atatataaaa tatatatata 660

caca 664

<210> SEQ ID NO 23
<211> LENGTH: 1428
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(1428)
<223> OTHER INFORMATION: MAR of human , genomic contig; 17840365 to 17841792

<4 OOs, SEQUENCE: 23 aatttatt at at attatata ttatatatat tatatatatt at at attata tat attatat 6 O at attatata ttatatatat tatatatt at a tatttatat attaatatata totaatatat 12 O at attagata taatatatat ctaatatata tatattittat atatataata tat ct ctaat 18O atatatattt tatatgtata taatatat ct ctaatatata tatatttitat atgtatataa 24 O tatatoticta atatatatat tttittatata taatlatlatct Ctaatatata tattittatat 3OO atataatata tat ctaatat atataatata tat attagat atatataaaa tatatatgat 360 at attt atta tatatataat atataatata taatatatat attat attat atacatatat 42O attatataca atatat atta tatatattitt atata Catta tat attatat a tattittata 48O tacalatatat attatatatt ttatatacaa tatat attat atatattitta tatttittata 54 O tacalatatat attatatata ttittatatat aatatatatt atatatattt tatataatat 6OO at attatata tattittatat attaatatata ttatataaat tatatataat at at attata 660 ataaattata at atttittta tatatataat atgitattitta tatataatat attataatat 72 O at attittata tataatatat tataatatat attittatata taatatatta taatatatat 78O tittatatt at aatatatt at aatatatatt ttatatataa tat attataa tatatattitt 84 O atatataata tattataata tatattittat atataatata ttataatata tat attataa. 9 OO tatatattitt atatataata tattatcata tatat attaa atatatattt tatatataat 96.O at attataat atatat atta taatatatat tittatatata at at attata atatatatat 1 O2O tataatatat attittatata taatatatta taatatatat tittatatata at at attata 108 O atatatattt tatatataat at attataat atatattitta tatataatat aatatatatt 114 O ttatatataa tat attataa tatatattitt atatataata tattataata tatatttitat 12 OO atataatata ttataatata tattittatat attaatatatt attaatatata ttittatatat 126 O aatatatt at aatatatatt ttatatataa tat attataa tatatattitt attatataata 132O tattataata tatatttitat atataatata ttaattaaat ttattaattit attaattatt 1380 aatatt tatt at attattaa ttaataatat atalaattatt aatatata 1428

<210> SEQ ID NO 24
<211> LENGTH: 4624
<212> TYPE: DNA

US 8,252,917 B2 63 64 - Continued atgitatgttt acg tatgtgt atgtt tatgc atatgttata ggtttaatat at attaatat 222 O atataatata taatatataa at attaatat g tattatata atatatatta atatattata 228O titat attact atataaataa tattaatata titat attaala at attaataa atatatoata 234 O ttaaat atta tattaattaa at attaataa at at attata ttaatatatt tatat attaa. 24 OO acctataa.ca tatgcatata cittatttata tataa catgc atgtactitat ttatatatac 246 O aatatatatt tatatatt at ataatatatt atatgtattt atatattata tat catatat 252O tatatgtatt tatatatt at at at catata atatatatat titat attata tat attatat 2580 gatatataat attatataat g tattaatat at attaalacc tatatttata attctggact 264 O cact attttgttt cattggt gttctgttgttgt atcta accct atgcc-aataa totact atct 27 OO taattaccat agctittatag taagctittga aat cagatag togt attittitt at cattgttt 276 O tittaaaataa tag tittatct ttittatttga atttgtaatc agctagt cag tittctgcaaa 282O aagct tactg ggattittgct toggaattatgttacatctgt agcatgtact atccalatatt 288O ctagocttta t coacatgtg gct attaagg tittaaattaa ttaaattaaa atttaattaa 294 O ttaaaattaa aacttaataa ttggttcct c attcacacta ccatatgtca agtgttcaat 3 OOO agccacatat gigt caatgtc. ttggaaaagt caatacagta cattt coatt attgcagtaa 3 O 6 O gttctgtcaa acagcact at C9tagaccga ttaggagaga actgacttaa cagtattgga 312 O tgct coagtic aatgaa catc ttitttitttitt to atttattt cagtagt ct c togcagtatat 318O tatagatttic agtttacata ttittgcatat attitt attaa atgtataacg gtagaagitac 324 O tatt attgga tigatgtgttc tatagatgta ttt taggtoa agtttgttga tagtgttgtt 33 OO taaatctogt atacct ctitg atttittitt at titacttgttc tittgaattac tdaga cagga 3360 atgttatat c cittaactata tttgttgaatt tatt cacttic titcct tcagt totgttaact 342O tittgct tagg togctttittaa aaatgaaact ttcaatctot gcc ttittaat tdtag cattt 3480 agac cattta cattcaatgt aattatcaat at cagttitat ttaagttctga agttgtgcaa. 
354 O tttitt.cct ct accitat atta taaatctitt c tatatacaaa acacatgcta tdttittctgc 36OO atatgttitta aatgacaccc ggaaagcatt gacactattt ttgctittagg ttatctitt.ca 366 O aagatgttaa aaatgaga aa gaaat attct gcatttatcc atacactitat tatttgcaaa 372 O ggtttittitta aatacctittg togtagatttic agittaccaac ttg tattitcc titcagcttga 378 O agaact taca atttcttgta gga caggit ct ctdacaacaa attat ct cag cittitt ctittg 384 O tctaaaaaag titattgcctt tatttittaaa atatattitt.c actggatatt gaattittagg 3900 tgataatctt tttitttitttgttago actitt aaatatgtct tctaatgtcc ticttgctitt c 396 O at agtttctg atgagaagtic tactgttatt agtat ct citt tdtgttgttgtc. tct ctitttitt 4 O2O c cct citctgc tattatggct atttittttitt tttittttittt ttittggit cac tdgtgtcagc 4 O8O aatttaatta tigtgtgcct totatgttt ttgttgttgttgttgttgttgttg ttgttgttgtg 414 O tgttgttgttgttg tagctgatgt totttgagct ttagaatctg tdagtttgta gttitt catca 42OO attattttitt cittitt cattc cttittattta citcatgttcg tdttt tattt tatatttitta 426 O agaattttgt gcg tatttgt aataactgtt taaatgtcat ttgttgaattic cattgcttct 432O agg taggatt c tattgacag at atttitt to cotgacgaga got catact t t cct tattot 438 O t catgitat ct agtggitttitt ggttgaatac tat attitt gaattitt atg ggagtgctga 4 44 O attctacaat atticcittaaa aatgtgttgg attttgttitt agcagatago tat ct tactt 4500 gaagat caat tt catattitt ttgatgttca ttttitt catt tattaaagaa taggit coatg 456 O US 8,252,917 B2 65 - Continued gtagagttta citgatat caa cct ttctggt gtctictaata aatgcaa.cat attcaataag 462O at CC 4624

<210> SEQ ID NO 25
<211> LENGTH: 3616
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(3616)
<223> OTHER INFORMATION: MAR 168 of chromosome 1

<4 OOs, SEQUENCE: 25 gactictagat tataccalacc toataaaata agagcatata taaaa.gcaaa tgct cittatc 6 O ttgcagat.co Ctgaactgag gaggcaagat Cagtttggca gttgaag cag Ctggaatctg 12 O

Caattcagag aatctaagaa aagacaac cc tgaagagaga gacccagaaa Cctagoagga 18O gtttct coaa acattcaagg Ctgagggata aatgttacat gCacagggtg agcct Coaga 24 O ggcttgtc.ca ttagcaactg ctacagtttic attat ct cag ggat cacaga ttgtgct acc 3OO tattgcctac catctgaaaa cagttgctitc Citat atttca tccagtttaa tatttattta 360 alaccaagaag gttaatctgg caccagot at gtggatgttga aagtaccaat tccattctgt tttact atta act at cottt gcc ttaatat gtat cagtag gtggcttgtt gctaggaaat attaaatgaa tgg catgttt Cataggttgt gtttaaagtt gtttitttgag 54 O ttaa at Cttt Ctttaataat actittctgat gtcaaaaa.ca cittagaagtic atggtgttga acat Citat at agggittggat ctaaaatago ttct talacct titcCtalacca Ctgtttttgt 660 ttgtttgttt ttalactaagc atccagtttg ggaaattctg aattagggga at cataaaag 72 O gttt cattitt agctgggc.ca Catalaggaaa gtaagat atc aaattgtaaa aatcgittaag aact totato c catctgaag tgtgggittag gtgcct ctitc tctgtgcticc Cttaa.cat CC 84 O tatt titat ct gtatatatat at attct tcc. aaatlatcCat gcatgggaaa aaaaatctga 9 OO toataaaaat attittaggct gggagtggtg gct cacgc.ct gtaatcc cag cactittggga 96.O ggctgaggtg ggcggat cat gaggt calaga gat.cgagacic atcct gacca atatggtgaa acco Catcto tactaaagat acaaaactat tagctggacg tggtggCacg tgcctgtagt cc.ca.gctact cgggaggctg aggcaggaga acggcttgaa CCC aggaggt ggaggttgca 14 O gtgagctgag atcgc.gc.cac tgcacticcag Cctgggcgac agagcgagac tctgtct caa 2OO aaaaaaaata tatatatata tatatataca Catatatata taaaatatat atatatacac 26 O acatatatat ataaaatata tatatataca Cacatatata taaaatatat atatatacac 32O acatatatat aaaatatata tatacacaca tatatataaa. atatatatat acacacatat atataaaata tatatataca Cacatatata taaaatatat atatacacac atatatataa. 44 O aatatatata tacacacata tatataaaat atatatataC acacatatat ataaaatata SOO tatatacaca Catatatata aaatatatat at acacacat atatataaaa. tatatatata 560

Cacacatata tataaaat at atatataCaC acatatataa. aatatatata tacacacata tataaaat at atatataCaC atatatataa. aatatatata tacacatata tataaaatat atatacacac atatatataa. aatatatata tacacacata tatataaaat atatatatac 74 O acatatatat aaaatatata tatacaCata tatataaaat atatatatat acacatatat ataaaatata tatacacaca tatatataaa. gtatatatat acacacatat atataaaata 86 O tatatataca Catatatata aaatatatat ataca Catat atataaaata tatatataca 92 O US 8,252,917 B2 67 68 - Continued catatatata aaaatatata tatatattitt ttaaaatatt cca attgtct cactttgtgg 198O atgagaaaaa gaagtagtta gaggit caagt aacttggcct acatc.ttitt c ticaagattgt 2O4. O aaactic ctag tdagcaataa cca catctitc attitt ctittg tataaaacaa gaaagtttag 21OO Catgaaaaag gtact caatt acaaatgtgt tattgaat talagaccct tdgaagggga 216 O ttttgtacct gaggat ct ct ttcttittggc catattgttcaatggacaaa atttagc citt 222 O cgaaggcagg ccgatttgag gttaatact a cct ttaccac ttgatagct a tigtgaccttg 228O gccatgtggit ttcaacagtic toga acct cat titt citctgtg tatgtgtggit cct cottaca 234 O agtttgttgaaaaatgtgaag to cittagcca tdatagocca atataac agg ctaaatgata 24 OO ataggtttat gttcttitt cotttatatt ct cagataag.ca citgtc.caagt ttgaggtgtt 246 O ttgaggtotc gcc tdatttg gattgtttga gtt tatgcta ttctittgaat t ctittgagct 252O gttctgaagc agtgitat cat gaacaaaaac atc.cccagtt cagtic caaac ccctggittac 2580 atat cattct tatgccatgt tataaccagt ttgagagtgt toccitctgtt attgcattta 264 O agttt cagcc ticacacagaa attcagoagc caatttctaa gcc ctaag.ca taaaatctgg 27 OO ggtgggggggggggatggcc talaga.gcag Cattatgaat agcaccatta taattaatga 276 O t ct ct cagga agatttacaa toacaggtag cagataaaac aaatagtact gcttctgcac 282O titcc cct c ct tittatt cqct atgaaattitt atgggaaatc agt ccagtga aaaatgtaag 288O citcttaatct titcc.ca.gaaa toc tacct catttgatgaat actittgaggg aatgaattag 294 O agcatttittt tottittatag totactitcgc atttacgaag togaggacggit agcttaggct 3 OOO gcctggcc-aa. 
Ctgatgagaa ggit cagaggc atttittagag acctctgttgtctitt catt C 3 O 6 O atgttcattt to caca aggc aagtaatttic caacaaatca gtgtc.tt cat tagtaataag 312 O attattaa.ca acaataatag toatagtaac tatt cagtga gag to catta tatat caggc 318O attctacaag gtactittata tacatctgag taalacct cac acaattctac agggaggitat 324 O ttctat cocc atttaacaaa taaggaaacg aagtic caagt aaattaactt gcc caaggit c 33 OO acacagatag tacctggcag aac aggaatt taalacct aaa tttgtccaac tocaaaag.ca 3360 gcct tctatt tdttataaat gctgcct citc attat cacat attitt attat taacaacaac 342O aaacatacca attagcttaa gatacaatac aaccagataa totatgatgac aac agtaatt 3480 gttatact at tataataaaa tagatgttitt g tatgttact ataatcttga atttgaatag 354 O aaatttgcat ttctgaaag.c atgttcctgt catctaatat gattctgitat c tattaaaat 36OO agtact acat citagag 3 616

<210> SEQ ID NO 26
<211> LENGTH: 4660
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(4660)
<223> OTHER INFORMATION: MAR 142 of chromosome 1

<4 OOs, SEQUENCE: 26 gatc ccttga gqt cagtagt ttaagaccag cctdaccaac atggtgaaac ccatc to tac 6 O taaaaataca aaaattagcc aggcgtggtg gcgggggcct gta accc.ca.g. Ctact cagga 12 O ggctgaggca Caagaatcto ttgaaccogg gaggcggagg ttgcagtgag ctgagattgt 18O gtcactgcac tocagoctogg gcaa.cagtgc cagacitctgc cittaaaaaaa aaaaaaaaaa 24 O aaaaaaggcc gggcgcggtg gctgacgc.ct gtaatcc cag cactittggga ggc.cgaggcg 3OO US 8,252,917 B2 69 70 - Continued ggtggatcat gaggtoagga gat.cgaga cc acagtgaaac C ccgt.ct cala Ctaaaaatac 360 aaaaaattag ccgggcgcgg toggtgggcgc titg tagt ccc agctact cag gaggctgagg 42O Caggagaatg gcgtgaacct gggaggcgga gcttgcagtg agcc.gagatg gcaccactgc 48O actic cagcct gggcgaaaga gtgagacticc gtc.tcaaaaa aaaaaaaaaa ttagctgggt 54 O atggtggtgc gtgcctgtaa t cc cagot act cqggaggct gaggcaggag aatcc Cttga 6OO acctgggagg C9gaggttgc agtgatctgc catcCt9tca citgcatc act acact coagc 660

Ctgggtgaca gagcdagact Ctgtctcaaa aaaaaaaaaa aaaaaaaaag Ctgggtgtgg 72 O tgg tatgcac cagctgtagt cccagctact togaggctg agttgggggg attgcct gag 78O cCagggagglt c.gaggct tca gggagc.catg attatgccac to actic cag cctgggccac 84 O agagtgaaac Cttctgtcaa aaacaaaaaa acaaaaaaac acagtgtgtt agat cittgct 9 OO agacittggtg atataattaa gaggc catta tdggcagaac tdtgcc.ccct tccaaaattic 96.O atatataaat atatataaat atatataaat atataaatat atataaatat attaaatatat O2O ataaatatat aaatatatat aaatatataa atatatataa atatatataa atatataaaa. O8O atatataaat atatataaat atatataaat atataaaaac ataaaaatat attataaatat 14 O atataaatat ataaaaatat ataaatatat aaatatataa aaatatacaa atatatalaat 2OO atataCatala atatatataa atatatataa atatataaaa atatatataa atatatalaat 26 O atatataaat atatataaat atatataaat atataaaaat atatataaat attataaatat 32O ataaaaatat atataaatat ataaatatat aaatatatat aaatatataa atatatalaat 38O aaatataagt atttatgaat atatatgaat atataaatat ataaaaaata tatataaata 44 O tataaatata tataaatata taalatatata catatataca tatataaata aataaatata SOO agtatt tatgaatatatatgaatatataaa tatataaaaa atatatataa atatataaat 560 atatataaat ataaatatat aaaaatatat aaaaatatat ataaatatat aaatatatat 62O aaatatataa atatatataa atatatataa atatataaat atatataaat attatatalaat 68O atataaatat ataaatatat ataaatatat ataaatatat aaatatataa atataaatat 74 O ataaatatat ataaatatat ataaatatat aaatatatat aaatatatat aaatatatat 8OO aaatatatat aaatatataa atatatataa atatatatat aaatatatat aaatatataa. 86 O atatataaat atataaaaat atataacaat atataaatat atataaaaat attataacaat 92 O atataaatat aaatatatat aaaaatatat aacaatatat aaatataaat attatatalaat 98 O atataaatat aaatataaaa aatatatata aatatataaa tatatataaa tatataaatg 2O4. 
O tataaatata tataaaaata tataacaata tataaatata taalatatata acaatatata 21OO aatatataala aatatataac alatatatalaa tataaatata tataaaaata tataacaata 216 O tataaatata aatatatata taalatatata aatataaata taaaaaatat attataaatat 222 O ataaatatat atataaatat atataaatat ataaatgitat aaatatatat aaatatataa 228O atatataaaa atatataaat atatataaat atatataaat atataaatat aaatatataa. 234 O atatatataa atatataaat ataaatatat aaa.catatat aaatatatat aaataaaCat 24 OO atataaagat atataaagat ataaagatat ataaatatat aaatatataa agatatataa 246 O atatataaag atatataaat atataaagat atataaatat ataaagatat ataaatataa 252O tatataaata tataaagata tataaatata atataaaaat atataaatat at attaaaaa 2580 tatata cata taaatatatg tatatttittt tdagatgggg tot cqct cag ccacccacgc 264 O tggagtgcag tdcacgagc ticggct cact gcaac cactg. it ct ct cq99 t c caa.gcaatt 27 OO

US 8,252,917 B2 73 74 - Continued Ctggcagagt ggt cattcta acagoagt ca cagtagagta gaaataagac to agtatat 12 O Ctaaggcaaa aagctgaggt tt Caggagct talaggtaaa gaggaagaala gaaatgggaa 18O tgggaattgg aaagacaaat atcgttalaga gaaaattgct tittaggagag gggaaagaat 24 O citatgtgtac ttaagacitat ggaat caatc ccatttaagc tigggaaact a gttt catata 3OO taactaataa attittattta cagaatat ct atttacctga tictaggcttic aagccaaagg 360 gactgtgtga aaaaccatca gttctgtcat attcc taaaa aaaaattaaa aagttaaaaa 42O taaataaata ataaaact to ttittctitt ca aaataatcaa gotgct tatt cacatccatt 48O c caatttggg gaaatactta ttitt.cctato attagcgaag agaaaagtaa cittgcatttic 54 O aattcaagtt gatacatgtc acttittaaga gqt caactaa tatttgctag titgagctaac 6OO catataggct ttaaatact t t catagtaga aagaaaatga aaatcattag tdaactgtat 660 aaaatagatc at actttittgaaagaatcag actgaagttt ccgaaaaaaa gaagtaagct 72 O t caatgaaaa gotiaagtgaa tittagcattt acticago atc tactatogac tta acaccita 78O acagtagata atctgaaggc aaa catattt gtatagggac to agaatga tagatgataa 84 O atat catc to ttctatttga atgaatattt tttcaaatct ttcacacaca gtggtttgct 9 OO atggaaagat ttgtag taca ttaaacaa at Ctgaagatgg agittagaaag Cttaggctat 96.O gttittgagca caa.catataa titt citctgtg attgtttctt catctittcaa atgaggittac O2O tgttgaagatt aaatgagata act aaatgat gataaaataa totaatctta gcago accitt O8O atttaatctg togcaacaact citctgaagtgagtagggctic agctt cagtic acttct c togc 14 O cattt attaa ctaagatagt ttggaaagtt acc Catct ct t cagotgtaa aatgatgagg 2OO at cataccta ttt tatgggg ctgcttittag gtacaaatat acaggcaa.gc actttgttaa 26 O tact aaag.ca ttacac caat tagttt tact cittitt coatt cacacatgaa attaatgtaa 32O t cagaattct gtagattacc taaatcttct gttaa.cacgt gatatgcagt totaggittaaa 38O tgtcagttga gttaccaaag cacatacata ct caccaccc tat coaaatc tacaa.gc.ctic 44 O c cagtttgtc. tt cact attt toggittaaatt aatatgaatt cctagatgaa aattt cactg SOO atccaaatga aataaaaaat at attacaaa act cacacct gtaatct caa cattttggga 560 ggccaaggca ggtagat cac ttgaggc.cag gagttcaaga C cagcctgat Caacatggtg 62O aaac cctdtc. 
tctactaaaa atacaaaaat tagcc aggtg toggtggcatg togcct gtagt 68O Cctacctact cqggaggctg aggcacaaga atcgcttgaa ttgggaggit ggaggttgca 74 O gtgacctgag atcgtgccac to actic cag cct aggcaac agagtgagat catgtgt cat 8OO atatatatat atatatatat atatatatat atatatatac acacacacac acatatatat 86 O atacacatat atatacgitat atatatatat gtatatatat acatatatat a catatatat 92 O atatacgitat atatatacgt. atatatatat caatgtaaat tatttgggaa atttgg tatg 98 O aatagt ctitc cct gtgaaca cagat cataa aat catatat caa.gcagaca aataagtagt 2O4. O agt cactitat atgcttatac ttgta actta aagtaaaaga attacaaaag catatgacaa 21OO agactaattt taagatat co taatttaa at tdttittctaa aagtgtgitat accattttac 216 O citat catatgaataatttag aaa catgttt ataaaattaa tdtccaaatc cattcaaaag 222 O ttttgtaatg cagat caccc acaacaacaa agaatcc tag cct attaaaa aagcaac acc 228O acct acatat aatgaaat at tag cagdatc tatgtaacca aagttacaca gtgaatttgg 234 O gccatccaac actittgagca aagtgttgaa titcatcaaat gaatgtgtaa totatt tactt 24 OO actaatgcca atacactitta agg taatctt aagtagaaga gatagagttt agaattittitt 246 O US 8,252,917 B2 75 76 - Continued aaattitat ct cittgttgtaa agcaatagac ttgaataaat aaattagaag aat cagt cat 252O t caa.gccacc agagtatttg atcgagattt cacaaactict aactittctga tacccattct 2580

Cccaaaaacg ttalacct co tt.cgatagg aacaa.cccac to agggatg titt Ctctgg 264 O aaaaaggaaa titt cittittgc attggittt ca gacctaactg gttacaagaa aaaccaaagg 27 OO c cattgcaca atgctgaagt acttittitt ca aatttaaaat ttgaaagttg ttcttaaaat 276 O citat cattta ttittaaaata cqgatgaatg agaaag.cata gatttgataa agtgaattct 282O tittctgcaat ctacagacac titccaaaaat cactacagac act acagaca ctacagaaaa 288O t cataaataa acaagtgcta gitat caat at ttttaccaaa aaatggcatt cittagaattit 294 O tittataggct agaaggtttg tacaaactaa totgccacgg attittaaaat atgagtgaat 3 OOO aaattatatt gcaaaaaaaa totaggittaca gagaactggc aaggaag act cittatgtaaa 3 O 6 O acacagaaaa catacaaaac gitatttittaa gacaaataaa alacagaactt gtacct caga 312 O tgat actgga gattgttgttg acat attagc attat cactg. tcttgctaaa acataaaaat 318O aaaaagatgg aagatgaaat tacaatacaa atgatgattit aaa catataa aaggaaaata 324 O aaaattgttctgaccalacta ctaaaggaag acctact aaa gatatgc cat coagcacatt 33 OO gccact ctac atgtggtctg taalaccagca gCatagggat cct ctagot a gagt 33.54

<210> SEQ ID NO 28
<211> LENGTH: 677
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(677)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 12803267..12803943

SEQUENCE: 28 ttatatagta tatataatag tatatatt at at agtataca taatagtata tattatatag 6 O tata cataat agtatatatt ataatataca taattgtata tat catatag tatacattat 12 O agtatatat catatagtata cattatagta tatat catat agtatacatt atatagtata 18O tat catatag tatacatt at agtatatat catatagtata cattatagta tatat catat 24 O agtatacgta at agtatata t catatagta tacgtaatag tatatat cat at agtatacg 3OO taatagtata tat catatag tatacgtaat agtatatat catatagtata cqtaatagta 360 tatat catat agtatacgta at agtatata t catatagta tacgitaatag tatatat cat 42O at agtatacg taatagtata tat catatag tatacgitaat agtatatat catatagtata 48O tattatatag tatatat cat at agtatata ttatatagta tatat catat agtatatatt 54 O atatagtata tat catatag tatatatt at at agtatata t catatagta tatataatag 6OO tatatat cat at agtatata taatagtata tat catatag tatatatact atactatatt 660 atatatagta tacataa 677

<210> SEQ ID NO 29
<211> LENGTH: 332
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(332)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 13079684..13080015

<400> SEQUENCE: 29
ttaattatat tatatatatt atata attat at attaatat at attaatta tattatatat 60
attatata at tatatattaa tatat attaa titat attata tat attatat aattatatat 120
taatatatat taattatatt atatatatta tataattata tattaatata tattaattat 180
attatatata ttatat atta taattatata ttatataatt ataatatata tottaatata 240
atatatataa ttaatatata attaaaacta tittaattata totat attat atataatatg 300
tatt atttala attaataaata tatt attitat at 332

<210> SEQ ID NO 30
<211> LENGTH: 479
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(479)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 15682296..15682774

SEQUENCE: 3 O acaagtacat atatatatag tatatatata caagtacata tatatagitat atatatatat 6 O acaagtacat atatatagta tatatatata tacaagtaca tatatatagt atatatatat 12 O acaagtacat atatatagta tatatatata caagtacata tatatatagt atatatatat 18O acaagtacat atatatagta tatatatata caagtacata tatatatagt atatatatat 24 O acaagtacat atatatagta tatatatata caagtacata tatatatagt atatatatat 3OO acaagtacat atatatatag tatatatata tacaagtaca tatatatata gtatatatat 360 atacaagtac atatatatag tatatataca tatatacaag tacatatata tagtgtatat 42O atatatatac aagtacatat atatacttgt attagtatat atatatatat atacaagta 479

<210> SEQ ID NO 31
<211> LENGTH: 531
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(531)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 15694611..15695141

SEQUENCE: 31 tataatatat ataatacata atagatatat tat attatat aatagatata taattataaa 6 O cataataata tataatgaat ataatataaa ataaatataa taaaatatat aatatat cita 12 O titatgt atta tat attatat atgtttatat ataatataat tatatatgtt tatatataat 18O ataattatat atgtttatat ataatata at tatat attat at attataga tataatatat 24 O aatatact at at attataga tataatatat aatatact at at attataga tataatatat 3OO aatatact at at attataga tataatatat aatatact at at attataga tataatatat 360 aatatact at at attataga tataatatat aatatatatt atatattata gatataatat 42O ataatatatt at at attata t ctatatata atatattgta tattatatat aatatattgt 48O at attatata taatatattg tat attatat ataatatatt gtatattata t 531

<210> SEQ ID NO 32
<211> LENGTH: 378
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(378)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 886276..886653

<4 OOs, SEQUENCE: 32 titat attata tat Cttacat aaattatata tatat attac attalaattata tacaatataa. 6 O attatataca atataattta tatataaaat atalaattata taaataattit attatataaaa. 12 O tataaattat ataaataatt tatatataaa atataaatta totataaaat ttatatataa 18O aatataaatt gtgtataaaa ttatatataa aatataaatt gtgtataaaa tittatatata 24 O aaatataaat tatatata at ttatatatta taatataaat tatatataat attatatoata 3OO aaatataaat tatatataat atatat cata agatataaat tatatataat atatat cata 360 agatataaaa tatataat 378

<210> SEQ ID NO 33
<211> LENGTH: 595
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(595)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 3326732..3327326

<4 OOs, SEQUENCE: 33 aaaatatata aatatatata aaaatatata aaaatatata aatatatata aaaatatata 6 O aatatatata aatatatata aaaatatata aatatatata aatatatata aaatatataa. 12 O atatatataa aatatatata aatatatata aatatatata aaaatataaa tatatataaa. 18O aatataaata tatataaata tatataaaaa tataaatata tataaatata tataaatata 24 O taalatatata taalatatata taalatatata aatatatata aatatatata aatatatata 3OO aatatatalaa tatataaaaa tatatatalaa tatataaata tatataaata tataaatata 360 taaaaatata tataaatata taalatatata taalatatata taalatatata tataaatata tataaatata tatatatata aatatatata aatatatata taalatatata taalatatata tatatatata taalatatata taalatatata taalatatata tataaatata tataaatata 54 O tataaatata tatataaata tatataaata tatatataaa tatatataaa tatat 595

<210> SEQ ID NO 34
<211> LENGTH: 738
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(738)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 4485716..4486453

<400> SEQUENCE: 34
ataatagata atatat atta tatgatagat atataatata ttatataata tataatatat 60
tatata total t catataata tatataatat attaatatatt atata totat catataatat 120
aatatatata atatataata tatat catat tatattgtat ataatatata t catattata 180
ttgtatataa tatatat cat attatattgt atataatata tat catatta tattgtatat 240
aatatatat c at attatatt gtatataata tatat catat tatattgtat ataatatata 300
t cat attata ttgtatataa tatatat cat attatattgt atataatata tat catatta 360
tattgtatat aatatatat c at attatatt gtatataata tatat catat tatattgtat 420
ataatatata t catattata ttgtatataa tatatat cat attatattgt atataatata 480
tat cat atta tattgtatat aatatatat catatatt atc tattatattg tatataatat 540
at attatata ttatct atta tattgtatat aatatatatt atatattatic tattatattg 600
tatataatat at attatata ttatct atta tattgtatat aatatataat aaatatagta 660
tatataatag ataatatata gtatatatga tat attatat atactatata ttatatat ca 720
tatatact at a tactata 738

<210> SEQ ID NO 35
<211> LENGTH: 386
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(386)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 5423067..5423452

SEQUENCE: 35 taalatatata aaaatatata taaaaatata aaaat attta tataaatata taaaaatatt 6 O tatataaata tataaatata taalatatata tittatataaa tatataaata tataaatata 12 O taalatatata tittatatalaa tatataaata tatatttata taalatatata aatatatata 18O aaatatataa atatatattt atataaatat ataaatatat ataaaatata taalatatata 24 O tattittatat aaatatataa atatatataa aatatataaa tatatatatt ttatatalaat 3OO atataaatat atataaaata tataaatata tatattittat a tatttatat attataaatac 360 atatatttca tatato acat atatga 386

<210> SEQ ID NO 36
<211> LENGTH: 584
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(584)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 5805559..5806142

SEQUENCE: 36 taaatattitt taaaatatat at attittata atatataatt tat attataa totgtacata 6 O atatat atta taatataata tatataatac tdtat attat attatatata ttataatata 12 O tatt attata tatt at atta tatataatat aatatatatt attaatatatt at attataca 18O tattataatg tattataata tat attatat tatat attat aatatatatt at attatata 24 O ttataatata tatt at atta tat attataa tatat attat attat attat at at attata 3OO atacat atta taatacat at tatataatat attataatat g tattataat acat attata 360 taatat atta taatat atta tatataataa tat attataa taCat attat attataatata 42O tattatgt at attatatata atatatatta caatgtatat tatgtatatt atatatatta 48O tatat catat aatatatatt atatataata tdatatataa tatat attat ataatatatt 54 O atatgatata tataatatgt attacatgta atatatat ca taat 584

<210> SEQ ID NO 37
<211> LENGTH: 345
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(345)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 10802644..10802988

<4 OO > SEQUENCE: 37 tgtatatata tactatatat atactatata tatagtgitat atatatact a tatatatact 6 O atatatatag togtatatata tactatatat at agtatata gtatatatag taatatatat 12 O atatagtata tatata cact atatatagta tatatagitat atatatattg td tatatagt 18O atatatatag togtatatata gtatatatat attgtatata tagtatatat attgttgtata 24 O tatagtatat atatagtata tatagtatat at agtatata tatagtatat atatactata 3OO tatatagitat atatatattg tatatatata ctatatatat agtat 345

<210> SEQ ID NO 38
<211> LENGTH: 474
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(474)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 13496468..13496941

<400> SEQUENCE: 38
at attatata taatataatt at atctataa ttatatatta tatataatat aattatatat 60
Ctataattat at attatata taatatatat tatatataat atata attat attata attta 120
tataatataa tatataatat attaattatat attaattatat aatataatat attaatatata 180
attatatata atttatataa tataatatat aatatataat tatatatatt tatataatat 240
aattatatat aatatata at tatatata at ttatataata taattatata taatatataa 300
ttatatataa tittatataat attaattatat attaattatat attatatata atttatataa 360
tata attata tataatatat aattatatat aatatataat tatatataat tatatataat 420
atataattat atataattta tataatataa ttatatatta tatat attat at at 474

<210> SEQ ID NO 39
<211> LENGTH: 483
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(483)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 2509163..2509645

<400> SEQUENCE: 39
caaaatacat aatatataat agt attatat aatag tatgt at agittataa tatatagitat 60
aattacaata tatgatatgg tittatatatt atatatagta taatataata taa cataata 120
ctattataat atataaacta tataatatat act attataa tatatgaact attataatat 180
ataaactata tataatatat aatatgtact attataatat ataaactatt ataatataat 240
atataaacta ttataataca taalactatta taatatatat aatac tatgt atacatatat 300
tacattatgt acatactaca ttatgtatta totatgtata tatacacaaa atacataata 360
tataatagta ttatataata gtatatatag titataatata tagtataatt acaatatata 420
atatggittta tat attatat at agtataat acaatataac ataatact at tatatataaa 480
Cta 483

<210> SEQ ID NO 40
<211> LENGTH: 641
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(641)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 2776349..2776989

<4 OOs, SEQUENCE: 4 O tgttatatat atataa cata gat attatat atacatgtta tatatataac atagatatta 6 O tatata catgttatatatat aacatagata ttatatatat aacatagata ttatatatac 12 O atgttatata taacatagat attatatata catgttatat ataacagata ttatatatac 18O atgttatata taacatagat attatatatg tatgttatat ataacataga tattatatat 24 O gtttatataa tatataac at atgtttaa.ca tatataatat ataacatgtt tatataatat 3OO ataa cataat tatatgttat atatgatata aaa catatat attatatacg ttatatgtaa 360 tatata acat at attgtata cqttatatgt aatatataac atatattgta tacgittatat 42O gtaatatata acatatattg tatacgittat atgtaatata taa catatat togcatacgtt 48O atatgtaata tataac at at attgtatacg ttatatgtaa tatgtaa.cat at attgtata 54 O cgittatatgt aatatgtaat atataataca tataa catgt atatataa.ca tatatgtata 6OO taacatatat ataacatata taa catatat gttat attat a 641

<210> SEQ ID NO 41
<211> LENGTH: 745
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(745)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 2858703..2859447

<4 OOs, SEQUENCE: 41 at atttatat atgtaataat atataatata tittatatgta tttgtatatg taataatata 6 O tatataataa aatatgtaat aatatataat at atttatat ataaatatat tat attatat 12 O at at attatt at atttataa tataatatat atttatatta tat attataa atatat atta 18O tataatatat attataaata tat attatat aatatatatt ataaatatat attat attat 24 O at attatalaa tatatatt at attaatatata ttataaatat at attatata atatat atta 3OO taaatatata ttatattitat aatatatatt tttgtatatt atatattata tattataaat 360 attatt at at ttataatata ttatatattt tatatataat atatgatata tattataaat 42O at at Cittata aatatatata tittatatata tat attataa atatataaat attaaatatat 48O aatataatat aatataatat aataaatata atatataata tatataatat attaataaata 54 O taataaatat aaatatat ca tataaatata aatataaata taalatatat C atataaatat 6OO atatatttat atgatatatt at agtatata taalatatatt tatat attat aaaat attta 660 tataatatat aattataata tatttatata tataa attala Ctaatatata taalactaata 72 O taatatataa totaataata tagta 74.

<210> SEQ ID NO 42
<211> LENGTH: 307
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(307)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 945522..945828

<4 OOs, SEQUENCE: 42 catatataat at at attacc tatgttatat aggtoatata taa cataaat at attacata 6 O tatgtaatat at attaaata taaatatata acatatatgt gtaactatat atgtaaatat 12 O gtacatatac atatatgtaa atatataata tatatttaca ttatattata taatatatat 18O ttacattata tatttatata taCattatat a tatttacat tataaatatt tatataatat 24 O at atttacat tat attacat tatataaaat acaatatatt acattataat a Cattataac agataaa 3. Of

<210> SEQ ID NO 43
<211> LENGTH: 357
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(357)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 3402743..3403099

<4 OOs, SEQUENCE: 43 aat attatat taalatataat at attaat at ttaatatatt taatataata ttaaataaat 6 O at attataaa taaattataa tatataaata tat attatgt atttatgtat aatatataaa 12 O aattatatat aatatatata tttittatalaa tatataaata tataataaat aaatat atta 18O aataaataat aatatattaa at attaatat attaaatatt atatattaaa tataatatgt 24 O aatatgaaat at attaaata ttatatatta aatataatat ataatgtgaa atatattaaa 3OO tattatatat taaatataat atataatatgaaatatatta aat attatat attaaat 357

<210> SEQ ID NO 44
<211> LENGTH: 323
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(323)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 3485830..3486152

<4 OOs, SEQUENCE: 44 at atttatag actatatatt tatatattta gtg tatttgt atactatata tittatatagt 6 O tagtatattt gtatactata tatttatata tittagtatat ttgtatact a tatatttata 12 O tatttagaat atttgtatac tatatattta tatatttagt at atttgtat actatatatt 18O tagtatattt gtatactata tatttatata tittagtatat ttgtatact a tatatttata 24 O tatttagtat atttatatac tatatactta tatatttagt at atttatat actatatact 3OO tatatattta gtatattitat ata 323

<210> SEQ ID NO 45
<211> LENGTH: 498
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(498)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 3548336..3548833

<400> SEQUENCE: 45
aatt attact at attgttaa tataattatt atataatata atataattat at cact atta 60
titat attata gtattaatat aatagtgitat aac attaata taatatagta ttaatataat 120
agcgtataac attaatataa tatag tatta atataatago gtata acatt aatataatat 180
agtattaata taatagtgta tattaatata atatag tatt aatatataat attaatataa 240
tatat caata taatagtata taatataata taatatat ca atataatagt atataatata 300
atataatata t caatataat agtatataat ataatataat at atcaatat aatagtatat 360
aat attaata taatataata t caatataat agtatataat attaatatat taatataata 420
gtatataata ttaatgtaat ataat attaa cataatgitat ataataatat aatagtatat 480
aatactaata taatataa 498

<210> SEQ ID NO 46
<211> LENGTH: 400
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(400)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 4595109..4595508

SEQUENCE: 46 aaatat atta tattatatat tatatatt at t caatatact ataatatata ttatatatgt 6 O ttaatacaat atataatatt tacatatatt cocatttatt tatataa.cat at attatatg 12 O at attatata t tactic cata taatataata tattata cat aatat attac toagtataat 18O acataatata tataatatat tact.cggitat aatatataat attatatgtt atgcaatata 24 O atatataata ttatatataa taCattatto: aatataatat attaat attat attataataca 3OO ttattoaiata taatatataa taCactatto: aatataatat acaat attat attataataca 360 ttattoaiata taatatatat tatataatat atatattitat 4 OO

<210> SEQ ID NO 47
<211> LENGTH: 403
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(403)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 7205509..7205911

SEQUENCE: 47 agtatatata totgtatata tatgagtata tatatgtgta tatatatgag tatatatatg 6 O tgtatatata tagtatata tatgtgtata tatatgagta tatatatgtg tatatatatg 12 O agtatatata totgtatata tatgagtata tatatgtgta tatatatgag tatatatatg 18O tgtatatata tagtatata tatgtgtata tatgagtata tatatgtgta tatatgagta 24 O tatatatatg togtatatatg tdagtatata tatgtgtata tatatgagta tatatgtgta 3OO tatatatgag tatacatatg togtatatata tdagcatata totgtatata tatgagtata 360 tatatgtgta tatatatgag tatatatgtg tatatatatgagt 4 O3

<210> SEQ ID NO 48
<211> LENGTH: 309
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(309)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 7507280..7507588

<4 OOs, SEQUENCE: 48 tataaaatat at attattta tat attatat ataaaatata tattatatta tat attatag 6 O atataataaa taaataatat attaatatatt atata attat ttata Catala ttatatataa. 12 O ttatatgtaa ttgtacaatt atatataatt atatacaatt atacacataa ttatatacaa 18O ttatacaatt atataCatala ttatatatat aatatacata attatatatt aattatacala 24 O ttatatacat aattatatat aattatacaa ttatata cat aattatt atg tat attatat 3OO tatataata 309

<210> SEQ ID NO 49
<211> LENGTH: 516
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(516)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 3581085..3581600

<4 OOs, SEQUENCE: 49 atatatatat atatatatat atttatatat atatatatta atatatatta tatataaaaa. 6 O tatataaaat ttatatatat aatttatata tataaaaata tataaaattit attatatataa. 12 O tittatatata taaaaatata taaaattitat atatataatt tatatatata aaaatatata 18O aaatttatat atataattta tatatataala aatatataala atttatatat attaatttata 24 O tatataaaaa tatataaaat ttatatatat aatttatata tataaaaata tataaaatett 3OO atatatataa tittatatata taaaatatat aaattatata tataattata tatataatat 360 aaaattatat atataattat atatataata taaaattata tatataatta tatatataat 42O ataaaattat atatatattg tatatatata aaatatacaa aatttatata tataaaatat 48O aaaatataca taaaaataaa tatatata at ttatat 516

<210> SEQ ID NO 50
<211> LENGTH: 534
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(534)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 3084851..3085384

<4 OOs, SEQUENCE: 50 atataatata tatgactata tattittatat tatattotat ttcaataaaa tatttatatt 6 O ttattatata ttataatata taattatata totaataata tataatatat aatatatatt 12 O titat attata ttittatattt atttittatat tittat attat attitt attat at at attata 18O atatataatt atatatgcaa taatatatta tat attataa tatataatta tatatgcaat 24 O aatatatt at at attataat atata attat atatgcaata atatattata gattataata 3OO tataattata tatgcaataa tat attatat attatatatt agataatata ttaatatata 360 ttatalacata taatatataa Catataatat attaatatatt atctaatata taatataa.ca 42O tataatatat aatatatt at ataatatatt attacatata taatatattg taatatataa 48O tattacatat atcttcaaaa agagittatgt gtatataata catatatata ccat 534

<210> SEQ ID NO 51
<211> LENGTH: 583
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(583)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 160087..160669

<4 OOs, SEQUENCE: 51 tatttatata aaatatataa aatatatt at atataaatat attatatata atatattitat 6 O at attataca atatattitat at attatata taatatattt tatataatat acatalatata 12 O ttittatatat tatatataat at attittata tataatgtac aatatattitt at at attata 18O tataatatat tittatatata Citatacaata tattittatat attatatatt ttatatatat 24 O ttitt catgta acatatatat tittatatata atatatatac catatataat at attittata 3OO tataatatat ataccatata taatatattt tatatataat atgtatat ca tatatagitat 360 attittatata taataggitat accatatata atatattitta tatataatag gtata acata 42O tataatatat tittatatata atatgtatac catatataat at attittata tattatagat 48O accatatgta atatactitta tatataatat agataccata totaatatac tittatatata 54 O atatagatac catatgtaat at actittata tataatatag ata 583

<210> SEQ ID NO 52
<211> LENGTH: 314
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(314)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 4350424..4350737

<4 OOs, SEQUENCE: 52 tatgtgtata taaatatatg tatatatgtg tatataaata tatataaata tatgtatata 6 O tgtatatata catatattta tatataaata tatgcatata tittatatata aaatatatgc 12 O atatatgt at atatataaaa tatatacata tatgtatata tataaaatat atacatatat 18O gtatatatat aaaatatata catatatgta tatatataaa atatata cat atatgtatat 24 O atataaaata tatacatata totatatata taaaatatat acatatattt atatatataa 3OO aataccaagt citta 314

<210> SEQ ID NO 53
<211> LENGTH: 828
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(828)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 8443267..8444094

<400> SEQUENCE: 53
tattatataa ttatatatac tatataatta tataatatat agittatatag tatatataat 60
atatataata tatactatag tatatataat atatataata tatactatag tatatataat 120
atataattat atataatata tataatatag tatat attat atatatatta tatatatata 180
atatatatat aatatatata atatagtata tataatatat aattatatat aatatataat 240
at agtatata taatatataa tatatatata attatatact ataatatata taatatataa 300
ttatat atta tatactatag tatatatt at tatatataat agatataata tatataatta 360
ttatataata tagtatatat aatatata at tatatataat agatataata taatataatt 420
atatataata tagtatatat aatatata at tat attatat tatatataat atata attat 480
aatatata at tat attatat aatatatata atatataatt at attatata attat attat 540
ataatatata taatatataa titat attata taatatatat aatatataat tat attatat 600
aatatatata atatataatt at attatata atatatataa tatataatta tattatataa 660
tatatataat atataattat at attatata taatatagta tatataatat gtaattatat 720
at catataat atataa catt gtatataata tataattaca tattatataa totatataat 780
atataattat atacattata taatatagta tataattata tattatgt 828

<210> SEQ ID NO 54
<211> LENGTH: 573
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(573)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 8703190..8703762

<4 OOs, SEQUENCE: 54 tat attatat ataaaatata catataatat acctataata taCatataat attataatata 6 O tattatgtac atataatata catataatat atataatata taatgtacat ataatataca 12 O tataatatat gttatatatt atatataaaa tataggatat atataatata gaatatatat 18O actatattgt atatataaga tatataatat at agtatata tactatataa tatataatat 24 O at agtatata taatatataa tatagaatat atatacaata tataatatag aatataggat 3OO atatatagaa tatacatata taatatgt at at attatata ttatattata tattatataa 360 aaatatataa tatataatat aaaaatatat tatat attat attaatataala at at attata 42O tattatatat tatataatat aaaatatatt at at attata tattatatat aaaatatatt 48O at at attata tattatatat aaaaatatat tatat attat at attatata taaaaatata 54 O ttatat atta tatataaaaa tatatatt at tac sf3

<210> SEQ ID NO 55
<211> LENGTH: 597
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(597)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 8819076..8819672

<400> SEQUENCE: 55
acat at Ctta tatataaaat atataaatat acacatattt tatatataat at at attata 60
tatatgaaat atacacat at ttittatatat ataatatata tattatatat aatatatgca 120
tat attatat ataaaatata tat attatat ataaaatatg catat attat atataatata 180
tataatataa aatatataat at at attata tattatatat aatatatatt attatataata 240
Catatatata atatataata tatataaaat attaatatata tattatataa tatatatata 300
aatatatata atatatatat aatatatata ttatatataa aatatatatt atatgtaaaa 360
tatataatat atataatata tat attatat gtaaaatata tattatatat aaaatatata 420
atatataaaa tatatatt at atataaaata tataatatat aaaatatata atatatataa 480
aatatataat atatataaat at at attata tataaaatat attaatatata taalatatata 540
ttatatataa aatatataat atatataaat at at attata tataaaatat at attat 597

<210> SEQ ID NO 56
<211> LENGTH: 646
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(646)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 759619..760264

<4 OOs, SEQUENCE: 56 taatatatat aatatatatt atataataat atataatata tattatatta taatatataa. 6 O tat attatat aataatatat attatataat atataataat atatataata Cat attattt 12 O aataatatat aatatatatt atataataat atataatata tattatataa taatataCat 18O tat attatat aatatataat atatataata tat attatat aataatatat aatatatatt 24 O atagaatgat at attagata ttatataatt atatatataa tattatatat tatataataa 3OO tatataatat at attatata attatatata taat attata tattatataa ttatatataa. 360 tat attatat aattatatat attaat attat at attatata attatatata atatat atta 42O tata attata tatataatac tatatatt at attaattatat attaatactat at attatata 48O atttatataa ttatatatat tatatatt at attaattatat at attatata ttatataata 54 O acatatatat tatatatt at attaataaCat at at attata tattatataa taCatatata 6OO ttatat atta tataataCat tattatataa tatataatat at atta 646

<210> SEQ ID NO 57
<211> LENGTH: 752
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(752)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 1226710..1227461

<4 OO > SEQUENCE: 57 taalacatata tataaatata tataaatata tatataaata tatataaata tataaatata 6 O taaatatata tdaatatata aatatatata aatatatatgaatatataaa tatatatata 12 O aatatatata aatatatata taalatatata taalatatata taalatatata taalatatata 18O taaataaata tataaatata tataaatata taalatatata tataaatatg taaataaata 24 O tatataaata tataaatata tataaatata tataaatata tatagaaata tatatagaaa 3OO tatatataaa tatatataga aatatataga aatatatata gaaatatata taaatatata 360 taaatataga aatatatata aatatatata aatatatata gaaatatata atatatataa 42O atatatataa atatataaat atatatataa atatatatat aaatatatat aaatatatat 48O aaatatatat aaatatatat aaatatatat attaatatat aaatctatat taatatatat 54 O taatatataa atctat atta atatatatta atatatatat taatatatat taatatataa. 6OO atatatatat taatatataa atatatataa atatatatgt aaatatatat ataaatatat 660 ataaatatat atataaatac atataaatat atatataaat atatataaat attatatataa. 72 O atatatataa atatatatat aaatatatat aa 7s2

<210> SEQ ID NO 58
<211> LENGTH: 300
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(300)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 1119049..1119348

<4 OOs, SEQUENCE: 58 taatatacat tittatataat atatgtaata tatattittat atatatgtaa tatatattitt 6 O atataatata totaatatat attittatata tatgtaatat at attittata taatatatgt 12 O aatatatatt ttatataata tatgtaatat at attittata taatatatgt aatatatatt 18O ttatataata tatgtaatat at attittata taatatatgt aatatatatt ttatataata 24 O tatgtaatat at attittata taatatatgt aatatatatt ttatatatat gtaatacata 3OO

<210> SEQ ID NO 59
<211> LENGTH: 617
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(617)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 3603613..3604229

<4 OO > SEQUENCE: 59 aaaatataat atatataata tataatatat attaatatatt atatataaaa tatataatat 6 O ataatatata taataaaata tacataatat ataatgtata ataaaatata cataatatat 12 O aatatataat aaatatataa tatataatat attaataaaat atataatata taatatataa. 18O taaaatatat aatatatt at atataataala atatataata tattatatat aataaaatat 24 O ataatatatt atatataata aaatatataa tat attatat attaataaaat attataatata 3OO ttatatataa taaaatatat aatatatt at atataataala atatataata tataatatat 360 aataatatat attaatatata atatatataa taaaatatat attaatatata atatatataa. 42O taaaatatat aatatataat atatataata aaatatatat gatatataat atatataata 48O aaatatatga tatataatat atataataaa atatataata tataatataa tatataatat 54 O atatactaala aaatatataa tatataataa aaaatatata atatataata tatataatat 6OO ataataaaat atatata 617

<210> SEQ ID NO 60
<211> LENGTH: 674
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(674)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 2592460..2593133

<400> SEQUENCE: 60
taagcttata tatatatata agcttatata tatatatata agcttatata tatatagaaa 60
gcttatatat atatagaaag cittatatata taagaagctt atatataaaa gct tatgtat 120
aaatatatat aaatatattt atttatgctt atagatacat atataaatat atttattitat 180
atttatatat aaa.catatat ttatatatat ttatataata tittatttatt attataaataa 240
atatataata aataataaat atatataata tatttattgt attatttata taaatttatt 300
aatataatat attaataaaat aataattata taalatatata aatat Ctata aatatatata 360
aatatatata at atctataa atatatataa atataaatat atata at at C tataaatata 420
gataaatata aatatatata at atctataa atatagataa atataaatat atataactat 480
atataaatat atatalactat atataaatat atatataaat atatata act attatatataa 540
Ctatatatat aaatatatat alactatatat ataaatatat atataaatat attatalactat 600
atatataaat atatataact atatataaat atatatataa atatatatala Ctatatatat 660
aaatatatat attaa 674

<210> SEQ ID NO 61
<211> LENGTH: 1694
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(1694)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 2891680..2893373

<400> SEQUENCE: 61
atatgtaata catatatt at atatgcatat atacatgcat atgtatatac atatattata 60
tatgcatata tacatgcata totatataca tatataaagt atgattatat ataatatata 120
catgtatatg tatata catg tatatatt at attatatatt atttatacat attattatgt 180
Ctatatataa tataatatat acat attaat aatataatac attaatataat attaatatatt 240
atataataca taatataata taatatatta tataataCat aatataatat aatat attat 300
atgata cata atataatata at at attata tdata cataa tataatataa tacat attaa 360
taatat atta t tatt attaa tataatatat acat attaat atacatacat at at attata 420
ttatatataa tatacatata atataatatg taat attata tataatataa tacataatat 480
aatacatatt aataatatat tattaataag ataatatata totatictata atatatacat 540
atatgtatat g tatgtatat attatagata tacatgttta tacatgtata tattatagat 600
atatacatgt atatacatgt at at attata gatatataca totatatacg tatat attat 660
agatatacat gtatatatgt atatatatta tagatataat atatacaaga atataagaat 720
atatataata taatatataa tacacataat acgtatatat tatatataca totatattat 780
atatgtacat atatacatgt at attatata tacatgtata ttatatatac atgcatatta 840
tatatattitt tatatataat atc catgitat attatgtata tttgttgtata ttatatatac 900
atgitat atta tatata catg cat attatat at atttittat atataatatic catatatatt 960
atgtatattt gtgitat atta tatatacaca tat attatat atacatggat attatatata 1020
cacatatatt atatatacat at at attata tatacacata tattatatat acatgtatat 1080
tatatataca cqtatatt at atatacacac gitat attata tatacacgta tattatatat 1140
acacacgitat attatatata cacgtatatt atatatacac acgtatatta tatatacacg 1200
tat attatat atacacacgt at attatata tacacgtata ttatatatac acacgtatat 1260
tatatataca cqtatatt at atatacacac gitat attata tatacacgta tattatatat 1320
acacacgitat attatatata cacgtatatt atatatacac acgtatatta tatatacatg 1380
tat attatat atacatgitat attatatata cacatgtata ttatatatac atgtatatta 1440
tatatacaca totatatt at atatgcatgt at attatata tacacatgta tattatatat 1500
acacatgitat attatatata catatatatt atatata cat gtatatt atg tatacatata 1560
tattatatat acatgitat at tatagataca tatat attaa atata catgt at attatgta 1620
tacatatata ttaaatatac atgtatattg tatatacata tat attatat acatgtatat 1680
tacatgtata cata 1694

<210> SEQ ID NO 62
<211> LENGTH: 587
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(587)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 3432560..3433146

<4 OOs, SEQUENCE: 62 gaattatata tatatagotgaattatatac atatataata tatacaatat at attatata 6 O tittatatatg atatatacaa tatat attac at attatata tacaatatat aatatataat 12 O atataatatt at at attata tattgtatat aatatatatt atata acatt atataatata 18O taat attata tattatatat tdtatataat at at attata taacattata taatatatac 24 O tattatatat tataatatat aatatataat aatatataat agtatatatt atatatattg 3OO tatatatt at atataaatat attaatatata atatatatta tataatatat attatataat 360 at at attatt at at attata tatttatata taatatatat tatatatatt at attittata 42O tataaatata taatatataa taatatataa tittaatatat attaatatata Calatatataa. 48O tatataatat attaatatat attaatatata caatatataa tatataatat attaatatata 54 O atataaatta ttatatataa tatatatt at atatagotga attatat 587

<210> SEQ ID NO 63
<211> LENGTH: 313
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(313)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 3805392..3805704

<4 OOs, SEQUENCE: 63 tatataatat gtatat tatg taatattitta tatagoatat atgtatatta tatataatct 6 O tittatatata gtatataata totatatt at at attatata attatataat tatgt attat 12 O ataaaatata ttatataata tataattata tatttitttga aatatagatt atatataata 18O tatatggcag tdagctgaga tataatatat attatctata citatataata tat attatat 24 O at actictata ttatatatgt at at attata tataatatat acatatataa totgtatata 3OO ttatatataa taa 313

<210> SEQ ID NO 64
<211> LENGTH: 349
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(349)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 4521378..4521726

<400> SEQUENCE: 64
ttatatacac tatataatat g tatttatat at acttatat acactatata totatttata 60
tataattata tacactatat aatatgtatt tatatataat tatatacact atataatatg 120
tatttatata taattatata cactatataa tatgt attta tatataattig tatacactat 180
ataatgtata tittatatata attgtataca citatataatg tatatt tatg tataattgta 240
tacactatat aatgtatatt tatgtata at tdtatacact atataatgta tatt tatgta 300
taattgtata taccatataa totatattta totataattig tatatacca 349

<210> SEQ ID NO 65
<211> LENGTH: 500
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(500)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 3240166..3240665

<4 OOs, SEQUENCE: 65 ttaatatata atatat atta tatatttata tattaatata taatatatat ttatatataa. 6 O tatatatt at at atttatat tacatatatt tatatgttaa tatatattitt atatattitat 12 O at attittata tatttatata ttatatattt at at attata tittatatatt attatattitat 18O attatatatt tatatatt at atttatatat tatatattta tattatatat ttatatattg 24 O tatatttata ttatatattt atatattgta tittatatatt atatattitat atactatata 3OO tatttatata tattatatat ttatatatta tatatattta tatat attat a tatttatat 360 attatatata tittatatata ttatatattt at at attata tatatttata tat attatat 42O at atttatat at attatata tittatatata atatatatta tatattittat citatatattt 48O at at attaat at at attata SOO

<210> SEQ ID NO 66
<211> LENGTH: 866
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(866)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 409429..410294

<400> SEQUENCE: 66
atatatataa tat attatat at attatata ttatatatat aatacatata ttatatatat 60
aatatataat acatat atta tatatatt at at attatata taatatataa tacatatatt 120
atatataata tataatatat aatatatt at attaatataat tatataatta tataatataa 180
tataatatat aat attatat aattatataa tatatataat tat attatat attataaata 240
ttatataata tatatattac aaatatatat tatatatatt ataaat atta tataacatat 300
at attatata atatatataa tatataatat atataaaaat ataatatata agatatatat 360
aatatatgat atatatgata tataatatat gatatatatg atatatataa tatatgatat 420
atatgatata tatgatatat ataatatatg atatatatga tatatatgat atatgatata 480
tatgatatat gatatatatg atatatatga tatatgatat atatgatata tatgatatat 540
gatatatatg atatatatga tatatgatat gatatatata atatatgata t datatatat 600
aatatatgat atatatgata tatgatatgt aatatatatg atatattata tataatatat 660
aatatataca taatatataa tatataatat ataatatata taatatgtga tatatataat 720
atatgatata tdatatatga tatatatt at ataatatata taatatatat tatatataat 780
at at attata taatatatat aatatatatt atatataata tataagatat aagatataat 840
atatataata tataatatat attaata 866

<210> SEQ ID NO 67
<211> LENGTH: 335
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(335)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 614754..615088

<400> SEQUENCE: 67
acccaatata totgtatata totatgtata tatacatata catacataca tatatgtaca 60
tacatatata catacataca tatatatgta catacatata tacatacata catatataca 120
tata acatat atacacacat atatacagat atacatatat acatacatat atacatataa 180
catatataca tacatatata catatalacac atacatacat acatatatac atlacaacata 240
tata cataca tatata cata totatacata catatatgta tacatatatg tatacatata 300
tgtata cata tatgtatata tatattgtta tatat 335

<210> SEQ ID NO 68
<211> LENGTH: 455
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(455)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 1299520..1299974

<4 OOs, SEQUENCE: 68 ggatatatat attatt agitt gttatatt at tatat attat at at attatt atatataata 6 O tatt at at Ca tatatatt at tatatataat at attatato atatatatta ttatataata 12 O tatt at at Ca tatatatt at tatatataat at at attata tatatt atta tatataatat 18O at attatata tatt attatg tataatatat at attatata ttatttatat atatataaat 24 O tatataataa tatataatta attatacata tatacatata taagtataca tataatatat 3OO ttatatagta tatataaata tatatacaat at atttatat attatatatt atatataaat 360 atatacaata tatttatat catatattitta tatatgatac atataatata tattatatat 42O gatatataat atatat cata tatgatatat aac at 45.5

<210> SEQ ID NO 69
<211> LENGTH: 404
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(404)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 1970778..1971181

<400> SEQUENCE: 69
atatataata totataatat ataatatata t catatattgttctatotat attacatata 60
atatgcatta tat attatat attgcatata atatgcatta tat attatat attgcatata 120
atatgcatta tat attatat attgcatata atatgcatta tat attatat attgcatata 180
atatgcatta tat attatat attgcatata atatgcatta tat attatat attgcatata 240
atatgcatta tat attatat aatatataca catataatat atataattta tatatattta 300
tatatattta catttatt at a tatt tatta tatataaata tatttittata tattactitat 360
at attatata taatatatat aatatatata ttatatataa tata 404

<210> SEQ ID NO 70
<211> LENGTH: 605
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(605)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 3562918..3563522

<4 OO > SEQUENCE: 7 O tatatatata aaataCatat at attatata tattatatat aataCatata ttatat atta 6 O tatataatac acgtatataa tatataatat ataatacata taatatatat gatatataat 12 O acatataata tatatgat at ataatacata tataatatat atgatatata atacatatat 18O aatatatatt atatataata catatataat at at attata tataataCat attataatata 24 O tattatatat aataCatata taatatatat tatatataat acatatataa tatat attat 3OO ataatacata tataatatat attatataat acatgtatat aatatatatt atatataata 360 catatatatt atataataca totatataat at at attata tataatacat at at attata 42O tattatatat taatatattt atataatagt aatatataat attaatatat tatatatatt 48O aat attatat attaatacata tattatatat aatataaata tatataatac attatataata 54 O cacatatt at atataataca tat attatat ataatatata tattatatat aatatatatg 6OO taata 605

<210> SEQ ID NO 71
<211> LENGTH: 317
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(317)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 189743..190059

<4 OOs, SEQUENCE: 71 tatttitttat atttatatat tatatatatt tittatatgta atatattata tataaaatta 6 O tataattitta CtaCatataa tatataaaat tatataattt tactaCatat aatatataaa. 12 O attatata at tttactatat attaatatata aaattatata attittatata taatatatat 18O tataatatat attatatgca atatatatta tat attatat tataatatat td tatattitt 24 O tgtatataaa atatataata tataatatat ttatagacaa taatatataa tataatatat 3OO aaaattittat atataaa. 317

<210> SEQ ID NO 72
<211> LENGTH: 522
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(522)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 229111..229632

<4 OOs, SEQUENCE: 72 gatatatata tittatatata taaaagatat at attattta tatataaaga tatatattta 6 O tatatataaa agatatatat tatttatata tataaaagat atatattitat atatatgata 12 O tatatt attt atatatataa aagatatata tittatatata t datatatat tatttatata 18O taaaagatat atataaaaga tatatatt at ttatatatat aaaagatata tatataaaag 24 O US 8,252,917 B2 111 112 - Continued atatat atta tittatatata taaatgatat at attattta tatataaaag atatatatta 3OO tittatatata aaagatatat attatttata tatataaaag atata catat aaaagatata 360 tatttatata taaaagatat atatatttat atataaaaga tacatatatt tatatatata 42O aaagatatat at atttitt at atataaaata tat attatat atataaaaga tatatataaa 48O tatatatat c ttittatatat aaaagatata tataaatata ta 522

<210> SEQ ID NO 73
<211> LENGTH: 1110
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(1110)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 1138030..1139139

<4 OO > SEQUENCE: 73 tatgtatgta tacataatat attatatatg tat attatgt atacataata tattatatat 6 O gtatat tatg tatacataat at attatata totatatt at gtata cataa tat attatat 12 O attatatgta tattatgt at a cataatata ttatatatta tatgtatatt atgtatacat 18O aatatatt at at attatatg tat attatgt atacataata tattatatat tatatgtata 24 O titatgtatac ataatatatt at at attata totat attat gtata cataa tat attatat 3OO attatatgta tattatgt at a cataatata ttatatatta tatgtatatt atgtatacat 360 aatatatt at at attatatg tat attatgt atacataata tattatatat tatatgtata 42O titatgtatac ataatatatt at at attata totat attat gtata cataa tat attatat 48O attatatgta tattatgt at a cataatatt tatat attat atgtatatta totatacata 54 O at at attata tattatatgt at attatgta tacataatat gtacacataa tatttatata 6OO ttatatgt at attatgtata cataatattt at at attata totat attat gtata cataa 660 tatttatata ttatatgt at attatgtata cataatattt atatattata totatattat 72 O gtatacataa tatttatata ttatatgt at attatgtata cataatatat tatat attat 78O atgitat atta totata cata at at attata tattatatgt at attatgta tacataatat 84 O attatatatt atatgtatat tatgtataca taatattitat at attatatg tat attatgt 9 OO atacataata tattatatat tatatgtata t tatgtatac ataatatatt at at attata 96.O tgtatatt at gtatacataa tat attatat attatatatg tat attatgt atacataata 1 O2O tattatatat tatatatgta tattatgt at tat attatat attatgtata ttatagatta 108 O tgitatgcata cataatatgt attgtatatt 111 O

<210> SEQ ID NO 74
<211> LENGTH: 521
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(521)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 2863407..2863927

<4 OOs, SEQUENCE: 74 aatatatata aatatatalaa tatatatalaa tatatataca tataaatata taalatatata 6 O tatgtaaata tatgtaaata tatgtaaata tatgtatatg tatatatatg taaatgitatg 12 O taaatatata taaatatatg taaatatata taalatatacg taaatatata aatatatata 18O US 8,252,917 B2 113 114 - Continued actatatata aatatatata aatataaata tataaatata tataaatata tataaatata 24 O taaataaata catataaata tataaataaa taCatataaa tatatataaa tatataaaaa. 3OO tatatatalaa tatatatata aatatatalaa Catatataaa tatataaata tatataaata 360 tataaataca taaaatatat aaatatatat aaatatataa atatatataa atatagataa 42O atatagataa atatataaat atatataaat atataaatat agataaatat ataaatatat 48O aaatataaat atataaaaat atatataaat atataaaaat a 521

<210> SEQ ID NO 75
<211> LENGTH: 560
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(560)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 5712303..5712869

<400> SEQUENCE: 75
atataattat atatat atta tat attatat ataattatat attatatata atgtataatt 60
at at attata tataatatat ataaatatat a tatttittta tataaatata ttatatattt 120
at at attata tatalaattta tatatatalaa tttittatata ttatatatat ttatat atta 180
tatattgt at at atttatat attacatatt gtatatattt atatattata tattatatat 240
ttatat atta tat attatat atttatatat tatat attat atatatttat at attatata 300
taaattattt atatataata tataaatata tattatataa tataaatttg tatatataat 360
at at attitat attatatata aaatattitat attatatata aaatataata taalatatata 420
catataatat at at attata tatttata at tatat attat atataataca tataatatat 480
aatatataat a catatatat catatatgaa atatatat ca tat attatac at attatata 540
taacatatat attatatato 560

<210> SEQ ID NO 76
<211> LENGTH: 479
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(479)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 8578812..8579290

<4 OO > SEQUENCE: 76 tatggtatac atatagtata tatgggg tac atatatggta tatatatggg ttatatatat 6 O gatatatatt atatatgt at atggtatata tatggtatat at attataca togcatatggit 12 O atgtatatgg tatatatatg atatatacat atggtgtata tatatgttat atatgatata 18O tataaggitat atatatggta tatataaggt atatatagta tatatatggit atatataagg 24 O tatatattgt atatatatgg tatatataag gtatatatat totatatatg gtatatatat 3OO ggtttatata tatggtgttgt atatatggtg tittatataca cactittatat actatatatt 360 atatacacac tatatataat at at attata tatagittaaa tatatggitat atgcaattag 42O atatatggta tatgtaatta tatatatggt atatagatgg togtatatatg gtatatata 479

<210> SEQ ID NO 77
<211> LENGTH: 477
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(477)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 8579294..8579770

<4 OO > SEQUENCE: 77 tatagtatat atacacacta tagg taatat act acatatt atatacacac tataaataaa 6 O atatataata tataat attt totatatagt at at attata tattgtatat actatatata 12 O atatatacta tag acagtag at actittata tactatagac agtatatact atatact.gta 18O tacactatag acagtatata ctatatactg tatacagtat atgtagt gta tatgtag tdt 24 O atataatata tagtatatat tat ctatact atatacagta tatatagtgt atacataata 3OO tat attatat attatatata ctatatacag tatacatagt gtatatgtag togtataatat 360 atataatgtg tatataaaat atatatacta tatataatat at attatata taatatatac 42O actatatata ctatagatac actatatatt cactatatat actatatata ctatata 477

<210> SEQ ID NO 78
<211> LENGTH: 331
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(331)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 8580024..8580354

<4 OO > SEQUENCE: 78 actatatgtt atatacataa gatatagitat ataccatata ttatatacat tatatatagt 6 O gtatactata tataatgitat ataatatata gtatatatac actatatata citatgtatat 12 O atacactata tatactatgt atatatacac tatatatact atgtatatat acactatata 18O tactatgt at atatacacta tatatact at gtatatatac actatatata citatgtatat 24 O atacactata tatactatgt atatatacac tatatatact atgtatatat agtgtatata 3OO tact.gtatat gttatagtgt atatatagta t 331

<210> SEQ ID NO 79
<211> LENGTH: 410
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(410)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 8580705..8581114

<4 OO > SEQUENCE: 79 tatagt citat attatataca gttctatataa tatatagitat atactatata tacttitt cot 6 O cattctgact atatactata tatatact at atatagtata totagtgitat atatacacca 12 O tatatact at atatagtata cataccatat at agtatact atacatacca tatatagitat 18O acat accqta tatagtatac tat acttaccatatatagta tacatactat atataatata 24 O tctggtgitat atatacacta tatatact at atatactata tatagtatat gtacactatt 3OO tatag tattt at agtatata tactgtatat at agtatgta gtatatatac tatat attat 360 gtag actata tataatatag actatgtgta gagtatatat actatatata 41 O

<210> SEQ ID NO 80
<211> LENGTH: 433
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(433)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 12979167..12979599

<4 OOs, SEQUENCE: 80 atatataata tatatatgtc ctatatataa aatatat cat atatataaat atatatgata 6 O tattittatat attaalatata taattatata taalatatata tittatatata aatat attat 12 O ttcaatatat ataaatatat ttaaatatat ttaaatagaa tattaaatat ataaatatat 18O aattatattt aatatatalaa tatat attaa atatataatt at atttaata tatataaata 24 O tat attaaat atataattat at atttatat atttattata tataaatata tatttgttct 3OO aaataaatat a tattotalaa tatataat at tittat attat attaatatata atataaaata 360 tataataaat atataatata taaataaata aatatttatt ataaaataca tataaatatt 42O aaatatatat taa 433

<210> SEQ ID NO 81
<211> LENGTH: 385
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(385)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 16336644..16337028

<4 OOs, SEQUENCE: 81 tittatatalaa tat Ctatata aataaatata taalatatata aatataaata tatataaata 6 O tataaataaa tatataaata tatataaata taalatatata tataactatgaatttatatt 12 O tatataaata tatat citata tdaatataaa tatatattta tataaatata aatatatata 18O taaatatata tatttatata gatataaata tatatataaa tatatatatt tatatagata 24 O taaatatata t ctatatatgaatatatat c tataggaata taaatatata t ctatataaa 3OO tataaatata tataagtata aatatatata aatatatat c tatataaata taaatatata 360 tataaatata aatatatata taaat 385

<210> SEQ ID NO 82
<211> LENGTH: 363
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(363)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 20624448..20624810

<400> SEQUENCE: 82
tatatatata gttatatata tatttatata tatagittata tatatattitt tatatagitta 60
tatatatagt tatatatata gttatatata tatagittata tatatagitta tatatatagt 120
tatatatata tagttatata tatagittata tatatagitta tatatatagt tatatatata 180
tagittatata tatagittata tatatatagt tatatatata gttatatata tatagittata 240
tatatagitta tatatatagt tatatatata gttatatata tagttatata tatatagitta 300
tatatatata gttatatata tatagittata tatatagitta tatatatata gttatatata 360
tag 363

<210> SEQ ID NO 83
<211> LENGTH: 310
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(310)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 566025..566334

<4 OOs, SEQUENCE: 83 tatatataat atatattgta tat attatat attgtatata taatatatat td tatatatt 6 O atatattgta tatataatat at attgtata tattatatat totatatata atatatatat 12 O tgtatatatt atatattgta tatataatat atatattgta tat attatat attgtatata 18O taatatatat attgtatata ttatatattg tatatataat atatatattg tatat attat 24 O at attgtata tataatatat at attgtata tattatatat agtatatatt atatatagta 3OO tatataat at 31 O

<210> SEQ ID NO 84
<211> LENGTH: 1236
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(1236)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 1171429..1172664

<400> SEQUENCE: 84
aaagtatt at atgtattata totatatgta ttatatatta catatgtatt atatataata 60
tat attatat attattatat attatatatt at at attatt atttatataa tdt attatat 120
attatatagt atatatagta tatataatgt attatatatt atatagtata tatagtatat 180
ataatgtatt atatatagta tatataatgt attatatagt atatatact a tataatgitat 240
tacatatt at gtatagtata totaatgitat tatat attat at agtatatg taatgitatta 300
tatgtatt at at agtatata ttatatatga tigt attattt agtatatata atatatatga 360
tg tattatat aacatatata atatatatga tigt attatat agcatgtata gtatatatga 420
tg tattatat agcatgtata gtatatatga tigt attatat atago atgta tagtatatat 480
gatgtatt at atatagcatg tatagtatat atgatgtatt atatatagoa totatagitat 540
atatgatgta ttatatatag catgtatagt atatatgatg tattatatat agcatgtata 600
gtatatatga tigt attatat at agcatgta tagtatatat gatgt attat atatagoatg 660
tatagtatat atgatgtatt atatatagca totatagitat atatgatgta ttatatatag 720
catgtatagt atatatgatg tattatatat agcatgtata gtatatatga tigt attatat 780
attatatatg gtatatatga tigt attatat attatatatg gtatatatga tigt attatat 840
attatatatg gtatatatga tigt attatat attatatata atatatatga tigt attatat 900
attatatata atatatatga tigt attatat atgatgitatt atatataata tatatgatgt 960
attatatata ttattatcta ttatatacga tigt attatat gcaagttatt atgtataata 1020
tataatgitat tatatatt at ataatgtata atatataaat atataaatat ataattatgt 1080
ataaatatag aaatatatac attatacatt atata catta taatgtataa tatataaata 1140
tattatatat aaatgtatac attatatata aatat attat atacattata tataaaatat 1200
gtatatagitt attatacctt atatatacta aac agt 1236

<210> SEQ ID NO 85
<211> LENGTH: 309
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(309)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 1925173..1925481

<4 OOs, SEQUENCE: 85 atatatttat ataaatatat tittatataaa tatat attat ataat attat aatatatgtt 6 O at attatata tattittatac alatatataat at at attata tatattittat acaatatata 12 O atatat atta tatatattitt atataatata taatatatat tatatatatt ttatacalata 18O tataatatat attatatatt atataatata tattatatat attittatata atatataata 24 O tatattittat acaatatata atgtatat cattatattata taatgtatat cat attatat 3OO aatgtatat 309

<210> SEQ ID NO 86
<211> LENGTH: 312
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(312)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 4396756..4397067

<4 OOs, SEQUENCE: 86 cacagtgitat atatagtata tatactgt at atatact.gtg tatatacact gtatatacac 6 O agtgtatata cagtatatat actatatata cactgtgitat atatagtata tataa attct 12 O aggaatatat atactatata tatactatat atataaattic taggaatata tacacactat 18O atata Cacta tatatacaca tatatacact atatatatta tacacatata ttatatatat 24 O acactatata tacacgagat atata acata tacactatat actatacata acatatatac 3OO tatatatact at 312

<210> SEQ ID NO 87
<211> LENGTH: 398
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(398)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 56057..56454

<4 OO > SEQUENCE: 87 atatat atta Cat attatat atataatata tattatataa tatat attat attatataat 6 O atataatata aatataatat aaattatatt atataatata taatataaat attaatataaa. 12 O ttatataaat attaatatata ttitt attata taatataata tat attatat aaatataata 18O tatalaattat attaatataat at at attata taatataata tattt tatta tataaatata 24 O tatt at atta tataatatat attittatt at ataatatata ttatatattt atagaatata 3OO atatatattt tattatataa tatatatt at attaatatata ttatatttat attatalacata 360 tatt attata taaaatatgt ataatatata ttatataa 398

<210> SEQ ID NO 88
<211> LENGTH: 391
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(391)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 56984..57374

<4 OOs, SEQUENCE: 88 tactataata Cat attatat attaat attat a tactatata t tactatatt attat attat 6 O atataattaa actatatt at agtatataat atataatata tactatatgt aat attact a 12 O tgat actgat attatatt at atata attaa attat attat attaatatat aaattatata 18O taatacataa tatataaatt at attatatt atttatatat aatgitatgcc atata attta 24 O tatataatgc attatatata atttatatat aatgcattaa atataaatta tatataatgc 3OO attatatata attatatata atgcattata tataattitat atttaatata taaatttata 360 tittaatatat ttatat atta tatataataa a 391

<210> SEQ ID NO 89
<211> LENGTH: 309
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(309)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 469547..469855

<4 OOs, SEQUENCE: 89 atatatatgt aatatatatgttatatatgt aatatatatgttatgttata tatgttatat 6 O atatgttata tataatatat atgttatata tacgittatat gttatatata tottatatat 12 O aatatatgtt atatatacgt tatatgttat atatgttata tataatatat gttatatata 18O atatatgtta tatatgttat atataatata tdttatatat attatatata atatatgtta 24 O tatatatt at atataatata taatatatgt gatatataat ataaaatata tdtgatatat 3OO attatatat 309

<210> SEQ ID NO 90
<211> LENGTH: 441
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(441)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 546190..546630

<4 OOs, SEQUENCE: 90 atacacaa.ca tatgtgtata tatatagitat atatacacaa catatgtgta tatatatagt 6 O atatatacac aatatatgtg tatatatata gtatatatac acaatatatg td tatatata 12 O gtataaatat atactatata tagtatatat agtataaata tatactatat at agtatata 18O catagtataa atatatacta tatatagitat atacatagta taaatatata ctatatatag 24 O tatata cata gtataaatat atactatata tagtatatac at agtataaa tatatactat 3OO atatagtata tacatagitat aaatatatac tatatatagt atata catag tataaatata 360 tactatatat agtatataca tagtataaat atatactata tatagtatat a catagtata 42O aatatatact atatatagta t 441

<210> SEQ ID NO 91
<211> LENGTH: 1367
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(1367)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 124643..126009

<4 OOs, SEQUENCE: 91 at atttatat gatatataat atatataata ttatatataa tattatatat gatatataac 6 O attatataat attatatatg atatatatta tatat attat atatgatata taatatatat 12 O aat attatat atgatatt at at at catata taatatataa aat attatat atgatatata 18O atatatataa tattatatat attatatata ttatatatoa tatataatat totaalatata 24 O taat attata tdatatataa gattatatac attatatata atatataata ttatatatga 3OO tatataat at tatata catt atatataata tataatgitat ataat attat at attatata 360 tittatatt at atacaatgta tataat atta tatat catat at atttatat tatatacaat 42O gtatataata ttatatat ca tatataat at tatatacaat gtatataata tat attatat 48O at atttatat tatatacaat gtatataata tat attatat at atttatat tatatacaat 54 O gtatataata tat attatat at atttatat tatatacaat gtatacaata ttatatatta 6OO tat attatat attitat atta tatacaatgt at at attata tattatatat titat attata 660 tacaatgitat at attatata ttatatattt at attatata caatgtatat attatatatt 72 O atatatttat attatataca atgtatatat tatat attat at atttatat tatatacaat 78O gtatat atta tat attatat atttatatta tatataatgt atgtaatatt atatattata 84 O tatttatatt atatataatg tatgtaat at tatat attat at atttatat tatatataat 9 OO gtatgtaata ttatat atta tatatttata ttatatataa totatgtaat attatatatt 96.O atatatttat attatatata atgitatgtaa tattatatat tatatattta tattatatat 1 O2O aatgitatgta at attatata ttatatattt at attatata taatgitatgt aat attatat 108 O attatatatt tat attatat ataatgitatg taat attata tattatatat titat attata 114 O tataatgitat gtaatatt at at attatata tittat attat atataatgta totaat atta 12 OO tat attatat attitat atta tatataatgt atataatatt atatattata tatttatatt 126 O gtatataata ttatat atta tatatttata ttgtatataa tatat attat at atttatat 132O tgtatataat attatatatt atatatttat attatatata atgtata 1367

<210> SEQ ID NO 92
<211> LENGTH: 458
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(458)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 58908..59365

<4 OOs, SEQUENCE: 92 tatatgatat atatgatata tatgggatat atatgatata tatgatatat atggtatata 6 O tatgatatat agtatatatg atatatatgg tatatatatg atatatagta tatatgatat 12 O atatggtata tatgatatat agtatatatg atatatatgg tatatatggit atatatatga 18O tatatgatat atatgatata tatgatatat gatatatatg atatatatga tatatatggit 24 O atatatgata tatatggitat atatggtata tatatgatat atatgatata tatggtatat 3OO atatgatata tatgatatat atggtatata tatgatatat atgatatata tdgtatatat 360 US 8,252,917 B2 127 128 - Continued atgatatata tdatatatat gigtatatata tdatatatat gatatatat catatatatgg 42O tatatatatg atatatatga tatatat cat atatatgg 458

<210> SEQ ID NO 93
<211> LENGTH: 330
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(330)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 306867..307196

<4 OOs, SEQUENCE: 93 ataatatata aatatatatg atatatat ct atatatat ca tatataaata tatatgatat 6 O atat citat at at at catata taaatatata tdatatataa atatatatga tatatat cita 12 O tatatat cat atataaatat atatgatata taalatatata t datatatat citatatatat 18O catatataaa tatatatgat atatat citat at at catata taaatatata tdatatatat 24 O ctatatatat catatataaa tatatatgat atctatotat atatat cata tataaatata 3OO tatgatat ct atctatatat at catatata 33 O

<210> SEQ ID NO 94
<211> LENGTH: 353
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(353)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 636899..637251

<4 OOs, SEQUENCE: 94 tatgtataca tatacacata tacgtatata tatacatata tacacatata cqtatatata 6 O tacgtataca tacatatgta tatgtatacg tatacacaca tatgtatatg tatacgtata 12 O cacacatata cqtatatatg tatacgtata cacacatata cqtatatgta tacatatata 18O tgttgtacata tacgtatata cqtatatgta tacatatata cqtttatgta tatatacgta 24 O tatacgtata tatgtatatg tatacatata tacatatatg togtatatacg tatatacgta 3OO tatgtgtata tatacaat at a catacatgc acatatatgt gtatatgcac ata 353

<210> SEQ ID NO 95
<211> LENGTH: 345
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(345)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 1435510..1435854

<4 OO > SEQUENCE: 95 at catatata ttatatatica tatatatgat atataaaaat tatatat cat atatatgata 6 O tataaaaatt atatatat ca tatataatat atataatata ttatatatat aaattatata 12 O taatatatat aaattatata tat catatat atgatatata atttatatat catatatatg 18O atatatataa tatatt attt atatataata tattatatat tatataatat gtaatatata 24 O ttatat atta Cat attatat tatttatalaa taatlattitta taatatatat aat attatat 3OO aatatagaat attatatatt at at attaca tattatataa tatat 345 US 8,252,917 B2 129 130 - Continued

<210> SEQ ID NO 96
<211> LENGTH: 521
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(521)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 39695..40215

<4 OOs, SEQUENCE: 96 tatatatata atagat atta tatatictatt atatat citat tatatatata atagatatta 6 O tatat c tatt atatatataa tagat attat at atc tatta tatatataat agatattata 12 O tatictatt at atataatata tat Ct attat at attatata totattatat attaatatata 18O tct attatat at attatata t ct attatat atataataga tattatatat c tattatata 24 O taatatatat ct attatata ttatatat ct attatatata totatictatt atatatatta 3OO tgitatic tatt atatataata tatatictatt atatatatat tatatataat at at attata 360 tat attatat atct attata tataatatat atct attata tat attatat atctattata 42O tat attatat atct attata tataatatat atct attata tat attatat atctattata 48O tataatatat attatatata tattatatat tdtataticta t 521

<210> SEQ ID NO 97
<211> LENGTH: 484
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(484)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 1286007..1286490

<4 OO > SEQUENCE: 97 atat catata tattatatat catatatatg atatataaaa attatatat catatatatga 6 O tatatatalaa ttatatatat catatataat atatataata tattatatat attalaattata 12 O tataat atta tatataaatt atatat caca tatatgacat ataaattata tat cacatat 18O atgatatata atttatatat cacatatatg atatataatt tatatat cat atatatgata 24 O tataattitat at at catata tatgatatat aatttatata t catatatat gatatatata 3OO at at attatt tatatataat at attatata ttatataata totaatatat attatatatt 360 atataatatg taatatatat tatat attac at attatatt atttataaat aatatttitat 42O aatatatata at attatata atatagaata ttatatatta tat attacat attatataat 48O at at 484

<210> SEQ ID NO 98
<211> LENGTH: 244
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(244)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 73556..73879

<400> SEQUENCE: 98
attatatatt at attatata atatataata at attatata attatatatt acattatata 60
atatataata at attatata attaatatata attatataat atataataat attatataat 120
attatataat attatataat atataaatat ataataatat at attatatt atataatagt 180
at at attata ttatataata tatgttatta tattatataa tataaactat tatataatat 240
aata 244

<210> SEQ ID NO 99
<211> LENGTH: 463
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(463)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 179038..179500

<4 OOs, SEQUENCE: 99 tacaat at at tittctatt at atatattittg tattatatat aatatacaat at attittcta 6 O ttatatataa tatattttgt attatatata ttacaatata ttttgtatta tataatatat 12 O aatacaat at aatatattgt attatataat atataatact atataatata ttg tattata 18O tattatatat aatactatat aatatattitt attatatatt atatataata Citatataata 24 O tattitt atta tat attatat ataatacaat atataatata ttg tattata atacaatgta 3OO ttataatgta ttatatataa tatataatac aatatataat attatatata tittatatata 360 tatatattitt g tattatata ttttgtatta tatatattitt gtattatata tittatattitt 42O at attata at tatgttittgc attatatatt toatattata tat 463

<210> SEQ ID NO 100
<211> LENGTH: 390
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(390)
<223> OTHER INFORMATION: MAR of chromosome 1 genomic contig; 55617..56006

<400> SEQUENCE: 100
tgtataatat atatactitta tatataatat atatactitta tatatatact atatactaat 60
atatataata tatactatat attaatatata ctaatatata taatatatac actatatata 120
atatatacta atatat atta tatatactitt atataatata tactaatata tataatatat 180
at actittata tataatatat actaatatat ataatgtata tactittatat ataatatata 240
ctaatatata atatatatac tittatatata atatatacta atatatatta tatatactitt 300
atatatataa tatatactta tat attatat atgcttatat ataatatata cactaatata 360
taatatatat actittatata ttatattitta 390

<210> SEQ ID NO 101
<211> LENGTH: 582
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(582)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 1157405..1157986

<400> SEQUENCE: 101 tgtatatgta tatatacaca tacgcacata tatgtatatg tatatataca catacgcaca 6 O tatatgtata totatatata cacatacgca catatatgta tatgtatatg tatatgtata 12 O tatacacata tacacatata totatatgta tatatacaca tatacacata tatgtatatg 18O tatatataca catatacaca tatatgtata totatatata cacatacaca tatatgtata 24 O tgtatatgta tatatacaca tacacatata totatatgta tatgtatata tacacatata 3OO cacatatata catatatgta tacatatatgtgtatatata tacacatata tatacatata 360 tgtata cata tatgtgtata tatacacata tatatacata tatacatata catatatatg 42O tgitatgtata tatacacata tacatatata totatatgtg tatatatatt agacagatat 48O atatgtacat atacatatat atgtatatgt atatgtatat gtatatgtat atgtatatgt 54 O atatgcatat ataatataca tatacatata totatatgta ta 582

<210> SEQ ID NO 102
<211> LENGTH: 322
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(322)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 1858638..1858959

<4 OOs, SEQUENCE: 102 acac catata tacaccatat atata Catac Catatatata CCatatatat acataccata 6 O tatataccat atatataCat accatatata Caccatatat atacatacca tatatataca 12 O

CCatatatat acataccata tatataccat atatataCat accatatata taccatatat 18O atacatacca tatatataca CCatatatat acataccata tatatacaCC atatatatac 24 O attaccatata tataccatat atacaccata tatatacaCC atatatacac accatatata 3OO

CCatatatat acaccatata ta. 322

<210> SEQ ID NO 103
<211> LENGTH: 914
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(914)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 5712196..5713109

<400> SEQUENCE: 103 aaatatatat tctatatata gaaaatatat attctatata tatagaatat atatagaata 6 O tatatt citat atatatt cita tatatataga atatatatat aaaacatata ttctatatat 12 O aaaatatata ttctatatat ataaaatata tatt ctatat atatagaatg tatataaaat 18O atatatt cita tatatataga atgtatataa aatatatatt citatatatat agaatgtata 24 O taaaatatat attctatata tatagaatgt atataaaata tatatt citat atatatagaa 3OO tatatataac atatatatga aatatatata aaatatatat aaatacatat ttctatatat 360 aaatatatat aaataCat at ttctatatat aaatatatat Caatacatat ttctatatat 42O aaatatatat aaatatatat it catatatat aaaaatatat aaatatata t t catatatat 48O aaaatatata tdaatatata ttct citat at ataaaatata tataatatat attatatata 54 O taaaatatat attaatatata ttatatatat aaaatatata taatatata t t catatatat 6OO aaattatata taalatatata ttcatatata taatatatat aaatatttat ttcatatata 660 aaatatattt aaatatatat ttctatatag aatatatatt citatatataa aatatatata 72 O taaatatatt ttctatatag aaatatatat gaaatatata gaatatatat aaatatatat 78O tatatatact atatatacaa tatatatt at atataaaata tatatacaat attatatt Cta 84 O tat attaata tatagaatat at attaa.cat at atttcaat at attaatat atgaaatata 9 OO tataaatatt to at 914

<210> SEQ ID NO 104
<211> LENGTH: 370
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(370)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 5713613..5713982

<4 OOs, SEQUENCE: 104 tatt toatat attaatatata tataaaatat a tatt toata taCatalatat attataatata 6 O aataaaatat a tattt cata tatataatat atataatata tataaaa.cat a tatt toata 12 O tataatatat attaalactata tattt Catat attaatatata taalactatat attt Catata 18O

Catalatatat attaatatata titt Cattitat attatatata taatatatat ttcatatata 24 O taatatataa aatagatata aatatatata aatatatatt toatatataa tatatataaa 3OO atatat atta atatatattt tatatataat atatatattt Catatataaa tataaaaaaa. 360 tatatatt to 37 O

<210> SEQ ID NO 105
<211> LENGTH: 442
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(442)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 7481647..7482088

<4 OOs, SEQUENCE: 105 atataaatta tataatatgt tatataatat ataaatatat tatataa.cat gttatataat 6 O atataa catgttatataata tataa catgt tatataatat ataacatgtt atataatata 12 O taacatgtta tataatatat tatgtaatat gttatataat atataatata ttatataa.ca 18O tgttatataa tatata acat gttatataat atgttatata atatataaat at attatatt 24 O atatgttata taatatataa at at attata ttatatgtta tataatatat aaatatatta 3OO tattatatgt tatataatat ataaatatat tatattgtat gttatataat atataaatat 360 attatattgt atgttatata atatataaat at attatatt gtatgttata taatatataa 42O at at attata ttatatatgt ta 442

<210> SEQ ID NO 106
<211> LENGTH: 338
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(338)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 9594557..9594894

<400> SEQUENCE: 106 tatataaata tataccatat atataaatat atatatt CCa tatataaata tatatattoc 6 O atatatataa atatatatat it coatatata aatatatata titcCatatat attaaatatat 12 O atataaatat atatatto.ca tatatatalaa tatatatata aatatatata ttcatatata 18O aatatatata tattocatat ataaaaatat atatatatto Catatataala aatatatata 24 O tatt Coatat atataaatat atatatatto Catatatata aatatatata tatto Catat 3OO atataaatat atatatatto Catatatata aatatata 338

<210> SEQ ID NO 107
<211> LENGTH: 364
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(364)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 10519720..10520083

<4 OOs, SEQUENCE: 107 ttatatatat ttataataat atatataagc tatatatatt tatatataat at attatata 6 O tattagctat atatattitat ataataatat attatatatt agctatatat atttatatat 12 O aataatatat ataagctata tatttatata tattatatat tagctatata tatttatata 18O taatat atta tat attagct atatatttat atataataaa taatatatat attagctata 24 O tatatttata tataataata tatataagct atatattitat atataatata ttatatatta 3OO gctatatata tittatatata ataatatatt at at attagc tatatatatt tatatataat 360 at at 364

<210> SEQ ID NO 108
<211> LENGTH: 342
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(342)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 11481943..11482284

<4 OOs, SEQUENCE: 108 taCatataat atataattat atataatata tattatatat taCatatata atatatatat 6 O tacatatgta atatatatat tatatatgta atatatatta tatatgtaat atatatatta 12 O tatatgtaat at at attata tatatgtaat atatatatta tatatgtaat atatatatta 18O tatgtaatat atatatgtaa tatatatata atatatatgt aatatatata taatatatat 24 O gtaatatata tataatatat atgtaatata tat attatat atatgtaata tatat catat 3OO atatgtaata tatat catat atatgtaata tatat catat at 342

<210> SEQ ID NO 109
<211> LENGTH: 415
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(415)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 13499598..13500012

<400> SEQUENCE: 109 tatatatata tatatatata atataatata atatatatat aaatatatat aatatalaatt 6 O tatatatata tatttatata taCatatata aatatatatt tatatttata tataaatata 12 O tataaatata tataaatata tatttatata tacatatata aatatatatgttcatataaa 18O tatatatgta tatata cata tataaatata tattatatat gtatatatat aatataatat 24 O ataataataa tataatatat attatatalaa tataatatat tatatataat attatataata 3OO tataatatat aatatataat atataatata tattatatat tatataatat attaaaatata 360 tattatataa tatatataca taatatatat aaataaatat atataaagat ataaa 415

<210> SEQ ID NO 110
<211> LENGTH: 330
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(330)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 16370976..16371305

<4 OOs, SEQUENCE: 110 catttacata totatgtata agtatgtata ttacatactt atacatacat acttataaat 6 O atataagtat aatacataca tacttataaa tatataagta taatacatac at acttatac 12 O atatataagt ataatacata catacttata catatataag tataatacat a catactitat 18O acatatataa gtataataca tacatactta tacatataag tataatacat a catactitat 24 O acatatataa gtataataca tacatactta tacatatata agtataatac atact tatta 3OO catatgtata taagtatatt acatactitat 33 O

<210> SEQ ID NO 111
<211> LENGTH: 702
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(702)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 626641..627342

<4 OOs, SEQUENCE: 111 tatatataca Catata cata tataatatat atacatatac atatatatta tatataCata 6 O tat attacat at atcatata taCatatata ttatatatac atatatatta tatatat Cat 12 O atataCatat at at attata tattatatat at Catatata catatatatt attatat atta 18O tatatatoat atataCatat at attatata tattatatat acatatatat tatatatat C 24 O atataaaCat at at attata tatat catat atacatatat attatatata ttatatatat 3OO

Catatataca tatatatt at atatatoata tataatatat attatatata ttatatataa. 360 tatatatt at atataCatat at attatata taCatatata ttatatatac attatat atta 42O tatata cata tat attatat atacatatat attatatata taCatatata ttatatatac 48O atatat atta tatata cata tattatatat acatatatat tatatataca tat attatat 54 O atataCatat at attatata taCatatatt atatatatac atatatatta tatataCata 6OO tattatatat atacatatat attatatata catat attat atata catat at attatata 660 taCatatata ttittatatat atataatata tattittatat at 7 O2

<210> SEQ ID NO 112
<211> LENGTH: 679
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(679)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 3196047..3196725

<400> SEQUENCE: 112 at at attata tattoatata t catalaat at at at attata tattoatata ttatatatot 6 O at at attitat a tattoatat attatatato tatatattta tatattoata tattatatat 12 O ctatttatat att catatat tatatat cita tatattittat atatt cqtat attatatatic 18O tatatatt at at attcgitat attatatat c tatat attat g tatt catat at atctatat 24 O attatatata t t catatata ttataaatta tatt catata gtatatat ct attataaatg 3OO tatatt cata tagtatatat citatatatta taalatataca tat attatat atttatatat 360 tatatatt.ca tatagatcta tat attatat atatt catat atgaatatat at attatatg 42O tatatatatt ataaatatat ttatatagta tagat attat at agtatatg catatttata 48O ttataaataa tttacatagt atatgtatat ttataaatta tatatattta cat attacat 54 O gtatatttat at attataaa tacat attta cat attataa atatattitat at attatgaa 6OO tataattitat at attattac at atttacat atatgcatag titatatatta taaatatgca 660 tittatgtaaa tatatattt 6.79

<210> SEQ ID NO 113
<211> LENGTH: 728
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(728)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 3196778..3197505

<4 OOs, SEQUENCE: 113 tacataaata tatatttaca atatgtaaat atctgatatg taaatatgta tittataatat 6 O ataaatatac atataatatg taaatatata aatatacata tactatotaa atatatgtta 12 O tatata cata tactatataa atatagaata tataaatata catatactat ataaatatgt 18O aatatataaa tatatact at ataaatatac atatactata taaatgtatt tataatatat 24 O aaatatacat atactatata aatt catata tdaatatata atatataaat atatataata 3OO tatgaatata tact catata taaatatata tdaatatata tittataatat atagatataa 360 tatgaatata tatttataat atatagatat at attatatgaatatatatt tataatatat 42O agatatatac catatgaata tat attatac actatatgaa tatatattta taatatataa 48O atagatatat actatatgaa tatataatat atatacticta togaatatata atatatatac 54 O tatatgaata tattatatac td tatgaata tataatatat agatgtatac tatatgaata 6OO tataatatat agatatatat actatatgaa tatatataat atatagatat atactatatg 660 aatatatatg atatatagat atatactata tdaatatata atatatagat atatattitat 72 O gatatatg 728

<210> SEQ ID NO 114
<211> LENGTH: 413
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(413)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 2560638..2561050

<400> SEQUENCE: 114 atataaatat a tatttatat attittatata aatatatata tittatatatt tttatataaa. 6 O tatatatatt tatatatatt tatataaata tatatattta tatatattta tataaatata 12 O taalatatata tatttatata aatatatalaa atatataaat a tatttatat aaatatataa. 18O aatatatalaa tatatttata taalatatata aaatatataa atatatttat attataaatat 24 O ataaaatata taalatatott tatatatalaa tatataaaat atataaatat Ctttatatat 3OO aaatatataa aatatatalaa tatatttata tataaatata taaaatatat aaatatattt 360 atatacaaat atataaaata tataaatata tittatatata aatatataala ata 413

<210> SEQ ID NO 115
<211> LENGTH: 361
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(361)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 4965309..4965669

<4 OOs, SEQUENCE: 115 tatacgtata tatacatata tatacgtata tatatacata tatatacgta tatatacata 6 O tgtatatatgtctgtacatg tatatatata catatgtaca tatatatgta cacatatata 12 O tata catata tatgtacaca tatacatata tatgtacaca tatacatata catatatatg 18O tacacatata tatacatata tatgtacaca catatatata catatatatg tacacacata 24 O tatacgtata tatgtacaca catatatacg tatatatatg tacacacata tatacgtata 3OO tatatgtaca cacatatata tacgtatata tatgtacaca tatatatata cqtatatata 360 t 361

<210> SEQ ID NO 116
<211> LENGTH: 325
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(325)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 5258150..5258474

<4 OOs, SEQUENCE: 116 tacacacaca tatacatata tacatatata cqtgtatacg tatacgtata tacgtatata 6 O tacatatatg tatacgtata cqtatatacg tatatataca tatatgtata cqtatacgta 12 O tatacgtata tatacatata totatacgta tacgtatata cqtatatata catatatgta 18O tacgtatacg tatatacgta tatatacata catatgtata cqtatacgta tatatgtata 24 O tatacgtata totatacgta tacatatata cqtatatata cqtatatgta tatgtatata 3OO cgtatatgta tatatgtaca tatac 3.25

<210> SEQ ID NO 117
<211> LENGTH: 1508
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(1508)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 6057499..6059006

<400> SEQUENCE: 117 atataatata tatalaattat attaatatata aaaattaata tataatatat attalaattata 6 O taatatataa attaattata taatatatat aaattatata atatataaati taattatata 12 O atatatataa attatataat acatataaat taattatata atatataaat tatataatat 18O atacaa atta tatact at at taattatata ttatataatti aattatataa tatatataaa. 24 O ttatat atta ttalaattaat tatataatat atalaattata taatatataa attaattata 3OO taatatataa attatataat atatalaatta attatataat atatalaatta tataatatat 360 aaattaattig tataatatat aaattaatta tataatatat aatatataat taataaataa 42O ttatat atta attatata at taataaataa attaataaata tatataatta atatataata 48O tacat catat at at cacata tagattatat aatagittata tattatataa taaattatat 54 O ataatatata ataaac at at ataacatatgttatatatta cataatatag tataatatat 6OO aacatatgtt at at attaca taatatagta taatatataa catgttatat attacataat 660 at agtataat atataa cata tottatatat tacataatat agtataatat ataacatatg 72 O ttatat atta cataatatag tataatatat aacatatgtt atatattaca taatatagta 78O taatatataa catatgttat at attacata atatagtata atatataa.ca tatgttatat 84 O attacataat at agtataat atata acata tdttatatat tacataatat agtataatat 9 OO ataa catatgttatat atta cataatatag tataatatat aacatatgtt at at attaca 96.O taatatagta taatatataa catatgttat at attacata atatagtata atatataa.ca O2O tatgttatat attacataat at agtataat atata acata tottatatat tacataatat O8O agtataatat ataacatatgttatatatta cataatatag tataatatat aacatatgtt 14 O at at attaca taatatagta taatatataa catgttatat attacataat at agtataat 2OO atataa cata togctatatat tacataatat agtataatat atatgttata tattacataa 26 O tatagtataa tatataacat atgttatata ttacatatta tagtataata tatatgttat 32O at attatata atatagtata atatataatg tatgttatat attatataat at agtataat 38O atataa catgttatat atta tataatatag tataatatat atgttatata ttatataata 44 O tagtataata tataatatat gttatatatt atataatata gtataatata tatgttatat SOO attatata 508

<210> SEQ ID NO 118
<211> LENGTH: 415
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(415)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 7996866..7997280

<4 OOs, SEQUENCE: 118 caattatata atatacat at tatataattig tataaattat acaat catat aattatatta 6 O tatataatat acatataata taattatata taattatata attittataat attaattatat 12 O ataattatat aattatatat aatatatatt attaattatat atataatata tat attatat 18O at attatata taatatataa attaatatata taatatatat attaattatat attaataatat 24 O atgtaatata tataatatat atataatata ttatttataa ttatatatta tatatatatt 3OO ataatatata taattatalaa taatatatat tataatatat attaataatat attatata att 360 atatataata atatat atta taattatata taataatata tataattitat attaat 415

<210> SEQ ID NO 119
<211> LENGTH: 526
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(526)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 8300930..8301455

<4 OOs, SEQUENCE: 119 tatat catat gatatatt at acaatatat catataatatg atatattata tdatatattg 6 O tacaatatat catatgat at atgatatatt atacaatata t catataagg tatat attat 12 O at catatata atatataata taatatatga tataatatat gatatatgat atataatata 18O tgatatatga tatatgat at ataatatatg atatatgata tatgatatat aatatatgat 24 O atatgatata tdatatataa tatatgatat atgatatatg atatgatata tdatatatga 3OO tataatatat gatataatat atgatatata ttatatgata tataatatat gatataattit 360 atatgatata taatatatga tatataatat ataatatatg atatgatata tattatat ca 42O tatataatat ataatataat atatgatata tattatatat ttittatacat tatatatata 48O aactatataa caatataa.ca tattatgtgt ataatatata ttacat 526

<210> SEQ ID NO 120
<211> LENGTH: 402
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(402)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 8576553..8576954

<4 OOs, SEQUENCE: 120 atgitat atta tatacaat at agtatat cat atatagtata tattatatag taatgitatta 6 O tatataatgt ataatgtata aatatataat atatactaca tactatact a ttatatatac 12 O tatatatt at atatgataca tatactatat aatatgctat at attatact atataatatg 18O citat at atta tactatataa tatgctatat attatact at ataatatgct at at attata 24 O ctatataata togctatatat tatactatat aatatact at ataatatgct at at attata 3OO

Ctatataata tactatatat tatactatat aatatactat atalacatact at at attata 360 tatgatacat atactatatt acatatataa tatatatata ta 4 O2

<210> SEQ ID NO 121
<211> LENGTH: 477
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(477)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 8785649..8786125

<400> SEQUENCE: 121 tatttatata tatatttata tatatattta tatatattta tatatatatt tatatatata 6 O tittatatata tatttatata tittatatata tatatttitta tatatttata tatatattta 12 O tatatttata tatatttata tittatatata tatttatata tatttatata tatttatata 18O tatatattta tatatattta tatatatata tittatatata tittatatata tittatatata 24 O tatttatata tatatttata tatatatt Ca tatatattta tatatatatt Catatatatt 3OO tatatatata ttcatatata tittatatata tatttatata tatatttata tatatttata 360 tatatttata tatatattta tatatatatt tatatatata tatttatata tatatttata 42O tatatatatt tatatatata tittatatata tatatttata tatatattta tatatat 477

<210> SEQ ID NO 122
<211> LENGTH: 773
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(773)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 10064737..10065509

<4 OOs, SEQUENCE: 122 at attatata tattacatat at attatatt gtatataata tatat attat attgtatata 6 O atatatatat tatattgt at ataatatata tattatattg tatataatat at at attata 12 O ttgtatataa tat attatat tdtatatatt at attgtata tattatattg tatacaatat 18O at attatatt gtatacaata tat attatat tdtatataat at attatatt gtatataata 24 O tatt at attg tatatatt at attgtatata at at attata ttgtatataa tat attatat 3OO tgtatatatt at attgtata taatatatta tatgtatata atatagtgta tactatatta 360 tataatatat attatataca atatataata tattgtatat catatatgat at attgtata 42O taatatataa tatatgat at attgtatata at at attata tatgatatat tdt at attat 48O at attatata tdatatattg tat attatat attatatatt gtatattgta tattatatat 54 O tatatattgt atataatatgttatatattg tatataatat gttatatatt atatattgta 6OO tatatgttat at attatgta ttgtatataa tatgttatat attatatatt gtatataatg 660 tattatatat tatatatatt atatattgta tataatgitat tatatattgt at attatata 72 O ttatatattg tatataatat attatataca ttatattata tattatatat tdt 773

<210> SEQ ID NO 123
<211> LENGTH: 1554
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(1554)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 1039775..1041328

<400> SEQUENCE: 123 ataatatatt aaatgtatat ataatatatt aaatataaat at atttataa tatataaata 6 O tittatatalaa tataaaatat at attaaata taalatatata taaaatatat attaalatata 12 O taaaatataa atatat atta aatatatatt aaatatataa aatataaata tat attalaat 18O at attittaala tatataaaat ataaatatat attaalatata ttittaalatat attaalatata 24 O aatatatatt aaatatattt taalatatatt aaatataaat acatatatta aatatatatt 3OO atatatataa aatatataaa atataaatat at attaaata tatataaaat atatatgtta 360 aatatataaa agatatataa aatataaata tat attaaat atatataaaa tatatatata 42O ttaaatatat at attaaata taaatatata taaaatataa atatatgtat taaatatata 48O tattaaatat aaatatatgt attaaatata tattaaatat gaatatatgt attaaatata 54 O tattaaatat aaatatatgt attatatata tagaatataa atatatgtat taaatatagt 6OO at attaaata taaatatata taaaatatat attaaatatgaatatatata aaatatatat 660 attaaaaata tatataatat aaatatatat aaaatatata tattaaaaat attatataata 72 O taalatatata taaaatatat at attaaaaa tatatataala atatatatat taaaaatata 78O tataaaatat at at attaala aatatatata aaatatatat attaaaaata tat attalaat 84 O ataaatatat at attaaaaa tatat attaa atatalactat at attaaata tat attalaat 9 OO atalactatat attaalatata tattaalatat alactatatat taalatatata ttaaatataa. 96.O

Citat at atta aatatatatt aaatataact at at attaala tatat attaa atatalactat O2O at attaaata tat attaaat atalactatat attaalatata tattaalatat alactatatat O8O taaatatata tdaaatataa citatatatta aatatatatt aaatataact atatgtatta 14 O aatataaata tatgtc.ttaa atatatatta aatataaata tatgt attaa atatatatta 2OO aatataaata totgtattaa atatatatta aatataaata totgt attaa atatatatta 26 O aatataaata totgtattaa atatatatta aatataaata totgt attaa at atctatat 32O taaatataaa tatatgtatt aaatatatat taalatataaa tatat attaa atatatatat 38O taalatatalaa tatatattaa atataaatat at at attaala tatatatatt aaatatalaat 44 O atatataaaa tatatatatt aaatataaat ataaatataa aatatatatt aaatatalaat SOO acatat atta aatatatgta ttaaatatat atataaaata tatgt attaa at at 554

<210> SEQ ID NO 124
<211> LENGTH: 650
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(650)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 3944813..3945462

<4 OOs, SEQUENCE: 124 catgatatat tatgtataat at at attata gattacatat aaattatata tataatatat 6 O aattatataa tatataat at tatataatat attatatata ttatacaatt attataatata 12 O tataatatac aattatataa tatataatat acaattatat aatatataat acaatataat 18O atatatttaa tat attatat aatacatatt taatatatta tat attatat gttatatact 24 O aaatatataa tatgtattta atatatacta ttatatatgt aatat attat ataattitatg 3OO taac at atta tat attatat atgcaatata ttacatgtta catatatatt acatataata 360 tatgtaatat ataatataca citat attatt at agtatata atatactata t tatgtaatt 42O atataatata gtatattata cactatatta tattat cata taattatata ttatatact a 48O tattacatat at attatgta atataatatgcaatatgtta catatataat atatatgtat 54 O tatatagitat atatactata gtatatataa aatatatgct ataatatata ttittatatat 6OO tatataatac atataatgta t catatatta tatataatat attittataat 650

<210> SEQ ID NO 125
<211> LENGTH: 441
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(441)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 5314265..5314705

<400> SEQUENCE: 125 tataaatata tatgaaatat atataaatta tatataattt atatatacat atataaatta 6 O tatataaatt atatataaat tatatataca tatataaatt atatattata tataaaattg 12 O tatatattta tatataaatt gtatatataa tittatatata aattgtatat ataatttata 18O tatacaatgt at at attaat ttatatatac attgtatata taatttatat atacattgta 24 O tatacaattt atatatacat tdtatataca atttatatat acattgtata tacaattitat atataaatta tattattitat atatagtata tataaatata tatactatat ataaattata 360 tatt tattta tat attatat tatttatata taaattatat attatttata tataCattat atatalaatta tatatt attt a 441

<210> SEQ ID NO 126
<211> LENGTH: 1169
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(1169)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 5953971..5955139

<4 OOs, SEQUENCE: 126 atgt attcat attatatatt tatatataaa taatata cat t catattata tatttatata 6 O taaataatat at attcat at tatatattta tatataaata tataatatat titatgtataa 12 O ataatatata tattoatatt atatattt Ct atataaataa tatatatatt Cat attatat 18O atttatatat aaatatataa tatatttata tataaatata taatatattt attatataata 24 O tatatatt Ca tattatatat ttatatataa atatataata tatttatata taaataatat 3OO atatattoat attatatatt tatatatalaa taatatatat it catattata tatttatata 360 taaataatat a tattoatat tatataCtta tatataaata atatatatt C at attatata

Cttatatata aataatatat att Catatta tatatttata taaaaataat attatatt Cat attatatatt tatatataat atatatatto at attatata tittatatatt Ctatatatt C 54 O at attatata tittatatata aataatgitat att catatta tatatttata tataaataat gtatatt cat attatatatt tatatataaa tatatatt ca tattatatat ttatatataa 660 atatatatto at attatata tittatatata aatatatatt Cat attatat atttatataa. 72 O aatatatata ttcat attat atttatatat aaatatatat att Catatat a tatttatat ataatatata tattoatatt atatattitat atataatata tatattoata ttatatattt 84 O atatataaat aatatatata t t catatt at at atttatat ataaataatg tatatt cata 9 OO ttatatattt atatataaat aatgtatatt cat attatat atttatatat aaatatatat 96.O attcat atta tatatttgta tataaatata tatt catatt atatatttgt atatatatt c atatatattt atatataaat atataatatt Cat attatat ataaatatat a tatt Catat 108 O tatatattta tatatatalaa taatatatat it catattatt tatatatata aataatatat 114 O attcat atta tittatatata taaataata 1169

<210> SEQ ID NO 127
<211> LENGTH: 653
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(653)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 6427669..6428321

<400> SEQUENCE: 127 tatatatgta tacatatatg tatatatgtg tatatatgta tacatatatg tatatatgtg 6 O tatatatgta tacatatatg tatatatgtg tatatatgta tacatatatg tatatatgtg 12 O tatatatgta tacatatatg tatatatgtg tatatatgta tacatatatg tatatatgtg 18O tatatatgta tacatatatg tatatatgtg tatatatgta tacatatatg tatacatgtg 24 O tacatgtgta tacatatatg tatacatgtg tacatgtgta tacatatatg tatacatgtg 3OO tacatgtgta tacatatatg tatatatgtg tatacatata totatatatg td tatatatg 360 tata catata totatataag togtatatatgtgtatatgta tataagtgta tatatgtgta 42O tatgtatata agtgtatata totgtatatg tatataagtg tatatatgtg tatatatgta 48O tacatatatg tatatatgtg tatatatgtg tatatgtata taagtgtata tatgtgtata 54 O tatgtataca tatatatgtg tatatatgta tacatatatg tatatatgtg tatatatgta 6OO tacatatatg taaatatgtg tatatatgtg tatatgtata taagtgtata tat 653

<210> SEQ ID NO 128
<211> LENGTH: 414
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(414)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 10890453..10890866

SEQUENCE: 128 tatattttgt aaatatatat at agtaaata tatgtaaata tatatattitt gtaaatatat 6 O atatattittg taaatatatg taaatatata tattttgtaa atatatgtaa atatatatat 12 O tttgtaaata tatgtaaata tatatattitt gtaaatatat gtaaatatat at attttgta 18O aatatatgta aatatatata ttttgtaaat titatgtaaat atatatattt totaaatata 24 O tgtaaatata tatatattitt gtaaatatat atacatatat attttgtaaa tatataaaca 3OO tatatattitt ataaatatat ttataaatat atatattgta aatatattta taaatatatt 360 tataatatat at attgtaaa tatgtttata aatatatata ttgtatatat aaat 414

<210> SEQ ID NO 129
<211> LENGTH: 496
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(496)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 13952568..13953063

SEQUENCE: 129 taatatacat attatatatt atatattgta tatataatat acatattata tattatatat 6 O tgtatatata atatacat at tatatatt at at attgtata tataatatac at attatata 12 O ttatatattg tatatataat atacat atta tat attatat attgtatata taatatacat 18O attatatatt atatattgta tatataatat acat attata tattatatat td tatatata 24 O atatacat at tatatatt at at attgtata tataatatac at attatata ttatatattg 3OO tatatataat atacat atta tat attatat attgtatata taatatacat attatatatt 360 atatattgta tatataatat acat attata tattatatat totatatata atata catat 42O tatatatt at at attgtata tataatatac at attatata ttatatattg tatatataat 48O atacat atta tatatt 496

<210> SEQ ID NO 130
<211> LENGTH: 317
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(317)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 16942865..16943181

<4 OOs, SEQUENCE: 130 tctic ctagta gttatatata tatatatgtg tatatatata tat cotagta gatatatata 6 O tatatatat c ctagtagata tatatatata tatat cotag tagatatata tatatatata 12 O t cctag tagt tatatatata tatatat colt aac agittata tatatatata t cc tagtagt 18O tatatatata tatat cotag tagttatata tatatatata t cotagtagt tatatatata 24 O tatat cotag tagttatata tatatatat c ctagtagtta tatatatata ttatatatta 3OO tataatatat atataat 317

<210> SEQ ID NO 131
<211> LENGTH: 464
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(464)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 17217049..17217512

<4 OOs, SEQUENCE: 131 acatactata tatatacaca tactatatat actatataca gtatatagta tacatatact 6 O atacatatac atatactata catatacata tacatatact aagtatacgt. atata cagta 12 O catagtatat gtatactata tag tatgtat atatagdata tag tatgcgt. atactictata 18O tagcatatag tatgcatata cqctatatag catatagitat gcatatact a tatatagitat 24 O agag tatgcg tatact at at atatagtata gag tatgcgt. atactatata tatagtatag 3OO agtatgcgta tactatatat at agtataga gitatgcqtat actatatata tagtatagag 360 tatgcgtata ctatatatat agtatagagt atgcgtatac tatatatata gtatagagta 42O tgcgtatact atatatatag tatagagtat gtatatatat agta 464

<210> SEQ ID NO 132
<211> LENGTH: 430
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(430)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 19647266..19647695

<4 OOs, SEQUENCE: 132 tgtaaatata totaaatata tatttatatt at at attata taaaaatata atatataata 6 O tataatatat aaactatata ttaatataat atatataaac tattatataa atacat atta 12 O aatatatt at atttittaata tittatatatt aaatataata tatatttaat atttatatat 18O taaatatata atatatttaa tatttatata atatatagoa tattittatat titat attata 24 O tatalacattt tatatttata tittatattta tatatattta atttatattt at attatatt 3OO tatatttata ttatatataa cataattata tatattitt.ca tattgtatat aataaagaaa 360 tgtatatttgttatatataa tatatatt at ataatttatt atatattata taatatatat 42O tatataat at 43 O

<210> SEQ ID NO 133
<211> LENGTH: 2131
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(2131)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 20481223..20483353

<400> SEQUENCE: 133
tatatatalaa tatatttata tittaatatat atttatataa atatatttitt attataaatat  60
at atttaata taalatatott tatatttaat atatatttala tataaat at c tittatattta  120
atatatattt atatataaat atatatttat atttaatata tattaatatt taatatacgt  180
ttatatttala tatatatt to tatataaata tatttatatt alacatatatt tatatataaa  240
tatatttata tittaatatat ttacatataa atatattitat atgtaatata tttacatata  300
aatatattta tatttaatat atatgcatat gtaaatatat ttatatttaa taatattitat  360
atataaatat atttatattt aataatattt atatataaat a tatttatat ttaatatata  420
ttaalatatat atttatattt aatatatatt aatatttaat atatatttat atttaatata  480
tattatatat aaacatatat ttatatttala tatat attat atataaacat a tatttatat  540
ttaatatata ttatatataa acatatattt at atttaata tatatttata tittaatatat  600
tatatatalaa catatattta tatttaatat a tatttatat taalatatata ttatatataa  660
acatatattt at atttaata tatatttata ttaalatatat atttatattt aatatatata  720
tattaalatat a tatttatat ttaatatata t titat attaa atatatattt at attaalata  780
tatttatatt taatatatat titat attaala tatat attaa at atttaata tatatttata  840
tittaatatat acatatatat ttatatttala tatatacata tatatttata tittaatatat  900
acatatatat ttatatttala tatatacata tatatttata tittaatatat aaatttatat  960
tittatatata taaaaatata tatttatatt taatatatat aaatatatat ttatatttala  1020
tatatatatt tatattgaat atata cataa atatatattt at atttaata tataalacata  1080
tatttatatt tatatattaa atatatattt at atttaata tataaatata tatttatatt  1140
taatatattt atatatacta atatatttat atttaatata tittatatata gatatattta  1200
tatttaatat atttatgtgt attaatatat ttatatttaa tatatttata tattaatata  1260
tittatattitt at atttatat attaatatat ttatattitta tatttatatt ttatatattt  1320
at at attaat a tatttatat ttatatatat ttittatatat taataaattit at attittata  1380
tatttatata ttaataaatt tatatttitat acagttatat aaatatattt at attittata  1440
cagttatata aatatattta tattittatag titatataaat at atttatat tittatacagt  1500
tatataaata tatttatatt ttata cagtt atataaatat atttatattt tatacagtta  1560
tataaatata tittatattitt atacagttat ataaatatat ttatattitta tacagttata  1620
taaatatatt tatatttitat acagttatat aaatatattt at attittata cagittatata  1680
aatatattta tattittatac agittatataa atatattitat attittataca gttatataaa  1740
tatatt tatgttittatacat ttatataaat at atttatat tittatacatt td tatttaat  1800
atatatttat atataaatat attittatatt taatatattt atatataaat atatattgat  1860
atttaatata tatttatata taaatatata ttgat attta atatgttitat atataaatat  1920
at atttatat ttaatatata tdtttatata t caatatata tittatattta atatatattt  1980
acatataaat atatattitat atttgatata tatttatatt tdatatatat tittatatata  2040
ttaatatatt tacatttgat atatattitta tatat attaa tatatttaca tttgatatat  2100
attittatata tattaatata tttacatttg a  2131

<210> SEQ ID NO 134
<211> LENGTH: 842
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(842)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 20483478..20484319

<400> SEQUENCE: 134
tatatattta totttaatat at atttatag ataaatatat atttacgttt aatatatatt  60
tatagataaa tatatattta cqtttaatat at atttatct ataaatatat ttacgtttaa  120
tatatattta tat attaata tatt tatgtt taatatatat ttatatatat taatatattt  180
atgtttaata tatatttata tattaatata tittatgttta atatattitat at at attaat  240
at atttatgt ttaatatata tittatatgtt aatatattta ggtatatata tatttatatg  300
ttaatatata tittatattaa tat attatat ttatatataa aagtatatat aatatataaa  360
tattatataa attattatat agtatttitta tatatattta tatataaatt ttatatattt  420
tatatatata aatatatatt tatatataca ttittatatat aaatatatat ttatatatac  480
attatatata taalatatata tatttatatt ttatatataa atatatatat ttatatatac  540
attittatata ttittatatat gtaaatatat atataaattt tatatattgt atatatattt  600
atalaattitta tatatatatt tatatatata atatatataa tatatataala ttittatatat  660
attatatata tittatattitt at at attata tatttattta tatatattta tatgttatat  720
at atttatat ttatattt at tttittattta tatattittat atatatattt atatatgtat  780
attatatata ttatat atta tataatatat tatatatatt at attatata t titat attat  840
at  842

<210> SEQ ID NO 135
<211> LENGTH: 645
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_binding
<222> LOCATION: (1)..(645)
<223> OTHER INFORMATION: MAR of chromosome 2 genomic contig; 20897566..20898210

<400> SEQUENCE: 135
gtatatttat attatatatt atataatata tattatatat taataaatta tatataatat  60
aatatatatg tatatttata tittatgttat aatatacata taattatata totatgtata  120
catgtataca tatacgtata totgtatatg tatacatata ggtatatgtg tacatgtata  180
catataggta tatgtatatg tatacatgita tacatataat ataattacat atgitatgtat  240
acatacatat gtaattatat tatatatgta tatgtatatt tatataatat ataatatgta  300
ttatat atta tacatgcata tittatatgta tattatatat acacatataa tataattata  360
tatgtatgta tatatacaca tatatattta tattatatat gtatattata tacatatatt  420
tat attatat atgtatatat attitat cata tittatatgta atatgcatgt gtaataaata  480
atatacacat ttatatatgt at attatata catatattta tattgtatat gtatatatat  540
ttatatatat ttgtatatica tatatttata tattgtatat titatgtatat tatatattta  600
tat attatat atgtattata taatatatat gtaaatatat attat  645