US007442782B2

(12) United States Patent
     Ranum et al.
(10) Patent No.: US 7,442,782 B2
(45) Date of Patent: Oct. 28, 2008

(54) INTRON ASSOCIATED WITH MYOTONIC DYSTROPHY TYPE 2 AND METHODS OF USE

(75) Inventors: Laura P. W. Ranum, St. Paul, MN (US); John W. Day, Minneapolis, MN (US); Christina Liquori, Minneapolis, MN (US)

(73) Assignee: Regents of the University of Minnesota, Minneapolis, MN (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 427 days.

(21) Appl. No.: 10/890,685

(22) Filed: Jul. 14, 2004

(65) Prior Publication Data
US 2005/0003426 A1 Jan. 6, 2005

Related U.S. Application Data

(62) Division of application No. 10/143,266, filed on May 10, 2002, now Pat. No. 6,902,896.

(60) Provisional application No. 60/337,831, filed on Nov. 13, 2001, provisional application No. 60/302,022, filed on Jun. 29, 2001, provisional application No. 60/290,365, filed on May 11, 2001.

(51) Int. Cl. C07H 21/04 (2006.01)

(52) U.S. Cl. ...... 536/23.1

(58) Field of Classification Search ...... 536/23.1
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
6,812,339 B1 * 11/2004 Venter et al. ...... 536/24.31
2003/0108887 A1 6/2003 Ranum et al.

FOREIGN PATENT DOCUMENTS
WO 01/72799 A1 10/2001

OTHER PUBLICATIONS
Buck et al., Biotechniques, vol. 27, No. 3, pp. 528-536 (1999).*
Sequence Information Contained in Celera Accession No. 2HTBKUAD8C, “Contig Overlapping ZNF,” pp. 1-310.
Alwazzan et al., “Myotonic Dystrophy is associated with a reduced level of RNA from the DMWD allele adjacent to the expanded repeat.” Hum. Mol. Genet., 1999; 8(8):1491-7.
Aminoff et al., “Autonomic function in myotonic dystrophy.” Archives of Neurology, Jan. 1985; 42(1):16.
Andrews et al., “The glycoprotein Ib-IX-V complex in platelet adhesion and signaling.” Thromb Haemost, 1999; 82(2):357-64.
Armas et al., “Primary structure and developmental expression of Bufo arenarum cellular nucleic acid-binding protein: changes in subcellular localization during early embryogenesis.” Dev. Growth Differ. Feb. 2001; 43(1):13-23.
Boucher et al., “A novel homeodomain-encoding gene is associated with a large CpG island interrupted by the myotonic dystrophy unstable (CTG)n repeat.” Hum. Mol. Genet., 1995; 4(10):1919-25.
Brook et al., “Molecular basis of myotonic dystrophy: expansion of a trinucleotide (CTG) repeat at the 3' end of a transcript encoding a kinase family member.” Cell, Feb. 21, 1992; 68(4):799-808.
Buxton et al., “Detection of an unstable fragment of DNA specific to individuals with myotonic dystrophy.” Nature, Feb. 6, 1992; 335(6360):547-8.
Davis et al., “Expansion of a CUG trinucleotide repeat in the 3' untranslated region of myotonic dystrophy protein kinase transcripts results in nuclear retention of transcripts.” Proc. Natl. Acad. Sci. USA, Jul. 8, 1997; 94(14):7388-93.
Day et al., “Clinical and genetic characteristics of a five-generation family with a novel form of myotonic dystrophy (DM2).” Neuromuscul. Disord., 1999; 9(1):19-27.
Flink et al., “Organization of the encoding cellular nucleic acid-binding protein.” Gene, 1995; 163(2):279-82.
Fu et al., “An unstable triplet repeat in a gene related to myotonic muscular dystrophy.” Science, Mar. 6, 1992; 255(5049):1256-58.
Fu et al., “Decreased expression of myotonin-protein kinase messenger RNA and protein in adult form of myotonic dystrophy.” Science, Apr. 9, 1993; 260(5105):235-8.
Groenen et al., “Expanding complexity in myotonic dystrophy.” Bioessays, Nov. 1998; 20(11):901-12.
Harley et al., “Expansion of an unstable DNA region and phenotypic variation in myotonic dystrophy.” Nature, Feb. 6, 1992; 335(6360):545-6.
Harper, “Myotonic Dystrophy.” Second Edition, W.B. Sanders, London 1989. Title Page and Table of Contents.
Helderman-van den Enden, “Monozygotic twin brothers with the fragile X syndrome: different CGG repeats and different mental capacities.” J. Med. Genet., 1999; 36(3):253-7.
Kawaguchi et al., “CAG expansions in a novel gene for Machado-Joseph disease at 14q32.1.” Nat. Genet., 1994; 8(3):221-8.
Khajavi et al., “‘Mitotic drive’ of expanded CTG repeats in myotonic dystrophy type 1 (DM1).” Hum. Mol. Genet., 2001; 10(8):855-63.
Klesert et al., “Trinucleotide repeat expansion at the myotonic dystrophy locus reduces expression of DMAHP.” Nat. Genet. Aug. 1997; 16(4):402-6.
Klesert et al., “Mice deficient in Six5 develop cataracts: implications for myotonic dystrophy.” Nat. Genet. May 2000; 25(1):105-9.
Koob et al., “An untranslated CTG expression causes a novel form of spinocerebellar ataxia (SCA8).” Nat. Genet., Apr. 1999; 21(4):379-84.
(Continued)

Primary Examiner: Teresa E Strzelecka
Assistant Examiner: Cynthia Wilder
(74) Attorney, Agent, or Firm: Mueting Raasch & Gebhardt, P.A.

(57) ABSTRACT
The present invention provides methods for identifying individuals not at risk for developing myotonic dystrophy type 2 (DM2), and individuals that have or are at risk for developing DM2. The present invention also provides isolated polynucleotides that include a repeat tract within intron 1 of the zinc finger protein 9.

8 Claims, 19 Drawing Sheets
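As an illustration of what measuring a repeat tract can mean computationally, the sketch below counts the longest run of uninterrupted CCTG units in a nucleotide string. It is only a sketch under stated assumptions: the function name `longest_cctg_tract` and the toy sequence are hypothetical and are not taken from the patent, and no threshold separating normal from expanded alleles is implied.

```python
import re


def longest_cctg_tract(seq: str) -> int:
    """Return the number of CCTG units in the longest uninterrupted CCTG run.

    Hypothetical helper for illustration; it is not the assay described
    in the patent.
    """
    seq = seq.lower()
    best = 0
    # Each match is one maximal run of back-to-back "cctg" units.
    for match in re.finditer(r"(?:cctg)+", seq):
        best = max(best, len(match.group(0)) // 4)
    return best


# Toy sequence (an assumption, not patent data): a TG run, three TCTG
# units, five CCTG units, then flanking bases.
toy = "gtgtgt" + "tctg" * 3 + "cctg" * 5 + "tctgaa"
print(longest_cctg_tract(toy))
```

Because the count is taken per maximal run, an interruption such as a single TCTG unit splits a tract into two shorter runs, and only the longer one is reported.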

OTHER PUBLICATIONS

Korade-Mirnics, “Myotonic dystrophy: molecular windows on a complex etiology.” Nucleic Acids Res., Jun. 1998; 26(6):1363-8.
Kruglyak et al., “Parametric and nonparametric linkage analysis: a
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, GenBank Locus AF388526, Accession No. AF388526, “Homo sapiens ZNF9 gene, intron 1 and expanded CL3N58 repeat region” [online]. Bethesda, MD [retrieved on Aug. 20, 2002]. Retrieved from the Internet:
Sambrook et al., “Molecular Cloning: A Laboratory Manual.” Cold Spring Harbor Laboratory, New York; 1989; title page, publication page, table of contents; 30 pgs.
Sarkar et al., “Heterozygous loss of Six5 in mice is sufficient to cause ocular cataracts.” Nat. Genet. May 2000; 25(1):110-4.
Schneider et al., “Proximal myotonic myopathy: evidence for anticipation in families with linkage to chromosome 3q.” Neurology, 2000; 55(3):383-8.
Spielman et al., “Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM).” Am. J. Hum. Genet. Mar. 1993; 52(3):506-16.
Takahashi et al., “The CUG-binding protein binds specifically to UG dinucleotide repeats in a yeast three-hybrid system.” Biochem. Biophys. Res. Commun., Oct. 22, 2000; 277(2):518-23.
Taneja et al., “Foci of trinucleotide repeat transcripts in nuclei of myotonic dystrophy cells and tissues.” J. Cell Biol. Mar. 1995;
Tapscott, “Deconstructing myotonic dystrophy.” Science, Sep. 8, 2000; 289(5485):1701-2.
Tapscott, “Reconstructing Myotonic Dystrophy.” Science, 2001; 293:816-7.
Thornton et al., “Myotonic dystrophy with no trinucleotide repeat expansion.” Ann. Neurol. Mar. 1994; 35(3):269-72.
Thornton et al., “Expansion of the myotonic dystrophy CTG repeat reduces expression of the flanking DMAHP gene.” Nat. Genet. Aug. 1997; 16(4):407-9.
Timchenko et al., “Identification of a (CUG)n triplet repeat RNA-binding protein and its expression in myotonic dystrophy.” Nucleic Acids Res., Nov. 1, 1996; 24(22):4407-14.
Ullrich et al., “Rab11 regulates recycling through the pericentriolar recycling endosome,” J. Cell. Biol. Nov. 1996; 135(4):913-24.
Venter et al., “The sequence of the human genome.” Science, Feb. 16, 2001; 291(5507):1304-51.
Warner et al., “A general method for the detection of large CAG repeat expansions by fluorescent PCR.” J. Med. Genet. Dec. 1996; 33(12):1022-6.
Wong et al., “Somatic heterogeneity of the CTG repeat in myotonic dystrophy is age and size dependent.” Am. J. Hum. Genet. Jan. 1995; 56(1):114-22.
“Zinc Finger Protein 9; ZNF9.” Online Mendelian Inheritance in Man, Johns Hopkins University, [retrieved on Apr. 30, 2002]. Retrieved from Internet:
“ZNF9: zinc finger protein 9 (a cellular retroviral nucleic acid binding protein)” [online]. Homo sapiens Official Gene Symbol and Name (HGNC), [retrieved on Apr. 30, 2002]. Retrieved from the Internet:
Zhuchenko et al., “Autosomal dominant cerebellar ataxia (SCA6) associated with small polyglutamine expansions in the α1A-voltage-dependent calcium channel.” Nat. Genet., Jan. 1997; 15(1):62-9.

* cited by examiner

U.S. Patent Oct. 28, 2008 Sheet 1 of 19 US 7,442,782 B2

[Figure: panels A, B, and C. Recoverable labels: 320 Kb; MN10; MN12; normal alleles; expanded alleles; allele (# CCTG's); loading control.]

Sheet 2 of 19

[Figure: rotated axis labels, apparently "Number of CCTG repeats" and "Number of alleles".]

Sheet 3 of 19

[Figure: gel panel with size markers in Kb and numbered lanes; plot with x-axis "Age at Time of Blood Draw (Yrs)" running from 0 to 80 and y-axis ticks from 0 to 10000.]

Fig. 3

Sheet 4 of 19

Fig. 4

Sheet 5 of 19

Fig. 5

Sheet 6 of 19

[Figure: schematic of normal and expanded alleles with a repeat tract labeled (TG)n (TCTG)n (CCTG)n; table of repeat assay results with columns for PCR, Southern Analysis, and PCR Expansion Assay, including a DM1 row.]

Sheet 8 of 19

EXO : 433 - 4 4.15 Intron 1 Gap (size unknown) : 4. A 69 - 14473 CL3N58 repeat region: 1 TO2-1785 EXO 2 : 18662 - 18799 EXOn 3 : 18896-1898 ExOn 4 : 1956-93.56 Exo 5 1986.5 - 2 O 845 actaatgaaa tgttcattaa atgatttgtc. agtgttt Caa agttt Cttta tcatCagtta gCatticcicta CaCCatCaCt ttagggag to aatagaattit tataggagta ggCtt Cagta agtactggga ggCCagtag C actgttcaag CaCatgggct gtctgttgttt gaag CCtgac ctgtcatttg tt CaCtt tect gaCttgagca gattact taa actic to tcca cctgtt tott Catggggata at a CaagtaC ttCtaggggg tttgttgaaala Ct cataaaga ggttaaaaac attgcCtggC a tacagtaag CaCCCaataa gaataagaaa taatatttgt a tagtaatta togccagaaac tgttatalaga gctgtatata tattaa Calat tagc tattac tag tattact aatt CC tagt toaggattta gCttacgtaaa Cttt ctgctt Cagaa.gcaaa taCgagaggt gaaaa.cat Ca attal tcc. Ctcagt ctta gttacatact tt CCaagttca agtCaCagag CaCaatitt CC ttgctggcag ggacaagaca togggtttaca tgatat CaCC at CCCCt.ca atttaa.ca.gc atgtactato Cagttggg Ca tittaa.gcaga aattaa.gagt tggCagg tat tgct Caactg gtaccoattit taggaataat gctgaat cat agcatttitat ctggtottct ct Caggatac ttaatgctaa tttitttgtat titt tag taga gacgggattit caccittgtta gCCaggatdg tct coatcto CtgacCtcat gacccatctg cc toggcc.cc cCaaagtogct gggattatag gCgtgat CCa CCgtgCCCgg act t t t t titt tttitt titt tt tttgaga Cag agtCt c cct c tgttgttg Cag gCtggag tdC agtggCCCaa totcttgctic CCaggtgcaa. gcc attctico tgccticagot to Coaag tag Ctgggattac agg togcCCaC CaCCaCaCtC ggctaattitt tgttgttatta gtagagaCaa gttt toacta tgttgcCoag gctgct Ctca aact cotca C Ctcaggtgat CCaCCaCC CoggCCtcCCa agtgCt999a agttgttt tt ttitt Cutt ct titt tttga gactgagt ct tgct CC9tca CCC aggCtgg agtgCagtgg cg tdatct co, gct cactdca agctctg.cct citcaggttca aacgattctic ctdcotcaac Ctct coagta gCttggacta tagg tocccd. C Caccacgac Cag Ctaattit tttgtatttit tag taga Cag gattt Caccg tgttagcCag gatggtCttg at CagCtda C CtCdtgat CC gCCCgCCtcd gCCtccCaaa gtgCtggatt aCaggCGtsa ‘g CCaCC9aaC CCagCC gaCa CttaataCtt tot tatggcc Cttgttatcc tgcaagttct tdaagggcaa. actctgttgtc. 
ttaagtag to acct to taa ccct togcaat gCtgagcato agaCtgaa.ca Ctggaggaga ggaggggaat aaaa.catct C Cagggalagag daatgtaatg ggag CCtctt caag toccac toggCagCtta tottt toagt Gag Ctttitt C Ct attitt Caa acatttctag taaaataggC a taccalaagg gtatat CCag gcaatacaag C Cattgtag t talaatt CCaC CataCaCCtt tt CtggCdCC tcacgatcag CCtggCt Cta ttaataatag tCgtta Cagg aag Ctgcatg Ccagg tagaa agagcCatta gctogttacca citccactgcc aagaagtaaa. gaCattgttt CC at CttC tact taagtC ttittaaaaCC tatagaacat Fig. 7a U.S. Patent Oct. 28, 2008 Sheet 9 of 19 US 7.442,782 B2

tatgtcCagt at Ct. Ct at Ct CaCaCt CaCt. tt Cattitt Ct at agctgttg aaattitt tdt tittaat atta ggaatattoc att CCtgggt Ctataatgaa tag caaacat tittata Cagt aCtatggttg gaatgg taala caaaaataag to agaaaata ttaattitttg gccatatggit aattittaaCt tgtcCtcttg gtgttggtgtg gaCg CaCCCa ggttggaCtt catacatagc cticttgcatt at attgactic attgtcagag CtCaCgaagt CaCtaCCtaa gtgtctgatt gCta Cactat aCatta CttC aagatactat gaaggittaat Cagattacaa aggggaaatc ataaag Ctga gtaag Ctt Ct tgg taataala aCtatataaa taCaaaataC tgtttitt tat tgg Cagataa tatat cd tigt tittagcacaa Cacataagct gCtagg Catt tatt Caat Ct gattgggaat gggittaaatt tggittaaaaa attitta CCtt aggttgCttt aattaaaaaa atgtttaagg CtgagtgCag tggct Cacac ctgtaatcct agcactittgg gggCactggg tgcagtggct CacaCCtgta at CCtagCaC tittggCC tag CaCCgcttga ggC Caggagt toaag.cccag CCtggCCaac atgatgaaac CCC at CtCta CtaaaaataC aaaaattagc CaggCdtggit ggtgggCgCC tgcag tocca gCtact Cagg aggCtgaggC agaattggtC alaa Catggga agC9 gaggitt gCagtgagct gaga cagoaC to cagottgg gCaacagagg gaga CCCtgt Ct Caaaaaat agtaataaat aaatttaaaa agttcggcca 99C9C99 togg Ctcatgcctg taatcccagt gCtttgggag gaCgatCaCC aCtttggg td ggCggat CaC Ctgaggtogg gagtttgaga cCagoctgac Caa Catggag aaacCCCgCC tataCtaaaa at a Caaaatt agCCggg.cgt. ggtggCgCat gcctataatc ccagctactg gggaggctga 99 Ca99 agaa tCaCttgaat CCaggaggCG gaggttgCag tgagct gaga togtgccatt gCactCcagc Ctgggcaacg. 
gag tdagaCt CCgttcticaa alacaaaaaaaa aagtttaaaa tat cattggt ctittaaagtt attacatt Cat totttgataa ttgctatgtt gaacgcaaCC tcCtaactgC tttacaatga ttaagcacta atgatttgaa ccCaggittta aagtctgact Ct CaaCaCat gtgCtCtgCC ttcticacgaa Catgattitca aaaatcatag CCCCoggatt tgggattggit ggct tatgcC tgtaaaCCCa gCga Cagag C aagaccCtac tCttaaaaaa aaaaaaaatt aaagaaaaaa agaaaaataa at Catag togt tgaact dgCa ggttt Cactg agacgaaact tgggact citt CCtttitttitt tgtttcgaat aaag.ccattC tagaatgaga Caaaatt Cta alaat attitta tagttaa.cag tttaaattgg gtttaatctt gaCaagacta tctagggCta tata CaCaaa tct Cttittgg agaaaataCC a CaaCtaaaC tgaagt citat tcctgaatat gacagaCCag gtCaaatggit tat cottgcc CtCCCGgggg atgt CaCt Ca taaacgt.cCC aaaagt caca gtctaggCCC Catta CCtta Catgct Catg acct tccCag 99 aggCCCCt cgc.ccttacc agg cact ttc at Cttgggaa gaCaCat Cag tCCtggCgga gaaag Cag Ca aggCCttt CC CCGgct caca aaaattaata caaat Ctcag aggCtgcatc ccacagc.cgt. gaCCaCCgtg act tdgCatC cccttttctg Caala Cttaala tottatctag aaatcgggCC tggCtctgaa agCCaagggC Ctgg Caggag CCCdagaaag g99 agaala Ct ttctg.cggCC cCaagctaat gdCagtCact gCaccgagaC ccg toccctg gcatccottt gCtcCagctg gCCaaga Cag aCCaCCaagg tCagccagat titccacccag totggCCggg CCC.ggaCCca gCtgggaatg ala CCGagalag CaCCGgga CC Cggat CCCGg Cotgaaaggc CgCg C9C9gg gCaC9gCdgg aaaagacgct gC9C9 Cagaa acaccc.gc.cc CgCgCCgCgC tC tagt ggg C ggCCctg.ccg Fig. 7b U.S. Patent Oct. 28, 2008 Sheet 10 of 19 US 7.442,782 B2

CgggCggCtC tgattggaCt gCCgaac CCC gC9C9Ctgat tggcc.gc.gtg 99C9a99C99 aggaga gCCg tgCdCagCdg Cctatctggg gCCgtgttgca gaCCCGCG td tggCg CaggC aagga CCCtc aaaataala Ca gcctictacct tgCgagCC9t CttCCCCagg Cctg.cgtcCg agtCtcCgCC gCtgCgggCC cgctCcqacg Cggalaggtga gggCtggggg a 9999CCC99 C9CtgaCgga gCCg CagtgC 999 togggtC tgttggCggaC agagaggg ta gggagC9gCg a99tggC9 at 99C99CC9 Ca CtttggCCtg cgcctctgct gCdt Cagg.cg ggaa.gctC9g ctogctogcc.gc CdCctCdgac CC9ggtttct ggCgCaccgC tgtC9gaCCa cactitctgtc citt tottcgt. CCtggaaagC tggg togCCg agCatgCggg tctttcqgcg CcacggCCgC accCCaggcc gCaggCttag ggcagaggag gCCCgCCCgt gCgCCCttgg 99CC9aggCC Ctgacgct tc gaggg tog C9 gaatgaggga CC9aggg togg atttggC9gg aact cactgg aaggagt cc.g tgtggtgggg aaaggctCCC ggctg.cggat gala999999a tggggtgggt atag togtgC aggCCatgttg Ctggggtcgt. gCdCCtggCg ggccatgtgC Caagggittitt ggggg CCtta gaalaagggtt CttaggCC9g gC9C9gtggC tCaCgCCtgt aatcCCagca Ctttgagagt CCC aggCGgg Cggat Cacga gg to aggagt togagaccag cctgaccaat atggtgaaag ttggtotgta Ctaaaaataa aaaattagcC gggCatggtg gCgggCg Cat gtag toccag CagctCdgga ggCtgga Cag gagaat CGCG tgaaCCCCGg aggCCgaggt tgttggtgagC C9agat C9C9 CCaCta Cact CCagCatggg Cala Cagaga.g agact CCdt C ttaaaaaaaC alala Calala Cala a Calala Calala C aacaacaaag ggttcCtgaa gaagcCtttg tgtttggagt ggCgaga Ctg Ctggaagact tgggagCttt tagagtttat aCtCCCt at C Cttgatagitt tt cogattct tgaatttitta tcgtCattta aataCtaagt tgcttgttgtt a Catta CCat tcCaaaaggg gCtgatgggg CtCaCatt CC aagagittaac actatttaag ttgCtgggat cCtttaaaag cgc.cattacc agaaaaaa.ca cgaatttgtc aaaCCt C Caa alaccacagca gCggg.cggta gtotgcatca tttcttggat taatgaaa.ca gatgtaatta Calala C9 agaC acgaaattica aCtagct CCC CtcCatctag attitt to Cat at Cotgagaa cctdttt tag aatggCataa tdgtcCacat ttgggtttag gtgttgattit tattatgggt aaggcttgttg Cttgttccca catgttaacc atatggcctic agCCaCaggg CaCtt CCaaa ggaagtgact gtttctgg to ttggggg.tct tgtaaaaaga gala CattgCt Cagtaat C9t Ctgttgattitt agctag tdtg titt Cagg Cat tatt Cagaag cact Caggtg acatalagCCa aaactgaatt tgtttitttgt 
Cttt Ct Caala gtgaaggagg tctaatgaat at CCCC at Get tgct tttaaa tta Cattt tt aaaagtagat ttt to CCCCt ttcc tattgt ttgaCC Caat tttggagtga aaCgtaaCCa gttactattt ccattcgaat ttaaattagc aattittatgt tatttgtttg ttcaagcagt ataactdgag tgtagagCtt tdagggittt C aaaaagatala gagatatagt actitat CtCC tgggctitcCC CCt CCCCCCt CCtaaatagt tittaaatgct totaatgagt tactctggitt aaggataatc aaacaCC tot aaactg.cCag gatCC taggit a catgctgtt tttagtttgt tgag CCtgat tct to tctac aagadtt Ctt tdtg tattgg aatataaaag gaataattta tta Catt CCC aaggg Cagaa ttaaagacitt aagttt ttcc gattitcatct Cttgataagt ttitt Ctt tala aaaaataa Ca gtttgtgttt ttctgaggaa CCaaagg to C tCt t t t t titt Cat attgg ta a Caggagagg taatg tattt CagatggtgC agtctgtaaa a tattttgaa Fig. 7c U.S. Patent Oct. 28, 2008 Sheet 11 of 19 US 7.442,782 B2

ccaaatcagt ggalagaCCag gggtttitt Ct t t t t t t t t tit Ctgaga C9ga gtct cactict gtCGCCCaag Ct99agtgCa gtggCg.cgat Ctcggct cac tgcgaCCtcC gCCt CCCdga ttaag.cgatt ctcctgcc to agCCtcCdaa gtag Ctggga ttacagg.cgc CCgCCgCCaC accCagctag tttittg tatt ttag tacaga Cggggttt Ca CcatgttggC CaggCtggtC togaacticct gaCCttgttga toccacticcC tcc.gc.ctict C aaagtgC tag gaalaa CaggC aggagcCacc gCgCCtggCC aggtttittct taaactggCa tttgaacatc tggaaCaggC agggagatgt Ctt t t t taala gtataaatgt gttttgttac atgatt tatg a Calatt CtaC ttgtc.tttitt ttitt tttitt t tttittttgag acagag tott totctgtcgg CCagg Ctgga atgcagtggC acagtCtcgg Ct Cacagoag CCtCC at Ctc CCgggct Caa gcaattctico tgcct cagcc tcCCaagtag Ctgggatta C agggCdtgttg CCaC cacgc.c cggctaattit ttg tatttitt tgtaaaga C9 gggttt CaCC atgttggcca ggCtggtctt gatctoctga cct Cagg taa ttcacccd.cc toggcc tocc aaagtgctgg gattacaggC Ctgagccacc gtgCCttgCC aacaatt Cta cittgtcttitt aaagttcaat aaaaatatgt ggCacgtata tdggatagta Ccaaactggt gCCtaaaag.c agtgaaacCa Ccattggact aattggaatg atttgtctat tggCtgaaga tttgaCCaCa gagagatt Ct gcttittttitt CCttgCaggg atgaaaaatt aaaaaaaaaa. aaaaagattg gttcc tittitt ctottcc tag CCtcCtgaca gtaagtagag agCCagalaga atcatgcCaa cgcatcCtgg CCtcgctatgt ggagaacgCt ctitt cottac tgtctoactt aatagaactic Ctgttctggc agtgtcagat gCtgcagcag Caagggaatg cCattgagtg attgCagtaa gCtatgcagc attitt catgt ttaaaaCtaC tgagataata aagtgagaac ttgaggCCaC Caaattittaa gttgtaatta daaggattitt gttaattagg aatatgagag tgCta Cagtg at Cacctgga atggCtcCat aaata Calaat gaggtgttala Ctag tdaagC aagttgCCag tgtttgttgttg tttggtgaga Ctcctaagtt CtgcCatgaa gttaaagaaa at atttitt tal agattcaaga aag Ctgttgttg aatgaattica aaattatt at gaCtgtagat Cttittaaaaa. 
gctat cagta ttagttttac tittgatttitt at Ctaaagag aaatacagaa tgaatactta CagCattaca attcaaatct gC9tggCttt t t t t t t t Ctt agtta Ctaga tata tag tag taata CCttt atgtaatatt ttgaagtaga gattgaattg gtataatticc Cta CCttaaa aat atta CaC aatagcattt ttgtCatata ttacgatago atttttgttgt act t taCCaC ttaactitt tt ttitt CCttitt Cttt t t t titt tggaga Caaa gtc.ttgctict gtCGCC Cagg C999 agtgCa atggCaggat cticagotcac tgcaacct ct gCCtcCtggg tittaa.gc.cat totcctgcct CagcctCctd agtag Ctggg actataggCg tgtgccacca CgCCCggCta attitttgttt ttittagttitt ttittggagaC ggagtCt C9C tttgtcacco acactdgagt gCaaatggCa tdatct CdgC tCactgCagC Ct CCaC Ct. CC tgggttcaag cgatt Ctctt gcc toatgca CCaCCaC gCC Cagittaattit ttgtatattt agtagagatg gggtgtcact atgttggcca ggCtgCCgaC Ct Caagtgat Ctt CCC to Ct cagoctocca aagtgCtggg attacaggCa tgagCCaCtg CCCCtc.gc.ca gtgt Cagatg tttagtttgt Cattaaaatg gagcaagaat acatalacticg tgaggttgta agattataga tatgtt tact aatgactgac tCatagatat CCacCtgtta aaaCt Ctt Ca agaagtaatc agggCagg C9 gaaatggatg taattaaCCa agg toaag Ca gtaagttcag gaacCaggat aaaaata Cag Fig. 7d U.S. Patent Oct. 28, 2008 Sheet 12 of 19 US 7.442,782 B2

aattgct CCC gagtaagta C tCtgttt to C attatt ctogg Ctggaatgca gg taata Cag aaagtatatt gct tcct titc attgct tttit ttitt Ctt Ctt ttt titcCttt gagg tdgagt ttcgctict tog ttgCCCaggc tggagtgCga tggCatgatC toggct cacc gcaacctctg cct CCtgggt toaa.gcaatt CttgtgcgtC agcct cotga gtagctdgga ttacaggCat gCaccaccat gtc.cagotaa tttttgtatt tt tagtagag acagggittt C accatgttgg Ctagctggtc togaacticct gacct Caggit gatgCatctg cctcggcc to Ccaaaatgct gggattagag gtgtgggCCa CCCCgCCCgg cCcagacCtt atcttgacta tottag toat tt Ctt. Ct. Ctt gCCtgaCatg CCCtgtgctic CtaCCaCCCt ttaaagtggit ttgttgtcata aacatttgat aCaCaaaaat ggaaact tag gacaaatat c ttgatgtctg gtggttgaaa. atgttgaaCtg atttggaaat CaCCggtgtt tCt CCtCtta at CtCtt CtC Catt CCatt C aggaaataga Ct9talaggtg ggaala Caagt ataag Cagtt agcct cactic talaa CCtgCt atgtaataga Cattggactg agttctgtct actictotgta agCalat CCala gg taattggC gaaagtggaa gdaatatgta Ctcagaagac caaaactittg gtttittaaat tgaatat cta ttaagcacaa gg taacaatt CttaCCaCaC acatcagttt tatt attt CC Cttitta Caala taaga Cacag atdgg tag to agatgtctitt gaggtaaCaC agCaag tagt taalactgggt taagtgatta aCCCaggttg agtatggitt C Caaaat. Ct. Ct. taCagtgtCa ggCaggctaC at CagtgCag tatacgtaca tCaggittt Ca cgaaaaattit titt CCagaga alaa CaCalala C CCaaggaaCC ttcagtaagt ggtdcctitat attagtggitt tittagcaaaa ggaagaaact taag togttitt cctgctogcct ga Caaaagttg aaaaa Cagta ttittggittitt tattgaagtt agCatg tatg tttgtagctt dCataaaata gtact.cgaaat cCaattgatt atgaatt Ctt ggaCtalacag aacCtggatg acaaattaga ggttctggCC tggttgctgg ct tttittagt tgtCttgggt gtaaatttct CagcCaCacg tggggattgt gttagataat ctgaaatcta attitt Catgg ttt tatgatt cagcagctitt ct tcctittga tatttitctag tatttgctitt attatagatt ggaatcCtca aaataa Catt gaCaagtaga agatact tct gttag toggat ttaaaaaaaa attacattgg gaatgtc.ctt tdag toggttg gccctaatcc Ctgtcagaag Ctogaaagttc. 
toggat.cctaa att Catctgg gCagaatct C acctatgatt to agaaagct gagagtttca gaga.gtgact gtag tdagt c Cttagtgagt acaaaattga gaataCatCa ttaCtt taala ttaatggtgC agtaactCtt gtgactgata gCaataattit aggtgCtttg ttgttagta C ttgattagat tggattgggit Cagittagttt Caccalaattg Ctaaaga CaC ctgtccccct agaattaaaa tactgagtta Cataatggct actaaaagga talactatatg 999 tott C9a tgattcaaag gtdaattact tgg totctac Cttcaaggaa tatgatacaa ggcaatatgg tact gCCatt aga Cagatat taacaaagtg tott.cgggact taatagggag gg tagttcCa ggCtgggaga tg tagt caga tt Cttt tata gagttcgCat ttgagttggC CCG tdalaggt tggaaaaagt td toga Caggit ggaaaaggag Caggggaga C Cagga Cagtg Cagtgaaatt cCagcCagga gCagt Catag gCaatgaga C agaCt Catgg agCCatgatt Ctcagotgtc tta CCttaCC t tagttitt CC taaggaatat Catogaatt C tgtaaagaCC tittaaaCtaa ataatgttca tatgagatga gtogC taggat 99.99 a CCtgC tgcctaatat aagtag to td agt ctaaaac attgttggaaa gtggittagtt taataatgtt attaaagaga caagttctato aCaagggaCC Fig. 7e U.S. Patent Oct. 28, 2008 Sheet 13 of 19 US 7.442,782 B2

agttaccagt gaaactgtag accacct gat tCaCtgCgat agggittagcc aaagggagga gaggg Cagat tgcatacata gtaCCta agg CCaCtcaaag a CCt Cttitta aaatcacgtg toatgttgat gaCatttgga ggctattaat gtttittctitc CCttt talaga cittagtgttt tott tattag Cattaatt ta. Ctctag taala Caaaattatg tgttgaCtaala aatggcaaaa CaggCtgggC gCagtggCtC acgCCtgtaa tCC tala CaCt ttgggaggCC aaggC9ggtg gatcactagg toaggagat C gagaCCat CC tggccaaCat ggtgaaacCC cgtctic tact aaaata Caaa aaattacctg ggC9tgg togg tgcacgCCtg tagt ccCagc tatgttgggag gCtgagg Cag gggaat C9Ct tgaaccCagg aggtgaaggt tgCagtgagC Caagattggg CCaccdCact Ccago Ctggg a Cagag C9ag aCt. C Cat Ct C aaaala Calada aaaagat CCa aattagalaga acatggtggC atgcgc.ctgt agtCCCagct aCttgggagg CtgaggCagg agaat tactt gaacCCggga ggCagaggitt gCagtgagcC gagattgCaC aaCta CaCtC Cag CCtgCgC aacagagcaa gacticcatct Caaaaaaaaa aaaagaaaga aagaaagaaa gaala Ctggag ggaacaatgc cctaatgitat tala Caat Cat CaCatatgag gtdtcaaaat gtgagtggtt tttittctgat tttctg tatt ttatala Cttt tttttgtttg agatggagt C Ctgctctgct gCCC aggCtg gag CdCagtg ggacgatct C ggCtCaCtgC aaCct CtgCC tCCCaggttc aagtgattct cCtgccticag cct cotgagt agCtgggatt a CaggtgCCt gCCatatgcC cagotaattit tttttgtatt tt tagtagag a CagggitttC aCCatgttgg CC aggCtggit citcgaactico tgaccttgtg attctoccgc Ctcaggcticc CaaagtgCtg ggattacagg CatgagcCaC tg.cgCCtggC tataaCtCCt Ctgtag taala aaatatatt C Ctt Catalatt aatggCaCaa tatt taalact Ctgaattatt tittaagggat gg tag tdgCC tatgcaaaac tagCtgtgga ataatgaatt ttaaaataag cagoatttaa taaaaataga aCta tattitt ttittaaaata gaaaag CCaa Ctagaaggag atataa Caaa atgCtaataa tggtgaaaat actdgcatca ttcttctictog CCt Ct CataC titt t t CCtat galagtgttga CtaCCttt Ct aalaa CaCalad at Caaaaccd. 
aCtaaaaCtC CagaCtaa.ca gtttcaaatt atatt Cagga ggtttggctg aaagaaggag galaagg togg td togCCCtat ttggatt CaC acaaaagtag CtCCaCttitt CtCCttitt tt ttt tttgaga tdgag titt CG citcttgctgt CCaggCtgga gtgcaatggC acgatctogg Ct Caccgcaa CCtCCaCCCC tCagattcaa gcaattctoc tgtctoagtc tCCtgagtag Ct999agtaC agg CatgCaC caccatgccC agctaattitt gtatgtttag tggaga C999 gtttctocat gttggt Caag Ctggtotcta act CCC taCC tCagg tdatC cgc.ccacctic agcCtcc.caa agtgCtggga ttacaggcat gag CCaCagt gCtgggCCtC aCttitt Ct CC attitt taCat ttagg gtttg gcc caagatt g tatttgttc tttggittatc atttgttcaa Ctaataagta aCtgaaacat gacctgattic aatgaact tc agagcCtgCC cCaatcgttc tgggaaactt Calaat aggga aacticcittgt ccagactgac agattagcac CtgcCaaagg Cagaat CCtc. CaCCagCCaa tcCtgggcac actittccago cc caattgta tgg CatgggC Ctatgatt Ct atcCCagttc ttaagaattic toagittaaaa totgggaaca ataatt CCta Cactataagg CtcyttatcCa actaagaaaa aaaagta aga gCagttagca tatagoatat Ctact Ctt at gattattacC aatgaaaggc taaaactgtc a Caala CttaC ttacgttctt titt Caaa.ca.g CtCt CtaaCa cCaggcaaat cittittgctgc tcCaaagtaC Fig. 7f U.S. Patent Oct. 28, 2008 Sheet 14 of 19 US 7.442,782 B2

ttgtaacct C ggttt CCtgg gacttctttt cott catgat CCag catttt agggCCaact ttct tattgg gaagaaaaaa agagaaaatg gatctgttag ttagttagtt agittattatt tatt tattta tittgaggCGg agtictogctic tgttgcccat t tatt tattt 9aggtggagt ctcgctctgt tgCCCaggct ggagtgcagt agcacaatct cactgcaacc tCCaCCt CCt gggttcaagt gatt CtcCtg ccticagocto Ctgagtagct gggattacag gtgCgtgCCa CCaC gCCtgg ctaatttittg tatttittagt agaga C9999 tttcaccatg ttgg toagga tggtc.ttgaa citcct gacct catgatccac ccacctogac CtCCCaaagt gCtgggatta CaggCgtgag coaccg.cgcc Cggcattgttg ggttt tttitt ttaatgctgt tgtttitttgt ttgttgtgttt gtttittctitt tataataaCC CtcaggCCat totaticatag gtotgctgaa gtttgctggg gg to togg to C agat CCC agt tgccttgttt ttitt CCCata cctggacitta tCaCCagtoga agCCtttalaa acagcaaaga tag tag Caag ct cottcctg tggaagttcC at CCC.ggggc ggttctgacc tgttgccaac ccacatgcat Caggaggttg Ctggaga CCC Ccattgggag ggct taccCa gt Caggagga acagaat Cag tgact tacCC aaagaagcag totgactgct ttittgg taga cCagttgttgC tCCaCt Ctgg gaga.ccct tc CttgtcCaga cagoctg tat tct CCaCagt CtgCatgCtg gagcagotca at Caacagga CCaCagagat gg tdgCagCC Ctt CCCC cag gaact coat C CCagggagag at Cagag titt tat Ctcytaga aCCCtggCtg gag toggCtga agcCCCtgca aggagat CCt gCCC agtgag gaggaatgga togggat CCC acgaacgggC tgatgaacta CtgtcCatcg a taggg.cginn nnnatagatt tcatgatgaa gttgacgcta gtgg tala Cala gttatataga acatgatcqt Cct catatgg Cggagtttag tgagcattgt gttcCttttg tgagagtaaa gctttittatt aatgatagag tgttattttg gtgaggttitt ttagggtgtg gCgagtgttgc gtatagcCat gtCttgaaaa tgggggatgg gattag tatg at Cagaaggg agttggggag galat CaCtgg tttgtaalatt gagggggalag ggcctatota atgCaggaala CalaggtggCC atgCdggag C tgat Cagcag gCCaaaattg tggggtgaat ggagttaatt Catagcaggg tttaaagagc ttatgttgggg a Cagatgaag attitat catg gtotagaat C attt Cdgagc tttgtttgcg tgtcaggCCC cgtgatatgt gCaagagcgC catcagtacg CgtCatggga gCatactgtt tt C9ggatgg gtttcgagcg aagatgtgag CagatactgC tgtcaatggit gaagccttga ttagagg CaC Catatgcagt tott catgat gCtttacatc cataaaag.cc togg Cag CdC Ccagoaa.gag aatt Cagtgg tgctattoct tttgaggtgg ggag 
toggagt atctotcgat Cagcgc.gtgt ttaccatgCC ccagtcttag titat Ctt Cat gttcaaggtt CC99999 Cala agtgatt CtC Ctacct CatC Ctctagagta gttatgacta Cagag Catgt tat caccacg accggg taat gaaagtatta tag tagattg gggg. t t taCa CCatcgttgga Caggatggta tta at t t CCt gaCCtcatga to cqcctgcC tCCCaaagtg Ctgagattac aggCgtgagC Caccacgc.ct gcc ctaattit tgttgtttitta gtagagatgg agttt Cactg tgttgg toag gCtgatgtcC aactcCtgac ct Caggtgat ccticcitgcct tggCgt CCCa aagt gCtggg atta Cagg td tgagcCactd tgcccatcct tgttttgtat tttctaaaag agatgtat Ct. tgtttaaata ttaa attata agatatt Cag gCCttgcaaa ttgtctggat tacactgtaa aagtaatcat titatgtgcaa at alatt CCtt gagat Caata gttaaatgag Ctcaagctga tctgactaaa ttggaga aga taCaaaatga Fig. 7g U.S. Patent Oct. 28, 2008 Sheet 15 of 19 US 7.442,782 B2

agatgg99ag gaagtggtgC cataagcagc Ctttitt tott tgaccatttit atatgcctitt t t t t t t t titt ttittgagatg gagtttcact cttgtaacco acdttggagt gCaattgctt ggCttgcaac ala CCCCaCC tCCC gggitt C aagagattat cctgcct cog CCtcCtgagt agCtgggatt a tagg Catga gCCaCCaag C CtggCtaatt ttgcatttitt agtagagacg gggitt totCC ttcttggtga ggCtggit ct c gaactCcCaa CCtcagg toga accatcct cog toggcct CCC aaagtgctgg gattacaggt gtgagCCaCC gtgCCCtgCC CdC Catt Cot ttt t t t t titt tttitt t t titt ttt taatt Ct gact Cttctg tggtggalaa C CagCaaatac tt CaCataat ttaggatgct aatactag ta Cagittaaaag aatgattaca aag CagataC tatttcaaat totgtaaaaa totgtttitta at at CCtt Ca Ctggctgttt gttctgaCta gaaatgtt tt gtata totga aag Caccagt aact catago Catataattit ttittgg taat atgtt Catag gCaagtgg Ca agagttagta gaaagattitc tctaagaatt tat CCtalaat Cagattacac agagttgggg taagtgagta ttgtgttatt ttcttttgta tatttgaCaa tgggaactitt ttgaaactCa actitcagtgt aattittaagt CaCtalaattit gtCCaCaagt taatgattaa a Cagtt actic aaagtggaga accttgcCat ttt toggact gCgttittggg tctttggcac tdtggittagg ttagctaatt cgattatcca cticaagttitt actCagttgg aaatatgttt ttCtagatga tgg togCCtgt gct taggttt gagaggatat ttaaaatacg actttgtgtg CCattgtttg a Cagtggaat taaggg taaa aatatttaga tatggalagtg tgaaaatgta gttgcattgt titt Cattatg tt Ctatt CCa ttt Catt Cta ttittaagaat agCCtcaatt tatttittaga ttgtta Cata agtacaaaat ccatttgctt tag togggagt titt at titt ta. ttittaaaatg ataaCCalatt aaaggagttt attatgaaat totalagtagc attgtttaaa atgtaaaatt a Cattacaga aa Catttgga aaggggagaa taaaagaaaa Calaaa. CalCala atgttgC Cag tgCtd taggit gC tatt atta gcgctttggit gtaact catg gtogttt toc taCtatt titt attata Cagt Catct Cttgg tatctgttgaa gtggttcCaC aaactccCtc aalataCCaaa. 
at CCtcCtat gCt Caagttc CCaatataaa at agtgtag t act togcatta caacctttgc aCat Ctt CCC at at aCttta aaat Cat Ctt tagattactt ataataCCta acacaatgta aatgctdaat alagtag ttgt taacattgta ttgtttaggg aataatggCa agaaaagtCt gCatgttcaa taCagatgca aCttitt CCaC tgaatattitt tatt CCaagg ttggttgaag CCatggatgC agala CCC atg gatatagagg gcc tact gta Cttgtaccat Ctagagataa gatttgtatic ttgcatttgt tittaa Catat Ctgttctaag gaatat Ctca gtCaCCaggC aagtgCtgCa gtataaCtag gtact acgtC agg togctaag gttaagagag tattitt CCtt cactgact CC tCaCt CCGag aat CCattitt acagott cat tggtttgggt tatt CCalatt ttittgatgtg agtaaataaa tgaCttctat titcCC Caaaa. taaagctitat ataggcCtta taaCCatgca aatgtgtc.ca ttaagttgga Cttggaatga gtgaatgagt attact.gc.ca gtgttgttgttgt gtgttgttgttgt gtgttgttgttgt gtgttgttgttct gtctgtctgt ctgtctgtct gtctgtctgt ctgtctgcct gCCtgCCtcC CtgCC togCCt gCCtgcCtgg ctgcctgtct gCCtgttctg.c ctgcctgcct gCCtgcCtgc CtgCC to tct gtctoactitt gtCCCC tagg CtggagtgCa gtgg tatgat CtCggct CaC tgCaa CCtcC accCCCC9gg tt Caag.cgat tottctgcct cago CtcCtg agtag Ctggg atta Cagg Cg Catgcc.gc.ca tgCCCdgCtg Fig. 7h U.S. Patent Oct. 28, 2008 Sheet 16 of 19 US 7.442,782 B2

tttitttgtat titt tag taga gacggggttt cgc.catgttg gCCagaCtgg tCt Calala Ct. C Ctgacct Cag atggit CCaCC cgCtt CagcC toccaaagtg ctaggattac agg CatgagC CaccgtgCCC agcCactaCC aattattt Ct Cttaatggat titt Cattgac cctaaccotg taaatt CCat CaCttt tatt C alaggtgtata ttataataag tCtataataC Cca at Catgt agttgttgttga ttattittatt tttittgagac agagt citcaa tgttgCCCag gCtggagtaC agtggCaCCa tCt CagCt Ca CtgtaagctC cgcct Cotgg gttcacacca ttcticctgcc toagcctccC alagtag Ctgg gatta Cagg C gCCtgccact to accgggct aatttitttgt atttittcgta gaga C9999 t tt Caccatgt tagcCaagat gg tot C9 at C tcatcatCca cccacct cogg CCtcccaaag tgatgggatt actggCgtga gCCaCCatgC CCagctattt ttittaaCCaa tatattagct agctttitt to ccCagaataa ttitt CCalaala at a Catttala tagagaataa aagttaaaag aactitt Cagt ggtttaatgc tgttacttitt aa tattt Caa agat CtgaCt gCagCCatga gCag Caatga gtgcttcaag tgtggacoat Ctggccactg ggCCCgggaa tgtcc tactg gtggaggCCg tg9tcgtgga atgagalag CC gtggCagagg tggttt tacc tCggatagag g tattttgtC gaatagaaaa atttgaagta ctitcag tatt tgttagtatc aagactgg to tgaCtagCCg aattctttgt ttittgct Caa aacaggttt C cagtttgttt cct cotct ct to cagacatt tgttatcgct gtggtgagtC tggtoat Ctt gCCaaggatt cytgat Cttca ggaggatggt aag tatttaa CaCtt CCttt tCat a CCCCt CtagagCttg gag aggtgag cacatgcaac tgtcgtatago attt CCaCCt. ttgaggttitt g tattgtata atttaaaacg. 
taa Cactittg taalaggttitt a tagt cittgg cctgtttctt ttCct tattg ttgaagCCtg CtataactgC 99tagaggtg gcca cattgc caaggact gC aaggagCCCa agagaga gCd agag Caatgc tgCtaCaact gtgg Calaa CC aggccatctg gct Cotgact gccaccatgC agatgag cag aaatgctatt Cttgttggaga att CdgaCaC attcaaaaag actgcacCaa agtgaagtgc tatagg talag gtgtcagaat gttgttagaa gaaaact Cat tgcagagatt Ctt CCagaga tgaattagct ataaatggaa gggCCttagt alaatt Cagtg aaaCttagct gtgaccagat aagaccaatt ttcagoatat gtaaCtggCa gtctatotgt a tataatt Ct gtatt CtgCC Ctgatat CCt gtogct tatg gtacctgggC agttt toaca actggaCttt tittaatatat aaaagta aga gtgttataat ttgaaact tc cagagact tc atagaaagct Ctgtaatata Catalaat Ctt ttat Catgta accagaaatc tttgcctgtt tgttgacatgt aagtgtataa tittgataaat gttgttgttgt a catat Ctgt gaaac Cittag gggittaattg Catgaaaa.ca aag at CaggC gttttgttct gCatggtgaC tgttgCtttg gtaga cagtt tttittctgag gCCCattgtg aaaaCttitta attt Cttitt t taggtgttggit gaala CtggtC atgtagccat caactgcago aaga Caagtg aagttcaactg ttaccgctgt ggCgagt cag ggCaCCttgC acgggaatgC acaattgagg Ctacagocta atta t t t t c C tttgtcgCCC CtCC t t t t t C tgattgatgg ttg tattatt ttctictogaat CCt. Ctt CaCt ggCCalaaggt tggCagatag aggCaactCC CaggCCagtg agctt tactt gcc.gtgtaaa aggaggaaag gggtggaaaa aaa CCCaCtt totgcattta aCta Caaaaa aagtt tatgt ttagtttggit agaggtgtta tgtataatgc tttgttaaag ala CCCCC titt CCgtoccact ggtgaatagg gattgatgaa tgggalagagt tgagt CagaC cagtaag CCC gtCCtgggtt CCttgaaCat Fig. 7i U.S. Patent Oct. 
28, 2008 Sheet 17 of 19 US 7.442,782 B2 gttccCatgt aggagg taaa acCaattctg gaagtgtCta tgaact tcca taaataaCtt taattittagt ataatgatgg tottggattg totgacctica gtagctatta aataa Cat Ca agtaaCatct gtat CaggCC CtaCatagaa Catacagttg agtgggagta aacaaaaaga taalacatgcg tgttaatggC tgttcdagag aaatcggaat aaaagCCtaa acaggaacaa Ctt CatCaCa gtgttgatgt tggaCaCata gatggtgatg gCaaaggttt agaacacatt atttitcaaag aCtaaat Cta aaaCCC agag taala Cat Cala tgct Cagagt tagCataatt tggagctatt Cagga attgC agagaaatgC attitt cacag aaat Caagat gttatttittg tataCt at at CaCttaga.ca actgtc..tttC atttgCtgta atcagtttitt aaaagt Caga tggaaagagc aactgaagtC Ctagaaaata gaaatgtaat tittaala Citat tcCaataaag Ctggaggagg aaggggagtt tgaCtaaagt totttttgtt tgtttcaaat titt Cattaat gtatatagtg CaaaataCCa tattaaagag ggsgaatgttgg aggaCtgaaa gCtgaCagtt tggacttitt c tttttgtact taagtCatgt Ctt Caataat cgaaaattgct gttaaaagga tg tatgggat ttagatactt ttgcaaagct atagaaaatt cactttgtaa tctgttataa taatgccctt gagttctgtg tt Cagtctga acaggtttitt tggtggtggit ggttttgttt tgttittggag acggagtCt C actCttgtcg CCC aggCtgg agtgCaggCt tggct Cactg Cala CCt CCaC CtCCCdggitt Caag Caatt C toctogcctica gCCtcCtgag tagCtgggat tacaggCaCC cgccaccacc cccc.gctaat ttitttgtatt tt tatt titta ttt tattt tt tta t t t t t tit ttgaga Caga gtgtcgctict gttgCC Cagg Ctggagtgta gtggtgCgat Ctcggct cac tgCaagct CC gCCtCCtggg tt CoCgCCat totcctgcct cagoctoctd agtag Ctggg gctacaggta CCC.gccacco CCCCCagcta atttitt ttitt tttgtattitt tagtaaagaC ggggttt CaC ggtgttagCC aggatgg tot caatctoCtg acct Cdtgat CCg.cccdc.cit tggCCtcCCa alagtogCtggg at CaCaggCd tgagcCaccg CogCCCggCCt atttitttgta ttitt tag tag agaCtgggitt toat catgtt ggtCGggCtg gtct coaact cCtgacctica gg tdat CCaC Ctgcc.ccgcc CCCCaaagtg Ctagtgttac aggtCC9agC CaccgtgtcC ggCCG attct gaacagttitt aataccattg ctatttittgt gtttitt.cctg ggCCtttittt Ctt t t t t titt ttitt tittittg agaCagtCtC gCtctgttgc CC aggCtaga gtgcaatggit gCaat Ct Cag Ct Cactggaa CCtt CaCCCC CCaC CCCCaC accCtgttca agtaattctic CtgCCtcagc CtCCCaaata GCtgggatta 
CaggtgtcCg CCaCCaCaCC cagotaattit ttgttattitt tag tagagat gggg titt CaC tgtgttggtc aggCtgg tot Ccaactgttg CCCt Cagg to agCCaCtgtg CCCCaCC titt to Ctgggttt CataaggatC tgaagtggtg gatticcittgt ttittgctagt at Ct Cattta gagttgagat ggaCCttaaa acticatctgt tittaa. Ct. CaC tttittaatag atgagttaaa Ctta atttaC ttaaggatgt a Cagttagag Cctggaactt CaaCCattat toa Ct CCCCa tgcc.ctgttt CCCCCCaCtt Cgaaattaaa tgCdgttagC at Catatagt toattitt CCC cct coatgct gCtgttgttgat tottgactitt ggg tatgagt ttitt Cat CCt tCatgCaggg ttctgtcagt tcatgg tata Fig. 7 U.S. Patent Oct. 28, 2008 Sheet 18 of 19 US 7.442,782 B2
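SEQ ID NO:1, shown in FIG. 7, contains the TG/TCTG/CCTG repeat tract of intron 1 that the specification uses to distinguish "not at risk" from "at risk" alleles. The following is a minimal sketch, not the inventors' method, of how such a tract could be located in a sequence string and sized. It takes from the specification only the minimum unit counts (about 14 TG, 3 TCTG, 4 CCTG) and the two thresholds (no greater than 176 nucleotides; at least about 75 CCTG repeats). The function names, the regular expression, and the "indeterminate" label are assumptions made for illustration, and interrupted tracts (intervening GCTG or TCTG units) are deliberately not modelled.

```python
import re

# Thresholds quoted from the specification; all names below are hypothetical.
MAX_NORMAL_TRACT_NT = 176  # "no greater than 176 nucleotides"
MIN_AT_RISK_CCTG = 75      # "at least about 75 CCTG repeats"

# (TG)n, then (TCTG)n, then (CCTG)n, using the minimum counts from the text.
# Assumption: the tract is uninterrupted.
TRACT_RE = re.compile(r"((?:TG){14,})((?:TCTG){3,})((?:CCTG){4,})")


def find_repeat_tract(seq):
    """Return (start, end, n_tg, n_tctg, n_cctg) for the first tract, or None.

    start and end are 0-based, end-exclusive offsets into seq.
    """
    match = TRACT_RE.search(seq.upper())
    if match is None:
        return None
    return (
        match.start(),
        match.end(),
        len(match.group(1)) // 2,
        len(match.group(2)) // 4,
        len(match.group(3)) // 4,
    )


def classify_tract(n_tg, n_tctg, n_cctg):
    """Apply the two thresholds to unit counts of a repeat tract."""
    total_nt = 2 * n_tg + 4 * n_tctg + 4 * n_cctg
    if n_cctg >= MIN_AT_RISK_CCTG:
        return "at risk"
    if total_nt <= MAX_NORMAL_TRACT_NT:
        return "not at risk"
    # The specification does not name this case; the label is an assumption.
    return "indeterminate"
```

A tract built from 16 TG, 9 TCTG and 12 CCTG units spans 116 nucleotides and falls under the 176-nucleotide limit, while the same tract with 75 CCTG units meets the at-risk threshold. In practice the specification infers tract size from the molecular weight of amplified or hybridized fragments, so a string search like this would apply only to sequenced alleles.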

[FIG. 8. Four scatter plots, each with a fitted regression line and correlation value: Age of blood draw vs. repeat size; Age of onset vs. repeat size; Age of onset of weakness vs. repeat size; Age of cataract extraction vs. repeat size.]
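Each FIG. 8 panel reports a fitted line of the form y = ax + b together with a correlation value. As a hedged illustration of how such a fit is obtained, the sketch below computes an ordinary least-squares slope, intercept, and coefficient of determination from paired observations. The function name and the choice of which variable goes on which axis are assumptions for illustration, and no patient data from the patent are reproduced here.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared). Hypothetical helper, not taken
    from the patent.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared
```

A low R^2 from such a fit means that the x variable explains only a small share of the variance in the y variable.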

[FIG. 9, first panel (A). Intergenerational Differences in DM2 Repeat Size (kb): histogram of the number of transmissions per size-change bin, with bins running from -40 to 30.]

Microsatellite Disorders Relative Repeat Sizes Poly Q DM - - DM2 US 7,442,782 B2 1. 2 INTRONASSOCATED WITH MYOTONIC Defining a second human mutation that causes the multi DYSTROPHY TYPE 2 AND METHODS OF systemic effects of DM, and identifying what is common to USE these diseases at the molecular level, provides an independent means of determining the pathogenic pathway of DM and CONTINUING APPLICATION DATA 5 allow methods for diagnosing this disease to be developed. This application is a divisional application of application SUMMARY OF THE INVENTION Ser. No. 10/143,266, Confirmation No. 2285, filed on May 10, 2002, which claims the benefit of U.S. Provisional Appli The present invention represents an advance in the art of cation Ser. No. 60/290,365, filed May 11, 2001, U.S. Provi 10 detecting whether a human individual is at risk for myotonic sional Application Ser. No. 60/302,022, filed Jun. 29, 2001, dystrophy type 2 (DM2). The inventors have discovered that and U.S. Provisional Application Ser. No. 60/337,831, filed DM2 is caused by a CCTG expansion in intron 1 of the Nov. 13, 2001, all of which are incorporated by reference nucleotides encoding zinc finger protein 9 (ZNF9). This herein. expansion is located in a region of the genome for which the nucleotide sequence was not completely ordered prior to the GOVERNMENT FUNDING 15 present invention. The correct sequence of this region has been determined and is disclosed herein. Accordingly, the The present invention was made with government Support present invention provides isolated polynucleotides. The under Grant Number NS35870, awarded by the National polynucleotides include a nucleotide sequence of about Institutes of Health. The Government has certain rights in this 2O nucleotides 1-14468 of SEQ ID NO:1, about nucleotides 1nVent1On. 
14474-22400 of SEQ ID NO:1, about nucleotides 17501 BACKGROUND 17701 of SEQ ID NO:1, about nucleotides 17501-17701 of SEQ ID NO:1 and a repeat tract, about nucleotides 17858 DM is a dominantly-inherited, multisystemic disease with 18058 of SEQID NO:1, a repeat tract and about nucleotides a consistent constellation of seemingly unrelated and rare 25 17858-18058 of SEQID NO:1, or the complements thereof. clinical features including: myotonia, muscular dystrophy, The present invention also provides isolated polynucleotides cardiac conduction defects, posterior iridescent cataracts, and that include at least about 15 consecutive nucleotides from endocrine disorders (Harper, Myotonic Dystrophy, W. B. nucleotides 16701-17701 of SEQID NO:1, at least about 15 Saunders, London, ed. 2, 1989)). DM was first described consecutive nucleotides from nucleotides 17858-18862 of nearly 100 years ago, but the existence of more than one 30 SEQ ID NO:1, or the complements thereof. genetic cause was only recognized after genetic testing The present invention provides a method for detecting a became available for myotonic dystrophy type 1 (DM1) polynucleotide that includes a repeat tract within an intron 1 (Thornton et al., Ann. Neurology, 35,269 (1994), Rickeret al., of a zinc finger protein 9 (ZNF9) genomic sequence. The Neurology, 44, 1448 (1994)). method includes amplifying nucleotides of an intron 1 region DM1 is caused by an expanded CTG repeat on chromo 35 of a ZNF9 genomic sequence to form amplified polynucle some 19 that is both in the 3' untranslated region of the otides, wherein the amplified polynucleotides includes repeat dystrophia myolonica-protein kinase (DMPK) gene, and in tracts, and detecting the amplified polynucleotides. 
Alterna the promoter region of the immediately adjacent home tively, the method includes digesting genomic DNA with a odomain gene SIX5 (Groenen and Wieringa, Bioessays, 20, restriction endonuclease to obtain polynucleotides, probing 901 (1998), Tapscott, Science, 289, 1701 (2000)). How the 40 the polynucleotides under hybridizing conditions with a CTG expansion in a noncoding region of a gene causes the detectably labeled probe which hybridizes to a polynucle complex DM phenotype remains unclear. Suggested mecha otide containing a repeat tract within an intron 1 of a ZNF9 nisms include: (i) haploinsufficiency of the DMPK protein; genomic sequence, and detecting the probe which has hybrid (ii) altered expression of neighboring , including SIX5; ized to the polynucleotides. and (iii) pathogenic effects of the CUG expansion in RNA 45 The present invention further provides a method for iden which accumulates as nuclear foci and disrupts cellular func tifying an individual not at risk for developing myotonic tion. Several mouse models have developed different aspects dystrophy type 2 (DM2). The method includes analyzing of DM1: a model expressing mRNA with CUG repeats mani intron 1 regions of ZNF9 genomic sequences of an individual fests myotonia and the myopathic features of DM1; a DMPK for two not at risk alleles that include repeat tracts of no knockout has cardiac abnormalities; and SIX5 knockouts 50 greater than 176 nucleotides. For instance, the method may have cataracts. Taken together, these data have been inter include amplifying nucleotides of intron 1 regions of ZNF9 preted to suggest that each theory may contribute to DM1 genomic sequences of an individual to form amplified poly pathogenesis and that DM1 may be a regional gene disorder. 
nucleotides, wherein the amplified polynucleotides include To better define the pathophysiological cause of DM, we repeat tracts, comparing the size of the amplified polynucle have studied families with many of the clinical features of 55 otides, and analyzing the amplified polynucleotides for two DM but without the DM1 CTG expansion. After genetic not at riskalleles. The act of amplifying may include perform testing became available for DM1, families with DM2 and ing a polymerase chain reaction (PCR) with a primer pair that Proximal Myotonic Myopathy (PROMM) were identified includes a first primer and a second primer, wherein the first and linkage analysis excluded involvement of the DM1 locus, primer and the second primer flank the repeat tracts located as well as excluding the muscle chloride and sodium channel 60 within the intron 1 regions. The first primer includes at least genes. Proximal Myotonic Dystrophy (PDM) and Myotonic about 15 nucleotides selected from nucleotides 14469-17701 Dystrophy type 2 (DM2) were subsequently described, of SEQ ID NO:1, and the second primer includes at least broadening the recognized phenotype of non-DM1 forms of about 15 nucleotides selected from nucleotides 17858-18661 dominantly inherited multisystemic myotonic disorders. In of SEQ ID NO:1. Alternatively, the method may include 1998 the DM2 locus was mapped to 3q21, and it was dem 65 amplifying nucleotides of intron 1 regions within ZNF9 onstrated that the genetic cause of PROMM map to the same genomic sequences of an individual to form amplified poly locus in many families. nucleotides, wherein the amplified polynucleotides include US 7,442,782 B2 3 4 repeat tracts, and analyzing the repeat tracts of the amplified BRIEF DESCRIPTION OF THE FIGURES polynucleotides for two not at risk alleles including repeat tracts of no greater than 176 nucleotides. FIG. 1. Expanded CL3N58 allele found in DM2 patients. 
Also provided by the present invention is a method for (A) DM2 critical region. Black represents the minimal DM2 identifying an individual that has DM2 or is at risk for devel critical region, white represents DM2 excluded regions, and oping DM2. The method includes analyzing an intron 1 grey represents regions in which recombination has occurred. region of a ZNF9 genomic sequence of an individual for one Markers defining recombination events and establishing link at risk allele including a repeat tract including at least about age disequilibrium are shown, along with previously pub 75 CCTG repeats. In another aspect, the method includes lished markers. The relative significance of the p-values are digesting genomic DNA of an individual with a restriction 10 indicated by plusses above the marker names, with endonuclease to obtain polynucleotides, probing the poly ++’s0.01, +++’s 0.001 ++++’s0.0001, and ++++++ nucleotides under hybridizing conditions with a detectably s0.000001. Three BACs (orientation unknown) within the labeled probe that hybridizes to a polynucleotide containing a region of linkage disequilibrium are shown. Not drawn to repeat tract within an intron 1 of a ZNF9 genomic sequence, scale. (B) Pedigrees of three different DM2-linked families, detecting the probe that has hybridized to the polynucleotide, 15 each represented by a nuclear family. (C) PCR analysis of and analyzing the intron 1 region of the hybridized polynucle CL3N58 marker. The genotype of each individual is shown, otide for one at risk allele including a repeat tract including at with the size of eachallele given in basepairs below each lane. least about 75 CCTG repeats. In yet another aspect, the Unamplified alleles are represented by '-'. (D) Southern method includes amplifying nucleotides of an intron 1 region blot analysis of expansion mutations. 
Individuals with an ofa ZNF9 genomic sequence of an individual to form ampli expanded CCTG tract are represented by “EXP and indi fied polynucleotides, wherein the amplified polynucleotides viduals with 2 normal alleles are represented by 'N'. The blot include a repeat tract, and analyzing the repeat tracts of the was also hybridized with an SCA8 loading control, showing amplified polynucleotides for one at risk allele including a that all but the first lane was evenly loaded. (E) High resolu repeat tract including at least about 75 CCTG repeats. tion sizing of expansions. Lane 3 contains DNA from a con The present invention also provides kits. In one aspect of 25 trol sample. The number of CCTG’s of each individuals the invention, the kit is for identifying whether an individual expanded allele is shown, with "N' representing a normal is not at risk for developing DM2. The kit includes a first length CCTG tract. primer having at least about 15 consecutive nucleotides FIG. 2. Analysis of DM2 affected and normal alleles. (A) selected from nucleotides 14469-17701 of SEQID NO:1, and Distribution of CL3N58 alleles among controls (n=1360). the second primer having at least about 15 consecutive nucle 30 Alleles represent the total basepair size of the combined TG, otides selected from nucleotides 17858-18661 of SEQ ID TCTG, and CCTG repeat tracts. (B) Schematic diagram of NO:1. An individual who is not at risk has two not at risk DM2 expansion region, showing sequence configurations of alleles of ZNF9 genomic sequences including repeat tracts of normal and expanded repeat tracts. (C) Distribution of no greater than 176 nucleotides. expanded alleles among 51 affected members of six DM2 In another aspect, the kit is for identifying whether an 35 families. All expanded allele sizes were included for individu individual is at risk for developing DM2. 
The kit includes a als with multiple bands and, in contrast to (B), are given in probe having at least about 200 nucleotides, wherein the CCTG repeat units. probe hybridizes to SEQID NO:1 or the complement thereof. FIG. 3. Instability of the DM2 expansion. (A) Somatic An individual who is at risk has one at risk allele of a ZNF9 heterogeneity in blood. Southern blots of BsoBI-digested genomic sequence including a repeat tract including at least 40 genomic DNA from blood revealed multiple expanded alleles about 75 CCTG repeats. Alternatively, the kit includes a first in Some affected individuals, some discrete in size (lanes 1 & primer having at least about 15 nucleotides selected from 2), others broad (lane 3). (B) Southern blots of EcoRI-di nucleotides 14469-17701 of SEQ ID NO:1 or nucleotides gested genomic DNA from blood of monozygotic twins 17858-18661 of SEQID NO:1, and a second primer having a (lanes 4 and 5). (C) Expanded alleles increase in length over nucleotide sequence selected from the group consisting of 45 time. Southern blot of EcoRI-digested genomic DNA (CCTG) and (CAGG), where n is at least 4. An individual samples from blood taken from a single patient at 28 (lane 6) who is at risk has one at risk allele of a ZNF9 genomic and 31 (lane 7) yrs of age, respectively. (D) Correlation sequence including a repeat tract including at least about 75 between the size of the expanded allele in individuals with a CCTG repeats. single allele and age at the time blood sample was taken. In yet another aspect, the kit is for identifying whether an 50 FIG. 4. Genomic organization of the ZNF9 gene. The posi individual has DM2. The kit includes a probe having at least tion of the DM2 expansion in intron 1 is shown. The gene about 200 nucleotides, wherein the probe hybridizes to SEQ spans 16.5 kb of genomic sequence with an mRNA of 1.5 kb. ID NO:1 or the complement thereof. An individual who is at FIG.5. Northern analysis of ZNF9RNA expression. 
Upper risk has one at riskallele of a ZNF9 genomic sequence includ panel, human multiple-tissue northern blot hybridized a ribo ing a repeat tract including at least about 75 CCTG repeats, 55 probe that included exon 5 of ZNF9; lower panel, actin used and displays a symptom of DM2. Alternatively, the kit as a loading control; 1.5, 2.0, 1.8, size in kilobases. includes a first primer including at least about 15 nucleotides FIG. 6. (A) Schematic diagram of repeat assay PCR reac selected from nucleotides 14469-17701 of SEQID NO:1 or tion products. (B) PCR analysis of CL3N58 marker. Lane 1, nucleotides 17858-18661 of SEQ ID NO:1, and a second from the unaffected mother, shows two alleles. Lanes 2 and 3, primer including a nucleotide sequence selected from the 60 from the affected father and affected son, respectively, show group consisting of (CCTG), and (CAGG), where n is at only one allele. There is no shared allele in lanes 2 and 3, as least 4. An individual who is at risk has one at risk allele of a would be expected in normal Mendelian inheritance of PCR ZNF9 genomic sequence including a repeat tract including at alleles. (C) Southern-blot analysis of expansion mutations. least about 75 CCTG repeats. Lanes 1-4 show affected individuals with detectable Unless otherwise specified, “a,” “an.” “the and “at least 65 expanded bands. Lane 5 shows an unaffected individual with one' are used interchangeably and mean one or more than only the normal-sized band. Lanes 6 and 7 show affected OC. individuals with no detec' expansion. Lane 8 shows an US 7,442,782 B2 5 6 affected SCA8 individual with an expanded band. (D) Repeat is joined in Such a way that expression of the genomic assay of DM2 mutations. Lanes 1-5 and 8 show affected sequence is achieved under conditions compatible with the individuals who are expansion-positive, indicated by Smears regulatory sequence. above the normal allele, by the Repeat assay. 
Lanes 1, 2, 4, 5, The ZNF9 genomic sequence maps to chromosome 3, and 8 show affected individuals who had expansions by 5 position 3q21, in the human genome. The sequence tagged Southern-blot analysis, while lane 3 shows an affected indi sites (STS) associated with the ZNF9 genomic sequence vidual who had no detectable expansion by Southern-blot include N22238 and stG51 107. The polypeptide encoded by analysis. Lanes 6, 7, 9, and 10 show unaffected individuals the ZNF9 genomic sequence contains 7 zinc finger domains who are expansion-negative, indicated by the lack of Smears and functions as an RNA-binding polypeptide by binding the above the normal allele, by the Repeat assay. (E) Abbreviated 10 sterol regulatory element (see Rajavashisth et al. Science, pedigree of a DM2 family. Filled-in symbols represent 245, 640-643). As used herein, a “polypeptide' refers to a affected individuals. Below each symbol: age of blood draw, polymer of amino acids linked by peptide bonds and does not CL3N58 PCR allele sizes (where “B” signifies evident exist refer to a specific length of a polymer of amino acids. The ence of a non-amplifying blank allele), and either the size of ZNF9 genomic sequence contains 5 exons and 4 introns (see the expansion detected by Southern (“Nkb’) or the result of 15 the Repeat assay, given as Exp(+) or Exp(-), where Exprefers FIG. 4). to expansion, for those with no expansion on Southern analy The sequence of a ZNF9 genomic sequence obtained from one individual is disclosed in FIG. 7. In this sequence, exon 1 S1S. corresponds to nucleotides 4337-4415, exon 2 corresponds to FIG. 7. Nucleotide sequence of a human Zinc fingerprotein nucleotides 18662-18799, exon 3 corresponds to nucleotides 9 (ZNF9) genomic sequence (SEQID NO:1). N, nucleotide 78896-18987, exon 4 corresponds to nucleotides 19156 A, C, T, or G. 19356, and exon 5 corresponds to nucleotides 19865-20845. FIG.8. 
Correlation of Repeat Length with Clinical Sever Intron 1, which corresponds to nucleotides 4416-18661, ity. includes a gap of unknown size. This gap is depicted in SEQ FIG. 9. Intergenerational changes in repeat length. ID NO:1 between nucleotides 14469-14473. An intron 1 of a 25 ZNF9 genomic sequence includes a TG/TCTG/CCTG repeat DETAILED DESCRIPTION OF PREFERRED tract, which is also referred to herein as a “repeat tract.” The EMBODIMENTS OF THE INVENTION characteristics of repeat tracts are described in greater detail below. In SEQID NO:1, the repeat tract corresponds to nucle Compositions 30 otides 17702-17858. In the ZNF9 genomic sequence, the The present invention provides isolated polynucleotides transcription initiation site is nucleotide 4337, the first nucle that include a portion of an intron 1 region of a Zinc finger otide of exon 1, and the transcription termination site is nucle protein 9 (ZNF9) genomic sequence. As used herein, the term otide 20845. "polynucleotide refers to a polymeric form of nucleotides of An intron 1 of a ZNF9 genomic sequence typically any length, either ribonucleotides or deoxynucleotides, and 35 includes at least about 14247 nucleotides. The sequences of includes both double- and single-stranded DNA and RNA. A an intron 1 immediately adjacent to exon 1 (i.e., the 5' end of polynucleotide may include nucleotide sequences having dif intron 1) are preferably nucleotides 4416-4426 of SEQ ID ferent functions, including, for instance, genomic sequences, NO:1, more preferably nucleotides 4416-4466 of SEQ ID and other sequences such as regulatory sequences and/or NO:1, most preferably nucleotides 4416-4516 of SEQ ID introns. A polynucleotide can be obtained directly from a 40 NO:1. The sequences of an intron 1 immediately adjacent to natural source, or can be prepared with the aid of recombi exon 2 (i.e., the 3' end of intron 1) are preferably nucleotides nant, enzymatic, or chemical techniques. 
A polynucleotide 18641-18661 of SEQID NO:1, more preferably nucleotides can be linear or circular intopology. A polynucleotide can be, 18611-18661 of SEQID NO:1, most preferably nucleotides for example, a portion of a vector, Such as an expression or 18561-18661 of SEQID NO:1. Intron I of a ZNF9 genomic cloning vector, or a fragment. An "isolated polypeptide or 45 sequence also includes several nucleotide sequences that are polynucleotide means a polypeptide or polynucleotide that highly conserved by intron 1 regions present in different has been either removed from its natural environment, pro alleles of ZNF9, and preferably are not present elsewhere in duced using recombinant techniques, or chemically or enzy the human genome. For instance, an intron 1 of a ZNF9 matically synthesized. Preferably, a polypeptide or poly genomic sequence contains one, preferably two, more pref nucleotide of this invention is purified, i.e., essentially free 50 erably 3, most preferably, 4 of the following: GCCGCAGT from any other polypeptide or polynucleotide and associated GCGGGTCGGGTCTGTGGCGGAC (SEQ ID NO:39), the cellular products or other impurities. As used herein, a nucleotide sequence generated by using the primers 'genomic sequence' includes a polynucleotide that encodes GAGAACCTTGCCATTTTCG (SEQID NO:22) and CAC an unprocessed preRNA (i.e., an RNA molecule that includes CTACAGCACTGGCAACA (SEQID NO:23) to amplify an both exons and introns), and the preRNA. When placed under 55 intron 1 of ZNF9, preferably SEQID NO:1, GCCTAGGG the control of appropriate regulatory sequences, a genomic GACAAAGTGAGA (SEQ ID NO:10), GGCCTTATAAC sequence produces an mRNA. The boundaries of a genomic CATGCAAATG (SEQ ID NO:1), or the complements sequence are generally determined by a transcription initia thereof. tion site at its 5' end and a transcription terminator at its 3' end. Examples of the polynucleotides of the present invention A genomic sequence typically includes introns and exons. 
A 60 include polynucleotides located upstream (i.e., 5') or down regulatory sequence is a polynucleotide that regulates expres stream (i.e., 3') of the repeat tract. Polynucleotides of the sion of a genomic sequence to which it is operably linked. A present invention located upstream of the repeat tract prefer non-limiting example of a regulatory sequence includes pro ably include, in increasing order of preference, about nucle moters. “Operably linked’ refers to a juxtaposition wherein otides 17501-17701 of SEQ ID NO:1, about nucleotides the components so described are in a relationship permitting 65 17101-17701 of SEQ ID NO:1, about nucleotides 16701 them to function in their intended manner. A regulatory 17701 of SEQ ID NO:1, most preferably, about nucleotides sequence is "operably linked to a genomic sequence when it 15701-17701 of SEQID NO:1, or the complements thereof. US 7,442,782 B2 7 8 Polynucleotides of the present invention located downstream otides may also be present. Examples of normal repeat tracts of the repeat tract preferably include, in increasing order of are depicted at SEQ ID NO:2, SEQ ID NO:3, and SEQ ID preference, about nucleotides 17858-18058 of SEQID NO:1, NO:4 (see FIG.2B) and at nucleotides 17702-17857 of SEQ about nucleotides 17858-18458 of SEQ ID NO:1, about ID NO:1. A ZNF9 genomic sequence containing a normal nucleotides 17858-18858 of SEQID NO:1, most preferably, repeat tract is referred to herein as a “normal allele.” As used about nucleotides 17858-19858 of SEQ ID NO:1, or the herein, an “allele' of ZNF9 refers to one of several alternative complements thereof. forms of the nucleotide sequence that occupies the location of Optionally and preferably, the polynucleotides of the the ZNF9 genomic sequence on chromosome 3, position invention that include a portion of SEQ ID NO:1 further 3q21. An individual with two “normal” or “not at risk” alleles include a repeat tract, or the complements thereof. 
More 10 of ZNF9 will not display symptoms of DM2 during his or her preferably, the polynucleotides of the invention include the lifetime, and is considered to be “not at risk.” repeat tract and polynucleotides located upstream and down An “at risk” repeat tract of a ZNF9 genomic sequence also stream of the repeat tract. The upstream nucleotide of such includes consecutive TG nucleotides, preferably about 16, polynucleotides can begin at, in increasing order of prefer followed by consecutive TCTG nucleotides, preferably about ence, about nucleotide 17501, about nucleotide 17101, about 15 9, followed by consecutive CCTG nucleotides. The number nucleotide 16701, most preferably, about nucleotide 15701 of of consecutive CCTG nucleotides, also referred to herein as SEQ ID NO:1. The downstream nucleotide of such poly “a CCTG repeat,” is at least about 75 (i.e., the four nucleotides nucleotides can end at, in increasing order of preference, CCTG repeated at least about 75 times), more preferably at about nucleotide 18058, about nucleotide 18458, about least about 100, most preferably, at least about 500. Typically, nucleotide 18858, most preferably, about nucleotide 19858 of a CCTG repeat of an at riskallele is uninterrupted in that there SEQID NO:1. are no other nucleotides present in the CCTG repeat. An The present invention also includes shorter polynucle example of an at risk repeat tract is depicted at SEQID NO:5 otides, also referred to herein as primers and probes. A poly (see FIG. 2B). As used herein, “at risk” describes an indi nucleotide of this aspect of the invention has a nucleotide vidual having an allele of the ZNF9 genomic sequence that is sequence that is complementary to a nucleotide sequence of a 25 associated with DM2. Herein, this includes an individual who ZNF9 genomic sequence, or the complement thereof. 
Prefer may be manifesting at least one symptom of DM2, as well as ably, such a polynucleotide includes a nucleotide sequence of an individual who may develop at least one symptom of DM2 the intron 1 that flanks the repeat tract, exon 2, or the comple in the future. An allele of the ZNF9 genomic sequence that is ments thereof, and optionally, further includes nucleotides of associated with DM2 is referred to hereinas an "at riskallele.” the repeat tract and the complements thereof. In some 30 This mutation is dominant, thus an individual with an at risk embodiments, a polynucleotide of this aspect of the invention allele of ZNF9 may display at least one symptom of DM2 includes consecutive nucleotides selected from about nucle during his or her lifetime. Typically, individuals have either otides 15701-16700 of SEQ ID NO:1, about nucleotides two normal alleles or one normal allele and one at risk allele. 16701-17100 of SEQ ID NO:1, about nucleotides 17101 The present invention includes methods for detecting a 17500 of SEQ ID NO:1, about nucleotides 17501-17701 of 35 polynucleotide including a repeat tract within an intron 1 of a SEQ ID NO:1, about nucleotides 17858-18058 of SEQ ID ZNF9 genomic sequence, methods for identifying an indi NO:1, about nucleotides 18059-18458 of SEQ ID NO:1, vidual not at risk for developing DM2, and methods for iden about nucleotides 18459-18858 of SEQ ID NO:1, about tifying an individual that has or is at risk for developing DM2. nucleotides 18859-19858 of SEQ ID NO:1, SEQID NO:2, The methods of the present invention can involve known SEQID NO:3, SEQID NO:4, SEQID NO:5, or the comple 40 methods for detecting a specific polynucleotide, including ments thereof. A polynucleotide of this aspect of the invention detection of DNA or RNA, preferably, DNA. 
For instance, includes, in increasing order of preference, at least about 15 polymerase chain reaction (PCR) techniques can be used with consecutive nucleotides, at least about 20 consecutive nucle primers that amplify all or a portion of a repeat tract. Alter otides, at least about 25 consecutive nucleotides, at least natively, Southern blotting hybridization techniques using about 200 nucleotides, at least about 350 nucleotides, most 45 labeled probes can be used. The source of polynucleotides is preferably, at least about 500 nucleotides. a biological sample that includes genomic DNA and/or unprocessed RNA, preferably genomic DNA. As used herein, Methods a “biological sample” refers to a sample of material (solid or The identification of a genomic sequence that is associated fluid) obtained from an individual, including but not limited with a disease allows for unproved diagnosis of the disease. 50 to, for example, blood, plasma, serum, or tissue. An indi The present invention discloses that an expansion in the intron vidual can be a rat, mouse, human, chimpanzee, or gorilla, 1 of a ZNF9 genomic sequence is associated with the disease preferably human. Typically, the number of nucleotides in a myotonic dystrophy type 2 (DM2). The expansion occurs in a repeat tract, including the number of CCTG repeats in a TG/TCTG/CCTG (SEQID NO:40) repeat tract, also referred repeat tract, can be inferred by the approximate molecular to herein as a “repeat tract.” A repeat tract begins with at least 55 weight of the detected polynucleotide containing the repeat about 14 consecutive TG nucleotides (i.e., the TG dinucle tract. Other techniques, including nucleic acid sequencing, otide repeated 14 times), followed by at least about 3 con can also be used for determining the number of nucleotides in secutive TCTG nucleotides, followed by at least about 4 a repeat tract. consecutive CCTG nucleotides. 
A "normal" repeat tract, also referred to herein as a "not at risk" repeat tract, includes no greater than about 176 nucleotides, more preferably no greater than 164, most preferably, no greater than 154 nucleotides, where the total number of nucleotides is determined by counting from the first nucleotide of the first TG to the last nucleotide of the last CCTG. When greater than 4 consecutive CCTG nucleotides are present in a repeat tract, preferably a normal repeat tract, intervening GCTG and/or TCTG nucle-

The present invention provides methods for detecting a polynucleotide including at least a portion of a repeat tract within an intron 1 of a ZNF9 genomic sequence. Preferably, the polynucleotide includes an entire repeat tract within an intron 1 of a ZNF9 genomic sequence. In one aspect, the method includes amplifying nucleotides within an intron 1 region of a ZNF9 genomic sequence of an individual to form amplified polynucleotides that include a repeat tract, and detecting the amplified polynucleotides. Preferably, nucleotides are amplified by PCR. In PCR, a molar excess of a primer pair is added to a biological sample that includes polynucleotides, preferably genomic DNA. The primers are extended to form complementary primer extension products which act as template for synthesizing the desired amplified polynucleotides. As used herein, the term "primer pair" means two oligonucleotides designed to flank a region of a polynucleotide to be amplified. One primer is complementary to nucleotides present on the sense strand at one end of a polynucleotide to be amplified and another primer is complementary to nucleotides present on the antisense strand at the other end of the polynucleotide to be amplified. The polynucleotide to be amplified can be referred to as the template polynucleotide. The nucleotides of a polynucleotide to which a primer is complementary is referred to as a target sequence. A primer can have at least about 15 nucleotides, preferably, at least about 20 nucleotides, most preferably, at least about 25 nucleotides. Typically, a primer has at least about 95% sequence identity, preferably at least about 97% sequence identity, most preferably, about 100% sequence identity with the target sequence to which the primer hybridizes. The conditions for amplifying a polynucleotide by PCR vary depending on the nucleotide sequence of primers used, and methods for determining such conditions are routine in the art.

Instead of comparing the sizes of the amplified polynucleotides after amplification, the size of the repeat tracts of the amplified polynucleotides may be determined by, for instance, inferring the size of the repeat tract based on the observed molecular weight of the amplified polynucleotides, or by determining the nucleotide sequence of the repeat tract. The presence of repeat tracts having no greater than 176 nucleotides, and no repeat tract having at least about 75 CCTG repeats, indicates the individual is not at risk. The presence of a repeat tract having at least about 75 CCTG repeats indicates the individual is at risk.

Alternatively, the methods that include amplifying nucleotides within an intron 1 region of a ZNF9 genomic sequence may be used to identify an individual that has or is at risk for developing DM2. In this aspect, the primer pair includes a first primer having a target sequence that does not include the repeat tract. The first primer includes at least about 15 consecutive nucleotides located either upstream or downstream of a repeat tract. When selected from nucleotides upstream of a repeat tract, the nucleotides are, in increasing order of preference, about nucleotides 17501-17701 of SEQ ID NO:1, about nucleotides 17101-17701 of SEQ ID NO:1, about
nucleotides 16701-17701 of SEQ ID NO:1, most preferably, about nucleotides 15701-17701 of SEQ ID NO:1. When selected from nucleotides downstream of a repeat tract, the nucleotides are, in increasing order of preference, the complement of about nucleotides 17858-18058 of SEQ ID NO:1, about nucleotides 17858-18458 of SEQ ID NO:1, about nucleotides 17858-18858 of SEQ ID NO:1, most preferably, about nucleotides 17858-19858 of SEQ ID NO:1. The second primer of the primer pair includes either (CCTG)n or (CAGG)n, where n is at least 4, preferably, at least 5. The second primer binds randomly at multiple sites within a repeat tract, which results in amplified polynucleotides that vary in size but are larger than the amplified polynucleotides that contain a normal allele. Thus, after determining the sizes of the amplified polynucleotides, the presence of one amplified polynucleotide and a population of amplified polynucleotides having a range of sizes that are greater than the one amplified polynucleotide indicates the individual has an at risk allele, and is considered to be at risk (see FIG. 6D and Example 2).

Optionally and preferably, the second primer of this aspect of the invention is modified to increase the efficiency of the amplification. The modification includes adding an additional nucleotide sequence present at the 5' end of the second primer. Such a nucleotide sequence is referred to herein as a "hanging tail" sequence. A hanging tail sequence includes at least about 20 nucleotides, more preferably at least about 22 nucleotides, and negligible complementarity to any nucleotide sequences in the human genome. Whether a hanging tail has negligible complementarity to any nucleotide sequences in the human genome can be determined by hybridizing the hanging tail sequence with the human genome under the hybridization conditions described herein.

The methods that include amplifying nucleotides within an intron 1 region of a ZNF9 genomic sequence may be used to identify an individual not at risk for developing DM2. In this aspect, the primer pair includes primers that flank a repeat tract. The first primer includes at least about 15 consecutive nucleotides selected from about nucleotides 17501-17701 of SEQ ID NO:1, about nucleotides 17101-17701 of SEQ ID NO:1, about nucleotides 16701-17701 of SEQ ID NO:1, most preferably, about nucleotides 15701-17701 of SEQ ID NO:1. The second primer includes at least about 15 consecutive nucleotides selected from the complement of about nucleotides 17858-18058 of SEQ ID NO:1, about nucleotides 17858-18458 of SEQ ID NO:1, about nucleotides 17858-18858 of SEQ ID NO:1, most preferably, about nucleotides 17858-19858 of SEQ ID NO:1. In a preferred embodiment of this aspect of the invention, one primer includes the nucleotide sequence GGCCTTATAACCATGCAAATG (SEQ ID NO:11) and the second primer includes the nucleotide sequence GCCTAGGGGACAAAGTGAGA (SEQ ID NO:10).

After amplification, the sizes of the amplified polynucleotides may be determined, for instance by gel electrophoresis, and compared. The amplified polynucleotides can be visualized by staining (e.g., with ethidium bromide) or labeling with a suitable label known to those skilled in the art, including radioactive and nonradioactive labels. Typical radioactive labels include 32P. Nonradioactive labels include, for example, ligands such as biotin or digoxigenin as well as enzymes such as phosphatase or peroxidases, or the various chemiluminescers such as luciferin, or fluorescent compounds like fluorescein and its derivatives.

Due to the size of the expansion of CCTG repeats in an at
risk allele, this method of amplifying nucleotides within an intron 1 region of a ZNF9 genomic sequence typically does not result in detectable amplified polynucleotides from an at risk allele. Thus, when the comparison of the sizes of the amplified polynucleotides indicates the presence of two polynucleotides, both copies of the individual's repeat tracts were amplified and the individual is considered to be not at risk (see, for instance, FIG. 6B, lane 1). When only one amplified polynucleotide is present after amplification as described above, it is not possible to conclude that the individual is not at risk (see, for instance, FIG. 6B, lanes 2 and 3).

A hanging tail has negligible complementarity to any nucleotide sequences in the human genome when it does not hybridize to the human genome. When the second primer of this aspect of the invention is modified in this way, the amplification also includes a third primer having a nucleotide sequence such that it is complementary to the hanging tail nucleotide sequence when incorporated into an amplified polynucleotide. In a preferred embodiment of this aspect of the invention, the first primer is CL3N58-D R (5'-GGCCTTATAACCATGCAAATG (SEQ ID NO:11)), the second primer is JJP4CAGG (5'-TACGCATCCGAGTTTGAGACGCAGGCAGGCAGGCAGGCAGG (SEQ ID NO:36)), and the third primer is JJP3 (5'-TACGCATCCGAGTTTGAGACG (SEQ ID NO:37)).

In another aspect of the methods for detecting a polynucleotide including a repeat tract within an intron 1 of a ZNF9 genomic sequence, polynucleotide probes are used that hybridize to a polynucleotide.

The digestion of genomic DNA with endonucleases is routine in the art, and numerous endonucleases are known. Preferred restriction endonuclease enzymes include EcoRI and BsoBI. Typically, the polynucleotides resulting from digestion are fractionated, for instance by gel electrophoresis, denatured to yield single stranded polynucleotides, and then exposed to the
probe under hybridizing conditions. The probe that has hybridized to the polynucleotide is then detected, and the size of the hybridized polynucleotide may then be determined. The repeat tract may then be characterized, preferably by determining the number of CCTG repeats in the repeat tract. Typically, the number of nucleotides in a repeat tract, including the number of CCTG repeats in a repeat tract, can be inferred by the approximate molecular weight of the detected polynucleotide containing the repeat tract. The presence of one repeat tract having at least about 75 CCTG repeats indicates the individual is at risk.

In another embodiment of this aspect of the invention, polynucleotides may be used for in situ hybridization of tissue samples, preferably muscle tissue or fibroblasts, more preferably muscle tissue. Preferably, the muscle tissue is skeletal muscle. This method is routine and known in the art (see, for instance, Taneja et al., J. Cell. Biol., 128, 995-1002 (2002)). Preferred polynucleotides useful in this aspect of the invention include (CAGG)n, where n is at least 4, preferably, at least 5. Preferably, such a polynucleotide includes a fluorescent label. The cells of an individual having an at risk allele will include numerous nuclei containing the fluorescent labeled polynucleotide, while the cells of an individual not having an at risk allele will not include nuclei containing the fluorescent labeled polynucleotide.

The present invention also provides a kit for identifying whether an individual is at risk or not at risk for developing DM2. The kit includes the primers and/or probes discussed above in a suitable packaging material in an amount sufficient for at least one assay. Optionally, other reagents such as buffers and solutions needed to practice the invention are also included. Instructions for use of the packaged polypeptide or primer pair are also typically included.

As used herein, "hybridizes," "hybridizing," and "hybridization" means that a probe forms a noncovalent interaction with a target polynucleotide under standard conditions. Standard hybridizing conditions are those conditions that allow a probe to hybridize to a target polynucleotide. Such conditions are readily determined for a probe and the target polynucleotide using techniques well known to the art, for example see Sambrook et al. Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory: New York (1989). Preferred probes useful in the present invention hybridize to a target polynucleotide by using prehybridization in a hybridization buffer, preferably RAPID-HYB buffer (Amersham, Piscataway, N.J.), at 60° C. for 1 hour, and hybridization overnight at 60° C. Preferably, at least 4×10^7 counts per minute (cpm) total of the labeled probe is used in the hybridization. When the probe used is at least about 200 nucleotides, the wash conditions used are: 2 washes for 5 minutes each at room temperature in a solution containing 2×SSC (one liter of 20×SSC contains 175.3 grams NaCl and 88.2 grams sodium citrate, pH 7.0) and 0.05% sodium dodecyl sulfate (SDS), followed by 2 to 3 washes for 30 minutes each at 52° C. in a solution containing 0.15×SSC and 0.1% SDS. Other hybridization conditions for use when the probe is at least about 200 nucleotides use the same prehybridization and hybridization conditions as described above, but the wash conditions used are: 2 washes for 5 minutes each at room temperature in a solution containing 2×SSC and 0.05% SDS, followed by 1 wash for 15 minutes at 50° C. in a solution containing 0.15×SSC and 0.1% SDS, followed by 1 wash for 10 minutes at 50° C. in a solution containing 0.15×SSC and 0.1% SDS. When the probe used is about 20 to about 22 nucleotides, the same prehybridization and hybridization conditions described above are used, but the wash conditions used are: two 15 minute washes at 45° C.
in 2×SSC and 0.1% SDS. The nucleotide sequence of a target DNA molecule is generally a sequence complementary to the probe. The hybridizing probe may contain 1 to 10 nonhybridizing nucleotides, preferably no greater than 5, more preferably no greater than 2 nonhybridizing nucleotides, that do not interfere with forming the noncovalent interaction. The nonhybridizing nucleotides of a probe may be located at an end or within the hybridizing probe. Thus, a probe does not have to be complementary to all the nucleotides of the target DNA sequence as long as there is hybridization under standard hybridization conditions. In increasing order of preference, a probe has at least about 20 nucleotides, at least about 200 nucleotides, at least about 350 nucleotides, most preferably at least about 500 nucleotides. Preferred polynucleotides useful in this aspect of the invention include TTGGACTTGGAATGAGTGAATG (SEQ ID NO:38), and nucleotides 16507-16992 of SEQ ID NO:1.

In one embodiment of this aspect of the invention, the methods include identifying an individual that has or is at risk for developing DM2. The method includes digesting genomic DNA of an individual with a restriction endonuclease to obtain polynucleotides, and probing the polynucleotides under hybridizing conditions with a detectably labeled probe.

As used herein, the phrase "packaging material" refers to one or more physical structures used to house the contents of the kit. The packaging material is constructed by well known methods, preferably to provide a sterile, contaminant-free environment. The packaging material has a label which indicates that the polynucleotides can be used for identifying whether an individual is at risk or not at risk for developing DM2. In addition, the packaging material contains instructions indicating how the materials within the kit are employed. As used herein, the term "package" refers to a solid matrix or material such as glass, plastic, paper, foil, and the like, capable of holding within fixed limits the primers and/or probes. Thus, for example, a package can be a glass vial used to contain milligram quantities of a primer pair. "Instructions for use" typically include a tangible expression describing the reagent concentration or at least one assay method parameter, such as the relative amounts of reagent and sample to be admixed, maintenance time periods for reagent/sample admixtures, temperature, buffer conditions, and the like.

The present invention is illustrated by the following examples. It is to be understood that the particular examples, materials, amounts, and procedures are to be interpreted broadly in accordance with the scope and spirit of the invention as set forth herein.

EXAMPLES

Example 1

Identification of the Molecular Basis for Myotonic Dystrophy Type 2

The myotonic dystrophy type 2 locus (also referred to as proximal myotonic myopathy (PROMM) locus) was previously mapped to chromosome 3q21 (Ranum et al., Nature Genet., 19, 196 (1998), Day et al., Neuromuscul. Disord., 9, 19 (1999)).

linkage disequilibrium were as follows: RP11-814L21 (AC022944); RP11-723o4 (AC022993); and RP11-221e20 (AC023598).

Expanded CL3N58 Allele Found in DM2 Patients

One of the markers in linkage disequilibrium with DM2, CL3N58 (p≤0.000001), showed an aberrant segregation pattern. All affected individuals appeared to be homozygous by PCR, and affected children appeared not to inherit an allele from their affected parent (FIG. 1, B and C). The PCR to amplify the DM2 repeat region from genomic DNA used
primers CL3N58-D F (5'-GCCTAGGGGACAAAGTGAGA (SEQ ID NO:10)) and CL3N58-D R (5'-GGCCTTATAACCATGCAAATG (SEQ ID NO:11)) in a PCR reaction containing 200 µM dNTPs, 10 mM Tris-HCl (pH 9.0), 50 mM KCl, 0.1% Triton X-100, 0.01% (w/v) gelatin, 1 mM MgCl2, 0.4 µM each primer, and 0.1 U Taq. The reaction was cycled 30 times, where each cycle was 94° C. for 45 seconds, 57° C. for 45 seconds, and 72° C. for 1 minute.

Southern analysis was performed to investigate the possibility that the aberrant segregation pattern was caused by a repeat expansion or other rearrangement. BsoBI-digested genomic DNA (5 µg) was separated on an 0.8% agarose gel run for 4 hours at 110 V, transferred to Hybond N+ membrane (Amersham, Piscataway, N.J.), and hybridized with a 485 bp ZNF9 probe generated by PCR using the primers probe A F (5'-GAGAACCTTGCCATTTTCG (SEQ ID NO:22)) and probe A R (5'-CACCTACAGCACTGGCAACA (SEQ ID NO:23)) and random-prime-labeled (GibcoBRL, Carlsbad, Calif.) with 32P-α-deoxyadenosine triphosphate (NEN, Boston, Mass.). To avoid partial digestions with BsoBI, we used 120 U of enzyme in a digestion volume of 120 µl. Membranes were prehybridized using RAPID-HYB buffer (Amersham, Piscataway, N.J.) at 60° C. for 1 hour. Hybridization was done using at least 4×10^7 counts per minute (cpm) total of the labeled probe, and incubation was overnight at 60° C. The wash conditions were as follows: 2 washes for 5 minutes each at room temperature in a solution containing 2×SSC (one liter of 20×SSC contains 175.3 grams NaCl and 88.2 grams sodium citrate, pH 7.0) and 0.05% sodium dodecyl sulfate (SDS), followed by 2 to 3 washes for 30 minutes each at 52° C. in a solution containing 0.15×SSC and 0.1% SDS.

In addition to the expected normal allele, a variably sized expanded allele, too large to amplify by PCR, was detected by the Southern analysis and was found only in affected individuals (FIG. 1, B and D). Modified electrophoresis conditions enabled us to resolve a range of expansions between 10 and 48 kb (FIG. 1E).

Positional cloning was used to identify the DM2 mutation. We identified, obtained informed consent, performed neurological exams, and collected blood samples from DM2/PROMM family members. Genomic DNA was isolated from blood using the Puregene kit #D-5000 (Gentra Systems, Minneapolis, Minn.). Linkage analysis was performed using the LINKAGE package of computer programs (version 5.1) (Lathrop et al., Proc. Natl. Acad. Sci. U.S.A. 81, 3443 (1984)).

The DM2 region was narrowed to a 2 centiMorgan (cM) interval by analyzing 10 recombinant (Ranum et al., Nature Genet., 19, 196 (1998)). Sequence data from this region, which is partially covered by 14 BACs, was used to develop 80 short tandem repeat (STR) markers. The sequence data was from McPherson et al., (Nature 409, 934 (2001)), and BACs spanning the DM2 region were identified and ordered by sequence tagged site (STS) content mapping. Additional polymorphic STR markers were developed using di-, tri-, and tetranucleotide repeat sequences that mapped to the region (McPherson et al., Nature 409, 934-41 (2001)). PCR primers for the following markers were as follows: CL3N49 (CL3N49 F 5'-GTGTGTGTGCATTTGTGTGC (SEQ ID NO:6), CL3N49 R 5'-GAGGTTGCAGTGAGCTGAATC (SEQ ID NO:7)); CL3N88 (CL3N88 F 5'-AGCTGACCCTTGTCTTCCAG (SEQ ID NO:8), CL3N88 R 5'-CAAACAAACCCAGTCCTCGT (SEQ ID NO:9)); CL3N58 (CL3N58-D F 5'-GCCTAGGGGACAAAGTGAGA (SEQ ID NO:10), CL3N58-D R 5'-GGCCTTATAACCATGCAAATG (SEQ ID NO:11)); CL3N59 (CL3N59 F 5'-GCTGGCACCTTTTACAGGAA (SEQ ID NO:12), CL3N59 R 5'-ATTTGCCACATCTTCCCATC (SEQ ID NO:13)); CL3N83 (CL3N83 F 5'-GTGTGTAAGGGGGAGACTGG (SEQ ID NO:14), CL3N83 R 5'-AAGCCCAAGTGGCATTCTTA (SEQ ID NO:15)); CL3N84 (CL3N84 F 5'-TCATTCCCAGACGTCCTTTC (SEQ ID NO:16), CL3N84 R 5'-AATCGCTTGAACCTGGAAGA (SEQ ID NO:17)); CL3N99 (CL3N99 F 5'-CTGC
CGGTGGGTTTTAAGT (SEQ ID NO:18), CL3N99 R 5'-TGCAAGACGGTTTGAAGAGA (SEQ ID NO:19)); CL3N9 (CL3N9 F 5'-AGACACTCAACCGCTGACCT (SEQ ID NO:20), CL3N9 R 5'-GATCTGGAAGTGGAGCCAAC (SEQ ID NO:21)).

Linkage disequilibrium analysis was performed on 64 parent-offspring trios in which affected individuals had the clinical features of DM, which include myotonia, muscular dystrophy, cardiac conduction defects, posterior iridescent cataracts, and endocrine disorders (Harper, Myotonic Dystrophy, ed. 2, Saunders, London, (1989)), but not the DM1 mutation. Transmission disequilibrium testing (TDT) (Spielman et al., Am. J. Hum. Genet., 52, 506 (1993)), which was performed using the GENEHUNTER program (version 1.0) (Kruglyak et al., Am. J. Hum. Genet., 58, 1347 (1996)), and analysis of conserved ancestral haplotypes narrowed the DM2 locus to ~320 kilobases (kb) (FIG. 1A). GenBank accession numbers for the three BACs spanning the region of

For more accurate sizing of the high molecular weight expansions, EcoRI-digested genomic DNA (5 µg) was separated on a 0.4% agarose gel run 24 hours at 35 V along with high molecular weight DNA markers (GibcoBRL). BsoBI digests were more useful as a screening tool to identify individuals with DM2 expansions, as the bands were stronger and more discrete. EcoRI digests worked better for accurate sizing of large alleles, but the bands were often present as smears and were sometimes less distinct. The wash conditions were as follows: 2 washes for 5 minutes each at room temperature in a solution containing 2×SSC and 0.05% SDS, followed by 1 wash for 15 minutes at 50° C. in a solution containing 0.15×SSC and 0.1% SDS, followed by 1 wash for 10 minutes at 50° C. in a solution containing 0.15×SSC and 0.1% SDS.

To determine if this expansion was involved in the DM2 disease process, PCR and Southern analysis were performed on: (i) 51 affected individuals in six families whose disease was consistent with linkage to the DM2 locus; (ii) one affected individual from each of 20 additional families with ancestrally conserved DM2 haplotypes; and (iii) a panel of control genomic samples representing 1360 chromosomes. By PCR all 51 affected individuals in the six DM2 families appeared to be homozygous, with only one band detectable by PCR, but had an expanded allele on subsequent Southern analysis. The maximum lod scores at θ=0.00 between the disease locus and the CL3N58 expansion for the six families were: MN1=6.9, MN6=1.5, MN10=8.2, MN12=2.8, F134=10.4, and F047=1.8. The maximum LOD scores for these families provide strong evidence that the disease and the expansion mutation are linked, and thus that the expansion mutation is responsible for DM2. Expanded alleles detected by Southern analysis were also found in affected representatives of all 20 additional families with ancestrally conserved DM2 haplotypes.

CAG-b, Orr et al., Nature Genet., 4, 211-226 (1993)), SCA2 (SCA2-A & SCA2-B, Pulst et al., Nature Genet. 14, 269-276 (1996)), SCA3 (MJD52 & MJD25, Kawaguchi et al., Nature Genet. 8, 221-228 (1994)), SCA6 (S-5-F1 & S-5-R1, Zhuchenko et al., Nature Genet., 15, 62-69 (1997)), SCA8 (SCA8 F3 & SCA8 R2, Koob et al., Nature Genet., 21, 379-84 (1999))), sex, and disease status to confirm that the twins described in FIG. 3B were monozygotic (p≥0.001). DNA from both parents and the twins were used to establish haplotypes. Further examples of somatic instability included the observation that the expansion size in lymphocyte DNA from an affected individual increased in size by approximately 2 kb during the 3-year interval between blood donations (FIG. 3C), and the age of affected individuals at the time they donated a blood sample directly correlated (r=0.41, r²=0.17,
p=0.008) with the size of the expansion (FIG. 3D). Expansion sizes in the blood of affected children are usually shorter than in their parents: the time-dependent somatic variation of repeat size complicates the interpretation of this difference (Table 1). No significant correlation between age of onset and expansion size was observed.

TABLE 1

Parent-offspring transmissions of the expanded allele in blood. Allele sizes are given in kb. Multiple expansion sizes indicative of somatic instability are found in some individuals.

Male Transmissions                      Female Transmissions
Parental Alleles    Offspring Alleles   Parental Alleles    Offspring Alleles
27, 20, 16          9                   40                  24
36                  20                  40                  13
36                  23                  49                  19
49                  27                  19                  10, 6
29                  27, 20, 6           40                  11
48, 25              20, 5               40                  16
17, 5               18, 9               42                  20, 8
48, 25              38                  20, 8               7
33, 12              49, 17, 12          38                  33

PCR and Southern analysis identified no control samples with an expansion. Unrelated control DNA samples included the grandparents from the panel of 40 Centre d'Etude du Polymorphisme Humain (CEPH) families, spouses of patients diagnosed with muscular dystrophy or ataxia, and ataxia patients (n=1360 chromosomes).

Analysis of DM2-affected and Normal Alleles

Sequence of the CL3N58 marker contained the complex repeat motif (TG)n(TCTG)n(CCTG)n (SEQ ID NO:40). In our control group, the size of the (TG)n(TCTG)n(CCTG)n (SEQ ID NO:40) repeat tract ranged from 104-176 bp (Heterozygosity=0.89) (FIG. 2A). Eight normal alleles were amplified from genomic DNA as described above, cloned with the TOPO cloning kit (Invitrogen, Carlsbad, Calif.) and sequenced. All of these normal alleles had CCTG repeat tracts that were interrupted by both GCTG and TCTG motifs or by one or two TCTG motifs (FIG. 2B). The repeat tract in the largest normal allele (combined TG/TCTG/CCTG (SEQ ID NO:40) repeats of 176 bp) was sequenced and shown to contain 26 CCTG repeats with two interruptions. Smaller expanded alleles were amplified from genomic DNA using primers CL3N58-B F (5'-TGAGCCGGAATCATACCAGT (SEQ ID NO:24)) and CL3N58-D R in a PCR reaction (200 µM dNTPs, 50 mM Tris-HCl (pH 9.1), 14 mM (NH4)2SO4, 2 mM MgCl2, 0.4 µM each primer, 0.1% Tween-20, 10% dimethylsulfoxide, 0.75 U Proof Sprinter enzyme (Hybaid-AGS, Ashford, Middlesex, UK)) cycled 35 times (94° C. for 30 s, 51° C. for 30 s, 72° C. for 1 min). These expansions were also cloned with the TOPO cloning kit and sequenced, demonstrating that the CCTG portion of the repeat tract is expanded. In contrast to alleles from the control samples, the CCTG repeat tracts on expanded alleles were uninterrupted. Expansion sizes for very large alleles were estimated by Southern analysis assuming that, consistent with the sequenced expansions, lengthening of the CCTG repeat tract accounts for the increase in molecular weight. The range of expanded allele sizes is extremely broad, from 75 to ~11,000 CCTG repeats with a mean of ~5000 (FIG. 2C). Shorter expansions were found in individuals with multiple allele sizes in blood.

Assembly of the ZNF9 Genomic Sequence

The DM2 expansion (CL3N58) was located in a region of the genome for which the available sequence was not completely ordered. To determine the location of the DM2 expansion, portions of the BAC RP11-814L21 were sequenced to assemble unfinished sequence contigs. Unordered sequence contigs from BAC RP11-814L21 (AC022944) were connected by sequencing from the ends of the known sequence contigs using the following primers: 77 3' (5'-CCTGACCTTGTGATCCGACT (SEQ ID NO:25)), 66 3' (5'-TGCTTTATTATAGATTGGAATCCTCA (SEQ ID NO:26)), 66B 3' (5'-AAGACACCTGTCCCCCTAGAA (SEQ ID NO:27)), 39-5' (5'-GGGTGACAGAGCAAGACTCC (SEQ ID NO:28)), 52 3' (5'-TTTTAAACAATGCTACTTAGAATTTCA (SEQ ID
NO:29)), 52 5' (5'-GCCGAATTCTTTGTTTTTGC (SEQ ID NO:30)), 59 5' (5'-TTGCTGCAGTTGATGGCTAC (SEQ ID NO:31)), 59B 3' (5'-TGAATTTACTAAGGCCCTTCCA (SEQ ID NO:32)), and 59C 3' (5'-GTGCTCACCTCTCCAAGCTC (SEQ ID NO:33)). These connections were also verified by overlap with sequence from Celera (x2HTBKUAD8C) (Venter et al., Science 291, 1304-51 (2001)).

Our sequencing data and sequence from the Human Genome Project (McPherson et al., Nature 409, 934 (2001)) indicate that the expansion is located in intron 1 of the zinc finger protein 9 (ZNF9) gene (FIG. 4A), also referred to as the cellular nucleic acid-binding protein gene. GenBank accession numbers are as follows: genomic sequence of the DM2 region (AF389886, AF389887); CL3N58 sequence (AF388525); expanded CL3N58 sequence (AF388526); ZNF9 mRNA (M28372); original ZNF9 genomic sequence (U19765). The Celera accession number for the contig overlapping ZNF9 is x2HTBKUAD8C.

ZNF9 contains seven zinc finger domains and is thought to be an RNA-binding protein (Rajavashisth et al., Science 245, 640 (1989), Pellizzoni et al., J. Mol. Biol., 267, 264 (1997)). Although the originally reported genomic sequence for ZNF9 (Pellizzoni et al., J. Mol. Biol., 281, 593 (1998)) did not contain the CL3N58 marker, we have generated additional sequence, used sequence from Celera (Flink et al., Gene 163, 279 (1995)), and performed Southern and RT-PCR analysis to confirm the location of the expansion. To confirm the genomic organization of the ZNF9 gene, NsiI-digested genomic DNA (5 µg) was hybridized with an exon 5 probe generated by PCR using the primers ZNF9-E5 F (5'-GTAGCCATCAACTGCAGCAA (SEQ ID NO:34)) and ZNF9-E5 R (5'-TAATACGACTCACTATAGGGAGGACGGGCTTACTGGTCTGACTC (SEQ ID NO:35), T7 RNA polymerase promoter sequence is in italics).

The expression of ZNF9 RNA was evaluated in different tissues by Northern analysis. A human multiple-tissue Northern blot (Clontech, Palo Alto, Calif.) containing tissues from brain, heart, muscle, colon, thymus, spleen, kidney, liver, small intestine, placenta, lung, and peripheral blood lymphocytes (PBLs) was hybridized at 68° C. in UltraHyb hybridization buffer (Ambion, Austin, Tex.) with a 423 bp riboprobe that included exon 5 of ZNF9 (FIG. 5, upper panel). The PCR product was generated from genomic DNA with the primers ZNF9-E5 F and ZNF9-E5 R as described above and used for in vitro transcription using the Maxiscript kit (Ambion) and incorporating 32P-(alpha)-deoxycytosine triphosphate (ICN, Costa Mesa, Calif.) into the riboprobe. ZNF9 transcripts were found to be broadly expressed, and most abundant in heart and skeletal muscle, two tissues prominently affected in DM2.

In situ hybridization has been used to detect nuclear foci containing the CUG expansion in DM1 cells (Taneja et al., J. Cell Biol., 128, 995 (1995)). Because DM2 is also caused by an expansion motif, we performed fluorescent in situ hybridization to determine if similar repeat-containing nuclear foci are found in DM2. Briefly, for in situ hybridization of muscle sections (Reddy et al., Nature Genet., 13, 325 (1996)), we used 0.2 ng/µl 2'-O-methyl RNA oligonucleotides 5' labeled with Cy3 (IDT, Coralville, Iowa). The (CAGG)n, (CCUG)n, and (CAG)n oligonucleotides were all 20 bases in length. Fluorescence was visualized using a Zeiss Axioplan2 microscope equipped with a Spot CCD camera (Diagnostic Instruments, Sterling Heights, Mich.). Appropriate exposure times were computed using the DM2/CAGG slide, and the other probes were photographed using this exposure setting.

Fluorescently labeled antisense oligonucleotide probes to the CCUG repeat were hybridized to control, DM2, and DM1 muscle biopsy tissue. The DM2 muscle biopsy was from an affected member of the 3q-linked MN1 family (LOD=6.9), who had a CCTG expansion detected by Southern analysis. Similarly, DM1 tissue was taken from a genetically confirmed DM1 patient.

Instability of the DM2 Expansion

In approximately 25% of affected individuals two to four bands were observed in DNA isolated from blood, representing expanded alleles of various sizes (FIG. 3A, Table 1). Some bands were discrete in size, some appeared as unresolved compression bands at the top of the gel, and others showed a broad variation of molecular weight. An additional example of somatic instability included a pair of genetically confirmed (p≤0.001) monozygotic twins (31 y/o) that had dramatically different expanded alleles (13 kb and 24 kb) (FIG. 3B). Bayesian statistics were used on 6 STR markers from different chromosomes (D3S3684, SCA1 (CAG-a &

is expressed but we do not yet know if the RNA foci contain the entire unprocessed ZNF9 transcript. The antisense CCUG probe showed no nuclear foci in DM1 muscle. Although the antisense probe to the CUG repeat also hybridized to foci in DM2 muscle, we believe this signal was caused by non-specific cross-hybridization to the extremely large CCUG repeat tract (11,000 repeats).

Discussion

These results demonstrate that DM2 is caused by an untranslated CCTG expansion. DM2 shows remarkable clinical similarity to DM1, although the disease course of DM2 is usually more benign. Clinical and molecular parallels between these diseases indicate that the CUG and CCUG expansions expressed at the RNA level can themselves be pathogenic and cause the multisystemic features common to DM1 and DM2. Given the similarity of the DM1 and DM2 repeat motifs and the fact that the expansions accumulate as RNA foci, RNA-binding proteins that bind to the DM1 CUG expansion may also bind to the DM2 CCUG expansion causing similar global disruptions in RNA splicing and cellular metabolism (Timchenko et al., Nucleic Acids Res., 24, 4407 (1996), Lu et al., Hum. Mol. Genet., 8, 53 (1999), Miller et al., EMBO J., 19, 4439 (2000)). One of these proteins has been shown to have a preferential affinity for UG dinucleotides (Takahashi et al., Biochem. Biophys. Res. Commun., 277, 518 (2000)), which are found in both DM1 and DM2 expansions. If these same RNA-binding proteins are involved in DM2 pathogenesis, then one could speculate the longer CCUG repeat tracts cause the milder DM2 phenotype because the affinity of these proteins for the CCUG repeat tract is not as strong. Alternatively, a different set of RNA-binding proteins may bind to the CCUG expansion.

DM2 is the fourth example of a dominant disease that is caused by a microsatellite expansion located in a transcribed but untranslated portion of its respective genes. On the molecular level, the CCTG DM2 expansion has parallels to the untranslated CTG expansions involved in both DM1 (Groenen et al., Bioessays 20, 901 (1998), Tapscott, Science 289, 1701 (2000)) and SCA8 (Koob et al., Nature Genet., 21, 379 (1999)) as well as the ATTCT expansion in SCA10 (Matsuura et al., Nature Genet., 26, 191 (2000)). The DM2 tetranucleotide and the SCA10 pentanucleotide expansions are generally longer than the expansions associated with the triplet repeat diseases, with the largest DM2 and SCA10 repeats estimated to be ≥11,000 and 4,500 repeats, respectively. Repeat instability in DM2 is complicated by the compound repeat motif (TG)n(TCTG)n(CCTG)n (SEQ ID NO:40) and time-dependent somatic instability of the expansion. Although similar somatic instability is seen in DM1 and FMR1 (Wong et al., Am. J. Hum. Genet. 56, 114 (1995), Moutou et al., Hum. Mol. Genet. 6, 971 (1997), Helderman-van den Enden et al., J. Med. Genet. 36, 253 (1999), Lopez de Munain et al., Ann. Neurol. 35, 374 (1994)), the size differences for DM2 can be much larger, up to 9000 repeats in the blood of one affected individual. Clinical anticipation has been reported in DM2/PROMM families (Schneider et al., Neurol., 55, 383 (2000)).
Numerous intense CCUG-containing 60 Example 2 nuclear foci were observed in DM2 but not control muscle. In DM2 muscle, 1-5 foci were seen per nucleus, with no foci Repeat Assay detected in the cytoplasm. In general, more foci were seen per nucleus in DM2 than were seen using antisense probes to the In most cases the expanded alleles are too large to amplify CUG expansions in DM1 muscle. The sense CCUG probes 65 by PCR (FIG. 5A, and Example 1 above). All affected indi showed no nuclear foci, indicating that the probe hybridized viduals appear to be homozygous by PCR (FIG. 5A, lanes 2 to RNA not DNA. Our results show that the CCTG expansion and 3), and affected children often do not appear to inheritan US 7,442,782 B2 19 20 allele from their affected parent. Because some normals can Example 3 be true homozygotes that can not be distinguished from the DM2 hemizygotes, in some cases CL3N58 PCR is not a DM2 in 133 Families: Clinical Features Common to definitive test for the DM2 expansion. Southern-blot analysis DM1 Demonstrate Pathogenic Effects of can used to detect the presence of the expanded allele in 5 CUG/CCUG RNA Expansions are Multisystemic affected individuals, as well as confirm the lack of any expan sion in any unaffected homozygotes. However, in some cases Methods it can be difficult to visualize the expansion with Southern Family Identification and Clinical Studies blot analysis. Note that there is not the 1:1 correlation in 10 intensity between normal and expanded alleles (FIG. 5B, Subjects with the clinical diagnosis of DM2 or PROMM, lanes 1-4), as is seen in other expansion diseases such as and without a DM1 expansion were enrolled in the research, SCA8 (FIG. 5B, lane 8). Sometimes the expanded allele(s) as were all available family members at risk for the disorder, can appear so much fainter than the normal allele as to be and spouses with an affected or at risk child. Findings are indistinguishable from background (FIG. 5B, lanes 6 and 7). 
15 reported for subjects with CCTG DM2 expansions. Studies were performed over a 10-year period of time, with To detect the presence of DM2 expansions from individu additional testing included as understanding of the disease als for whom Southern blot analysis either appears to be evolved. Subjects were interviewed and examined in both negative or is inconclusive, an additional assay, referred to as clinical and community settings. Electrophysiological the Repeat assay or repeat assay (RA), was developed by assessment was done with portable electromyographic equip modifying a version of PCR developed by Warner et al. (J. ment, including Nicollet and Dantec/Medtronik electromyo Med. Genet., 33, 1022-1-26 (1996)) and Matsuura and Ash graphic equipment. Ophthalmologic examinations in the field izawa (Ann. Neurol., 51, 271-272 (2002)) for the detection of were performed with direct ophthalmoscopy; some individu DM1 and SCA10 repeat expansions. This assay can reliably als additionally underwent slit lamp examinations in ophthal identify the presence or absence of DM2 expansions, 25 mology clinics. Muscle biopsies were quick-frozen, sec although the size of any detected expansion cannot be deter tioned and stained with hematoxylin and eosin for most mined. results reported, identifying fiber types by ATPase staining at The DM2 repeat region (TG/TCTG/CCTG) (SEQ ID different pH values. Clinical results are reported as percent NO:40) was amplified from genomic DNA using the primers 30 ages of individuals tested for each specific feature. CL3N58-D R (5'-GGCCTTATAACCATGCAAATG (SEQ Genetic Methods ID NO:11), JJP4CAGG (5'-TACGCATC CL3N58 PCR amplification across the DM2CCTG repeat, CGAGTTTGAGACGCAGGCAGGCAGGCAGGCAGG and Southern analysis, was done as described Example 1. The (SEQ ID NO:36)), and JJP3(5'-TACGCATC Repeat assay was performed as described in Example 2. CGAGTTTGAGACG (SEQ ID NO:37)). 
CL3N58-D R 35 binds to a unique sequence upstream of the TG/TCTG/CCTG Results (SEQ ID NO:40) repeat tract. JJP4CAGG consists of the Patient Population repeat sequence with 5' hanging tail sequence that has negli We have studied 352 subjects genetically diagnosed as gible complementarity to any known human sequence. The 40 having DM2 from 133 German and Minnesota families. Most repeat portion of JJP4CAGG will bind randomly at multiple families could trace an affected ancestor to Germany or sites within an expanded CCTG tract, giving rise to PCR Poland and all were of European descent. 332 males and 420 products of varying sizes, visualized as a Smear. JJP3 was females at risk for the disease participated in the study, of complementary to the hanging tail sequence in JJP4CAGG those 147 males and 208 females were positive for the DM2 when incorporated into a PCR product, and was used to 45 expansion. The age of the participants ranged from 8 to 85 increase the robustness of the PCR reaction. Optimal ampli with a mean age of 47 years. fication was found using PCR reactions of 25ul volumes with Clinical Features of DM2 Patients the following buffer components: (200 uM dNTPs, 50 mM Muscle Symptoms and Signs. Similar to DM1, myotonia Tris pH 9.1, 14 mM (NH)SO 2 mM MgCl, 0.4 uM each 50 and muscle weakness are the most common symptoms primer, 0.1% Tween-20, 10% DMSO, 0.75 U Proof Sprinter reported in DM2 subjects of all ages (Table 2). Similarly, the enzyme (Hybaid-AGS)). The PCR conditions consisted of an characteristic pattern of muscle weakness in DM1 affecting initial denaturing at 95°C. for 15 minutes, 35 PCR cycles (94° neck flexion, thumb or finger flexion, and elbow extension is C. for 30 seconds, 51° C. for 30 seconds, 72° C. for 2 min also present in DM2(Harper et al., Neurology, 56, 336-340 55 (2001)). Facial and ankle dorsiflexor weakness, features of utes), and an additional extension at 72° C. for 10 minutes. 
DM1, were present to a lesser degree in DM2 subjects. Sub Five microliters of 6x loading dye was added to the PCR jects with DM2 or PROMM frequently developed symptom product and 25ul were loaded onto an 1% agarose gel with 1 atic weakness after age 50 years, when they began to com ul of ethidium bromide solution (10 ug/ul) per 100 mls and plain of difficulty standing up from a squatting position. run for 45 mm to 1 hr at 150 V. The gels were transferred to 60 Although hip-flexion weakness is the reason most DM2/ Hybond N-- membrane (Amersham, Piscataway, N.J.) and PROMM subjects seek medical attention, in DM1 it often hybridized with an internal primer CL3N58E-R (5'-TTG develops after patient have sought medical assistance for GACTTGGAATGAGTGAATG (SEQ ID NO:38)) probe other problems. Muscle pain, which is common in DM2, is end-labeled with P-g-dATP using Rapid-Hyb buffer (Am also common, although less recognized in DM1. Among ersham, Piscataway, N.J.) according to the manufacturers 65 DM2 patients between 21-34 years of age, only 36% com instructions. After hybridization the membrane was washed plained of weakness, but on examination weakness was at 45° C. in 2xSSC and 0.1% SDS and exposed to X-ray film. demonstrable in 59%. US 7,442,782 B2 21 22 this is a prominent and early feature of the disease. In DM1, TABLE 2 cataracts evident by ophthalmoscopy typically develop in the 3'-5' decades of life, with small percentages occurring in the CLINICAL FEATURES OFDM2AND DM1 2" decade of life. DM2 Subjects by Age Cardiac Features. In DM2 patients, cardiac complaints include frequent palpitations, intermittent tachycardia and 21–34y 35–50 y >50 y episodic syncope. These symptoms, which are present in both (n = 45) (n = 77) (n = 100) DM1 DM1 and DM2, increase in frequency with age (Table 2). 
Skeletal Muscle Features Cardiac conduction abnormalities, either atrioventricular or 10 intraventricular blocks, were seen in 20% (9/44) of DM2 History of Muscle Pain 43% 61% 63% +f++ Myotonia. By History 39 39 34 ------patients. Conduction abnormalities are more frequent in On Physical Exam 8O 84 71 ------DM1, butpatients with either disease can develop unexpected On EMG (210) 87 94 92 ------fatal arrhythmias. Cardiomyopathy, a debilitating and life Weakness By History 36 69 84 ------threatening condition found in 7% of DM2 patients over 50 Any Weakness on 59 85 99 15 years, is rarely reported in DM1. Exam Facial 18 9 13 ---- Systemic Changes. A striking feature of myotonic dystro Neck Flexion 47 75 95 ------phy is the idiosyncratic involvement of nearly all organ sys Elbow Extension 8 16 52 ---- tems. Many of the features that include the broad clinical Thumb? Finger Flex 39 63 49 ------Hip Flexion 36 58 88 -- presentation of DM1 are mirrored in DM2. Laboratory results Ankle Dorsiflexion 9 14 19 ---- from 150 patients showed elevated serum CK (typically less Deep Knee Bend 26 48 77 -- than 5x the upper limits of normal) and GGT. Additional High CK 88 91 93 ---- serological testing on 20 patients showed low IgG and IgM, Multisystemic Features but normal IgA. As in DM1, evidence of primary male Cardiac Arrhythmia Palp 79% 27% 27% - hypogonadism was present in the majority of males, with Cardiomyopathy O O 7 +f- 25 elevated FSH, low or low-normal testosterone levels, and Cataracts By history or exam 36 59 78 ---- several men having documented oligospermia. Blood glucose Hx Extraction 13 18 55 levels showed diabetes in 23% (n=79). Formal glucose toler Diabetes By history 4 17 36 -- ance testing showed insulin insensitivity (elevated basal insu Additional Laboratory Findings lin levels or prolonged insulin elevation, n=16). 
Age-indepen 30 dent hyperhydrosis reported by 20-30% of DM2 patients is DM2 also present in DM1, and early-onset male frontal balding is Mean Age common in both disorders. (Age Range) % Affected DM1 Age of onset. Initial DM2 symptoms were reported to have occurred from ages 8-67, with a mean age of onset of 48 y. Serology High GGT (152) 46 y 64% -- 35 (13–78 y) Individuals less than 21 years of age were not routinely Low IgG (20) 46 (28-64) 65 ---- enrolled in the study, but analysis of reports from 12 such Low IgM (20) 46 (28-64) 11 -- genetically affected individuals showed reports of muscle Low Testosterone (22) 45 (27–64) 29 ---- pain, myotonia, and hyperhydrosis, but not weakness, cardiac High FSH (26) 42 (16-64) 65 ---- symptoms, diabetes or visual impairment from cataracts. A insulin Insensitivity (16) 47 (28–75) 75 ---- 40 EKG AV Block (44) 47 (1673) 11 ---- severe congenital form of DM2 has not been observed. V Block (44) 47 (1673) 11 -- Muscle internal nuclei (42) 50 (16-64) 95 ---- Genetic and Molecular Features Biopsy Nuclear Bag fibers (36) 50 (16-64) 89 ---- Diagnostic Methods and Instability. The unprecedented Abnl fiber typing (31) 50 (16-64) 16 +f- size and somatic instability of the DM2 expansion complicate Necrotic fibers (38) 50 (16-64) 47 -- Fibrosis (38) 50 (16-64) 71 -- 45 molecular testing and interpretation of genetic test results (FIG. 6). The DM2 locus contains a complex repeat motif DM1 features are reported as being almost universally present (+++), (TG)n(TCTG)n (CCTG)n (SEQID NO:40), with the CCTG common and almost universally present late in the course of the disease (++), portion expanding on affected alleles. The expanded alleles well recognized and common late in the course of the disease (+), are too large to amplify by PCR, causing all affected indi and recognized but not common (+/-). 50 viduals to appear homozygous (FIG. 6B, lanes 2 and 3) and thus indistinguishable from the 15% of unaffected controls Muscle Biopsies. 
Muscle biopsies from 42 DM2 patients who are truly homozygous. Family studies can distinguish were indistinguishable from DM1 biopsies on routine stud true homozygotes from expansion carriers, (FIG. 6B lanes ies, with a high percentage of fibers having centrally located 1-3), because affected children often do not appear to inherit nuclei that sometimes occur in chains, angulated atrophic 55 an allele from their affected parent. We refer to this apparent fibers sometimes occurring in groups, severely atrophic non-Mendellian inheritance pattern, which is caused by the (“nuclear bag') fibers, hypertrophic fibers, occasional failure of the expanded allele to amplify, as the presence of a necrotic fibers, fibrosis and adipose deposition. There was no “blank allele.” Demonstration of a blank allele provides consistent abnormality of fiber type distribution, with 2 biop strong evidence that a family carries a DM2 expansion, but sies having mild type 1 predominance and 2 having mild type 60 can also occur due to non-paternity. 2 predominance. Atrophic angulated fibers of both fiber In other expansion disorders, Southern analysis (FIG. 6C) types, as determined by ATPase staining, were evident in can reliably confirm the presence of expansions too large to most biopsies. Cataracts. The posterior Subcapsular irides amplify by PCR. Because of the unprecedented size (>11,000 cent cataracts are identical in DM1 and DM2 patients. Cata CCTG repeats) and somatic instability of the DM2 repeat, racts needed to be extracted in 75 individuals at ages ranging 65 genomic Southerns fail to detect 26% of expansions in known from 28-74 years. Among 10 genetically positive Subjects carriers. Expanded alleles when detected can appear as single under 21 years, cataracts were present in two, indicating that discrete bands, multiple bands, or smears (FIG. 6B). Com US 7,442,782 B2 23 24 pared to other expansion disorders, such as SCA8 (FIG. 
6D, erational changes in repeat length are much greater for DM2 lane 8), in which the expanded and normal alleles are equally than for any other microsatellite disorder (FIG. 9B). intense, detectable DM2 expansions are almost always less Pedigree Examples of Instability intense than the normal alleles. This intensity difference indi The pedigree shown in FIG. 6E illustrates the diagnostic cates that even when a proportion of the expanded alleles 5 challenges and types of repeat instability that are typical in a create a discrete visible band, the rest of the expanded alleles DM2 family. Intergenerational repeat sizes can vary dramati vary markedly in size resulting in a diffuse undetectable cally. For example individual III-7 has a smaller expansion Sea. than her affected parent, which is larger in one of her children To detect the presence of DM2 expansions in individuals and smaller in the other. Somatic instability is strikingly with inconclusive Southern blots, the repeat assay (RA) 10 illustrated by monozygotic twins III-1 and III-2, with expan described in Example 2 was used. By using a PCR primer that sion sizes that differ in size by 11 kb (2750 CCTGs). Some hybridizes and primes from multiple sites within the elon family members have single discrete expansions, multiple gated CCTG repeat tract, this assay reliably identified the expansions and diffuse bands. An example of the utility of the presence or absence of DM2 expansions. To insure specific repeat assay is demonstrated by individual II-5, who was RA ity, the PCR products were transferred to a nylon membrane 15 positive but negative by Southern analysis. and probed with an internal oligonucleotide probe. When the probe was used there were no false positives in 320 control Discussion chromosomes. In contrast there was a 5% false positive rate Clinical Features when the PCR products were visualized directly without use This study details the broad idiosyncratic features common of an internal probe. 
As detailed in Table 3, the DM2 repeat to DM1 and DM2, demonstrating the multisystemic effects of assay is a sensitive and specific method to identify DM2 CUG and CCUG RNA expansions in disease pathogenesis. expansions, increasing the detection rate from 74% by DM2 closely resembles adult-onset DM1, with a long list of genomic Southern analysis alone to 99% using both methods. common features including progressive weakness, myotonia, Among all of the samples tested, 352 individuals have been disease specific muscle histology, cardiac arrhythmias, iri identified from 133 families who were genetically confirmed 25 descent cataracts, male hypogonadism, early-onset balding, by Southern and/or RA analyses. insulin insensitivity, and hypogammaglobulinemia. The pres ence of these seemingly unrelated features in both DM1 and TABLE 3 DM2 indicates that a common pathogenic mechanism is Expansion Expansion Expansion Detected by 30 likely responsible for both disorders. Confirmed Detected Detected either Southern or Repeat Despite the striking similarities of DM2 and adult-onset DM2 Cases by Southern by Repeat Assay Assay DM1, there are differences. One clear distinction is the lack of 174 128 166 172 a congenital form of DM2. Other differences of DM2 include 74% 95% 99% an apparent lack of mental retardation, and less evident cen tral hypersomnia, severe distal weakness, and marked muscle *Individuals independently confirmed to have DM2 expansions by presence 35 atrophy. DM1 individuals often come to medical attention of “blankallele" or linkage analysis because of the mental retardation or disabling distal weakness The correlation of the repeat size with various measures of and myotonia, but DM2 patients typically first seek medical disease onset for individuals with single bands on Southern evaluation when they develop proximal lower extremity analysis is shown in FIG.8. For repeat size versus age at onset 40 weakness. 
Although many DM2 features are milder than in of initial symptom (n=91) there was a positive correlation DM1 (clinical myotonia, distal and facial weakness), some r=0.28 (p=4.2x10, r=0.08). For repeat size versus age at appear to be equally significant (cataracts, hypogonadism, onset of weakness (n=59) a positive correlation coefficient of and insulin insensitivity), and others may be more severe in r=0.53 was obtained (p=8.7x10, r=0.28). No significant DM2 (cardiomyopathy). It remains to be determined whether the generally milder phenotype of DM2, despite the presence correlation was observed between repeat size and age of 45 cataract extraction (n=29). The positive correlations between of a much larger genetic repeat expansion, indicates that the repeat length and age of onset, as well as repeat length and pathophysiological effects of CCTG expansions are simply onset of weakness, were Surprising because in all other mic less severe than CTG expansions, or whether secondary pro rosatellite expansion disorders larger expansions are associ cesses augment the pathophysiological mechanisms in DM1. ated with earlier ages of onset. To determine if these positive 50 We have identified 389DM2-positive individuals from 133 correlations could be explained by the increase in CCTG families. Our ability to identify a large number of DM2 fami repeat tracts with age, multivariate analysis was performed lies in both Minnesota and Germany indicates that initial and indicated that the effect of age on repeat length explained estimates that 98% of DM families have the DM1 expansion more than 98% of the apparent of effect of repeat length on are too high, at least in Northern European populations. DM1 families often come to medical attention when a child is onset of symptoms. 55 Although complicated by Somatic instability and increases severely affected. 
In contrast the lack of congenital DM2 may in repeat length with age, we compared repeat lengths of 19 explain its apparent underdiagnosis. DM2 patients often seek medical attention for isolated disease features, without being affected parent-child pairs from a subset of individuals in aware of their complex underlying disease. A genetic diag which both the parent and child had single bands on Southern nosis of DM2 will improve patient care by facilitating better analysis (FIG. 9A). Surprisingly, we observed apparent 60 reductions of repeat length in 16 of 19 transmissions, with a monitoring of the diverse clinical features known to be part of mean change of -13 Kb (-3250 CCTG repeats). In one the disease, including early onset cataracts, diabetes, testicu instance the repeat size was 38 Kb smaller in the affected lar failure and cardiac arrhythmias. child (-9500 CCTG repeats). There were apparent size Genetics increases in 2 transmissions (+8 and +13. Kb). No differences 65 Unique genetic features of the DM2 expansion include the in degree or direction of intergenerational changes were seen following: i) it is the first pathogenic tetranucleotide expan in male VS. female transmissions. These apparent intergen sion; ii) expansions are larger than reported in any other US 7,442,782 B2 25 26 disease (more than 44 Kb in DM2 versus 12 Kb in DM1); iii) of pathology. Determination of DM2. Somatic mosaicism in there is an unprecedented degree of Somatic instability. The tissues other than blood may help clarify the pathogenic Somatic instability is so dramatic that ~/4 of the expansions effects of the expansion, although somatic mosaicism are not detectable by Southern analysis, which results in a observed in other tissues (such as skeletal muscle) may con diagnostic challenge not previously reported, even among 5 tinue to obscure length-dependent pathological effects. 
The disorders with large expansions such as DM1, SCA8, and intergenerational differences in repeat length in DM2, with SCA10. Although the somatic instability complicates the Surprisingly shorter repeat tracts seen after both maternal and Allisi EM2, combining it with the RA paternal transmission, may also be affected by the marked improvesIn other detection reported tomicrosatellite >99%. expansion disorders larger 10 range of repeat size in each affected individual and the repeat tracts are associated with earlier onset and increased increase in repeat length over time. disease severity. Although anticipation has been reported in The complete disclosure of all patents, patent applications,

DM2/PROMM families based on clinical criteria,s the and publications, and electronically available material (in expected trend of longer repeats being associated with earlier cluding, for instance, nucleotide sequence submissions 1n, ages of onset was not observed. The somatic heterogeneity, 15 e.g. GenBank, dbSTS, and RefSeq, and amino acid sequence and the fact that the size of the repeat dramatically increases submissions 1Il, C.2. SwissProt, PIR, PRF, PDB, and transla in size with age, complicate this analysis and may mask tions from annotated coding regions in GenBank and RefSeq) meaningful biological effects of repeat size on disease onset. cited herein are. incorporated by reference. The foregoing It is also possible that expansions over the pathogenic size detailed description and examples have been given for clarity threshold exert similar effects regardless of how large they 20 of understanding only. No unnecessary limitations are to be become, or even that Smaller repeats are more pathogenic understood therefrom. The invention is not limited to the than larger repeats. In adult-onset DM1, the tightest correla- exact details shown and described, for variations obvious tO tions between repeat length and disease onset are for repeats one skilled in the art will be included within the invention less than 150 CTGs, which may indicate that correlations at defined by the claims. larger repeat sizes for DM1 are also difficult to measure either 25 All headings are for the convenience of the reader and because of increased somatic mosaicism or a ceiling effect in should not be used to limit the meaning of the text that follows which repeat sizes over a certain length cause similar degrees the heading, unless so specified.
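
By way of illustration only, several figures quoted in Examples 1 to 3 can be cross-checked with a short Python sketch. The function names, the constant names, and the use of Python are assumptions made for this illustration and are not part of the described methods. The sketch assumes that an expansion size in kb converts to CCTG repeats at 4 bp per repeat, takes the counts from Table 3 and the correlation coefficients as reported, and uses the primer sequences as written in the text (SEQ ID NOS: 31, 34, 36 and 37).

```python
"""Illustrative consistency checks; all names here are hypothetical helpers."""

CCTG_UNIT_BP = 4  # assumption: one CCTG repeat unit spans 4 base pairs


def repeats_from_kb(size_kb):
    """Convert an expansion size (or size difference) in kb to CCTG repeats."""
    return round(size_kb * 1000 / CCTG_UNIT_BP)


def percent(detected, total):
    """Detection rate rounded to a whole-number percentage."""
    return round(100 * detected / total)


def r_squared(r):
    """Coefficient of determination for a correlation coefficient r."""
    return round(r * r, 2)


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Primer sequences copied from the text.
SEQ_ID_31 = "TTGCTGCAGTTGATGGCTAC"                       # SEQ ID NO:31
ZNF9_E5F = "GTAGCCATCAACTGCAGCAA"                        # SEQ ID NO:34
JJP4CAGG = "TACGCATCCGAGTTTGAGACGCAGGCAGGCAGGCAGGCAGG"   # SEQ ID NO:36
JJP3 = "TACGCATCCGAGTTTGAGACG"                           # SEQ ID NO:37

if __name__ == "__main__":
    # Size differences quoted in Example 3, converted to repeat counts.
    for kb in (11, 13, 38):
        print(kb, "kb =", repeats_from_kb(kb), "CCTG repeats")

    # Table 3 detection rates out of 174 confirmed cases.
    for label, n in (("Southern", 128), ("Repeat assay", 166), ("Either", 172)):
        print(label, percent(n, 174), "%")

    # Squared correlation coefficients for the two reported r values.
    print(r_squared(0.28), r_squared(0.53))

    # JJP4CAGG is the JJP3 tail followed by five CAGG units.
    print(JJP4CAGG == JJP3 + "CAGG" * 5)

    # ZNF9-E5 F is the reverse complement of the SEQ ID NO:31 primer.
    print(reverse_complement(ZNF9_E5F) == SEQ_ID_31)
```

Under these assumptions the conversions agree with the repeat counts given alongside the kb values in Example 3, and the rounded percentages agree with Table 3. The check on JJP4CAGG reflects the description of that primer as repeat sequence carrying the 5' hanging tail that JJP3 matches.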

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 39

<210> SEQ ID NO 1
<211> LENGTH: 22400
<212> TYPE: DNA
<213> ORGANISM: homo sapiens
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (14469)..(14473)
<223> OTHER INFORMATION: any nucleotide
<400> SEQUENCE: 1

actaatgaaa tott cattaa atgatttgtc agtgtttcaa agtttctitta to at cagtta 60

gcatt coct a caccatcact ttagggagtic aatagaattt tataggagta ggcttcagta 120

agtactggga ggc.cagtagc actgttcaag cacatgggct gtctgttgttt galagcctgac 180

ctgtcatttgttcactittct gactitgagca gattacttaa actict citcqa cotgtttctt 240

catggggata at acaagtac ttctaggggg tttgttgaaaa ct cataaaga ggittaaaaac 300

attgcctggc at acagtaag cacccaataa gaataagaaa taat atttgt at agtaatta 360

tgc.ca.gaaac tdttataaga gotgtatata tattaacaat tagct attac tagt attact 420

aattic ctagt to aggattta gottagtaaa ctittctgctt cagaa.gcaaa tacgagaggit 480

gaaaa.catca atttatt ct c ct cagt citta gttacatact titccaagttca agt cacagag 540

cacaattitcc ttgctggcag ggacaagaca tdggitttaca tdatat cacc tatcc cctica 600

atttalacagc atgtact atg cagttgggca tttalagcaga aattaa.gagt tdgCaggitat 660

tgct caact g g tacc cattt taggaataat gctgaat cat agcatttitat ctdgtottct 720

cticaggatac ttaatgctaa tttitttgtat ttittagtaga gacgggattt caccittgtta 780

gcc aggatgg to tcgatctic ctacct cat gacc catctg cct cqgc.ccc ccaaagtgct 840

US 7,442,782 B2 47 48

- Continued <223> OTHER INFORMATION: primer <4 OO SEQUENCE: 8 agct gaccct tdt citt coag

<210 SEQ ID NO 9 <211 LENGTH: 2O &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: primer <4 OO SEQUENCE: 9 caaacaaacc cagt cct cqt

<210 SEQ ID NO 10 <211 LENGTH: 2O &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: primer <4 OO SEQUENCE: 10 gcct agggga caaagtgaga

<210 SEQ ID NO 11 <211 LENGTH: 21 &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: primer <4 OO SEQUENCE: 11 ggcc titataa ccatgcaaat g 21

<210 SEQ ID NO 12 <211 LENGTH: 2O &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: primer <4 OO SEQUENCE: 12 gctggCacct titt acaggaa

<210 SEQ ID NO 13 <211 LENGTH: 2O &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: primer <4 OO SEQUENCE: 13 atttgccaca tott cocatc

<210 SEQ ID NO 14 <211 LENGTH: 2O &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: primer <4 OO SEQUENCE: 14 gtgttgttalagg giggagactgg US 7,442,782 B2 49 50

- Continued <210 SEQ ID NO 15 <211 LENGTH: 2O &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: primer <4 OO SEQUENCE: 15 aag.cccaagt giggatt citta

<210 SEQ ID NO 16 <211 LENGTH: 2O &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: primer <4 OO SEQUENCE: 16 t cattcc.cag acgtcc titt c

<210 SEQ ID NO 17 <211 LENGTH: 2O &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: primer <4 OO SEQUENCE: 17 aatcgcttga acctggaaga

<210 SEQ ID NO 18 <211 LENGTH: 19 &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: primer <4 OO SEQUENCE: 18

Ctgc.cggtgg gttittaagt 19

<210 SEQ ID NO 19 <211 LENGTH: 2O &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: primer <4 OO SEQUENCE: 19 tgcaagacgg tttgaagaga

<210 SEQ ID NO 2 O <211 LENGTH: 2O &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: primer <4 OO SEQUENCE: 2O agacacticaa cc.gctgacct

<210 SEQ ID NO 21 <211 LENGTH: 2O &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: primer US 7,442,782 B2 51 52

- Continued <4 OO SEQUENCE: 21 gatctggaag tagg caac

<210 SEQ ID NO 22 <211 LENGTH: 2O &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: primer <4 OO SEQUENCE: 22 gaga accttg ccatttitt cq

<210 SEQ ID NO 23 <211 LENGTH: 2O &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: primer <4 OO SEQUENCE: 23 caccitacago actggcaa.ca

<210 SEQ ID NO 24 <211 LENGTH: 2O &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: primer <4 OO SEQUENCE: 24 tgagc.cggaa t cataccagt

<210 SEQ ID NO 25 <211 LENGTH: 2O &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: primer <4 OO SEQUENCE: 25 cctgaccttg tdatc.cgact

<210> SEQ ID NO 26
<211> LENGTH: 26
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: primer
<400> SEQUENCE: 26
tgctitt atta tagattggaa toc to a    26

<210> SEQ ID NO 27
<211> LENGTH: 21
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: primer
<400> SEQUENCE: 27
aagacacctg tccccctaga a    21

<210> SEQ ID NO 28
<211> LENGTH: 20
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: primer
<400> SEQUENCE: 28
gggtgacaga gcaagactcc

<210> SEQ ID NO 29
<211> LENGTH: 27
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: primer
<400> SEQUENCE: 29
ttittaaacaa togctact tag aattitca    27

<210> SEQ ID NO 30
<211> LENGTH: 20
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: primer
<400> SEQUENCE: 30
gccgaattct ttgtttttgc

<210> SEQ ID NO 31
<211> LENGTH: 20
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: primer
<400> SEQUENCE: 31
ttgctgcagt tatggctac

<210> SEQ ID NO 32
<211> LENGTH: 22
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: primer
<400> SEQUENCE: 32
tgaatttact aaggcccttc ca    22

<210> SEQ ID NO 33
<211> LENGTH: 20
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: primer
<400> SEQUENCE: 33
gtgctcacct ctccaagctc

<210> SEQ ID NO 34
<211> LENGTH: 20
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: primer
<400> SEQUENCE: 34
gtagccatca actgcagdaa

<210> SEQ ID NO 35
<211> LENGTH: 44
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: primer
<400> SEQUENCE: 35
taatacgact cactataggg aggacgggct tactggtctg actc    44

<210> SEQ ID NO 36
<211> LENGTH: 41
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: primer
<400> SEQUENCE: 36
tacgcatccg agtttgagac gcaggcaggc aggcaggcag g    41

<210> SEQ ID NO 37
<211> LENGTH: 21
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: primer
<400> SEQUENCE: 37
tacgcatccg agtttgagac g    21

<210> SEQ ID NO 38
<211> LENGTH: 22
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: probe
<400> SEQUENCE: 38
ttggacttgg aatgagtgaa t    22

<210> SEQ ID NO 39
<211> LENGTH: 30
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: primer
<400> SEQUENCE: 39
30

What is claimed is:

1. An isolated polynucleotide comprising about nucleotides 17501-17701 of SEQ ID NO:1 and an at risk repeat tract, and the full complements thereof.

2. An isolated polynucleotide comprising about nucleotides 17858-18062 of SEQ ID NO:1 and an at risk repeat tract, and the full complements thereof.

3. The isolated polynucleotide of claim 1 wherein the at risk repeat tract comprises (TG)x(TCTG)y(CCTG)z wherein x is an integer from 14 to 25, y is an integer from 3 to 10, and z is an integer from 75 to 11,000.

4. The isolated polynucleotide of claim 1 wherein the at risk repeat tract comprises from 75 to 11,000 CCTG repeats.

5. The isolated polynucleotide of claim 4 wherein the at risk repeat tract comprises at least 75 CCTG repeats uninterrupted by other nucleotides.

6. The isolated polynucleotide of claim 2 wherein the at risk repeat tract comprises (TG)x(TCTG)y(CCTG)z wherein x is an integer from 14 to 25, y is an integer from 3 to 10, and z is an integer from 75 to 11,000.

7. The isolated polynucleotide of claim 2 wherein the at risk repeat tract comprises from 75 to 11,000 CCTG repeats.

8. The isolated polynucleotide of claim 7 wherein the at risk repeat tract comprises at least 75 CCTG repeats uninterrupted by other nucleotides.

* * * * *
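The repeat-tract formula recited in claims 3 and 6 can be illustrated with a minimal Python sketch. It is not a method described in the patent; the names `TRACT`, `tract_counts`, and `in_claimed_ranges` are hypothetical, and the example sequence is synthetic. The sketch assumes a plain-text DNA string and simply counts the consecutive TG, TCTG, and CCTG units of the first such tract it finds, then compares the counts with the integer ranges stated in the claims.

```python
import re

# Three consecutive repeat blocks: (TG)x (TCTG)y (CCTG)z.
# Hypothetical helper pattern, not taken from the patent.
TRACT = re.compile(r"((?:TG)+)((?:TCTG)+)((?:CCTG)+)")

# Inclusive integer ranges as recited in claims 3 and 6.
X_RANGE = range(14, 26)      # x: 14 to 25
Y_RANGE = range(3, 11)       # y: 3 to 10
Z_RANGE = range(75, 11001)   # z: 75 to 11,000


def tract_counts(seq):
    """Return (x, y, z) for the first TG/TCTG/CCTG tract in seq, or None."""
    match = TRACT.search(seq.upper())
    if match is None:
        return None
    tg, tctg, cctg = match.groups()
    return len(tg) // 2, len(tctg) // 4, len(cctg) // 4


def in_claimed_ranges(seq):
    """True if the first tract in seq has x, y, z inside the claimed ranges."""
    counts = tract_counts(seq)
    if counts is None:
        return False
    x, y, z = counts
    return x in X_RANGE and y in Y_RANGE and z in Z_RANGE


# Synthetic example: 20 TG, 5 TCTG, 80 CCTG units with arbitrary flanks.
example = "AA" + "TG" * 20 + "TCTG" * 5 + "CCTG" * 80 + "AA"
print(tract_counts(example))
print(in_claimed_ranges(example))
```

The sketch only inspects the leftmost tract and does not model the flanking nucleotides of SEQ ID NO:1 or interruptions within the CCTG block, so it is an illustration of the formula rather than a test for the claimed polynucleotide.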