USOO9677128B2

(12) United States Patent (10) Patent No.: US 9,677,128 B2 Robertson et al. (45) Date of Patent: Jun. 13, 2017

(54) METHODS AND KITS FOR DETECTION OF (58) Field of Classification Search 5-HYDROXYMETHYLCYTOSNE None See application file for complete search history. (75) Inventors: Adam Brian Robertson, Oslo (NO); John Arne Dahl, Oslo (NO): Arne (56) References Cited Klungland, Oslo (NO) U.S. PATENT DOCUMENTS (73) Assignee: OSLO UNIVERSITETSSYKEHUS 2005/0227235 A1 * 10, 2005 Carr ...... C12O 1/6848 HF, Oslo (NO) 435.6.16

(*) Notice: Subject to any disclaimer, the term of this FOREIGN PATENT DOCUMENTS patent is extended or adjusted under 35 U.S.C. 154(b) by 518 days. WO 2010/037001 4/2010

(21) Appl. No.: 13/878,909 OTHER PUBLICATIONS (22) PCT Filed: Oct. 20, 2011 Dipaolo et al. Molecular Cell. 2005. 17:441-451.* Grover et al. Angew. Chem. Int. Ed. 2007. 46: 4839-2843.* (86). PCT No.: PCT/US2011/057107 Unligil et al. Current Opinion in Structural Biology. 2000. 10: 51O-517.* S 371 (c)(1), Levine et al. PNAS. 1960. 46(8): 1038-1043.* (2), (4) Date: Jul. 11, 2013 Sabatini et al. The Journal of Biological Chemistry, 2002. 277(31):28150-28156.* (87) PCT Pub. No.: WO2012/054730 Cross et al. The Embo Journal. 1999. 18(21): 6573-6581.* Panyutin et al. Acta Oncologica. 1996. 35(7): 817-823.* PCT Pub. Date: Apr. 26, 2012 Yu et al. Nucleic Acids Res. 2007. 35(7):2107-2115.* Safarik et al. BioMagnetic Research and Technology. 2004. 2:7.* (65) Prior Publication Data (Continued) US 2013/0323728A1 Dec. 5, 2013 Primary Examiner — Joseph G Dauner Related U.S. Application Data (74) Attorney, Agent, or Firm — Casimir Jones S.C. (60) Provisional application No. 61/405,706, filed on Oct. 22, 2010. (57) ABSTRACT The present invention relates to methods and kits for the (51) Int. Cl. detection of 5-hydroxymethylcytosme (5hmC). In some C7H 2L/02 (2006.01) embodiments, the present invention relates to methods and C07K 6/00 (2006.01) kits for detection of 5hmC in nucleic acid (e.g., DNA, C07K I4/00 (2006.01) RNA). In some embodiments, the present invention relates CI2O I/68 (2006.01) to detection of 5hmC in genomic DNA, e.g., mammalian (52) U.S. Cl. genomic DNA. CPC ...... CI2O I/6827 (2013.01); C12O 1/6806 (2013.01); Y10T 436/147777 (2015.01) 5 Claims, 7 Drawing Sheets

US 9,677,128 B2 Page 2

(56) References Cited Munzel, M. et al., “Quantification of the Sixth DNA Base Hydroxymethylcytosine in the Brain'. Angew. Chem. Int. Ed. 2010, 49, 5375-5377. OTHER PUBLICATIONS Nestor, C. et al., “Enzymatic approaches and bisulfite sequencing Flusberg, B. et al., “Direct detection of DNA methylation during cannot distinguish between 5-methylcytosine and single-molecule, real-time sequencing. Nature Methods, vol. 7. 5-hydroxymethylcytosine in DNA'. BioTechniques 48: 317-319 No. 6, Jun. 2010: 461–465. (Apr. 2010). Hendrich, B. et al., “Identification and Characterization of a Family Szwagierczak, A. et al., “Sensitive enzymatic quantification of of Mammalian Methyl-CpG Binding Proteins”. Molecular and 5-hydroxymethylcytosine in genomic DNA'. Nucleic Acids Cellular Biology, Nov. 1998, p. 6538-6547. Research 2010, vol. 38, No. 19, e181. Illingworth, R. et al., “CpG islands- A rough guide', FEBS Letters Tahiliani, M. et al., “Conversion of 5-Methylcytosine to 583 (2009) 1713-1720. 5-Hydroxymethylcytosine in Mammalian DNA by MLL Partner Ito, S. et al., “Role of Tet proteins in 5mC to 5hmC conversion, ES TET1, Science May 15, 2009 vol. 324:930-935. cell self-renewal, and ICM specification', Nature Aug. 26, 2010; Tomaschewski, J. et al., “T4-induced alpha- and beta 466(7310): 1129-1133. glucosyltransferase: cloning of the genes and a comparison of their Jin, S. et al., “Examination of the specificity of DNA methylation products based on sequencing data. Nucleic Acids Research vol. 13 profiling techniques towards 5-methylcytosine and No. 21 (1985) 7551-7.568. 5-hydroxymethylcytosine”. Nucleic Acids Research, 2010, vol. 38. Robertson, A. et. al., “A novel method for the efficient and selective No. 11, e125. identification of 5-hydroxymethylcytosine in genomic DNA'. Kriaucionis, S. et al., “The nuclear DNA base, Nucleic Acids Research, 2011, vol. 39, No. 8, e55. 5-hydroxymethylcytosine is present in brain and enriched in Purkinje neurons”, Science May 15, 2009; 324 (5929): 929-930. * cited by examiner U.S. Patent Jun. 13, 2017 Sheet 1 of 7 US 9,677,128 B2

FIGURE

8 : :

i. M : U.S. Patent Jun. 13, 2017 Sheet 2 of 7 US 9,677,128 B2

FIGURE 2

U.S. Patent Jun. 13, 2017 Sheet 3 of 7 US 9,677,128 B2

FIGURE 3

A { ::::::::

3:38.3:33.: ::::::::::::::::

s: i88 (88:88 U.S. Patent Jun. 13, 2017 Sheet 4 of 7 US 9,677,128 B2

FIGURE 4

------'-a'--

U.S. Patent Jun. 13, 2017 Sheet S of 7 US 9,677,128 B2

U.S. Patent Jun. 13, 2017 Sheet 6 of 7 US 9,677,128 B2

Figure 6 JBP1 (Crithidia fasciculate) Recombinant (SEQ ID NO:1)

MGSSHHHHHH SSGLVPRGSH MEPKSKKVKQ DIFNFPDGKD VPTTKEKAEA YVDALKAHPF YDNVHSVVDV YDSATLRDGK GRVIGVMLRKALPEHATTAA GLLSAAAVRTSLRSSMFGG ESPLSGIAGY FDYRGSPVEL KARKTAFTYE HEKKWPAVFP LVDYVSEIYKSVMPEHWAAQ DSAIPDIVRI HGTPFSTLTI NSRFRTASHT DAGDFDGGYS CIACIDGDFK GLALGFDDFH VNVPMQPRDV LVFDSHYFHS NSELESCPT EEWRRLTCVF YYRSALGEPS SYAEYRRRLA AAQQDSTAQP VVSSVVEKPNGKNLYKPSTV FPIDPTPFAV VAQLHRLHHC AAKGLCVHEL LAVPSSPLAV LLFGERLSCS DGIPLRAAEO KLKANADGAS RGVTSSGGFS ESDAVLTTAV EKSKYLERDH LSQCISAELL AMWVEARKHW LRLVATEWAR MIATAPERTD FLWKNKSPMN TAFFDLCEVA KQVMLGLLDK ETATPTEERH FWSVYAAHLH RACAERLMMP EEAMSLRKLNVKLKDFSFGG TRYFKDMPVE EQERRVARKA SIEEARRRST AAKDGEQRSN WLTNDAFDYQ TEDCEVDYAGHGWAVPKQHA KTVTANVHQE AVAATTEAVR VLVVLPRPPS GDRGDAAVDL PKEVTTSAEW VRLMSSPAVR RVLAAKQRNLTLLPNCNVEA VSLNFAYHDS LPQKATFDFV VLQHVLSAMPEDAIATDYVS RMRSICTGCL FVVETDVQCR QYFTLHYPLRVQYDAVAPAF FQLLHRCSYG TPLARTRTKA EVEALFPFVC CARYKLQGSP MNTVVHLLALE U.S. Patent Jun. 13, 2017 Sheet 7 Of 7 US 9,677,128 B2

Figure 7 fi-glucosyltransferase (T4) bgt Recombinant (SEQ ID NO:2)

MGSSHHHHHH SSGLVPRGSH MKIANMIGN NVINFKTVPS SETIYLFKVI SEMGLNVIDI SLKNGVYTKS FIDEVDVNDYD RLVVNSSIN FFGGKPNLA LSAQKFMAKY KSKIYYLFTD IRLPFSQSWP NVKNRPWAYL YTEEELLIKS PIKVISQGINLDIAKAAHKK VDNVIEFEYF PIEQYKIHMN DFQLSKPTKK TLDVIYGGSF RSGORESKMV EFLFDTGLN EFFGNAREKQ FKNPKYPWTK APVFTGKIPM NMVSEKNSQA IAALIIGDKNYNDNFITLRV WETMASDAVM LIDEEFDTKH RIINDARFYV NNRAELIDRV NELKHSDVLR KEMLSIQHDI LNKTRAKKAE WODAFKKAIDL US 9,677,128 B2 1. 2 METHODS AND KITS FOR DETECTION OF distinguish 5meC from 5hmC (Nestor et al. (2010) Biotech 5-HYDROXYMETHYLCYTOSINE niques 48:317-319; herein incorporated by reference in its entirety). Furthermore, commercially available antibodies CROSS-REFERENCE TO RELATED raised against 5hmC cannot distinguish between 5meC and APPLICATIONS 5hmC (Ito et al. (2010) Nature 466:1129-1133; herein incor porated by reference in its entirety). Using polymerase This application claims priority to U.S. Provisional Patent kinetics, one group has been able to differentiate between Application No. 61/405,706, filed Oct. 22, 2010, the con 5meC and 5hmC (Flusberg et al. (2010) Nature Methods tents of which are incorporated by reference in its entirety. 7:461–465; herein incorporated by reference in its entirety). 10 However, this method is impractical for identification of the FIELD OF THE INVENTION 5hmC status of individual genes (Flusberg et al. (2010) Nature Methods 7:461–465; herein incorporated by refer The present invention relates to methods and kits for the ence in its entirety). detection of 5-hydroxymethylcytosine (5hmC). In some Improved methods for detecting 5-hydroxymethylcyto embodiments, the present invention relates to methods and 15 kits for detection of 5hmC in nucleic acid (e.g., DNA, sine residues in DNA are needed. In particular, improved RNA). In some embodiments, the present invention relates methods for identifying the level of 5hmC modification at to detection of 5hmC in genomic DNA, e.g., mammalian specific regions (e.g., specific gene loci) are needed. genomic DNA. SUMMARY OF THE INVENTION BACKGROUND OF THE INVENTION The present invention relates to methods and kits for the DNA methylation consisting of a cytosine modified by a detection of 5-hydroxymethylcytosine (5hmC). In some methyl group at the N5 position (5meC) is a well described embodiments, the present invention relates to methods and modification affecting gene expression in mammalian cells. 25 kits for detection of 5hmC in nucleic acid (e.g., DNA, The 5meC modification generally occurs at the CpG RNA). In some embodiments, the present invention relates dinucleotide sequence; however, it has been identified else to detection of 5hmC in genomic DNA, e.g., mammalian where in the genome (Illingworth et al. (2009) FEBS Letters genomic DNA. 583:1713-1720; herein incorporated by reference in its 5-hydroxymethylcytosine (5hmC) has been identified in entirety). Another type of gene-regulatory DNA modifica 30 mammalian genomic DNA. Identifying the genomic loca tion has more recently been described, 5-hydroxymethylcy tion of this modified base is of interest for research and tosine (5hmC) (Tahiliani et al. (2009) Science 324:930-935: diagnostic purposes (e.g., tumor detection). In experiments Kriaucionis et al. (2009) Science 324:929-930; each herein conducted during the course of developing some embodi incorporated by reference in its entirety). Tahiliani et al. ments of the present invention, a method was developed for demonstrated that the Tetl, an -dependent C.-ke 35 the rapid and inexpensive identification of genomic regions togluterate dioxygenase, catalyzes the formation of 5hmC containing 5hmC. In some embodiments, this method from 5meC (Tahiliani et al. (2009) Science 324:930-935: involves the selective glucosylation of 5hmC residues by the herein incorporated by reference in its entirety). Further B-glucosyltransferase from T4 bacteriophage creating B-glu more, the 5hmC base may be an intermediate in the con cosyl-5-hydroxymethylcytosine (B-glu-5hmC). In some version of 5meC to cytosine, thus identifying an enzyme that 40 embodiments, the B-glu-5hmC modification provides a tar can potentially demethylate DNA (Tahiliani et al. (2009) get which can be efficiently and selectively pulled down by Science 324:930-935; herein incorporated by reference in its J-binding protein 1 coupled to magnetic beads. In some entirety). 5hmC is a stable DNA modification found in embodiments, DNA that is precipitated is suitable for analy specialized nondividing neurons and in all animal tissues sis by quantitative PCR, microarray, sequencing, or other studied to date (Kriaucionis et al. (2009) Science 324:929 45 techniques. The methods described herein find use in e.g., 930; herein incorporated by reference in its entirety). research, clinical diagnostics, screening assays, etc. In one Intriguingly, 5hmC was not detected in cancerous cell lines. non-limiting example, methods of the present invention find The inability to detect 5hmC in cancerous cell lines indicates use in studying the temporal and spatial effects that 5hmC that lack of 5hmC may be associated with tumorigenesis. may have on epigenetic regulation at the single gene level. The involvement of 5hmC in epigenetic regulation has been 50 In another non-limiting example, methods of the present experimentally verified by Jin et al., who showed that 5hmC invention find use in identifying regions (e.g., tissues, cells) in DNA inhibits binding of several methyl-CpG-binding with increased or decreased levels of 5hmC residues relative domain proteins (Jin et al. (2010) Nucleic Acids Res. to a second region, e.g., as an indication of differential 38:e125; Hendrich et al. (1998) Mol. Cellular Biol. 18:6538 biological state (e.g., neoplastic state, tumor development, 6547; each herein incorporated by reference in its entirety). 55 etc.). Tet2 and Tet3 also catalyze the formation of 5hmC (Ito et al. Methods of the present invention are not limited by the (2010) Nature 466:1129-1133; herein incorporated by ref. sample type to which they are applied. In some preferred erence in its entirety). Additionally, the total level of 5hmC embodiments, the sample is a biological sample. The bio present in several mammalian tissues has been quantified logical sample may be cellular (e.g., tumor sample, biopsy (Szwagierczak et al. (2010) Nucleic Acids Res. (advanced 60 sample, organ sample, tissue sample, bone sample, etc.), or online publication); Munzel et al. (2010) Angewandte Che a biological fluid (e.g., Sweat, saliva, urine, urine sediment, mie Intl. Ed. 49:5375-5377: each herein incorporated by blood, semen, tears, cerebrospinal fluid, mucus secretion, reference in its entirety). However, specific genomic loca milk, interstitial fluid, etc.), or a combination thereof. tions of 5hmC remain unknown. Methods of the present invention are not limited by type The identification of specific genomic regions containing 65 of Subject. In some embodiments, the Subject is a mammal. 5hmC is technically challenging. The most frequently used In some preferred embodiments, the mammalian Subject is method for identifying 5meC, bisulfite sequencing, cannot human. Methods of the present invention are not limited by US 9,677,128 B2 3 4 age of the Subject, physical attributes or physical state (e.g., limited to, buffers, (e.g., glycosyltransferases, T4 medical condition) of the subject. bacteriophage beta-glucosyltransferase, binding proteins, In some preferred embodiments, the sample is exposed to J-binding protein 1), Substrates (e.g., glucose), tubes, ves an agent capable of transferring a moiety to 5hmC and sels, containers, beads, resins, magnets, instructions, and the thereby “tagging it. In some preferred embodiments, the 5 like. agent capable of transferring a moiety to 5hmC is an In certain embodiments, the present invention provides a enzyme. In some particularly preferred embodiments, the method for detecting 5-hydroxymethylcytosine in a sample enzyme is a glycosyltransferase and the moiety is a carbo comprising: a) obtaining a sample, b) exposing the sample hydrate (e.g., monosaccharide). Methods of the present to a glycosyltransferase under conditions that facilitate invention are not limited by type of glycosyltransferase. The 10 activity of the glycosyltransferase, and c) contacting the glycosyltransferase may be a glucosyltransfearse, galacto sample exposed to the glycosyltransferase with an agent Syltransfearse, sialyltransferase, Xylosyltransferase, or any Such as a J-binding protein or an antibody. In some embodi other type of glycosyltransferase. In particularly preferred ments, the sample is a biological sample. In some embodi embodiments, the glycosyltransferase is a beta-glucosyl ments, the biological sample is obtained from a mammal. In . In particularly preferred embodiments, the beta 15 Some embodiments, the mammal is human. In some embodi glucosyltransferase is a T4 bacteriophage beta-glucosyl ments, the glycosyltransferase is a glucosyltransferase. In transferase. Some embodiments, the glucosyltransferase is a beta-gluco In some embodiments, the method involves use of an Syltransferase. In some embodiments, the beta-glucosyl agent that binds to (e.g., recognizes) 5hmC to which a transferase is a T4 bacteriophage beta-glycosyltransferase. moiety has been added (e.g., glucosylated 5hmC), without In some embodiments, the agent is a J-binding protein. In limitation to the type or identity of the agent. The agent may Some embodiments, the J-binding protein is Crithidia fas be a lectin, an antibody, a J-binding protein, or any other ciculate J-binding protein 1. In some embodiments, the agent (e.g., enzyme, binding protein, nonproteinaceous sample comprises nucleic acid. In some embodiments, the binding partner) that binds (e.g., noncovalently interacts nucleic acid is DNA. In some embodiments, the agent is with) 5hmC to which a moiety has been added (e.g., 25 immobilized to a solid . In some embodiments, the glucosylated 5hmC). In some embodiments, the association Solid Substrate is a type such as a particulate material or a or binding of the agent with the modified (e.g., glycosylated) nonporous material. In some embodiments, the particulate 5hmC results in direct generation of a detectable signal (e.g., material is a type such as a bead, a colloid, a gel, or a a fluorescent signal, a colorimetric signal). In some embodi magnetic particle. In some embodiments, the nonporous ments, the association or binding of the agent with the 30 material is a type such as a well, a plate, or a membrane. In modified (e.g., glycosylated) 5hmC does not result in direct Some embodiments, the method further comprises additional generation of a detectable signal. step d) isolating the agent immobilized to the solid substrate In some embodiments, the method further comprises to generate purified sample. In some embodiments, the isolation of bound and/or modified 5hmC without limitation method further comprises determining the amount of nucleic to the technique(s) used for Such isolation. In some embodi 35 acid present in the purified sample. In some embodiments, ments, an agent capable of binding to a modified 5hmC (e.g., the determining occurs through a such as PCR, real-time a J binding protein capable of binding to glucosylated PCR, mass spectrometry, hybridization, gene arrays, or 5hmC) is immobilized on a solid substrate without limitation DNA sequencing. to the type of substrate. Examples of solid substrates include In certain embodiments, the present invention provides a beads, gels, agar, agarose, resins, particles, magnetic par 40 kit for detecting 5-hydroxymethylcytosine in a sample, the ticles, paramagnetic particles, impermeable surfaces (e.g., kit comprising a glycosyltransferase and an agent such as a slides, wells (e.g., wells of microtiter plates), chips (e.g., J-binding protein or an antibody. In some embodiments, the array chips), etc). In some preferred embodiments, immo glycosyltransferase is a beta-glucosyltransferase. In some bilization of an agent capable of binding to a modified 5hmC embodiments, the beta-glucosyltransferase is a T4 bacterio facilitates isolation (e.g., pull-down, sedimentation) of the 45 phage beta-glucosyltransferase. In some embodiments, modified 5hmC residues. J-binding protein is Crithidia fasciculate J-binding protein In some embodiments, modified 5hmC (e.g., glucoslated 1. 5hmC) is subjected to further analysis techniques, e.g., for Additional embodiments will be apparent to persons the purpose of quantifying the amount of 5hmC present in a skilled in the relevant art based on the teachings contained sample. Methods of the present invention are not limited by 50 herein. the type(s) of downstream analysis techniques. Analytical methods include, but are not limited to, PCR, real-time PCR, DESCRIPTION OF THE DRAWINGS mass spectrometry techniques, FRET. Scintillation counting, sequencing, array analysis (e.g., microarray, gene chip, FIG. 1 shows a scheme for the selective pull down of Affymetrix array), colorimetric assay, enzymatic assay, etc. 55 5hmC. The top panel shows naturally occurring cytosine Methods of the present invention find use in diagnostic or modifications in mammalian cells. The middle panel gives risk assessment tests, in research (e.g., genetics research, cell an overview of the method used to specifically identify biology research, medical research, developmental 5hmC bases in genomic DNA. The bottom panel shows research), in expression analyses, in drug screening assays, Some applications that can be employed using this method of and the like. Methods of the present invention are not limited 60 identifying 5hmC regions. by the biological state correlated with level of 5hmC FIG. 2 shows beta-glucosyltransferase (b-gt) and J base detected. Such biological states include, but are not limited binding protein 1 (JBP1 purified to near-homogeneity. Pro to, cancer, autoimmune disorders (e.g., diabetes, rheumatoid teins were purified as described herein (Example 1). 3 ug of arthritis, allergies), genetic and epigenetic diseases, envi purified b-gt (lane 1) and 3 ug of JBP1 (lane 2) were loaded ronmentally-mediated conditions, and metabolic conditions. 65 onto a 12% SDS-PAGE to assay protein purity. The present invention provides kits for the detection of FIG. 3 shows that B-gt can specifically modify 5hmC 5hmC in a sample. Kit components may include, but are not residues at a high efficiency. (a) Oligonucleotides that were US 9,677,128 B2 5 6 either incubated in the presence or absence of the B-gt were DEFINITIONS digested with TaqI, treated with alkaline phosphatase, 5'-end labelled using T4 polynucleotide kinase, and digested to 5' To facilitate an understanding of the present invention, a mononucleotides using DNase I and Snake Venom Phos number of terms and phrases are defined below: phodiesterase. Radiolabelled mononucleotides were ana As used herein, the term “sensitivity” is defined as a lysed by two-dimensional thin-layer chromatography statistical measure of performance of an assay (e.g., method, (TLC). Abbreviations: C, 3'-deoxyribocytosine-5'-mono test), calculated by dividing the number of true positives by phosphate: T, 3'-deoxyribothymidine-5'-monophosphate: the Sum of the true positives and the false negatives. 5meC, 3'-deoxyribo-N5-methylcytosine-5'-monophosphate: As used herein, the term “specificity' is defined as a 5hmC, 3'-deoxyribo-N5-hydroxymethylcytosine-5'-mono 10 statistical measure of performance of an assay (e.g., method, test), calculated by dividing the number of true negatives by phosphate. (b) HPLC coupled to tandem mass spectrometry the Sum of true negatives and false positives. was used to measure the efficiency of the B-gt reaction. As used herein, the term “informative' or “informative Substrates analysed were 2.7 kb linear PCR products of ness' refers to a quality of a marker or panel of markers, and pUC18: the dC substrate contained only cytosine residues; 15 specifically to the likelihood of finding a marker (e.g., the 5meC substrate was created by methylating the CpG epigenetic marker; e.g., 5hmC at one or more particular dinucleotide of the cytosine substrate; the 5hmC substrate locations) in a positive sample. was created by using d5hmC in place of dCTP in the PCR As used herein, the term “CpG island refers to a genomic reactions; the B-glu-5hmC Substrate was created by incu DNA region that contains a high percentage of CpG sites bating the 5hmC substrate with the f-gt in the presence of relative to the average genomic CpG incidence (per same UDP-glucose. Control DNA was prepared from salmon species, per same individual, or per subpopulation (e.g., sperm. LC/MS/MS chromatograms of the cytosine residues strain, ethnic Subpopulation, or the like). Various parameters from each of the substrates are presented. Abbreviations: dC. and definitions for CpG islands exist; for example, in some 3'-deoxyribocytosine; 5me(dC), 3'-deoxyribo-N5-methylcy embodiments, CpG islands are defined as having a GC tosine; 5hm(dC), 3'-deoxyribo-N5-hydroxymethylcytosine; 25 percentage that is greater than 50% and with an observed/ 5-glu-hm(dC), 3'-deoxyribo-N5-(B-Dglucosyl(hydroxym expected CpG ratio that is greater than 60% (Gardiner ethyl)cytosine. Cytosines are only 5meC modified at CpG Garden et al. (1987) J Mol. Biol. 196:261-282; Baylin et al. Sequences. (2006) Nat. Rev. Cancer 6:107-116; Irizarry et al. (2009) FIG. 4 shows MS/MS production spectra of 5-me(dC), Nat. Genetics 41:178-186; each herein incorporated by 30 reference in its entirety). In some embodiments, CpG islands 5-hm(dC), and 5-Gly-hm(dC) detected in dsDNA pUC18 may have a GC content>55% and observed CpG/expected PCR products. The spectra were obtained by collision CpG of 0.65 (Takai et al. (2007) PNAS 99:3740-3745: induced dissociation of the positive precursor ions MHI at herein incorporated by reference in its entirety). Various m/Z 242.2 (5-me(dC)), 258.2 (5-hm(dC)), or 420.3 (5-Gly parameters also exist regarding the length of CpG islands. him (dC)). The proposed origins of key fragment ions are as 35 As used herein, CpG islands may be less than 100 bp: indicated. 100-200 bp, 200-300 bp, 300-500 bp, 500-750 bp: 750-1000 FIG. 5 shows that, in a sequence independent manner, bp; 100 or more by in length. In some embodiments, CpG JBP1 can selectively pull down DNA that contains B-glu islands show altered methylation patterns (e.g., altered 5hmC resulting from B-gt treatment. (a) Ten ng of each 5hmC patterns) relative to controls (e.g., altered 5hmC radiolabeled PCR (Example 1) was incubated with 40 methylation in cancer Subjects relative to Subjects without magnetic beads coated with 132 ng JBP1 or 132 ng BSA. cancer; tissue-specific altered 5hmC patterns; altered 5hmC The beads were pulled down with a magnet and the percent patterns in biological samples from Subjects with a neoplasia bound to the beads was measured using Scintillation count or tumor relative to Subjects without a neoplasia or tumor. In ing. (b) 10 ng of each Substrate were incubated in the same Some embodiments, altered methylation involves increased tube with various concentrations of JBP1 coated magnetic 45 incidence of 5hmC. In some embodiments, altered methyl beads. The beads were pulled down using a magnet and the ation involves decreased incidence of 5hmC. DNA that was bound to the magnetic beads was measured by As used herein, the term “CpG shore” or “CpG island quantitative real time PCR. (c) 208 fmol of 37 bp double shore” refers to a genomic region external to a CpG island Stranded oligonuleotides containing unmodified cytosines, a that is or that has potential to have altered methylation (e.g., single 5meC, 5hmC, or B-glu-5hmC on each strand were 50 5hmC) patterns (see, e.g., Irizarry et al. (2009) Nat. Genetics incubated with 132 ng JBP1 coated magnetic beads. Beads 41:178-186; herein incorporated by reference in its entirety). were pulled down as described and the percent DNA bound CpG island shores may show altered methylation (e.g., to the JBP1 coated magnetic beads was quantified using 5hmC) patterns relative to controls (e.g., altered 5hmC in scintillation counting. (d) Three different combinations of cancer Subjects relative to Subjects without cancer, tissue three different PCR products composed of only cytosines, 55 specific altered 5hmC patterns; altered 5hmC in biological only 5meC, or 5hmC (20 ng of each) were combined and samples \from Subjects with neoplasia or tumor relative to incubated with purified f-gt in the presence of UDP Glu Subjects without neoplasia or tumor. In some embodiments, cose. The DNA was purified and incubated with 132 ng altered methylation involves increased incidence of 5hmC. JBP1 coated magnetic beads. The JBP1 magnetic beads In some embodiments, altered methylation involves were pulled down as above and the amount of DNA pulled 60 decreased incidence of 5hmC. CpG island shores may be down was quantified using quantitative real time PCR. Data located in various regions relative to CpG islands (see, e.g., is presented as the fold enrichment over cytosine-containing Irizarry et al. (2009) Nat. Genetics 41; 178-186; herein DNA incorporated by reference in its entirety). Accordingly, in FIG. 6 provides the amino acid sequence of Crithidia Some embodiments, CpG island shores are located less than fasciculate JBP-1 (SEQ ID NO:1). 65 100 bp; 100-250 bp; 250-500 bp: 500-1000 bp: 1000-1500 FIG. 7 provides the amino acid sequence of T4,3-gluco bp; 1500-2000 bp: 2000-3000 bp: 3000 bp or more away syltransferase (SEQ ID NO:2). from a CpG island. US 9,677,128 B2 7 8 As used herein, the term “metastasis” is meant to refer to al., (2002) Nucleic Acids Research 30(12): e57; herein the process in which cancer cells originating in one organ or incorporated by reference in its entirety), multiplex PCR part of the body relocate to another part of the body and (see, e.g., Chamberlain, et al., (1988) Nucleic Acids continue to replicate. Metastasized cells Subsequently form Research 16(23) 11141-11156; Ballabio, et al., (1990) tumors which may further metastasize. Metastasis thus 5 Human Genetics 84(6) 571-573; Hayden, et al., (2008) refers to the spread of cancer from the part of the body where BMC Genetics 9:80; each of which are herein incorporated it originally occurs to other parts of the body. by reference in their entireties), nested PCR, overlap-exten As used herein, “an individual is Suspected of being sion PCR (see, e.g., Higuchi, et al., (1988) Nucleic Acids Susceptible to metastasized cancer is meant to refer to an Research 16(15) 7351-7367; herein incorporated by refer individual who is at an above-average risk of developing 10 ence in its entirety), real time PCR (see, e.g., Higuchi, et al., metastasized cancer. Examples of individuals at a particular (1992) Biotechnology 10:413-417; Higuchi, et al., (1993) risk of developing cancer of a particular type (e.g., colorectal Biotechnology 11:1026-1030; each of which are herein cancer, bladder cancer, breast cancer, prostate cancer) are incorporated by reference in their entireties), reverse tran those whose family medical history indicates above average scription PCR (see, e.g., Bustin, S. A. (2000) J. Molecular incidence of such cancer type among family members and/or 15 Endocrinology 25: 169-193; herein incorporated by refer those who have already developed cancer and have been ence in its entirety), solid phase PCR, thermal asymmetric effectively treated who therefore face a risk of relapse and interlaced PCR, and Touchdown PCR (see, e.g., Don, et al., recurrence. Other factors which may contribute to an above Nucleic Acids Research (1991) 19(14) 4008: Roux, K. average risk of developing metastasized cancer which would (1994) Biotechniques 16(5)812-814; Hecker, et al., (1996) thereby lead to the classification of an individual as being 20 Biotechniques 2003) 478-485; each of which are herein Suspected of being Susceptible to metastasized cancer may incorporated by reference in their entireties). Polynucleotide be based upon an individual’s specific genetic, medical amplification also can be accomplished using digital PCR and/or behavioral background and characteristics. (see, e.g., Kalinina, et al., Nucleic Acids Research. 25. The term “neoplasm' as used herein refers to any new and 1999-2004, (1997); Vogelstein and Kinzler, Proc Natl Acad abnormal growth of tissue. Thus, a neoplasm can be a 25 Sci USA. 96: 9236-41, (1999); International Patent Publi premalignant neoplasm or a malignant neoplasm. The term cation No. WO05023091A2: US Patent Application Publi “neoplasm-specific marker” refers to any biological material cation No. 20070202525; each of which are incorporated that can be used to indicate the presence of a neoplasm. herein by reference in their entireties). Examples of biological materials include, without limita As used herein, the terms “complementary' or “comple tion, nucleic acids, polypeptides, carbohydrates, fatty acids, 30 mentarity are used in reference to polynucleotides (i.e., a cellular components (e.g., cell membranes and mitochon sequence of nucleotides) related by the base-pairing rules. dria), and whole cells. For example, the sequence “5'-A-G-T-3', is complementary As used herein, the term “amplicon refers to a nucleic to the sequence 3'-T-C-A-5". Complementarity may be acid generated using primer pairs. The amplicon is typically “partial,” in which only some of the nucleic acids bases are single-stranded DNA (e.g., the result of asymmetric ampli- 35 matched according to the base pairing rules. Or, there may fication), however, it may be RNA or dsDNA. be “complete' or “total complementarity between the The term “amplifying or “amplification' in the context nucleic acids. The degree of complementarity between of nucleic acids refers to the production of multiple copies nucleic acid strands has significant effects on the efficiency of a polynucleotide, or a portion of the polynucleotide, and strength of hybridization between nucleic acid strands. typically starting from a small amount of the polynucleotide 40 This is of particular importance in amplification reactions, as (e.g., a single polynucleotide molecule), where the amplifi well as detection methods that depend upon binding between cation products or amplicons are generally detectable. nucleic acids. Amplification of polynucleotides encompasses a variety of As used herein, the term “primer' refers to an oligonucle chemical and enzymatic processes. The generation of mul otide, whether occurring naturally as in a purified restriction tiple DNA copies from one or a few copies of a target or 45 digest or produced synthetically, that is capable of acting as template DNA molecule during a polymerase chain reaction a point of initiation of synthesis when placed under condi (PCR) or a chain reaction (LCR; see, e.g., U.S. Pat. tions in which synthesis of a primer extension product that No. 5,494,810; herein incorporated by reference in its is complementary to a nucleic acid strand is induced (e.g., in entirety) are forms of amplification. Additional types of the presence of nucleotides and an inducing agent such as a amplification include, but are not limited to, allele-specific 50 biocatalyst (e.g., a DNA polymerase or the like) and at a PCR (see, e.g., U.S. Pat. No. 5,639,611; herein incorporated Suitable temperature and pH). The primer is typically single by reference in its entirety), assembly PCR (see, e.g., U.S. Stranded for maximum efficiency in amplification, but may Pat. No. 5,965,408; herein incorporated by reference in its alternatively be double stranded. If double stranded, the entirety), helicase-dependent amplification (see, e.g., U.S. primer is generally first treated to separate its strands before Pat. No. 7,662,594; herein incorporated by reference in its 55 being used to prepare extension products. In some embodi entirety), hot-start PCR (see, e.g., U.S. Pat. Nos. 5,773.258 ments, the primer is an oligodeoxyribonucleotide. The and 5,338,671; each herein incorporated by reference in primer is Sufficiently long to prime the synthesis of extension their entireties), intersequence-specific PCR, inverse PCR products in the presence of the inducing agent. The exact (see, e.g., Triglia, et al. (1988) Nucleic Acids Res., 16:8186: lengths of the primers will depend on many factors, includ herein incorporated by reference in its entirety), ligation- 60 ing temperature, source of primer and the use of the method. mediated PCR (see, e.g., Guilfoyle, R. et al., Nucleic Acids In certain embodiments, the primer is a capture primer. Research, 25:1854-1858 (1997); U.S. Pat. No. 5,508,169; As used herein, the term “nucleic acid molecule” refers to each of which are herein incorporated by reference in their any nucleic acid containing molecule, including but not entireties), methylation-specific PCR (see, e.g., Herman, et limited to, DNA or RNA. The term encompasses sequences al., (1996) PNAS 93(13) 9821-9826; herein incorporated by 65 that include any of the known base analogs of DNA and reference in its entirety), miniprimer PCR, multiplex liga RNA including, but not limited to, 4 acetylcytosine, 8-hy tion-dependent probe amplification (see, e.g., Schouten, et droxy-N6-methyladenosine, aziridinylcytosine, pseudoiso US 9,677,128 B2 9 10 cytosine, 5-(carboxyhydroxyl-methyl)uracil, 5-fluorouracil, production of a polypeptide, RNA (e.g., including but not 5-bromouracil, 5-carboxymethylaminomethyl-2-thiouracil, limited to, mRNA, tRNA and rRNA) or precursor. The 5-carboxymethyl-aminomethyluracil, dihydrouracil, inos polypeptide, RNA, or precursor can be encoded by a full ine, N6-isopentenyladenine, 1-methyladenine, 1-methylp length coding sequence or by any portion of the coding seudo-uracil, 1-methylguanine, 1-methylinosine, 2,2-dim sequence so long as the desired activity or functional prop ethyl-guanine, 2-methyladenine, 2-methylguanine, erties (e.g., enzymatic activity, ligand binding, signal trans 3-methyl-cytosine, 5-methylcytosine, N6-methyladenine, duction, etc.) of the full-length or fragment are retained. The 7-methylguanine, 5-methylaminomethyluracil, 5-methoxy term also encompasses the coding region of a structural gene amino-methyl-2-thiouracil, beta-D-mannosylqueosine, and the including sequences located adjacent to the coding 5'-methoxycarbonylmethyluracil, 5-methoxyuracil, 2-meth 10 region on both the 5' and 3' ends for a distance of about 1 kb on either end Such that the gene corresponds to the length of ylthio-N-isopentenyladenine, uracil-5-oxyacetic acid the full-length mRNA. The sequences that are located 5' of methylester, uracil-5-oxyacetic acid, oxybutoxosine, the coding region and which are present on the mRNA are pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thioura referred to as 5' untranslated sequences. The sequences that cil, 2-thiouracil, 4-thiouracil, 5-methyluracil, N-uracil-5- 15 are located 3' or downstream of the coding region and that oxyacetic acid methylester, uracil-5-oxyacetic acid, are present on the mRNA are referred to as 3' untranslated pseudouracil, queosine, 2-thiocytosine, and 2,6-diaminopu sequences. The term “gene' encompasses both cDNA and 1. genomic forms of a gene. A genomic form or clone of a gene As used herein, the term “nucleobase' is synonymous contains the coding region interrupted with non-coding with other terms in use in the art including “nucleotide.” sequences termed “introns' or “intervening regions” or “deoxynucleotide.” “nucleotide residue.” “deoxynucleotide “intervening sequences.” Introns are segments of a gene that residue.” “nucleotide triphosphate (NTP), or deoxynucle are transcribed into nuclear RNA (hnRNA); introns may otide triphosphate (dNTP). contain regulatory elements such as enhancers. Introns are An "oligonucleotide' refers to a nucleic acid that includes removed or “spliced out from the nuclear or primary at least two nucleic acid monomer units (e.g., nucleotides), transcript, introns therefore are absent in the messenger typically more than three monomer units, and more typically 25 RNA (mRNA) processed transcript. The mRNA functions greater than ten monomer units. The exact size of an during translation to specify the sequence or order of amino oligonucleotide generally depends on various factors, acids in a nascent polypeptide. including the ultimate function or use of the oligonucleotide. To further illustrate, oligonucleotides are typically less than DETAILED DESCRIPTION OF THE 200 residues long (e.g., between 15 and 100), however, as 30 INVENTION used herein, the term is also intended to encompass longer polynucleotide chains. Oligonucleotides are often referred to The present invention relates to methods and kits for the by their length. For example a 24 residue oligonucleotide is detection of 5-hydroxymethylcytosine (5hmC) in samples referred to as a "24-mer. Typically, the nucleoside mono (e.g., biological samples). In some embodiments, the present mers are linked by phosphodiester bonds or analogs thereof, 35 invention relates to methods and kits for detection of 5hmC including phosphorothioate, phosphorodithioate, phospho present in nucleic acid (e.g., DNA, RNA). In some embodi roselenoate, phosphorodiselenoate, phosphoroanilothioate, ments, the present invention relates to detection of 5hmC in phosphoranilidate, phosphoramidate, and the like, including genomic DNA, e.g., mammalian or plant genomic DNA. associated counterions, e.g., H, NH, Na', and the like, if In experiments conducted during the course of developing Such counterions are present. Further, oligonucleotides are Some embodiments of the present invention, novel methods typically single-stranded. Oligonucleotides are optionally 40 were developed for efficient and selective identification of prepared by any suitable method, including, but not limited 5hmC modified nucleotides in DNA. The only other method to, isolation of an existing or natural sequence, DNA repli available to recognize the location of 5hmC modifications— cation or amplification, reverse transcription, cloning and using polymerase kinetics—is prohibitively expensive and restriction digestion of appropriate sequences, or direct is only practical for genome wide sequencing rather than chemical synthesis by a method Such as the phosphotriester 45 analysis of single gene loci (Flusberg et al. (2010) Nature method of Narang et al. (1979) Meth Enzymol. 68:90-99; Methods 7:461–465; herein incorporated by reference in its the phosphodiester method of Brown et al. (1979) Meth entirety). Furthermore, it is unsuitable for clinical applica Enzymol. 68: 109-151; the diethylphosphoramidite method tions (e.g., wherein 5hmC levels, e.g., 5hmC levels at a of Beaucage et al. (1981) Tetrahedron Lett. 22: 1859-1862: particular location(s), are determined as indicators of bio the triester method of Matteucci et al. (1981) J Am Chem 50 logical state (e.g., disease, risk of disease). No antibodies Soc. 103:3185-3191; automated synthesis methods; or the exist that have been shown to specifically react with 5hmC: solid support method of U.S. Pat. No. 4.458,066, entitled to the contrary, commercially available antisera raised “PROCESS FOR PREPARING POLYNUCLEOTIDES.” against 5hmC cross-reacts significantly with 5meC6. In issued Jul. 3, 1984 to Caruthers et al., or other methods addition, a 5hmC residue produces similar results as a 5meC known to those skilled in the art. All of these references are residue when bisulfite sequencing is employed to identify incorporated by reference. 55 these residues (Nestor et al. (2010) Biotechniques 48:317 A “sequence' of a biopolymer refers to the order and 319; herein incorporated by reference in its entirety), ren identity of monomer units (e.g., nucleotides, etc.) in the dering these approaches to identifying 5hmC unusable with biopolymer. The sequence (e.g., base sequence) of a nucleic out significant improvements. acid is typically read in the 5' to 3’ direction. In some embodiments, the present invention provides As used herein, the term “subject” refers to any animal 60 systems, methods, kits and reagents for purifying or isolated (e.g., a mammal), including, but not limited to, humans, nucleic acid sequences containing 5hmC residues. In further non-human primates, rodents, and the like, which is to be the embodiments, the nucleic acid sequences containing 5hmC recipient of a particular treatment. Typically, the terms residues are analyzed and/or identified, for example by PCR, “subject' and “patient” are used interchangeably herein in real time PCR, microarray analysis, sequencing, or other reference to a human Subject. 65 methods that determine the sequence, identity or presence of The term “gene' refers to a nucleic acid (e.g., DNA) a nucleic acid sequence of interest. In some embodiments, sequence that comprises coding sequences necessary for the DNA is obtained from a subject. In some embodiments, the US 9,677,128 B2 11 12 DNA isolated by known methods. In some embodiments, the TABLE 1-continued DNA sample is treated by restriction endonucleases to provide nucleic acid sequences of an approximate desired J-binding Proteins length. In some embodiments, the DNA from a subject is Some JBP1 homologues treated with a glycosyltransferase to glycosylate any 5hmC 5 Expasy that are present in the sample. This results in a sample Accession containing glycosyl-5hmC bases within the nucleic acid. Number Organism Description The glycosyl-5hmC nucleic acid sample is then contacted Q4QHM7 Leishmania Thymine dioxygenase JBP1 (EC with a binding protein specific for the glycosyl-5hmC bases major .14.1.1.6) (J-binding protein 1) within the nucleic acid. A binding protein-glycosyl-5hmC 10 Q9U6M1 Leishmania Thymine dioxygenase JBP1 (EC tarentolae .14.1.1.6) (J-binding protein 1) base complex is formed. In some embodiments, nucleic Q4DLX9 (Sauroieishmania Thymine dioxygenase JBP1-B (EC acids containing glycosyl-5hmC bases are separated from tarentolae) .14.11.6) (J-binding protein 1B) other nucleic acid sequences in the sample based on the Trypanosoma formation of the binding protein-glycosyl-5hmC base com critzi lex. 15 Q4DBW3 Trypanosoma Thymine dioxygenase JBP1-A (EC critzi .14.1.1.6) (J-binding protein 1A) p The present invention is not limited to the use of a Q9U6M3 Trypanosoma Thymine dioxygenase JBP1 (EC particular glycosyl-5hmC binding agent. In some embodi bruceii brucei .14.1.1.6) (J-binding protein 1) ments, the glycosyl-5hmC binding agent is a protein. In D0A9Q3 Trypanosoma -binding protein, putative some embodiments, the protein is a J-binding protein (JBP). brucei TbgDal XI15230) gambiense Suitable J binding proteins are described in FIG. 6 and Table DAL972 1. Those of skill in the art will recognize that additional Q4QFY1 Leishmania Bifunctional helicase and thymine known JBPs and those that may be identified in the future major dioxygenase JBP2 (J-binding protein 2) may be tested for use with the present invention. Addition Includes: Probable DNA helicase JBP2 (EC protein 2) Includes: Probable DNA ally, the present invention encompasses the use of mutated helicase JBP2 (EC 3.6.4.12); Thymine or modified JBPs which can be identified by their percent 25 dioxygenase JBP2 (EC 1.14.11.6). JBP2 identity with reference sequences. In some embodiments, the JBP is J-binding protein 1 (JBP1). In some embodiments, the amino acid sequence of the JBP is at least 80%, 90%, A number of methods are suitable for separation of the 95%, or 99% identical to the sequence of one or more of the binding protein-glycosyl-5hmC base complex. In some JBPs identified in Table 1. In some embodiments, the JBP is 30 embodiments, the binding protein is immobilized on a Crithidia fasciculate JBP-1 (SEQ ID NO:1). In some Substrate. For example, in some embodiments, the binding embodiments, the JBP has an amino acid sequence that is at protein is covalently attached to a bead, such as a magnetic least 80%, 90%, 95%, or 99% identical to SEQ ID NO:1. bead. Nucleic acid sequences containing glycosyl-5hmC The present invention is not limited to the use of particular bases bind to the binding proteins immobilized on the beads. 5hmc modifying agents. In some embodiments, the 5hmc 35 The beads containing the bound nucleic acid sequences modifying agent is an enzyme. In some embodiments, the containing glycosyl-5hmC bases can then be washed to enzyme is a glyosyltransferase. Suitable glycosyltrans remove unbound sequences from the sample and the bound ferases are exemplified in Table 2. Those of skill in the art sequences can be analyzed on the beads or eluted from the will recognize that additional known JBPs and those that beads for further analysis. In some embodiments, the bind may be identified in the future may be tested for use with the 40 ing protein can be covalently attached to a bead or matrix present invention. Additionally, the present invention material Suitable for use in column chromatography. encompasses the use of mutated or modified JBPs which can Samples containing the glycosyl-5hmC bases are applied to be identified by their percent identity with reference columns comprising the beads or matrix. Nucleic acids sequences. In some embodiments, the glycosyltrasferase is containing the glycosyl-5hmC bases are bound by the bind at least 80%, 90%, 95%, or 99% identical to the sequence of 45 ing protein and retaining on the column, allowing separation one or more of the glycosyltransferases. In some embodi from sequences that do not contain glycosyl-5hmC bases. ments, the glycosyltransferase is a T2, T4 or T6 glucosyl The nucleic acids containing the glycosyl-5hmC bases can transferase, or a Uracil glycosylase. In some embodiments, then be eluted from the column and further analyzed. In still the glycosyltransferase is an C- or 3-glucosyltransferase. In other embodiments the binding protein may be associated Some embodiments, the glucosyltransferase is T4 -gluco 50 with another protein that allows separation of the complex. syltransferase (SEQ ID NO:2). In some embodiments, glu For example, the binding protein may be covalently modi cosyltransferase is at least 80%, 90%, 95%, or 99% identical fied with a second binding molecule. Suitable second bind to SEQ ID NO:2. ing molecules include biotin, avidin, haptens, immuno globulins and aptamers. In some of these embodiments, a TABLE 1. 55 nucleic acid glycosyl-FhmC base-binding protein-second J-binding Proteins binding molecule complex is formed. The complex can then Some JBP1 homologues be isolated by utilizing reagents that specifically bind the second binding molecules, for examples, beads, magnetic Expasy beads or columns comprising avidin, biotin, immunoglobu Accession 60 lins specific for the selected hapten, or aptamer binding Number Organism Description partner. Q9U6M2 Crithidia JBP1. CRIFA Thymine dioxygenase JBP1 Methods presented herein ensure that only 5hmC-con fasciculata (EC 1.14.11.6) (J-binding protein 1) A4H5X5 Leishmania Thymine dioxygenase JBP1 (EC taining genomic regions are isolated and identified. Given braziliensis 1.14.11.6) (J-binding protein 1) that the B-gt from T4 specifically catalyzes the glucosylation A4HU70 Leishmania Thymine dioxygenase JBP1 (EC 65 of 5hmC (FIG. 3: Tomaschewski et al. (1985) Nucleic Acids infantum 1.14.11.6) (J-binding protein 1) Res. 13:7551-7568; herein incorporated by reference in its entirety) and JBP1 specifically recognizes the resulting US 9,677,128 B2 13 14 |B-glu-5hmC base (b), the DNA pulled down by JBP1 is quantitation) of 5hmC modification level of an indicator of highly enriched in the 5hmC modification. The isolated a condition of interest, such as a neoplasm (e.g., the 5hmC DNA is then ready for analysis by real-time quantitative level of a CpG island or CpG shore in the coding or PCR, microarray analysis, sequencing by any method (e.g., regulatory region of a gene locus) in a sample. A skilled high-throughput sequencing), or other techniques. The artisan understands that an increased, decreased, informa simple and cost-effective method for identifying genomic tive, or otherwise distinguishably different 5hmC modifica regions containing 5hmC presented herein finds use for tion level is articulated with respect to a reference (e.g., a study of the temporal and spatial patterns of 5hmC residues reference level, a control level, a threshold level, or the like). in genomic regions. For example, in Some embodiments, the For example, the term “elevated 5hmC level” as used herein methods of the present are utilized to isolate DNA that 10 contains 5hmc modification and then the isolated DNA is with respect to the 5hmC status of a gene locus is any 5hmC analyzed to determine the sequence or identity of the iso level that is above a median 5hmC level in a sample from a lated DNA. random population of mammals (e.g., a random population Particular levels (e.g., thresholds, cutoff points) or loca of 10, 20, 30, 40, 50, 100, or 500 mammals) that do not have tions of 5hmC DNA modification may be used that show 15 a neoplasm (e.g., a cancer) or other condition of interest. optimal function with different ethnic groups or sex, differ Elevated levels of 5hmC modification can be any level ent geographic distributions, different stages of disease, provided that the level is greater than a corresponding different degrees of specificity or different degrees of sen reference level. For example, an elevated 5hmC level of a sitivity. Particular levels (e.g., thresholds, cutoff points) of locus of interest can be 0.5, 1,2,3,4,5,6,7,8,9, 10, or more 5hmC DNA modification may also be developed which are fold greater than the reference level 5hmC observed in a particularly sensitive to the effect of therapeutic regimens on normal sample. It is noted that a reference level can be any disease progression. Subjects may be monitored after a amount. The term "elevated 5hmC score” as used herein therapy and/or course of action to determine the effective with respect to detected 5hmC events in a matrix panel of ness of that specific therapy and/or course of action. particular nucleic acid markers is any 5hmC score that is In some embodiments, indicators of biological state (e.g., 25 above a median 5hmC score in a sample from a random cancer) include, for example, epigenetic alterations. Epigen population of mammals (e.g., a random population of 10, 20, etic alterations include but are not limited to DNA methyl 30, 40, 50, 100, or 500 mammals) that do not have a ation (e.g., 5hmC DNA modification). In some embodi neoplasm (e.g., a cancer). An elevated 5hmC score in a ments, the level (e.g., frequency or score) of 5hmC DNA matrix panel of particular nucleic acid markers can be any modification (e.g., increased levels relative to a control, 30 score provided that the score is greater than a corresponding decreased levels relative to a control) is determined without limitation to the technique used for such determining. Meth reference score. For example, an elevated score of 5hmC in ods of the present invention are not limited to particular a locus of interest can be 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or locations of epigenetic (e.g., 5hmC) alterations (e.g., 5hmC more fold greater than the reference 5hmC score observed in in coding or regulatory regions). Altered 5hmC levels may 35 a normal sample. It is noted that a reference score can be any occur in, for example, CpG islands; CpG island shores; or amount that is used for comparison. regions other than CpG islands or CpG island shores. In this Similar considerations apply to assays for decreased lev regard, the present invention finds use in the determination els of 5hmC modifications in a sample, target locus, target of a diagnosis and/or prognosis for cancers such as gastric genomic region and the like. For example, the term cancer, lung cancer, prostate cancer, pancreatic cancer, and 40 “decreased 5hmC level” as used herein with respect to the breast cancer. The present invention also finds use in the 5hmC status of a gene locus is any 5hmC level that is below determination of a diagnosis or prognosis for other diseases a median 5hmC level in a sample from a random population such as diabetes. In other embodiments, the methods may be of mammals (e.g., a random population of 10, 20, 30, 40, 50. used to screen Subjects to determine if administration of a 100, or 500 mammals) that do not have a neoplasm (e.g., a particular drug or treatment is indicated. 45 cancer). Decreased levels of 5hmC modification can be any In certain embodiments, methods, kits, and systems of the level provided that the level is less than a corresponding present invention involve determination of 5hmC state of a reference level. For example, a decreased 5hmC level of a locus of interest (e.g., in human DNA) (e.g., in human DNA locus of interest can be 0.5, 1,2,3,4,5,6,7,8,9, 10, or more extracted from a tissue sample, from a tumor sample, etc). fold less than the reference level 5hmC observed in a normal Any appropriate method can be used to quantitate 5hmC 50 sample. It is noted that a reference level can be any amount. and/or determine the location of the modification. PCR The term “decreased 5hmC score” as used herein with techniques, for example, can be used to determine which respect to detected 5hmC events in a matrix panel of residues are modified by 5hmC following isolation of the particular nucleic acid markers is any 5hmC score that is nucleic acid sequences by the processes described above. below a median 5hmC score in a sample from a random PCR reactions can contain, for example, 10 uL of captured 55 population of mammals (e.g., a random population of 10, 20, DNA, IX PCR buffer, 0.2 mM dNTPs, 0.5 uM sequence 30, 40, 50, 100, or 500 mammals) that do not have a specific primers (e.g., primers flanking a CpG island or CpG neoplasm (e.g., a cancer). A decreased 5hmC Score in a shore within the captured DNA), and 5 units DNA poly matrix panel of particular nucleic acid markers can be any merase (e.g., Amplitaq DNA polymerase from PE Applied score provided that the score is greater than a corresponding Biosystems, Norwalk, Conn.) in a total volume of 50 ul. A 60 reference score. For example, a decreased score of 5hmC in typical PCR protocol can include, for example, an initial a locus of interest can be 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or denaturation step at 94°C. for 5 min, 40 amplification cycles more fold less than the reference 5hmC score observed in a consisting of 1 minute at 94° C. 1 minute at 60° C., and 1 normal sample. It is noted that a reference score can be any minute at 72°C., and a final extension step at 72° C. for 5 amount that is used for comparison. minutes. 65 The methods are not limited to a particular type of In some embodiments, methods of the present invention mammal. In some embodiments, the mammal is a human. In involve the determination (e.g., assessment, ascertaining, Some embodiments, the neoplasm is premalignant. In some US 9,677,128 B2 15 16 embodiments, the neoplasm is malignant. In some embodi into a computer-based record. In some cases, information is ments, the neoplasm is cancer without regard to stage (e.g., communicated by making a physical alteration to medical or stage I, II, III, or IV). research records. For example, a medical professional can The present invention also provides methods and mate make a permanent notation or flag a medical record for rials to assist medical or research professionals in determin communicating a diagnosis to other medical professionals ing whether or not a mammal has a neoplasm (e.g., cancer). reviewing the record. In addition, any type of communica Medical professionals can be, for example, doctors, nurses, tion can be used to communicate the information. For medical laboratory technologists, and pharmacists. Research example, mail, e-mail, telephone, and face-to-face interac professionals can be, for example, principle investigators, tions can be used. The information also can be communi research technicians, postdoctoral trainees, and graduate 10 students. A professional can be assisted by (1) determining cated to a professional by making that information electroni the ratio of 5hmC and/or other markers in a sample, and (2) cally available to the professional. For example, the communicating information about the ratio to that profes information can be communicated to a professional by sional, for example. placing the information on a computer database Such that the After the level (e.g., score or frequency) of particular 15 professional can access the information. In addition, the 5hmC modification in a sample is reported, a medical information can be communicated to a hospital, clinic, or professional can take one or more actions that can affect research facility serving as an agent for the professional. patient care. For example, a medical professional can record It is noted that a single sample can be analyzed for one the results in a patient's medical record. In some cases, a neoplasm-specific marker or for multiple neoplasm-specific medical professional can record a diagnosis of a neoplasia, markers. In preferred embodiments, a single sample is or otherwise transform the patient's medical record, to analyzed for multiple neoplasm-specific markers, for reflect the patient's medical condition. In some cases, a example, using multi-marker assays. In addition, multiple medical professional can review and evaluate a patients samples can be collected for a single mammal and analyzed entire medical record, and assess multiple treatment strate as described herein. In some embodiments, a sample is split gies, for clinical intervention of a patient’s condition. In 25 into first and second portions, where the first portion under Some cases, a medical professional can record a prediction goes cytological analysis and the second portion undergoes of tumor occurrence with the reported indicators. In some further purification or processing (e.g., sequence-specific cases, a medical professional can review and evaluate a capture step(s) (e.g., for isolation of specific loci for analysis patient’s entire medical record and assess multiple treatment of 5hmC levels). In some embodiments, the sample under strategies, for clinical intervention of a patient’s condition. 30 goes one or more preprocessing steps before being split into A medical professional can initiate or modify treatment of portions. In some embodiments, the sample is treated, a neoplasm after receiving information regarding the level handled, or preserved in a manner that promotes DNA (score, frequency) associated with 5hmClevel in a patients integrity and/or inhibits DNA degradation (e.g., through use urine sample. In some cases, a medical professional can of storage buffers with stabilizing agents (e.g., chelating compare previous reports and the recently communicated 35 agents, DNase inhibitors) or handling or processing tech level (score, frequency) of 5hmC modification, and recom niques that promote DNA integrity (e.g., immediate pro mend a change in therapy. In some cases, a medical profes cessing or storage at low temperature (e.g., -80 degrees C.)). sional can enroll a patient in a clinical trial for novel In some embodiments, all the basic essential materials therapeutic intervention of neoplasm. In some cases, a and reagents required for detecting neoplasia through detect medical professional can elect waiting to begin therapy until 40 ing both the level (presence, absence, score, frequency) of the patient’s symptoms require clinical intervention. markers in a sample obtained from the mammal are A medical professional can communicate the assay results assembled together in a kit. Such kits generally comprise, to a patient or a patient’s family. In some cases, a medical for example, reagents useful, Sufficient, or necessary for professional can provide a patient and/or a patient’s family detecting and/or characterizing one or more markers (e.g., with information regarding neoplasia, including treatment 45 epigenetic markers; 5hmC modifications) specific for a options, prognosis, and referrals to specialists, e.g., oncolo neoplasm. In some embodiments, the kits contain enzymes gists and/or radiologists. In some cases, a medical profes Suitable for amplifying nucleic acids including various poly sional can provide a copy of a patient's medical records to merases, deoxynucleotides and buffers to provide the nec communicate assay results to a specialist. A research pro essary reaction mixture for amplification. In some embodi fessional can apply information regarding a Subjects assay 50 ments, the kits of the present invention include a means for results to advance neoplasm research. For example, a containing the reagents in close confinement for commercial researcher can compile data on the assay results, with sale Such as, e.g., injection or blow-molded plastic contain information regarding the efficacy of a drug for treatment of ers into which the desired reagent are retained. Other con neoplasia to identify an effective treatment. In some cases, tainers Suitable for conducting certain steps of the disclosed a research professional can obtain assay results to evaluate 55 methods also may be provided. a subjects enrollment, or continued participation in a In some embodiments, the methods disclosed herein are research study or clinical trial. In some cases, a research useful in monitoring the treatment of neoplasia (e.g., can professional can classify the severity of a subjects condi cer). For example, in Some embodiments, the methods may tion, based on assay results. In some cases, a research be performed immediately before, during and/or after a professional can communicate a subjects assay results to a 60 treatment to monitor treatment Success. In some embodi medical professional. In some cases, a research professional ments, the methods are performed at intervals on disease can refer a subject to a medical professional for clinical free patients to ensure treatment Success. assessment of neoplasia, and treatment thereof. Any appro The present invention also provides a variety of com priate method can be used to communicate information to puter-related embodiments. Specifically, in some embodi another person (e.g., a professional). For example, informa 65 ments the invention provides computer programming for tion can be given directly or indirectly to a professional. For analyzing and comparing a pattern of neoplasm-specific example, a laboratory technician can input the assay results marker detection results in a sample obtained from a subject US 9,677,128 B2 17 18 to, for example, a library of Such marker patterns known to gene products assayed), and test result data (e.g., the pattern be indicative of the presence or absence of a neoplasm, or a of neoplasm-specific marker (e.g., epigenetic marker, 5hmC particular stage or neoplasm. modification) detection results from a sample). This infor In Some embodiments, the present invention provides mation received can be stored at least temporarily in a computer programming for analyzing and comparing a first database, and data analyzed in comparison to a library of and a second pattern of neoplasm-specific marker detection marker patterns known to be indicative of the presence or results from a sample taken at least two different time points. absence of a pre-cancerous condition, or known to be In some embodiments, the first pattern may be indicative of indicative of a stage and/or grade of cancer. a pre-cancerous condition and/or low risk condition for Part or all of the input and output data can also be sent cancer and/or progression from a pre-cancerous condition to 10 electronically; certain output data (e.g., reports) can be sent a cancerous condition. In Such embodiments, the comparing electronically or telephonically (e.g., by facsimile, e.g., provides for monitoring of the progression of the condition using devices such as fax back). Exemplary output receiving from the first time point to the second time point. devices can include a display element, a printer, a facsimile In yet another embodiment, the invention provides com device and the like. Electronic forms of transmission and/or puter programming for analyzing and comparing a pattern of 15 neoplasm-specific marker detection results from a sample to display can include email, interactive television, and the a library of neoplasm-specific marker patterns known to be like. In some embodiments, all or a portion of the input data indicative of the presence or absence of a cancer, wherein and/or all or a portion of the output data (e.g., usually at least the comparing provides, for example, a differential diagnosis the library of the pattern of neoplasm-specific marker detec between a benign neoplasm, and an aggressively malignant tion results known to be indicative of the presence or neoplasm (e.g., the marker pattern provides for staging absence of a pre-cancerous condition) are maintained on a and/or grading of the cancerous condition). server for access, e.g., confidential access. The results may The methods and systems described herein can be imple be accessed or sent to professionals as desired. mented in numerous ways. In one embodiment, the methods A system for use in the methods described herein gener involve use of a communications infrastructure, for example 25 ally includes at least one computer processor (e.g., where the the internet. Several embodiments of the invention are method is carried out in its entirety at a single site) or at least discussed below. It is also to be understood that the present two networked computer processors (e.g., where detected invention may be implemented in various forms of hard marker data for a sample obtained from a Subject is to be ware, Software, firmware, processors, distributed servers input by a user (e.g., a technician or someone performing the (e.g., as used in cloud computing) or a combination thereof. 30 assays)) and transmitted to a remote site to a second com The methods and systems described herein can be imple puter processor for analysis (e.g., where the pattern of mented as a combination of hardware and software. The neoplasm-specific marker) detection results is compared to Software can be implemented as an application program a library of patterns known to be indicative of the presence tangibly embodied on a program storage device, or different or absence of a pre-cancerous condition), where the first and portions of the Software implemented in the user's comput 35 second computer processors are connected by a network, ing environment (e.g., as an applet) and on the reviewers e.g., via an intranet or internet). The system can also include computing environment, where the reviewer may be located a user component(s) for input; and a reviewer component(s) at a remote site (e.g., at a service provider's facility). for review of data, and generation of reports, including For example, during or after data input by the user, detection of a pre-cancerous condition, staging and/or grad portions of the data processing can be performed in the 40 ing of a neoplasm, or monitoring the progression of a user-side computing environment. For example, the user pre-cancerous condition or a neoplasm. Additional compo side computing environment can be programmed to provide nents of the system can include a server component(s); and for defined test codes to denote platform, carrier/diagnostic a database(s) for storing data (e.g., as in a database of report test, or both; processing of data using defined flags, and/or elements, e.g., a library of marker patterns known to be generation of flag configurations, where the responses are 45 indicative of the presence or absence of a pre-cancerous transmitted as processed or partially processed responses to condition and/or known to be indicative of a grade and/or a the reviewer's computing environment in the form of test stage of a neoplasm, or a relational database (RDB) which code and flag configurations for Subsequent execution of one can include data input by the user and data output. The or more algorithms to provide a results and/or generate a computer processors can be processors that are typically report in the reviewer's computing environment. 50 found in personal desktop computers (e.g., IBM, Dell, The application program for executing the algorithms Macintosh), portable computers, mainframes, minicomput described herein may be uploaded to, and executed by, a ers, or other computing devices. machine comprising any suitable architecture. In general, The input components can be complete, stand-alone per the machine involves a computer platform having hardware Sonal computers offering a full range of power and features Such as one or more central processing units (CPU), a 55 to run applications. The user component usually operates random access memory (RAM), and input/output (I/O) under any desired operating system and includes a commu interface(s). The computer platform also includes an oper nication element (e.g., a modem or other hardware for ating system and microinstruction code. The various pro connecting to a network), one or more input devices (e.g., a cesses and functions described herein may either be part of keyboard, mouse, keypad, or other device used to transfer the microinstruction code or part of the application program 60 information or commands), a storage element (e.g., a hard (or a combination thereof) which is executed via the oper drive or other computer-readable, computer-writable storage ating system. In addition, various other peripheral devices medium), and a display element (e.g., a monitor, television, may be connected to the computer platform such as an LCD, LED, or other display device that conveys information additional data storage device and a printing device. to the user). The user enters input commands into the As a computer system, the system generally includes a 65 computer processor through an input device. Generally, the processor unit. The processor unit operates to receive infor user interface is a graphical user interface (GUI) written for mation, which generally includes test data (e.g., specific web browser applications. US 9,677,128 B2 19 20 The server component(s) can be a personal computer, a for developing cancer, established norm for Subjects diag minicomputer, or a mainframe, or distributed across mul nosed with various stages of cancer). In some embodiments, tiple servers (e.g., as in cloud computing applications) and the risk profile indicates a subject’s risk for developing offers data management, information sharing between cli cancer or a Subject's risk for re-developing cancer. In some ents, network administration and security. The application embodiments, the risk profile indicates a subject to be, for and any databases used can be on the same or different example, a very low, a low, a moderate, a high, and a very servers. Other computing arrangements for the user and high chance of developing or re-developing cancer. In some server(s), including processing on a single machine Such as embodiments, a health care provider (e.g., an oncologist) a mainframe, a collection of machines, or other Suitable will use such a risk profile in determining a course of configuration are contemplated. In general, the user and 10 treatment or intervention (e.g., biopsy, wait and see, referral server machines work together to accomplish the processing to an oncologist, referral to a Surgeon, etc.). of the present invention. Other diseases and disorders that may be diagnosed or Where used, the database(s) is usually connected to the prognosed with the methods, reagents and systems of the database server component and can be any device which will present invention include, but are not limited to, Prader hold data. For example, the database can be any magnetic or 15 Willi syndrome, Angelman syndrome, Beckwith-Wiede optical storing device for a computer (e.g., CDROM, inter mann syndrome, Pseudohypoparathyroidism, Russell-Silver nal hard drive, tape drive). The database can be located syndrome, ICF syndrome, Rett syndrome, C-thalassemia/ remote to the server component (with access via a network, mental retardation, X-linked (ATR-X). Immunoosseous dys modem, etc.) or locally to the server component. plasia, Schimke type, Rubinstein-Taybi syndrome, MTHFR Where used in the system and methods, the database can deficiency, Recurrent hydatidiform mole, Fragile X mental be a relational database that is organized and accessed retardation syndrome, Deletion LCR YöB- and Öf-thalas according to relationships between data items. The relational semia, FSH dystrophy, disorders of XIC, Schimke immu database is generally composed of a plurality of tables noosseous dysplasia (SIOD), Sotos syndrome, Atrichia, (entities). The rows of a table represent records (collections X-linked Emery-Dreifuss muscular dystrophy (EDMD), of information about separate items) and the columns rep 25 Autosomal EDMD, CMT2B1, mandibuloacral dysplasia, resent fields (particular attributes of a record). In its simplest limb-girdle muscular dystrophy type 1B, familial partial conception, the relational database is a collection of data lipodystrophy, dilated cardiomyopathy 1A, Hutchinson-Gil entries that “relate” to each other through at least one ford progeria syndrome, and Pelger-Huet anomaly. common field. The following examples are provided in order to demon Additional workstations equipped with computers and 30 strate and further illustrate certain preferred embodiments printers may be used at point of service to enter data and, in and aspects of the present invention and are not to be some embodiments, generate appropriate reports, if desired. construed as limiting the scope thereof. The computer(s) can have a shortcut (e.g., on the desktop) to launch the application to facilitate initiation of data entry, EXAMPLE 1. transmission, analysis, report receipt, etc. as desired. 35 The present invention is useful for both the diagnosing Efficient and Selective Identification of diseases and disorders in a Subject as well as determining the 5-hydroxymethylcytosine in Genomic DNA prognosis of a Subject. The methods, reagents and systems of the present invention are applicable to a broad variety of Materials and Methods diseases and disorders. In certain embodiments, the present 40 Protein Purifications invention provides methods for obtaining a subjects risk B-glucosyltransferase profile for developing neoplasm (e.g., cancer). In some The bgt gene was amplified from T4 bacteriophage DNA embodiments, such methods involve obtaining a sample and was cloned into pET28a. Cultures of Rosetta (DE3) from a Subject (e.g., a human at risk for developing cancer, pIysS harboring pET28a-bgt were grown in 500 ml Studier a human undergoing a routine physical examination), detect 45 auto-inducing media to an Asoo of 0.6 at 37° C. followed ing the presence, absence, or level (e.g., 5hmC modification by a shift to 18°C. for 20 hrs. The cells were harvested by frequency or score) of one or more markers specific for a centrifugation and suspended in 10 ml B-gt lysis buffer (500 neoplasm in or associated with the sample (e.g., specific for mM NaCl, 25 mM Hepes KOH (pH 7.9), 5 mM imidazole, a neoplasm) in the sample, and generating a risk profile for 10% (v/v) glycerol). All subsequent steps were carried out at developing neoplasm (e.g., cancer) based upon the detected 50 4°C. unless otherwise specified. level (score, frequency) or presence or absence of the After suspension in lysis buffer, cells were incubated for indicators of neoplasia. For example, in Some embodiments, 1 hr with lysozyme added to a final concentration of 200 a generated risk profile will change depending upon specific ug/ml. After 1 hr Triton-X100 was added to the lysate to a markers and detected as present or absent or at defined final concentration of 0.1% (v/v) and the lysate was heated threshold levels. The present invention is not limited to a 55 briefly to 20° C. The lysate viscosity was reduced by particular manner of generating the risk profile. In some Sonication on ice. The lysate was clarified by centrifugation embodiments, a processor (e.g., computer) is used to gen for 60 min at 20,000 rpm in a SS34 rotor (Sorvall). After erate Such a risk profile. In some embodiments, the proces centrifugation the Supernatant was applied and batch bound Sor uses an algorithm (e.g., Software) specific for interpret for 1 hr to 2 ml of Talon Resin (GE Healthcare) equilibrated ing the presence and absence of specific 5hmC modifications 60 in 3-gt lysis buffer. The lysate was applied to a column and as determined with the methods of the present invention. In washed withf-gt lysis buffer until the flow through had had Some embodiments, the presence and absence of specific an Aso of 0. B-gt was eluted with 20 ml 3-gt lysis containing markers as determined with the methods of the present 200 mM imidazole. invention are inputted into Such an algorithm, and the risk Fractions containing the highest concentration of protein profile is reported based upon a comparison of Such input 65 were pooled and dialyzed 3 times against 100 volumes of with established norms (e.g., established norm for pre |B-gt low salt buffer (50 mM. NaCl, 25 mM Hepes KOH (pH cancerous condition, established norm for various risk levels 7.9), 0.5 mM EDTA (pH 8.0), 10% (v/v) glycerol). The pool US 9,677,128 B2 21 22 was then applied to a Resource S cation exchange column. brook et al (1989) Molecular Cloning, a Laboratory The column was eluted with a 50 to 500 mM linear NaCl Manual. 2" Ed., Cold Spring Harbor Laboratory Press). gradient. Relevant fractions were pooled and dialyzed PCR Amplified Substrates against 100 volumes B-gt storage buffer (250 mM NaCl, 25 Substrates containing cytosine residues or 5meC residues mM Hepes KOH (pH 7.9), 1 mM EDTA (pH 8.0), 50% (v/v) 5 were amplified from puC18 or from specific mouse glycerol). genomic regions using a PCR reaction containing unmodi J-Binding Protein 1 fied dNTPs with Pfu Turbo (Stratagene) according to the The gene encoding J-binding protein 1 (JBP1) from C. manufacturers instructions. 5meC residues were created by fasciculata was synthesized by GeneArt (Germany). After incubating the PCR product amplified using unmodified synthesis the gene was cloned into pET28a. Cultures of 10 dNTPs with M. SssI methyltransferase (NEB) according to Rosetta(DE3)plysS harboring pET28a-JBP1 were grown in the manufacturers instructions. Substrates containing 5hmC 500 ml Studier auto-inducing media to an ODAsoo of 0.6 at residues or B-glu-5hmC residues were amplified as other 37° C. followed by a shift to 18°C. for 20 hrs. The cells were substrates except that d5hmCTP (Bioline) was used in place harvested by centrifugation and suspended in 10 ml JEBP1 of dCTP. B-glu-5hmC substrates were generated by incu lysis buffer (500 mM. NaCl, 25 mM Tris HCl (pH 7.5), 5 mM 15 bating the PCR product created using d5hmCTP with 10 ug imidazole, 10% (v/v) glycerol). All subsequent steps were B-gt, 20 uM UDP-glucose (Sigma) in B-gt reaction buffer carried out at 4°C. unless otherwise specified. (20 mM KPO (pH 8.0), 25 mM MgCl). Substrates were After suspension in JBP1 lysis buffer, cells were incu purified on a 0.8% agarose gel and excised using a gel bated for 1 hr with lysozyme added to a final concentration extraction kit (Qiagen). of 200 ug/ml. After 1 h, Triton-X100 was added to the lysate Radiolabeled Substrates to a final concentration of 0.1% (v/v) and the lysate was Reactions (50 ul) containing 1 g DNA substrate, 10 uCi heated briefly to 20°C. The lysate viscosity was reduced by Y-LPATP, 2.5 units T4 polynucleotide kinase (NEB) and Sonication on ice. The lysate was clarified by centrifugation buffer supplied by the manufacturer were incubated at 37°C. for 10 min at 10,000 rpm in a SS34 rotor (Sorvall). After for 30 min. After 30 min 100 umol ATP was added and the centrifugation the Supernatant was applied and batch bound 25 reaction was allowed to proceed for 10 min. Enzyme and for 1 hr to 2 ml of Talon Resin (GE Healthcare) equilibrated free nucleotide was removed using a nucleotide removal kit in JBP1 lysis buffer. The lysate was applied to a column and (Qiagen). washed with JBP1 lysis buffer until the flow through had an B-gt Specificity and Activity Assay ODAs of 0. JBP1 was eluted with 20 ml JBP1 lysis buffer Reactions (50 ul) containing 1 pmol double stranded containing 200 mM imidazole. Fractions containing the 30 oligonucleotide substrate, 20 Mm UDP-Glucose, 10 ug B-gt, highest concentration of protein were pooled and concen and 3-gt reaction buffer were incubated at 37°C. for 60 min. trated to 500 ul using a centricon MWCO30000 according Reactions were terminated by the addition of 10 ul Stop to the manufacturers instructions. The concentrated protein Buffer (20 ug Proteinase K. 50 mM Tris HCl (pH 8.0), 100 was applied to a SuperdeX75 size exclusion column and mM EDTA (pH 8.0), 2% (w/v) SDS) followed by incubation eluted with JBP1 Superdex Buffer (250 mM. NaCl, 25 mM 35 at 42° C. for 60 min. Oligonucleotides were purified by Tris HCl (pH 7.5), 1 mM EDTA (pH 8.0), 1 mM DTT, 10% phenol:CHCl:IAA extraction followed by an ethanol pre (V/v) glycerol. Fractions containing no detectable contami cipitation. Pellets were suspended in 50 ul NEB buffer 4 and nants (as judged by SDS-PAGE) were pooled and 3 times 10 units TaqI (NEB) and were allowed to incubate at 65° C. dialyzed against 100 volumes PBS (150 mM. NaCl, 50 mM for 16 hrs. TaqI was inactivated by heating to 80° C. for 20 KPO (pH 7.2)). 40 min followed by incubation with 2.5 units of shrimp alkaline Coupling JBP1 to Magnetic Beads phosphatase (NEB) for 30 min. The alkaline phosphatase Five milligrams of Epoxy modified magnetic beads was inactivated by heating to 65° C. for 5 min. Reactions (Dynal, Oslo, Norway) were equilibrated according to the were radiolabeled by adding 2.5 units T4 polynucleotide manufacturer's instructions and suspended in 60 ul PBS. kinase, 10 uCi Y-P-ATP, and the volume was raised to 70 100 ug of JBP1 in PBS was added to the beads and the PBS 45 ul with T4 polynucleotide kinase buffer. DNA was cleaned was added until the volume equalled 120 ul, followed by the by phenol:CHCl:IAA (isoamyl alcohol) extraction fol addition of 40 ul 4 M (NHA)SO. The bead/JBP1 solution lowed by an ethanol precipitation. Pellets were suspended in was slowly rotated at 4°C. for 48 h. After incubation, the 5 ul DNase I buffer and 0.2 units DNase I (NEB) and 0.2 protein not bound to the beads was removed and binding units Snake Venom phosphodiesterase (Worthington) was efficiencies were calculated (typical binding efficiencies 50 added to the reactions. Reactions were incubated at 25° C. were between 70-80%). The beads were then blocked with for at least 4 hours. 0.5ul of each reaction was spotted onto 300 ul binding buffer (2 mM EDTA (pH 8.0), 10 mM Tris a 100 umx20 cmx20 cm cellulose thin layer plate. The (pH 8.0), 150 mM. NaCl, 0.02% (v/v) Tween 20, 1 mg/ml nucleotides were resolved in two phases: the first phase BSA). After blocking beads were washed 3 times with 300 contained isobutyric acid: water:NH (66:20:1) the second ul binding buffer and finally suspended in 300 ul binding 55 phase contained 4 Mammonium Sulfate: 1 M acetic acid: buffer. Beads were prepared freshly for each experiment as isopropanol (80:17:2). Radioactive spots were identified it was found that several freeze thaw cycles dramatically using a phosphor screen. reduced binding capacity. LC/MS/MS Analysis Substrates DNA substrates were composed of 2.7 kb linear puC18 Oligonucleotide Substrates 60 PCR products that contained only cytosines, 5meC at CpG Annealing reactions (50 ul) containing 100 pmol of each regions, only 5hmC, or were treated with 100 ng B-gt in the complementary oligonucleotide, 40 mM Tris-HCl (pH 7.5), presence of 20 uMUDP-glucose created as described above. 10 mM MgCl, 1 mM dithiothreitol, 50 mM NaCl were Substrates were hydrolyzed to nucleosides by incubation heated to 95°C. in a thermocycler for 5 minutes followed with nuclease P1, Snake venom phosphodiesterase, and decrease in temperature of 1° C./min to 25° C. Duplex 65 alkaline phosphatase (Sigma-Aldrich, St. Louis, Mo.) as oligonucleotides were then purified from a 15% nondena described (Crain (1990) Methods Enzymol. 193:782-790: turing PAGE according to the protocol established by Sam herein incorporated by reference in its entirety). Three US 9,677,128 B2 23 24 volumes of methanol were added to the reactions after cytosine was calculated by dividing each pulled down digestion was completed and the reactions were centrifuged amount by the amount of cytosine containing DNA pulled at 16,000 g for 30 min. The supernatants were dried under down. vacuum and the resulting residues were dissolved in 50 ul Results 5% (v/v) methanol for analysis by LC/MS/MS. Chromato A brief overview of a scheme of one embodiment of the graphic separation of nucleosides was performed using a present invention to specifically identify genomic regions Shimadzu Prominence HPLC system with a Zorbax SB-C18 containing 5hmC residues is shown (FIG. 1). Briefly, the 2.1 x 150 mm i.d. (3.5 um) reverse phase column equipped B-gt was used to specifically modify 5hmC residues creating with a Eclipse XDB-C8 2.1 x 12.5 mm i.d. (5 um) guard B-glu-5hmC residues. Following the complete conversion of 10 5hmC to 3-glu-5hmC, DNA was incubated with JBP1 column (Agilent Technologies, USA), with a flow rate of 0.2 coated magnetic beads. JBP1 specifically binds f-glu-5hmC ml/min at ambient temperature. The mobile phase consisted containing DNA, allowing for efficient and selective iden of A (0.1% formic acid in water) and B (0.1% formic acid tification of DNA containing 5hmC. The resulting DNA was in methanol), starting with 95% A/5% B for 0.5 min, purified and ready for analysis by various methods. followed by a 6.5-min linear gradient of 5-50% B, 2 min 15 The B-Glucosyltransferase Specifically Glucosylates with 50% B and 6 min re-equilibration with the initial 5-Hydroxymethylcytosine mobile phase conditions. Online mass spectrometry detec Recombinant B-gt from phage T4 was expressed and tion was performed using an Applied Biosystems/MDS purified to greater than 90% purity (FIG. 2; lane 1) by Cobalt Sciex 5000 triple quadrupole mass spectrometer (Applied affinity chromatography, followed by cation exchange chro Biosystems Sciex, USA) with TurbolonSpray probe operat matography. The B-gt specifically glucosylates 5hmC resi ing in positive electrospray ionization mode. The deoxyri dues (Tomaschewski et al. (1985) Nucleic Acids Res. bonucleosides were monitored by multiple reaction moni 13:7551+7568: Georgopoulos et al. (1971) Virology 44:271 toring using the mass transitions 228.2-> 112.1 (dC). 285; Kornberg et al. (1961) J. Biol. Chem. 236:1487-1493; 242.2->126.1 (5-me(dC)), 258.2->142.1 (5-hm(dC)), and each herein incorporated by reference in its entirety); yet, the 420.2->304.1 (5-Gly-hm(dC)). 25 efficiency and specificity of the purified enzyme was tested JBP1 Pull Down Reactions to ensure that the purification scheme resulted in active T4 Pull down reactions (200 ul) containing 132 ng JBP1 B-gt. Three double-stranded 37-bp substrates were created: coated beads and 10 ng radiolabeled DNA when using PCR one contained unmodified cytosine at a TaqI restriction site, amplified substrates or 208 fmol 37mer, binding buffer (2 the second contained 5meC at the Taq.I site, and the third 30 contained a 5hmC at the Taq.I site. Each of the three mM EDTA (pH 8.0), 10 mM Tris HCl (pH 8.0) 150 mM substrates was treated with purified f-gt in the presence of NaCl, 0.02% (v/v) Tween 20, 1 mg/ml BSA) were incubated UDP-glucose or incubated in the absence of B-gt. The for 60 min at room temperature with gentle rotation. After resulting products were digested with TaqI, treated with incubation the Supernatant was collected and counted by alkaline phosphatase, 5'-end labelled using T4 polynucle scintillation counting. The JBP1 coated beads were washed 35 otide kinase, and digested to 5' mononucleotides using 3 times with 300 ul binding buffer without BSA. Radioac DNase I and Snake Venom Phosphodiesterase. The resulting tivity bound to the beads was counted using Scintillation nucleotides were then resolved using two-dimensional thin counting. layer chromatography (TLC). From the TLC analysis it was qPCR Varying Bead Concentrations deduced that the B-gt has no effect on the substrates that lack Reactions (200 ul) contained 10 ng of each DNA substrate 40 5hmC (FIG. 3; left and center); however, upon treatment combined in one vial, with binding buffer and varying with B-gt the 5hmC spot is absent from the TLC plate (FIG. amounts of JBP1 modified magnetic beads were incubated 3; right). This result confirms that the 5hmC has been for 60 min at room temperature with gentle rotation. After specifically modified by the B-gt, demonstrating that B-gt is washing 3 times with 300 ul binding buffer without BSA and active. The cytosine and thymidine spots are good reference the beads were suspended in 90 ul binding buffer and 10 ul 45 markers and are present because the Substrates terminate stop buffer was added (20 g/ml Proteinase K. 50 mM Tris with 5'-cytosine and 5'-thymidine residues. Additionally, HCl (pH 8.0), 100 mM EDTA (pH 8.0), 2% (w/v) SDS) and mass spectrometry was performed on each of the Substrates allowed to incubate at 42° C. for 60 min. After incubation to reinforce that the B-gt can modify 5hmC residues at nearly the DNA was cleaned using a PCR clean kit (Qiagen) and 100% efficiency (FIG. 3b and FIG. 4). the amount of each DNA pulled down relative to the input 50 JPB 1 from C. fasciculata Can Specifically Pull Down was measured by quantitative real time PCR. B-glucosyl-5hmC Complete Glucosylation and Pull Down Assay Recombinant C. fasciculata JBP1 containing a his-tag Reactions (50 ul) containing 20 ng of each of three DNA was purified to greater than 90% homogeneity by cobalt Substrates (unmodified cytosine, 5meC, 5hmC genomic affinity chromatography followed by size exclusion chro regions combined in one vial), 20 uM UDP-Glucose, and 55 matography. Fractions included in the final pool of purified B-gt reaction buffer, and either 10 Jug B-gt or B-gt storage JBP1 were determined to be free of detectable nucleases buffer were incubated at 37° C. for 30 minutes. Reactions (FIG. 2; lane 2). Following purification, JBP1 was cova were stopped by the addition of 10 ul stop buffer. DNA was lently linked to epoxy modified magnetic beads as described purified using a PCR clean kit (Qiagen). The cleaned DNA Supra. was incubated with 20 ug JBP1 coated magnetic beads in 60 JBP1 coated magnetic beads were incubated with four 200 ul binding buffer for 60 minutes with gentle rotation. different radiolabeled substrates and precipitated using a Beads were washed 3 times with 300 ul binding buffer magnet. Each substrate was a 2.7 kb linear PCR product without BSA and the beads were suspended in 90 ul binding created using puC18 as a template. The Substrates contained buffer and incubated at 42° C. for 60 min with 10 ul stop normal adenines (A), guanines (G) and thymidines (T) and buffer. DNA was then cleaned using a PCR clean kit. The 65 either: only cytosine residues, 5meC residues in CpG amount of each DNA pulled down relative to the input was sequences, only 5hmC residues, or B-glu-5hmC residues quantified using real time PCR. The fold enrichment above produced by incubating the 5hmC substrate with the f-gt. US 9,677,128 B2 25 26 JBP1 coated magnetic beads (132 ng) could effectively pull Strand and the fourth contained a single B-glu-5hmC on each down up to 6% of the DNA that contained the f-glu-5hmC strand. Each of these substrates was 'P end-labelled and modification (FIG. 5a). Additionally, JBP1 had little affinity incubated with 132 ng JBP1 coated beads. After the sub for cytosine, 5meC, or 5hmC containing substrates. Further strate was incubated with JBP1 the bound fraction was more, BSA coated magnetic beads were unable to pull down measured by Scintillation counting. JBP1 coated magnetic any of the Substrates, Suggesting that the binding of B-glu beads pull down the Substrate containing a single B-glu 5hmC is due to JBP1 bound to the magnetic beads. 5hmC efficiently while showing little affinity for the other The Efficiency of 3-gly-5hmC Pull Down is Dependent substrates composed of an identical sequence (FIG. 5c). on the Amount of JBP1 Coated Magnetic Beads JBP1 Facilitated Pull Down of B-gt Treated 5hmC is Not Four different DNA substrates were created by PCR 10 Sequence Specific amplification of 4 different 2 kb mouse genomic regions An assay was performed that mimicked the JBP1 pull (Sub 1: contained only cytosines, Sub2 5me: contained down used on mammalian DNA. Several different mixtures 5meC at CpG dinucleotides, Sub3 5hmC: contained 5hmC of DNA substrates were combined with differing sequences, instead of cytosines, Sub4 B-glu-5hmC: contained B-glu with each Substrate mixture containing unmodified cytosine 5hmC instead of cytosines). These four substrates were 15 substrate, 5meC modified substrate and 5hmC modified incubated together with various concentrations of JBP1 substrate. The substrate mixes were then treated with puri coated magnetic beads. The beads were pulled down and the fied B-gt in the presence of UDP-glucose. The resulting amount of each Substrate captured was quantified using real DNA products were purified and incubated with JBP1 time quantitative PCR. There was a clear dependence on the coated magnetic beads. In all three mixtures tested, the JBP1 amount of JBP1 coated magnetic beads and the pull down coated magnetic beads were able to significantly enrich for efficiency (FIG. 5b). Furthermore it was demonstrated that the DNA substrate that contained the 5hmC modification the highest concentration of JBP1 coated magnetic beads after treatment with the f-gt (FIG. 5d). This result shows could pull down 14% of the B-glu-5hmC DNA and provide that JBP1 binding is not dependent on DNA sequence. an 87-fold enrichment ofB-glu-5hmC modified DNA over All publications and patents mentioned in the above cytosine containing DNA and a 319-fold enrichment over 25 specification are herein incorporated by reference. Various 5meC containing DNA. modifications and variations of the described method and JBP1 Can Pull Down DNA Containing a Single B-glu system of the invention will be apparent to those skilled in 5hmC Modification the art without departing from the scope and spirit of the The experiments described Supra were conducted using invention. Although the invention has been described in Substrates that contained many B-glu-5hmC modifications. It 30 connection with specific preferred embodiments, it should was desirable to determine whether a single B-glu-5hmC be understood that the invention as claimed should not be residue is sufficient for recognition by JBP1 coated magnetic unduly limited to such specific embodiments. Indeed, vari beads. Therefore, four double stranded 37 bp substrates were ous modifications of the described modes for carrying out created: the first containing unmodified cytosines, the sec the invention that are obvious to those skilled in the medical ond containing a single 5meC modification on each strand, 35 sciences are intended to be within the scope of the following the third contained a single 5hmC modification on each claims.

SEQUENCE LISTING

<16 Os NUMBER OF SEO ID NOS: 2

<21 Os SEQ ID NO 1 &211s LENGTH: 83 O 212s. TYPE: PRT <213> ORGANISM: Crithidia fasciculata

<4 OOs SEQUENCE: 1

Met Gly Ser Ser His His His His His His Ser Ser Tell Wall Pro 1. 5 15

Arg Gly Ser His Met Glu Pro Ser Llys Llys Wall Glin Asp Ile 25 3 O

Phe Asn Phe Pro Asp Gly Lys Wa Pro Thir Thr Glu Ala 35

Glu Ala Tyr Wall Asp Ala Luell Ala His Pro Phe Asn. Wall SO 55 60

His Ser Wall Wall Asp Wall Tyr Ser Ala Thr Luell Asp Gly Lys 65 70 8O

Gly Arg Val Ile Gly Wal Met Lieu Arg Llys Ala Lieu Pro Glu His Ala 85 90 95

Thir Thir Ala Ala Gly Lieu. Lieu. Ser Ala Ala Ala Wall Arg Thir Ser Luell 105 110

Arg Ser Ser Met Phe Gly Gly Glu Ser Pro Luel Ser Gly le Ala Gly 115 12O 125 US 9,677,128 B2 27 28 - Continued

Phe Asp Arg Gly Ser Pro Wall Glu Luell Lys Ala Arg Lys Thir 13 O 135 14 O

Ala Phe Thir Glu His Glu Trp Pro Ala Wall Phe Pro Luell 145 150 155 160

Wall Asp Wall Ser Glu Ile Ser Wall Met Pro Glu His Trp 1.65 17O 17s

Ala Ala Glin Asp Ser Ala Ile Pro Asp Ile Wall Arg Ile His Gly Thir 18O 185 19 O

Pro Phe Ser Thir Lell Thir Ile Asn Ser Arg Phe Thir Ala Ser His 195

Thir Asp Ala Gly Asp Phe Asp Gly Gly Tyr Ser Cys Ile Ala Ile 21 O 215 22O

Asp Gly Asp Phe Gly Lell Ala Luell Gly Phe Asp Asp Phe His Wall 225 23 O 235 24 O

Asn Wall Pro Met Glin Pro Arg Asp Wall Luell Wall Phe Asp Ser His 245 250 255

Phe His Ser Asn Ser Glu Lell Glu Ile Ser Pro Thir Glu Glu Trp 26 O 265 27 O

Arg Arg Luell Thir Wall Phe Tyr Arg Ser Ala Lell Gly Glu Pro 285

Ser Ser Ala Glu Tyr Arg Arg Arg Luell Ala Ala Ala Glin Glin Asp 29 O 295 3 OO

Ser Thir Ala Glin Pro Wall Wall Ser Ser Wall Wall Glu Pro Asn Gly 3. OS 310 315

Asn Luell Lys Pro Ser Thir Wall Phe Pro Ile Pro Thir Pro 3.25 330 335

Phe Ala Wall Wall Ala Glin Lell His Arg Luell His His Ala Ala 34 O 345 35. O

Gly Luell Cys Wall His Glu Lell Luell Ala Wall Pro Ser Ser Pro Luell Ala 355 360 365

Wall Luell Luell Phe Gly Glu Arg Luell Ser Ser Asp Gly Ile Pro Luell 37 O 375

Arg Ala Ala Glu Glin Lys Lell Ala Asn Ala Asp Gly Ala Ser Arg 385 390 395 4 OO

Gly Wall Thir Ser Ser Gly Gly Phe Ser Glu Ser Asp Ala Wall Luell Thir 4 OS 415

Thir Ala Wall Glu Ser Luell Glu Arg Asp His Luell Ser Glin 425 43 O

Ile Ser Ala Glu Lell Lell Ala Met Trp Wall Glu Ala Arg His 435 44 O 445

Trp Luell Arg Luell Wall Ala Thir Glu Trp Ala Arg Met Ile Ala Thir Ala 450 45.5 460

Pro Glu Arg Thir Asp Phe Lell Trp Asn Lys Ser Pro Met Asn Thir 465 470

Ala Phe Phe Asp Lell Glu Wall Ala Lys Glin Wall Met Luell Gly Luell 485 490 495

Lell Asp Glu Thir Ala Thir Pro Thir Glu Glu Arg His Phe Trp Ser SOO 505

Wall Ala Ala His Lell His Arg Ala Ala Glu Arg Luell Met Met 515 525

Pro Glu Glu Ala Met Ser Lell Arg Luell ASn Wall Luell Asp 53 O 535 54 O US 9,677,128 B2 29 30 - Continued Phe Ser Phe Gly Gly Thr Arg Tyr Phe Lys Asp Met Pro Val Glu Glu 5.45 550 555 560 Glin Glu Arg Arg Val Ala Arg Lys Ala Ser Ile Glu Glu Ala Arg Arg 565 st O sts Arg Ser Thir Ala Ala Lys Asp Gly Glu Glin Arg Ser Asn Trp Lieu. Thr 58O 585 59 O Asn Asp Ala Phe Asp Tyr Glin Thr Glu Asp Cys Glu Val Asp Tyr Ala 595 6OO 605 Gly His Gly Trp Ala Val Pro Lys Gln His Ala Lys Thr Val Thr Ala 610 615 62O Asn. Wal His Glin Glu Ala Val Ala Ala Thir Thr Glu Ala Val Arg Val 625 630 635 64 O Lieu Val Val Lieu Pro Arg Pro Pro Ser Gly Asp Arg Gly Asp Ala Ala 645 650 655 Val Asp Lieu Pro Llys Glu Val Thir Thir Ser Ala Glu Trp Val Arg Lieu 660 665 67 O Met Ser Ser Pro Ala Val Arg Arg Val Lieu Ala Ala Lys Glin Arg Asn 675 68O 685 Lieu. Thir Lieu. Lieu Pro Asn. Cys Asn Val Glu Ala Val Ser Lieu. Asn. Phe 69 O. 695 7 OO Ala Tyr His Asp Ser Lieu Pro Gln Lys Ala Thr Phe Asp Phe Val Val 7 Os 71O 71s 72O Lieu. Glin His Val Lieu. Ser Ala Met Pro Glu Asp Ala Ile Ala Thr Asp 72 73 O 73 Tyr Val Ser Arg Met Arg Ser Ile Cys Thr Gly Cys Lieu Phe Val Val 740 74. 7 O Glu Thir Asp Val Glin Cys Arg Glin Tyr Phe Thr Lieu. His Tyr Pro Leu 7ss 760 765 Arg Val Glin Tyr Asp Ala Val Ala Pro Ala Phe Phe Glin Lieu. Lieu. His 770 775 78O Arg Cys Ser Tyr Gly. Thr Pro Lieu Ala Arg Thr Arg Thr Lys Ala Glu 78s 79 O 79. 8OO Val Glu Ala Lieu. Phe Pro Phe Val Cys Cys Ala Arg Tyr Lys Lieu. Glin 805 810 815 Gly Ser Pro Met Asn Thr Val Val His Lieu. Leu Ala Lieu. Glu 82O 825 83 O

<210s, SEQ ID NO 2 &211s LENGTH: 371 212. TYPE: PRT <213> ORGANISM: T4 bacteriophage

<4 OOs, SEQUENCE: 2 Met Gly Ser Ser His His His His His His Ser Ser Gly Lieu Val Pro 1. 5 1O 15 Arg Gly Ser His Met Lys Ile Ala Ile Ile Asn Met Gly Asn. Asn Val 2O 25 3O

Ile Asin Phe Llys Thr Val Pro Ser Ser Glu Thir Ile Tyr Lieu. Phe Lys 35 4 O 45

Val Ile Ser Glu Met Gly Lieu. Asn Val Asp Ile Ile Ser Lieu Lys Asn SO 55 6 O

Gly Val Tyr Thr Lys Ser Phe Asp Glu Val Asp Val Asn Asp Tyr Asp 65 70 7s 8O

Arg Lieu. Ile Val Val Asn Ser Ser Ile Asin Phe Phe Gly Gly Llys Pro 85 90 95 US 9,677,128 B2 31 32 - Continued

ASn Luell Ala Ile Lieu. Ser Ala Glin Llys Phe Met Ala Lys Tyr Llys Ser 105 11 O

Ile Tyr Lel Phe Thir Asp Ile Arg Leul Pro Phe Ser Glin Ser 115 12 O 125

Trp Pro Asn Wall Asn Arg Pro Trp Ala Lell Tyr Thir Glu Glu 13 O 135 14 O

Glu Lieu. Luell Ile Ser Pro Ile Wall Ile Ser Glin Gly Ile Asn 145 150 155 160

Lieu. Asp Ile Ala Lys Ala Ala His Llys Val Asp Asn Wall Ile Glu 1.65 17O

Phe Glu Tyr Phe Ile Glu Gln Tyr Ile His Met Asn Asp Phe 18O 185 19 O

Glin Luell Ser Thir Lys Thir Luell Asp Wall Ile Gly Gly 195

Ser Phe Arg Ser Gly Glin Arg Glu Ser Met Wall Glu Phe Lieu. Phe 21 O 215 22O

Asp Thir Gly Luell Asn Ile Glu Phe Phe Gly Asn Ala Arg Glu Glin 225 23 O 235 24 O

Phe Asn Pro Llys Pro Trp Thir Lys Ala Pro Wall Phe Thir Gly 245 250 255

Ile Pro Met Asn Met Wall Ser Glu ASn Ser Glin Ala Ile Ala 26 O 265 27 O

Ala Luell Ile Ile Gly Asp Asn Tyr Asn Asp Asn Phe Ile Thir Lieu. 285

Arg Wall Trp Glu Thir Met Ala Ser Asp Ala Wal Met Lell Ile Asp Glu 29 O 295 3 OO

Glu Phe Asp Thir His Arg Ile Ile Asn Asp Ala Arg Phe Wall 3. OS 310 315 32O

Asn Asn Arg Ala Glu Lell Ile Asp Arg Wall ASn Glu Lieu. His Ser 3.25 330 335

Asp Wall Lieu. Arg Lys Glu Met Luell Ser Ile Glin His Asp Ile Luell Asn 34 O 345 35. O

Thir Arg Ala Lys Ala Glu Trp Glin Asp Ala Phe Ala 355 360 365 Ile Asp Lieu. 37 O

We claim: 4. The isolated biomolecule complex of claim 1, wherein 1. An isolated biomolecule complex comprising a nucleic acid comprising a glucosylated 5-hydroxymethylcytosine 50 said C. fasciculata J binding protein is modified with a base bound to a C. fasciculata J binding protein, wherein the binding molecule. C. fasciculata J binding protein is coated on a Solid Sub Strate. 5. The isolated biomolecule complex of claim 4, wherein 2. The isolated biomolecule complex of claim 1, wherein said binding molecule is selected from the group consisting said solid substrate is a bead. 55 of biotin, avidin, a hapten, an immunoglobulin, and an 3. The isolated biomolecule complex of claim 2, wherein aptamer. said bead is selected from the group consisting of magnetic and polymer beads.