© 2002 Oxford University Press Human Molecular Genetics, 2002, Vol. 11, No. 1 13–21

ARTICLE -wide assessment of replication timing for human 11q and 21q: disease-related in timing-switch regions

Yoshihisa Watanabe1, Asao Fujiyama2,3, Yuta Ichiba1, Masahira Hattori3, Tetsushi Yada3, Yoshiyuki Sakaki3 and Toshimichi Ikemura1,*

1Division of Evolutionary Genetics, Department of Population Genetics and 2Division of Human Genetics, Department of Integrated Genetics, National Institute of Genetics, Yata 1111, Mishima, Shizuoka-ken 411-8540, Japan and 3Human Genome Research Group, RIKEN Genomic Sciences Center, RIKEN, Suehiro-cho 1-7-22, Turumi-ku, Yokohama, Kanagawa-ken 230-0045, Japan

Received August 15, 2001; Revised and Accepted November 2, 2001

The completion of the sequence will greatly accelerate development of a new branch of bioscience and provide fundamental knowledge to biomedical research. We used the sequence information to measure replication timing of the entire lengths of human chromosomes 11q and 21q. Megabase-sized zones that replicate early or late in S phase (thus early/late transition) were defined at the sequence level. Early zones were more GC-rich and -rich than were late zones, and early/late transitions occurred primarily at positions identical to or near GC% transitions. We also found the single nucleotide polymorphism (SNP) frequency was high in the late-replicating and replication-transition regions. In the early/late transition regions, concentrated occurrence of cancer-related genes that include CCND1 encoding cyclin D1 (BCL1), FGF4 (KFGF), TIAM1 and FLI1, was observed. The transition regions contained other disease-related genes including APP associated with familial Alzheimer’s disease (AD1), SOD1 associated with familial amyotrophic lateral sclerosis (ALS1) and PTS associated with phenylketonuria. These findings are discussed with respect to the prediction that increased DNA damage occurs in replication-transition regions. We propose that genome- wide assessment of replication timing serves as an efficient strategy for identifying disease-related genes.

INTRODUCTION the correlations between replication timing and genome characteristics that have biomedical significance and have Mammalian DNA replication proceeds in a precisely regulated been compiled on a genome-wide scale. Ten of the 15 known manner whereby megabase-sized clusters of replicons are oncogenes/tumor suppressor genes on 11q and 21q were found activated at distinct times in S phase (1,2). The replicons are to be located in or in the close vicinity of replication-timing heterogeneous in size, but most are 50–450 kb in length. transition regions. Furthermore, additional 21 genes related to Several to 10 (or more) contiguous replicons with origins that well characterized diseases were found in the transition fire synchronously at a specific time comprise a megabase- regions. This indicates that replication-timing transition regions sized domain that can be visualized cytogenetically as a band are disease-gene-rich and therefore, are of medical significance. by the replication-banding method (3–5). The replication- banding pattern is known to be fairly equivalent to G- or R-banding patterns. In general, G bands, which are composed primarily of RESULTS AT-rich sequences, correspond to late-replicating zones; T bands (a subgroup of R bands), composed of GC-rich General correlation between replication timing and GC% sequences, correspond to zones that replicate very early level (3–11). Ordinary R bands replicate early and are composed of We measured replication timing of chromosomes 21q and 11q both GC- and AT-rich sequences (9–11). In the present study, at the sequence level (14,15) by examining 171 and 278 we measured replication timing for the entire lengths of human sequence-tagged sites (STSs), respectively. The average chromosomes 11q and 21q, large portions of which our group distance between the adjacent STSs was designed to be (RIKEN) sequenced as a member of the International Human ∼200 kb for 21q and thus represents the length of a mid-sized Genome Sequencing Consortium (12,13). We then examined replicon (1–5). The average STS spacing for chromosome 11q,

*To whom correspondence should be addressed. Tel: +81 559 81 6788; Fax: +81 559 81 6794; Email: [email protected] 14 Human Molecular Genetics, 2002, Vol. 11, No. 1 which is larger, was ∼300 kb. It should be noted that as the study progressed, STS markers were designed more closely (e.g. one STS per 100 kb) in and near early/late transition regions to define these regions more accurately. The level of newly replicated DNA from cell-cycle fractionated THP-1 cells (a monocytic leukemia cell line) was quantified for each STS locus by a PCR-based method (14,15) (Fig. 1). To verify the protocol used, two X-chromosome genes F9 and PGK1, for which the replication timing was reported previously (14), were analysed in advance using two different PCR primer sets each (Fig. 1B and C). The replication period (S1–S4) for each locus was assigned on the basis of the cell-cycle fraction showing the highest level of replicated DNA. Replication timings of the two genes thus obtained were consistent with those reported previously (14). Examples of electrophoretic patterns for 11q and 21q loci were presented in Figure 1D. In Figures 2 and 3, early (S1–S2)- and late (S3–S4)-replicated loci are denoted by red and blue ovals, respectively, and the intermediate stage (S2.5) is denoted by green ovals. Several to 10 contiguous loci that covered a few to several megabase areas replicated at identical or similar times during S phase. This provided molecular confirmation of the cytogenetic observation that early or late replicons cluster into a megabase-sized domain in which multiple origins fire fairly synchronously at a specific time during S phase (1–5). General correlation between replication timing and GC% level was observed for both 21q and 11q (Figs 2 and 3; atypical cases such as those observed near the centromeres or telomeres are described in the legends). Early zones were more GC-rich than were late zones. This was especially evident between the adjacent early and late zones, and replication transition occurred at positions identical to or near GC% transitions. The concordant transition of replication timing and GC% level is consistent with the molecular characteristics that were predicted for chromosome band boundaries (16,17). The histo- grams in Figure 4A, in which the GC% of BAC sequences containing individual STSs were examined for each timing category, reveal a general correlation between replication timing and GC% level. Late-replicating sequences corres- ponding to the S3 or S4 fraction had major peaks in the AT-rich range (<40% GC). In contrast, the major peaks for the earliest S1 sequences were more GC-rich (47.5–52.5% GC). The GC% of S2 sequences was distributed over a wider range. A megabase-sized zone composed primarily of STSs belonging to one timing category is denoted in italic letters that specify the timing. The S1 and S2 zones are indicated with red and pink horizontal lines, respectively, in Figures 2 and 3. An ideogram of the 11q banding pattern reported by Saccone et al. (10) and Bernardi et al. (11), in which the isochore distribution Figure 1. Determination of replication timing. (A) BrdU-labeled THP-1 cells were flow-sorted into six cell-cycle fractions (G1, S1–S4, G2/M). (B) Determination of was included, is redrawn in Figure 3B. Bands composed replication timing of PGK1 and F9 using two different PCR primer sets each. Each primarily of GC-rich H3 isochores and thus corresponding to quantitative PCR contained BrdU-labeled DNA, two sets of locus-specific primers, T bands are indicated in red; ordinary R bands are indicated plasmid pKF3 DNA and pKF3-specific primers (15). PGK1 primer set 1 (223 bp ′ ′ ′ ′ in white. The ideogram of the isochore distribution and the product): C2 (5 -gggttggggttgcgccttttccaa-3 ), D2 (5 -acgccgcgaaccgcaaggaacct-3 ). F9 primer set 1 (156 bp product): F9F1 (5′-ataccttggctttttgtggattcc-3′), F9R known mapping data compiled for STSs and genes by NCBI (5′-aggaactagcaagagtgaggcct-3′). PGK1 primer set 2 (192 bp product): C2 UniSTS (http://www.ncbi.nlm.nih.gov/sts/) and OMIM (5′-gggttggggttgcgccttttccaa-3′), C2R (5′-taggggcggagcaggaagcgtcgc-3′). F9 (http://www.ncbi.nlm.nih.gov/Omim/) indicate that the S1 zones primer set 2 (244 bp product): F9F2 (5′-taatccttctatcttgaatcttct-3′), F9R correspond primarily to T bands and that the S2 zones correspond (5′-aggaactagcaagagtgaggcct-3′). (C) Quantification of PCR products of PGK1 and F9. PCR products were quantified with FluoroImager. Results of five primarily to ordinary R bands. That the GC% of S2 sequences experiments using BrdU-labeled DNAs isolated from five independent FACS was distributed over a wide range (Fig. 4A) is consistent with sorts are expressed as means with SD values (small bars). (D) Electrophoretic the known complex characteristics of ordinary R bands; patterns for eight loci on 11q and 21q. Human Molecular Genetics, 2002, Vol. 11, No. 1 15

Figure 2. Replication timing and GC% and SNP distributions on chromosome 21q. GC% distribution of the 21q sequence obtained from http:// www.ncbi.nlm.nih.gov/genome/seq/ was calculated with a 500 kb window and a 50 kb step. Start position was chosen according to Hattori et al. (12) and http:// hgp.gsc.riken.go.jp/chr21/chr21map/. Early (S1–S2) and late (S3–S4) replicated loci are indicated by red and blue ovals, respectively. If the difference in levels of replicated DNA between two consecutive cell-cycle fractions was <10%, the oval was placed at an intermediate position (e.g. S2.5, which is denoted by a green oval). Two loci near the centromere were found to replicate later than other S4 loci; a significant level of PCR product was also detected in the G2/M fraction, as observed previously for the FMR1 gene, which is responsible for fragile X syndrome (14,25). The respective blue ovals are placed below the S4 level. The adjacent S4-replicated region contains GC-rich repetitive sequences and is, therefore, more GC-rich than the neighboring S4 sequences. Distribution of SNP frequency was calculated with a 200 kb window and a 20 kb step and is indicated with a bar diagram. A megabase-sized zone that is composed primarily of STSs belonging to one timing category is indicated by a horizontal colored line; the S1 and S2 zones are indicated by red and pink horizontal lines, respectively, and the S3 and S4 zones are indicated by light- and dark-blue lines, respectively. Therefore, a space between the colored horizontal lines corresponds to the replication-transition region and the intermediate space between the edge STS of one timing zone and the adjacent STS in the transition region. Chromosomal band numbers were assigned by referring to NCBI OMIM (http://www.ncbi.nlm.nih.gov/Omim/), UniSTS (http://www.ncbi.nlm.nih.gov/sts/) and GENATLAS (http://www.citi2.fr/GENATLAS/). An ambiguous zone (21q22.1–q22.2) where band assignment differed among the databases was left blank. Positions of genes categorized as 1.1 and 1.2 by Hattori et al. (12) and of pseudogenes (category 5) are marked by a plus sign. although R bands replicate early, they are composed of both replication timing is not merely a reflection of segmental GC% GC- and AT-rich sequences (9,11). This indicates that differential distribution. The S4 and S3 zones, which are shown by dark- and 16 Human Molecular Genetics, 2002, Vol. 11, No. 1

Table 1. Oncogenes/tumor suppressor genes on chromosomes 21q and 11q

Chromosome Gene Transition Gene product, function or disease Trans. cases 21q TIAM1 E/L T-cell lymphoma invasion and metastasis 1 AML1 S1/S2 Runt-related transcription factor 1: oncogene AML1 1061 ERG Oncogene ERG 23 ETS2 Ed(E/L) Oncogene ETS2 11q MEN1 Multiple endocrine neoplasia I RELA Transcription factor NFKB3 FOSL1 FOS-like antigen-1 FGF3 E/L Fibroblast growth factor 3: oncogene INT2 FGF4 E/L Fibroblast growth factor 4: oncogene KFGF CCND1 E/L Cyclin D1: oncogene BCL1 323 EMS1 E/L Oncogene EMS1 DDX6 RNA helicase, 54 kDa: oncogene RCK CBL E/L Oncogene CBL ETS1 E/L Oncogene ETS1 FLI1 E/L Friend leukemia virus integration 1 110

Genes in early/late transitions (E/L) and those near the edge of the early/late transitions [Ed(E/L)] or in S1/S2 transitions (S1/S2) are shown in bold. Numbers of known translocation cases (Trans. cases) are listed. light-blue horizontal lines in Figure 3A, respectively, were slightly higher than that in the S3 and S4 zones (Fig. 4C). found to correspond primarily to G bands composed of AT-rich Histogram analysis in Figure 4D shows also that SNP-rich L isochores and those of composed both L and H isochores regions are more abundant in the late and S3/S4 zones than in (10,11), respectively (Fig. 3B). General correlations between the early zones. Analysis on the incomplete 11q sequence replication timing zones and chromosomal bands were also available gave analogous results. However, from the viewpoint observed for 21q, which has a much simpler banding pattern of statistical analysis, results were too preliminary to be (10,11) (data not shown). Strehl et al. (18) have reported published because of a vast amount of undetermined bases and correlation of replication timing zones with chromosomal abnormally long stretches of SNP-deficient sequence. bands and identified a band transition in the 5 Mb region on Statistical analyses presented in Figure 4C and D support the human chromosome 13q. results found in the SNP distribution on 21q presented in Figure 2. The present findings are consistent with the prediction Gene and single nucleotide polymorphism (SNP) densities that the mutation rate in late-replicating zones may be higher in individual timing categories than that in early zones (19,20) and that levels of DNA damage may be elevated in paused regions for replication-fork movement The positions of genes with known functions on 21q (12) are (21–23) (the mechanism is discussed later). indicated in Figure 2. Genes, but not pseudogenes, were found to be densely clustered in early replication zones. The histograms Characteristic locations of oncogenes and tumor in Figure 4B show that the gene densities in the S1 and S2 suppressor genes zones on both 21q and 11q are higher than those in the S3 and S4 zones and the early/late transition zones. The gene density We then investigated correlation of replication timing and on 11q was lower than that on 21q and this may be due to the localization of disease-related genes. Cancer-related genes fact that the genomic sequence of 11q is incomplete (13). We were examined initially because 226 genes have been then examined correlation of replication timing with other compiled in the Tumor Suppressor/Oncogenes file in the genome characteristics that have biomedical significance. Curated Gene Lists of the Cancer Genome Anatomy Project Distributions of SNPs compiled by the SNP Consortium (http:// (http://cgap.nci.nih.gov/Genes/CuratedGeneLists) and there- snp.cshl.org/) are shown in Figures 2 and 3. In general, SNP fore, synthetic information for the oncogenes/tumor suppressor density in late-replicating zones appeared to be higher than that genes is available. Fifteen genes that have been mapped in early zones, and interestingly, major peaks of SNP precisely on 21q and 11q were found in the list (Table 1). To frequency were often located in or near replication-transition show their genomic positions, all genes are listed above replic- regions especially belonging to late-replicating zones. Statis- ation-timing graphs on Figures 2 and 3. It should be noted that tical analysis of SNP density on 21q showed that the SNP these genes were located primarily in or near replication- density in the S3 and S4 zones was higher than that in the S1 transition regions. Eight of the 15 genes were located in early/ and S2 zones and the density in the S3/S4 transition was late transitions (indicated in bold in Table 1), and these Human Molecular Genetics, 2002, Vol. 11, No. 1 17

Figure 3. Replication timing and GC% and SNP distributions on chromosome 11q. (A) GC% distribution of the 11q sequence obtained from http:// genome.cse.ucsc.edu/ was calculated with a 500 kb window and a 50 kb step. Undetermined bases (Ns) were omitted from the GC% calculation, but the nucleotide positions were not changed. If a total of N bases in a window exceeded a half of the window size, the data point was omitted and the neighboring data points were connected by a line. Early and late loci are denoted by colored ovals as described in Figure 2. Calculation of SNP frequency and assignments of chromosome band numbers to individual megabase-sized timing zones were done as described in Figure 2. Sequences in the subtelomeric region were found to replicate late even though their GC% levels were rather high. Positions of genes are not presented because those have been reported previously (13,32). (B) An ideogram of the 11q band-pattern reported by Saccone et al. (10) and Bernardi et al. (11). Red indicates bands composed mainly of the most GC-rich H3 isochores and thus correspond to T bands. Ordinary R bands are indicated in white. G bands composed primarily of AT-rich L isochores are indicated in black, and those composed of both L and H isochores are indicated in gray.

included CCND1, which encodes cyclin D1, FGF4, ETS1 and timing that could not be evaluated conclusively at the present FLI1. Oncogene ETS2 at an edge of the early/late transition level of resolution (Figs 2 and 3). and AML1 (RUNX1) in the S1/S2 transition are also indicated Recurrent chromosome aberrations found in cancer were in bold. Collectively, 10 of the 15 genes were located in or in also compiled by the Cancer Genome Anatomy Project (http:// the close vicinity of the replication-transition regions. It is cgap.nci.nih.gov/Chromosomes/RecurrentAberrationsTrans- worth noting that three of the remaining five genes appeared to locations). Translocation cases were observed for four be located near the small-sized local transition of replication oncogenes/tumor suppressor genes on 21q and 11q and high 18 Human Molecular Genetics, 2002, Vol. 11, No. 1

frequencies (>100 cases) were observed for the three genes located in replication-transition regions (Table 1; indicated in yellow on Figs 2 and 3).

Genes in replication-transition regions To further clarify the characteristics of the genes present in replication-transition regions, we searched for genes with known functions that were localized in early/late transition regions with the Human Genome Server ‘Ensembl’ (http:// www.ensembl.org/) and ‘HGREP’ (http://hgrep.ims.u- tokyo.ac.jp/). In addition to the eight oncogenes/tumor suppressor genes indicated in bold in Table 1, 44 genes were found in the early/late transition regions (Table 2). Twenty-one genes associated with well characterized diseases in OMIM are indicated in bold. To show their genomic positions, 11 examples are listed below the replication-timing graphs on Figures 2 and 3. These include APP, which is related to familial Alzheimer’s disease (AD1); GRIK1, which encodes the neuronal subunit GluR-5 that is related to juvenile absence epilepsy; SOD1, which is related to familial amyotrophic lateral sclerosis (ALS1); PTS, which is related to phenylketonuria III, IV; and three genes involved in lymphoma- or leukemia-associated gene fusion (NUMA1, API2, ARHGEF12).

DISCUSSION Twenty-nine of the 52 genes located in early/late transition regions were found to be associated with well characterized diseases: eight oncogenes/tumor suppressor genes indicated in bold in Table 1 and 21 genes indicated in bold in Table 2. Hypothetical genes predicted by computer analyses are not listed in Table 2. Disease-related genes might be found among these hypothetical genes. We also did not include genes located in the S1/S2 or S3/S4 transitions (e.g. MLL or USP25, respectively) or close to the early/late transition (e.g. ATM). At the present level of PCR-marker density, it is unclear whether these genes are localized within the transition region. Further high density mapping of replication timing may be necessary because oncogenes/tumor suppressor genes are often located in or near S1/S2 transition regions (Figs 2 and 3). Figure 4. Replication timing, GC% level and gene and SNP densities. (A) The It is unclear whether the present findings can be generalized GC% of BAC sequences (>100 kb) of 21q and 11q that contain individual to other chromosomes because this is the first chromosome- STSs were analyzed with a histogram for each replication-timing category. wide study of replication timing at the sequence level. (B) Gene density in four timing zones and early/late transition regions. Genes reported in Hattori et al. (12) and the International Human Genome Sequencing However, several replication-timing transitions have been Consortium (13) were analyzed. In defining the early/late transition region localized at the sequence level by megabase-sized measure- (and also in analyses in Tables 1 and 2), the intermediate region between the ments of replication timing. One transition was found in the edge STS of one transition region and the adjacent STS belonging to the neigh- junction region of MHC classes II and III where NOTCH4 is boring timing zone was excluded from the transition region and the timing zone. ′ (C) SNP density in four timing zones and replication-transition regions on located (16,17,24). Other transitions were found in the 3 end 21q. Number of SNP sites in each 10 kb sequence belonging to four timing of XIST (15) and near FMR1 (25) on chromosome X. An early/ categories and replication-transition regions was calculated separately and late transition was found in the mouse Igh locus (26). The means and SD values (small bars) are presented for individual categories. In respective genes have been associated with diseases in OMIM. this and the following histogram analysis, not only the early/late transitions It has been suggested that the probability of DNA damage, but also the S1/S2 and S3/S4 transitions were included, and each early/late transition was subdivided into the individual transitions (e.g. into the S1/S2 including DNA rearrangements, is higher for replication-tran- and S2/S3 transitions). The intermediate region between the edge STS of one sition regions than for other genome regions (21–23). Multiple transition region and the adjacent STS belonging to the neighboring timing origins belonging to a single early-replicating zone are known zone was included in the transition regions because of the operational to fire fairly synchronously at a specific time in early S phase, necessity to define the S1/S2- and S3/S4-transition regions. (D) SNP-density distribution in individual timing zones and transition regions on 21q. Distribu- and the individual DNA forks meet and merge with oncoming tion of SNP sites per 10 kb was analyzed by histogram for individual timing adjacent forks in this early zone typically within 1 h (2,27,28). categories. However, only the fork that starts at the edge of the early zone Human Molecular Genetics, 2002, Vol. 11, No. 1 19

Table 2. Genes in early/late transition regions

Chromosome Gene Gene product: function or disease 21q APP Amyloid A4 precursor: familial Alzheimer’s disease AD1 GRIK1 Glutamate receptor subunit, GluR-5: juvenile absence epilepsy JAE SOD1 Superoxide dismutase 1: familial amyotrophic lateral sclerosis ALS1 CTBP2 C-terminal binding 2 SYNJ1 Synaptojanin 1 B3GALT5 β-3-Galactosyltransferase 5 PCP4 Purkinje cell protein 4 DSCAM Down syndrome cell adhesion molecule MX1, 2 Interferon-induced GTPase MxA, B TMPRSS2 Transmembrane serine protease ABCG1 ATP-binding cassette transporter 8 11q CTNND1 Cadherin-associated Src substrate OSBP Oxysterol binding protein IGHMBP2 Spinal muscular atrophy with respiratory distress type 1 DHCR7 7-Dehydrocholesterol reductase: Smith–Lemli–Opitz syndrome SLOS KRN1 Keratin, cuticle, ultrahigh sulfur 1 IL18BP Interleukin 18 binding protein NUMA1 Nuclear mitotic apparatus protein 1: APL (NUMA1–RARA fusion) FOLR1 Folate receptor 1 (adult) THRSP Thyroid hormone responsive SPOT14: breast tumor metabolism NDUFC2 NADH-ubiquinone oxidoreductase 1, subunit C2 PRCP Prolylcarboxypeptidase (angiotensinase C) FZD4 -4 MTNR1B receptor 1B MTMR2 Myotubularin related protein 2: Charcot–Marie–Tooth disease, type 4B API2 Apoptosis inhibitor 2: MALT lymphoma (API2–MALT1 fusion) API1 Apoptosis inhibitor 1 MMP cluster Matrix metalloproteinase 7, 20, 10, 1, 3, 12, 13: tumor metastasis PTS 6-Pyruvoyltetrahydropterin synthase: phenylketonuria III, VI MCAM Melanoma cell adhesion molecule USP2 Ubiquitin-specific protease 2 THY1 Thy-1 cell surface antigen PVRL1 Poliovirus receptor-related 1: Margarita island ectodermal dysplasia ATDC Ataxia-telangiectasia group D complementing protein ARHGEF12 Rho guanine exchange factor 12: AML (MLL–ARHGEF12 fusion) KCNJ1 Potassium channel ROM-K: Bartter syndrome type 2 KCNJ5 Cardiac ATP-sensitive potassium channel

Genes in early/late transition regions, except for oncogenes and tumor suppressor genes (shown in bold in Table 1), are listed. Genes related to well characterized diseases are shown in bold. is predicted to continue replicating for a period of a few the present finding that disease-related genes tend to be located additional hours or pause at specific sites in the replication- in or in the close vicinity of the transition regions. transition region until it meets the edge fork initiated in the It has been shown that early-replicating zones are composed adjacent later-replicating zone (2,17,26). A pause during primarily of loose chromatin structures and that late zones replication is known to increase DNA breaks and rearrange- comprise compact chromatin in metaphase or interphase nuclei ments (21–23). These characteristics may be associated with (1,4,5,7). Therefore, transition of chromatin compaction likely 20 Human Molecular Genetics, 2002, Vol. 11, No. 1 occurs within replication-transition regions. In terminally followed by a 2 h incubation at 50°C. To provide controls for differentiated cells, such as neurons, the level of chromatin recovery of BrdU-labeled DNA, a mixture of [14C]thymidine- compaction that is established during the final round of DNA labeled (5 × 104 d.p.m.) Chinese hamster ovary cell (CHO) replication may be maintained. In the early/late transition DNA and BrdU- and [3H]deoxycytidine-labeled (5 × 104 d.p.m.) regions, four neural disease genes (APP, GRIK1, SOD1 and CHO DNA was added to each fraction. The DNA samples were DSCAM) were found (Table 2). Interestingly, the 5′ ends of all purified by phenol/chloroform extraction and ethanol precipita- four of these genes were found to replicate later than did the 3′ tion and dissolved in 460 µl of TE containing sheared salmon ends. For example, the 5′ end of APP replicated in S3, but the testis DNA (0.5 mg/ml); they were then sonicated to obtain 3′ end, which is located 290 kb downstream, replicated in S2. fragments with an average size of ∼1 kb. These fragments were This may reflect tissue-specific expression of these genes and heat-denatured for 3 min at 95°C and cooled on ice. After addition suggests that the chromatin is more compact at the 5′ end. The of 56 µl of 10× immunoprecipitation buffer (14) and 80 µl of transition of chromatin compaction within a gene might lead to anti-BrdU mouse monoclonal antibody (25 µg/ml), the samples reduced stability and increased susceptibility of the chromatin were incubated at room temperature for 20 min under constant to various reagents in the gene locus including regulatory rotation. Antibody-bound BrdU DNA was precipitated by regions for . addition of 15 µl of 2.5 mg/ml anti-mouse IgG, and the mixture Cytogenetic data have shown that the replication-banding was incubated for 20 min at room temperature. After centrifugation, pattern and the R- and G-banding patterns are tissue invariant the pellet was washed with 1× immunoprecipitation buffer, (3–5). This, and the global correlation of replication timing resuspended in 200 µl of digestion buffer (14), and incubated with GC% level, suggest that the megabase-sized early- or overnight at 37°C. An additional 100 µl of digestion buffer was late-replicating domains should be common among cell types. added, followed by overnight incubation at 37°C. The sample Despite the tissue-related invariance of the megabase-sized was subjected to phenol/chloroform extraction and, after addition banding patterns, replicon-sized segments that harbor tissue- of 20 µg yeast tRNA, DNA was precipitated with ethanol and specifically expressed genes are known to replicate early in the dissolved in TE. The recovery and purity level of the BrdU respective tissues even if the segment is located within a late- DNA was checked by monitoring the radioactivity of the [3H] replicating G-band zone (4,5,29). Therefore, if the edge and [14C] counts of the CHO DNA (15,17). BrdU-labeled DNAs replicon in a G-band zone contains a tissue-specific gene, the isolated from five independent FACS sorts were used and each edge segment may replicate early in that tissue. In such a case, DNA sample was tested in advance for two X-chromosome genes the exact area of the replication-transition is cell-type F9 and PGK1, for which replication timing was reported dependent (25,26). If such a situation exists, it does not limit previously (14). Bar diagrams in Figure 1C summarize the the usefulness of the present chromosome-wide approach, quantification data of the two genes testing the five because the hypothesized dependence represents an important independent DNA samples. functional characteristic of the genomic region. Information acquired through genome-wide studies of replication timing PCR-based quantification of replicated DNA will contribute substantially to our understanding of genome structure and function and of the molecular mechanisms in the Replication timing of a particular locus was determined by pathogenesis of diseases. It may also serve as an efficient quantifying the number of copies of that locus in the newly strategy for identifying disease-related genes. replicated DNA of each fraction with quantitative PCR (14,15). Locus-specific primers were selected based on the criterion that a single PCR product of the predicted size was MATERIALS AND METHODS amplified from THP-1 genomic DNA but not from CHO DNA. Primer pairs for two different loci were added to a single tube Cell-cycle fractionation and isolation of newly replicated containing a constant amount of BrdU-labeled DNA from each DNA cell-cycle fraction, which had been calibrated on the basis of Human acute monocytic leukemia cell line (THP-1; 2n = 46, the [3H] count as described previously (15,17). In addition to XY; obtained from Health Science Research Resources Bank, locus-specific primers, each reaction contained a constant Tokyo, Japan) was chosen because of the integrity of the amount of plasmid pKF3 DNA and pKF3-specific primers for karyotype (30); the integrities of chromosomes 11 and 21 were assessment of the PCR efficiency (15). The reaction buffer confirmed prior to this study (31). THP-1 cells were labeled for contained 0.5 U AmpliTaq Gold (Perkin-Elmer, Branchburg, 60 min with 75 µM 5′-bromo-2′-deoxyuridine (BrdU; NJ) in 100 µl of the manufacturer’s reaction buffer. The PCR Boehringer Mannheim, Germany). The BrdU labeling, cell- conditions were as follows: 95°C for 9 min to activate cycle fractionation, and isolation of newly replicated DNA AmpliTaq Gold; 32 cycles of 94°C for 30 s, 55°C for 1 min, were performed according to the methods described by Hansen 72°C for 1 min and a final extension at 72°C for 10 min. Gel et al. (14) but with minor modifications (15). The BrdU-labeled electrophoresis, and FluorImager quantification were cells were washed with cold phosphate-buffered saline, fixed performed as described previously (15). Replication timing of in 70% ethanol for 60 min, pelleted and resuspended in 70% each locus was assigned after quantification of at least three ethanol at 4°C; they were then resuspended in staining buffer independent PCR products from reactions in which annealing (14) followed by a 30 min incubation at room temperature. The temperature, ratio of BrdU-labeled DNAs to pKF3 DNA, and cells were fractionated into six parts on the basis of the cell-cycle pairs of locus-specific primers were altered. Unusual cases phase, G1, S1–S4 and G2/M, using an EPICS Elite cell sorter where a certain combination of primer sets brought on PCR (Beckman Coulter, Fullerton, CA). Equal numbers of cells (4 × 104) products with unexpected sizes, additional bands or bands of were collected in microfuge tubes containing lysis buffer (14) evidently unequal intensity, were omitted from the analysis. Human Molecular Genetics, 2002, Vol. 11, No. 1 21

For each locus, at least two independent DNAs isolated 16. Fukagawa,T. , Sugaya,K., Matumoto,K., Okumura,K., Ando,A., Inoko,H. from independent FACS sorts were used. Sequences and and Ikemura,T. (1995) A boundary of long-range G+C% mosaic domains in the human MHC locus: pseudoautosomal boundary-like sequence positions of the locus-specific primers used in this study and exists near the boundary. Genomics, 25, 184–191. the timing of each locus will be published on the web 17. Tenzen,T., Yamagata,T., Fukagawa,T., Sugaya,K., Ando,A., Inoko,H., pages (http://spinner.lab.nig.ac.jp/lab_evol/home-e.html; http:// Gojobori,T., Fujiyama,A., Okumura,K. and Ikemura,T. (1997) Precise hgp.gsc.riken.go.jp/). switching of DNA replication timing in the GC content transition area in the human major histocompatibility complex. Mol. Cell. Biol., 17, 4043–4050. 18. Strehl,S., LaSalle,J.M. and Lalande,M. (1997) High-resolution analysis of ACKNOWLEDGEMENTS DNA replication domain organization across an R/G-band boundary. Mol. Cell. Biol., 17, 6157–6166. This work was supported by a Grant-in-Aid for Scientific 19. Filipski,J. (1990) Evolution of DNA sequence contributions of mutational Research on Priority Areas (Human Genome Project) and by a bias and selection to the origin of chromosomal compartments. In Obe,G. Grant-in-Aid for Scientific Research from Ministry of (ed.), Advances in Mutagenesis Research 2. Springer-Verlag, Berlin Education, Science, and Culture of Japan. We thank Dr Mizuki Heidelberg, Germany, pp. 1–54. 20. Deschanne,P. and Filipski,J. (1995) Correlation of GC content with Ohno (National Institute of Genetics; Kyusyu University) for replication timing and repair mechanisms in weakly expressed E.coli karyotype analysis of THP-1 cells. genes. Nucleic Acids Res., 23, 1350–1353. 21. Bierne,H. and Michel,B. (1994) When replication forks stop. Mol. Microbiol., 13, 17–23. REFERENCES 22. Verbovaia,L.V. and Razin,S.V. (1997) Mapping of replication origins and 1. Craig,J.M. and Bickmore,W.A. (1993) Chromosome bands—flavours to termination sites in the Duchenne muscular dystrophy gene. Genomics, savour. Bioessays, 15, 349–354. 45, 24–30. 2. Berezney,R., Dubey,D.D. and Huberman,J.A. (2000) Heterogeneity of 23. Rothstein,R., Michel,B. and Gangloff,S. (2000) Replication fork pausing eukaryotic replicons, replicon clusters, and replication foci. Chromosoma, and recombination or ‘gimme a break’. Genes Dev., 14, 1–14. 108, 471–484. 24. Sugaya,K., Fukagawa,T., Matsumoto,K., Mita,K., Takahashi,E., 3. Drouin,R., Holmquist,G.P. and Richer,C.L. (1994) High-resolution Ando,A., Inoko,H. and Ikemura,T. (1994) Three genes in the human MHC replication bands compared with morphologic G- and R-bands. class III region near the junction with the class II: gene for receptor of Adv. Hum. Genet., 22, 47–115. advanced glycosylation end products, PBX2 homeobox gene and a 4. Holmquist,G., Gray,M., Porter,T. and Jordan,J. (1982) Characterization of Notch-homolog, human counterpart of mouse mammary tumor gene int-3. Giemsa dark- and light-band DNA. Cell, 31, 121–129. Genomics, 23, 408–419. 5. Holmquist,G.P. (1989) Evolution of chromosome bands: molecular 25. Hansen,R.S., Canfield,T.K., Fjeld,A.D., Mumm,S., Laird,C.D. and ecology of noncoding DNA. J. Mol. Evol., 28, 469–486. Gartler,S.M. (1997) A variable domain of delayed replication in FRAXA 6. Ikemura,T. and Aota,S. (1988) Global variation in G+C content along fragile X chromosomes: X inactivation-like spread of late replication. vertebrate genome DNA: possible correlation with chromosome band Proc. Natl Acad. Sci. USA, 94, 4587–4592. structures. J. Mol. Biol., 203, 1–13. 26. Ermakova,O.V., Nguyen,L.H., Little,R.D., Chevillard,C., Riblet,R., 7. Bernardi,G. (1989) The isochore organization of the human genome. Ashouian,N., Birshtein,B.K. and Schildkraut,C.L. (1999) Evidence that a Annu. Rev. Genet., 23, 637–661. single replication fork proceeds from early to late replicating domains in 8. Ikemura,T., Wada,K. and Aota,S. (1990) Giant G+C% mosaic structures the Igh locus in a non-B cell line. Mol. Cell, 3, 321–330. of the human genome found by arrangement of GeneBank human DNA 27. Jackson,D.A. and Pombo,A. (1998) Replicon clusters are stable units of sequences according to genetic positions. Genomics, 8, 207–216. chromosome structure: evidence that nuclear organization contributes to 9. Ikemura,T. and Wada,K. (1991) Evident diversity of codon usage patterns the efficient activation and propagation of S phase in human cells. of human genes with respect to chromosome banding patterns and J. Cell Biol., 140, 1285–1295. chromosome numbers; relation between nucleotide sequence data and 28. Ma,H., Samarabandu,J., Devdhar,R.S., Acharya,R., Cheng,P., Meng,C. cytogenetic data. Nucleic Acids Res., 19, 4333–4339. and Berezney,R. (1998) Spatial and temporal dynamics of DNA 10. Saccone,S., Federico,C., Solovei,I., Croquette,M.F., Valle,G.D. and replication sites in mammalian cells. J. Cell Biol., 143, 1415–1425. Bernardi,G. (1999) Identification of the gene-richest bands in human prometaphase chromosomes. Chromosome Res., 7, 379–386. 29. Goldman,M.A., Holmquist,G.P., Gray,M.C., Caston,L.A. and Nag,A. 11. Bernardi,G. (2000) Isochores and the evolutionary genomics of (1984) Replication timing of genes and middle repetitive sequences. vertebrates: organization of the human genome. Gene, 241, 3–17. Science, 224, 686–692. 12. Hattori,M., Fujiyama,A., Taylor,T.D., Watanabe,H., Yada,T., Park,H.S., 30. Tsuchiya,S., Yamabe,M., Yamaguchi,Y., Kobayashi,Y., Konno,T and Toyoda,A., Ishii,K., Totoki,Y., Choi,D.K. et al. (2000) The DNA Tada,K. (1980) Establishment and characterization of a human acute sequence of human . Nature, 405, 311–319. monocytic leukemia cell line (THP-1). Int. J. Cancer, 26, 171–176. 13. International Human Genome Sequencing Consortium (2001) Initial 31. Ohno,M., Tenzen,T., Watanabe,Y., Yamagata,T., Kanaya,S. and Ikemura,T. sequencing and analysis of the human genome. Nature, 409, 860–921. (2000) Non-B DNA structures spatially and sequence-specifically associated 14. Hansen,R.S., Canfield,T.K., Lamb,M.M., Gartler,S.M. and Laird,C.D. with individual centromeres in the human interphase nucleus. In (1993) Association of fragile X syndrome with delayed replication of the Olmo,E. and Redi,C.A. (eds), Chromosomes Today. Birkhauser Verlag, FMR1 gene. Cell, 73, 1403–1409. Basel, Boston, Berlin, Vol. 13, pp. 57–69. 15. Watanabe,Y., Tenzen,T., Nagasaka,Y., Inoko,H. and Ikemura,T. (2000) 32. Venter,J.C., Adams,M.D., Myers,E.W., Li,P.W., Mural,R.J., Sutton,G.G., Replication timing of the human X-inactivation center (XIC) region: Smith,H.O., Yandell,M., Evans,C.A., Holt,R.A. et al. (2001) correlation with chromosome bands. Gene, 252, 163–172. The sequence of the human genome. Science, 291, 1304–1351. 22 Human Molecular Genetics, 2002, Vol. 11, No. 1